 Michele Moscioni - 2016-03-24 18:50:43
Hi,
I am using the parser and it works great.
In a specific mail with these headers:
.....
Content-Type: application/octet-stream;
name="20160323 pres Fassino x piccoli comuni.pdf"
Content-Disposition: attachment;
filename*=20160323%20pres%20Fassino%20x%20piccoli%20comuni.pdf
Content-Transfer-Encoding: base64
.....
the 'FileName' element of the decoded result is returned empty.
This is because the filename* parameter requires ext-value encoding as defined in RFC 5987: ext-value = charset "'" [ language ] "'" value-chars.
But the charset, language and separators are missing. I think this is a malformation.
In this case the parser puts the whole value into the $character_sets array and then sets the value to empty.
.
.
$character_sets[$parameter] = strtolower($this->Tokenize($value, '\''));
$languages[$parameter] = $this->Tokenize('\'');
$value = UrlDecode($this->Tokenize(''));
.
.
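To make the effect easier to see outside the class, here is a minimal standalone sketch. It only approximates the tokenizing described above: naive_split_ext_value() is a hypothetical helper, not part of the parser, and it uses explode() and rawurldecode() instead of the class's Tokenize() and UrlDecode().

```php
<?php
// Hypothetical helper (not from the parser class): splits an ext-value
// on the first two apostrophes, the way the excerpt above is described.
function naive_split_ext_value($value)
{
    $parts = explode('\'', $value, 3);
    return array(
        'charset'  => strtolower($parts[0]),
        'language' => isset($parts[1]) ? $parts[1] : '',
        // Empty when the separators are missing, which is the reported bug.
        'value'    => rawurldecode(isset($parts[2]) ? $parts[2] : ''),
    );
}

// Well-formed ext-value: charset'language'value-chars.
$good = naive_split_ext_value("UTF-8''pres%20Fassino.pdf");

// Malformed value from the mail above: no charset, language or separators.
$bad = naive_split_ext_value('20160323%20pres%20Fassino%20x%20piccoli%20comuni.pdf');

var_dump($good['value']); // the decoded file name
var_dump($bad['value']);  // empty string; the text ended up in 'charset'
```

With the malformed value, the whole string lands in the charset slot and nothing is left for the file name.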
Now I see that some agents like Thunderbird can recover the filename from a malformed ext-value.
I think this is a workaround for the malformation.
Do you think it is possible to fall back to a specific check in case of malformation, or to use the name parameter as an alternative?
I am sending you a simple temporary fix in the function ParseStructuredHeader.
Your help is appreciated.
Code snippet:
....
if (($l = strlen($parameter))
&& !strcmp($parameter[$l - 1], '*')) {
    $parameter = $this->Tokenize($parameter, '*');
    if (IsSet($parameters[$parameter])
    && IsSet($character_sets[$parameter]))
        $value = $parameters[$parameter] . UrlDecode($value);
    else {
        /*
         * workaround for some malformations
         */
        $len = 2 - substr_count($value, '\'');
        $value = ($len > 0) ? str_repeat('\'', $len) . $value : $value;
        // End workaround
        $character_sets[$parameter] = strtolower($this->Tokenize($value, '\''));
        $languages[$parameter] = $this->Tokenize('\'');
        $value = UrlDecode($this->Tokenize(''));
    }
....
Thanks in advance
Michele
 Manuel Lemos - 2016-03-25 05:30:10 - In reply to message 1 from Michele Moscioni
Can you please upload somewhere public a complete sample message with headers and body so I can reproduce the problem?
 Michele Moscioni - 2016-03-25 09:13:28 - In reply to message 2 from Manuel Lemos
 Manuel Lemos - 2016-03-26 01:53:18 - In reply to message 3 from Michele Moscioni
OK, I see the problem. It would be easy to support filename*; however, I am not sure what happens when the filename has non-ISO-8859-1 characters.
In your example, the file name has only ASCII characters. Would it be possible to provide another example that includes a file that has non-ISO-8859-1 characters in the file name?
I tried Thunderbird and it uses q-encoding to encode such file names.
 Manuel Lemos - 2016-03-26 02:22:05 - In reply to message 3 from Michele Moscioni
Oh, wait, now I see the real problem. The header is indeed malformed because it must specify the character set and language.
So you want me to add a workaround that, when those parameters are missing, takes the whole parameter as the filename?
 Michele Moscioni - 2016-03-29 06:51:22 - In reply to message 5 from Manuel Lemos
Yes, what do you think about a definitive solution and
what do you think about the workaround I sent?
Thanks a lot
 Manuel Lemos - 2016-03-29 07:50:09 - In reply to message 6 from Michele Moscioni
Well, since the Content-Disposition header is missing the character set, maybe it is more reliable to get the file name from the Content-Type name attribute if present, or otherwise take the whole Content-Disposition filename* as the file name. Would that be a better solution?
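A rough sketch of that fallback order, only as an illustration and not the actual change to the class: pick_file_name() is a hypothetical function, and it assumes the header parameters were already extracted into plain arrays. Character set conversion is left out.

```php
<?php
// Hypothetical helper: $disposition and $type are assumed to hold the
// already-extracted parameters of Content-Disposition and Content-Type.
function pick_file_name(array $disposition, array $type)
{
    if (isset($disposition['filename*'])) {
        $raw = $disposition['filename*'];
        $parts = explode('\'', $raw, 3);
        if (count($parts) === 3) {
            // Well-formed ext-value: charset'language'value-chars.
            return rawurldecode($parts[2]);
        }
        // Malformed: prefer the Content-Type name attribute if present...
        if (isset($type['name']) && $type['name'] !== '') {
            return $type['name'];
        }
        // ...otherwise take the whole filename* value as the file name.
        return rawurldecode($raw);
    }
    if (isset($disposition['filename'])) {
        return $disposition['filename'];
    }
    return isset($type['name']) ? $type['name'] : '';
}

// Headers from the sample mail in this thread.
$disposition = array('filename*' => '20160323%20pres%20Fassino%20x%20piccoli%20comuni.pdf');
$type = array('name' => '20160323 pres Fassino x piccoli comuni.pdf');

echo pick_file_name($disposition, $type), "\n";
echo pick_file_name($disposition, array()), "\n";
```

For the sample mail both paths give the same file name, since the name attribute and the percent-decoded filename* carry the same text.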
 Michele Moscioni - 2016-03-30 19:14:07 - In reply to message 7 from Manuel Lemos
Hi,
first, I'm sorry for the late response.
I think it is a good solution.
Yes, the name attribute is the best fallback.
The code below sanitizes partial malformations, but it does not consider which part is missing: the charset or the language.
This is strictly a test workaround.
/*
* workaround for some malformations
*/
$len = 2 - substr_count($value, '\'');
$value = ($len >0) ? str_repeat('\'', $len) . $value : $value;
// End workaround
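To show what the padding does, the same two lines can be wrapped in a standalone function (pad_ext_value() is a hypothetical name used only for this illustration):

```php
<?php
// Standalone copy of the padding step, so its effect can be inspected.
function pad_ext_value($value)
{
    $len = 2 - substr_count($value, '\'');
    return ($len > 0) ? str_repeat('\'', $len) . $value : $value;
}

// No apostrophes: two are prepended, so charset and language come out empty.
echo pad_ext_value('a%20b.pdf'), "\n";        // ''a%20b.pdf

// One apostrophe: one is prepended, so "UTF-8" ends up in the language slot.
echo pad_ext_value("UTF-8'a%20b.pdf"), "\n";  // 'UTF-8'a%20b.pdf

// Already well-formed: left unchanged.
echo pad_ext_value("UTF-8''a%20b.pdf"), "\n"; // UTF-8''a%20b.pdf
```

The second case is the limitation mentioned above: the padding cannot tell whether the charset or the language is the missing part.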
Thanks a lot
Michele