PHP Classes

incorrectly receiving reached a premature end of data

Package:PHP MIME Email Message Parser
Subject:incorrectly receiving reached a...
Summary:incorrectly handles attachments that contain the word "from"
Messages:10
Author:Daniel Kim
Date:2009-07-01 05:48:01
Update:2009-07-02 06:06:39
 

  1. incorrectly receiving reached a...
Daniel Kim - 2009-07-01 05:48:01
I was decoding a file containing many emails, on the order of 5000. I seemed to randomly get the following error:
MIME message decoding error: reached a premature end of data at position -1

After binary searching for the email in violation, I realized it was incorrectly decoding attachments that contained "from" at the beginning of the line. The attachment itself was an email. Here's a rough outline of the stream that would cause this:

From xxx
...
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="A-DIVIDER"
..

--A-DIVIDER
Content-Type: message/rfc822
Content-Disposition: inline

<random mail headers>
Date:
To:
From:
Reply-to:
Subject:
<more random headers>

email body!
from me myself & i <<<<<<<<<< breaks!
note, deleting the above line with "from" in it fixes the problem!

--A-DIVIDER--


It would be great if you could suggest a fix for this, thanks!

  2. Re: incorrectly receiving reached a...
Manuel Lemos - 2009-07-01 07:04:21 - In reply to message 1 from Daniel Kim
That is a multiple message file in mbox format. Make sure the mbox variable is set to 1.

  3. Re: incorrectly receiving reached a...
Daniel Kim - 2009-07-01 19:35:34 - In reply to message 2 from Manuel Lemos
Thank you for your prompt reply, but yes, mbox is already set to 1.

..
$mime->mbox=1;
..

My exact configuration is:
$mime = new mime_parser_class;
$mime->mbox = 1;
$mime->decode_bodies = 1;
$mime->ignore_syntax_errors = 1;

$parameters = array(
'Data'=>$message_stream,
);

Any ideas? I've traced it through to mime_parser.php line 662, but I'm not sure how to fix it so it ignores any "from" in the message body.

  4. Re: incorrectly receiving reached a...
Manuel Lemos - 2009-07-01 20:46:55 - In reply to message 3 from Daniel Kim
No, the From header is correct. It marks the beginning of a new message in a mailbox in mbox format.

Probably the mailbox has truncated messages. I need to try to reproduce the problem with the original message file to see if it is an error that can be tolerated. Can you upload the message somewhere?

  5. Re: incorrectly receiving reached a...
Daniel Kim - 2009-07-01 21:18:41 - In reply to message 4 from Manuel Lemos
The message is definitely not truncated. The logic you are following is: if we are in the middle of parsing an email body, and we encounter a new line followed by "from" at the beginning of the next line, advance to the beginning of the line (the "from") and start parsing a new email. This simply doesn't work, because if we have an email and within the email body there is a "from your friend, dk" or something, we believe we are starting on a new email when really it's just part of the message body of the email.

Again, if I simply delete (or even misspell by making it "f rom") the single line within the message body, it parses correctly. I do see the logic you are using but there has to be a better way; I'm investigating as well.

Sorry, I'd love to upload the email text file but I feel that is a breach of privacy. If I'm unable to find a good solution I'll go ahead and obscure the sensitive parts and upload it.

Thanks again, I've found your package very helpful, and your responsiveness is very much appreciated.

  6. Re: incorrectly receiving reached a...
Manuel Lemos - 2009-07-01 21:33:37 - In reply to message 5 from Daniel Kim
Keep in mind that this logic was not made up for the class; it is the way the mbox format is defined.

Files in mbox format separate messages with a line starting with a From header. That is a fake header meant precisely to split multiple messages.

Lines within a message that start with the From text are escaped by the programs that generate mbox files.

If you are trying to parse individual messages, then that is not an mbox file. In that case, you should set the mbox variable to 0.
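
The escaping mentioned above can be sketched as follows. This is only an illustration, assuming the simple convention in which a line beginning with exactly "From " gets a ">" prefix before the message is appended to an mbox file. The function name mbox_escape_body is hypothetical and is not part of the parser class.

```php
<?php
// Hypothetical helper for illustration; not part of mime_parser_class.
// Prefixes every line that starts with exactly "From " with ">".
function mbox_escape_body(string $body): string
{
    // The /m modifier makes ^ match at the start of every line.
    // The pattern is case-sensitive, so "from " and "FROM " are left alone.
    return preg_replace('/^From /m', '>From ', $body);
}

// Example input: one line a writer would escape, one it would not.
$body = "email body!\nFrom me\nfrom me myself & i\n";
$escaped = mbox_escape_body($body);
```

With this input, only the "From me" line gains a ">" prefix; the lowercase "from me myself & i" line passes through unchanged, which is why a reader of the file cannot treat it as a separator.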

  7. Re: incorrectly receiving reached a...
Daniel Kim - 2009-07-01 22:01:03 - In reply to message 6 from Manuel Lemos
Oh, wow, I apologize for my ignorance. I didn't realize mbox was an official format; now that I look it up, I completely understand your logic (you special case ">from", since that is how mbox escapes lines starting with "from").

I'm not quite sure how to proceed; I pipe incoming email messages into a file which I then want to parse, but it is not mbox format and it is not a single email message. I suppose there is another class that I could find that will turn email messages into the mbox format. I am extremely frustrated!

If you have any other suggestions, I would more than welcome them. Thanks again for all your help.

  8. Re: incorrectly receiving reached a...
Manuel Lemos - 2009-07-01 22:44:10 - In reply to message 7 from Daniel Kim
I don't know what kind of system pipes multiple messages to a program at once. Regular mail systems just pipe one message and exit.

In any case, you should figure out how to pass messages individually to the parser class.
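
One way to get individual messages out of a multi-message stream is to split it on the separator lines before parsing. The sketch below assumes the stream really is in mbox format, with each message introduced by a line starting with exactly "From ". The function name split_mbox is hypothetical and not part of the parser class.

```php
<?php
// Hypothetical helper for illustration; not part of mime_parser_class.
// Splits an mbox-style stream into an array of individual message strings.
function split_mbox(string $data): array
{
    $messages = [];
    $current = null; // lines of the message being collected, or null before the first separator

    foreach (preg_split('/\r?\n/', $data) as $line) {
        // Case-sensitive test: only "From " starts a new message.
        if (strncmp($line, 'From ', 5) === 0) {
            if ($current !== null) {
                $messages[] = implode("\n", $current);
            }
            $current = []; // the separator line itself is not kept
            continue;
        }
        if ($current !== null) {
            $current[] = $line;
        }
    }

    if ($current !== null) {
        $messages[] = implode("\n", $current);
    }
    return $messages;
}
```

Each returned string could then be passed as the 'Data' parameter with the mbox variable set to 0, so that a "from" inside a body is never treated as a separator.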

  9. Re: incorrectly receiving reached a...
Daniel Kim - 2009-07-01 23:22:26 - In reply to message 8 from Manuel Lemos
I was using procmail/formail, which actually do adhere to the mbox standard. The "From_" header literally involves a newline followed by "From" with a sender email and a timestamp. It escapes lines in the email body that begin with "From" with ">From". I'm sure you're aware of all this.

However, though I'm sure it was done with the best intentions, your parser logic lowercases everything, i.e.
strtolower(substr(.., strlen($break.'from ')));
strcmp($break.'from ', ..);

which -does- break on "from", "FROM", etc. which would not have been escaped to ">from" or ">FROM" by any standard function that adheres to the mbox format for storing emails.

Thanks again, I apologize for not understanding things fully before commenting. And I do suggest you remove the strtolowers and compare to "From" and ">From"! :)
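
The difference between the two comparisons can be shown with a small sketch. Neither function below is the actual code of the parser class; the names are hypothetical and only illustrate the case-insensitive check described above next to the suggested case-sensitive one.

```php
<?php
// Illustrative only; these are not functions of mime_parser_class.

// Loose check: lowercases the line first, so "from ", "From " and "FROM " all match.
function is_separator_loose(string $line): bool
{
    return strncmp(strtolower($line), 'from ', 5) === 0;
}

// Strict check: only the exact spelling "From " counts as a separator.
function is_separator_strict(string $line): bool
{
    return strncmp($line, 'From ', 5) === 0;
}

// Undoes the ">From " escaping on a single body line.
function mbox_unescape_line(string $line): string
{
    return strncmp($line, '>From ', 6) === 0 ? substr($line, 1) : $line;
}
```

With the body line "from me myself & i", the loose check reports a separator while the strict check does not, which matches the behavior reported in this thread.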


  10. Re: incorrectly receiving reached a...
Manuel Lemos - 2009-07-02 06:06:40 - In reply to message 9 from Daniel Kim
Oh, I see. I just noticed that From fake headers are case-sensitive, unlike other headers.

I just uploaded a fixed version of the class to deal with that problem. Thank you for reporting.