  1. Perl MIME::Parser and encoding in nested bodies (message/rfc822)
    <p>Argh, this is not easy. I'm trying to parse some mails with Perl. Let's take an example:</p>
<pre><code>From: abc@def.de
Content-Type: multipart/mixed; boundary="----_=_NextPart_001_01CBE273.65A0E7AA"
To: ghi@def.de

This is a multi-part message in MIME format.

------_=_NextPart_001_01CBE273.65A0E7AA
Content-Type: multipart/alternative; boundary="----_=_NextPart_002_01CBE273.65A0E7AA"

------_=_NextPart_002_01CBE273.65A0E7AA
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: base64

[base64-content]

------_=_NextPart_002_01CBE273.65A0E7AA
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: base64

[base64-content]

------_=_NextPart_002_01CBE273.65A0E7AA--

------_=_NextPart_001_01CBE273.65A0E7AA
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit

X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----_=_NextPart_003_01CBE272.13692C80"
From: bla@bla.de
To: xxx@xxx.de

This is a multi-part message in MIME format.

------_=_NextPart_003_01CBE272.13692C80
Content-Type: multipart/alternative; boundary="----_=_NextPart_004_01CBE272.13692C80"

------_=_NextPart_004_01CBE272.13692C80
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

=20
Viele Gr=FC=DFe

------_=_NextPart_004_01CBE272.13692C80
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

&lt;html&gt;...&lt;/html&gt;

------_=_NextPart_004_01CBE272.13692C80--

------_=_NextPart_003_01CBE272.13692C80
Content-Type: application/x-zip-compressed; name="abc.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="abc.zip"

[base64-content]

------_=_NextPart_003_01CBE272.13692C80--

------_=_NextPart_001_01CBE273.65A0E7AA--
</code></pre>
    <p>This mail was sent from Outlook with another message attached. As you can see, it is a very complex mail with many different content types (text/plain, text/html, message/rfc822, application/xyz), and the message/rfc822 part is the problem. I've written a script in Perl 5.8 (Debian Squeeze) to parse this message with MIME::Parser.</p>
<pre><code>use strict;
use warnings;
use MIME::Parser;

my $parser = MIME::Parser-&gt;new;
$parser-&gt;output_to_core(1);    # keep decoded bodies in memory
my $top_entity = $parser-&gt;parse(\*STDIN);

my $plain_body = "";
my $html_body  = "";

# Walk every part depth-first and collect the text bodies.
foreach my $part ($top_entity-&gt;parts_DFS) {
    my $content_type = $part-&gt;effective_type;
    my $body         = $part-&gt;bodyhandle;
    next unless $body;
    if ($content_type eq 'text/plain') {
        $plain_body .= "\n" if $plain_body ne '';
        $plain_body .= $body-&gt;as_string;
    }
    elsif ($content_type eq 'text/html') {
        $html_body .= "\n" if $html_body ne '';
        $html_body .= $body-&gt;as_string;
    }
}

# parsing of attachment comes later
print $plain_body;
</code></pre>
    <p>The first message part (base64 content) contains German umlauts, which are shown correctly on STDOUT. The nested message/rfc822 message is parsed by MIME::Parser automatically and is pooled with the top-level body as one entity. This nested message also contains German umlauts, in quoted-printable as you can see, but these are not shown correctly on STDOUT. When I call</p>
<pre><code>utf8::encode($plain_body);
</code></pre>
    <p>before the print, the quoted-printable umlauts are shown correctly, but the base64-encoded ones are not. I have been trying for hours to extract the message/rfc822 part separately and re-encode it, but nothing helps. Can anyone help?</p>
    <p>Regards</p>
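
The symptoms described above would be consistent with the two text/plain parts carrying different byte encodings (UTF-8 in the outer message, ISO-8859-1 in the nested one) and being concatenated as raw bytes. The following is a minimal sketch of one way to handle that, assuming each part's bytes are decoded with its own declared charset before concatenation and the result is encoded once on output. `part_to_text` is a hypothetical helper, not part of MIME::Parser, and the two byte strings stand in for what `$body->as_string` would return. With MIME::Parser, the charset could presumably be read per part via `$part->head->mime_attr('content-type.charset')`.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn the raw bytes of one text part into Perl
# characters, using the charset declared in that part's Content-Type.
sub part_to_text {
    my ($bytes, $charset) = @_;
    # Assumption: treat a missing charset as US-ASCII (the MIME default).
    $charset = 'US-ASCII' unless defined $charset && length $charset;
    # Assumption: fall back to ISO-8859-1 if Encode does not know the name.
    $charset = 'ISO-8859-1' unless Encode::find_encoding($charset);
    return decode($charset, $bytes);
}

# Stand-ins for the transfer-decoded bodies of the two text/plain parts.
my $utf8_bytes   = "Viele Gr\xC3\xBC\xC3\x9Fe";    # as sent with charset="UTF-8"
my $latin1_bytes = "Viele Gr\xFC\xDFe";            # as sent with charset="iso-8859-1"

# Both parts are now character strings, so they can be joined safely.
my $plain_body = join "\n",
    part_to_text($utf8_bytes,   'UTF-8'),
    part_to_text($latin1_bytes, 'iso-8859-1');

# Encode exactly once, at the output boundary.
binmode STDOUT, ':encoding(UTF-8)';
print $plain_body, "\n";
```

With this approach the explicit `utf8::encode($plain_body)` call should not be needed, since the output layer does the encoding for the whole string at once.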