Error parsing emails using Python's email module when the encoding is in shift_jis
    <p>I am getting an error that says "UnicodeDecodeError: 'shift_jis' codec can't decode bytes in position 2-3: illegal multibyte sequence" when I try to use my email parser to decode a shift_jis encoded email and convert it to unicode. The code and email can be found below:</p>
<pre><code>import email.header
import base64
import sys
import email

def getrawemail():
    line = ' '
    raw_email = ''
    while line:
        line = sys.stdin.readline()
        raw_email += line
    return raw_email

def getheader(subject, charsets):
    for i in charsets:
        if isinstance(i, str):
            encoding = i
            break
    if subject[-2] == "?=":
        encoded = subject[5 + len(encoding):len(subject) - 2]
    else:
        encoded = subject[5 + len(encoding):]
    return (encoding, encoded)

def decodeheader((encoding, encoded)):
    decoded = base64.b64decode(encoded)
    decoded = unicode(decoded, encoding)
    return decoded

raw_email = getrawemail()
msg = email.message_from_string(raw_email)
subject = decodeheader(getheader(msg["Subject"], msg.get_charsets()))
print subject
</code></pre>
    <p>Email: <a href="http://pastebin.com/L4jAkm5R" rel="nofollow">http://pastebin.com/L4jAkm5R</a></p>
    <p>I have read on another Stack Overflow question that this may be related to a difference between how Unicode and shift_jis are encoded (they referenced <a href="http://support.microsoft.com/kb/170559" rel="nofollow" title="this">this</a> Microsoft Knowledge Base article). If anyone knows what in my code could be causing it to not work, or if this is even reasonably fixable, I would very much appreciate finding out how.</p>
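For comparison, here is a minimal sketch of the same subject decoding done with the standard library's email.header.decode_header instead of slicing the header by hand. It assumes Python 3 (the code in the question is Python 2), and the names decode_subject and fallbacks are hypothetical. Two things in the question's code are worth checking against it: subject[-2] is a single character, so it can never equal "?=" and the trailing marker is never stripped before base64 decoding, and msg.get_charsets() reports the charsets of the message body parts, not the charset named inside the encoded header. The cp932 fallback is an assumption about the cause: mail labelled shift_jis sometimes contains characters that only exist in Microsoft's cp932 extension, which Python's strict shift_jis codec rejects.

```python
from email.header import decode_header


def decode_subject(raw_subject, fallbacks=("cp932",)):
    """Decode an RFC 2047 encoded header value into a str.

    `fallbacks` lists extra codecs to try when the declared charset fails.
    It is a hypothetical parameter for this sketch, not part of the email API.
    """
    parts = []
    for payload, charset in decode_header(raw_subject):
        if isinstance(payload, str):
            # A header with no encoded words comes back as str already.
            parts.append(payload)
            continue
        # Try the declared charset first, then the fallbacks in order.
        candidates = [charset or "ascii"] + list(fallbacks)
        for name in candidates:
            try:
                parts.append(payload.decode(name))
                break
            except (UnicodeDecodeError, LookupError):
                continue
        else:
            # Nothing decoded cleanly: keep what can be read and mark the rest.
            parts.append(payload.decode("ascii", errors="replace"))
    return "".join(parts)
```

With a parsed message, the call would be decode_subject(msg["Subject"]). If the fallback turns out to be what makes the subject readable, that suggests the sender used cp932 characters under a shift_jis label.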