StackOverflow2013

Retrieving binary file content using Javascript, base64 encode it and reverse-decode it using Python
<p>I'm trying to download a binary file using <code>XMLHttpRequest</code> (using a recent WebKit) and base64-encode its contents using this simple function:</p>
<pre><code>function getBinary(file){
    var xhr = new XMLHttpRequest();
    xhr.open("GET", file, false);
    xhr.overrideMimeType("text/plain; charset=x-user-defined");
    xhr.send(null);
    return xhr.responseText;
}

function base64encode(binary) {
    return btoa(unescape(encodeURIComponent(binary)));
}

var binary = getBinary('http://some.tld/sample.pdf');
var base64encoded = base64encode(binary);
</code></pre>
<p>As a side note, everything above is standard JavaScript, including <code>btoa()</code> and <code>encodeURIComponent()</code>: <a href="https://developer.mozilla.org/en/DOM/window.btoa">https://developer.mozilla.org/en/DOM/window.btoa</a></p>
<p>This works pretty smoothly, and I can even decode the base64 contents using JavaScript:</p>
<pre><code>function base64decode(base64) {
    return decodeURIComponent(escape(atob(base64)));
}

var decodedBinary = base64decode(base64encoded);
decodedBinary === binary // true
</code></pre>
<p>Now I want to decode the base64-encoded contents in Python, which reads a JSON string to get the <code>base64encoded</code> value. Naively, this is what I do:</p>
<pre><code>import urllib
import base64
# ... retrieving of base64 encoded string through JSON
base64 = "77+9UE5HDQ……………oaCgA="
source_contents = urllib.unquote(base64.b64decode(base64))
destination_file = open(destination, 'wb')
destination_file.write(source_contents)
destination_file.close()
</code></pre>
<p>But the resulting file is invalid. It looks like the operation has messed something up with UTF-8 or another encoding step, and the cause is still unclear to me.</p>
<p>If I try to decode the contents as UTF-8 before writing them to the destination file, an error is raised:</p>
<pre><code>import urllib
import base64
# ... retrieving of base64 encoded string through JSON
base64 = "77+9UE5HDQ……………oaCgA="
source_contents = urllib.unquote(base64.b64decode(base64)).decode('utf-8')
destination_file = open(destination, 'wb')
destination_file.write(source_contents)
destination_file.close()

$ python test.py
// ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
</code></pre>
<p>As a side note, here's a screenshot of two textual representations of the same file; on the left: the original; on the right: the one created from the base64-decoded string: <a href="http://cl.ly/0U3G34110z3c132O2e2x">http://cl.ly/0U3G34110z3c132O2e2x</a></p>
<p>Is there a known trick to circumvent these encoding problems when attempting to recreate the file? How would you achieve this yourself?</p>
<p>Any help or hint is much appreciated :)</p>
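<p>For reference, here is a minimal Python 3 sketch of what reversing the JavaScript pipeline above could look like. It rests on an assumption that is not verified here: that the browser really decoded the response as <code>x-user-defined</code>, which maps bytes 0x00-0x7F to themselves and bytes 0x80-0xFF to the code points U+F780-U+F7FF. The function names <code>recover_bytes</code> and <code>simulate_browser_encoding</code> are hypothetical and only serve the illustration. Note that the sample string in the question starts with <code>77+9</code>, which base64-decodes to the bytes EF BF BD, the UTF-8 encoding of U+FFFD (the replacement character); where that character appears, the original byte is no longer present in the string, so the sketch reports it instead of guessing.</p>

```python
import base64

REPLACEMENT_CHAR = "\ufffd"


def simulate_browser_encoding(raw: bytes) -> str:
    """Mimic the question's JavaScript pipeline (assumed behaviour).

    x-user-defined decoding: bytes below 0x80 map to themselves,
    bytes 0x80-0xFF map to U+F780-U+F7FF. The text is then UTF-8
    encoded (encodeURIComponent + unescape) and base64 encoded (btoa).
    """
    text = "".join(chr(b) if b < 0x80 else chr(0xF700 + b) for b in raw)
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def recover_bytes(b64_text: str) -> bytes:
    """Reverse the pipeline and return the original file bytes."""
    # Undo btoa(): this yields the UTF-8 encoding of the JS string.
    utf8_bytes = base64.b64decode(b64_text)
    # Undo unescape(encodeURIComponent(...)): decode the UTF-8 bytes.
    text = utf8_bytes.decode("utf-8")
    # U+FFFD means the browser already replaced an undecodable byte,
    # so the original value cannot be reconstructed from this string.
    if REPLACEMENT_CHAR in text:
        raise ValueError("input contains U+FFFD; original bytes were lost")
    # Undo x-user-defined: the low 8 bits of each code point are the byte.
    return bytes(ord(ch) & 0xFF for ch in text)


if __name__ == "__main__":
    original = b"\x89PNG\r\n\x1a\n"
    encoded = simulate_browser_encoding(original)
    print(recover_bytes(encoded) == original)
```

<p>If the payload does contain U+FFFD, as the sample in the question appears to, no Python-side decoding can restore the file; the encoding step on the JavaScript side would have to change so that each character is reduced to a single byte (for example with <code>charCodeAt(i) &amp; 0xff</code>) before it is base64-encoded.</p>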
