Django with MySQL and UTF-8
<blockquote> <p><strong>Possible Duplicate:</strong><br> <a href="https://stackoverflow.com/questions/3220031/how-to-filter-or-replace-unicode-characters-that-would-take-more-than-3-bytes">How to filter (or replace) unicode characters that would take more than 3 bytes in UTF-8?</a> </p> </blockquote>
<p><strong>Background:</strong></p>
<p>I am using Django with MySQL 5.1, and 4-byte UTF-8 characters are causing fatal errors throughout my web application.</p>
<p>I've used <a href="https://stackoverflow.com/a/11597447/98187">a script</a> to convert all tables and columns in my database to UTF-8, which has fixed most Unicode issues, but 4-byte Unicode characters are still a problem. As <a href="https://stackoverflow.com/a/3060537/98187">noted elsewhere</a>, MySQL 5.1 does not support UTF-8 characters over 3 bytes in length.</p>
<p>Whenever I enter a 4-byte Unicode character (e.g. U+1F010, which UTF-8 encodes as <code>\xF0\x9F\x80\x90</code>) into a ModelForm on my Django website, the form validates and then an exception similar to the following is raised:</p>
<pre><code>Incorrect string value: '\xF0\x9F\x80\x90' for column 'first_name' at row 1 </code></pre>
<p><strong>My question:</strong></p>
<p>What is a reasonable way to avoid fatal errors caused by 4-byte UTF-8 characters in a Django web application with a MySQL 5.1 database?</p>
<p>I have considered:</p>
<ol>
<li>Selectively disabling MySQL warnings to suppress that specific error message (I am not sure yet whether that is possible)</li>
<li>Creating middleware that looks through the <code>request.POST</code> <code>QueryDict</code> and substitutes or removes all characters that MySQL cannot store</li>
<li>Somehow hooking, altering, or monkey patching the mechanism that outputs SQL queries for Django or for MySQLdb, so that all unsupported UTF-8 characters are substituted or removed before the query is executed</li>
</ol>
<hr>
<p>Example middleware to replace the unsupported characters (inspired by <a href="https://stackoverflow.com/a/11597447/98187">this SO question</a>):</p>
<pre><code>import re


class MySQLUnicodeFixingMiddleware(object):
    # Matches every code point outside U+0000-U+D7FF and U+E000-U+FFFF,
    # i.e. surrogates and anything beyond the Basic Multilingual Plane.
    INVALID_UTF8_RE = re.compile(u'[^\u0000-\uD7FF\uE000-\uFFFF]', re.UNICODE)

    def process_request(self, request):
        """Replace 4-byte unicode characters by REPLACEMENT CHARACTER"""
        # request.POST is immutable, so work on a mutable copy.
        request.POST = request.POST.copy()
        for key, values in request.POST.iterlists():
            request.POST.setlist(key, [self.INVALID_UTF8_RE.sub(u'\uFFFD', v) for v in values])
</code></pre>
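<p>The substitution step in the middleware above can also be tried without Django. What follows is only a minimal sketch, assuming Python 3 (the middleware itself is Python 2 code). The names <code>NON_BMP_RE</code>, <code>needs_four_bytes</code> and <code>strip_four_byte_chars</code> are hypothetical helpers made up for illustration, not part of Django or MySQLdb. Unlike the pattern above, this one matches only code points from U+10000 upward and leaves lone surrogates alone.</p>

```python
import re

# Code points U+10000 and above are exactly the ones UTF-8 encodes in 4 bytes.
# (Hypothetical name; the original middleware uses a negated BMP range instead.)
NON_BMP_RE = re.compile('[\U00010000-\U0010FFFF]')


def needs_four_bytes(text):
    """Return True if text contains a character that takes 4 bytes in UTF-8."""
    return NON_BMP_RE.search(text) is not None


def strip_four_byte_chars(text, replacement='\uFFFD'):
    """Replace each 4-byte UTF-8 character with `replacement`.

    The default is U+FFFD REPLACEMENT CHARACTER, which itself needs only
    3 bytes in UTF-8; pass '' to drop the characters instead.
    """
    return NON_BMP_RE.sub(replacement, text)


if __name__ == '__main__':
    # U+1F010 is the character behind the bytes in the error message above.
    sample = 'first\U0001F010name'
    print(needs_four_bytes(sample))                       # True
    print(strip_four_byte_chars(sample).encode('utf-8'))  # b'first\xef\xbf\xbdname'
```

<p>Substituting U+FFFD keeps the string length visible to the user, while passing an empty replacement silently drops the characters. Which of the two is preferable depends on the field being sanitized.</p>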
 
