<p>I'd definitely recommend UTF-8 over all other encoding schemes.</p>
<p>If you're storing multilingual data in a database, make sure that your DBMS is fully UTF-8 compliant.</p>
<p>Also, ensure that all files, including CSS, JavaScript and application template files, are themselves encoded in UTF-8 with BOM. If not, the <code>charset</code> directives may not be interpreted correctly by the browser.</p>
<p>We have over 30 languages in a big database-backed CMS and it's working like a charm. The client has human editors for all languages who do the data entry.</p>
<p>You may run into collation issues with some languages (the example of the dreaded Turkish dotless <code>i</code> - ı - in case-insensitive databases springs to mind). There's always an answer to that, but it'll be very database-specific.</p>
<p>I am not familiar with the specifics of Java Resource Bundles. We do use some Java libraries like <code>markdownj</code> that process UTF-8 encoded text in and out of the database without problems.</p>
<hr>
<p><strong>Edited to answer the OP's comments:</strong></p>
<p>I think the main reason for standardizing on UTF-8 is that you never know in what direction your systems will evolve. You may assume today that you'll only ever handle one language, but that rarely holds even in perfectly monolingual environments, as you may have to store names or references containing non-US-ASCII octet values.</p>
<p>Also, a UTF-8 encoded character stream will not alter US-ASCII octet values, and this provides full compatibility with non-UTF-8-enabled file systems or other software.</p>
<p>Modern browsers will all interpret UTF-8 correctly, provided the application/text file was encoded with UTF-8 and you include <code>&lt;meta charset="utf-8"&gt;</code> on any page that's served to a browser.</p>
<p>Do check whether your middleware (PHP, JSP, etc.) supports UTF-8 throughout, and do so in conjunction with your database.</p>
<p>I fail to see what the problem is with developers potentially dealing with data they don't understand. Isn't that also potentially the case when we deal with data in our own native languages? At least with a fully Unicode system they'll be able to recognize whether the glyphs they see in the browser or in the database match the language they're supposed to be dealing with, instead of getting streams of ???? ?????? ??? ????</p>
<p>I do believe that using UTF-8 as your character encoding for everything is a safe bet. This should work for pretty much every situation, and you're all set for the day your boss comes around and insists you must go multilingual.</p>
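<p>Three of the points above (US-ASCII octets surviving UTF-8 encoding unchanged, the streams of question marks produced by a non-Unicode layer, and the Turkish dotless ı tripping up case-insensitive comparison) can be checked with a short Python sketch. The function names are hypothetical and chosen only for illustration; the case-insensitive comparison is a rough stand-in for a locale-unaware collation, not the behaviour of any particular database.</p>

```python
# Illustrative sketch only: the helper names below are hypothetical.


def is_ascii_preserved(text: str) -> bool:
    """Check that UTF-8 leaves US-ASCII octets untouched.

    Every ASCII character must encode to its own single octet, and every
    octet produced for a non-ASCII character must be >= 0x80, so it can
    never be mistaken for an ASCII character.
    """
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            if encoded != bytes([ord(ch)]):
                return False
        elif any(octet < 0x80 for octet in encoded):
            return False
    return True


def lossy_store(text: str, encoding: str = "ascii") -> str:
    """Simulate a non-Unicode storage layer.

    Characters the target encoding cannot represent are replaced by '?',
    which is where streams of question marks come from.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


def naive_case_insensitive_equal(a: str, b: str) -> bool:
    """Locale-unaware comparison (assumed stand-in for a CI collation)."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    print(is_ascii_preserved("plain ASCII, ısı, Привет"))
    print(lossy_store("Привет"))
    # Turkish "ısı" uppercases to "ISI", but lowercasing "ISI" without
    # Turkish rules gives "isi", so the round trip does not match.
    print("ısı".upper(), naive_case_insensitive_equal("ISI", "ısı"))
```

<p>The last line is the interesting one: a comparison that knows nothing about Turkish casing rules treats <code>ISI</code> as <code>isi</code> rather than <code>ısı</code>, which is the kind of mismatch a database-specific collation setting has to resolve.</p>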
 
