How can I relate Unicode blocks to Languages/Scripts?
    <p>I am trying to find a resource that connects Languages (or, more probably, Scripts) to blocks of Unicode characters. Such a resource would be used to look up answers to questions such as "What Unicode Blocks are used in French?" or "What languages use the block 0A80-0AFF (<a href="http://unicodinator.com/#Block-Gujarati">http://unicodinator.com/#Block-Gujarati</a>)?" Do you know of such a resource?</p>
    <p>I would have expected to find this information easily at <a href="http://unicode.org">unicode.org</a>. I quickly found a great table that relates Country Codes to Languages (<a href="http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/territory_language_information.html">http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/territory_language_information.html</a>), but I've spent quite a bit of time poking around with no luck finding something that relates Unicode Blocks to Languages. It's possible that a terminology issue is keeping me from connecting the dots here...</p>
    <p>I am not picky about exactly what is meant by "language" in this case (Java Locale code, ISO 639 code, or whatever). I also understand that there may not be exact answers because, for instance, an Arabic document can contain Latin and other text in addition to characters from the Arabic blocks (<a href="http://unicodinator.com/#Block-Arabic">http://unicodinator.com/#Block-Arabic</a>, <a href="http://unicodinator.com/#Block-Arabic_Supplement">http://unicodinator.com/#Block-Arabic_Supplement</a>). But surely there must be some table that says "these languages go with these blocks"... I'm also not picky about the format (XML, CSV, whatever); I can easily transform it into data I can use for my application. And again, I do realize the reference would probably connect <em>Scripts</em> to Blocks, not Languages (though Scripts can be mapped to Languages).</p>
    <p>I do realize this will be a many-to-many table, since many languages use characters from multiple blocks and many blocks are used by multiple languages. I also realize the question cannot be answered precisely, since Unicode code points are not language specific. However, neither can the question "what languages are there in this country?" (the answer is probably "most of them" for most countries), yet a table like this one (<a href="http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/territory_language_information.html">http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/territory_language_information.html</a>) is still possible to create, meaningful, and useful.</p>
    <p>As to <em>why</em> I'd want such a thing: I would like to enhance <a href="http://unicodinator.com">http://unicodinator.com</a> with global heat-maps for the code blocks and with lists of languages; I also have a game concept I am tinkering with. Beyond that, there are probably many uses other people could have for this (font creation? quick, heuristic, best-guess language detection now that the Google Translate API is going away? research projects?).</p>
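To make the shape of the table I'm after concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the five block ranges are a small hand-copied excerpt in the style of the Unicode `Blocks.txt` data file, the names `block_of`, `blocks_used`, and `build_tables` are hypothetical, and deriving a language's blocks from a sample of text in that language is a stand-in heuristic, not an authoritative mapping.

```python
from bisect import bisect_right

# Small excerpt of block ranges (start, end, name), sorted by start.
# A real table would be loaded from the full Unicode Blocks.txt file.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0A80, 0x0AFF, "Gujarati"),
]
STARTS = [start for start, _, _ in BLOCKS]


def block_of(ch):
    """Return the block name for one character, or None if not in the excerpt."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0:
        _, end, name = BLOCKS[i]
        if cp <= end:
            return name
    return None


def blocks_used(text):
    """Return the set of known blocks that occur in a piece of text."""
    return {name for name in map(block_of, text) if name is not None}


def build_tables(samples):
    """Build both directions of the many-to-many table.

    `samples` maps a language code to sample text in that language
    (an assumption made for this sketch only).
    """
    lang_to_blocks = {lang: blocks_used(text) for lang, text in samples.items()}
    block_to_langs = {}
    for lang, blocks in lang_to_blocks.items():
        for block in blocks:
            block_to_langs.setdefault(block, set()).add(lang)
    return lang_to_blocks, block_to_langs
```

Calling `build_tables({"fr": "déjà vu", "gu": "ગુજરાતી"})` would return one dictionary answering "which blocks does French use?" and a second answering "which languages use the Gujarati block?", limited to whatever the sample text happens to contain.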