StackOverflow2013

How do you set strings to uppercase / lowercase in Unicode?
<p>This is mostly a theoretical question that I'm just very curious about. (I'm not trying to code this myself; I'm not reinventing wheels.)</p> <p>My question is how the uppercase/lowercase equivalence table works for Unicode.</p> <p>For example, if I had to do this in ASCII, I'd take a character and, if it falls within the [a-z] range, add the difference between A and a.</p> <p>If it doesn't fall in that range, I'd have a small equivalence table for the 10 or so accented characters plus ñ. (Or I could just have a full equivalence array with 256 entries, most of which would be the same as the input.)</p> <p>However, I'm guessing that there's a better way of specifying the equivalences in Unicode, given that there are hundreds of thousands of characters and that, theoretically, a new language or set of characters can be added (and I expect you wouldn't need to patch Windows when that happens).</p> <p>Does Windows have a huge hard-coded equivalence table for each character? If not, how is this implemented?</p> <p>A related question is how SQL Server implements Unicode-based accent-insensitive and case-insensitive queries. Does it have an internal table that tells it that é ë è E É È and Ë are all equivalent to "e"?</p> <p>That doesn't sound very fast when it comes to comparing strings.</p> <p>How does it access indexes quickly? Does it index values already converted to their "base" characters, according to that field's collation?</p> <p>Does anyone know the internals of these things?</p> <p>Thank you!</p>
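<p>To make the ASCII approach described above concrete, here is a minimal sketch in Python. The names <code>ascii_upper</code>, <code>small_table_upper</code>, <code>to_upper</code>, and <code>EXTRA_UPPER</code> are made up for illustration, and the side table holds only a handful of accented letters picked as examples. It is not how Windows or SQL Server actually implement case mapping.</p>

```python
# Hypothetical side table for a few accented letters (illustrative, not complete).
EXTRA_UPPER = {"á": "Á", "é": "É", "í": "Í", "ó": "Ó", "ú": "Ú", "ñ": "Ñ"}


def ascii_upper(ch: str) -> str:
    """Range check plus fixed offset: only valid for the ASCII letters a-z."""
    if "a" <= ch <= "z":
        # ord("A") - ord("a") is -32, so adding it moves a-z onto A-Z.
        return chr(ord(ch) + (ord("A") - ord("a")))
    return ch


def small_table_upper(ch: str) -> str:
    """Try the small side table first, then fall back to the ASCII rule."""
    return EXTRA_UPPER.get(ch, ascii_upper(ch))


def to_upper(text: str) -> str:
    """Uppercase a string one character at a time using the rules above."""
    return "".join(small_table_upper(ch) for ch in text)


if __name__ == "__main__":
    print(to_upper("señor é"))
    # Python's built-in str.upper() is driven by Unicode case-mapping data
    # and can even change the length of the string:
    print("ß".upper())
```

<p>For comparison, Python's built-in <code>str.upper()</code> does not use an offset rule; it relies on case-mapping data derived from the Unicode Character Database, which is why <code>"ß".upper()</code> yields the two-character string <code>"SS"</code>. That suggests a one-character-in, one-character-out table like the one sketched here would not be enough for full Unicode case conversion.</p>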
