Where can I find a specific set of collation rules for equality comparison of strings?
    <p>We all know that using String's equals() method for equality comparison of natural-language text will fail miserably. Instead, one should use <a href="http://docs.oracle.com/javase/7/docs/api/java/text/Collator.html">Collator</a>, like this:</p>
<pre><code>import java.text.Collator;
import java.util.Locale;

// we need to detect the user interface locale somehow
Locale uiLocale = Locale.forLanguageTag("da-DK");
// set up the collator object
Collator collator = Collator.getInstance(uiLocale);
collator.setStrength(Collator.SECONDARY);
collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
// strings for equality testing
String test1 = "USA lover Grækenland støtte";
String test2 = "USA lover graekenland støtte";
boolean result = collator.equals(test1, test2);
</code></pre>
<p>Now, this code works, that is, result is true, <strong><em>unless</em></strong> uiLocale is set to Danish. In that case it yields false. I certainly understand <strong><em>why</em></strong> this happens: the equals method is implemented like this:</p>
<pre><code>return compare(s1, s2) == Collator.EQUAL;
</code></pre>
<p>This method calls the one that is used for sorting and checks whether it reports the strings as equal. It does not, because the Danish-specific collation rules require <strong>æ</strong> to be sorted after <strong>ae</strong> (if I understand the result of the compare method correctly). However, these strings are <strong>really</strong> the same: at this strength, both case differences and such compatibility characters (that's what it's called) should be treated as equal.</p>
<p>To fix this, one would use <a href="http://docs.oracle.com/javase/7/docs/api/java/text/RuleBasedCollator.html">RuleBasedCollator</a> with a specific set of rules that works for the equality case.<br>
Finally, the question is: does anyone know where I can get such specific rules (not only for Danish, but for other languages as well), so that compatibility characters, ligatures, etc. are treated as equal (the <a href="http://cldr.unicode.org/index">CLDR</a> <a href="http://unicode.org/repos/cldr-tmp/trunk/diff/summary/da.html">chart</a> does not seem to contain such rules, or I failed to find them)?</p>
<p>Or maybe I want to do something stupid here, and I should really just use <a href="http://unicode.org/reports/tr10/">UCA</a> for equality comparison (any code sample, please)?</p>
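<p>To make the last idea concrete, here is a minimal sketch of a locale-independent equality check. It swaps the UI locale for <code>Locale.ROOT</code>, so that language-specific sort tailorings (such as the Danish one) do not influence the result. The class name <code>RootCollatorEquality</code> and the helper <code>sameText</code> are made up for illustration. Note also the assumption that the JDK's root collation rules are close enough to UCA for this purpose: <code>java.text.Collator</code> is not a full UCA implementation, and a library such as ICU4J may be a better fit if strict UCA behaviour is needed.</p>

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper class, not part of the JDK.
class RootCollatorEquality {

    // Builds a collator from the root locale, so no language-specific
    // tailoring (for example the Danish ordering of æ) is applied.
    static Collator rootCollator() {
        Collator collator = Collator.getInstance(Locale.ROOT);
        // SECONDARY strength ignores tertiary differences such as case.
        collator.setStrength(Collator.SECONDARY);
        // Compare canonically equivalent sequences (composed vs. decomposed) alike.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator;
    }

    // Equality test that does not depend on the UI locale.
    static boolean sameText(String a, String b) {
        return rootCollator().compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String test1 = "USA lover Grækenland støtte";
        String test2 = "USA lover graekenland støtte";

        // Same settings as in the question, but with the Danish tailoring.
        Collator danish = Collator.getInstance(Locale.forLanguageTag("da-DK"));
        danish.setStrength(Collator.SECONDARY);
        danish.setDecomposition(Collator.CANONICAL_DECOMPOSITION);

        System.out.println("root collator:   " + sameText(test1, test2));
        System.out.println("Danish collator: " + danish.equals(test1, test2));
    }
}
```

<p>Running <code>main</code> prints the result for both collators side by side, which makes it easy to see whether the root rules treat <strong>æ</strong> and <strong>ae</strong> the way the equality case needs. The sorting collator for the UI locale can still be kept separately for ordering lists.</p>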