<p>What do you mean by "without extra modules"?</p>
<p>Here is a solution with <code>use Unicode::Normalize;</code> (<a href="http://perldoc.perl.org/Unicode/Normalize.html" rel="nofollow">see perldoc</a>).</p>
<p>I removed the "ţ" and the "ļ" from your string because Eclipse refused to save the script with them.</p>
<pre><code>use strict;
use warnings;
use utf8;    # the script itself is saved as UTF-8
use Unicode::Normalize;

my $str = "Îñtérñåtîöñålîžåtîöñ";

for ( $str ) {  # the variable we work on
    ## convert to Unicode first
    ## if your data comes in Latin-1, then uncomment
    ## (this also needs "use Encode;"):
    #$_ = Encode::decode( 'iso-8859-1', $_ );

    $_ = NFD( $_ );     ## decompose
    s/\pM//g;           ## strip combining characters
    s/[^\0-\x7F]//g;    ## clear everything that is still not ASCII
}

if ($str =~ /nation/) {
    print $str . "\n";
}
</code></pre>
<p>The output I got is</p>
<blockquote> <p>Internationaliation</p> </blockquote>
<p>The "ž" was dropped from my output. It does have a canonical decomposition (z + U+030C COMBINING CARON), so this was probably an encoding problem on my side, likely the same one that kept Eclipse from saving "ţ" and "ļ". With the script saved as UTF-8, the "ž" should come through as a plain "z".</p>
<p>The code for the for loop is from this page: <a href="http://ahinea.com/en/tech/accented-translate.html" rel="nofollow">How to remove diacritic marks from characters</a></p>
<p>Another interesting read is <a href="http://www.joelonsoftware.com/articles/Unicode.html" rel="nofollow">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)</a> by Joel Spolsky.</p>
<p><strong>Update:</strong></p>
<p>As @tchrist pointed out, there is an algorithm that is better suited to this: the UCA (Unicode Collation Algorithm). @nordicdyno already provided an implementation in his question.</p>
<p>The algorithm is described in <a href="http://www.unicode.org/reports/tr10/" rel="nofollow">Unicode Technical Standard #10, Unicode Collation Algorithm</a>.</p>
<p>The Perl module is described on <a href="http://perldoc.perl.org/Unicode/Collate.html#DESCRIPTION" rel="nofollow">perldoc.perl.org</a>.</p>
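<p>For comparison, here is a minimal sketch of the UCA approach with <code>Unicode::Collate</code>. It is not the implementation from the question, only an illustration: it assumes the script is saved as UTF-8 and uses the module's default collation table. The variable names <code>$collator</code>, <code>$pos</code>, <code>$len</code> and <code>$match</code> are my own choices.</p>

```perl
use strict;
use warnings;
use utf8;                 # assumption: this script is saved as UTF-8
use Unicode::Collate;

binmode(STDOUT, ':encoding(UTF-8)');

# level => 1 compares primary weights only, so accents and case are ignored.
# normalization => undef is required for index(); the module croaks otherwise.
my $collator = Unicode::Collate->new(level => 1, normalization => undef);

my $str = "Îñtérñåtîöñålîžåtîöñ";

# In list context index() returns (position, length), or an empty list
# when the substring is not found.
my ($pos, $len) = $collator->index($str, "nation");
my $match = defined $pos ? substr($str, $pos, $len) : undef;

if (defined $match) {
    print "matched '$match' at offset $pos\n";
}
```

<p>Unlike the stripping approach, this leaves the original string untouched and reports where the accented text matched, so nothing is lost when a character has no decomposition.</p>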
 
