
Transliterate any convertible UTF-8 char into its ASCII equivalent
<p>Is there a good solution out there that does this transliteration properly?</p>
<p>I've tried using <code>iconv()</code>, but it is very annoying and does not behave as one might expect.</p>
<ul>
<li>Using <code>//TRANSLIT</code> will try to replace what it can, leaving everything non-convertible as "?"</li>
<li>Using <code>//IGNORE</code> will not leave "?" in the text, but it will not transliterate either, and it raises an <code>E_NOTICE</code> when a non-convertible char is found, so you have to call iconv with the @ error-suppression operator</li>
<li>Using <code>//IGNORE//TRANSLIT</code> (as some people suggested in the PHP forums) is actually the same as <code>//IGNORE</code> (I tried it myself on PHP 5.3.2 and 5.3.13)</li>
<li>Also, using <code>//TRANSLIT//IGNORE</code> is the same as <code>//TRANSLIT</code></li>
</ul>
<p>It also uses the current locale settings to transliterate.</p>
<p><strong>WARNING: a lot of text and code follows!</strong></p>
<p>Here are some examples:</p>
<pre><code>$text = 'Regular ascii text + čćžšđ + äöüß + éĕěėëȩ + æø€ + $ + ¶ + @';

echo '&lt;br /&gt;original: ' . $text;
echo '&lt;br /&gt;regular: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//&gt; regular: Regular ascii text + ????? + ???ss + ?????? + ae?EUR + $ + ? + @

setlocale(LC_ALL, 'en_GB');
echo '&lt;br /&gt;en_GB: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//&gt; en_GB: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

setlocale(LC_ALL, 'en_GB.UTF8'); // will this work?
echo '&lt;br /&gt;en_GB.UTF8: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//&gt; en_GB.UTF8: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
</code></pre>
<p>OK, that did convert č ć ž š ä ö ü ß é ĕ ě ė ë ȩ and æ, but why not đ and ø?</p>
<pre><code>// now specific locales
setlocale(LC_ALL, 'hr_Hr'); // this should fix Croatian đ, right?
echo '&lt;br /&gt;hr_Hr: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
// wrong &gt; hr_Hr: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

setlocale(LC_ALL, 'sv_SE'); // so this will fix Swedish ø?
echo '&lt;br /&gt;sv_SE: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
// will not &gt; sv_SE: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

// this is interesting
setlocale(LC_ALL, 'de_DE');
echo '&lt;br /&gt;de_DE: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//&gt; de_DE: Regular ascii text + cczs? + aeoeuess + eeeeee + ae?EUR + $ + ? + @
// actually this is what any German would expect, since ä ö ü really are the same as ae oe ue
</code></pre>
<p>Let's try with <code>//IGNORE</code>:</p>
<pre><code>echo '&lt;br /&gt;ignore: ' . iconv("UTF-8", "ASCII//IGNORE", $text);
//&gt; ignore: Regular ascii text + + + + + $ + + @
//+ E_NOTICE: "Notice: iconv(): Detected an illegal character in input string in /var/www/test.server.web/index.php on line 49"

// with translit?
echo '&lt;br /&gt;ignore/translit: ' . iconv("UTF-8", "ASCII//IGNORE//TRANSLIT", $text);
// same as ignore only &gt; ignore/translit: Regular ascii text + + + + + $ + + @
//+ E_NOTICE: "Notice: iconv(): Detected an illegal character in input string in /var/www/test.server.web/index.php on line 54"

// translit/ignore?
echo '&lt;br /&gt;translit/ignore: ' . iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $text);
// same as translit only &gt; translit/ignore: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
</code></pre>
<p>Using <a href="https://stackoverflow.com/a/6857767/555097">this guy's solution</a> also does not work as wanted: <code>Regular ascii text + YYYYY + aous + eYYYeY + aoY + $ + � + @</code></p>
<p>Even the PECL intl <a href="http://www.php.net/manual/en/class.normalizer.php" rel="nofollow noreferrer">Normalizer</a> class produces the wrong result. It is also not always available, even if you have PHP > 5.3.0, because the ICU library that intl depends on may not be available to PHP, e.g. on certain hosting servers:</p>
<pre><code>echo '&lt;br /&gt;normalize: ' . preg_replace('/\p{Mn}/u', '', Normalizer::normalize($text, Normalizer::FORM_KD));
//&gt; normalize: Regular ascii text + cczsđ + aouß + eeeeee + æø€ + $ + ¶ + @
</code></pre>
<p>So is there any other way of doing this right, or is the only proper solution to use <code>preg_replace()</code> or <code>str_replace()</code> and define the transliteration tables yourself?</p>
<p>Appendix: I found a 2008 debate on the ZF wiki about a <a href="http://framework.zend.com/wiki/display/ZFPROP/Zend_Filter_Transliteration+-+Martin+Hujer" rel="nofollow noreferrer">proposal for Zend_Filter_Transliterate</a>, but the project was dropped because conversion is not possible for some languages (e.g. Chinese). Still, IMO this option should exist for any Latin- or Cyrillic-based language.</p>
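<p>The question ends by asking whether a hand-written transliteration table is the only proper option. The following is a minimal sketch of that table-based approach, not a definitive implementation. It assumes PHP 7 or later, because it uses scalar type declarations and the question was tested on 5.3. The function name <code>transliterate_to_ascii</code> and every mapping in the table are hypothetical examples that cover only the characters in the sample string, such as đ → dj, ø → oe and the German ä → ae. If ext/intl happens to be loaded, the function also uses the Normalizer decompose-and-strip trick from the question as a generic fallback for accented letters the table does not list.</p>

```php
<?php
// Sketch only (assumes PHP 7+): hand-written table first, then optional
// Unicode decomposition via ext/intl, then a fallback character.
// transliterate_to_ascii() and the mappings below are hypothetical examples;
// which mapping is "right" depends on the language (ä is "ae" in German but
// plain "a" in many other languages), so adjust the table to your needs.
function transliterate_to_ascii(string $text, string $fallback = '?'): string
{
    static $table = [
        // Croatian
        'č' => 'c', 'ć' => 'c', 'ž' => 'z', 'š' => 's', 'đ' => 'dj',
        'Č' => 'C', 'Ć' => 'C', 'Ž' => 'Z', 'Š' => 'S', 'Đ' => 'Dj',
        // German
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        // Accented e variants from the sample string
        'é' => 'e', 'ĕ' => 'e', 'ě' => 'e', 'ė' => 'e', 'ë' => 'e', 'ȩ' => 'e',
        // Danish / Norwegian
        'æ' => 'ae', 'ø' => 'oe', 'Æ' => 'Ae', 'Ø' => 'Oe',
        // Symbols
        '€' => 'EUR',
    ];

    // strtr() with an array replaces whole UTF-8 byte sequences,
    // trying the longest matching key first at each position.
    $text = strtr($text, $table);

    // Optional generic step, only if ext/intl is loaded: decompose (NFKD)
    // and drop combining marks, so that e.g. "ñ" becomes "n".
    if (class_exists('Normalizer')) {
        $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
        if (is_string($decomposed)) {
            $stripped = preg_replace('/\p{Mn}/u', '', $decomposed);
            if ($stripped !== null) {
                $text = $stripped;
            }
        }
    }

    // Replace every remaining non-ASCII code point with the fallback string.
    $result = preg_replace('/[^\x00-\x7F]/u', $fallback, $text);
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

echo transliterate_to_ascii('Regular ascii text + čćžšđ + äöüß + éĕěėëȩ + æø€ + $ + ¶ + @'), "\n";
```

<p>The table is applied before the Normalizer step, so language-specific choices such as ä → ae take precedence over generic accent stripping. Without the table, accent stripping produces the <code>aouß</code> seen in the question's Normalizer output. With the sample string, the sketch prints <code>Regular ascii text + cczsdj + aeoeuess + eeeeee + aeoeEUR + $ + ? + @</code> whether or not intl is loaded, because ¶ is covered by neither step and becomes the fallback.</p>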