
  1. Detect Unicode Character Range in PHP
    <p>Evening,</p>
    <p>Does anyone know the quickest way to detect which Unicode ranges the characters of a string fall into in PHP? I thought there would be something built into PHP for this, but I can't find anything. Ideally I want a function that reports, for example, that 'John Jones' is 100% Latin, or that 'Jones језик' is 50% Latin and 50% Cyrillic.</p>
    <p>You could do it with a RegEx, something like the below:</p>
    <pre><code>$strA = 'John Jones';
$strB = 'Српски језик';
$strC = 'Հայաստանի Հանրապետություն';
preg_match( '~[\p{Cyrillic}\p{Common}]+~u', $strB, $res );
</code></pre>
    <p>But this requires checking against each range in turn, which does not seem like a good idea. Alternatively, you could get the Unicode value of each character and check which range it falls in. But I'd imagine someone has already made something like this.</p>
    <p><strong>EDIT</strong></p>
    <p>To give a better idea of why this may be useful: as pointed out in the comments, people sometimes mix visually identical Latin and Cyrillic characters. For example, this is a search for Croatia with a Cyrillic 'С' and the rest in Latin:</p>
    <p><a href="https://www.google.am/search?q=%22%D0%A1roatia%22&amp;aq=f&amp;oq=%22%D0%A1roatia%22" rel="nofollow">https://www.google.am/search?q=%22%D0%A1roatia%22&amp;aq=f&amp;oq=%22%D0%A1roatia%22</a></p>
    <p>Search again with the all-Latin spelling and you will get about 100,000,000 results instead of 20,000. In such cases it would be desirable to replace characters as appropriate for the context of the text. Another good example of where such detection is useful is people who use Cyrillic letters to bypass profanity filters.</p>
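For illustration, here is a minimal sketch of the per-script counting idea from the question. It assumes PCRE's Unicode script properties (`\p{Latin}`, `\p{Cyrillic}`, and so on) are available, which requires the `u` modifier. The function name `scriptBreakdown()` and the default script list are hypothetical, not part of PHP. Characters outside the listed scripts (spaces, digits, punctuation, which fall under `Common`) are ignored, so the percentages are relative to the recognised letters only.

```php
<?php
// Hypothetical helper (not a PHP built-in): reports what share of the
// recognised characters in $text belongs to each listed Unicode script.
function scriptBreakdown(
    string $text,
    array $scripts = ['Latin', 'Cyrillic', 'Armenian', 'Greek']
): array {
    $counts = [];
    $total = 0;

    foreach ($scripts as $script) {
        // With the /u modifier each match of \p{Script} is one code point,
        // so the return value is the number of characters in that script.
        $n = preg_match_all('/\p{' . $script . '}/u', $text);
        if ($n === false) {
            // preg_match_all() returns false on failure, e.g. invalid UTF-8.
            throw new InvalidArgumentException('Input is not valid UTF-8.');
        }
        if ($n > 0) {
            $counts[$script] = $n;
            $total += $n;
        }
    }

    // Convert raw counts to percentages of all recognised characters.
    $shares = [];
    foreach ($counts as $script => $n) {
        $shares[$script] = round(100 * $n / $total, 1);
    }
    return $shares;
}

print_r(scriptBreakdown('Jones језик'));
```

This still runs one pattern per script, which is the cost the question worries about, but the list can be limited to the scripts that matter for a given application. A string such as the mixed-script 'Сroatia' example would then show up as containing both Latin and Cyrillic.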