
<p>To deal with it conceptually before getting into mechanisms (apologies if any of this is obvious): a string can be defined as a sequence of Unicode characters. Unicode is a database that assigns an ID number, known as a code point, to every character you might need to work with. GSM-338 (GSM 03.38) contains a subset of the Unicode characters, so what you're doing is extracting the set of code points from your string and checking whether that set is contained in GSM-338.</p>
<pre><code>// Unicode code points from the second column of
// http://unicode.org/Public/MAPPINGS/ETSI/GSM0338.TXT
// ("..." stands for the rest of that column)
$gsm338_codepoints = array(0x0040, 0x0000, ..., 0x00fc, 0x00e0);

$can_use_gsm338 = true;
foreach (codepoints($mystring) as $codepoint) {
    if (!in_array($codepoint, $gsm338_codepoints, true)) {
        // A single character outside GSM-338 rules it out
        $can_use_gsm338 = false;
        break;
    }
}
</code></pre>
<p>That leaves the definition of the function <code>codepoints($string)</code>, which isn't built into PHP. PHP treats a string as a sequence of bytes rather than a sequence of Unicode characters. The best way to bridge the gap is to get your strings into UTF-8 as early as you can and keep them in UTF-8 as long as you can. You'll have to use other encodings when dealing with external systems, but isolate the conversion at the interface to each system and deal only with UTF-8 internally.</p>
<p>The functions you need to convert between PHP strings in UTF-8 and sequences of code points can be found at <a href="http://hsivonen.iki.fi/php-utf8/" rel="noreferrer">http://hsivonen.iki.fi/php-utf8/</a>, and they give you your <code>codepoints()</code> function.</p>
<p>If you're taking data from an external source that gives you Unicode backslash-escaped characters ("Let's test \u00f6\u00e4\u00fc..."), that escape format should be converted to UTF-8. I don't know offhand of a function that does this. If one can't be found, it's a matter of string/regex processing plus the hsivonen.iki.fi functions: for example, when you hit \u00f6, replace it with the UTF-8 representation of the code point 0xf6.</p>
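<p>A minimal sketch of the same check in current PHP, assuming PHP 7.4+ with the mbstring extension. Here the mbstring functions <code>mb_str_split()</code> and <code>mb_ord()</code> stand in for the hsivonen.iki.fi library as <code>codepoints()</code>. <code>demo_gsm338_subset()</code> and <code>can_use_gsm338()</code> are hypothetical helpers. The demo set holds only part of the second column of GSM0338.TXT, for illustration, and should be replaced with the full column for real use.</p>

```php
<?php
declare(strict_types=1);

/**
 * Split a UTF-8 string into an array of Unicode code points (ints).
 * Assumes PHP 7.4+ with mbstring (mb_check_encoding, mb_str_split, mb_ord).
 */
function codepoints(string $utf8): array
{
    if (!mb_check_encoding($utf8, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    if ($utf8 === '') {
        return [];
    }
    $points = [];
    foreach (mb_str_split($utf8, 1, 'UTF-8') as $char) {
        $points[] = mb_ord($char, 'UTF-8');
    }
    return $points;
}

/**
 * Hypothetical helper: a PARTIAL subset of the second column of GSM0338.TXT,
 * for illustration only. Replace with the full column for real use.
 * Code points are stored as array keys so membership is an isset() lookup.
 */
function demo_gsm338_subset(): array
{
    $points = array_merge(
        [0x000A, 0x000D],                 // LF, CR
        range(0x0020, 0x005A),            // space, punctuation, digits, @, A-Z
        range(0x0061, 0x007A),            // a-z
        [0x00E0, 0x00E4, 0x00F6, 0x00FC]  // à ä ö ü
    );
    return array_fill_keys($points, true);
}

/** Hypothetical helper: true if every code point of $utf8 is a key of $allowed. */
function can_use_gsm338(string $utf8, array $allowed): bool
{
    foreach (codepoints($utf8) as $codepoint) {
        if (!isset($allowed[$codepoint])) {
            return false; // one character outside the set is enough
        }
    }
    return true;
}
```

<p>Keying the set by code point makes each lookup constant-time, whereas <code>in_array()</code> scans the whole list for every character. Both give the same answer.</p>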
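<p>For the backslash-escaped input, here is a sketch of the string/regex approach. It assumes PHP 7.2+ with mbstring, where <code>mb_chr()</code> returns the UTF-8 representation of a code point. The name <code>unescape_unicode()</code> is hypothetical. The function only handles the four-hex-digit <code>\uXXXX</code> form. It leaves lone UTF-16 surrogate escapes such as <code>\ud83d</code> untouched rather than combining pairs. If the data is actually a JSON string, <code>json_decode()</code> already decodes these escapes.</p>

```php
<?php
declare(strict_types=1);

/**
 * Replace \uXXXX escapes with the UTF-8 encoding of that code point.
 * Sketch assumptions: PHP 7.2+ with mbstring; surrogate escapes
 * (\uD800-\uDFFF) are left as literal text, not combined into pairs.
 */
function unescape_unicode(string $s): string
{
    $out = preg_replace_callback(
        '/\\\\u([0-9A-Fa-f]{4})/', // a literal backslash, 'u', then 4 hex digits
        function (array $m): string {
            $cp = hexdec($m[1]);
            // A surrogate is not a valid code point on its own: keep the escape
            if ($cp >= 0xD800 && $cp <= 0xDFFF) {
                return $m[0];
            }
            return mb_chr($cp, 'UTF-8');
        },
        $s
    );
    if ($out === null) {
        throw new RuntimeException('Regex replacement failed');
    }
    return $out;
}
```

<p>The result is ordinary UTF-8, so it can be passed to a <code>codepoints()</code> function like the one described above.</p>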
 
