<p>I recommend simply not letting garbage in. Rather than relying on elaborate custom functions, which can bog your system down, walk the submitted data against an alphabet you design. Create an alphabet of acceptable characters, then walk the submitted data byte by byte, as if it were an array. Append acceptable characters to a new string and omit unacceptable ones. The data you store in your database is then produced by the user's input, but it is not literally user-supplied data.</p>
<p>EDIT #4: Bad characters are now replaced with the entity <code>&amp;#65533;</code> (U+FFFD, the Unicode replacement character) instead of being dropped.</p>
<p>EDIT #3: Updated Sept 22 2010 @ 1:32pm. Reason: the returned string is now UTF-8, and I used the test file you provided as proof.</p>
<pre><code>&lt;?php
// build alphabet
// optionally you can remove characters from this array
$alpha = array();
$alpha[] = chr(0);  // null
$alpha[] = chr(9);  // horizontal tab
$alpha[] = chr(10); // new line (line feed)
$alpha[] = chr(11); // vertical tab
$alpha[] = chr(13); // carriage return
for ($i = 32; $i &lt;= 126; $i++) { // printable ASCII
    $alpha[] = chr($i);
}

/* remove comment to check ascii ordinals */
// /*
// foreach ($alpha as $key=&gt;$val){
//     print ord($val);
//     print '&lt;br/&gt;';
// }
// print '&lt;hr/&gt;';
//*/
//
// //test case #1
//
// $str = 'afsjdfhasjhdgljhasdlfy42we875y342q8957y2wkjrgSAHKDJgfcv kzXnxbnSXbcv '.chr(160).chr(127).chr(126);
// $string = teststr($alpha,$str);
// print $string;
// print '&lt;hr/&gt;';
//
// //test case #2
//
// $str = ''.'©?™???';
// $string = teststr($alpha,$str);
// print $string;
// print '&lt;hr/&gt;';
//
// $str = '©';
// $string = teststr($alpha,$str);
// print $string;
// print '&lt;hr/&gt;';

// read the whole test file into one string
$file = 'http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt';
$testfile = file_get_contents($file);
$string = teststr($alpha, $testfile);
print $string;
print '&lt;hr/&gt;';

function teststr(array $alpha, $str){
    $strlen = strlen($str);
    $newstr = ''; // filtered output
    $x = 0;       // number of bytes examined

    if ($strlen === 0) {
        // failed to qualify for test
        print 'Non-existent.';
    }

    // walk the input byte by byte against the alphabet
    for ($i = 0; $i &lt; $strlen; $i++) {
        $x++;
        if (in_array($str[$i], $alpha, true)) {
            // passed
            $newstr .= $str[$i];
        } else {
            // failed: report the byte and substitute the replacement-character entity
            print 'Found out of scope character. (ASCII: '.ord($str[$i]).')';
            print '&lt;br/&gt;';
            $newstr .= '&amp;#65533;';
        }
    }

    // convert to UTF-8 if needed (utf8_encode() is deprecated as of PHP 8.2)
    if (!mb_check_encoding($newstr, 'UTF-8')) {
        $newstr = mb_convert_encoding($newstr, 'UTF-8', 'ISO-8859-1');
    }

    // test encoding:
    if (mb_detect_encoding($newstr, 'UTF-8') == 'UTF-8') {
        print 'UTF-8 :D&lt;br/&gt;';
    } else {
        print 'ENCODED: '.mb_detect_encoding($newstr, 'UTF-8').'&lt;br/&gt;';
    }

    return $newstr.' (scope: '.$x.', '.$strlen.')';
}
</code></pre>
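<p>Here is a minimal sketch of the same whitelist walk as a single self-contained function, assuming PHP 7 or later for the scalar type declarations. The function name <code>whitelist_filter</code> and its <code>$replacement</code> parameter are hypothetical and are not part of the code above. The allowed set mirrors <code>$alpha</code>. It is stored as array keys (via <code>array_flip()</code>), so each byte is checked with <code>isset()</code> instead of a linear <code>in_array()</code> scan over the whole alphabet.</p>
<pre><code>&lt;?php
// Hypothetical helper (not from the answer above): same byte-by-byte
// whitelist idea, with the alphabet stored as array keys for O(1) lookups.
function whitelist_filter(string $input, string $replacement = '&amp;#65533;'): string
{
    // Allowed bytes: NUL, tab, LF, VT, CR and printable ASCII 32..126,
    // mirroring the $alpha array built in the answer.
    static $allowed = null;
    if ($allowed === null) {
        $bytes = [chr(0), chr(9), chr(10), chr(11), chr(13)];
        for ($i = 32; $i &lt;= 126; $i++) {
            $bytes[] = chr($i);
        }
        $allowed = array_flip($bytes); // byte =&gt; position, used as a set
    }

    $out = '';
    $len = strlen($input);
    for ($i = 0; $i &lt; $len; $i++) {
        $byte = $input[$i];
        // Keep whitelisted bytes; substitute the marker for everything else.
        $out .= isset($allowed[$byte]) ? $byte : $replacement;
    }
    return $out;
}

print whitelist_filter("caf\xC3\xA9 ok\n");
</code></pre>
<p>Because the walk is byte by byte, a multibyte UTF-8 character such as é (bytes 0xC3 0xA9) produces two markers. The sample call therefore prints <code>caf&amp;#65533;&amp;#65533; ok</code> followed by a newline.</p>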
 
