  1. PHP - Mixed charset filenames (Latin, Japanese, Korean) error with RecursiveDirectoryIterator + RecursiveIteratorIterator + RegexIterator
<p>I'm reading my music directory to populate a JSON for jPlayer, as follows:</p>
<pre><code>&lt;?php
// tried utf-8, shift_jis, etc. No difference
header('Content-Type: application/json; charset=SHIFT_JIS');

// can't be blank, so I put . to make the current file's dir the base
$Directory = new RecursiveDirectoryIterator('.');
$Iterator = new RecursiveIteratorIterator($Directory);
$Regex = new RegexIterator($Iterator, '/^.+\.mp3$/i', RecursiveRegexIterator::GET_MATCH);
// instead of glob(*/*.mp3) because it isn't recursive

$filesJson = [];

foreach ($Regex as $key =&gt; $value) {
    $whatever = str_ireplace(['.mp3','.\\'], '', $key);
    $filesJson['mp3'][] = [
        'title' =&gt; htmlspecialchars($whatever),
        'mp3' =&gt; $key
    ];
}

echo json_encode($filesJson);
exit();
?&gt;
</code></pre>
<p>The problem is with files whose names contain non-ASCII characters, such as the Latin (accented), Japanese and Korean ones. Examples:</p>
<p><strong>Japanese</strong></p>
<p><img src="https://i.stack.imgur.com/hf9c9.png" alt="enter image description here"></p>
<p><strong>Korean</strong></p>
<p><img src="https://i.stack.imgur.com/XosoL.png" alt="enter image description here"></p>
<p><strong>Latin (pt-br)</strong></p>
<p><img src="https://i.stack.imgur.com/riq6l.png" alt="enter image description here"></p>
<p>These characters are converted into <code>?</code>, or the value simply becomes <code>null</code> when parsing Latin names (<code>Geração</code> or <code>4º</code>, for example).</p>
<p><img src="https://i.stack.imgur.com/fpBru.png" alt="enter image description here"></p>
<hr>
<p>So, how can I make the filenames/paths parse correctly across different languages? The header charset isn't helping.</p>
<h3>Info:</h3>
<p>XAMPP with Apache2 + PHP 5.4.2 on Win7 x86</p>
<hr>
<h3>Update #1:</h3>
<p>Tried @infinity's answer, but nothing changed. Still <code>?</code> on JP, <code>null</code> on Latin.</p>
<pre><code>&lt;?php
header('Content-Type: application/json; charset=UTF-8');
mb_internal_encoding('UTF-8');

$Directory = new RecursiveDirectoryIterator('.');
$Iterator = new RecursiveIteratorIterator($Directory);
$Regex = new RegexIterator($Iterator, '/^.+\.mp3$/i', RecursiveRegexIterator::GET_MATCH);

$filesJson = [];

foreach ($Regex as $key =&gt; $value) {
    $whatever = mb_substr($key, 2, mb_strlen($key)-6, "utf-8"); // 2 to remove .\ and -6 to remove .mp3 (-4 + -2)
    $filesJson['mp3'][] = [
        'title' =&gt; $whatever, // tried with and without htmlspecialchars
        'mp3' =&gt; $key
    ];
}

echo json_encode($filesJson);
exit();
?&gt;
</code></pre>
<p>If I use <code>HTML-ENTITIES</code> instead of <code>utf-8</code> in <code>mb_substr()</code>, Latin characters work, but Asian ones are still <code>?</code>.</p>
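<p>The Latin <code>null</code> symptom can be reproduced in isolation with the sketch below. It rests on an assumption not confirmed in the post: that the accented names reach PHP as Windows-1252 bytes rather than UTF-8, which <code>json_encode()</code> cannot encode. The helper <code>toUtf8()</code> is hypothetical, introduced only for illustration. This would not help with the <code>?</code> case, where the original characters appear to be lost before PHP sees the name.</p>

```php
<?php
// Hypothetical helper (not part of the original post): return $name as UTF-8.
// ASSUMPTION: a name that is not valid UTF-8 is encoded in Windows-1252.
function toUtf8($name, $fallback = 'Windows-1252')
{
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($name, 'UTF-8', $fallback);
}

// "Geração.mp3" written as Windows-1252 bytes: ç = 0xE7, ã = 0xE3.
$raw = "Gera\xE7\xE3o.mp3";

// These bytes do not form a valid UTF-8 sequence.
$isUtf8Before = mb_check_encoding($raw, 'UTF-8');

// Convert before handing the string to json_encode().
$fixed = toUtf8($raw);
$json = json_encode(array('mp3' => $fixed));

echo $json, "\n";
```

<p>In the loop from the question, the same call would wrap <code>$key</code> and <code>$whatever</code> before they are stored in <code>$filesJson</code>; whether Windows-1252 is the right fallback depends on the system code page.</p>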
 
