PHP DOMDocument loadHTML not encoding UTF-8 correctly
<p>I'm trying to parse some HTML using DOMDocument, but when I do, I suddenly lose my encoding (at least that is how it appears to me).</p>
<pre><code>$profile = "&lt;div&gt;&lt;p&gt;various japanese characters&lt;/p&gt;&lt;/div&gt;";
$dom = new DOMDocument();
$dom-&gt;loadHTML($profile);
$divs = $dom-&gt;getElementsByTagName('div');
foreach ($divs as $div) {
    echo $dom-&gt;saveHTML($div);
}
</code></pre>
<p>The result of this code is that I get a bunch of characters that are not Japanese. However, if I do:</p>
<pre><code>echo $profile;
</code></pre>
<p>it displays correctly. I've tried saveHTML and saveXML, and neither displays correctly. I am using PHP 5.3.</p>
<p>What I see:</p>
<pre><code>ã¤ãªãã¤å·ã·ã«ã´ã«ã¦ãã¢ã¤ã«ã©ã³ãç³»ã®å®¶åº­ã«ã9人åå¼ã®5çªç®ã¨ãã¦çã¾ãããå½¼ãå«ãã¦4人ã俳åªã«ãªã£ããç¶è¦ªã¯æ¨æã®ã»ã¼ã«ã¹ãã³ã§ãæ¯è¦ªã¯éµä¾¿å±ã®å®¢å®¤ä¿ã ã£ãã髿 ¡æä»£ã¯ã­ã£ãã£ã®ã¢ã«ãã¤ãã«å¤ãã¿ãæè²è³éãåããªããã«ããªãã¯ç³»ã®é«æ ¡ã¸é²å­¦ã
</code></pre>
<p>What should be shown:</p>
<pre><code>イリノイ州シカゴにて、アイルランド系の家庭に、9人兄弟の5番目として生まれる。彼を含めて4人が俳優になった。父親は木材のセールスマンで、母親は郵便局の客室係だった。高校時代はキャディのアルバイトに勤しみ、教育資金を受けながらカトリック系の高校へ進学
</code></pre>
<p>EDIT: I've simplified the code down to five lines so you can test it yourself.</p>
<pre><code>$profile = "&lt;div lang=ja&gt;&lt;p&gt;イリノイ州シカゴにて、アイルランド系の家庭に、&lt;/p&gt;&lt;/div&gt;";
$dom = new DOMDocument();
$dom-&gt;loadHTML($profile);
echo $dom-&gt;saveHTML();
echo $profile;
</code></pre>
<p>Here is the HTML that is returned:</p>
<pre><code>&lt;div lang="ja"&gt;&lt;p&gt;イリノイ州シカゴã«ã¦ã€ã‚¢ã‚¤ãƒ«ãƒ©ãƒ³ãƒ‰ç³»ã®å®¶åº­ã«ã€&lt;/p&gt;&lt;/div&gt;
&lt;div lang="ja"&gt;&lt;p&gt;イリノイ州シカゴにて、アイルランド系の家庭に、&lt;/p&gt;&lt;/div&gt;
</code></pre>
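For anyone reproducing this, here is a minimal sketch of one commonly used workaround. It is not a definitive fix. It rests on the usual explanation for the symptom above, namely that `loadHTML` decodes the input as ISO-8859-1 when the markup declares no charset. Prefixing the string with an XML encoding declaration is a hint to the parser, not a documented option of `DOMDocument`. The sketch also assumes that the source file is saved as UTF-8 and that `saveHTML()` accepts a node argument (PHP 5.3.6 or later). The variable `$html` is introduced here only for illustration.

```php
<?php
// Same markup as in the question; this file is assumed to be saved as UTF-8.
$profile = "<div lang=ja><p>イリノイ州シカゴにて、アイルランド系の家庭に、</p></div>";

$dom = new DOMDocument();

// Workaround (an assumption, not an official API): the leading XML
// declaration hints to the parser that the bytes that follow are UTF-8.
$dom->loadHTML('<?xml encoding="UTF-8">' . $profile);

// Serialize only the <div> elements, as in the original loop, so the
// injected declaration does not end up in the output.
$html = '';
foreach ($dom->getElementsByTagName('div') as $div) {
    $html .= $dom->saveHTML($div);
}

echo $html;
```

Serializing a specific node rather than the whole document keeps the output limited to the markup that was passed in, which makes it easier to compare against `echo $profile;`.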