PHP HTML encoding
<p>I'm trying to parse an HTML page, but the encoding is messing up my results. After some research I found a very popular solution using <code>utf8_encode()</code> and <code>utf8_decode()</code>, but it doesn't change anything. My code and its output are below.</p>
<h2>Code</h2>
<pre><code>$str_html = $this-&gt;curlHelper-&gt;file_get_contents_curl($page);
$str_html = utf8_encode($str_html);
$dom = new DOMDocument();
$dom-&gt;resolveExternals = true;
$dom-&gt;substituteEntities = false;
@$dom-&gt;loadHTML($str_html);
$xpath = new DomXpath($dom);
(...)
$profile = array();
for ($index = 0; $index &lt; $table_lines-&gt;length; $index++) {
    $desc = utf8_decode($table_lines-&gt;item($index)-&gt;firstChild-&gt;nodeValue);
}
</code></pre>
<h2>Output</h2>
<pre><code>Testar é bom
</code></pre>
<p><strong>Should be</strong></p>
<pre><code>Testar é bom
</code></pre>
<h2>What I've tried</h2>
<ul>
<li><p>htmlentities():</p>
<p><code>htmlentities($table_lines-&gt;item($index)-&gt;lastChild-&gt;nodeValue, ENT_NOQUOTES, ini_get('ISO-8859-1'), false);</code></p></li>
<li><p>htmlspecialchars():</p>
<p><code>htmlspecialchars($table_lines-&gt;item($index)-&gt;lastChild-&gt;nodeValue, ENT_NOQUOTES, 'ISO-8859-1', false);</code></p></li>
<li><p>Changing my file's charset as described <a href="https://stackoverflow.com/a/6306027/1488993">here</a>.</p></li>
</ul>
<h2>Some more information</h2>
<ul>
<li>Website encoding: <code>&lt;meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" /&gt;</code></li>
</ul>
<p>Thanks in advance!</p>
 
