StackOverflow2013

PHP HTML encoding
<p>I'm trying to parse an HTML page, but the character encoding is garbling my results. After some research I found a very popular solution using <code>utf8_encode()</code> and <code>utf8_decode()</code>, but it doesn't change anything. My code and its output are below.</p>
<h2>Code</h2>
<pre><code>$str_html = $this->curlHelper->file_get_contents_curl($page);
$str_html = utf8_encode($str_html);

$dom = new DOMDocument();
$dom->resolveExternals = true;
$dom->substituteEntities = false;
@$dom->loadHTML($str_html);
$xpath = new DomXpath($dom);
(...)
$profile = array();
for ($index = 0; $index &lt; $table_lines->length; $index++) {
    $desc = utf8_decode($table_lines->item($index)->firstChild->nodeValue);
}
</code></pre>
<h2>Output</h2>
<pre><code>Testar Ã© bom
</code></pre>
<p><strong>Should be</strong></p>
<pre><code>Testar é bom
</code></pre>
<h2>What I've tried</h2>
<ul>
<li><p>htmlentities():</p> <p><code>htmlentities($table_lines->item($index)->lastChild->nodeValue, ENT_NOQUOTES, ini_get('ISO-8859-1'), false);</code></p></li>
<li><p>htmlspecialchars():</p> <p><code>htmlspecialchars($table_lines->item($index)->lastChild->nodeValue, ENT_NOQUOTES, 'ISO-8859-1', false);</code></p></li>
<li><p>Changing my file's charset as described <a href="https://stackoverflow.com/a/6306027/1488993">here</a>.</p></li>
</ul>
<h2>Some more information</h2>
<ul>
<li>Website encoding: <code>&lt;meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" /&gt;</code></li>
</ul>
<p>Thanks in advance!</p>
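The garbled output in the question matches what happens when text is converted to UTF-8 twice. In ISO-8859-1, "é" is the single byte 0xE9. In UTF-8 it is the two bytes 0xC3 0xA9, and if those two bytes are read back as ISO-8859-1 they become "Ã" (0xC3) and "©" (0xA9). The PHP sketch below reproduces that symptom. It is illustrative only. `latin1_to_utf8()` is a hypothetical helper written out by hand so the snippet needs no extension; it performs the same byte mapping as `utf8_encode()`. Whether the second conversion in the question's code actually happens inside `loadHTML()` is an assumption that the question does not confirm.

```php
<?php
// Sketch: reproduce the "Ã©" symptom by converting ISO-8859-1 text to UTF-8 twice.
// latin1_to_utf8() is a hypothetical helper (not part of the question's code);
// it maps bytes the same way utf8_encode() does, without needing any extension.

/** Treat every byte of $bytes as an ISO-8859-1 character and return it as UTF-8. */
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            // ASCII bytes are identical in both encodings.
            $out .= chr($b);
        } else {
            // Code points U+0080..U+00FF take two bytes in UTF-8.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$latin1 = "Testar \xE9 bom";        // "é" in ISO-8859-1 is the single byte 0xE9
$once   = latin1_to_utf8($latin1);  // correct UTF-8: "é" becomes 0xC3 0xA9
$twice  = latin1_to_utf8($once);    // double-encoded: 0xC3 and 0xA9 are re-read as "Ã" and "©"

echo $once, PHP_EOL;   // shows "Testar é bom" on a UTF-8 terminal
echo $twice, PHP_EOL;  // shows "Testar Ã© bom" on a UTF-8 terminal
```

If this is the cause here, one plausible source is that the HTML passed to `loadHTML()` has already been converted to UTF-8 by `utf8_encode()` but still declares `charset=ISO-8859-1` in its `<meta>` tag. The parser may honour that declaration and decode the UTF-8 bytes as Latin-1 a second time.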
