PHP, MySQL and XML = garbled HTML output
<p>I have a field in MySQL of type text, using the following collation: <code>utf8_general_ci</code>.</p>

<p>This field holds XML and is populated from a variable built with DOMDocument:</p>

<pre><code>function ed_audit_node($dom, $field, $new, $old){
    //create audit_detail node
    $ad = $dom-&gt;createElement('audit_detail');
    $fn = $dom-&gt;createElement('fieldname');
    $fn-&gt;appendChild($dom-&gt;createTextNode($field));
    $ad-&gt;appendChild($fn);

    $ov = $dom-&gt;createElement('old_value');
    $ov-&gt;appendChild($dom-&gt;createTextNode($old));
    $ad-&gt;appendChild($ov);

    $nv = $dom-&gt;createElement('new_value');
    $nv-&gt;appendChild($dom-&gt;createTextNode($new));
    $ad-&gt;appendChild($nv);

    //append to document
    return $ad;
}
</code></pre>

<p>Here's how I save to the db (<code>$xml</code> comes from <code>$dom-&gt;saveXML()</code>):</p>

<pre><code>function ed_audit_insert($ed, $xml){
    global $visitor;
    $sql = &lt;&lt;&lt;EOF
    INSERT INTO ed.audit
        (employee_id, audit_date, audit_action, audit_data, user_id)
    VALUES (
        {$ed['emp']['employee_id']},
        now(),
        '{$ed['audit_action']}',
        '{$xml}',
        {$visitor['user_id']}
    );
EOF;
    $req = mysql_query($sql,$ed['db']) or die(db_query_error($sql,mysql_error(),__FUNCTION__));
    //snip
}
</code></pre>

<p>See an older, parallel, slightly related thread on how I’m creating this XML: <a href="https://stackoverflow.com/questions/4662008/another-php-xml-parsing-error-input-is-not-proper-utf-8-indicate-encoding">Another PHP XML parsing error: &quot;Input is not proper UTF-8, indicate encoding!&quot;</a></p>

<p><strong>What works</strong>: querying the database, selecting the field, and outputting it with jQuery (<code>.ajax()</code>) into a textarea. Firebug and the textarea match what's in the database (confirmed with Toad).</p>

<p><strong>What doesn't work</strong>: outputting the text from the database into an HTML page. This HTML page is served with the charset ISO-8859-1, which I cannot change.</p>

<p>Here’s the code that outputs that to the screen:</p>

<pre><code>$xmlData = simplexml_load_string($d['audit_data']);
foreach ($xmlData-&gt;audit_detail as $a){
    echo "&lt;p&gt; straight from db = ".$a-&gt;new_value."&lt;/p&gt;";
    echo "&lt;p&gt; utf8_decode() = ".utf8_decode($a-&gt;new_value)."&lt;/p&gt;";
}
</code></pre>

<p>I’ve also used a charset changer extension for Firefox: I tried ISO-8859-1, UTF-8 and 1252, without success.</p>

<p>If it were UTF-8, shouldn’t I be seeing diamonds with question marks inside (since the content-type is ISO-8859-1)? If it’s not UTF-8, what is it?</p>

<p><strong>Edit #1</strong></p>

<p>Here's a snapshot of the other tests I have made:</p>

<pre><code>$xmlData = simplexml_load_string($d['audit_data']);
foreach ($xmlData-&gt;audit_detail as $a){
    echo "&lt;p&gt;encoding is, straight from db, using mb_detect_encoding: ".mb_detect_encoding($a-&gt;new_value)."&lt;/p&gt;";
    echo "&lt;p&gt;encoding is, with utf8_decode, using mb_detect_encoding: ".mb_detect_encoding(utf8_decode($a-&gt;new_value))."&lt;/p&gt;";
    echo "&lt;hr/&gt;";
    echo "&lt;p&gt; straight from db = &lt;pre&gt;".$a-&gt;new_value."&lt;/pre&gt;&lt;/p&gt;";
    echo "&lt;p&gt; utf8_decode() = &lt;pre&gt;".utf8_decode($a-&gt;new_value)."&lt;/pre&gt;&lt;/p&gt;";
    echo "&lt;hr/&gt;";
    $iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $a-&gt;new_value);
    $iso88591_3 = mb_convert_encoding($a-&gt;new_value, 'ISO-8859-1', 'UTF-8');
    echo "&lt;p&gt; iconv() = ".$iso88591_2."&lt;/p&gt;";
    echo "&lt;p&gt; mb_convert_encoding() = ".$iso88591_3."&lt;/p&gt;";
}
</code></pre>

<p><strong>Edit #2</strong></p>

<p>I added the obsolete <code>&lt;xmp&gt;</code> tag to the output.</p>

<p>Code:</p>

<pre><code>$xmlData = simplexml_load_string($d['audit_data']);
foreach ($xmlData-&gt;audit_detail as $a){
    echo "&lt;p&gt;encoding is, straight from db, using mb_detect_encoding: ".mb_detect_encoding($a-&gt;new_value)."&lt;/p&gt;";
    echo "&lt;p&gt;encoding is, with utf8_decode, using mb_detect_encoding: ".mb_detect_encoding(utf8_decode($a-&gt;new_value))."&lt;/p&gt;";
    echo "&lt;hr/&gt;";
    echo "&lt;p&gt; straight from db = &lt;pre&gt;".$a-&gt;new_value."&lt;/pre&gt;&lt;/p&gt;";
    echo "&lt;p&gt; utf8_decode() = &lt;pre&gt;".utf8_decode($a-&gt;new_value)."&lt;/pre&gt;&lt;/p&gt;";
    echo "&lt;hr/&gt;";
    $iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $a-&gt;new_value);
    $iso88591_3 = mb_convert_encoding($a-&gt;new_value, 'ISO-8859-1', 'UTF-8');
    echo "&lt;p&gt; iconv() = ".$iso88591_2."&lt;/p&gt;";
    echo "&lt;p&gt; mb_convert_encoding() = ".$iso88591_3."&lt;/p&gt;";
    echo "&lt;hr/&gt;";
    echo "&lt;p&gt;straight from db, using &amp;lt;xmp&amp;gt; = &lt;xmp&gt;".$a-&gt;new_value."&lt;/xmp&gt;&lt;/p&gt;";
    echo "&lt;p&gt;utf8_decode(), using &amp;lt;xmp&amp;gt; = &lt;xmp&gt;".utf8_decode($a-&gt;new_value)."&lt;/xmp&gt;&lt;/p&gt;";
}
</code></pre>

<p>Here are some meta tags from the page:</p>

<pre><code>&lt;meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /&gt;
&lt;meta name="dc.language" scheme="ISO639-2/T" content="eng" /&gt;
</code></pre>

<p>IMO, the last meta tag has no bearing.</p>

<p><strong>Edit #3</strong></p>

<p>Source code of the rendered page:</p>

<pre><code>&lt;p&gt;encoding is, straight from db, using mb_detect_encoding: UTF-8&lt;/p&gt;
&lt;p&gt;encoding is, with utf8_decode, using mb_detect_encoding: ASCII&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt; straight from db = &lt;pre&gt;Ro马eç ³é ¥n franê¡©s&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt; utf8_decode() = &lt;pre&gt;Ro?e??n fran?s&lt;/pre&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt; iconv() = Ro&lt;/p&gt;
&lt;p&gt; mb_convert_encoding() = Ro?e??n fran?s&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;straight from db, using &amp;lt;xmp&amp;gt; = &lt;xmp&gt;Ro马eç ³é ¥n franê¡©s&lt;/xmp&gt;&lt;/p&gt;
&lt;p&gt;utf8_decode(), using &amp;lt;xmp&amp;gt; = &lt;xmp&gt;Ro?e??n fran?s&lt;/xmp&gt;&lt;/p&gt;
</code></pre>

<p><strong>Edit #4</strong></p>

<p>Here is the SQL statement going into the db:</p>

<pre><code>INSERT INTO ed.audit
    (employee_id, audit_date, audit_action, audit_data, user_id)
VALUES (
    75,
    now(),
    'u',
    '&lt;?xml version="1.0"?&gt;
&lt;audit&gt;&lt;audit_detail&gt;&lt;fieldname&gt;role_fra&lt;/fieldname&gt;&lt;old_value&gt;aRo&amp;#x9A6C;e&amp;#x7833;&amp;#x9825;n fran&amp;#xA869;s&lt;/old_value&gt;&lt;new_value&gt;bRo&amp;#x9A6C;e&amp;#x7833;&amp;#x9825;n fran&amp;#xA869;s&lt;/new_value&gt;&lt;/audit_detail&gt;&lt;/audit&gt;
',
    333
);
</code></pre>

<p>Note: the text in this XML doesn't necessarily match the screenshots provided above.</p>

<p><strong>Edit #5</strong></p>

<p>Here's my new function, which wraps the values of the old_value and new_value nodes in CDATA sections:</p>

<pre><code>function ed_audit_node($dom, $field, $new, $old){
    //create audit_detail node
    $ad = $dom-&gt;createElement('audit_detail');
    $fn = $dom-&gt;createElement('fieldname');
    $fn-&gt;appendChild($dom-&gt;createTextNode($field));
    $ad-&gt;appendChild($fn);

    $ov = $dom-&gt;createElement('old_value');
    $ov-&gt;appendChild($dom-&gt;createCDATASection($old));
    $ad-&gt;appendChild($ov);

    $nv = $dom-&gt;createElement('new_value');
    $nv-&gt;appendChild($dom-&gt;createCDATASection($new));
    $ad-&gt;appendChild($nv);

    //append to document
    return $ad;
}
</code></pre>

<p>I also added the encoding to the XML document:</p>

<pre><code>$dom = new DomDocument('1.0', 'UTF-8');
</code></pre>

<p>Here's my new SimpleXML call:</p>

<pre><code>$xmlData = simplexml_load_string($d['audit_data'], "SimpleXMLElement", LIBXML_NOENT | LIBXML_NOCDATA);
</code></pre>

<p>I see the CDATA sections in Toad as well. However, I'm still getting an error:</p>

<pre><code>Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 2: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xE9 0xE9 0x6C 0x65 in &lt;snip&gt;
</code></pre>

<p><strong>Edit #6</strong></p>

<p>I just noticed that the jQuery call returns the proper accented characters in the CDATA.</p>
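<p>The bytes listed in the parser warning (0xE9 0xE9 0x6C 0x65) are "é", "é", "l", "e" in ISO-8859-1 and do not form a valid UTF-8 sequence. The following is a minimal sketch of one way to check and convert values, assuming they reach the DOM code as ISO-8859-1 because the page itself is ISO-8859-1. The helper names <code>ed_to_utf8()</code> and <code>ed_to_latin1()</code> are hypothetical and not part of the code above.</p>

```php
<?php
// Hypothetical helper (not from the original code): make sure a value is
// valid UTF-8 before it is handed to DOMDocument.
function ed_to_utf8($value)
{
    // mb_check_encoding() returns true when the bytes form valid UTF-8.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Assumption: anything that is not valid UTF-8 came from the
    // ISO-8859-1 page, so it is converted from that charset.
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical helper: convert a UTF-8 value read back from the XML
// into ISO-8859-1 for output on the ISO-8859-1 page.
function ed_to_latin1($value)
{
    return mb_convert_encoding((string) $value, 'ISO-8859-1', 'UTF-8');
}

// The bytes reported in the parser warning.
$raw = "\xE9\xE9\x6C\x65";

var_dump(mb_check_encoding($raw, 'UTF-8'));  // bool(false)

$utf8 = ed_to_utf8($raw);
echo bin2hex($utf8), "\n";                   // c3a9c3a96c65
echo bin2hex(ed_to_latin1($utf8)), "\n";     // e9e96c65
```

<p>Under that assumption, <code>ed_to_utf8()</code> would be applied to <code>$old</code> and <code>$new</code> before <code>createTextNode()</code> or <code>createCDATASection()</code>, and <code>ed_to_latin1()</code> to <code>$a-&gt;new_value</code> before echoing. The validity check is a heuristic: an ISO-8859-1 string whose bytes happen to form valid UTF-8 would pass through unconverted.</p>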