How to prevent the PHP DOMDocument from "fixing" your HTML string
<p>I have been trying to parse web pages with PHP's DOMDocument so that an application can scan them for SEO quality.</p>
<p>However, I have run into a bit of a problem. For testing purposes I have written a small HTML page containing the following incorrect HTML:</p>
<pre><code>&lt;head&gt;
&lt;meta name="description" content="randomdesciption"&gt;
&lt;/head&gt;
&lt;title&gt;sometitle&lt;/title&gt;
</code></pre>
<p>As you can see, the title is outside the head tag, which is the error I am trying to detect.</p>
<p>Now comes the problem: when I use curl to fetch the response string from this page and then pass it to the DOMDocument to load it as HTML, it actually fixes the markup by adding another pair of &lt;head&gt; tags around the title.</p>
<pre><code>&lt;head&gt;
&lt;meta name="description" content="randomdesciption"&gt;
&lt;/head&gt;
&lt;head&gt;&lt;title&gt;sometitle&lt;/title&gt;&lt;/head&gt;
</code></pre>
<p>I have checked the curl response data, and that is in fact not the problem; the PHP DOMDocument itself fixes the HTML syntax while the loadHTML() method runs.</p>
<p>I have also tried turning off the DOMDocument recover, substituteEntities and validateOnParse properties by setting them to false, without success.</p>
<p>I have been searching Google but have been unable to find any answers so far. I guess it is a bit rare for someone to actually want broken HTML left unfixed.</p>
<p>Does anyone know how to prevent the DOMDocument from fixing my broken HTML?</p>
<p>Thanks in advance.</p>
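For reference, a minimal sketch of the steps described above. It assumes the page source is already held in a string (the curl fetch is left out), and the variable names are illustrative rather than taken from the actual application. Printing the result makes it possible to compare what the parser produces with the input.

```php
<?php
// Test markup from the question: <title> sits outside <head>.
$html = '<head> <meta name="description" content="randomdesciption"> </head> '
      . '<title>sometitle</title>';

// Collect parser warnings instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();

// The properties mentioned in the question, all switched off.
$doc->recover = false;
$doc->substituteEntities = false;
$doc->validateOnParse = false;

$doc->loadHTML($html);

// Serialise the parsed tree so it can be compared with $html.
$result = $doc->saveHTML();
echo $result;

// Any problems the parser noticed while reading the markup.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
libxml_clear_errors();
```

Comparing `$result` with `$html` shows which parts of the markup were rewritten during loading.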