<p><b>EDIT:</b> About the head element - if you want only the attributes of the head element, you can use xpath( "//head" ) <a href="http://bg.php.net/manual/en/function.simplexml-element-attributes.php" rel="nofollow noreferrer">and then $head-&gt;attributes()</a>. Note that xpath() returns an array of matches, so $head here is one element of that array.</p>
<p>I won't answer your question directly, since it is short on details. Instead I will describe my own experience. I believe you can solve your problem once you understand the implications of the examples I am giving.</p>
<p>I understand from the tags that you want to use PHP for the job. I had a similar problem lately: I had to parse around 100 static HTML documents and extract parts of the information to place it in a database. Initially I thought about regular expressions, but as I went along I saw that it would be a tedious task.</p>
<p>So I ended up using XPath and SimpleXML in PHP.</p>
<p>Here is what I ended up with:</p>
<pre><code>$file_contents = file_get_contents( $file );
$dom = new DOMDocument;
$dom-&gt;loadHTML( $file_contents );
$document = simplexml_import_dom( $dom );
</code></pre>
<p>Now I have a SimpleXML object which holds the HTML code. That is really convenient - here is how it works:</p>
<p>Suppose you have the following HTML code:</p>
<pre><code>&lt;div id="content"&gt;
  &lt;div class="description"&gt;
    &lt;dl&gt;
      &lt;dt&gt;Title&lt;/dt&gt;
      &lt;dd&gt;
        &lt;ul&gt;&lt;li&gt; first item &lt;/li&gt; &lt;li&gt; second item&lt;/li&gt;&lt;/ul&gt;
        &lt;p&gt; a paragraph.. &lt;/p&gt;
      &lt;/dd&gt;
    &lt;/dl&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
<p>Now you can iterate over all the &lt;dl&gt; elements in your code which are children of div.description and grandchildren of div#content, like this:</p>
<pre><code>foreach( $document-&gt;xpath( "//div[@id='content']/div[@class='description']/dl" ) as $element )
</code></pre>
<p>and then all the children are processed by a recursive function like this one:</p>
<pre><code>function recurse( $parent )
{
    echo '&lt;' . $parent-&gt;getName() . '&gt;' . "\n";

    # echo $parent; # you might want to strip any whitespace like \t and \n here

    foreach( $parent-&gt;children() as $child )
    {
        if( count( $child-&gt;children() ) &gt; 0 )
        {
            recurse( $child );
        }
        else
        {
            echo '&lt;' . $child-&gt;getName() . '&gt;';
            echo $child;
            echo '&lt;/' . $child-&gt;getName() . '&gt;' . "\n";
        }
    }

    echo '&lt;/' . $parent-&gt;getName() . '&gt;' . "\n";
}
</code></pre>
<p>I hope that I've been of help, good luck!</p>
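<p>For the head element mentioned in the EDIT note, here is a minimal sketch of reading its attributes. The sample markup and the variable names ($html, $heads) are made up for illustration and are not part of the original problem; the calls used are DOMDocument::loadHTML, simplexml_import_dom, SimpleXMLElement::xpath and SimpleXMLElement::attributes:</p>
<pre><code>&lt;?php
// Hypothetical sample document; only the attributes on &lt;head&gt; matter here.
$html = '&lt;html&gt;&lt;head profile="http://example.com/profile" lang="en"&gt;'
      . '&lt;title&gt;Demo&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;/body&gt;&lt;/html&gt;';

$dom = new DOMDocument;
$dom-&gt;loadHTML( $html );
$document = simplexml_import_dom( $dom );

// xpath() returns an array of matching elements, so check it before indexing.
$heads = $document-&gt;xpath( "//head" );

if( !empty( $heads ) )
{
    // attributes() yields name =&gt; value pairs for the matched element.
    foreach( $heads[0]-&gt;attributes() as $name =&gt; $value )
    {
        echo $name . ' = ' . $value . "\n";
    }
}
</code></pre>
<p>Checking that the xpath() result is non-empty before indexing keeps the snippet from failing on documents where no head element is matched.</p>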