<p>You absolutely do NOT want to parse HTML with regex.</p>
<p>For one, there are far too many ways to write the same markup (attribute order, single vs. double quotes, extra whitespace, optional closing tags). More importantly, regular expressions aren't suited to the nested, hierarchical nature of HTML: a pattern can't reliably tell which closing tag belongs to which opening tag. It's best to use an XML parser or, better yet, an HTML-specific parser.</p>
<p>Whenever I need to scrape HTML, I tend to use the <a href="http://simplehtmldom.sourceforge.net/" rel="noreferrer">Simple HTML DOM Parser</a> library, which parses an HTML string into a traversable PHP object that you can query with CSS-style selectors, much like jQuery.</p>
<pre><code>&lt;?php
require 'simplehtmldom/simple_html_dom.php';

// The heredoc closing marker must sit on its own line
$sHtml = &lt;&lt;&lt;EOS
&lt;table border="1" &gt;
  &lt;tbody style="" &gt;
    &lt;tr style="" &gt;
      &lt;td style="color:blue;"&gt; data0 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data1 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data2 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data3 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data4 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr style="" &gt;
      &lt;td style="color:blue;"&gt; data00 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data11 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data22 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data33 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data44 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr style="color:black" &gt;
      &lt;td style="color:blue;"&gt; data000 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data111 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data222 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data333 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data444 &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
EOS;

// Parse the markup into a traversable object
$oHTML = str_get_html($sHtml);
// Select every row of the table with a CSS-style selector
$oTRs = $oHTML-&gt;find('table tr');

$aData = array();
foreach($oTRs as $oTR) {
    $aRow = array();
    $oTDs = $oTR-&gt;find('td');
    foreach($oTDs as $oTD) {
        // plaintext drops the tags; trim() removes the padding spaces
        $aRow[] = trim($oTD-&gt;plaintext);
    }
    $aData[] = $aRow;
}

var_dump($aData);
?&gt;
</code></pre>
<p>And the output:</p>
<pre><code>array
  0 =&gt;
    array
      0 =&gt; string 'data0' (length=5)
      1 =&gt; string 'data1' (length=5)
      2 =&gt; string 'data2' (length=5)
      3 =&gt; string 'data3' (length=5)
      4 =&gt; string 'data4' (length=5)
  1 =&gt;
    array
      0 =&gt; string 'data00' (length=6)
      1 =&gt; string 'data11' (length=6)
      2 =&gt; string 'data22' (length=6)
      3 =&gt; string 'data33' (length=6)
      4 =&gt; string 'data44' (length=6)
  2 =&gt;
    array
      0 =&gt; string 'data000' (length=7)
      1 =&gt; string 'data111' (length=7)
      2 =&gt; string 'data222' (length=7)
      3 =&gt; string 'data333' (length=7)
      4 =&gt; string 'data444' (length=7)
</code></pre>
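If you'd rather not pull in a third-party library, PHP's bundled DOM extension is one example of the HTML-aware parser recommended above. The sketch below assumes the DOM extension is enabled (it is in most PHP builds) and uses a trimmed two-row version of the same table. `extractTableRows()` is a hypothetical helper name, not part of any library. It loads the markup with `DOMDocument::loadHTML()` and walks the rows with `DOMXPath`, mirroring the `find('table tr')` / `find('td')` loop in the answer.

```php
<?php
// Sketch: pull cell text out of a table using PHP's built-in DOM extension
// instead of Simple HTML DOM. extractTableRows() is a hypothetical helper.

function extractTableRows(string $html): array
{
    $doc = new DOMDocument();
    // libxml's HTML parser predates HTML5 and warns about unknown tags and
    // sloppy markup; collect those warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Every <tr> inside any <table>, with or without an explicit <tbody>.
    foreach ($xpath->query('//table//tr') as $tr) {
        $row = [];
        // Only the <td> elements that are direct children of this row.
        foreach ($xpath->query('./td', $tr) as $td) {
            // textContent drops the tags; trim() removes the padding spaces.
            $row[] = trim($td->textContent);
        }
        $rows[] = $row;
    }
    return $rows;
}

$html = <<<EOS
<table border="1">
  <tbody>
    <tr>
      <td style="color:blue;"> data0 </td>
      <td style="font-size:15px;"> data1 </td>
    </tr>
    <tr>
      <td style="color:blue;"> data00 </td>
      <td style="font-size:15px;"> data11 </td>
    </tr>
  </tbody>
</table>
EOS;

$data = extractTableRows($html);
print_r($data);
```

One design point: `loadHTML()` assumes ISO-8859-1 unless the markup declares its charset, so UTF-8 input containing non-ASCII text may need an explicit charset declaration or an encoding conversion before parsing.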