<p>You absolutely do NOT want to parse HTML with regex.</p> <p>For one, there are far too many variations in how markup can be written, and more importantly, regex is not good at handling the hierarchical (nested) nature of HTML. It's best to use an XML parser or, better yet, an HTML-specific parser.</p> <p>Whenever I need to scrape HTML, I tend to use the <a href="http://simplehtmldom.sourceforge.net/" rel="noreferrer">Simple HTML DOM Parser</a> library, which parses an HTML string into a traversable PHP object that you can query with CSS-style selectors, much like jQuery.</p> <pre><code>&lt;?php
// Load the Simple HTML DOM library (adjust the path to your copy)
require 'simplehtmldom/simple_html_dom.php';

$sHtml = &lt;&lt;&lt;EOS
&lt;table border="1" &gt;
  &lt;tbody style="" &gt;
    &lt;tr style="" &gt;
      &lt;td style="color:blue;"&gt; data0 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data1 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data2 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data3 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data4 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr style="" &gt;
      &lt;td style="color:blue;"&gt; data00 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data11 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data22 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data33 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data44 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr style="color:black" &gt;
      &lt;td style="color:blue;"&gt; data000 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data111 &lt;/td&gt;
      &lt;td style="font-size:15px;"&gt; data222 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data333 &lt;/td&gt;
      &lt;td style="color:blue;"&gt; data444 &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
EOS;

// Parse the string into a traversable DOM object
$oHTML = str_get_html($sHtml);

// Select every row of the table with a CSS-style selector
$oTRs = $oHTML-&gt;find('table tr');

$aData = array();
foreach($oTRs as $oTR) {
    $aRow = array();
    // Collect the trimmed plain text of each cell in this row
    $oTDs = $oTR-&gt;find('td');
    foreach($oTDs as $oTD) {
        $aRow[] = trim($oTD-&gt;plaintext);
    }
    $aData[] = $aRow;
}

var_dump($aData);
?&gt;
</code></pre> <p>And the output:</p> <pre><code>array
  0 =&gt;
    array
      0 =&gt; string 'data0' (length=5)
      1 =&gt; string 'data1' (length=5)
      2 =&gt; string 'data2' (length=5)
      3 =&gt; string 'data3' (length=5)
      4 =&gt; string 'data4' (length=5)
  1 =&gt;
    array
      0 =&gt; string 'data00' (length=6)
      1 =&gt; string 'data11' (length=6)
      2 =&gt; string 'data22' (length=6)
      3 =&gt; string 'data33' (length=6)
      4 =&gt; string 'data44' (length=6)
  2 =&gt;
    array
      0 =&gt; string 'data000' (length=7)
      1 =&gt; string 'data111' (length=7)
      2 =&gt; string 'data222' (length=7)
      3 =&gt; string 'data333' (length=7)
      4 =&gt; string 'data444' (length=7)
</code></pre>
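<p>If you'd rather not pull in a third-party library, the same row-by-row extraction can be sketched with PHP's bundled DOM extension (<code>DOMDocument</code> plus <code>DOMXPath</code>), which is a different tool from the Simple HTML DOM approach above. This assumes the DOM extension is enabled (it is in most default builds); the helper name <code>extractTableRows</code> is hypothetical, and the sample table is trimmed to two rows for brevity.</p>

```php
<?php
// Sketch only: table extraction with PHP's built-in DOM extension
// instead of the Simple HTML DOM library.

/**
 * Return one array per <tr> found inside a <table>, each holding the
 * trimmed text of that row's own <td> cells.
 * extractTableRows is a hypothetical name chosen for this example.
 */
function extractTableRows(string $html): array
{
    $doc = new DOMDocument();
    // Keep libxml from printing warnings about imperfect real-world markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Every <tr> that sits anywhere inside a <table>.
    foreach ($xpath->query('//table//tr') as $tr) {
        $row = [];
        // Only the direct <td> children of this row.
        foreach ($xpath->query('./td', $tr) as $td) {
            $row[] = trim($td->textContent);
        }
        $rows[] = $row;
    }
    return $rows;
}

$sHtml = <<<EOS
<table border="1">
  <tbody>
    <tr><td style="color:blue;"> data0 </td><td style="font-size:15px;"> data1 </td></tr>
    <tr><td style="color:blue;"> data00 </td><td style="font-size:15px;"> data11 </td></tr>
  </tbody>
</table>
EOS;

$aData = extractTableRows($sHtml);
var_dump($aData);
```

<p>The XPath <code>./td</code> expression matches only the row's own cells; use <code>.//td</code> instead if you want every descendant cell, including those of a nested table.</p>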