<p>It's not that hard, really... <a href="http://www.php.net/manual/en/domdocument.loadhtml.php" rel="nofollow">just google it and click this link</a>, and you'll know how to parse a DOM. <a href="http://php.net/manual/en/class.domdocument.php" rel="nofollow">Here</a> you can see which methods you can use to select all elements of interest, iterate over the DOM, get their contents and what have you...</p>
<pre><code>$DOM = new DOMDocument();
$DOM-&gt;loadHTML($htmlString);
$spans = $DOM-&gt;getElementsByTagName('span');
foreach ($spans as $span) {
    // the label is the span's own text; the value is the text node right after the span
    $label = trim($span-&gt;textContent);
    $value = $span-&gt;nextSibling ? trim($span-&gt;nextSibling-&gt;nodeValue) : '';
    echo $label.' - '.$value."\n";
}
</code></pre>
<p>That seems to be what you're after, if I'm not mistaken. This is just off the top of my head, but I think this should output something like:</p>
<pre><code>Case Size: - 44mm
Case Thickness: - 13mm
</code></pre>
<p><strong>UPDATE:</strong><br/> Here's a tested solution that returns the desired result, if I'm not mistaken:</p>
<pre><code>$str = "&lt;div id='productDetails' class='tabContent active details'&gt;
    &lt;span&gt; &lt;b&gt;Case Size:&lt;/b&gt; &lt;/span&gt; 44mm &lt;br&gt;
    &lt;span&gt; &lt;b&gt;Case Thickness:&lt;/b&gt; &lt;/span&gt; 13mm &lt;br&gt;
    &lt;span&gt; &lt;b&gt;Water Resistant:&lt;/b&gt; &lt;/span&gt; 5 ATM &lt;br&gt;
    &lt;span&gt; &lt;b&gt;Brand:&lt;/b&gt; &lt;/span&gt; Fossil &lt;br&gt;
    &lt;span&gt; &lt;b&gt;Warranty:&lt;/b&gt; &lt;/span&gt; 11-year limited &lt;br&gt;
    &lt;span&gt; &lt;b&gt;Origin:&lt;/b&gt; &lt;/span&gt; Imported &lt;br&gt;
&lt;/div&gt;";
$DOM = new DOMDocument();
$DOM-&gt;loadHTML($str);
// join all the text into one line by dropping the line-feeds
$txt = implode('',explode("\n",$DOM-&gt;textContent));
preg_match_all('/([a-z0-9].*?\:).*?([0-9a-z]+)/im',$txt,$matches);
// or, if you don't want to include the colon in your match, use this line instead:
// preg_match_all('/([a-z0-9][^:]*).*?([0-9a-z]+)/im',$txt,$matches);
for($i = 0, $j = count($matches[1]);$i&lt;$j;$i++)
{
    // collapse runs of whitespace into a single space
    $matches[1][$i] = preg_replace('/\s+/',' ',$matches[1][$i]);
    $matches[2][$i] = preg_replace('/\s+/',' ',$matches[2][$i]);
}
$result = array_combine($matches[1],$matches[2]);
var_dump($result);
//result:
//array(6) {
//  ["Case Size:"]=&gt; "44mm"
//  ["Case Thickness:"]=&gt; "13mm"
//  ["Water Resistant:"]=&gt; "5"
//  ["ATM Brand:"]=&gt; "Fossil"
//  ["Warranty:"]=&gt; "11"
//  ["year limited Origin:"]=&gt; "Imported"
//}
</code></pre>
<p>To insert this into your DB (with <code>$pdo</code> being your PDO connection):</p>
<pre><code>// prepare the statement once, then execute it for every key/value pair
$stmt = $pdo-&gt;prepare('INSERT INTO your_db.your_table (meta_key, meta_value) VALUES (:key, :value)');
foreach($result as $key =&gt; $value)
{
    $stmt-&gt;execute(array('key' =&gt; $key, 'value' =&gt; $value));
}
</code></pre>
<p><em>Edit</em><br/> To capture the <code>11-year limited</code> substring entirely, you'll need to edit the code above like so:</p>
<pre><code>// replace everything from $txt = implode('',explode("\n",$DOM-&gt;textContent)); onwards with:
$txt = $DOM-&gt;textContent;// keep the line-feeds: each value ends at one
preg_match_all('/([a-z0-9][^:]*)[^a-z0-9]*([a-z0-9][^\n]+)/im',$txt,$matches);
for($i = 0, $j = count($matches[1]);$i&lt;$j;$i++)
{
    $matches[1][$i] = preg_replace('/\s+/',' ',$matches[1][$i]);
    $matches[2][$i] = preg_replace('/\s+/',' ',$matches[2][$i]);
}
$matches[2] = array_map('trim',$matches[2]);// remove leading/trailing whitespace
$result = array_combine($matches[1],$matches[2]);
var_dump($result);
</code></pre>
<p>The output is:</p>
<pre><code>array(6) {
  ["Case Size"]=&gt; "44mm"
  ["Case Thickness"]=&gt; "13mm"
  ["Water Resistant"]=&gt; "5 ATM"
  ["Brand"]=&gt; "Fossil"
  ["Warranty"]=&gt; "11-year limited"
  ["Origin"]=&gt; "Imported"
}
</code></pre>
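<p>A regex-free alternative, offered as a sketch rather than a tested drop-in: instead of flattening the text and matching it with regular expressions, let <code>DOMXPath</code> pair each <code>&lt;span&gt;</code> label with the text node that immediately follows it. The function name <code>extractDetails()</code>, the XPath expressions and the compact sample markup are illustrative assumptions; the sketch also assumes every label sits in a <code>&lt;span&gt;</code> that is a direct child of <code>#productDetails</code>, with its value as the next text node.</p>

```php
<?php
// Hypothetical helper: maps each span label (colon stripped) to the text that follows it.
function extractDetails(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];
    // Every <span> that is a direct child of the #productDetails div.
    foreach ($xpath->query("//div[@id='productDetails']/span") as $span) {
        // Label: the span's text, trimmed, without the trailing colon.
        $label = rtrim(trim($span->textContent), ':');
        // Value: the nearest following sibling that is a text node.
        $valueNode = $xpath->query('following-sibling::text()[1]', $span)->item(0);
        $value = $valueNode === null
            ? ''
            : trim(preg_replace('/\s+/', ' ', $valueNode->nodeValue));
        $result[$label] = $value;
    }
    return $result;
}

// Sample markup (same data as $str above, one entry per line).
$html = <<<HTML
<div id='productDetails' class='tabContent active details'>
    <span><b>Case Size:</b></span> 44mm <br>
    <span><b>Case Thickness:</b></span> 13mm <br>
    <span><b>Water Resistant:</b></span> 5 ATM <br>
    <span><b>Brand:</b></span> Fossil <br>
    <span><b>Warranty:</b></span> 11-year limited <br>
    <span><b>Origin:</b></span> Imported <br>
</div>
HTML;

var_dump(extractDetails($html));
```

<p>Because each value is read from its own text node, multi-word values such as <code>5 ATM</code> and <code>11-year limited</code> should come out whole without depending on line-feeds in the source markup. The trade-off is that the code now relies on that span/text-node layout staying the same.</p>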