POPinterest style / PHP Image Scraper Crashing Server
<p>I must have a memory leak or something in this class that is eating memory on my server. For example, if I run file_get_contents(<a href="http://www.theknot.com" rel="nofollow">http://www.theknot.com</a>), we can no longer connect to the server even though it isn't down, or MySQL closes the connection, or in extreme cases the server is knocked out completely for a while, to the point that we cannot even ping it. I know the problem is somewhere within the preg_match_all() if block, but I don't know what could be running away there. I can only assume it is a lot of regex processing caused by whatever is in the content fetched from the remote site. Any ideas?</p>
<pre><code>&lt;?php
class Utils_Linkpreview extends Zend_Db_table
{
    public function getPreviews($url)
    {
        $link = $url;
        $width = 200;
        $height = 200;
        $regex = '/&lt;img[^\/]+src="([^"]+\.(jpe?g|gif|png))/';
        /// $regex = '/&lt;img[^\/]+src="([^"]+)/';
        $thumbs = false;

        try {
            $data = file_get_contents($link);
        } catch (Exception $e) {
            print "Caught exception when attempting to find images: ". $e-&gt;getMessage(). "\n";
        }

        if (($data) &amp;&amp; preg_match_all($regex, $data, $m, PREG_PATTERN_ORDER)) {
            if (isset($m[1]) &amp;&amp; is_array($m[1])) {
                $thumbs = array();
                foreach (array_unique($m[1]) as $url) {
                    if (
                        ($url = $this-&gt;rel2abs($url, $link)) &amp;&amp;
                        ($i = @getimagesize($url)) &amp;&amp;
                        $i[0] &gt;= ($width-10) &amp;&amp;
                        $i[1] &gt;= ($height-10)
                    ) {
                        $thumbs[] = $url;
                    }
                }
            }
        }
        return $thumbs;
    }

    private function rel2abs($url, $host)
    {
        if (substr($url, 0, 4) == 'http') {
            return $url;
        } else {
            $hparts = explode('/', $host);
            if ($url[0] == '/') {
                return implode('/', array_slice($hparts, 0, 3)) . $url;
            } else if ($url[0] != '.') {
                array_pop($hparts);
                return implode('/', $hparts) . '/' . $url;
            }
        }
    }
}
?&gt;
</code></pre>
<p><strong>EDIT</strong> - Amal Murali's comment pointed me in a better direction using PHP's DOMDocument.
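Thanks bud!</p>
<p>Here is the result:</p>
<pre><code>public function getPreviews($url)
{
    $link = $url;
    $thumbs = array();

    // file_get_contents() does not throw on failure; it emits a warning and
    // returns false, so check the return value instead of using try/catch.
    // loadHTML() also rejects an empty string, hence the early return.
    $html = @file_get_contents($link);
    if ($html === false || $html === '') {
        return $thumbs;
    }

    $dom = new DOMDocument();
    @$dom-&gt;loadHTML($html);   // @ silences warnings about malformed markup
    $x = new DOMXPath($dom);

    // Keep only &lt;img&gt; tags whose declared width or height exceeds 200
    // (a bare number or "NNNpx"), so no image has to be downloaded.
    foreach ($x-&gt;query("//img[@width &gt; 200 or substring-before(@width, 'px') &gt; 200 or @height &gt; 200 or substring-before(@height, 'px') &gt; 200]") as $node) {
        $abs = $this-&gt;rel2abs($node-&gt;getAttribute("src"), $link);
        if ($abs) {              // rel2abs() returns null for "./" and "../" paths
            $thumbs[] = $abs;
        }
    }
    return $thumbs;
}
</code></pre>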
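<p>A hedged sketch, not from the original post: the scraper never bounds its network I/O. A bare <code>file_get_contents()</code> call waits up to <code>default_socket_timeout</code> (60 seconds by default) and reads the whole response into memory, and in the first version every <code>@getimagesize($url)</code> call opens another remote connection under the same limit. Whether that is what took the server down is an assumption, but capping it is cheap. The helper name <code>fetchCapped()</code> and its default limits are hypothetical. <code>timeout</code> is the http wrapper's read timeout (per read, not for the whole transfer), and the fifth argument of <code>file_get_contents()</code> caps the number of bytes read. The sketch also shows why the try/catch in the question's code never fires: <code>file_get_contents()</code> signals failure by returning <code>false</code> with a warning, not by throwing.</p>

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL with a read
// timeout and a hard byte cap, returning null on failure.
function fetchCapped(string $url, int $maxBytes = 1048576, float $timeout = 5.0): ?string
{
    // 'timeout', 'follow_location' and 'max_redirects' are standard options of
    // PHP's http:// stream wrapper; they are simply ignored for local files.
    $context = stream_context_create([
        'http' => [
            'timeout'         => $timeout, // read timeout in seconds
            'follow_location' => 1,
            'max_redirects'   => 3,
        ],
    ]);

    // The 5th argument limits how many bytes are read, so a huge page
    // cannot be pulled into memory in full.
    $body = @file_get_contents($url, false, $context, 0, $maxBytes);

    // file_get_contents() returns false (plus a warning) instead of throwing.
    return $body === false ? null : $body;
}
```

<p>In <code>getPreviews()</code> this could stand in for the bare fetch, e.g. <code>$data = fetchCapped($link);</code>, with an early return when it yields <code>null</code>.</p>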
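<p>Another hedged sketch: <code>rel2abs()</code> returns nothing for <code>./</code> and <code>../</code> paths, glues protocol-relative URLs onto the scraped host (<code>//cdn.example.com/a.png</code> becomes <code>http://www.theknot.com//cdn.example.com/a.png</code>), and when the page URL has no path, as with <code>http://www.theknot.com</code>, <code>array_pop()</code> drops the host itself, so <code>img/a.png</code> becomes <code>http://img/a.png</code>. The function below, <code>resolveUrl()</code>, is a hypothetical replacement built on <code>parse_url()</code>. It applies a simplified form of the RFC 3986 dot-segment rules and deliberately returns <code>null</code> for query-only and fragment-only references.</p>

```php
<?php
// Hypothetical resolveUrl(): a sketch of a fuller rel2abs(). Handles absolute,
// protocol-relative ("//cdn/x.png"), root-relative ("/x.png") and dot-segment
// ("../x.png") references. Query-only ("?v=2") and fragment-only ("#top")
// references are out of scope and return null.
function resolveUrl(string $src, string $base): ?string
{
    if ($src === '' || $src[0] === '?' || $src[0] === '#') {
        return null;
    }
    // Already absolute: starts with a scheme such as "http:" or "data:".
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
        return $src;
    }
    $b = parse_url($base);
    if ($b === false || !isset($b['scheme'], $b['host'])) {
        return null;
    }
    // Protocol-relative: borrow the scheme of the page being scraped.
    if (substr($src, 0, 2) === '//') {
        return $b['scheme'] . ':' . $src;
    }
    $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');

    // Split off "?query#fragment" so only the path part is normalised.
    $cut  = strcspn($src, '?#');
    $rest = substr($src, $cut);
    $src  = substr($src, 0, $cut);

    if ($src[0] === '/') {
        $path = $src;                               // root-relative
    } else {
        $basePath = $b['path'] ?? '/';
        // Directory of the base path, e.g. "/blog/post.html" -> "/blog/".
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $src;
    }

    // Remove "." and ".." segments (simplified RFC 3986, section 5.2.4).
    $segments = explode('/', $path);
    $last = count($segments) - 1;
    $out = [];
    foreach ($segments as $i => $seg) {
        if ($seg === '.' || $seg === '..') {
            if ($seg === '..' && count($out) > 1) {
                array_pop($out);                    // never pop the leading root
            }
            if ($i === $last) {
                $out[] = '';                        // "dir/.." keeps its trailing slash
            }
            continue;
        }
        $out[] = $seg;
    }
    return $origin . implode('/', $out) . $rest;
}
```

<p>In the DOMDocument version, <code>$this-&gt;rel2abs($node-&gt;getAttribute("src"), $link)</code> could become <code>resolveUrl($node-&gt;getAttribute("src"), $link)</code>; the falsy check on the result still skips anything it cannot resolve.</p>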