<p>Here is the code I used to collect email addresses from a particular website; you can modify it to fit your needs. Relative URLs on that site caused some problems. The crawler handles them by prefixing <code>$home</code> to every link that is not already an absolute URL. I do not use cURL here: pages are fetched with <code>file_get_contents()</code>, which requires <code>allow_url_fopen</code> to be enabled.</p> <pre><code>&lt;?php
error_reporting(E_ALL);
set_time_limit(0);              // crawling a whole site can take a long time
ini_set('memory_limit', '512M');

// Fetches $full_url, writes every e-mail address found on it, then
// recursively follows the links that stay inside $home.
function scan_page($home, $full_url, RWriter $writer)
{
    // URLs already visited; static, so it is shared by all recursive calls.
    static $done = array();
    if (isset($done[$full_url])) {
        return false;
    }
    $done[$full_url] = true;

    // Scan only internal links. Do not scan the whole internet!
    if (strpos($full_url, $home) !== 0) {
        return false;
    }

    $html = @file_get_contents($full_url);
    if (empty($html) || stripos($html, '&lt;body') === false) {
        return false;
    }
    echo $full_url . '&lt;br /&gt;';

    // $emails[0] holds the full matches; the other entries are capture-group fragments.
    $email_regexp = '/([A-Za-z0-9_\-]+\.)*[A-Za-z0-9_\-]+@([A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9]\.)+[A-Za-z]{2,4}/';
    if (preg_match_all($email_regexp, $html, $emails)) {
        foreach ($emails[0] as $email) {
            if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
                $writer-&gt;write($email);
            }
        }
    }

    // Group 2 captures the href value (double-quoted or unquoted).
    $link_regexp = '/&lt;a\s[^&gt;]*href=("??)([^" &gt;]*?)\1[^&gt;]*&gt;(.*)&lt;\/a&gt;/siU';
    if (preg_match_all($link_regexp, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $url = $match[2];
            if ($url === '') {
                continue;
            }
            // Naive: only correct for links relative to $home; root-relative
            // ("/page") and "../page" links end up with the wrong path.
            if (!filter_var($url, FILTER_VALIDATE_URL)) {
                $url = $home . $url;
            }
            scan_page($home, $url, $writer);
        }
    }
    return true;
}

// Writes each distinct address once, both to the page and to a text file.
class RWriter
{
    private $_fh = null;
    private $_written = array();

    public function __construct($fname)
    {
        $this-&gt;_fh = fopen($fname, 'w');
        if ($this-&gt;_fh === false) {
            throw new RuntimeException("Cannot open $fname for writing");
        }
    }

    public function write($line)
    {
        if (isset($this-&gt;_written[$line])) {
            return;
        }
        $this-&gt;_written[$line] = true;
        echo $line . '&lt;br /&gt;';
        fwrite($this-&gt;_fh, "{$line}\r\n");
    }

    public function __destruct()
    {
        if ($this-&gt;_fh) {
            fclose($this-&gt;_fh);
        }
    }
}

$home = 'http://kharkov-reklama.com.ua/jborudovanie/';
$writer = new RWriter('C:\parser_13-09-2012_05.txt');
scan_page($home, $home, $writer);
</code></pre>
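<p>The naive <code>$home . $url</code> concatenation only works for links that are relative to <code>$home</code>. A more careful approach might look like the following minimal sketch. It defines a hypothetical helper, <code>resolve_url()</code>, which is not part of the original script. The helper resolves a link against the URL of the page it was found on and loosely follows the reference-resolution rules of RFC 3986, section 5.2. It assumes the base is an absolute http(s) URL, and it does not handle <code>&lt;base href&gt;</code> tags or user:password@ credentials.</p> <pre><code>&lt;?php
// Hypothetical helper, not part of the original crawler: resolve $href, found
// on the page $base, into an absolute URL (loosely after RFC 3986, 5.2).
// Assumes $base is an absolute http(s) URL; ignores &lt;base href&gt; tags.
function resolve_url($base, $href)
{
    // Absolute reference (http:, https:, mailto:, ...): return it unchanged.
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $href)) {
        return $href;
    }
    $b = parse_url($base);
    $origin = $b['scheme'] . '://' . $b['host']
        . (isset($b['port']) ? ':' . $b['port'] : '');

    // Drop the fragment: "page.html#top" is the same document as "page.html".
    $hash = strpos($href, '#');
    if ($hash !== false) {
        $href = substr($href, 0, $hash);
    }
    // Scheme-relative reference ("//cdn.example.com/x.js").
    if (substr($href, 0, 2) === '//') {
        return $b['scheme'] . ':' . $href;
    }
    // Keep the query string away from the path arithmetic below.
    $query = '';
    $q = strpos($href, '?');
    if ($q !== false) {
        $query = substr($href, $q);
        $href = substr($href, 0, $q);
    }

    $basePath = isset($b['path']) ? $b['path'] : '/';
    if ($href === '') {
        // Same document; inherit the base query if the href had none.
        $path = $basePath;
        if ($query === '' &amp;&amp; isset($b['query'])) {
            $query = '?' . $b['query'];
        }
    } elseif ($href[0] === '/') {
        $path = $href;                      // root-relative: "/contacts.html"
    } else {
        // Relative to the base page's directory: "a.html", "../b.html".
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $href;
    }

    // Remove "." and ".." segments, never climbing above the root.
    $segments = explode('/', $path);
    $out = array();
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) &gt; 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = '';                        // "/a/b/.." resolves to "/a/"
    }
    return $origin . implode('/', $out) . $query;
}
</code></pre> <p>To use it, the <code>filter_var()</code> check and <code>$url = $home . $url;</code> inside <code>scan_page()</code> could be replaced with <code>$url = resolve_url($full_url, $url);</code>. External and <code>mailto:</code> links are returned unchanged, so the internal-link check at the top of <code>scan_page()</code> still rejects them.</p>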