<p>This should work, but it's messy and it will possibly break if the site you are scraping changes its markup, since the scraping depends on that markup:</p>
<pre><code>$sites[0] = 'http://www.traileraddict.com/';
// use this if you want to retrieve more than one page:
// $sites[1] = 'http://www.traileraddict.com/trailers/2';

// connect once, before the loops; set $host, $user, $password and $database yourself
// (mysqli is used here because the old mysql_* functions were removed in PHP 7)
$db = new mysqli($host, $user, $password, $database);
if ($db->connect_error) {
    die('error: ' . $db->connect_error);
}
// a prepared statement keeps quotes in a title from breaking the query
$stmt = $db->prepare('INSERT INTO trailers (url, title) VALUES (?, ?)');

foreach ($sites as $site) {
    $ch = curl_init($site);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch);
    if ($html === false) {
        continue; // the request failed, skip this page
    }
    // ok, you have the whole page in the $html variable
    // now you need to find the common div that contains all the review info
    // and that appears to be &lt;div class="info"&gt; (I think you could use abstract as well)
    $title_start = '&lt;div class="info"&gt;';
    $parts = explode($title_start, $html);
    // the first element is everything before the first info div, so drop it
    array_shift($parts);
    // now you have an array of the info divs on the page
    foreach ($parts as $part) {
        // so now you just need to get your title and link from each part
        $link = explode('&lt;a href="/trailer/', $part);
        if (!isset($link[1])) {
            continue; // no trailer link in this part
        }
        // this means you now have part of the trailer url, you just need to cut off the end which you don't need:
        $link = explode('"&gt;', $link[1]);
        // this should give something of the form:
        // overnight-2012/trailer
        // so just make an absolute url out of it:
        $url = 'http://www.traileraddict.com/trailer/' . $link[0];
        // now for the title we need to follow a similar process:
        $title = explode('&lt;h2&gt;', $part);
        if (!isset($title[1])) {
            continue; // no title in this part
        }
        $title = explode('&lt;/h2&gt;', $title[1]);
        $title = strip_tags($title[0]);
        // insert the row into the database
        $stmt->bind_param('ss', $url, $title);
        $stmt->execute() or die($stmt->error);
    }
}
</code></pre>
<p>That should be it: inside the inner loop you have a variable for the link and one for the title, and each pair is inserted into your database.</p>
<p><strong>DISCLAIMER</strong></p>
<p>I have written this off the top of my head at work, so I apologise if it doesn't work straight off the bat. Let me know if it doesn't and I will try to help further.</p>
<p>ALSO, I am aware this could be done smarter and in fewer steps, but that would involve more thinking on my part, and the OP can do this if they wish once they have understood the code I have written. I would assume it is a lot more important that they understand what I have done and are able to edit it themselves.</p>
<p>Also, I would advise scraping the site at night so as not to burden it with extra traffic, and I would suggest asking for the permission of that site as well, since if they catch you they will be able to put an end to your scraping :(</p>
<p>To answer your final point: to run this at a set time period you would use a cron job.</p>
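<p>For anyone who wants to try the "smarter" route mentioned above, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath instead of explode(). The function name <code>extract_trailers</code>, its <code>$base</code> parameter and the shape of the returned array are assumptions made for illustration; only the <code>info</code> class and the <code>/trailer/</code> link prefix come from the code above, and the site's real markup may differ.</p>
<pre><code>// Hypothetical helper: returns an array of ['url' => ..., 'title' => ...] entries.
// Assumes each trailer sits in a div with class "info" that contains
// a link starting with /trailer/ and an h2 holding the title.
function extract_trailers($html, $base = 'http://www.traileraddict.com')
{
    $doc = new DOMDocument();
    // real-world HTML is rarely valid, so collect parser warnings instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $trailers = array();

    // every div whose class attribute contains the word "info"
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " info ")]';
    foreach ($xpath->query($query) as $div) {
        // first trailer link and first h2 inside this div
        $a  = $xpath->query('.//a[starts-with(@href, "/trailer/")]', $div)->item(0);
        $h2 = $xpath->query('.//h2', $div)->item(0);
        if ($a === null || $h2 === null) {
            continue; // not a trailer block, skip it
        }
        $trailers[] = array(
            'url'   => $base . $a->getAttribute('href'),
            'title' => trim($h2->textContent),
        );
    }
    return $trailers;
}
</code></pre>
<p>You would call it as <code>extract_trailers($html)</code> on the string returned by <code>curl_exec()</code> (after checking that the request did not fail) and loop over the result to run your database insert. It still breaks if the site renames the class or moves the title out of the h2, but it does not depend on the exact spacing or attribute order of the tags.</p>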