PHP Screen Scraping and Sessions
<p>OK, I'm still new to the screen scraping thing.</p>
<p>I've managed to log into the site I need, but now how do I move on to another page? After I log in I do another GET request on the page that I need, but that page has a redirect on it that takes me back to the login page.</p>
<p>So I'm thinking the SESSION variables are not being passed. How can I overcome this?</p>
<p>Problem:</p>
<p>Even if I request the second page URL directly, it still redirects me back to the login page unless I'm already logged in. It looks like the screen scrape code is not allowing the SESSION data to be passed.</p>
<p>I found this code in <a href="https://stackoverflow.com/questions/26947/how-to-implement-a-web-scraper-in-php">another screen scraper question here @stack</a></p>
<pre><code>class Curl {

    public $cookieJar = "";

    public function __construct($cookieJarFile = 'cookies.txt') {
        $this-&gt;cookieJar = $cookieJarFile;
    }

    function setup() {
        $header = array();
        $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
        $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
        $header[] = "Cache-Control: max-age=0";
        $header[] = "Connection: keep-alive";
        $header[] = "Keep-Alive: 300";
        $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $header[] = "Accept-Language: en-us,en;q=0.5";
        $header[] = "Pragma: "; // browsers keep this blank.

        curl_setopt($this-&gt;curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7');
        curl_setopt($this-&gt;curl, CURLOPT_HTTPHEADER, $header);
        curl_setopt($this-&gt;curl, CURLOPT_COOKIEJAR, $cookieJar);
        curl_setopt($this-&gt;curl, CURLOPT_COOKIEFILE, $cookieJar);
        curl_setopt($this-&gt;curl, CURLOPT_AUTOREFERER, true);
        curl_setopt($this-&gt;curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($this-&gt;curl, CURLOPT_RETURNTRANSFER, true);
    }

    function get($url) {
        $this-&gt;curl = curl_init($url);
        $this-&gt;setup();

        return $this-&gt;request();
    }

    function getAll($reg, $str) {
        preg_match_all($reg, $str, $matches);
        return $matches[1];
    }

    function postForm($url, $fields, $referer = '') {
        $this-&gt;curl = curl_init($url);
        $this-&gt;setup();
        curl_setopt($this-&gt;curl, CURLOPT_URL, $url);
        curl_setopt($this-&gt;curl, CURLOPT_POST, 1);
        curl_setopt($this-&gt;curl, CURLOPT_REFERER, $referer);
        curl_setopt($this-&gt;curl, CURLOPT_POSTFIELDS, $fields);
        return $this-&gt;request();
    }

    function getInfo($info) {
        $info = ($info == 'lasturl') ? curl_getinfo($this-&gt;curl, CURLINFO_EFFECTIVE_URL) : curl_getinfo($this-&gt;curl, $info);
        return $info;
    }

    function request() {
        return curl_exec($this-&gt;curl);
    }
}
</code></pre>
<p>Calling the class</p>
<pre><code>include('/var/www/html/curl.php');
$curl = new Curl();

$url = "here.com";
$newURL = "here.com/newpage.php";

$fields = "usr=user1&amp;pass=PassWord";

// Calling URL
$referer = "http://here.com/index.php";
$html = $curl-&gt;postForm($url, $fields, $referer);
$html = $curl-&gt;get($newURL);

echo $html; // takes me back to $url instead of $newURL
</code></pre>
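<p>One thing that stands out in the class as posted: <code>setup()</code> passes a bare <code>$cookieJar</code> to <code>CURLOPT_COOKIEJAR</code> and <code>CURLOPT_COOKIEFILE</code>. Inside the method that is an undefined local variable, not the <code>$this-&gt;cookieJar</code> property set in the constructor, so no cookie file is configured and each new handle starts without the login cookie. The following is a minimal sketch of one possible fix, not a tested solution for the site in question. The class name <code>SessionCurl</code> is hypothetical, and reusing a single handle for every request is a design choice of this sketch rather than something the original code does.</p>

```php
<?php
// Sketch only: SessionCurl is a hypothetical name, not the class from the question.
class SessionCurl
{
    private $curl;
    private $cookieJar;

    public function __construct($cookieJarFile = 'cookies.txt')
    {
        $this->cookieJar = $cookieJarFile;

        // One handle for the whole session, so cookies received at login
        // stay in the handle's in-memory cookie store for later requests.
        $this->curl = curl_init();

        // Use the property, not a bare $cookieJar (undefined in this scope).
        curl_setopt($this->curl, CURLOPT_COOKIEJAR, $this->cookieJar);  // written when the handle is closed
        curl_setopt($this->curl, CURLOPT_COOKIEFILE, $this->cookieJar); // enables cookie handling, loads saved cookies
        curl_setopt($this->curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($this->curl, CURLOPT_RETURNTRANSFER, true);
    }

    public function get($url)
    {
        curl_setopt($this->curl, CURLOPT_URL, $url);
        curl_setopt($this->curl, CURLOPT_HTTPGET, true); // switch back to GET after a POST
        return curl_exec($this->curl);
    }

    public function postForm($url, $fields, $referer = '')
    {
        curl_setopt($this->curl, CURLOPT_URL, $url);
        curl_setopt($this->curl, CURLOPT_POST, true);
        curl_setopt($this->curl, CURLOPT_REFERER, $referer);
        curl_setopt($this->curl, CURLOPT_POSTFIELDS, $fields);
        return curl_exec($this->curl);
    }
}
```

<p>With this sketch, calling <code>postForm($url, $fields, $referer)</code> and then <code>get($newURL)</code> on the same object sends the second request with whatever cookies the login response set. Whether that is enough depends on the site: if the login also relies on hidden form fields or tokens, those would still need to be posted.</p>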