How do I avoid IP-based redirections in target page with cURL?
<p>I am trying to obtain data from a web page and show it to the user <strong>using cURL</strong> and the Simple HTML DOM PHP class.</p>
<p><strong>Some pages have a redirection</strong> depending on the client's language, so I am using a function to determine the final page that is to be scraped.</p>
<p>In order to show it as the user would see it, I am using this:</p>
<pre><code>$useragent = $_SERVER['HTTP_USER_AGENT'];
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
</code></pre>
<p>At the moment most of my users are Spanish speakers, so I am temporarily limiting the accepted languages: if there is a <strong>language redirect on the target page</strong>, it will show Spanish or English first.</p>
<pre><code>$header[] = "Accept-Language: es-es,es;q=0.8,en-us;q=0.5,en;q=0.3";
</code></pre>
<p>However, since my server is located in the Netherlands and some pages have an <strong>IP-based redirector</strong>, sometimes the pages redirect to the /nl/ directory, ignoring the language parameters.</p>
<p>This happens, for example, with the <strong>www.econsultancy.com</strong> website.</p>
<p>Is it possible to avoid this kind of redirect, maybe by using the <strong>client's IP address</strong> in the cURL request?</p>
<p>Also, is it possible to use the <strong>client's browser language settings</strong> to make the <em>Accept-Language</em> parameter dynamic?</p>
<p>Here's the entire function script:</p>
<pre><code>&lt;?php
function redirector($originalurl) {
    $ch = curl_init();
    $useragent = $_SERVER['HTTP_USER_AGENT'];

    // Request headers that imitate a regular browser.
    $header = array();
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: es-es,es;q=0.8,en-us;q=0.5,en;q=0.3";
    $header[] = "Pragma: ";

    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, $originalurl);
    $out = curl_exec($ch);
    curl_close($ch);

    // Keep only the response headers (everything before the first blank line).
    $out = str_replace("\r", "", $out);
    $headers_end = strpos($out, "\n\n");
    if ($headers_end !== false) {
        $out = substr($out, 0, $headers_end);
    }

    // Stays null when the response carries no Location header.
    $targeturl = null;
    foreach (explode("\n", $out) as $line) {
        if (substr($line, 0, 10) == "Location: ") {
            $targeturl = substr($line, 10);
        }
    }
    return $targeturl;
}
?&gt;
</code></pre>
<p>Thanks in advance!</p>
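For the second question (a dynamic Accept-Language value), a minimal sketch of one possible approach follows. It assumes the visitor's browser sends an Accept-Language header, which PHP exposes as `$_SERVER['HTTP_ACCEPT_LANGUAGE']`. The helper name `acceptLanguageHeader` and the choice to fall back to the static Spanish/English list from the question are illustrative assumptions, not part of the original script. It does not address the IP-based redirect.

```php
<?php
// Hypothetical helper (name is an assumption): builds the Accept-Language
// request header from the visitor's own browser header, falling back to
// the static list used in the question when none was sent.
function acceptLanguageHeader(array $server) {
    $fallback = "es-es,es;q=0.8,en-us;q=0.5,en;q=0.3";

    $value = isset($server['HTTP_ACCEPT_LANGUAGE'])
        ? trim($server['HTTP_ACCEPT_LANGUAGE'])
        : '';

    // Strip CR/LF so a crafted value cannot inject extra request headers.
    $value = str_replace(array("\r", "\n"), '', $value);

    if ($value === '') {
        $value = $fallback;
    }
    return "Accept-Language: " . $value;
}
```

Inside `redirector()`, the hard-coded line could then become `$header[] = acceptLanguageHeader($_SERVER);`. Passing the array in as a parameter, rather than reading `$_SERVER` directly, keeps the helper easy to try out from the command line.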