<p>The HTTP request sent by <code>get_meta_tags</code> does not contain the <code>Accept-Language</code> header that web browsers normally send to tell the server which language they prefer.</p>
<p>Without that header, some sites (e.g. Twitter) appear to fall back on a geographical IP lookup to choose the content language:</p>
<p><strong>From my local computer in Sweden</strong></p>
<p><em>Koppla direkt upp dig mot det som är viktigast för dig. Följ dina vänner, experter, favoritkändisar, och nyheter.</em></p>
<p><strong>From my VPS in London, UK</strong></p>
<p><em>Instantly connect to what&#39;s most important to you. Follow your friends, experts, favourite celebrities, and breaking news.</em></p>
<p>So if you only want English meta-data, your script needs to act like an English-localised web browser by sending <code>Accept-Language</code>, and possibly by other means as well.</p>
<p><strong>EDIT</strong>: Here is an example of <a href="https://stackoverflow.com/a/9917109/794003">how to extract the meta tags by first fetching the HTML using cURL</a>. Details on <a href="http://www.php.net/manual/en/function.curl-setopt.php#78046" rel="nofollow noreferrer">setting the cURL headers to include <code>Accept-Language</code></a>.</p>
<p><strong>Code example</strong>:</p>
<pre><code>&lt;?php
// Fetch a URL with cURL, sending browser-like headers including Accept-Language.
function file_get_contents_curl($url)
{
    $ch = curl_init();

    $header = array();
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,"
              . "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";

    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}

$html = file_get_contents_curl("http://twitter.com");
if ($html === false || $html === '') {
    exit('The request failed or returned an empty body.');
}

// Parsing begins here; @ suppresses warnings about malformed markup.
$doc = new DOMDocument();
@$doc-&gt;loadHTML($html);

// Defaults, so the echo statements below never read undefined variables.
$title = $description = $keywords = $language = '';

// Get and display what you need:
$nodes = $doc-&gt;getElementsByTagName('title');
if ($nodes-&gt;length &gt; 0) {
    $title = $nodes-&gt;item(0)-&gt;nodeValue;
}

$metas = $doc-&gt;getElementsByTagName('meta');
for ($i = 0; $i &lt; $metas-&gt;length; $i++) {
    $meta = $metas-&gt;item($i);
    if ($meta-&gt;getAttribute('name') == 'description')
        $description = $meta-&gt;getAttribute('content');
    if ($meta-&gt;getAttribute('name') == 'keywords')
        $keywords = $meta-&gt;getAttribute('content');
    // The value of a meta tag lives in "content", selected by its "name".
    if ($meta-&gt;getAttribute('name') == 'language')
        $language = $meta-&gt;getAttribute('content');
}

echo "Title: $title". '&lt;br/&gt;&lt;br/&gt;';
echo "Description: $description". '&lt;br/&gt;&lt;br/&gt;';
echo "Keywords: $keywords";
?&gt;
</code></pre>
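<p>If cURL is not available, the same idea can probably be expressed with PHP's HTTP stream context. The following is only a minimal sketch: the helper names <code>fetch_with_language</code> and <code>extract_meta_tags</code> are made up for illustration, the sample HTML is invented, and whether a given site honours <code>Accept-Language</code> over its IP lookup is an assumption you would need to verify.</p>

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: fetch a URL while sending an Accept-Language header.
 * Returns the response body, or false if the request fails.
 */
function fetch_with_language($url, $acceptLanguage = 'en-US,en;q=0.5')
{
    $context = stream_context_create([
        'http' => [
            'method'          => 'GET',
            'header'          => "Accept-Language: $acceptLanguage\r\n",
            'follow_location' => 1,
        ],
    ]);

    return @file_get_contents($url, false, $context);
}

/**
 * Hypothetical helper: collect <meta name="..." content="..."> pairs from
 * an HTML string, keyed by lower-cased name.
 */
function extract_meta_tags($html)
{
    $tags = [];
    if ($html === '') {
        return $tags; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence markup warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name !== '') {
            $tags[$name] = $meta->getAttribute('content');
        }
    }

    return $tags;
}

// Offline demonstration with an invented document; no request is made here.
$sample = '<html><head><title>Demo</title>'
    . '<meta name="Description" content="Instantly connect.">'
    . '<meta name="keywords" content="a, b"></head><body></body></html>';

$tags = extract_meta_tags($sample);
echo $tags['description'], "\n"; // prints: Instantly connect.
```

<p>For a real page you would pass the result of <code>fetch_with_language($url)</code> to <code>extract_meta_tags()</code> after checking that it is not <code>false</code>. Keeping the fetch and the parsing separate makes the parsing step easy to test without network access.</p>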