<p>You could look for large portions of the document that have less markup and less vertical whitespace. Download the page's source and strip out any markup using <code>strip_tags()</code>. Then you can search for a run of consecutive sentences (two to five in the example below) using a regular expression.</p>
<p>Here's an example script. It uses a class that is not included (an abstraction of the curl_multi functions), but that class isn't really relevant to your question.</p>
<pre class="lang-php prettyprint-override"><code>&lt;?php
require_once("./../MultipleRequester.php");

// Download the page (MultipleRequester wraps the curl_multi functions).
$requester = new MultipleRequester();
$requester-&gt;addGetRequest(
    'test',
    'http://www.businessweek.com/news/2011-08-24/gold-tumbles-most-since-march-2008-as-demand-for-haven-wanes.html'
);
$requester-&gt;execute();
$content = $requester-&gt;getContent('test');

// Remove all markup so only the text remains.
$plainText = strip_tags($content);

// Match two to five consecutive sentences: optional leading whitespace,
// a capital letter, 10 to 1000 allowed characters, then a full stop.
// Note that [A-z] also matches the few punctuation characters between Z and a.
$search = preg_match('/(\h{0,2}\v{0,2}\h{0,2}[A-Z]{1}[A-z0-9 ,\'")(.$]{10,1000}\.){2,5}/', $plainText, $matches);

if ($search)
    print trim($matches[0]);
else
    print "Could not extract anything.";

print "\n\n";
?&gt;
</code></pre>
<p>This prints:</p>
<blockquote>
<p>The dollar rose against a basket of six major currencies amid speculation about whether Federal Reserve Chairman Ben S. Bernanke will say this week that the central bank is willing to provide more stimulus to the economy. Central bankers meet this week in Jackson Hole, Wyoming, to address the U.S. recovery.</p>
</blockquote>
<p>You may still have trouble with sites that mark up their content heavily. In that case, make the regular expression more lenient, particularly towards whitespace.</p>
<p>The regexp is a little messy, but you can tune it or write your own.</p>
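<p>One way to be more lenient towards whitespace is to normalize it before matching, rather than widening the pattern itself. The following is only a minimal sketch under that assumption: the sample text is hypothetical and stands in for the output of <code>strip_tags()</code>, and the simplified pattern is an illustration, not the expression used above.</p>
<pre class="lang-php prettyprint-override"><code>&lt;?php
// Hypothetical sample text standing in for the output of strip_tags().
$plainText = "This is the first sample sentence.\n\n\n   \tThis is the second sample sentence.";

// Collapse every run of whitespace (spaces, tabs, newlines) into one space,
// so the sentence pattern only has to allow a single optional space.
$normalized = trim(preg_replace('/\s+/', ' ', $plainText));

// Two to five consecutive sentences: an optional space, a capital letter,
// 10 to 1000 allowed characters, then a full stop.
$pattern = '/(?: ?[A-Z][A-Za-z0-9 ,\'")(.$]{10,1000}\.){2,5}/';

if (preg_match($pattern, $normalized, $matches))
    print trim($matches[0]);
else
    print "Could not extract anything.";

print "\n";
?&gt;
</code></pre>
<p>Normalizing first keeps the pattern short, at the cost of losing the paragraph breaks that the original expression could use as a hint.</p>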
 
