Page scraping and regex give no results when they should
    <p>I have a small WordPress-based script that scrapes a web page and counts the occurrences of four keywords using <code>preg_match_all()</code>.</p>
<p>This is the code for a URL that I know contains the keywords:</p>
<pre><code>&lt;?php
$url = 'http://www.leggioggi.it/2013/08/16/i-tre-amici-discutono-di-servizio-sanitario-casuale-e-differenze-nord-sud/';
$response = wp_remote_get($url);
$the_body = wp_remote_retrieve_body($response);
//echo htmlentities($the_body);
$matches = array();
$matches_count = preg_match_all("/gravidanz|preconcezional|prenatal|concepimento/i", $the_body, $matches);
var_dump($matches_count);
var_dump($matches);
?&gt;
</code></pre>
<p>I'm seeing some odd behavior. On some pages I get zero matches, even though I know that those pages contain the keywords. I noticed that for those pages, uncommenting the line <code>echo htmlentities($the_body);</code> solves the problem. If I comment it out again, the oddity is back.</p>
<p>My guess is that some caching mechanism is involved.</p>
<p>PS: the code is not written in a template file but in a Pods framework page.</p>
<p>UPDATE: I put a <code>var_dump($the_body);</code> after the <code>htmlentities</code> line. The behavior is interesting. If <code>echo htmlentities($the_body);</code> is commented out, <code>var_dump($the_body);</code> returns an empty string; if the same line is active, <code>var_dump($the_body);</code> returns the whole page HTML. So I really don't get what's going on!</p>
<p>SOLVED: I checked the <code>$response</code> variable (my bad for not thinking of it sooner) and discovered that whenever the remote request failed, the error was reported in the value returned by <code>wp_remote_get()</code>: a <code>WP_Error</code> object instead of a normal response. <code>wp_remote_retrieve_body()</code> returns an empty string for a <code>WP_Error</code>, which explains the zero matches. This is what I get back:</p>
<pre><code>object(WP_Error)#30 (2) {
  ["errors"]=&gt; array(1) {
    ["http_request_failed"]=&gt; array(1) {
      [0]=&gt; string(69) "Operation timed out after 5000 milliseconds with 25692 bytes received"
    }
  }
  ["error_data"]=&gt; array(0) { }
}
</code></pre>
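The fix described in the SOLVED note can be sketched roughly as follows: check the return value of wp_remote_get() with is_wp_error() before reading the body, and optionally raise the request timeout (the WordPress default is 5 seconds, which matches the 5000 ms in the error above). This is only a minimal sketch, not the original poster's final code. The function names count_keyword_matches() and fetch_and_count_keywords(), the 15-second timeout, and the 'unexpected_status' error code are assumptions made for illustration. The second function calls WordPress APIs and therefore only works inside a loaded WordPress environment.

```php
<?php
// Hypothetical helper: counts keyword hits in an HTML string,
// using the same pattern as the question.
function count_keyword_matches($html) {
    $matches = array();
    $count = preg_match_all('/gravidanz|preconcezional|prenatal|concepimento/i', $html, $matches);
    // preg_match_all() returns false on a regex error; treat that as zero hits.
    return $count ? $count : 0;
}

// Hypothetical wrapper: returns an int count on success, or a WP_Error on failure.
// Must run inside WordPress (wp_remote_get() and friends are WordPress functions).
function fetch_and_count_keywords($url, $timeout = 15) {
    // Assumption: a longer timeout than the 5-second default may help with slow hosts.
    $response = wp_remote_get($url, array('timeout' => $timeout));

    // A failed request (e.g. a timeout) comes back as a WP_Error, not as a response array.
    if (is_wp_error($response)) {
        return $response;
    }

    // Treat anything other than HTTP 200 as a failure as well.
    $code = (int) wp_remote_retrieve_response_code($response);
    if ($code !== 200) {
        return new WP_Error('unexpected_status', 'Unexpected HTTP status: ' . $code);
    }

    return count_keyword_matches(wp_remote_retrieve_body($response));
}
```

A caller would test the result with is_wp_error() and, on failure, print $result->get_error_message() instead of silently reporting zero matches. Whether a longer timeout is enough for a given site is something to verify case by case.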