    <p>This is not elegant and probably not very robust, but it should work for this case.</p>
    <p>The first four lines after the <code>require</code> calls retrieve the URL and extract the text. <code>grepl</code> returns <code>TRUE</code> or <code>FALSE</code> for each element depending on whether the string we are looking for was found, and <code>which</code> converts that to an index into the list. We add 1 to that index because, if you look at <code>cleantext</code>, the date itself is the element immediately after the string "Date Updated". So the <code>+1</code> gets us the element after "Date Updated". The <code>gsub</code> lines just trim leading and trailing whitespace from the strings.</p>
    <p>The problem with "P27M" is that it is not anchored to anything: it is just free text floating in an arbitrary position. If you are sure that the price will always be a "P" followed by 1 to 3 digits, followed by an "M", and that only one such string appears on the page, then a grep or regex would work; otherwise it is tough to get.</p>
    <pre><code>require(XML)
require(RCurl)

# Download the page and parse it into an internal HTML document
myurl &lt;- 'http://www.sulit.com.ph/index.php/view+classifieds/id/3991016/BEAUTIFUL+AYALA+HEIGHTS+QC+HOUSE+FOR+SALE'
mytext &lt;- getURL(myurl)
myhtml &lt;- htmlTreeParse(mytext, useInternalNodes = TRUE)

# Collect every visible text node, skipping script, style and noscript content
cleantext &lt;- xpathApply(myhtml, "//body//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)]", xmlValue)
cleantext &lt;- cleantext[!cleantext %in% " "]
cleantext &lt;- gsub(" "," ", cleantext)

# The value sits in the element right after its label, hence the +1
date_updated &lt;- cleantext[[which(grepl("Date Updated",cleantext))+1]]
date_posted &lt;- cleantext[[which(grepl("Date Posted",cleantext))+1]]

# Trim leading and trailing whitespace
date_posted &lt;- gsub("^[[:space:]]+|[[:space:]]+$","",date_posted)
date_updated &lt;- gsub("^[[:space:]]+|[[:space:]]+$","",date_updated)

print(date_updated)
print(date_posted)
</code></pre>
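The regex idea for the price could be sketched roughly as follows. This is only a sketch under the assumptions stated in the answer: the price is a "P", then 1 to 3 digits, then an "M", and exactly one such string appears on the page. The helper name `extract_price`, the variable `price_pattern`, and the sample strings are hypothetical and are not part of the original code; in practice you would pass in the `cleantext` vector built above.

```r
# Pattern for the assumed price format: "P", 1 to 3 digits, "M".
# Word boundaries keep it from matching inside longer tokens.
price_pattern <- "\\bP[0-9]{1,3}M\\b"

# Hypothetical helper: returns the price string if exactly one distinct
# match exists across all text nodes, otherwise NA.
extract_price <- function(text_nodes, pattern = price_pattern) {
  text_nodes <- unlist(text_nodes)
  hits <- unlist(regmatches(text_nodes, gregexpr(pattern, text_nodes, perl = TRUE)))
  hits <- unique(hits)
  if (length(hits) != 1) {
    return(NA_character_)  # no match, or ambiguous: refuse to guess
  }
  hits
}

# Made-up sample text standing in for `cleantext`
sample_text <- c("Date Updated", "Beautiful house for sale, P27M, negotiable")
price <- extract_price(sample_text)
print(price)
```

Returning `NA` when there are zero or several distinct matches is a deliberate choice here: because the string is not anchored to any label, a second match on the page would make the result ambiguous.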