  1. How can I use R (RCurl/XML packages?!) to scrape this webpage?
<p>I have a (somewhat complex) web scraping challenge that I wish to accomplish, and I would love some direction (to whatever level you feel like sharing). Here goes:</p>
<p>I would like to go through all the "species pages" present in this link:</p>
<p><a href="http://gtrnadb.ucsc.edu/" rel="nofollow noreferrer">http://gtrnadb.ucsc.edu/</a></p>
<p>So for each of them I will go to:</p>
<ol>
<li>The species page link (for example: <a href="http://gtrnadb.ucsc.edu/Aero_pern/" rel="nofollow noreferrer">http://gtrnadb.ucsc.edu/Aero_pern/</a>)</li>
<li>And then to the "Secondary Structures" page link (for example: <a href="http://gtrnadb.ucsc.edu/Aero_pern/Aero_pern-structs.html" rel="nofollow noreferrer">http://gtrnadb.ucsc.edu/Aero_pern/Aero_pern-structs.html</a>)</li>
</ol>
<p>Inside that link I wish to scrape the data in the page so that I will have a long list containing this data (for example):</p>
<pre><code>chr.trna3 (1-77) Length: 77 bp
Type: Ala Anticodon: CGC at 35-37 (35-37) Score: 93.45
Seq: GGGCCGGTAGCTCAGCCtGGAAGAGCGCCGCCCTCGCACGGCGGAGGcCCCGGGTTCAAATCCCGGCCGGTCCACCA
Str: &gt;&gt;&gt;&gt;&gt;&gt;&gt;..&gt;&gt;&gt;&gt;.........&lt;&lt;&lt;&lt;.&gt;&gt;&gt;&gt;&gt;.......&lt;&lt;&lt;&lt;&lt;.....&gt;&gt;&gt;&gt;&gt;.......&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;....
</code></pre>
<p>Where each line will have its own list (inside the list for each "trna", inside the list for each animal).</p>
<p>I remember coming across the packages RCurl and XML (in R) that can allow for such a task, but I don't know how to use them. So what I would love to have is:</p>
<ol>
<li>Some suggestions on how to build such code.</li>
<li>A recommendation on how to learn what is needed to perform such a task.</li>
</ol>
<p>Thanks for any help,</p>
<p>Tal</p>
 
