Note that there are some explanatory texts on larger screens.

plurals
  1. POFeedparser - retrieve old messages from Google Reader
    primarykey
    data
    text
    <p>I'm using the feedparser library in python to retrieve news from a local newspaper (my intent is to do Natural Language Processing over this corpus) and would like to be able to retrieve many past entries from the RSS feed.</p> <p>I'm not very acquainted with the technical issues of RSS, but I think this should be possible (I can see that, e.g., Google Reader and Feedly can do this ''on demand'' as I move the scrollbar). </p> <p>When I do the following:</p> <pre><code>import feedparser url = 'http://feeds.folha.uol.com.br/folha/emcimadahora/rss091.xml' feed = feedparser.parse(url) for post in feed.entries: title = post.title </code></pre> <p>I get only a dozen entries or so. I was thinking about hundreds. Maybe all entries in the last month, if possible. Is it possible to do this only with feedparser?</p> <p>I intend to get from the rss feed only the link to the news item and parse the full page with BeautifulSoup to obtain the text I want. An alternate solution would be a crawler that follows all local links in the page to get a lot of news items, but I want to avoid that for now. </p> <p>--</p> <p>One solution that appeared is to use the Google Reader RSS cache:</p> <p><a href="http://www.google.com/reader/atom/feed/http://feeds.folha.uol.com.br/folha/emcimadahora/rss091.xml?n=1000" rel="noreferrer">http://www.google.com/reader/atom/feed/http://feeds.folha.uol.com.br/folha/emcimadahora/rss091.xml?n=1000</a></p> <p>But to access this I must be logged in to Google Reader. Anyone knows how I do that from python? (I really don't know a thing about web, I usually only mess with numerical calculus). </p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload