
Circumvent errors in loop function (used to extract data from Twitter)
I created a loop function that extracts tweets using the Search API at a certain interval (let's say every 5 min.). The function does what it is supposed to do: connect to Twitter, extract tweets that contain a certain keyword, and save them in a CSV file. However, occasionally (2-3 times a day) the loop stops because of one of these two errors:

- Error in htmlTreeParse(URL, useInternal = TRUE) : error in creating parser for http://search.twitter.com/search.atom?q= 6.95322e-310tst&rpp=100&page=10

- Error in UseMethod("xmlNamespaceDefinitions") : no applicable method for 'xmlNamespaceDefinitions' applied to an object of class "NULL"

I hope you can help me deal with these errors by answering some of my questions:

- What causes these errors to occur?
- How can I adjust my code to avoid these errors?
- How can I 'force' the loop to keep running if it experiences an error (e.g. by using the try function)?

My function (based on several scripts found online) is as follows:

```r
library(XML)  # htmlTreeParse

twitter.search <- "Keyword"
QUERY <- URLencode(twitter.search)

# Set time loop (in seconds)
d_time <- 300
number_of_times <- 3000

for (i in 1:number_of_times) {

  tweets <- NULL
  tweet.count <- 0
  page <- 1
  read.more <- TRUE

  while (read.more) {

    # construct Twitter search URL
    URL <- paste('http://search.twitter.com/search.atom?q=', QUERY,
                 '&rpp=100&page=', page, sep = '')

    # fetch remote URL and parse
    XML <- htmlTreeParse(URL, useInternal = TRUE, error = function(...) {})

    # Extract list of "entry" nodes
    entry <- getNodeSet(XML, "//entry")

    read.more <- (length(entry) > 0)
    if (read.more) {
      for (j in 1:length(entry)) {
        subdoc <- xmlDoc(entry[[j]])  # put entry in a separate object to manipulate

        published <- unlist(xpathApply(subdoc, "//published", xmlValue))
        published <- gsub("Z", " ", gsub("T", " ", published))

        # Convert from GMT to local time
        time.gmt <- as.POSIXct(published, "GMT")
        local.time <- format(time.gmt, tz = "Europe/Amsterdam")

        title <- unlist(xpathApply(subdoc, "//title", xmlValue))
        author <- unlist(xpathApply(subdoc, "//author/name", xmlValue))

        tweet <- paste(local.time, " @", author, ": ", title, sep = "")

        entry.frame <- data.frame(tweet, author, local.time, stringsAsFactors = FALSE)
        tweet.count <- tweet.count + 1
        rownames(entry.frame) <- tweet.count
        tweets <- rbind(tweets, entry.frame)
      }
      page <- page + 1
      read.more <- (page <= 15)  # Seems to be a 15 page limit
    }
  }

  names(tweets)

  # top 15 tweeters
  # sort(table(tweets$author), decreasing = TRUE)[1:15]

  write.table(tweets,
              file = paste("Twitts - ", format(Sys.time(), "%a %b %d %H_%M_%S %Y"), ".csv"),
              sep = ";")

  Sys.sleep(d_time)
}  # end of outer for loop
```
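One way to approach the third question (keeping the loop alive) is to wrap the fetch-and-parse step in `tryCatch()`, so that a failed request is converted into a value the loop can test instead of an error that aborts it. The sketch below is a minimal illustration under stated assumptions, not the asker's code: the helper name `fetch.entries()` and the choice to simply return `NULL` on failure are assumptions.

```r
# Minimal sketch (assumed helper, not from the original script): fetch one
# results page and return its <entry> nodes, or NULL if anything goes wrong.
library(XML)

fetch.entries <- function(URL) {
  tryCatch({
    doc <- htmlTreeParse(URL, useInternal = TRUE)  # may error on a bad response
    getNodeSet(doc, "//entry")                     # entry nodes on success
  }, error = function(e) {
    message("Skipping ", URL, ": ", conditionMessage(e))
    NULL                                           # signal failure to the caller
  })
}

# Inside the while loop, a NULL or empty result then just ends this round of
# paging rather than stopping the outer for loop:
#   entry <- fetch.entries(URL)
#   read.more <- !is.null(entry) && length(entry) > 0
```

`try(..., silent = TRUE)` would work similarly; the key point is that the error is caught and turned into a testable result so the outer loop can move on to the next interval.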
 
