StackOverflow2013


plurals

PO
primarykey
Id
19058353
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2013-09-27T19:10:54.833
FavoriteCount
0
LastActivityDate
2013-09-27T20:17:07.777
LastEditDate
2013-09-27T20:17:07.777
LastEditorUserId
764519
OwnerUserId
764519
ParentId
19058352
PostTypeId
2
Score
3
ViewCount
0
LastEditorDisplayName
text
Body
We can rewrite this much more compactly and drop the function. We'll do it in two steps: first we create a new column that holds a list (data.table columns can hold almost anything, even embedded data.tables), and then we extract its contents into a new data.table.

<pre><code>url_pattern <- "http[^([:blank:]|\\\"|<|&|#\n\r)]+"
db[(has_url), urls := str_match_all(text, url_pattern)]
urls <- db[(has_url), list(url=unlist(urls)), by=id]
</code></pre>

Note that we use (has_url) instead of has_url == T. This indexes directly with the logical column, which is much faster (although in this case most of the time is taken up by str_match_all, so it won't make much difference). Make sure you keep the parentheses, though; otherwise it won't work.

The second line creates db$urls, a list column of URLs. The third line generates a new data.table with one row per URL, where the id field links each URL back to the forum post it came from. db has 146k rows, db[(has_url),] has 11k rows, and urls has 30k rows (some posts contain several URLs).

Sample output from head(urls):

<pre><code>id url
14 http://reganmian.net/blog
44 http://vg.no
59 http://koran.co.id
</code></pre>

Update: a simple reproducible example

Let's first generate some data:

<pre><code>library(data.table)
library(stringr)  # provides str_match_all

texts = c("Stian fruit:apple, fruit:banana and fruit:pear", "Peter fruit:apple", "fruit:banana is delicious", "I don't agree")
DT <- data.table(text = texts, id=1:length(texts))
DT
                                             text id
1: Stian fruit:apple, fruit:banana and fruit:pear  1
2:                              Peter fruit:apple  2
3:                      fruit:banana is delicious  3
4:                                  I don't agree  4
</code></pre>

We want to grab all the "fruits" from the text column (each row might have one, several, or no fruits). We first use str_match_all to put a list of the individual fruits into a new column.

<pre><code>pattern <- "fruit:\\S*"
DT[, fruit_list := str_match_all(text, pattern)]
</code></pre>

Now the fruit_list column looks like this for the first row:

<pre><code>> DT[1]$fruit_list
[[1]]
     [,1]
[1,] "fruit:apple,"
[2,] "fruit:banana"
[3,] "fruit:pear"
</code></pre>

Now we want to extract the fruits into a new table with one row per fruit, keeping the link back to the id:

<pre><code>fruits <- DT[, list(fruit=unlist(fruit_list)), by=id]
</code></pre>

And the result:

<pre><code>> fruits
   id        fruit
1:  1 fruit:apple,
2:  1 fruit:banana
3:  1 fruit:pear
4:  2 fruit:apple
5:  3 fruit:banana
</code></pre>

(Thank you to Matthew Dowle and Ricardo Saporta on the data.table-help mailing list.)
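For readers who want to try the logical-column subsetting together with the two-step list-column approach, here is a minimal end-to-end sketch. It assumes the data.table and stringr packages are installed; the names posts, body, has_link, link_list and links, the toy URLs, and the simplified pattern "http\\S+" are made up for illustration and are not from the original data.

```r
library(data.table)
library(stringr)

# Toy stand-in for the forum-post table (hypothetical data).
posts <- data.table(
  id   = 1:4,
  body = c("see http://a.example and http://b.example",
           "no links here",
           "http://c.example",
           "nothing")
)
posts[, has_link := str_detect(body, "http")]

# Wrapping the logical column in () selects the same rows as the
# explicit comparison.
same_rows <- identical(posts[(has_link), id], posts[has_link == TRUE, id])

# Step 1: store the matches of each linked row in a list column.
posts[(has_link), link_list := str_match_all(body, "http\\S+")]

# Step 2: flatten to one row per match, keeping the id of the source row.
links <- posts[(has_link), list(link = unlist(link_list)), by = id]
print(links)
```

Rows without a link are skipped in both steps, so they never appear in links.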
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PO Split parts of strings into a list column and then make a vector column
 singulars
 PostTypePostTypeId
 PT Question
PostTypePostTypeId
1. PT Answer
UserLastEditorUserId
1. US Stian Håklev
UserOwnerUserId
1. US Stian Håklev
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
CommentsPostId
