StackOverflow2013

plurals

PO
primarykey
Id
3054644
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
9
CommunityOwnedDate
CreationDate
2010-06-16T15:19:32.897
FavoriteCount
0
LastActivityDate
2010-06-16T15:47:30.087
LastEditDate
2010-06-16T15:47:30.087
LastEditorUserId
163053
OwnerUserId
163053
ParentId
3054612
PostTypeId
2
Score
21
ViewCount
0
LastEditorDisplayName
text
Body
In general, you should try to use a vectorized function to begin with. Using <code>strsplit</code> will frequently require some kind of iteration afterwards (which will be slower), so try to avoid it if possible. In your example, you should use <code>nchar</code> instead:

<pre><code>> nchar(words)
[1] 1 5 5 3
</code></pre>

More generally, take advantage of the fact that <code>strsplit</code> returns a list and use <code>lapply</code>:

<pre><code>> as.numeric(lapply(strsplit(words,""), length))
[1] 1 5 5 3
</code></pre>

Or else use an <code>l*ply</code> family function from <code>plyr</code>. For instance:

<pre><code>> laply(strsplit(words,""), length)
[1] 1 5 5 3
</code></pre>

Edit: In honor of <a href="http://en.wikipedia.org/wiki/Bloomsday" rel="noreferrer">Bloomsday</a>, I decided to test the performance of these approaches using Joyce's Ulysses:

<pre><code>joyce <- readLines("http://www.gutenberg.org/files/4300/4300-8.txt")
joyce <- unlist(strsplit(joyce, " "))
</code></pre>

Now that I have all the words, we can do our counts:

<pre><code>> # original version
> system.time(print(summary(sapply(joyce, function (x) length(strsplit(x,"")[[1]])))))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   3.000   4.000   4.666   6.000  69.000
   user  system elapsed
   2.65    0.03    2.73
> # vectorized function
> system.time(print(summary(nchar(joyce))))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   3.000   4.000   4.666   6.000  69.000
   user  system elapsed
   0.05    0.00    0.04
> # with lapply
> system.time(print(summary(as.numeric(lapply(strsplit(joyce,""), length)))))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   3.000   4.000   4.666   6.000  69.000
   user  system elapsed
    0.8     0.0     0.8
> # with laply (from plyr)
> system.time(print(summary(laply(strsplit(joyce,""), length))))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   3.000   4.000   4.666   6.000  69.000
   user  system elapsed
  17.20    0.05   17.30
> # with ldply (from plyr)
> system.time(print(summary(ldply(strsplit(joyce,""), length))))
       V1
 Min.   : 0.000
 1st Qu.: 3.000
 Median : 4.000
 Mean   : 4.666
 3rd Qu.: 6.000
 Max.   :69.000
   user  system elapsed
   7.97    0.00    8.03
</code></pre>

The vectorized function and <code>lapply</code> are considerably faster than the original <code>sapply</code> version. All solutions return the same answer (as seen by the summary output). Apparently the latest version of <code>plyr</code> is faster (this is using a slightly older version).
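The comparison in the answer can be reproduced offline with a small sketch. The `words` vector below is a hypothetical stand-in (the answer never shows the original input), chosen so the counts match the `1 5 5 3` output quoted above. `lengths()` and `vapply()` are base R alternatives not used in the answer; `lengths()` assumes R 3.2.0 or later.

```r
# Hypothetical sample input; the original `words` vector is not shown in the answer.
words <- c("a", "quick", "brown", "fox")

# Vectorized: a single call counts the characters of every element.
n_vectorized <- nchar(words)

# Split-then-count: strsplit() returns a list with one character vector per
# word, and lengths() returns the length of each list element.
n_split <- lengths(strsplit(words, ""))

# Same idea with vapply(), which enforces one integer per element.
n_vapply <- vapply(strsplit(words, ""), length, integer(1))

# All three approaches should agree on this input.
stopifnot(identical(n_vectorized, n_split), identical(n_split, n_vapply))
print(n_vectorized)
```

On plain ASCII input like this the three results are identical. For real workloads, `nchar` avoids building the intermediate list entirely, which is consistent with the timings reported in the answer.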
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow to vectorize R strsplit?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USShane
UserOwnerUserId
1. USShane
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POHow to vectorize R strsplit?
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
CommentsPostId

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related rows are linked in either a to-one or a to-many fashion. The caption of each to-many relation links to a list view of the related table, filtered to the matching rows.

Try moving around until you find a non-empty to-many entry, then click its caption to open the filtered list view. You can return to the root at any time by clicking the database name in the header.