To quickly extract the values for a particular key, I personally like to use `grep -o`, which only returns the regex's match. For example, to get the "text" field from tweets, something like:

```
grep -Po '"text":.*?[^\\]",' tweets.json
```

This regex is more robust than you might think; for example, it deals fine with strings that have embedded commas and escaped quotes inside them. I think with a little more work you could make one that is actually guaranteed to extract the value, if it's atomic. (If it has nesting, then a regex can't do it, of course.)

And to further clean it up (albeit keeping the string's original escaping) you can use something like: `| perl -pe 's/"text"://; s/^"//; s/",$//'`. (I did this for [this analysis](https://gist.github.com/1024217).)

To all the haters who insist you should use a real JSON parser -- yes, that is essential for correctness, but

1. To do a really quick analysis, like counting values to check on data-cleaning bugs or to get a general feel for the data, banging out something on the command line is faster. Opening an editor to write a script is distracting.
2. `grep -o` is orders of magnitude faster than the Python standard `json` library, at least when doing this for tweets (which are ~2 KB each). I'm not sure whether this is just because `json` is slow (I should compare to yajl sometime); but in principle, a regex should be faster, since it's finite-state and much more optimizable, whereas a parser has to support recursion and, in this case, spends lots of CPU building trees for structures you don't care about. (If someone wrote a finite-state transducer that did proper, depth-limited JSON parsing, that would be fantastic! In the meantime we have `grep -o`; see the sketches at the end of this answer.)

To write maintainable code, I always use a real parsing library. I haven't tried [jsawk](https://github.com/micha/jsawk), but if it works well, that would address point #1.

One last, wackier solution: I wrote a script that uses the Python `json` module and extracts the keys you want, into tab-separated columns; then I pipe that through a wrapper around `awk` that allows named access to columns. [In here: the json2tsv and tsvawk scripts](https://github.com/brendano/tsvutils). So for this example it would be:

```
json2tsv id text < tweets.json | tsvawk '{print "tweet " $id " is: " $text}'
```

This approach doesn't address point #2, is less efficient than a single Python script, and is a little brittle: it forces normalization of newlines and tabs in string values, to play nice with awk's field/record-delimited view of the world. But it does let you stay on the command line, with more correctness than `grep -o`.
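
For concreteness, here is a minimal Python sketch of the kind of finite-state extraction point #2 wishes for. The function name and structure are my own illustration (not how `grep -o` works internally); it handles only atomic string values, assumes compact `"key":"value"` formatting (no space after the colon), and assumes the occurrence of the key it finds really is the field, not text embedded inside another value:

```
def extract_string_value(line, key):
    """Copy characters after '"key":"' until an unescaped closing quote.

    A two-state scan (normal vs. just-saw-backslash): no recursion,
    no tree-building. Returns the raw, still-escaped string, or None.
    """
    needle = '"%s":"' % key
    start = line.find(needle)
    if start == -1:
        return None
    i = start + len(needle)
    out = []
    while i < len(line):
        c = line[i]
        if c == '\\':               # escape: keep the backslash pair verbatim
            out.append(line[i:i + 2])
            i += 2
        elif c == '"':              # unescaped quote ends the value
            return ''.join(out)
        else:
            out.append(c)
            i += 1
    return None                     # unterminated string

# One-object-per-line usage, e.g.:
# for line in open('tweets.json'):
#     print(extract_string_value(line, 'text'))
```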
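
And if you want to put a number on the speed claim in point #2 for your own data, here is a rough, hypothetical `timeit` comparison on a synthetic tweet-sized record (the record and the exact speedup are illustrative; results will vary by machine and data):

```
import json
import re
import timeit

# A synthetic, roughly 2 KB tweet-like record; real tweets differ.
record = json.dumps({'id': 123,
                     'text': 'a tweet, with "quotes" and a \\ backslash',
                     'padding': 'x' * 2000})
pattern = re.compile(r'"text":.*?[^\\]",')

n = 10000
t_json = timeit.timeit(lambda: json.loads(record)['text'], number=n)
t_re = timeit.timeit(lambda: pattern.search(record), number=n)
print('json.loads: %.3fs   regex search: %.3fs' % (t_json, t_re))
```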