I assume you can create or find an eml-to-text converter. Then this is fairly close to what you want:

```
find . -type f | parallel --tag 'eml-to-text {} | grep -o -n -b -f /tmp/list_of_interesting_words'
```

The output is not formatted 100% how you want it:

`filename \t line no : byte no (from start of file) : word`

If you have many interesting words, `grep -f` is slow to start up, so if you can create an unpacked version of your maildir, you can make `parallel` start `grep` fewer times:

```
find . -type f | parallel 'eml-to-text {} >/tmp/unpacked/{#}'
find /tmp/unpacked -type f | parallel -X grep -H -o -n -b -f /tmp/list_of_interesting_words
```

Since the time complexity of `grep -f` is worse than linear, you may want to chop up /tmp/list_of_interesting_words into smaller blocks:

```
cat /tmp/list_of_interesting_words | parallel --pipe --block 10k --files > /tmp/blocks_of_words
```

And then process the blocks and the files in parallel:

```
find /tmp/unpacked -type f | parallel -j1 -I ,, parallel --arg-file-sep // -X grep -H -o -n -b -f ,, {} // - :::: /tmp/blocks_of_words
```

This output is formatted like:

`filename : line no : byte no (from start of file) : word`

To have it grouped by word instead of by filename, pipe the result through `sort`:

```
... | sort -k4 -t: > index.by.word
```

To count the frequency of each word:

```
... | sort -k4 -t: | tee index.by.word | awk -F: '{print $4}' | uniq -c
```

The good news is that this should be rather fast, and I doubt you will be able to achieve the same speed using Python.

Edit:

`grep -F` is much faster to start, and you will want `-w` for `grep` (so that the word 'gram' does not match 'diagrams'); this also avoids the temporary files and is probably reasonably fast:

```
find . -type f | parallel --tag 'eml-to-text {} | grep -F -w -o -n -b -f /tmp/list_of_interesting_words' | sort -k3 -t: | tee index.by.word | awk -F: '{print $3}' | uniq -c
```
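If you do not already have an `eml-to-text` converter, below is a minimal sketch of one built on Python's standard `email` module. The script name, the plain-text-first preference, and the fallback over all `text/*` parts are illustrative assumptions, not part of the pipelines above; real mail (HTML-only messages, odd encodings) may need more handling.

```
#!/usr/bin/env python3
# Minimal eml-to-text sketch: print the text content of one .eml file.
# Any converter with this command-line shape will do; this one uses only
# the Python standard library.
import sys
from email import policy
from email.parser import BytesParser

def eml_to_text(path):
    with open(path, 'rb') as fh:
        msg = BytesParser(policy=policy.default).parse(fh)
    # Prefer the plain-text body part if the message has one.
    body = msg.get_body(preferencelist=('plain',))
    if body is not None:
        return body.get_content()
    # Otherwise fall back to concatenating every text/* part.
    return '\n'.join(part.get_content()
                     for part in msg.walk()
                     if part.get_content_maintype() == 'text')

if __name__ == '__main__':
    sys.stdout.write(eml_to_text(sys.argv[1]))
```

Save it as `eml-to-text`, make it executable, and put it on your `$PATH`; the commands above can then call it unchanged.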