
Python top N word count: why is multiprocessing slower than a single process?
<p>I'm doing a word-frequency count in Python. The single-process version:</p> <pre><code>#coding=utf-8
import string
import time
from collections import Counter

starttime = time.clock()
origin = open("document.txt", 'r').read().lower()
for_split = [',','\n','\t','\'','.','\"','!','?','-', '~']
#the words below will be ignored when counting
ignored = ['the', 'and', 'i', 'to', 'of', 'a', 'in', 'was', 'that', 'had',
           'he', 'you', 'his', 'my', 'it', 'as', 'with', 'her', 'for', 'on']
i = 0
for ch in for_split:
    origin = string.replace(origin, ch, ' ')
words = string.split(origin)
result = Counter(words).most_common(40)
for word, frequency in result:
    if not word in ignored and i &lt; 10:
        print "%s : %d" % (word, frequency)
        i = i + 1
print time.clock() - starttime
</code></pre> <p>The multiprocessing version looks like this:</p> <pre><code>#coding=utf-8
import time
import multiprocessing
from collections import Counter

for_split = [',','\n','\t','\'','.','\"','!','?','-', '~']
ignored = ['the', 'and', 'i', 'to', 'of', 'a', 'in', 'was', 'that', 'had',
           'he', 'you', 'his', 'my', 'it', 'as', 'with', 'her', 'for', 'on']
result_list = []

def worker(substr):
    result = Counter(substr)
    return result

def log_result(result):
    result_list.append(result)

def main():
    pool = multiprocessing.Pool(processes=5)
    origin = open("document.txt", 'r').read().lower()
    for ch in for_split:
        origin = origin.replace(ch, ' ')
    words = origin.split()
    step = len(words)/4
    substrs = [words[pos : pos+step] for pos in range(0, len(words), step)]
    for substr in substrs:
        pool.apply_async(worker, args=(substr,), callback=log_result)
    pool.close()
    pool.join()
    result = Counter()
    for item in result_list:
        result = result + item
    result = result.most_common(40)
    i = 0
    for word, frequency in result:
        if not word in ignored and i &lt; 10:
            print "%s : %d" % (word, frequency)
            i = i + 1

if __name__ == "__main__":
    starttime = time.clock()
    main()
    print time.clock() - starttime
</code></pre> <p>The "document.txt" is about 22&nbsp;MB; my laptop has two cores and 2&nbsp;GB of memory. The first version runs in 3.27&nbsp;s, the second in 8.15&nbsp;s. I've changed the number of processes (<strong>pool = multiprocessing.Pool(processes=5)</strong>) from 2 to 10, and the results stay almost the same. Why is that, and how can I make this program run faster than the single-process version?</p>
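A likely factor (not stated in the post) is serialization overhead: `pool.apply_async` has to pickle each multi-million-element word list in the parent, ship it to a worker over a pipe, and unpickle it there, which can easily cost more than the `Counter` work itself. A minimal sketch of one common workaround, in Python 3, is to send each worker only a byte range of the file and let it do its own reading and tokenizing; the names `count_range` and `parallel_count` are hypothetical, and this sketch ignores words split at chunk boundaries (a real version would align ranges to whitespace):

```python
import multiprocessing
import os
from collections import Counter

def count_range(args):
    """Worker: read and count one byte range of the file locally,
    so only the small (path, start, size) tuple is pickled."""
    path, start, size = args
    with open(path, "rb") as f:
        f.seek(start)
        text = f.read(size).decode("utf-8", errors="ignore").lower()
    for ch in ',\n\t\'."!?-~':
        text = text.replace(ch, " ")
    return Counter(text.split())

def parallel_count(path, nprocs=4):
    """Split the file into nprocs byte ranges and merge per-worker Counters.
    Caveat: a word straddling a range boundary is counted as two fragments."""
    size = os.path.getsize(path)
    step = size // nprocs + 1
    ranges = [(path, pos, step) for pos in range(0, size, step)]
    with multiprocessing.Pool(nprocs) as pool:
        total = Counter()
        for part in pool.map(count_range, ranges):
            total.update(part)
    return total
```

Even with the pickling cost removed, the counting itself is largely C-speed inside `Counter`, so on a two-core machine the achievable speedup is modest and process start-up plus result merging may still dominate for a 22 MB input.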