
Python memory usage? loading large dictionaries in memory

hey all, I have a file on disk that's only 168MB. It's just a comma-separated list of word,id pairs; the word can be 1-5 words long, and there are 6.5 million lines. I created a dictionary in Python to load this into memory so I can search incoming text against that list of words. When Python loads it into memory it shows 1.3GB of RAM used. Any idea why that is?

so let's say my word file looks like this...

<pre><code>1,word1
2,word2
3,word3
</code></pre>

then add 6.5 million lines to that. I then loop through that file and create a dictionary (Python 2.6.1):

<pre><code>import csv
import os

def load_term_cache():
    """Load the term cache from our cached file instead of hitting MySQL.
    If it didn't preload into memory it would be 20+ million queries
    per process."""
    global cached_terms
    dumpfile = os.path.join(os.getenv("MY_PATH"), 'datafiles', 'baseterms.txt')
    f = open(dumpfile)
    cache = csv.reader(f)
    for term_id, term in cache:
        cached_terms[term] = term_id
    f.close()
</code></pre>

Just doing that blows up the memory. I watch Activity Monitor and it pegs all available memory, up to around 1.5GB of RAM; on my laptop it just starts to swap. Any ideas how to most efficiently store key/value pairs in memory with Python?

Thanks

UPDATE: I tried to use the anydbm module and after 4.4 million records it just dies. The floating point number is the elapsed seconds since I started the load:

<pre><code>56.95
3400018
60.12
3600019
63.27
3800020
66.43
4000021
69.59
4200022
72.75
4400023
83.42
4600024
168.61
4800025
338.57
</code></pre>

You can see it was running great: 200,000 rows inserted every few seconds, until I hit a wall and the time doubled.

<pre><code>import anydbm
import os
import time

i = 0
mark = 0
starttime = time.time()
dbfile = os.path.join(os.getenv("MY_PATH"), 'datafiles', 'baseterms')
db = anydbm.open(dbfile, 'c')
# load from the existing baseterms file
termfile = os.path.join(os.getenv("MY_PATH"), 'datafiles', 'baseterms.txt.LARGE')
for line in open(termfile):
    i += 1
    # strip the trailing newline so it doesn't end up in the key
    pieces = line.strip().split(',')
    db[str(pieces[1])] = str(pieces[0])
    if i > mark:
        print i
        print round(time.time() - starttime, 2)
        mark = i + 200000
db.close()
</code></pre>
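
On the original question of where the 1.3GB goes: every key string, every value string, and every slot in the hash table is a full Python object with its own header, so 168MB of raw text easily grows past 1GB once loaded. A minimal sketch of how to measure this with sys.getsizeof (available from Python 2.6; the sample strings are made up and the byte counts in the comments are rough 64-bit CPython figures):

<pre><code>import sys

# hypothetical sample entry, shaped like a row in baseterms.txt
term = 'some multi word term'   # 20 characters of actual data
term_id = '1234567'             # the id, stored as a string

print sys.getsizeof(term)     # ~57 bytes, not 20: the object header is included
print sys.getsizeof(term_id)  # ~44 bytes for 7 characters of data
print sys.getsizeof({})       # even an empty dict costs a few hundred bytes

# Roughly 100 bytes of string objects per entry times 6.5 million entries
# is ~650MB, and a dict that size resizes its table to on the order of
# 2**24 slots at ~24 bytes each (hash plus key and value pointers),
# which is another ~400MB: close to the 1.3GB reported above.
</code></pre>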
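
If the table has to stay in memory, one leaner layout (a sketch, not something from the post) is to drop the dict entirely: keep the terms in one sorted list and the ids in a parallel array of machine ints, then binary-search with the stdlib bisect module. The term strings still cost what they cost, but the hash-table slots and the 6.5 million per-id string objects go away:

<pre><code>import bisect
import csv
import os
from array import array

dumpfile = os.path.join(os.getenv("MY_PATH"), 'datafiles', 'baseterms.txt')

f = open(dumpfile, 'rb')
# read (term, id) pairs and sort them once by term; this temporarily
# needs room for the pair list, so do it before anything else is resident
pairs = sorted((term, int(term_id)) for term_id, term in csv.reader(f))
f.close()

terms = [term for term, term_id in pairs]
ids = array('l', (term_id for term, term_id in pairs))  # 4-8 bytes per id
del pairs

def lookup(term):
    """Return the id for term, or None if the term isn't in the list."""
    i = bisect.bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return ids[i]
    return None

print lookup('word1')
</code></pre>

Lookups become O(log n) instead of O(1), but for 6.5 million entries that is only about 23 string comparisons per query.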
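
As for the anydbm wall in the update: another on-disk option (again a sketch, with the paths copied from the post) is the stdlib sqlite3 module, available since Python 2.5. Loading inside a single transaction avoids per-record sync cost, and the primary-key index serves lookups without holding anything in RAM:

<pre><code>import csv
import os
import sqlite3

dbfile = os.path.join(os.getenv("MY_PATH"), 'datafiles', 'baseterms.db')
termfile = os.path.join(os.getenv("MY_PATH"), 'datafiles', 'baseterms.txt')

db = sqlite3.connect(dbfile)
db.execute('CREATE TABLE IF NOT EXISTS terms (id INTEGER, term TEXT PRIMARY KEY)')

f = open(termfile, 'rb')
# one big transaction: executemany streams the rows without building a list
db.executemany('INSERT OR REPLACE INTO terms (id, term) VALUES (?, ?)',
               ((int(term_id), term) for term_id, term in csv.reader(f)))
db.commit()
f.close()

# a lookup goes through the primary-key index instead of an in-memory dict
row = db.execute('SELECT id FROM terms WHERE term = ?', ('word1',)).fetchone()
print row[0] if row else None
db.close()
</code></pre>

Whether the insert pace holds flat all the way to 6.5 million rows would need testing, but unlike the dbm hash file, SQLite's B-tree is built to handle tables much larger than RAM.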
 
