StackOverflow2013


plurals

POwhen does Python allocate new memory for identical strings?
primarykey
Id
2123925
data
AcceptedAnswerId
2124011
AnswerCount
4
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2010-01-23T17:08:14.030
FavoriteCount
15
LastActivityDate
2012-09-17T21:46:13.457
LastEditDate
2010-01-25T17:37:40.110
LastEditorUserId
86643
OwnerUserId
86643
ParentId
0
PostTypeId
1
Score
33
ViewCount
6133
LastEditorDisplayName
text
Body
<p>Two Python strings with the same characters, a == b, may share memory, id(a) == id(b), or may be stored twice, id(a) != id(b). Try</p>
<pre><code>ab = "ab"
print id( ab ), id( "a"+"b" )
</code></pre>
<p>Here Python recognizes that the newly created "a"+"b" is the same as the "ab" already in memory, which is not bad.</p>
<p>Now consider an N-long list of state names [ "Arizona", "Alaska", "Alaska", "California" ... ] (N ~ 500000 in my case). I see 50 different id()s ⇒ each string "Arizona" ... is stored only once, fine. BUT write the list to disk and read it back in again: the "same" list now has N different id()s and takes far more memory, see below.</p>
<p>How come? Can anyone explain Python string memory allocation?</p>
<pre><code>""" when does Python allocate new memory for identical strings ?
    ab = "ab"
    print id( ab ), id( "a"+"b" )  # same !
    list of N names from 50 states: 50 ids, mem ~ 4N + 50S, each string once
    but list > file > mem again: N ids, mem ~ N * (4 + S)
"""

from __future__ import division
from collections import defaultdict
from copy import copy
import cPickle
import random
import sys

states = dict(
    AL = "Alabama",
    AK = "Alaska",
    AZ = "Arizona",
    AR = "Arkansas",
    CA = "California",
    CO = "Colorado",
    CT = "Connecticut",
    DE = "Delaware",
    FL = "Florida",
    GA = "Georgia",
)

def nid(alist):
    """ nr distinct ids """
    return "%d ids  %d pickle len" % (
        len( set( map( id, alist ))),
        len( cPickle.dumps( alist, 0 )))  # rough est ?
# cf http://stackoverflow.com/questions/2117255/python-deep-getsizeof-list-with-contents

N = 10000
exec( "\n".join( sys.argv[1:] ))  # var=val ...
random.seed(1)

# big list of random names of states --
names = []
for j in xrange(N):
    name = copy( random.choice( states.values() ))
    names.append(name)
print "%d strings in mem:  %s" % (N, nid(names) )  # 10 ids, even with copy()

# list to a file, back again -- each string is allocated anew
joinsplit = "\n".join(names).split()  # same as > file > mem again
assert joinsplit == names
print "%d strings from a file:  %s" % (N, nid(joinsplit) )

# 10000 strings in mem: 10 ids  42149 pickle len
# 10000 strings from a file: 10000 ids  188080 pickle len
# Python 2.6.4 mac ppc
</code></pre>
<p>Added 25jan: There are two kinds of strings in Python memory (or any program's):</p>
<ul>
<li>Ustrings, in a Ucache of unique strings: these save memory, and make a == b fast if both are in Ucache</li>
<li>Ostrings, the others, which may be stored any number of times.</li>
</ul>
<p><code>intern(astring)</code> puts astring in the Ucache (Alex +1); other than that we know nothing at all about how Python moves Ostrings to the Ucache. How did "a"+"b" get in, after "ab"? ("Strings from files" is meaningless: there's no way of knowing.) In short, Ucaches (there may be several) remain murky.</p>
<p>A historical footnote: <a href="http://en.wikipedia.org/wiki/SPITBOL_compiler" rel="noreferrer">SPITBOL</a> uniquified all strings ca. 1970.</p>
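For what it's worth, here is a minimal sketch of the interning idea mentioned above, applied to a list that has been through a join/split round trip. It is written for Python 3, where the builtin `intern` lives at `sys.intern` (the question itself uses Python 2). The helper names `count_distinct_ids` and `intern_all` are made up for this illustration, and how many distinct ids the reloaded list has before interning is an implementation detail that may vary between interpreters.

```python
import sys


def count_distinct_ids(strings):
    """Number of distinct string objects (by identity) in the list."""
    return len(set(map(id, strings)))


def intern_all(strings):
    """Return a new list in which equal strings share one interned object."""
    return [sys.intern(s) for s in strings]


# Hypothetical sample data, standing in for the big list of state names.
names = ["Alaska", "Arizona", "Alaska", "California", "Arizona"] * 1000

# Round trip through one big string, standing in for "list > file > mem again".
reloaded = "\n".join(names).split("\n")
assert reloaded == names

deduped = intern_all(reloaded)
print("reloaded:", count_distinct_ids(reloaded), "ids")
print("interned:", count_distinct_ids(deduped), "ids")
```

After interning, equal strings in `deduped` refer to the same object, so the list holds one copy per distinct name plus N references. Whether that is worth the extra pass depends on how many repeats the data has.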
Tags
<python><memory><memory-management>
Title
when does Python allocate new memory for identical strings?
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USdenis
UserOwnerUserId
1. USdenis
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
2. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
3. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POwhen does Python allocate new memory for identical strings?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POwhen does Python allocate new memory for identical strings?
 UserUserId
 USezod
 VoteTypeVoteTypeId
 VTFavorite
3. VO
 singulars
 PostPostId
 POwhen does Python allocate new memory for identical strings?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.
