<h1>Software transactional memory with a big, shared piece of data</h1>
    <h2>The original question</h2> <p>I'm new to STM. One thing I'd like to do in Haskell involves a big piece of data, and lots of lightweight threads reading and writing to small parts of said big piece of data. The locations read from and written to can be considered essentially random and small. STM seems great for this, but I have some questions regarding how to attack the problem. I see several possible approaches, each with some drawbacks, and some that seem plain stupid. Some comments on these, or other ideas for alternatives, would be much appreciated.</p> <p>Let's for simplicity assume that the big piece of data is a <code>Data.Vector a</code>, where the elements are themselves small.</p> <ol> <li><p><em>The entire piece of data as one <code>TVar (Vector a)</code></em>. I guess this will cause lots of copying of the huge piece of data, as STM will think each individual write has possibly affected the entire shared data. Surely there's no magic going on where STM identifies that the reads and writes are very localized, and that consistency across the big piece of data is not required?</p></li> <li><p><em>A huge amount of <code>TVar a</code>s</em>, essentially one for each element, giving fully localized STM, but essentially duplicating the entire <code>Vector a</code>. This seems stupid.</p></li> <li><p><em>A compromise between 1 and 2</em> by segmenting the data so that I have a reasonable number of <code>TVar (Vector a)</code>s corresponding to subvectors of the data. I feel this solution depends too much on heuristics, like how big the segments should be.</p></li> <li><p><em>Messaging.</em> Instead of each worker reading and writing to the data using STM, each writes messages with <em>requests for data to be read</em> or <em>chunks of data to be written</em> through some STM mechanism, such as a <code>TChan</code>. 
A special thread receives these messages, passing out the requested data through another <code>TChan</code>, or taking received data and writing it to the shared data structure. This solution seems free of the problems plaguing solutions 1-3, but it also seems to me that it essentially gives up on using the niceties of STM to keep the data consistent. Rather, it's just message passing. Granted, the message passing part is implemented with STM, but my real problem is solved with message passing. STM seems so great, message passing is so... meh.</p></li> </ol> <p>Am I thinking about this correctly? Does anybody have any hints or other suggestions?</p> <p>Keep in mind that I have no experience with STM, nor have I tried the above solutions yet. I'll get out of my armchair, but sometimes it can be good to think about these things before trying anything.</p> <p><strong>Addendum:</strong> A fifth approach comes from Nathan Howell and uses <code>TArray</code>. It sounds like what I want, but the <a href="http://hackage.haskell.org/packages/archive/stm/2.3/doc/html/Control-Concurrent-STM-TArray.html" rel="nofollow noreferrer">documentation</a> says:</p> <blockquote> <p>It is currently implemented as Array ix (TVar e), but it may be replaced by a more efficient implementation in the future (the interface will remain the same, however). </p> </blockquote> <p>I take this to mean that <code>TArray</code> is just my approach number 2 in nicer clothes. The docs' hint about a "more efficient" implementation is interesting, as it suggests there may actually be a nicer approach.</p> <h2>Follow-up on Vagif Verdi's answer</h2> <p>Vagif Verdi's <a href="https://stackoverflow.com/a/9549061/453616">answer</a> is very interesting, so I made a <a href="http://hpaste.org/65099" rel="nofollow noreferrer">little experiment</a> to try it out. 
I don't have time to slim down the code right now, so those interested will have to bear with me on the code containing more than the bare essentials. I decided to use a mutable vector with 10^8 <code>Int</code>s as the "big shared piece of data", and to let the multiple readers/writers correspond to threads taking input on a network socket.</p> <p>Note that <a href="http://hpaste.org/65099" rel="nofollow noreferrer">the code</a> does not even read from or write to the shared piece of data. It's simply there, and each thread holds a <code>TVar</code> referring to it. </p> <p>So what happens? I run the program, and it immediately takes up about 780 MB of RAM, which is to be expected (that's roughly what 10^8 <code>Int</code>s need). Now, if I use netcat to connect a few clients and write a bit of text, which the program is merely supposed to print out, not even write to the shared data, the process's CPU usage spikes to 100% for upwards of a second! There's a noticeable lag before the text is displayed. On the bright side, memory usage stays constant (as per Vagif Verdi's answer). If I remove the vector and the <code>TVar</code>, i.e. take out all STM and shared data, everything is quick and responsive, and there's no noticeable CPU usage when a client writes something.</p> <p>So, while it's very nice to see that the shared data is in fact not duplicated (although I suppose I should actually write to the shared data in order to verify this fully), there's a very heavy performance penalty involved in maintaining coherent state. My question thus remains: how should one properly attack this problem while keeping the niceties of STM?</p> <p>Thanks to Vagif Verdi for bringing up some very interesting points.</p>
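<p>For concreteness, approach 5 with <code>TArray</code> might look something like the minimal sketch below: one transactional cell per element, so transactions only conflict (and retry) when they touch the same index. The helper name <code>bumpCell</code> and the toy sizes are mine, purely for illustration, not anything from the linked code:</p>

```haskell
-- Sketch: a TArray as the "big shared piece of data", with several
-- threads each writing to their own small, localized part of it.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (atomically)
import Control.Concurrent.STM.TArray (TArray)
import Control.Monad (forM, forM_)
import Data.Array.MArray (getElems, newArray, readArray, writeArray)

type Shared = TArray Int Int

-- Atomically increment a single cell.  Because TArray keeps one TVar
-- per element, only transactions hitting the same index can conflict.
bumpCell :: Shared -> Int -> IO ()
bumpCell arr i = atomically $ do
  v <- readArray arr i
  writeArray arr i (v + 1)

main :: IO ()
main = do
  arr <- atomically (newArray (0, 9) 0) :: IO Shared
  -- Ten workers, each hammering its own cell 1000 times.
  dones <- forM [0 .. 9] $ \i -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      forM_ [1 .. 1000 :: Int] $ \_ -> bumpCell arr i
      putMVar done ()
    pure done
  mapM_ takeMVar dones
  total <- sum <$> atomically (getElems arr)
  print total  -- 10 threads * 1000 increments = 10000
```

<p>Whether this dodges the per-element <code>TVar</code> overhead of approach 2 is exactly what the documentation leaves open, but the interface at least matches the access pattern described above: random, small, localized reads and writes.</p>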