StackOverflow2013


plurals

PO
primarykey
Id
7256850
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
26
CommunityOwnedDate
CreationDate
2011-08-31T12:22:30.813
FavoriteCount
0
LastActivityDate
2011-09-01T18:40:20.900
LastEditDate
2011-09-01T18:40:20.900
LastEditorUserId
565518
OwnerUserId
565518
ParentId
7256563
PostTypeId
2
Score
7
ViewCount
0
LastEditorDisplayName
text
Body
This is a complete rewrite of my previous answer. It turns out that in my previous attempts I overlooked a much simpler method, based on a combination of packed arrays and sparse arrays, that is much faster and more memory-efficient than all previous methods (at least in the range of sample sizes where I tested it), while only minimally changing the original <code>SubValues</code>-based approach. Since the question asked for the most efficient method, I will remove the other ones from the answer, given that they are quite a bit more complex and take a lot of space. Those who would like to see them can inspect past revisions of this answer.
<h3>The original <code>SubValues</code>-based approach</h3>
We start by introducing a function to generate test samples of configurations: <pre><code>Clear[generateConfigurations];
generateConfigurations[maxIndex_Integer, maxConfX_Integer, maxConfY_Integer, nconfs_Integer] :=
  Transpose[{
    RandomInteger[{1, maxIndex}, nconfs],
    Transpose[{
      RandomInteger[{1, maxConfX}, nconfs],
      RandomInteger[{1, maxConfY}, nconfs]
    }]}];
</code></pre> We can generate a small sample to illustrate: <pre><code>In[3]:= sample = generateConfigurations[2,2,2,10]
Out[3]= {{2,{2,1}},{2,{1,1}},{1,{2,1}},{1,{1,2}},{1,{1,2}},
 {1,{2,1}},{2,{1,2}},{2,{2,2}},{1,{2,2}},{1,{2,1}}}
</code></pre> Here we have only 2 indices, and 10 configurations in which both the "x" and "y" numbers vary from 1 to 2 only.
The following function will help us imitate the accumulation of frequencies for configurations, as we increment <code>SubValues</code>-based counters for repeatedly occurring ones: <pre><code>Clear[testAccumulate];
testAccumulate[ff_Symbol, data_] :=
  Module[{},
    ClearAll[ff];
    ff[_][_] = 0;
    Do[
      doSomeStuff;
      ff[#1][#2]++ & @@ elem;
      doSomeMoreStuff;
      , {elem, data}]];
</code></pre> The <code>doSomeStuff</code> and <code>doSomeMoreStuff</code> symbols are placeholders for code that might precede or follow the counting code. The <code>data</code> parameter is supposed to be a list of the form produced by <code>generateConfigurations</code>. For example: <pre><code>In[6]:= testAccumulate[ff,sample];
SubValues[ff]
Out[7]= {HoldPattern[ff[1][{1,2}]]:>2,HoldPattern[ff[1][{2,1}]]:>3,
 HoldPattern[ff[1][{2,2}]]:>1,HoldPattern[ff[2][{1,1}]]:>1,
 HoldPattern[ff[2][{1,2}]]:>1,HoldPattern[ff[2][{2,1}]]:>1,
 HoldPattern[ff[2][{2,2}]]:>1,HoldPattern[ff[_][_]]:>0}
</code></pre> The following function extracts the resulting data (indices, configurations and their frequencies) from the list of <code>SubValues</code>, dropping the final catch-all rule <code>ff[_][_]</code> with <code>Most</code>: <pre><code>Clear[getResultingData];
getResultingData[f_Symbol] :=
  Transpose[{#[[All, 1, 1, 0, 1]], #[[All, 1, 1, 1]], #[[All, 2]]}] &@
    Most@SubValues[f, Sort -> False];
</code></pre> For example: <pre><code>In[10]:= result = getResultingData[ff]
Out[10]= {{2,{2,1},1},{2,{1,1},1},{1,{2,1},3},{1,{1,2},2},{2,{1,2},1},
 {2,{2,2},1},{1,{2,2},1}}
</code></pre> To finish the data-processing cycle, here is a straightforward function, based on <code>Select</code>, that extracts the data for a fixed index: <pre><code>Clear[getResultsForFixedIndex];
getResultsForFixedIndex[data_, index_] :=
  If[# === {}, {}, Transpose[#]] &[
    Select[data, First@# == index &][[All, {2, 3}]]];
</code></pre> For our test example, <pre><code>In[13]:= getResultsForFixedIndex[result,1]
Out[13]= {{{2,1},{1,2},{2,2}},{3,2,1}}
</code></pre> This is presumably close to what @zorank tried, in code.
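As an independent cross-check of the accumulated counts (this is not part of the approach above): when the data already sits in a list and the counting does not have to be interleaved with other code, the built-in <code>Tally</code> yields the same index/configuration/frequency triples. A minimal sketch, with the small sample from <code>Out[3]</code> hard-coded and the name <code>tallied</code> introduced only for illustration: <pre><code>(* The small sample from Out[3], hard-coded so that the snippet is self-contained *)
sample = {{2,{2,1}},{2,{1,1}},{1,{2,1}},{1,{1,2}},{1,{1,2}},
   {1,{2,1}},{2,{1,2}},{2,{2,2}},{1,{2,2}},{1,{2,1}}};
(* Tally returns {{index, conf}, count} pairs in order of first occurrence;
   Append turns each pair into an {index, conf, count} triple *)
tallied = Append @@@ Tally[sample]
</code></pre> For this sample, <code>tallied</code> coincides with the list shown in <code>Out[10]</code>.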
<h3>A faster solution based on packed arrays and sparse arrays</h3>
As @zorank noted, this becomes slow for larger samples with more indices and configurations. We will now generate a large sample to illustrate that (note: this requires about 4-5 GB of RAM, so you may want to reduce the number of configurations if this exceeds the available RAM): <pre><code>In[14]:= largeSample = generateConfigurations[20,500,500,5000000];
testAccumulate[ff,largeSample];//Timing
Out[15]= {31.89,Null}
</code></pre> We will now extract the full data from the <code>SubValues</code> of <code>ff</code>: <pre><code>In[16]:= (largeres = getResultingData[ff]); // Timing
Out[16]= {10.844, Null}
</code></pre> This takes some time, but it has to be done only once. When we start extracting data for a fixed index, however, we see that it is quite slow: <pre><code>In[24]:= getResultsForFixedIndex[largeres,10]//Short//Timing
Out[24]= {2.687,{{{196,26},{53,36},{360,43},{104,144},<<157674>>,{31,305},{240,291},
 {256,38},{352,469}},{<<1>>}}}
</code></pre> The main idea we will use to speed this up is to pack the individual lists inside <code>largeres</code>: those for indices, combinations and frequencies. While the full list cannot be packed, those parts individually can: <pre><code>In[18]:= Timing[
  subIndicesPacked = Developer`ToPackedArray[largeres[[All,1]]];
  subCombsPacked = Developer`ToPackedArray[largeres[[All,2]]];
  subFreqsPacked = Developer`ToPackedArray[largeres[[All,3]]];
]
Out[18]= {1.672,Null}
</code></pre> This also takes some time, but again it is a one-time operation.
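The packing claim can be checked on a toy list with <code>Developer`PackedArrayQ</code>. This is only a small sketch; the names <code>toy</code>, <code>toyIndices</code> and <code>toyCombs</code> are introduced here for illustration: <pre><code>(* A few {index, configuration, frequency} triples, in the same shape as largeres *)
toy = {{2, {2, 1}, 1}, {2, {1, 1}, 1}, {1, {2, 1}, 3}};
(* The full list is ragged (not a rectangular array), so it stays unpacked *)
fullPacked = Developer`PackedArrayQ[Developer`ToPackedArray[toy]]
(* The index column is a flat list of machine integers and can be packed *)
toyIndices = Developer`ToPackedArray[toy[[All, 1]]];
indicesPacked = Developer`PackedArrayQ[toyIndices]
(* The configurations form an n x 2 integer matrix, which can be packed too *)
toyCombs = Developer`ToPackedArray[toy[[All, 2]]];
combsPacked = Developer`PackedArrayQ[toyCombs]
</code></pre> The first check should give <code>False</code> and the other two <code>True</code>, which is why the three columns are packed separately.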
The following functions will then be used to extract the results for a fixed index much more efficiently: <pre><code>Clear[extractPositionFromSparseArray];
extractPositionFromSparseArray[HoldPattern[SparseArray[u___]]] := {u}[[4, 2, 2]]

Clear[getCombinationsAndFrequenciesForIndex];
getCombinationsAndFrequenciesForIndex[packedIndices_, packedCombs_, packedFreqs_, index_Integer] :=
  With[{positions =
     extractPositionFromSparseArray[
       SparseArray[1 - Unitize[packedIndices - index]]]},
    {Extract[packedCombs, positions], Extract[packedFreqs, positions]}];
</code></pre> Now, we have: <pre><code>In[25]:= getCombinationsAndFrequenciesForIndex[subIndicesPacked,subCombsPacked,subFreqsPacked,10]
  //Short//Timing
Out[25]= {0.094,{{{196,26},{53,36},{360,43},{104,144},<<157674>>,{31,305},{240,291},
 {256,38},{352,469}},{<<1>>}}}
</code></pre> We get roughly a 30-fold speed-up (0.094 s versus 2.687 s) with respect to the naive <code>Select</code> approach.
<h3>Some notes on complexity</h3>
Note that the second solution is faster because it uses optimized data structures, but its complexity is the same as that of the <code>Select</code>-based one, that is, linear in the length of the total list of unique combinations for all indices. Therefore, in theory, the previously discussed solutions based on nested hash tables etc. may be asymptotically better. The problem is that in practice we will probably hit memory limitations long before that. For the 10 million configurations sample, the above code was still 2-3 times faster than the fastest solution I posted before.
EDIT: The following modification: <pre><code>Clear[getCombinationsAndFrequenciesForIndex];
getCombinationsAndFrequenciesForIndex[packedIndices_, packedCombs_, packedFreqs_, index_Integer] :=
  With[{positions =
     extractPositionFromSparseArray[
       SparseArray[Unitize[packedIndices - index], Automatic, 1]]},
    {Extract[packedCombs, positions], Extract[packedFreqs, positions]}];
</code></pre> makes the code about twice as fast again.
Moreover, for sparser indices (say, when calling the sample-generation function with parameters like <code>generateConfigurations[2000, 500, 500, 5000000]</code>), the speed-up with respect to the <code>Select</code>-based function is about 100 times.
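To make the position-extraction step concrete, here is a toy run of both variants of the <code>SparseArray</code> trick, compared against the built-in <code>Position</code>. Note that <code>extractPositionFromSparseArray</code> relies on the internal layout of <code>SparseArray</code> expressions, which is not documented and might differ between versions; the names <code>toyIndices</code>, <code>viaOnes</code>, <code>viaDefault</code> and <code>reference</code> are made up for this sketch: <pre><code>Clear[extractPositionFromSparseArray];
extractPositionFromSparseArray[HoldPattern[SparseArray[u___]]] := {u}[[4, 2, 2]]

toyIndices = Developer`ToPackedArray[{2, 2, 1, 1, 2}];
(* First variant: entries equal to the index (here 1) become 1, the rest is the implicit default 0 *)
viaOnes = extractPositionFromSparseArray[
   SparseArray[1 - Unitize[toyIndices - 1]]];
(* EDIT variant: the default value is 1, so only the zeros of Unitize (the matches) are stored *)
viaDefault = extractPositionFromSparseArray[
   SparseArray[Unitize[toyIndices - 1], Automatic, 1]];
(* Reference result from the documented Position function: {{3}, {4}} *)
reference = Position[toyIndices, 1];
{viaOnes === reference, viaDefault === reference}
</code></pre> If the layout assumption holds in your version, both comparisons give <code>True</code>. The positions come out as a list of one-element lists, which is exactly the form that <code>Extract</code> expects.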
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POAlgorithm for picking pattern free downvalues from a sparse definition list
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USLeonid Shifrin
UserOwnerUserId
1. USLeonid Shifrin
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
