  1. Python group by array a, and summarize array b - Performance
<p>Given two unordered arrays a and b of the same length:</p>
<pre><code>a = [7,3,5,7,5,7]
b = [0.2,0.1,0.3,0.1,0.1,0.2]
</code></pre>
<p>I'd like to group by the elements in a:</p>
<pre><code>aResult = [7,3,5]
</code></pre>
<p>summing over the corresponding elements in b (the example summarizes a probability density function):</p>
<pre><code>bResult = [0.2 + 0.1 + 0.2, 0.1, 0.3 + 0.1] = [0.5, 0.1, 0.4]
</code></pre>
<p>Alternatively, random a and b in Python:</p>
<pre><code>import numpy as np
a = np.random.randint(1,10,10000)
b = np.array([1./len(a)]*len(a))
</code></pre>
<p>I have two approaches, both of which are surely far from the best achievable performance.</p>
<p>Approach 1 (at least nice and short). Time: 0.769315958023</p>
<pre><code>def approach_1(a,b):
    # the boolean mask i == a selects the entries of b that belong to group i
    bResult = [sum(b[i == a]) for i in np.unique(a)]
    aResult = np.unique(a)
    return aResult, bResult
</code></pre>
<p>Approach 2 (itertools.groupby, horribly slow). Time: 4.65299129486</p>
<pre><code>from itertools import groupby

def approach_2(a,b):
    tmp = [(a[i],b[i]) for i in range(len(a))]
    tmp2 = np.array(tmp, dtype = [('a', float),('b', float)])
    # groupby only merges adjacent equal keys, so sort by 'a' first
    tmp2 = np.sort(tmp2, order='a')
    bResult = []
    aResult = []
    for key, group in groupby(tmp2, lambda x: x[0]):
        aResult.append(key)
        bResult.append(sum([i[1] for i in group]))
    return aResult, bResult
</code></pre>
<p>Update: Approach 3, by Pablo. Time: 1.0265750885</p>
<pre><code>from collections import defaultdict

def approach_Pablo(a,b):
    # maps each value of a to the running sum of its entries in b
    pdf = defaultdict(int)
    for x,y in zip(a,b):
        pdf[x] += y
    return pdf
</code></pre>
<p>Update: Approach 4, by Unutbu. Time: 0.184849023819 [WINNER SO FAR, but works only for integer a]</p>
<pre><code>def unique_Unutbu(a,b):
    # bincount sums the weights b for each non-negative integer value of a
    x = np.bincount(a,weights=b)
    aResult = np.unique(a)
    bResult = x[aResult]
    return aResult, bResult
</code></pre>
<p>Maybe someone can find a smarter solution to this problem than I did :)</p>
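<p>One possible way to lift the integer-only restriction of Approach 4 is sketched below. It is not from the original post and has not been timed against the other approaches. It assumes a and b are 1-D NumPy arrays of equal length; the function name group_sum is hypothetical. The idea is to let np.unique map each value of a to a small integer index, then feed those indices to np.bincount instead of a itself.</p>

```python
import numpy as np

def group_sum(a, b):
    """Sum the entries of b grouped by the values of a (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b, dtype=float)
    # keys: sorted distinct values of a
    # inverse: for every element of a, the index of its value in keys
    keys, inverse = np.unique(a, return_inverse=True)
    # bincount adds up the weights b for each index, so a itself no longer
    # has to be a non-negative integer array
    sums = np.bincount(inverse, weights=b, minlength=len(keys))
    return keys, sums

# Usage with the small example from the question
a = [7, 3, 5, 7, 5, 7]
b = [0.2, 0.1, 0.3, 0.1, 0.1, 0.2]
keys, sums = group_sum(a, b)
```

<p>Note that keys comes back in sorted order (3, 5, 7 for the example above), not in order of first appearance as in the aResult shown in the question, and sums is aligned with keys.</p>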