How to get data in a histogram bin
<p>I want to get a list of the data contained in a histogram bin. I am using NumPy and Matplotlib. I know how to traverse the data and check the bin edges, but I want to do this for a 2D histogram, and the code to do that is rather ugly. Does NumPy have any constructs to make this easier?</p>
<p>For the 1D case, I can use <code>searchsorted()</code>, but the logic is not much better, and I don’t really want to do a binary search on each data point when I don’t have to.</p>
<p>Most of the nasty logic comes from the bin boundaries. Every bin is half-open, [left edge, right edge), except the last bin, which is closed: [left edge, right edge].</p>
<p>Here is some sample code for the 1D case:</p>
<pre><code>import numpy as np

data = [0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3]
hist, edges = np.histogram(data, bins=3)
print('data =', data)
print('histogram =', hist)
print('edges =', edges)

getbin = 2  # 0, 1, or 2

print('---')
print('alg 1:')
for d in data:
    if d &gt;= edges[getbin]:
        # the last bin (index len(edges) - 2) is closed on the right
        if getbin == len(edges) - 2 or d &lt; edges[getbin + 1]:
            print('found:', d)

print('---')
print('alg 2:')
for d in data:
    val = np.searchsorted(edges, d, side='right') - 1
    if val == len(edges) - 1:
        # d equals edges[-1]; the last bin is closed, so d belongs to it
        val -= 1
    if val == getbin:
        print('found:', d)
</code></pre>
<p>Here is some sample code for the 2D case:</p>
<pre><code>import numpy as np

xdata = [0, 1.5, 1.5, 2.5, 2.5, 2.5,
         0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
         0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 3]
ydata = [0, 5, 5, 5, 5, 5,
         15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
         25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 30]
xbins = 3
ybins = 3
hist2d, xedges, yedges = np.histogram2d(xdata, ydata, bins=(xbins, ybins))
print('data2d =', list(zip(xdata, ydata)))
print('hist2d =')
print(hist2d)
print('xedges =', xedges)
print('yedges =', yedges)

getbin2d = 5  # 0 through 8
print('find data in bin #', getbin2d)
xedge_i = getbin2d % xbins
yedge_i = getbin2d // xbins  # IMPORTANT: divide by xbins, not ybins

for x, y in zip(xdata, ydata):
    # x and y left edges
    if x &gt;= xedges[xedge_i] and y &gt;= yedges[yedge_i]:
        # x right edge (the last x bin is closed)
        if xedge_i == xbins - 1 or x &lt; xedges[xedge_i + 1]:
            # y right edge (the last y bin is closed)
            if yedge_i == ybins - 1 or y &lt; yedges[yedge_i + 1]:
                print('found:', x, y)
</code></pre>
<p>Is there a cleaner or more efficient way to do this? It seems like NumPy would have something for this.</p>
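<p>One possible vectorized approach is sketched below. It is not from the original post and is a minimal sketch under stated assumptions. It calls <code>np.digitize</code> against only the interior edges, <code>edges[1:-1]</code>, which reproduces the half-open bins and also places values equal to the last edge in the last bin. The sketch assumes every value lies within the outer edges, which holds when the edges come from <code>np.histogram</code> or <code>np.histogram2d</code> on the same data. The helper name <code>bin_indices</code> and the small 2D sample are hypothetical and are used only for illustration.</p>
<pre><code>import numpy as np


def bin_indices(values, edges):
    # Hypothetical helper, not part of NumPy: the bin index of each value.
    # Digitizing against the interior edges gives half-open bins
    # [left, right) and puts values equal to edges[-1] in the last bin,
    # like np.histogram. Assumes all values lie within [edges[0], edges[-1]];
    # anything outside would be clamped into the first or last bin.
    return np.digitize(values, edges[1:-1])


# 1D case, using the data from the question
data = np.array([0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3])
hist, edges = np.histogram(data, bins=3)
getbin = 2
in_bin = data[bin_indices(data, edges) == getbin]
print('found:', in_bin)

# 2D case, using a small illustrative sample (not the question's data)
xdata = np.array([0, 0.5, 1.5, 2.5, 2.5, 3])
ydata = np.array([0, 15, 5, 25, 25, 30])
hist2d, xedges, yedges = np.histogram2d(xdata, ydata, bins=(3, 3))
xi, yi = 2, 2  # hist2d[xi, yi] is the count for this bin
mask = np.logical_and(bin_indices(xdata, xedges) == xi,
                      bin_indices(ydata, yedges) == yi)
found_2d = list(zip(xdata[mask].tolist(), ydata[mask].tolist()))
print('found:', found_2d)
</code></pre>
<p>The boolean mask replaces the per-point edge checks, so neither the 1D nor the 2D case needs special handling for the last bin. If the edges were chosen independently of the data, filter out values outside the outer edges first.</p>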