Python: how to make a histogram with equally *sized* bins
<p>I have a set of data, and want to make a histogram of it. I need the bins to have the same <em>size</em>, by which I mean that they must contain the same number of objects, rather than the more common (numpy.histogram) problem of having <em>equally spaced</em> bins. This will naturally come at the expense of the bin widths, which can (and in general will) be different.</p>
<p>I will specify the data set and the desired number of bins, and get the bin edges in return.</p>
<pre><code>Example:
data = numpy.array([1., 1.2, 1.3, 2.0, 2.1, 2.12])
bins_edges = somefunc(data, nbins=3)
print(bins_edges)
&gt;&gt; [1.,1.3,2.1,2.12]
</code></pre>
<p>So the bins all contain 2 points (treating them as half-open intervals [1, 1.3), [1.3, 2.1) and [2.1, 2.12], as numpy.histogram does), but their widths (0.3, 0.8, 0.02) are different.</p>
<p>There are two limitations:</p>
<ul>
<li>if several data points share the same value, the bin containing them could be bigger;</li>
<li>if there are N data points and M bins are requested, there will be M bins of N/M points each (integer division), plus one extra bin holding the remaining N%M points if N%M is not 0.</li>
</ul>
<p>This piece of code is some cruft I've written, which worked nicely for small data sets.
What if I have 10**9+ points and want to speed up the process?</p>
<pre><code>import numpy as np


def def_equbin(in_distr, binsize=None, bin_num=None):
    # binsize is unused; bin_num (the number of equal-count bins) is required.
    if bin_num is None:
        raise ValueError("bin_num must be given")
    in_distr = np.asarray(in_distr)

    distr_size = len(in_distr)

    # Points per bin (integer division) and number of leftover points.
    bin_size = distr_size // bin_num
    odd_bin_size = distr_size % bin_num

    # Indices that sort the data; argsort of the full array costs O(N log N).
    args = in_distr.argsort()

    # Row i holds the values falling in the i-th equal-count bin.
    hist = np.zeros((bin_num, bin_size))
    for i in range(bin_num):
        hist[i, :] = in_distr[args[i * bin_size: (i + 1) * bin_size]]

    if odd_bin_size == 0:
        odd_bin = None
        # Sorted-order positions of each bin's lower edge.
        bins_limits = np.arange(bin_num) * bin_size
    else:
        # Leftover points form an extra, smaller bin.
        odd_bin = in_distr[args[bin_num * bin_size:]]
        bins_limits = np.arange(bin_num + 1) * bin_size

    # Convert positions to values and append the maximum as the last edge.
    bins_limits = in_distr[args[bins_limits]]
    bins_limits = np.concatenate((bins_limits, [in_distr[args[-1]]]))

    return (hist, odd_bin, bins_limits)
</code></pre>
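One possible direction for the 10**9-point case, sketched under assumptions: the edges only depend on a few order statistics (the values at sorted ranks 0, N//M, 2*(N//M), ..., plus the maximum), so a full `argsort` is not strictly needed to get them. The helper `equal_count_edges` below is hypothetical (not part of NumPy and not the asker's code); it uses `np.partition`, which moves each requested rank to its sorted position with a selection algorithm (introselect by default) instead of sorting the whole array. It follows the same edge convention as `def_equbin`, including the extra start edge for the N%M leftover points, but it returns only the edges, not the per-bin members.

```python
import numpy as np


def equal_count_edges(data, nbins):
    """Hypothetical helper: bin edges such that each of the first `nbins`
    bins holds len(data) // nbins points.

    Edge convention (same as def_equbin): the lower edge of bin i is the
    value at sorted rank i * (len(data) // nbins); if there are leftover
    points, one extra bin is started for them; the last edge is the
    maximum. Ties at an edge value can shift points between bins.
    """
    data = np.asarray(data).ravel()
    n = data.size
    if not 1 <= nbins <= n:
        raise ValueError("nbins must be between 1 and len(data)")
    size = n // nbins
    n_starts = nbins + (1 if n % nbins else 0)

    # Sorted ranks we need: the start of every bin, then the maximum.
    kth = np.append(np.arange(n_starts) * size, n - 1)

    # np.partition puts every requested rank at its sorted position without
    # fully sorting; np.unique drops a possible duplicate rank (n - 1).
    part = np.partition(data, np.unique(kth))
    return part[kth]


if __name__ == "__main__":
    data = np.array([1., 1.2, 1.3, 2.0, 2.1, 2.12])
    edges = equal_count_edges(data, 3)
    counts, _ = np.histogram(data, bins=edges)
    print(edges, counts)
```

With the example data this yields the edges `[1., 1.3, 2.1, 2.12]`, and `np.histogram` then counts 2 points in each bin. Whether it actually beats `argsort` depends on `nbins` and the data, so it is worth timing on a realistic sample. Since `np.partition` returns a partitioned copy, the in-place method `data.partition(kth)` may be preferable for arrays close to the memory limit, at the cost of reordering `data`.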
 
