Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>When you are trying to optimise, the thing you <strong><em>have</em></strong> to do is profile and measure! Python provides the <code>timeit</code> module which makes measuring things easy!</p> <p>This will assume that you've converted fields to a list of floats beforehand (outside any of these functions), since the string &rarr; float conversion is very slow. You can do this via <code>fields = [float(f) for f in string_fields]</code>.</p> <p>Also, for doing numerical processing, pure python isn't very good, since it ends up doing a lot of type-checking (and some other stuff) for each operation. Using a C library like <a href="http://numpy.scipy.org/" rel="nofollow">numpy</a> will give massive improvements.</p> <h2>find_best</h2> <p>I have incorporated the answers of others (and a few more) into a profiling suite (say, <code>test_find_best.py</code>):</p> <pre><code>import random, operator, numpy as np, itertools, timeit fields = [random.random() for _ in range(3000)] fields_string = [str(field) for field in fields] weights = [[random.random() for _ in range(3000)] for c in range(100)] npw = np.array(weights) npf = np.array(fields) num_fields = len(fields) num_category = len(weights) def f_original(): winner = -1 best = -float('inf') for c in range(num_category): score = 0 for i in range(num_fields): score += float(fields_string[i]) * weights[c][i] if score &gt; best: best = score winner = c def f_original_no_string(): winner = -1 best = -float('inf') for c in range(num_category): score = 0 for i in range(num_fields): score += fields[i] * weights[c][i] if score &gt; best: best = score winner = c def f_original_xrange(): winner = -1 best = -float('inf') for c in xrange(num_category): score = 0 for i in xrange(num_fields): score += fields[i] * weights[c][i] if score &gt; best: best = score winner = c # Zenon http://stackoverflow.com/a/10134298/1256624 def f_index_comprehension(): winner = -1 best = -float('inf') for c in range(num_category): score = sum(fields[i] * weights[c][i] for i in xrange(num_fields)) if score &gt; best: best = score winner = c # steveha http://stackoverflow.com/a/10134247/1256624 def f_comprehension(): winner = -1 best = -float('inf') for c in xrange(num_category): score = sum(f * w for f, w in itertools.izip(fields, weights[c])) if score &gt; best: best = score winner = c def f_schwartz_original(): # https://en.wikipedia.org/wiki/Schwartzian_transform tup = max(((i, sum(t[0] * t[1] for t in itertools.izip(fields, wlist))) for i, wlist in enumerate(weights)), key=lambda t: t[1] ) def f_schwartz_opt(): # https://en.wikipedia.org/wiki/Schwartzian_transform tup = max(((i, sum(f * w for f,w in itertools.izip(fields, wlist))) for i, wlist in enumerate(weights)), key=operator.itemgetter(1) ) def fweight(field_float_list, wlist): f = iter(field_float_list) return sum(f.next() * w for w in wlist) def f_schwartz_iterate(): tup = max( ((i, fweight(fields, wlist)) for i, wlist in enumerate(weights)), key=lambda t: t[1] ) # Nolen Royalty http://stackoverflow.com/a/10134147/1256624 def f_numpy_mult_sum(): np.argmax(np.sum(npf * npw, axis = 1)) # me def f_imap(): winner = -1 best = -float('inf') for c in xrange(num_category): score = sum(itertools.imap(operator.mul, fields, weights[c])) if score &gt; best: best = score winner = c def f_numpy(): np.argmax(npw.dot(npf)) for f in [f_original, f_index_comprehension, f_schwartz_iterate, f_original_no_string, f_schwartz_original, f_original_xrange, f_schwartz_opt, f_comprehension, f_imap]: print "%s: %.2f ms" % (f.__name__, timeit.timeit(f,number=10)/10 * 1000) for f in [f_numpy_mult_sum, f_numpy]: print "%s: %.2f ms" % (f.__name__, timeit.timeit(f,number=100)/100 * 1000) </code></pre> <p>Running <code>python test_find_best.py</code> gives me:</p> <pre><code>f_original: 310.34 ms f_index_comprehension: 102.58 ms f_schwartz_iterate: 103.39 ms f_original_no_string: 96.36 ms f_schwartz_original: 90.52 ms f_original_xrange: 89.31 ms f_schwartz_opt: 69.48 ms f_comprehension: 68.87 ms f_imap: 53.33 ms f_numpy_mult_sum: 3.57 ms f_numpy: 0.62 ms </code></pre> <p>So the numpy version using <code>.dot</code> (sorry, I can't find the documentation for it atm) is the fastest. If you are doing a lot of numerical operations (which it seems you are), it might be worth converting <code>fields</code> and <code>weights</code> as numpy arrays as soon as you create them.</p> <h2>update_weights</h2> <p>Numpy is likely to offer a similar speed-up for <code>update_weights</code>, doing something like:</p> <pre><code>def update_weights(weights, fields, to_increase, to_decrease): weights[to_increase,:] += fields weights[to_decrease,:] -= fields return weights </code></pre> <p>(I haven't tested or profiled that btw, you need to do that.)</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload