**How to eliminate the alternation effect:**

The very last line of the Proof of Theorem 2 reads,

> By reversing the roles of H and W, the update rule for W can similarly be shown to be nonincreasing.

Thus we can surmise that updating `H` can be done independently of updating `W`. That means after updating `H`:

```python
H = H * H_coeff
```

we should also update the intermediate value `WH` before updating `W`:

```python
WH = W.dot(H)
W = W * W_coeff
```

Both updates decrease the divergence.

Try it: just stick `WH = W.dot(H)` before the computation for `W_coeff`, and the alternation effect goes away.

---

**Simplifying the code:**

When dealing with NumPy arrays, use their `mean` and `sum` methods, and avoid the Python builtin `sum` function:

```python
avg_V = sum(sum(V))/n/m
```

can be written as

```python
avg_V = V.mean()
```

and

```python
divergence = sum(sum(V * np.log(V/WH) - V + WH))  # equation (3)
```

can be written as

```python
divergence = ((V * np.log(V_over_WH)) - V + WH).sum()
```

Avoid the Python builtin `sum` function because

- it is slower than the NumPy `sum` method, and
- it is not as versatile as the NumPy `sum` method. (It does not allow you to specify the axis on which to sum. We managed to eliminate two calls to Python's `sum` with one call to NumPy's `sum` or `mean` method.)

---

**Eliminate the triple for-loop:**

A bigger improvement in both speed and readability can be had by replacing

```python
H_coeff = np.zeros(H.shape)
for a in range(r):
    for mu in range(m):
        for i in range(n):
            H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
        H_coeff[a, mu] /= sum(W)[a]
H = H * H_coeff
```

with

```python
V_over_WH = V/WH
H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
```

---

**Explanation:**

If you look at the equation (5) update rule for `H`, first notice that the indices for `V` and `(W H)` are identical. So you can replace `V / (W H)` with

```python
V_over_WH = V/WH
```

Next, note that in the numerator we are summing over the index i, which is the first index in both `W` and `V_over_WH`. We can express that as matrix multiplication:

```python
np.dot(V_over_WH.T, W).T
```

And the denominator is simply:

```python
W.sum(axis=0)
```

If we divide the numerator by the denominator

```python
(np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
```

we get a matrix indexed by the two remaining indices, alpha and mu, in that order. That is the same as the indices for `H`, so we want to multiply `H` by this ratio element-wise — and NumPy's `*` multiplies arrays element-wise by default. Thus, we can express the entire update rule for `H` as

```python
H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
```

---

**So, putting it all together:**

```python
import numpy as np
np.random.seed(1)

def update(V, W, H, WH, V_over_WH):
    # equation (5)
    H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
    WH = W.dot(H)
    V_over_WH = V / WH

    W *= np.dot(V_over_WH, H.T) / H.sum(axis=1)
    WH = W.dot(H)
    V_over_WH = V / WH
    return W, H, WH, V_over_WH

def factor(V, r, iterations=100):
    n, m = V.shape
    avg_V = V.mean()
    W = np.random.random(n * r).reshape(n, r) * avg_V
    H = np.random.random(r * m).reshape(r, m) * avg_V
    WH = W.dot(H)
    V_over_WH = V / WH
    for i in range(iterations):
        W, H, WH, V_over_WH = update(V, W, H, WH, V_over_WH)
        # equation (3)
        divergence = ((V * np.log(V_over_WH)) - V + WH).sum()
        print("At iteration {i}, the Kullback-Leibler divergence is {d}".format(
            i=i, d=divergence))
    return W, H

V = np.arange(0.01, 1.01, 0.01).reshape(10, 10)
# V = np.arange(1, 101).reshape(10, 10).astype('float')
W, H = factor(V, 6)
```
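To see the alternation fix in action, here is a minimal sketch (made-up 6×6 data and rank 2, not from the question) that refreshes `WH` between the two updates and checks that the divergence never increases:

```python
import numpy as np

def kl_divergence(V, W, H):
    # equation (3): generalized Kullback-Leibler divergence of V from WH
    WH = W.dot(H)
    return (V * np.log(V / WH) - V + WH).sum()

rng = np.random.RandomState(1)
V = rng.rand(6, 6) + 0.1      # strictly positive, as NMF assumes
W = rng.rand(6, 2) + 0.1
H = rng.rand(2, 6) + 0.1

divergences = [kl_divergence(V, W, H)]
for _ in range(20):
    WH = W.dot(H)
    H *= (np.dot((V / WH).T, W) / W.sum(axis=0)).T   # equation (5)
    WH = W.dot(H)              # the fix: refresh WH before updating W
    W *= np.dot(V / WH, H.T) / H.sum(axis=1)         # matching W update
    divergences.append(kl_divergence(V, W, H))

# Each combined update step is nonincreasing, as Theorem 2 promises.
print(all(b <= a + 1e-9 for a, b in zip(divergences, divergences[1:])))
```

Without the mid-step `WH = W.dot(H)`, the `W` update is computed against a stale `WH` and the divergence can bounce between iterations.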
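The difference between the builtin `sum` and the NumPy method can be checked at the interpreter; a small sketch with a made-up 2×3 array:

```python
import numpy as np

V = np.arange(1.0, 7.0).reshape(2, 3)   # [[1., 2., 3.], [4., 5., 6.]]

# Python's builtin sum iterates over the first axis only, so a 2-D
# array needs two nested calls to get a grand total:
print(sum(V))          # array([5., 7., 9.])
print(sum(sum(V)))     # 21.0

# The NumPy method sums everything in one call, and accepts an axis:
print(V.sum())         # 21.0
print(V.sum(axis=0))   # array([5., 7., 9.])
print(V.sum(axis=1))   # array([ 6., 15.])
print(V.mean())        # 3.5
```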
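As a sanity check on the vectorization of the triple for-loop, this sketch (with made-up small dimensions) computes the `H` coefficient both ways and compares:

```python
import numpy as np

rng = np.random.RandomState(0)
n, m, r = 5, 4, 3
V = rng.rand(n, m) + 0.1      # strictly positive, as NMF assumes
W = rng.rand(n, r) + 0.1
H = rng.rand(r, m) + 0.1
WH = W.dot(H)

# Triple for-loop version of the equation (5) coefficient for H.
H_coeff = np.zeros(H.shape)
for a in range(r):
    for mu in range(m):
        for i in range(n):
            H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
        H_coeff[a, mu] /= W.sum(axis=0)[a]

# Vectorized version: same numbers, no Python loops.
V_over_WH = V / WH
H_coeff_vectorized = (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T

print(np.allclose(H_coeff, H_coeff_vectorized))   # True
```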