What's wrong with my PCA?
<p>My code:</p>
<pre><code>from numpy import *

def pca(orig_data):
    data = array(orig_data)
    # standardize each column (feature) to zero mean and unit variance
    data = (data - data.mean(axis=0)) / data.std(axis=0)
    u, s, v = linalg.svd(data)
    print(s)  # should be s**2 instead!
    print(v)

def load_iris(path):
    with open(path) as input_file:
        lines = input_file.readlines()
    data = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. at the end of the file
        cur_line = line.rstrip().split(',')
        cur_line = cur_line[:-1]  # drop the class label column
        cur_line = [float(elem) for elem in cur_line]
        data.append(array(cur_line))
    return array(data)

if __name__ == '__main__':
    data = load_iris('iris.data')
    pca(data)
</code></pre>
<p>The iris dataset: <a href="http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data" rel="noreferrer">http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data</a></p>
<p>Output:</p>
<pre><code>[ 20.89551896  11.75513248   4.7013819    1.75816839]
[[ 0.52237162 -0.26335492  0.58125401  0.56561105]
 [-0.37231836 -0.92555649 -0.02109478 -0.06541577]
 [ 0.72101681 -0.24203288 -0.14089226 -0.6338014 ]
 [ 0.26199559 -0.12413481 -0.80115427  0.52354627]]
</code></pre>
<p>Desired output:<br>
Eigenvalues - <code>[2.9108 0.9212 0.1474 0.0206]</code><br>
Principal components - same as I got, but transposed, so that part is okay I guess.</p>
<p>Also, what's with the output of the <code>linalg.eig</code> function? According to the PCA description on Wikipedia, I'm supposed to do this:</p>
<pre><code>cov_mat = cov(orig_data)
val, vec = linalg.eig(cov_mat)
print(val)
</code></pre>
<p>But it doesn't really match the output in the tutorials I found online. Plus, if I have 4 dimensions, I thought I should get 4 eigenvalues, not the 150 that <code>eig</code> gives me. Am I doing something wrong?</p>
<p><strong>edit</strong>: I've noticed that the squared singular values (<code>s**2</code>) differ from the desired eigenvalues by a factor of 150, which is the number of samples in the dataset. Also, the eigenvalues are supposed to add up to the number of dimensions, in this case 4. What I don't understand is why this difference happens. If I simply divide these squared values by <code>len(data)</code>, I get the result I want, but I don't understand why. Either way the proportions between the eigenvalues aren't altered, but the values themselves are important to me, so I'd like to understand what's going on.</p>
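On the 150 eigenvalues: `numpy.cov` treats each row as a variable by default (`rowvar=True`), so passing the 150 × 4 data matrix directly yields a 150 × 150 matrix and therefore 150 eigenvalues. A minimal sketch, using synthetic random data as a hypothetical stand-in for `iris.data`, that compares the shapes with and without `rowvar=False`. It uses `np.linalg.eigh` (the routine for symmetric matrices) in place of `eig`, which is a swapped-in choice rather than what the question used.

```python
import numpy as np

# Hypothetical stand-in for the iris measurements: 150 samples, 4 features.
# (Synthetic data, so the numbers differ from the real iris.data.)
rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))

# Default rowvar=True: each ROW is a variable, giving a 150 x 150 matrix.
cov_rows = np.cov(data)
print(cov_rows.shape)  # (150, 150)

# rowvar=False: each COLUMN is a variable, giving a 4 x 4 covariance matrix.
cov_cols = np.cov(data, rowvar=False)
print(cov_cols.shape)  # (4, 4)

# The covariance matrix is symmetric, so eigh applies; it returns eigenvalues
# in ascending order, so reorder them largest first, as is usual for PCA.
vals, vecs = np.linalg.eigh(cov_cols)
order = vals.argsort()[::-1]
vals, vecs = vals[order], vecs[:, order]
print(vals)  # 4 eigenvalues, largest first
```

Passing `data.T` to `np.cov` with the default `rowvar` gives the same 4 × 4 matrix.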
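On the factor of 150: `data.std(axis=0)` is the population standard deviation (`ddof=0`), so for the standardized matrix Z with n rows, `Z.T @ Z / n` is exactly the correlation matrix. Writing Z = U S Vᵀ gives Zᵀ Z = V S² Vᵀ, so the correlation matrix has eigenvalues `s**2 / n` and eigenvectors equal to the rows of the `v` returned by `numpy.linalg.svd` (it returns Vᵀ, hence the "transposed" components). With the singular values above, 20.89551896² / 150 ≈ 2.9108 and 11.75513248² / 150 ≈ 0.9212, matching the desired output. A correlation matrix has ones on its diagonal, so its trace, and therefore the sum of its eigenvalues, equals the number of variables (4). A sketch under these assumptions, on synthetic data standing in for `iris.data`; `pca_svd` is a hypothetical helper name:

```python
import numpy as np

def pca_svd(orig_data):
    """PCA via the SVD of the standardized data matrix (a sketch)."""
    data = np.asarray(orig_data, dtype=float)
    n = data.shape[0]
    # Standardize with the population std (ddof=0), as in the question.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    # Thin SVD: s holds one singular value per feature, largest first.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    # z.T @ z / n is the correlation matrix, with eigenvalues s**2 / n.
    eigvals = s ** 2 / n
    # Rows of vt are the principal axes; transpose so they become columns.
    return eigvals, vt.T

# Hypothetical correlated data standing in for iris.data (150 x 4).
rng = np.random.default_rng(1)
data = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))

eigvals, components = pca_svd(data)
print(eigvals, eigvals.sum())  # sum is 4 up to floating-point rounding

# Cross-check against the eigenvalues of the correlation matrix.
corr = np.corrcoef(data, rowvar=False)
check = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(np.allclose(eigvals, check))  # True
```

If the standardized matrix is instead passed to `np.cov(..., rowvar=False)`, which divides by n - 1 by default, its eigenvalues come out as `s**2 / (n - 1)`. The eigenvectors and the proportions between the eigenvalues are the same either way.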