<p>I posted my answer even though another answer has already been accepted; the accepted answer relies on a <a href="http://matplotlib.org/api/mlab_api.html?highlight=mlab#deprecated-functions" rel="noreferrer">deprecated function</a>; additionally, this deprecated function is based on <em>Singular Value Decomposition</em> (SVD), which (although perfectly valid) is by far the more memory- and processor-intensive of the two general techniques for calculating PCA. This is particularly relevant here because of the size of the data array in the OP. Using covariance-based PCA, the array used in the computation is just <em>144 x 144</em>, rather than <em>26424 x 144</em> (the dimensions of the original data array).</p> <p>Here's a simple working implementation of PCA using the <strong><em>linalg</em></strong> module from <em>SciPy</em>. Because this implementation first calculates the covariance matrix, and then performs all subsequent calculations on this array, it uses far less memory than SVD-based PCA. 
</p> <p>(The <em>linalg</em> module in <em>NumPy</em> can also be used with no change in the code below aside from the import statement, which would be <em>from numpy import linalg as LA</em>.)</p> <p>The two key steps in this PCA implementation are:</p> <ul> <li><p>calculating the <strong><em>covariance matrix</em></strong>; and</p></li> <li><p>taking the <strong><em>eigenvectors</em></strong> &amp; <strong><em>eigenvalues</em></strong> of this <em>cov</em> matrix</p></li> </ul> <p>In the function below, the parameter <strong>dims_rescaled_data</strong> refers to the desired number of dimensions in the <em>rescaled</em> data matrix; it has a default value of two, but the code isn't limited to two dimensions; it can be <em>any</em> value less than the number of columns in the original data array.</p> <hr> <pre><code>import numpy as NP
from scipy import linalg as LA

def PCA(data, dims_rescaled_data=2):
    """
    returns: the data transformed into dims_rescaled_data dims/columns,
             plus the eigenvalues and eigenvectors of the covariance matrix
    pass in: data as a 2D NumPy array
    """
    m, n = data.shape
    # mean center the data
    data -= data.mean(axis=0)
    # calculate the covariance matrix
    R = NP.cov(data, rowvar=False)
    # calculate eigenvectors &amp; eigenvalues of the covariance matrix
    # use 'eigh' rather than 'eig' since R is symmetric,
    # the performance gain is substantial
    evals, evecs = LA.eigh(R)
    # sort eigenvalues in decreasing order
    idx = NP.argsort(evals)[::-1]
    # sort eigenvectors according to the same index
    evecs = evecs[:, idx]
    evals = evals[idx]
    # select the first n eigenvectors (n is the desired dimension
    # of the rescaled data array, i.e., dims_rescaled_data)
    evecs = evecs[:, :dims_rescaled_data]
    # carry out the transformation on the data using the eigenvectors
    # and return the re-scaled data, eigenvalues, and eigenvectors
    return NP.dot(evecs.T, data.T).T, evals, evecs

def test_PCA(data):
    '''
    test by attempting to recover the (mean-centered) data array from
    the eigenvectors of its covariance matrix &amp; comparing that
    'recovered' array with the original data
    '''
    _, n = data.shape
    # keep all n dimensions so the reconstruction is exact;
    # note that PCA() mean-centers `data` in place
    data_rescaled, evals, evecs = PCA(data, dims_rescaled_data=n)
    data_recovered = NP.dot(data_rescaled, evecs.T)
    assert NP.allclose(data, data_recovered)

def plot_pca(data):
    from matplotlib import pyplot as MPL
    clr1 = '#2026B2'
    fig = MPL.figure()
    ax1 = fig.add_subplot(111)
    data_resc, evals, evecs = PCA(data)
    ax1.plot(data_resc[:, 0], data_resc[:, 1], '.', mfc=clr1, mec=clr1)
    MPL.show()

&gt;&gt;&gt; # iris, probably the most widely used reference data set in ML
&gt;&gt;&gt; df = "~/iris.csv"
&gt;&gt;&gt; data = NP.loadtxt(df, delimiter=',')
&gt;&gt;&gt; # remove the class labels in the last column
&gt;&gt;&gt; data = data[:, :-1]
&gt;&gt;&gt; plot_pca(data)
</code></pre> <p>The plot below is a visual representation of this PCA function applied to the iris data. As you can see, a 2D transformation cleanly separates class I from class II and class III (but not class II from class III, which in fact requires another dimension).</p> <p><img src="https://i.stack.imgur.com/vxoxd.png" alt="2D PCA projection of the iris data"></p>
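As a cross-check of the covariance-based approach described above, here's a minimal sketch (NumPy only; the random toy array is a stand-in for real data, not from the original answer) showing that projecting onto the top eigenvectors of the covariance matrix agrees, up to the sign of each component, with the projection obtained from an SVD of the mean-centered data, which is the other general technique mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # toy stand-in for a real data array
Xc = X - X.mean(axis=0)            # mean center, as in PCA() above

# covariance route: eigendecomposition of the 5 x 5 covariance matrix
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(evals)[::-1]    # eigenvalues in decreasing order
proj_cov = Xc @ evecs[:, order[:2]]

# SVD route: right singular vectors of the 100 x 5 centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
proj_svd = Xc @ Vt[:2].T

# the two 2D projections agree up to the sign of each component
assert np.allclose(np.abs(proj_cov), np.abs(proj_svd))
```

Both routes span the same principal subspace; the covariance route just works on the much smaller n x n matrix, which is the memory advantage claimed above.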