NumPy reporting an invalid value while calculating normalized Mahalanobis distance
<h2><strong>Note</strong>:</h2>
<p>This is for a homework assignment in my data mining class.</p>
<p>I'm going to put the relevant code snippets in this SO post, but you can find my entire program at <a href="http://pastebin.com/CzNFbLJ2" rel="nofollow noreferrer">http://pastebin.com/CzNFbLJ2</a></p>
<p>The dataset I'm using for this program can be found at <a href="http://archive.ics.uci.edu/ml/datasets/Iris" rel="nofollow noreferrer">http://archive.ics.uci.edu/ml/datasets/Iris</a></p>
<hr>
<p>So I'm getting: <code>RuntimeWarning: invalid value encountered in sqrt return np.sqrt(m)</code></p>
<p>I am attempting to find the average Mahalanobis distance of the given Iris dataset (for both the raw and the normalized dataset). The warning only appears for the normalized version of the dataset, which makes me wonder whether I have misunderstood what normalization means (both in code and mathematically).</p>
<p>I thought that normalization means that each component of a vector is divided by its vector length (causing the vector to add up to 1). I found this SO question, <a href="https://stackoverflow.com/questions/8904694/how-to-normalize-a-2-dimensional-numpy-array-in-python-less-verbose">How to normalize a 2-dimensional numpy array in python less verbose?</a>, and thought it matched my concept of normalization. But now my code reports that the Mahalanobis distance over the normalized dataset is NaN.</p>
<pre><code>def mahalanobis(data):
    import numpy as np
    import scipy.spatial.distance
    avg = 0
    count = 0
    covar = np.cov(data, rowvar=0)
    invcovar = np.linalg.inv(covar)
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if(j == len(data)):
                break
            avg += scipy.spatial.distance.mahalanobis(data[i], data[j], invcovar)
            count += 1
    return avg / count

def normalize(data):
    import numpy as np
    row_sums = data.sum(axis=1)
    norm_data = np.zeros((50, 4))
    for i, (row, row_sum) in enumerate(zip(data, row_sums)):
        norm_data[i,:] = row / row_sum
    return norm_data
</code></pre>
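A likely explanation, sketched here under assumptions rather than verified against the linked program: dividing every row by its row sum forces each row to add up to 1, so the four columns become linearly dependent and the covariance matrix loses rank. Inverting a (numerically) singular matrix can yield a quadratic form that comes out negative, and the square root of a negative number is NaN. The sketch below uses NumPy only and hypothetical random data in place of the Iris values. The helper names `row_sum_normalize`, `zscore_normalize`, and `mean_pairwise_mahalanobis` are illustrative and do not appear in the original program. Per-feature z-scoring is shown as one alternative reading of "normalization", not necessarily the one the assignment intends.

```python
import numpy as np


def row_sum_normalize(data):
    """Divide each row by its own sum, so every row adds up to 1."""
    data = np.asarray(data, dtype=float)
    return data / data.sum(axis=1, keepdims=True)


def zscore_normalize(data):
    """Standardize each column (feature) to zero mean and unit variance."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)


def mean_pairwise_mahalanobis(data):
    """Average Mahalanobis distance over all unordered pairs of rows."""
    data = np.asarray(data, dtype=float)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    i, j = np.triu_indices(len(data), k=1)      # all pairs with i < j
    diff = data[i] - data[j]
    # Quadratic form diff^T * inv_cov * diff, evaluated for every pair at once.
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(squared).mean()


# Hypothetical stand-in data: 50 rows, 4 positive features (not the Iris values).
rng = np.random.default_rng(0)
sample = rng.uniform(1.0, 8.0, size=(50, 4))

row_norm = row_sum_normalize(sample)
z_norm = zscore_normalize(sample)

# Rows that sum to 1 make the columns linearly dependent, so the covariance
# matrix has rank 3 at most and np.linalg.inv on it is unreliable.
rank_row = np.linalg.matrix_rank(np.cov(row_norm, rowvar=False), tol=1e-10)
rank_z = np.linalg.matrix_rank(np.cov(z_norm, rowvar=False), tol=1e-10)
print("rank after row-sum normalization:", rank_row)
print("rank after z-score normalization:", rank_z)

# With a full-rank covariance the average distance is well defined.
print("mean distance (raw):    ", mean_pairwise_mahalanobis(sample))
print("mean distance (z-score):", mean_pairwise_mahalanobis(z_norm))
```

The Mahalanobis distance is unchanged by any invertible rescaling of the features, so the raw and z-scored averages agree up to rounding. If row-sum normalization is what the assignment requires, one option is to replace `np.linalg.inv` with `np.linalg.pinv`; whether that matches the intended definition is an open question here.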