MATLAB: Missing data handling in categorical data
    <p>I am trying to pass my dataset to the MATLAB function <code>[ranked,weights] = relieff(X,Ylogical,10, 'categoricalx', 'on')</code> to rank the importance of my predictor features. The <code>dataset&lt;double n*m&gt;</code> has <code>n</code> observations and <code>m</code> discrete (i.e. categorical) features. It so happens that every observation (row) in my dataset has at least one NaN value. These NaNs represent unobserved (i.e. missing or null) predictor values. (The dataset is not corrupted, it is just incomplete.)</p>
    <p><strong>relieff()</strong> uses the function below to remove any rows that contain a NaN:</p>
<pre><code>function [X,Y] = removeNaNs(X,Y)
% Remove observations with missing data
NaNidx = bsxfun(@or,isnan(Y),any(isnan(X),2));
X(NaNidx,:) = [];
Y(NaNidx,:) = [];
</code></pre>
    <p>This is not ideal, especially in my case, since it leaves me with <code>X=[]</code> and <code>Y=[]</code> (i.e. no observations!)</p>
    <p>In this case:</p>
    <p><strong>1)</strong> Would replacing all NaNs with an arbitrary placeholder value, e.g. 99999, help? Doing so introduces a new feature state for every predictor feature, so I guess it is not ideal.</p>
    <p><strong>2)</strong> Or is replacing the NaNs with the mode of the corresponding feature column (as below) statistically more sound? (I am not vectorising, for clarity's sake.)</p>
<pre><code>function [matrixdata] = replaceNaNswithModes(matrixdata)
% Replace the NaNs in each column with that column's mode
for i = 1:size(matrixdata,2)
    cv = matrixdata(:,i);
    modevalue = mode(cv);        % most frequent value in this column
    cv(isnan(cv)) = modevalue;   % logical indexing, no find() needed
    matrixdata(:,i) = cv;
end
</code></pre>
    <p><strong>3)</strong> Or is there any other approach that would make sense for "categorical" data?</p>
    <p>P.S.: <a href="http://www.dtreg.com/MissingValues.htm" rel="nofollow">This link</a> gives possible ways to handle missing data.</p>
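
For reference, option 2 could also be written without the loop. This is only a minimal sketch: the function name `fillNaNsWithColumnModes` is hypothetical, and it assumes that `mode` skips NaN values when computing each column's most frequent value (which is my understanding of its documented behaviour). A column consisting entirely of NaNs would stay NaN.

```matlab
function X = fillNaNsWithColumnModes(X)
% Hypothetical helper: replace every NaN with the mode of its column.
colModes = mode(X, 1);                      % 1-by-m row of per-column modes
modeGrid = repmat(colModes, size(X,1), 1);  % expand to the n-by-m shape of X
nanMask  = isnan(X);                        % true where a value is missing
X(nanMask) = modeGrid(nanMask);             % fill only the missing entries
end
```

It does the same thing as the loop above, so it says nothing about whether mode imputation is statistically appropriate for this data.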