    <p>To start with, neither solution (1) nor (2) helps you handle your data any better, since NaN is in fact a label that MATLAB handles appropriately; warnings will be issued. What you should do is:</p>
    <ol> <li>Handle the NaNs per case</li> <li>Use try/catch blocks</li> </ol>
    <p>NaN behaves like a number, and there is nothing bad about it. Even if you divide by NaN, MATLAB will treat it properly and give you a NaN.</p>
    <p>If you still want to replace them, then you will need an assumption that holds. For example, if your data is a timeseries of engine speeds entered by the engine operator, and some time instances have not been specified, then there is more than one way to handle the NaNs that will appear in the matrix.</p>
    <ol> <li>Replace with 0s</li> <li>Replace with the previous value</li> <li>Replace with the next value</li> <li>Replace with the average of the previous and the next value, and many more.</li> </ol>
    <p>As you can see, your problem is ill-posed and depends on the predictor and the data source.</p>
    <p>In the case of categorical data, e.g. three categories {0,1,2}, and supposing NaN occurs in Y:</p>
    <pre><code>for k = 1:size(Y,2)
    id = isnan(Y(:,k));        % rows with a missing value in column k
    m(k) = median(Y(~id,k));   % median of the observed values in column k
    Y(id,k) = round(m(k));     % round so the fill value is a valid category
end
</code></pre>
    <p>I feel really bad that I had to write a for-loop, but I cannot see any other way. As you can see, I made a number of assumptions by using <code>median</code> and <code>round</code>. You may want to use a threshold, depending on your knowledge of the data.</p>
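The four replacement policies listed above for a continuous timeseries could be sketched roughly as follows. This is only an illustration under stated assumptions: the vector `speed` and its values are made up, the first and last samples are assumed to be observed (otherwise the previous/next lookups have nothing to point at), and none of this is meant for nominal categories, where averaging neighbours has no meaning.

```matlab
% Hypothetical engine-speed series; NaN marks samples the operator did not enter.
% Assumption: the first and last samples are observed.
speed   = [900 NaN 1100 1200 NaN 1000];
missing = isnan(speed);
idx     = 1:numel(speed);

% 1. Replace with zeros.
withZeros = speed;
withZeros(missing) = 0;

% 2. Replace with the previous observed value:
%    carry the index of the last observed sample forward.
prevIdx  = cummax(idx .* ~missing);
withPrev = speed(prevIdx);

% 3. Replace with the next observed value:
%    carry the index of the next observed sample backward.
tmp = idx;
tmp(missing) = Inf;
nextIdx  = fliplr(cummin(fliplr(tmp)));
withNext = speed(nextIdx);

% 4. Replace with the average of the previous and the next observed value.
withMean = speed;
withMean(missing) = (withPrev(missing) + withNext(missing)) / 2;
```

Which of these is acceptable still depends on the predictor and on how the data were produced; the code only shows that each policy is a different assumption about the missing samples.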
    1. CO Hi, your suggestions (1-4) are great for real / continuous data. I specified that the data are 'categorical' (i.e. nominal, not even ordinal) to emphasise that simple interpolation or smoothing does not cut it in this case. Can you elaborate on your suggestion 'Handle NaNs per case'?
    2. CO @Berkan, hi, it is not clear to me from the description how the NaN occurs in your case. Usually it comes from 0/0 or Inf/Inf, or from missing values in the input data. I suppose in your case it is 0/0 or Inf/Inf. The reason I am not suggesting something specific is that you don't give enough details on the predictor. It is safe to assume, though, that you know more about that. One possible policy is to take the median of each column (ignoring the NaNs) and then replace the NaNs in that column with it. Another would be to take the mean, or to put in random values; it depends on the predictor.
    3. CO Sorry, I should have clarified this at the very beginning (I have now added it to the body of the question and am changing the title as well). In my case, NaNs represent missing (unobserved) values in the dataset. I have looked into replacing them by mean/median/mode imputation but have a feeling that it is not good. I have also read in a couple of places that it is bad practice. One reference suggests using maximum likelihood or multiple imputation, but I am still trying to get a grasp of it.