  1. How to use classwt in randomForest of R?
    <p>I have a highly imbalanced data set with target class instances in the following ratio <em>(edit:)</em> <s>60000:1000:1000:1000</s> <code>60000:1000:1000:50</code> (i.e. a total of 4 classes). I want to use <code>randomForest</code> to predict the target class.</p>
    <p>To reduce the class imbalance, I experimented with the <code>sampsize</code> parameter, setting it to <em>(edit:)</em> <s><code>c(5000, 1000, 1000, 1000)</code></s> <code>c(5000, 1000, 1000, 50)</code> and some other values, but it did not help much. In fact, the accuracy on the 1st class decreased as I changed <code>sampsize</code>, while the improvement in the predictions for the other classes was very small.</p>
    <p>While digging through the archives, I came across two more arguments of <code>randomForest()</code>, <code>strata</code> and <code>classwt</code>, that are used to offset class imbalance.</p>
    <p>All the documents on <code>classwt</code> were old (generally from 2007 and 2008), and all of them suggested not using the <code>classwt</code> feature of the <code>randomForest</code> package in <code>R</code>, as it does not implement the full functionality that the <code>fortran</code> version does. So the first question is:<br> <strong>Is <code>classwt</code> completely implemented now in the <code>randomForest</code> package of R? If yes, what does passing <code>c(1, 10, 10, 10)</code> to the <code>classwt</code> argument represent?</strong> (Assuming the above case of 4 classes in the target variable.)</p>
    <p>Another approach said to offset class imbalance is stratified sampling, which is always used in conjunction with <code>sampsize</code>. I understand what <code>sampsize</code> is from the documentation, but I could not find enough documentation or examples that give a clear insight into using <code>strata</code> to overcome class imbalance. So the second question is:<br> <strong>What type of arguments have to be passed to <code>strata</code> in <code>randomForest</code> and what does it represent?</strong></p>
    <p>I guess the word <strong><em>weight</em></strong>, which I have not explicitly mentioned in the question, should play a major role in the answer.</p>
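    <p>To make the two questions concrete, here is a minimal sketch of the calls being asked about. The data are synthetic and scaled down (the class names <code>a</code> to <code>d</code>, the features <code>f1</code> and <code>f2</code>, and the counts are assumptions made up for illustration, not my real data). It only shows how <code>strata</code>, <code>sampsize</code> and <code>classwt</code> are passed; it does not settle how <code>classwt</code> is applied internally, which is the open question.</p>

```r
library(randomForest)

set.seed(42)

# Hypothetical, scaled-down stand-in for the 60000:1000:1000:50 class ratio.
counts <- c(a = 6000, b = 100, c = 100, d = 5)
y <- factor(rep(names(counts), times = counts))

# Two made-up numeric features; f1 is shifted per class so there is some signal.
x <- data.frame(
  f1 = rnorm(length(y), mean = as.integer(y)),
  f2 = rnorm(length(y))
)

# Stratified sampling: strata is a factor with one value per row, and
# sampsize gives the number of rows drawn from each stratum for every tree
# (one entry per level, assumed to follow the order of levels(y)).
fit_strata <- randomForest(
  x, y,
  ntree    = 50,
  strata   = y,
  sampsize = c(500, 100, 100, 5)
)

# Class weights: one entry per class, again in the order of levels(y).
# The package documentation describes these as class priors that need
# not add up to one.
fit_wt <- randomForest(
  x, y,
  ntree   = 50,
  classwt = c(1, 10, 10, 10)
)

# Per-class out-of-bag confusion matrices for comparing the two fits.
print(fit_strata$confusion)
print(fit_wt$confusion)
```

    <p>Comparing the <code>class.error</code> column of the two confusion matrices is how I have been judging whether a setting helps the minority classes.</p>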