
There are a couple of points here.

1. **The use of 0.5:** If you are predicting a binary outcome, a logistic model (or any similar model that estimates the probability of an event) uses 0.5 as the default cutpoint because above that value the model says an event is more likely, and below it an event is less likely. You can change the cutpoint as needed, but it is not always as simple as it seems. That brings me to the second point.

2. **Specifying the cutpoint:** There are two ways to specify the cutpoint. One is to use *a priori* knowledge about the system you are modelling. This could involve reasoning like the following: the event is very rare, so we set the cutpoint high to avoid too many false positives; or the event is really bad, so we want to catch as many events as possible, and we set the cutpoint low. You can also use the results of the model to choose the cutpoint, but you have to be careful: model performance statistics are **biased** when calculated on the same dataset used to fit the model.

To avoid this bias, you can use cross-validation. It is easy to program yourself, which keeps it flexible. Leave-one-out cross-validation goes like this:

```r
n.subjects  <- nrow(data)
predictions <- character(n.subjects)   # pre-allocate the output vector

for (subject in 1:n.subjects) {
  subset <- data[-subject, ]           # all rows except the held-out subject
  # Fit the model on `subset`
  # Find the cutpoint (using your code above)
  predicted.value <- predict(model, newdata = data[subject, ], type = "response")
  if (predicted.value < cut.point) {
    predictions[subject] <- 'No Event'
  } else {
    predictions[subject] <- 'Event'
  }
}
```

Now you can look at the sensitivity and specificity of your model based on the vector `predictions`. This will allow you to assess the ability of your algorithm to find a good cutpoint.

A better way would be to set aside some of your data as a 'validation' set. Using the code above, find an optimal cutpoint (tweak the algorithm until you are happy, then obtain the cutpoint by fitting the model on the entire dataset minus the validation set). Then predict on the 'validation' set and calculate the model performance.
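As a sketch of that last evaluation step, sensitivity and specificity can be computed directly from the cross-validated `predictions` vector by cross-tabulating it against the observed outcomes. The small `actual` vector below is made-up illustrative data; substitute your own outcome column and the `predictions` vector produced by the loop above:

```r
# Hypothetical observed outcomes and cross-validated predictions
# (illustrative only -- replace with your own vectors).
actual      <- c('Event', 'Event', 'No Event', 'No Event', 'Event')
predictions <- c('Event', 'No Event', 'No Event', 'Event', 'Event')

# Confusion counts
tp <- sum(predictions == 'Event'    & actual == 'Event')     # true positives
tn <- sum(predictions == 'No Event' & actual == 'No Event')  # true negatives
fn <- sum(predictions == 'No Event' & actual == 'Event')     # false negatives
fp <- sum(predictions == 'Event'    & actual == 'No Event')  # false positives

sensitivity <- tp / (tp + fn)  # proportion of real events that were caught
specificity <- tn / (tn + fp)  # proportion of non-events correctly ruled out
```

Because these counts come from held-out predictions rather than in-sample fits, the resulting sensitivity and specificity are honest estimates of how the whole procedure, cutpoint selection included, generalizes.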
 
