    <p>An "outlier" in the terminology of box-and-whisker plots is any point in the data set that falls farther than a specified distance from the box, typically <em>approximately</em> 1.5 times the interquartile range (the difference between the 0.25 and 0.75 quantiles) beyond the 0.25 (lower) or 0.75 (upper) quantile. To see where this comes from, read <code>?boxplot.stats</code>: first, look at the definition of <code>out</code> in the output:</p>
    <blockquote> <p><code>out</code>: the values of any data points which lie beyond the extremes of the whiskers (<code>if(do.out)</code>).</p> </blockquote>
    <p>These are the "outliers".</p>
    <p>Second, look at the definition of the whiskers, which are based on the <code>coef</code> parameter, which is 1.5 by default:</p>
    <blockquote> <p>the whiskers extend to the most extreme data point which is no more than <code>coef</code> times the length of the box away from the box.</p> </blockquote>
    <p>Finally, look at the definition of the "hinges", which are the ends of the box:</p>
    <blockquote> <p>The two ‘hinges’ are versions of the first and third quartile, i.e., close to <code>quantile(x, c(1,3)/4)</code>.</p> </blockquote>
    <p>Put these together, and you get outliers defined as points that lie more than <code>coef</code> (1.5 by default) times the length of the box beyond the nearer hinge, where the length of the box is approximately the interquartile range. For roughly symmetric data, that puts the cutoff about four times as far from the median as the relevant quartile is. The reasons for these somewhat convoluted definitions are (I think) partly historical and partly the desire to have the components of the plot reflect actual values that are present in the data (rather than, say, the halfway point between two data points) as much as possible. (You would probably need to go back to the original literature referenced in the help page for the full justifications and explanations.)</p>
    <p>The thing to be careful about is that <strong>points defined as "outliers" by this algorithm are not necessarily outliers in the usual statistical sense (e.g., points that are surprisingly extreme based on a particular statistical model of the data)</strong>. In particular, if you have a big data set you will necessarily see lots of "outliers": for normally distributed data, roughly 0.7% of the points fall beyond the whiskers under the default rule. That is one indication that you might want to switch to a more data-hungry graphical summary such as a violin plot or beanplot.</p>
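<p>The definitions quoted from <code>?boxplot.stats</code> can be checked by hand. The following is a minimal sketch, not part of the help page: the sample vector <code>x</code>, the variable names, and the seed are illustrative assumptions. It recomputes the outliers from the hinges and <code>coef</code>, then counts how many points get flagged in normal samples of increasing size.</p>

```r
# Illustrative data with one clearly extreme value (an assumption, not from the help page).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

s <- boxplot.stats(x, coef = 1.5)   # coef = 1.5 is the documented default

hinges  <- s$stats[c(2, 4)]         # lower and upper hinge: the ends of the box
box_len <- diff(hinges)             # length of the box, roughly the IQR

# The farthest a whisker may reach: coef times the box length beyond each hinge.
lower_limit <- hinges[1] - 1.5 * box_len
upper_limit <- hinges[2] + 1.5 * box_len

# Points beyond those limits, computed by hand.
manual_out <- x[x < lower_limit | x > upper_limit]

print(s$out)        # outliers reported by boxplot.stats
print(manual_out)   # outliers recomputed from the definitions

# Number of flagged points in well-behaved normal samples of growing size.
set.seed(1)         # arbitrary seed, only for reproducibility
sizes <- c(100, 10000, 100000)
n_out <- sapply(sizes, function(n) length(boxplot.stats(rnorm(n))$out))
print(data.frame(n = sizes, flagged = n_out))
```

<p>For this sample the hand computation and <code>boxplot.stats</code> flag the same single point. The flagged counts in the second part grow with the sample size even though every value is drawn from the same normal distribution, which is why these points should not be read as outliers in the model-based sense.</p>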