  1. Puzzling behavior in simple logical test applied to vector of values
<p>Ok, this has me absolutely perplexed and worried. As part of a routine, I have been classifying individual observations of variables as "High" or "Low" based on whether their values are above or below/equal to the median value. However, I have been getting behavior in R that I did not expect from such a simple test.</p>
<p>So take this set of observations:</p>
<pre><code>data=c(0.6666667, 0.8333, 0.6666667, 0.8333, 0.8333, 0.75, 0.9999, 0.7499667, 0.25, 0.6666667, 0.1667, 0.7499667, 0.5, 0.2500333, 0.3333667, 0.0834, 0.0001, 0.2500333, 0.8333, 0.9999, 0.9999, 0.2500333, 0.2500333, 0.3333667, 0.9166, 0.5, 0.2500333, 0.4166667, 0.0001, 0.1667333, 0.6666333, 0.0834, 0.1667, 0.6666333, 0.9166, 0.1667, 0.7499333, 0.9166, 0.9166, 0.9166, 0.7499667, 0.7499667, 0.4166667, 0.5, 0.2500333, 0.9166, 0.6666667, 0.1667333, 0.25, 0.0001, 0.3333667, 0.0001, 0.25, 0.0834, 0.9999, 0.0834, 0.1667, 0.5, 0.2500333, 0.3333667, 0.9166, 0.9166, 0.8333, 0.9166, 0.75, 0.0834, 0.4166667, 0.5, 0.0001, 0.9999, 0.8333, 0.6666667, 0.9166)
</code></pre>
<p>To classify these values, I did:</p>
<pre><code>data_med=median(data)
quant_data=data
quant_data[quant_data&gt;data_med]="High"
quant_data[quant_data&lt;=data_med]="Low"
</code></pre>
<p>I know there are a gazillion ways of doing this more efficiently, but what has me worried is that the output does not make sense. Since there are no <code>NaN</code>s in the set and the test is all-inclusive (<code>&gt;</code> or <code>&lt;=</code>), I should end up with a vector containing only "High" and "Low", but instead I get:</p>
<pre><code>[1] "High" "High" "High" "High" "High" "High" "High" "High" "Low" "High" "Low" "High" "Low" "Low" "Low" "Low" "1e-04"
[18] "Low" "High" "High" "High" "Low" "Low" "Low" "High" "Low" "Low" "Low" "1e-04" "Low" "High" "Low" "Low" "High"
[35] "High" "Low" "High" "High" "High" "High" "High" "High" "Low" "Low" "Low" "High" "High" "Low" "Low" "1e-04" "Low"
[52] "1e-04" "Low" "Low" "High" "Low" "Low" "Low" "Low" "Low" "High" "High" "High" "High" "High" "Low" "Low" "Low"
[69] "1e-04" "High" "High" "High" "High"
</code></pre>
<p>See the "1e-04"s? Even stranger, let's pick value 69, one of the ones that returns an odd value:</p>
<pre><code>data[69] &gt;1e-04
</code></pre>
<p>If I test this value alone, I get what I expected:</p>
<pre><code>data[69]&lt;=data_med
TRUE
</code></pre>
<p>Can someone explain this behavior? It just seems downright dangerous...</p>
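A likely explanation, sketched here on a small illustrative vector rather than the full data set: assigning "High" into a numeric vector coerces the whole vector to character, so the second test compares strings, not numbers. The names x, med, broken and labels below are hypothetical and do not appear in the question.

```r
# Small illustrative vector (an assumption, not the data from the question).
x <- c(0.75, 0.25, 0.0001, 0.5, 0.9999)
med <- median(x)  # 0.5

# Reproduce the two-step assignment from the question.
broken <- x
broken[broken > med] <- "High"   # the whole vector is now character
# broken is c("High", "0.25", "1e-04", "0.5", "High")

# This comparison is now done on strings: med is coerced to "0.5",
# and "1e-04" sorts after "0.5" because "1" comes after "0".
broken[broken <= med] <- "Low"
print(broken)  # "High" "Low" "1e-04" "Low" "High"

# One way to avoid the coercion: test the untouched numeric vector once.
labels <- ifelse(x > med, "High", "Low")
print(labels)  # "High" "Low" "Low" "Low" "High"
```

Running both tests against the original numeric vector (for example, indexing with data &gt; data_med and data &lt;= data_med instead of quant_data) should give the same result as ifelse().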