Note that there are some explanatory texts on larger screens.

plurals
  1. PODifferent results from LOF implementation in ELKI and RapidMiner
    primarykey
    data
    text
    <p>I have written my own implementation of LOF and I'm trying to compare results with the implementations in ELKI and RapidMiner, but all 3 give different results! I'm trying to work out why.</p> <p>My reference dataset is one-dimensional, 102 real values with many duplicates. I'll try and post it below. </p> <p>First, the RapidMiner implementation. The LOF scores are wildly different from ELKI and from my results; many come back with a LOF of infinity. Has this implementation been validated as being correct?</p> <p>My results are similar to ELKI, but I don't get exactly the same LOF values. From a quick scan of the comments in the ELKI source code, I think this may be because of differences in the way the k-neighbourhood is calculated.</p> <p>In the LOF paper, the MinPts parameter (elsewhere called k) specifies the minimum no. of points to be included in the k-neighbourhood. In the ELKI implementation, I think they are defining the k-neighbourhood as exactly k points rather than all the points within the k-distance or k-distinct distance. Can anyone confirm exactly how ELKI constructs the k-neighbourhood? Also there is a private variable which allows the point itself to be included in its own neighbourhood, but it looks like the default is not to include it.</p> <p>Does anyone know of a public reference dataset which has the LOF scores attached for validation purposes?</p> <p>--- more details follow ---</p> <p>Reference: ELKI source code is here:</p> <p><a href="http://elki.dbs.ifi.lmu.de/browser/elki/trunk/src/de/lmu/ifi/dbs/elki/algorithm/outlier/lof/LOF.java" rel="nofollow">http://elki.dbs.ifi.lmu.de/browser/elki/trunk/src/de/lmu/ifi/dbs/elki/algorithm/outlier/lof/LOF.java</a></p> <p>RapidMiner source code is here:</p> <p><a href="http://code.google.com/p/rapidminer-anomalydetection/source/browse/trunk/src/de/dfki/madm/anomalydetection/evaluator/nearest_neighbor_based/LOFEvaluator.java" rel="nofollow">http://code.google.com/p/rapidminer-anomalydetection/source/browse/trunk/src/de/dfki/madm/anomalydetection/evaluator/nearest_neighbor_based/LOFEvaluator.java</a></p> <p>Here is my test dataset:</p> <p>4.32323 5.12595 5.12595 5.12595 5.12595 5.7457 5.7457 5.7457 5.7457 5.7457 5.7457 5.97766 5.97766 6.07352 6.07352 6.12015 6.12015 6.12015 6.44797 6.44797 6.48131 6.48131 6.48131 6.48131 6.48131 6.48131 6.6333 6.6333 6.6333 6.70872 6.70872 6.70872 6.70872 6.70872 6.77579 6.77579 6.77579 6.77579 6.77579 6.77579 6.77579 6.77579 6.77579 6.77579 6.77579 6.77579 6.77579 6.77579 6.77579 7.03654 7.03654 7.03654 7.03654 7.03654 7.03654 7.03654 7.03654 7.03654 7.03654 7.03654 7.03654 7.03654 7.03654 7.03654 7.10361 7.10361 7.10361 7.10361 7.10361 7.10361 7.10361 7.10361 7.15651 7.15651 7.15651 7.15651 7.15651 7.15651 7.15651 7.15651 8.22598 8.22598 8.22598 8.22598 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538 8.5538</p> <p>For example, I get the following LOF score for the first number (4.32323):</p> <ul> <li>RapidMiner: infinity (with MinPts lower/upper bounds set to 10,100)</li> <li>ELKI: 2.6774 (with k = 10 and distfunction/reachdistfunction set to default)</li> <li>My implementation: 1.9531</li> </ul> <p>Some more details on what my implementation is doing:</p> <ol> <li>MinPts is 10, so I'm finding the 10 distinct neighbours of the point. So the neighbourhood of 4.32323 is actually 48 points, from 5.12595 up to 6.77579.</li> <li>That gives me a k-distinct distance of 2.45256</li> <li>I'm calculating the reachability distance of the first neighbour as 1.58277</li> <li>I'm calculating the LRD of the sample as 1/(99.9103/48)</li> <li>Sum of lrd(o)/lrd(p) for all 48 neighbours is 93.748939</li> <li>Divide by 48 to get a LOF of 1.9531</li> </ol>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload