<p>Good plagiarism detection will apply heuristics based on the type of document (e.g. an essay or program code in a specific language).</p>
<p>However, you can also apply a general solution. Have a look at the <a href="http://www.complearn.org/ncd.html" rel="noreferrer">Normalized Compression Distance</a> (NCD). You cannot calculate a text's <a href="http://en.wikipedia.org/wiki/Kolmogorov_complexity" rel="noreferrer">Kolmogorov complexity</a> exactly (it is uncomputable), but you can approximate it by simply compressing the text. Writing <code>C(s)</code> for the compressed length of <code>s</code>, the distance is <code>NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))</code>.</p>
<p>A smaller NCD indicates that two texts are more similar: values near 0 mean one text adds little information beyond the other, while values near 1 mean they share almost nothing. Some compression algorithms will give better results than others. Luckily PHP provides support for <a href="http://us3.php.net/manual/en/refs.compression.php" rel="noreferrer">several</a> compression algorithms, so you can have your NCD-driven plagiarism detection code running in no time. Below is example code that uses <a href="http://en.wikipedia.org/wiki/Zlib" rel="noreferrer">zlib</a>:</p>
<p>PHP:</p>
<pre><code>function ncd($x, $y) {
    // Compressed length of each text on its own
    $cx = strlen(gzcompress($x));
    $cy = strlen(gzcompress($y));
    // (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    return (strlen(gzcompress($x . $y)) - min($cx, $cy)) / max($cx, $cy);
}

echo ncd('this is a test', 'this was a test'), "\n";
echo ncd('this is a test', 'this text is completely different'), "\n";
</code></pre>
<p>Python 3 (<code>zlib.compress</code> takes bytes, so the inputs are bytes literals):</p>
<pre><code>&gt;&gt;&gt; from zlib import compress as c
&gt;&gt;&gt; def ncd(x, y):
...     cx, cy = len(c(x)), len(c(y))
...     return (len(c(x + y)) - min(cx, cy)) / max(cx, cy)
...
&gt;&gt;&gt; ncd(b'this is a test', b'this was a test')
0.30434782608695654
&gt;&gt;&gt; ncd(b'this is a test', b'this text is completely different')
0.7435897435897436
</code></pre>
<p>Note that for larger texts (read: actual files) the gap between similar and unrelated texts will be much more pronounced, since the compressor's fixed overhead (headers and checksums) matters less. Give it a try and report your experiences!</p>
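Since some compressors give better results than others, it can help to make the compressor a parameter and compare several. The following is a minimal Python 3 sketch, assuming the standard-library `zlib`, `bz2` and `lzma` modules as the candidate compressors. The `COMPRESSORS` table, the `rank_pairs` helper and the sample documents are hypothetical illustrations, not part of the answer above. Note that `bz2` and `lzma` add larger fixed headers than `zlib`, which is likely to distort NCD on very short inputs.

```python
import bz2
import lzma
import zlib
from itertools import combinations

# Hypothetical table of standard-library compressors; each maps bytes -> compressed bytes.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def ncd(x: bytes, y: bytes, compress=zlib.compress) -> float:
    """Normalized Compression Distance, using len(compress(...)) as C."""
    cx, cy = len(compress(x)), len(compress(y))
    cxy = len(compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_pairs(docs: dict, compress=zlib.compress) -> list:
    """Return (distance, name_a, name_b) for every pair of documents, most similar first."""
    encoded = {name: text.encode("utf-8") for name, text in docs.items()}
    scores = [
        (ncd(encoded[a], encoded[b], compress), a, b)
        for a, b in combinations(sorted(encoded), 2)
    ]
    return sorted(scores)


if __name__ == "__main__":
    # Hypothetical submissions, for illustration only.
    docs = {
        "alice.txt": "The quick brown fox jumps over the lazy dog. " * 20,
        "bob.txt": "The quick brown fox leaps over the lazy dog. " * 20,
        "carol.txt": "Compression-based similarity needs no language model. " * 20,
    }
    for name, compress in COMPRESSORS.items():
        print(name)
        for d, a, b in rank_pairs(docs, compress):
            print(f"  {a} vs {b}: {d:.3f}")
```

With `zlib` the near-duplicate pair (`alice.txt`, `bob.txt`) should come out first. In a real checker you would feed whole files and flag pairs below a threshold chosen from your own data.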