Note that there are some explanatory texts on larger screens.

plurals
  1. POHow does the rsync algorithm correctly identify repeating blocks?
    primarykey
    data
    text
    <p>I'm on a personal quest to learn how the rsync algorithm works. After some reading and thinking, I've come up with a situation where I think the algorithm fails. I'm trying to figure out how this is resolved in an actual implementation.</p> <p>Consider this example, where A is the receiver and B is the sender.</p> <pre><code>A = abcde1234512345fghij B = abcde12345fghij </code></pre> <p>As you can see, the only change is that <code>12345</code> has been removed.</p> <p>Now, to make this example interesting, let's choose a block size of 5 bytes (chars). Hashing the values on the sender's side using the weak checksum gives the following values list.</p> <pre><code>abcde|12345|fghij abcde -&gt; 495 12345 -&gt; 255 fghij -&gt; 520 values = [495, 255, 520] </code></pre> <p>Next we check to see if any hash values differ in A. If there's a matching block we can skip to the end of that block for the next check. If there's a non-matching block then we've found a difference. I'll step through this process.</p> <ol> <li>Hash the first block. Does this hash exist in the values list? <code>abcde -&gt; 495</code> (yes, so skip)</li> <li>Hash the second block. Does this hash exist in the values list? <code>12345 -&gt; 255</code> (yes, so skip)</li> <li>Hash the third block. Does this hash exist in the values list? <code>12345 -&gt; 255</code> (yes, so skip)</li> <li>Hash the fourth block. Does this hash exist in the values list? <code>fghij -&gt; 520</code> (yes, so skip)</li> <li>No more data, we're done.</li> </ol> <p>Since every hash was found in the values list, we conclude that A and B are the same. Which, in my humble opinion, isn't true.</p> <p>It seems to me this will happen whenever there is more than one block that share the same hash. I know I have skipped the step of calculating and checking the strong hash, but that won't make a difference since the second and third blocks are exactly the same</p> <p>What am I missing?</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload