Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>I would avoid an solution like this. I practice it might be close to imposible that two media files have the same size and the same data at coresponding positions for compressed formats. But if you have to deal with uncompressed images or wave files, chances that small local changes are not detected grow .</p> <p>So I think you should realy hash the whole file. While this seems expensive it might not be if you have access to all the files - for example if you build a file server or something like that. You could build the hash incrementaly.</p> <p>If you see a new file with an unique file length, just store the file length. If another file with the same length is added, calculate the hashes of both files block by block until they differ. Store the file length, the hash and how many blocks of the file are included in the hash. Whenever you detect matching file lengths and hashes and you have not yet hashed the whole file, you extend the hash by adding more blocks.</p> <p>Some thoughts about the performance. For small files, the chances of equal file length is quite high - there are not so many diffrent small file lengths. But it is not expensive to hash small files.</p> <p>For larger files the chances of file lenght collisons goes down as there are more and more possible file lengths. For diffrent media files the chances are very good that they differ directly beyond the header so you will need to hash only a short part of the begining of the file.</p> <p>Finally you will be sure to detect diffrent files (except for hash collisions) because you will hash the whole file if required.</p> <p><strong>UPDATE</strong></p> <p>For movies I would consider the file length practical unique, but files recoded to fit on a given medium probably render this idea void - (S)VCD movies will all be in a small range of file lenghs of about CD-ROM capacity.</p> <p>But for movie files in general, I would just hash one block (maybe 512 Byte) from the middle of the file. Two different movies with the same image and sound at the same position? Practicaly imposible besides you manipulate files to fail this test. But you could easily generate files to fail all deterministic sampling strategies - so this should not really matter.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload