    <p>It's not really possible to answer this question with the right "but" without also explaining why it's not a problem, and it's not possible to do that without a good grip on what a hash really is. It's more complicated than the simple cases you might have been exposed to in a CS program.</p> <p>There is a basic misunderstanding of information theory here. If you reduce a large amount of information into a smaller amount by discarding some of it (i.e. a hash), there will be a chance of collision directly related to the length of the data. The shorter the data, the LESS likely a collision will be. Now, the vast majority of the collisions will be gibberish, making them that much less likely to actually happen (you would never check in gibberish... even a binary image is somewhat structured). In the end, the chances are remote.</p> <p>To answer your question: yes, git will treat them as the same. Changing the hash algorithm won't help; it'll take a "second check" of some sort, but ultimately you would need as much "additional check" data as the length of the data to be 100% sure. Keep in mind you would be 99.99999... (to a really long number of digits) sure with a simple check like you describe.</p> <p>SHA-x are cryptographically strong hashes, which means it's generally hard to intentionally create two source data sets that are both VERY SIMILAR to each other and have the same hash. One bit of change in the data should create more than one (preferably as many as possible) bits of change in the hash output. That also means it's very difficult (but not quite impossible) to work back from the hash to the complete set of collisions and thereby pull out the original message from that set: all but a few will be gibberish, and of the ones that aren't there's still a huge number to sift through if the message is of any significant length. The downside of a crypto hash is that they are slow to compute... in general.</p> <p>So, what's it all mean then for Git? Not much. The hashes get done so rarely (relative to everything else) that their computational penalty is low overall. The chance of hitting a collision is so low that it's not realistic for one to occur and not be detected immediately (i.e. your code would most likely suddenly stop building), allowing the user to fix the problem (back up a revision and make the change again, and you'll almost certainly get a different hash because of the changed timestamp, which also feeds the hash in git). It is more likely to be a real problem if you're storing arbitrary binaries in git, which isn't really what its primary use model is. If you want to do that... you're probably better off using a traditional database.</p> <p>It's not wrong to think about this - it's a good question that a lot of people just pass off as "so unlikely it's not worth thinking about" - but it's really a little more complicated than that. If it DOES happen, it should be very readily detectable; it won't be a silent corruption in a normal workflow.</p>
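<p>To put rough numbers on the points above, here is a minimal Python sketch. It assumes the classic SHA-1 object format, where a file (blob) is hashed as the header "blob &lt;size&gt;\0" followed by the raw content. The helper names (git_blob_hash, differing_bits, collision_probability) are made up for this illustration, and the probability uses the standard birthday approximation, which treats the hash as uniformly random. Note that a blob ID depends only on file content; timestamps enter through commit objects.</p>

```python
import hashlib
import math


def git_blob_hash(content: bytes) -> str:
    """SHA-1 of a git blob: header 'blob <size>\\0' followed by the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Number of bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def collision_probability(num_objects: int, hash_bits: int = 160) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).

    Assumes hash outputs are uniformly distributed (an idealization).
    """
    exponent = num_objects * (num_objects - 1) / (2.0 * 2.0 ** hash_bits)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # The empty blob has a well-known ID in SHA-1 repositories.
    print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

    # Two inputs that differ in a single bit ('o' vs 'n' in the last byte).
    a = git_blob_hash(b"hello")
    b = git_blob_hash(b"helln")
    print("bits changed out of 160:", differing_bits(a, b))

    # Chance of any accidental collision among a billion objects.
    print("collision probability:", collision_probability(10**9))
```

<p>Running it shows the avalanche effect (a one-bit input change flips many of the 160 output bits) and how small the accidental collision probability is even for a very large repository. It says nothing about deliberately constructed collisions, which are a separate concern.</p>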