Published 10 years ago by patel with 2 Comments

  • idlethreat (edited 10 years ago)
    +3

    The trick with using something like MD5 for file hashing is that you will (eventually) run into the Birthday Problem. Imagine a photo of a duck. Create a hash of that photo. Compare that hash against the hash of every other photo on the internet. Eventually, if you look at enough hashes, you will find a photo of a fire hydrant which has the exact same hash. According to the algorithm, they are the same file (even though they are not). This is known as a Hash Collision.

    In practice, it doesn't happen all that often. But there is always a chance that one file will be mistaken for another.
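
To make the duck and fire hydrant idea concrete, here is a minimal sketch that hunts for a collision on purpose. Finding one on full MD5 by brute force is not practical, so the sketch assumes a deliberately shortened hash: only the first 4 hex digits (16 bits) of the MD5 digest. The `short_hash` and `find_collision` helpers and the `photo-N` inputs are made up for illustration.

```python
import hashlib
from itertools import count


def short_hash(data: bytes, hex_chars: int = 4) -> str:
    """Return only the first `hex_chars` hex digits of the MD5 digest.

    Truncating is a demo shortcut: 4 hex digits is 16 bits, so there are
    only 65,536 possible values and collisions show up quickly.
    """
    return hashlib.md5(data).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 4):
    """Hash made-up inputs until two different ones share a short hash."""
    seen = {}  # short hash -> first input that produced it
    for i in count():
        message = f"photo-{i}".encode()
        digest = short_hash(message, hex_chars)
        if digest in seen:
            return seen[digest], message, digest
        seen[digest] = message


first, second, digest = find_collision()
print(f"{first!r} and {second!r} both hash to {digest}")
```

With 65,536 possible values, a match usually turns up after a few hundred inputs rather than tens of thousands, which is the Birthday Problem at work.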

    Other hashing schemes (e.g. SHA-1, SHA-256 or SHA-512) increase the number of bits which are compared, so they have a much smaller mathematical chance of a collision. MD5 is by far the most popular hashing method, even though it's just a little bit broken.
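
As a rough illustration of how much the extra bits help, the sketch below uses the standard birthday approximation: for a hash with b bits, you need about sqrt(2 * ln 2) * 2^(b/2) random files before the odds of any two sharing a hash reach 50%. The function name `files_for_even_odds` is made up here, and the estimate assumes the hash output behaves like random data. It covers accidental collisions only, not deliberately crafted ones.

```python
import math


def files_for_even_odds(bits: int) -> float:
    """Approximate number of random files needed for a 50% chance that
    at least two of them share the same `bits`-bit hash."""
    return math.sqrt(2 * math.log(2)) * 2.0 ** (bits / 2)


digest_sizes = [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256), ("SHA-512", 512)]

for name, bits in digest_sizes:
    print(f"{name:8s} {bits:3d} bits: about {files_for_even_odds(bits):.2e} files")
```

Every extra 2 bits of digest doubles the number of files needed, so going from 128 to 256 bits multiplies it by 2^64.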

    Hashing methods are exact. If the uploaded file is even slightly different, it will not match. If you upload files which you feel might be a little bit questionable, then I'd strongly recommend that you either encrypt them, or use a utility like 7-zip to zip them up. That will cause the hashes to be wildly different from what is being looked for.
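
A small sketch of the "exact match" point: changing or repackaging the bytes gives a completely different hash. The sample bytes are made up, and `zlib.compress` from the Python standard library is used here only as a stand-in for zipping with a tool like 7-zip.

```python
import hashlib
import zlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as 32 hex characters."""
    return hashlib.md5(data).hexdigest()


original = b"pretend these are the bytes of a duck photo"  # made-up sample data
altered = original + b"\x00"          # one extra byte on the end
compressed = zlib.compress(original)  # stand-in for zipping the file

print("original  ", md5_hex(original))
print("altered   ", md5_hex(altered))
print("compressed", md5_hex(compressed))
```

All three digests differ, even though the altered copy is off by a single byte and the compressed copy still contains the same photo once unpacked.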

    • drunkenninja
      +2

      This is interesting, I didn't know that... Thanks for the explanation.
