

10 years ago
How Dropbox Knows When You’re Sharing Copyrighted Stuff (Without Actually Looking At Your Stuff)
Late last night, a tweet was spread far and wide showing that a DMCA notice had blocked a file from being shared on a Dropbox user's account. What was..
Join the Discussion
The trick with using something like MD5 for file hashing is that you will (eventually) run into the Birthday Problem. Imagine a photo of a duck. Create a hash of that photo. Compare that hash against the hashes of every other photo on the internet. If you look at enough hashes, you will eventually find a photo of a fire hydrant which has the exact same hash. According to the algorithm, they are the same file (even though they are not). This is known as a hash collision. (Strictly speaking, the Birthday Problem is about any two files in a big pile sharing a hash, which happens a lot sooner than finding a match for one particular photo.)
In practice, it doesn't happen all that often. But there can always be a case where one file is mistaken for another.
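To put a rough number on "not all that often", here is a minimal sketch of the usual birthday-bound approximation. It assumes hashes behave like uniformly random values (accidental collisions only, nobody deliberately crafting files), and the function name `collision_probability` is just something made up for illustration.

```python
import math

def collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_files share a hash.

    Uses the birthday-bound approximation p ~= 1 - exp(-n(n-1) / (2 * 2^bits)),
    which assumes every hash value is equally likely.
    """
    space = 2.0 ** hash_bits
    exponent = -num_files * (num_files - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)

# A trillion files hashed to 128 bits (the size of an MD5 digest)
p_trillion = collision_probability(10**12, 128)

# Around 2^64 files, an accidental collision becomes a real possibility
p_huge = collision_probability(2**64, 128)
```

Under those assumptions a trillion files gives a vanishingly small chance, and it takes something on the order of 2^64 files before the odds get anywhere near a coin flip.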
Other hashing schemes (e.g. SHA-1, SHA-256 or SHA-512) produce more bits to compare (160, 256 and 512, versus 128 for MD5), so they have a much smaller mathematical chance of an accidental collision. MD5 is still very widely used, even though it's more than a little bit broken: it has been shown that two different files with the same MD5 hash can be built on purpose.
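If you want to check the bit counts yourself, Python's standard `hashlib` module reports the digest size of each algorithm. This is only a small sketch; the helper name `digest_bits` is made up, and it assumes your Python build has all four algorithms available (MD5 can be disabled on some locked-down systems).

```python
import hashlib

def digest_bits(algorithm: str) -> int:
    """Return the output size of a hashlib algorithm in bits."""
    # digest_size is reported in bytes, so multiply by 8
    return hashlib.new(algorithm).digest_size * 8

sizes = {name: digest_bits(name) for name in ("md5", "sha1", "sha256", "sha512")}

for name, bits in sizes.items():
    print(f"{name}: {bits} bits")
```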
Hashing methods are exact. If the uploaded file is even slightly different, it will not match. If you upload files which you feel might be a little bit questionable, then I'd strongly recommend that you either encrypt them, or use a utility like 7-zip to zip them up. That will cause the hashes to be wildly different from what is being looked for.
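Here's a quick sketch of what "exact" means. It uses SHA-256 from Python's `hashlib` purely as an example (which hash any particular service actually uses is not something I know), and the byte strings are stand-ins for real file contents. The helper names are made up.

```python
import hashlib

def file_fingerprint(data: bytes) -> str:
    """Hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

original = b"a photo of a duck"
tweaked = b"a photo of a duck."  # one extra byte at the end

# Identical content always produces the identical hash
same = file_fingerprint(original) == file_fingerprint(b"a photo of a duck")

# A one-byte change produces an unrelated-looking hash
changed = file_fingerprint(original) != file_fingerprint(tweaked)
bits_flipped = differing_bits(file_fingerprint(original), file_fingerprint(tweaked))
```

Typically about half of the 256 output bits flip for even a tiny change, which is why a re-zipped or encrypted copy no longer matches anything on a list of known hashes.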
This is interesting, I didn't know that... Thanks for the explanation.