10 years ago
How Dropbox Knows When You’re Sharing Copyrighted Stuff (Without Actually Looking At Your Stuff)
Late last night, a tweet was spread far and wide showing that a DMCA notice had blocked a file from being shared on a Dropbox user's account. What was..
Continue Reading http://techcrunch.com
The catch with using something like MD5 for file hashing is that you will (eventually) run into the Birthday Problem. Imagine a photo of a duck. Create a hash of that photo. Compare that hash against the hashes of every other photo on the internet. If you look at enough hashes, you will eventually find a photo of a fire hydrant which has the exact same hash. According to the algorithm, they are the same file (even though they are not). This is known as a hash collision.
In practice, it doesn't happen all that often. But there can always be a case where one file is mistaken for another.
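To put a rough number on "not all that often", here is a minimal sketch of the usual birthday-problem approximation. It assumes the hash output is uniformly random, which is an idealization and says nothing about deliberately crafted collisions. The function name `collision_probability` is my own, not part of any library.

```python
import math

def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Uses the standard birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2**bits)),
    assuming every hash value is equally likely.
    """
    exponent = n_items * (n_items - 1) / 2 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

if __name__ == "__main__":
    # A trillion files hashed with a 128-bit digest (the size of MD5).
    print(collision_probability(10**12, 128))
    # About 2**64 files is where a 128-bit digest gets close to even odds.
    print(collision_probability(2**64, 128))
```

The takeaway is that accidental collisions only become likely once the number of files approaches the square root of the number of possible hash values.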
Other hashing schemes (e.g. SHA-1, SHA-256 or SHA-512) increase the number of bits which are compared, so they have a much smaller mathematical chance of an accidental collision. MD5 is still very widely used, even though it is known to be broken: collisions can be constructed deliberately.
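If you want to see the bit counts for yourself, Python's standard `hashlib` module reports the digest size of each algorithm. This is just a quick sketch; the helper name `digest_bits` is made up for illustration.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Return the output size, in bits, of a hashlib algorithm."""
    # digest_size is reported in bytes, so multiply by 8.
    return hashlib.new(name).digest_size * 8

if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256", "sha512"):
        print(name, digest_bits(name))
```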
Hashing methods are exact. If the uploaded file is even slightly different, it will not match. If you upload files which you feel might be a little bit questionable, then I'd strongly recommend that you either encrypt them, or use a utility like 7-zip to zip them up. That will cause the hashes to be wildly different from what is being looked for.
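As a small illustration of "exact", the sketch below hashes two byte strings that differ in a single character. The sample data and the helper name `file_hash` are placeholders of mine, and SHA-256 is only my pick for the example; I have no idea which hash Dropbox actually uses.

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"pretend these bytes are a photo of a duck"
altered = b"pretend these bytes are a photo of a dock"  # one character changed

if __name__ == "__main__":
    print(file_hash(original))
    print(file_hash(altered))
    # Identical input always gives the identical digest; any change gives a new one.
    print(file_hash(original) == file_hash(altered))
```

The same input always produces the same digest, which is what makes matching against a list of known hashes possible in the first place.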
This is interesting, I didn't know that... Thanks for the explanation.