[tahoe-dev] Global deduplication of encrypted files
Kenny Taylor
kenny at corvettekenny.com
Thu May 5 14:21:43 PDT 2011
Regarding encrypted file stores and deduplication, SpiderOak published a
good article on file-level deduplication of encrypted files:
https://spideroak.com/blog/20100827150530-why-spideroak-doesnt-de-duplicate-data-across-users-and-why-it-should-worry-you-if-we-did
Wuala seems to use the method SpiderOak cautions against. When a user
tries to upload a file, the client app encrypts it, hashes it, and asks the
network if an encrypted file already exists with the same hash. If so, the
existing file is linked into the user's account (no upload needed!). It's
a neat concept, but it has one big disadvantage: the network can see each
user who is sharing a file with a given hash.
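Roughly, the flow might look like this (a minimal Python sketch; the
Network class, upload(), and the toy XOR cipher are illustrative, not
anyone's actual code, and it assumes convergent-style encryption so that
identical files produce identical ciphertext for every user -- otherwise
the hashes could never match across accounts):

    import hashlib

    class Network:
        """Toy stand-in for the storage service (illustrative only)."""
        def __init__(self):
            self.files = {}     # ciphertext hash -> ciphertext
            self.links = {}     # ciphertext hash -> set of account names

        def has_file(self, digest):
            return digest in self.files

        def store(self, digest, ciphertext):
            self.files[digest] = ciphertext

        def link(self, digest, account):
            # The privacy leak: the service sees every account that
            # references a file with this hash.
            self.links.setdefault(digest, set()).add(account)

    def upload(plaintext, account, network):
        # Assumed: key derived from the file itself (convergent-style),
        # so the same file encrypts identically for every user.
        key = hashlib.sha256(b"convergent|" + plaintext).digest()
        ciphertext = bytes(b ^ key[i % len(key)]
                           for i, b in enumerate(plaintext))  # toy cipher
        digest = hashlib.sha256(ciphertext).hexdigest()
        if network.has_file(digest):
            network.link(digest, account)   # dedup hit: link only, no upload
        else:
            network.store(digest, ciphertext)
            network.link(digest, account)
        return digest

The link() call is exactly where the problem shows up: the service ends
up with a list of every account that references a given hash.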
So global file-level deduplication = bad. Not necessarily true for
block-level dedup, though. Let's say we break a file into 8 kB chunks,
encrypt each chunk with the user's own key, then push those chunks to the
network. The same file uploaded by different users would then produce
completely different block sets. Maybe each storage node maintains a hash
table of the blocks it's storing; when the client pushes out a block, it
first queries the known storage nodes to see if someone is already holding
a block with that hash. The block size might need to be <= 4 kB for that
to be effective.
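Here's a minimal sketch of that idea, assuming fixed 4 kB chunks and a toy
XOR stand-in for real per-user encryption; StorageNode, push_file(), and
CHUNK_SIZE are made-up names, nothing from the tahoe codebase:

    import hashlib

    CHUNK_SIZE = 4096                 # <= 4 kB, per the suggestion above

    class StorageNode:
        """Toy storage node holding a hash table of its blocks."""
        def __init__(self):
            self.blocks = {}          # block hash -> encrypted block

        def has_block(self, digest):
            return digest in self.blocks

        def put_block(self, digest, block):
            self.blocks[digest] = block

    def push_file(plaintext, user_key, nodes):
        # Per-user encryption: the same file from two different users
        # yields completely different block sets, so the network can't
        # tell that they're storing the same data.
        manifest = []
        for off in range(0, len(plaintext), CHUNK_SIZE):
            chunk = plaintext[off:off + CHUNK_SIZE]
            block = bytes(b ^ user_key[i % len(user_key)]
                          for i, b in enumerate(chunk))      # toy cipher
            digest = hashlib.sha256(block).hexdigest()
            # Ask the known storage nodes whether anyone already holds a
            # block with this hash; only push it if nobody does.
            if not any(node.has_block(digest) for node in nodes):
                nodes[int(digest, 16) % len(nodes)].put_block(digest, block)
            manifest.append(digest)
        return manifest               # client keeps this to reassemble the file

Since every block is encrypted with the user's own key, the hash queries
can only ever match blocks that same user already uploaded, so the network
never learns that two accounts hold the same file.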
I realize that's a big departure from the existing tahoe architecture.
Food for thought :)