[tahoe-dev] Global deduplication of encrypted files
sickness
sickness at tiscali.it
Thu May 5 14:48:54 PDT 2011
On Thu, May 05, 2011 at 02:21:43PM -0700, Kenny Taylor wrote:
> Regarding encrypted file stores and deduplication, SpiderOak published a
> good article on file-level deduplication of encrypted files:
>
> https://spideroak.com/blog/20100827150530-why-spideroak-doesnt-de-duplicate-data-across-users-and-why-it-should-worry-you-if-we-did
>
> Wuala seems to use the method SpiderOak cautions against. When a user
> tries to upload a file, the client app encrypts it, hashes it, and asks the
> network if an encrypted file already exists with the same hash. If so, the
> existing file is linked into the user's account (no upload needed!). It's
> a neat concept, but it has one big disadvantage: the network can see each
> user who is sharing a file with a given hash.
>
> So global file-level deduplication = bad. That's not necessarily true for
> block-level dedup. Let's say we break a file into 8 kB chunks, encrypt each
> chunk with the user's private key, then push those chunks to the network.
> The same file uploaded by different users would produce completely
> different block sets. Maybe each storage node maintains a hash table of
> the blocks it's storing. So when the client node pushes out a block, it
> queries the known storage nodes to see if someone is already holding a
> block with that hash. The block size might need to be <= 4 kB for that to
> be effective.
>
> I realize that's a big departure from the existing tahoe architecture.
> Food for thought :)
reminds me of this:
http://en.wikipedia.org/wiki/OFFSystem
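
For comparison, here is a rough sketch of the two schemes from the quoted
message. It is only an illustration under stated assumptions, not how Wuala,
SpiderOak, or tahoe actually work. Network, toy_encrypt, upload_file and
upload_blocks are made-up names. The file-level variant assumes the key is
derived from the file content (the message doesn't say how the key is chosen,
but some such rule is needed for two users to get the same ciphertext). The
cipher is a toy keystream built from SHA-256 and is not secure.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 8 * 1024  # 8 kB chunks, as in the quoted message


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy keystream cipher (SHA-256 over key + offset). NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class Network:
    """Hypothetical storage network that indexes ciphertexts by hash."""

    def __init__(self):
        self.store = {}                  # hash -> ciphertext
        self.holders = defaultdict(set)  # hash -> users linked to that hash

    def put(self, user: str, ciphertext: bytes) -> bool:
        """Link user to the ciphertext; return True if it was already stored."""
        digest = hashlib.sha256(ciphertext).hexdigest()
        already_stored = digest in self.store
        if not already_stored:
            self.store[digest] = ciphertext
        # This is the privacy leak: the network learns who shares each hash.
        self.holders[digest].add(user)
        return already_stored


def upload_file(net: Network, user: str, plaintext: bytes) -> bool:
    """File-level dedup. Assumption: the key is derived from the content."""
    key = hashlib.sha256(plaintext).digest()
    return net.put(user, toy_encrypt(key, plaintext))


def upload_blocks(net: Network, user: str, user_key: bytes,
                  plaintext: bytes) -> int:
    """Block-level upload with a per-user key; return the number of dedup hits."""
    hits = 0
    for start in range(0, len(plaintext), BLOCK_SIZE):
        block = plaintext[start:start + BLOCK_SIZE]
        if net.put(user, toy_encrypt(user_key, block)):
            hits += 1
    return hits
```

With the same file uploaded by two users, upload_file reports a dedup hit for
the second user and net.holders lists both names under one hash, which is the
exposure described in the SpiderOak article. upload_blocks with two different
user keys gets no hits across users, so in this sketch per-user encryption
only saves space when the same user uploads the same block again.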