[tahoe-dev] Global deduplication of encrypted files

sickness sickness at tiscali.it
Thu May 5 14:48:54 PDT 2011


On Thu, May 05, 2011 at 02:21:43PM -0700, Kenny Taylor wrote:
> Regarding encrypted file stores and deduplication, SpiderOak published a 
> good article on file-level deduplication of encrypted files:
> 
> https://spideroak.com/blog/20100827150530-why-spideroak-doesnt-de-duplicate-data-across-users-and-why-it-should-worry-you-if-we-did
> 
> Wuala seems to use the method SpiderOak cautions against.  When a user 
> tries to upload a file, the client app encrypts it, hashes it, and asks the 
> network if an encrypted file already exists with the same hash.  If so, the 
> existing file is linked into the user's account (no upload needed!).  It's 
> a neat concept, but it has one big disadvantage:  the network can see each 
> user who is sharing a file with a given hash.
> 
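
A minimal sketch of the lookup described above, in Python. The message
does not say how the encryption key is chosen; for two users to end up
with the same ciphertext the key has to depend only on the file, so the
sketch assumes it is a hash of the plaintext. toy_encrypt, DedupNetwork
and client_upload are hypothetical names, and the SHA-256 keystream is
only a stand-in for a real cipher, not something to use in practice.

```python
import hashlib


def toy_encrypt(key, data):
    """Toy stream cipher (NOT secure): XOR data with SHA-256(key || counter)."""
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class DedupNetwork:
    """Hypothetical storage network that indexes ciphertexts by their hash."""

    def __init__(self):
        self.blobs = {}   # ciphertext hash -> ciphertext
        self.owners = {}  # ciphertext hash -> set of users linked to it

    def has(self, digest):
        return digest in self.blobs

    def store(self, digest, ciphertext):
        self.blobs[digest] = ciphertext

    def link(self, user, digest):
        # This is the leak: the network learns every user holding this hash.
        self.owners.setdefault(digest, set()).add(user)


def client_upload(network, user, plaintext):
    """Encrypt, hash, and upload only if the network lacks that hash."""
    key = hashlib.sha256(plaintext).digest()  # assumption: content-derived key
    ciphertext = toy_encrypt(key, plaintext)
    digest = hashlib.sha256(ciphertext).hexdigest()
    uploaded = not network.has(digest)
    if uploaded:
        network.store(digest, ciphertext)
    network.link(user, digest)
    return digest, uploaded


network = DedupNetwork()
digest_a, uploaded_a = client_upload(network, "alice", b"same file contents")
digest_b, uploaded_b = client_upload(network, "bob", b"same file contents")
print(uploaded_a, uploaded_b)           # first upload stores, second only links
print(sorted(network.owners[digest_a]))  # what the network can observe
```

The second upload transfers nothing, but the owners table is exactly the
information the quoted paragraph warns about.
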
> So global file-level deduplication = bad.  Not necessarily true for 
> block-level dedup.  Let's say we break a file into 8kB chunks, encrypt each 
> chunk with the user's own key, then push those chunks to the network.  
> The same file uploaded by different users would produce completely 
> different block sets.  Maybe each storage node maintains a hash table of 
> the blocks it's storing.  So when the client node pushes out a block, it 
> queries the known storage nodes to see if someone is already holding a 
> block with that hash.  The block size might need to be <= 4kB for that to 
> be effective.
> 
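
A rough Python sketch of the chunking idea above, under some assumptions
of mine: each chunk is encrypted deterministically with a per-user
secret, and blocks are identified by the SHA-256 of their ciphertext.
toy_encrypt and encrypted_block_hashes are hypothetical names, and the
SHA-256 keystream is a stand-in for a real cipher (it reuses the same
keystream for every chunk, which a real design must not do).

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # 8 kB chunks, as in the quoted message


def toy_encrypt(key, data):
    """Toy stream cipher (NOT secure): XOR data with SHA-256(key || counter)."""
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def encrypted_block_hashes(user_key, data):
    """Split data into chunks, encrypt each with user_key, hash the ciphertext."""
    hashes = []
    for start in range(0, len(data), CHUNK_SIZE):
        ciphertext = toy_encrypt(user_key, data[start:start + CHUNK_SIZE])
        hashes.append(hashlib.sha256(ciphertext).hexdigest())
    return hashes


# 25,600 bytes of non-repeating sample data -> three full chunks plus a tail.
file_data = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(800))

alice_blocks = encrypted_block_hashes(b"alice-secret", file_data)
bob_blocks = encrypted_block_hashes(b"bob-secret", file_data)

# Same user, same file: identical block hashes, so a re-upload can be skipped.
print(alice_blocks == encrypted_block_hashes(b"alice-secret", file_data))
# Different users, same file: no block hash in common.
print(set(alice_blocks).isdisjoint(bob_blocks))
```

With per-user keys a storage node's hash table only matches blocks that
the same user has already pushed; two users holding the same file never
produce a common block hash.
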
> I realize that's a big departure from the existing tahoe architecture.  
> Food for thought :)
> 


This reminds me of the OFFSystem:
http://en.wikipedia.org/wiki/OFFSystem
