[tahoe-dev] cleversafe says: 3 Reasons Why Encryption is Overrated

Jason Resch jresch at cleversafe.com
Sun Jul 26 10:02:13 PDT 2009


James Hughes wrote:

> This was also posted to cryptography at metzdowd.com
>
> On Jul 24, 2009, at 9:33 PM, Zooko Wilcox-O'Hearn wrote:
>
>> [cross-posted to tahoe-dev at allmydata.org and  
>> cryptography at metzdowd.com]
>>
>> Disclosure:  Cleversafe is to some degree a competitor of my Tahoe- 
>> LAFS project.
> ...
>> I am tempted to ignore this idea that they are pushing about  
>> encryption being overrated, because they are wrong and it is  
>> embarrassing.
>
> ...and probably patent pending regardless of there being significant  
> amounts of prior art for their work. One reference is "POTSHARDS:  
> Secure Long-Term Storage Without Encryption" by Storer, Greenan, and  
> Miller at http://www.ssrc.ucsc.edu/Papers/storer-usenix07.pdf. Maybe  
> they did include this in their application. I certainly do not know.  
> They seem to have one patent
> 	http://tinyurl.com/njq8yo
> and 7 pending.
> 	http://tinyurl.com/ntpsj9

Thanks for the link to POTSHARDS; it was interesting.  However, POTSHARDS is doing something different: it uses Shamir's Secret Sharing Scheme, in which every share is as large as the original data.  In a k-of-n system where k = 10 and n = 16 (a common configuration we use), storing 1 GB of data with POTSHARDS therefore requires n * (1 GB), or 16 GB, of raw storage.  Cleversafe, using dispersal and the All-or-Nothing Transform, would only require (n/k) * (1 GB), or 1.6 GB, to store the same data.  Another difference is that POTSHARDS uses conventional RAID to achieve reliability, as opposed to more flexible arbitrary k-of-n erasure codes.
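
To make the overhead arithmetic concrete, here is a back-of-the-envelope sketch in Python (the function names are mine, and the numbers are just the example configuration above):

    # With Shamir's scheme each share is as large as the original data,
    # so total storage is n times the data size.  An IDA slice is only
    # 1/k of the data size, so total storage is n/k times the data size.
    def shamir_storage_gb(data_gb, n):
        return n * data_gb            # n full-size shares

    def dispersal_storage_gb(data_gb, k, n):
        return (n / k) * data_gb      # n slices, each 1/k of the data

    print(shamir_storage_gb(1, 16))         # 16.0 GB, as with POTSHARDS
    print(dispersal_storage_gb(1, 10, 16))  # 1.6 GB, dispersal + AONT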

POTSHARDS is what one would get if one had the same design goals as Cleversafe but were restricted to well-known conventional techniques.  Cleversafe's advantage is increased reliability and efficiency compared to what POTSHARDS achieved.

>
> ...
>> But I've decided not to ignore it, because people who publicly  
>> spread this kind of misinformation need to be publicly contradicted,  
>> lest they confuse others.
> ...
>
> The trick is cute, but I argue largely irrelevant. What follows is a  
> response to this web page; it can probably be broadened into a  
> criticism of any system that claims security and also claims that key  
> management of some sort is not a necessary evil.

That key management is not a necessary evil was also the conclusion of the authors of the POTSHARDS paper you cited.

>
>> http://dev.cleversafe.org/weblog/?p=111 # Response Part 2:  
>> Complexities of Key Management
>
> I agree with many of your points. I would like to make a few of my own.
>
> 1) If you are already paying the large penalty to Reed-Solomon-encode  
> the encrypted data, the cost of your secret sharing scheme is a small  
> additional cost to bear, agreed. Using the hash to "prove" you have  
> all the pieces is cute and does turn Reed-Solomon into an AONT. I will  
> argue that if you were to do a Blakley split of a random key and  
> append one share to each piece of the encrypted file, you would  
> get similar performance results. I will give you that your scheme is  
> simpler to describe.

Actually, what you describe is exactly the system we had considered (using SSSS to split a random key and append a share to each slice) before learning of the AONT.  The difficulty with that approach, however, was that it did not fit well architecturally into our software, as it required special handling of the data both before and after being processed by the IDA.  The approach we adopted is simpler in that it only requires pre-processing.
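
For the curious, here is a minimal sketch of that style of all-or-nothing transform: encrypt under a random key, then mask the key with a hash of the ciphertext, so the key is recoverable only from the complete ciphertext.  This assumes AES-CTR, SHA-256, and the Python 'cryptography' package; it is illustrative only, not our implementation:

    import os, hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def aont_package(data):
        key = os.urandom(32)                       # random one-time key
        nonce = os.urandom(16)
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        ct = nonce + enc.update(data) + enc.finalize()
        mask = hashlib.sha256(ct).digest()         # hash of ALL the ciphertext
        hidden_key = bytes(a ^ b for a, b in zip(key, mask))
        return ct + hidden_key                     # feed this to the IDA

    def aont_unpackage(package):
        ct, hidden_key = package[:-32], package[-32:]
        mask = hashlib.sha256(ct).digest()
        key = bytes(a ^ b for a, b in zip(hidden_key, mask))
        dec = Cipher(algorithms.AES(key), modes.CTR(ct[:16])).decryptor()
        return dec.update(ct[16:]) + dec.finalize()

Since the packaged output is handed to the IDA as ordinary data, only a pre-processing step is needed.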

>
> 2) In my opinion, key management is more about process than  
> cryptography. The whole premise of Shamir and Blakley is that each  
> share is independently managed. In your case, they are not. All of the  
> pieces are managed by the same people, possibly in the same data  
> center, etc. Because of this, some could argue that the encryption has  
> little value, not because it is bad crypto, but because the shares are  
> not independently controlled. I agree that if someone steals one  
> piece, they have nothing. They will argue that if someone can steal  
> one piece, it is feasible to steal the rest.

Right, it all comes down to process.  Ideally the sites where slices are stored should be geographically dispersed, not only for increased resistance to attack but also for greater failure independence (higher reliability).  Nothing about this protocol requires that the systems at each site be managed by the same individuals, or even the same organization, so the process can be adjusted to achieve independence from corrupt insiders.  I mention this idea in the third response, which deals with disclosure.

>
> 3) Unless broken drives are degaussed, if they are discarded, they can  
> be considered lost. Because of this, there will be drive loss all the  
> time (3% per year according to several papers). As long as all N  
> pieces are not on the same media, you can actually lose the media, no  
> problem. This can be expanded to the loss of a server, RAID  
> controller, NAS box, etc., as long as only N-1 pieces are affected.  
> What if you lose N chunks (drives, systems, etc.) over time? Are  
> you sure you have not lost control of someone's data? 

That is an interesting question.  The comfort provided by threshold schemes might make people careless when discarding drives.  Again in the post on disclosure I note that drives should still be zeroed before being discarded, to prevent just such an attack.

> Have you  
> tracked what was on each and every lost drive? What is your process  
> when you do a technology refresh and retire a complete configuration?  
> If media destruction is still necessary, will the resulting  
> operational process really be any easier or safer than if the data  
> were just split?

Refreshes should be done whenever it is determined that one or more slices have been compromised.  One may even adopt a continuous refresh schedule given spare resources; we haven't defined our recommendations on this yet.  While zeroing the data or destroying the media is still recommended, geographic dispersion improves the process in cases where media destruction was not used.  These are interesting questions to consider.

>
> 4) What do you do if you believe your system has been compromised by a  
> hacker? Could they have read N pieces? Could they have erased the logs?

If you believe a system has been compromised, the best course of action would be to collect whatever logs are available for an investigation and then re-image the machine entirely, as there is no telling what backdoor or root-kit the attacker might have left behind.  In fact, one may never detect the compromise at all, which makes a stronger case for scheduled data refreshes.  Any one system, or any one site when using geo-dispersion, should only hold slices of the same "slice index" (if we number the slices produced by the IDA 1 through N, that number is the slice index).  The compromise of any single system, or site, will therefore never yield K peer slices, maintaining the confidentiality of the data.  A periodic refresh would then force an attacker to compromise K different systems at K different sites within a single refresh window T.
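
As a toy illustration of that placement rule (the site names are hypothetical, and K/N are the example configuration from earlier):

    K, N = 10, 16
    sites = ["site-%d" % i for i in range(N)]

    def place(slices):
        # slice i of every object always goes to site i, so a site only
        # ever accumulates slices of a single slice index
        assert len(slices) == N
        return {sites[i]: s for i, s in enumerate(slices)}

    def data_exposed(breached_sites):
        # reconstruction needs K distinct slice indexes, hence K sites
        return len(set(breached_sites)) >= K

    print(data_exposed({"site-0", "site-3"}))   # False: only 2 peer slices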

>
> 5) I also suggest that there is other prior art out there for this  
> kind of storage system. I suggest the paper "POTSHARDS: Secure Long- 
> Term Storage Without Encryption" by Storer, Greenan, and Miller at http://www.ssrc.ucsc.edu/Papers/storer-usenix07.pdf 
>   because it covers the same space, and has a good set of references  
> to other systems.

Thanks for this reference; we are planning to submit a paper on this topic to a USENIX conference, so the reference is greatly appreciated.  Given my comments above, do you agree that dispersal+AONT is a new technique with new, useful benefits?

>
> My final comment is that you raised the bar, yes. I will argue that  
> you did not make the case that key management is not needed. Secrets  
> are still needed to get past the residual problems described in these  
> comments. Keys are small secrets that can be secured at lower cost  
> than securing the entire bulk of the data. Your system requires the  
> bulk of the data to be protected, and thus, in the long run, does  
> not offer the operational efficiency that simple bulk encryption with  
> traditional key management provides.
>
> Jim
>
>

Thanks Jim.  What do you think about combining a conventional encryption system with dispersal+AONT?  Would not the combination be perhaps the most confidential storage system created to date?  I appreciate your insightful comments and look forward to your reply.

Regards,

Jason

> ---------------------------------------------------------------------
> The Cryptography Mailing List
> Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com
>
> _______________________________________________
> tahoe-dev mailing list
> tahoe-dev at allmydata.org
> http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev


