[tahoe-dev] cleversafe says: 3 Reasons Why Encryption is Overrated

Brian Warner warner at lothar.com
Fri Jul 31 13:36:53 PDT 2009


Zooko Wilcox-O'Hearn wrote:
> 
> Cleversafe has posted a series of blog entries entitled "3 Reasons  
> Why Encryption is Overrated".

The AONT is a neat bit of crypto, but it seems to me that it's merely
moving the problem elsewhere, not avoiding it completely. From what I
can see of the Cleversafe architecture (i.e. the picture on
http://dev.cleversafe.org/weblog/?p=111), they've got a system in which
someone who can retrieve "k" shares can reconstruct the plaintext, so
all of the policy around who-gets-to-read-what must be enforced by the
servers. They express this policy when they decide whether or not to
honor a request for one of those shares. The security of your file
depends upon enough (N-k+1) of them deciding to say "no" to the "wrong"
people; the availability of your file depends upon at least k of them
saying "yes" to the "right" people. The means by which those servers
make that decision is not described in the picture, but is a vital part
of the access-control scheme (and probably depends upon the same sort of
public-key encryption that the blog entry disparages).
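
(To make that arithmetic concrete, here's a tiny Python sketch of the
thresholds in any k-of-N scheme where the servers enforce the policy.
The parameters below are made up for illustration; they aren't taken
from Cleversafe's documentation.)

    def thresholds(k, n):
        # servers that must say "no" to the "wrong" people for secrecy,
        # and "yes" to the "right" people for availability
        must_refuse = n - k + 1
        must_grant = k
        return must_refuse, must_grant

    # e.g. an illustrative 10-of-16 deployment: 7 servers must refuse
    # the wrong requester, and 10 must grant the right one
    print(thresholds(10, 16))   # -> (7, 10)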

I'd want to know more about how this undocumented access-control scheme
works before I was willing to believe that the Cleversafe approach
results in "fewer headaches". It is clear that the access-control policy
can be changed without re-encrypting the original data, but depending
upon how access is granted, it may be an even bigger hassle to change
the access keys. I suspect there is some notion of a "user", represented
by a private key, and that each server has a typical (user * file) ACL
matrix, and that some superuser retains the ability to manipulate that
matrix. The object-capability community has shown
(http://waterken.sourceforge.net/aclsdont/) that this approach breaks
down when trying to make sharing decisions among multiple users, and
quickly becomes vulnerable to things like the Confused Deputy attack.

Another possibility is that there is some master node which makes the
per-user access-control decisions, and all servers honor its requests.
In this case the security of the files depends upon both the servers
being able to distinguish the master's requests from an attacker's, and
upon the master being able to distinguish one user's requests from
another's. The issues with ACLs still apply, and both availability and
security depend upon the master (which is now a single point of
failure).

In Tahoe, we made a deep design decision very early: the security of the
system should not depend upon the behavior of the servers. We made that
choice so that users could take advantage of storage space on a wider
variety of servers, including ones that you don't trust to not peek
inside your shares. Tahoe shares (and the ciphertext of every file) are
considered to be public knowledge. This fact improves system-wide
reliability by enabling the use of additional tools without affecting
any security properties:

 * servers can independently copy shares to new servers when they know
   they're going to leave the grid

 * pairs of servers can be "buddies" for a single share (and challenge
   each other with integrity checks at random intervals)

 * arbitrary parties can perform repair functions and generate new
   shares without seeing the plaintext

 * "helpers" can safely perform the erasure coding/decoding for you, to
   offload bandwidth and CPU effort to a more suitable machine

 * third-party "relay servers" can transport and hold shares (e.g. for
   the benefit of servers stuck behind NAT boxes)

 * users can upload shares to anyone that seems useful: the only
   consequence is the bandwidth cost
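
(Tahoe's real erasure coding is Reed-Solomon, via the zfec library, but
a toy 2-of-3 parity code is enough to show why a repairer or helper
never needs the decryption key: everything it touches is ciphertext.
This is only a sketch of the property, not Tahoe's actual share format.)

    import os

    def make_shares(ciphertext):
        # split the ciphertext in half and add an XOR parity share:
        # any 2 of the 3 shares are enough to rebuild the ciphertext
        half = (len(ciphertext) + 1) // 2
        s1 = ciphertext[:half]
        s2 = ciphertext[half:].ljust(half, b"\x00")
        parity = bytes(a ^ b for a, b in zip(s1, s2))
        return s1, s2, parity

    ciphertext = os.urandom(32)    # what the servers hold: already encrypted
    s1, s2, parity = make_shares(ciphertext)

    # a repairer that lost s2 regenerates it from s1 and the parity share,
    # without ever seeing the plaintext or holding the decryption key
    rebuilt_s2 = bytes(a ^ b for a, b in zip(s1, parity))
    assert rebuilt_s2 == s2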

The actual security properties we get in Tahoe are:

 * a small number of misbehaving servers can do absolutely nothing to
   hurt you

 * a larger number of misbehaving servers can cause lazy readers (those
   who do not do a thorough check) to get old versions of mutable files
   (a "rollback attack")

 * a very large number of misbehaving servers can cause unavailability
   of files, and rollback attacks even against thorough readers

where "very large" means N-k or more, and "small" means less than about
2*k (this depends upon the reader: they make a tradeoff between
bandwidth expended and vulnerability to rollback). Also, a reader who is
somehow able to remember the latest sequence number will never be
vulnerable to the rollback attack. And of course the rollback attack is
not even applicable to immutable files, which have only a single
"version".

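(As a worked example, here is what those thresholds come to with
Tahoe's default 3-of-10 encoding; a grid running with different k and N
will get different numbers.)

    # plugging Tahoe's default 3-of-10 encoding into the thresholds above
    k, N = 3, 10
    small = 2 * k        # fewer than about 6 misbehaving servers: harmless
    very_large = N - k   # 7 or more: unavailability, and rollback even
                         # against thorough readers
    print(small, very_large)   # -> 6 7
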
Note that confidentiality and integrity (the lack of undetected
bitflips) are guaranteed even if every single server is colluding
against you, assuming the crypto primitives remain unbroken. We decided
that this was the most important property to achieve. Anything less
means you're vulnerable to server behavior, either that of a single
server or of a colluding group. The Sybil attack demonstrates that there
is no way to prove that your servers are independent, so it is hard to
have confidence in any property that depends upon non-collusion.

(Also, if all of your servers are managed by the same company, then by
definition they're colluding. We wanted to give allmydata.com users the
ability to not share their files' plaintext with allmydata.com.)

By treating the ciphertext of any given file as public knowledge, we
concentrate all of the security of the system into the encryption key.
This drastically simplifies the access-control policy, which can be
stated in one sentence: you can read a file's contents if and only if
you know or can obtain its readcap. There is no superuser who gets to
make access-control decisions, no ACL matrices to be updated or
consulted, no deputies to be confused, and sharing files is as easy as
sharing the readcap.

Finally, in all of this, it's important to be clear about the
differences between mutable and immutable files, and the differences
between symmetric and asymmetric encryption. Much of what the Cleversafe
blog posts say is that "encryption" is bad and the AONT is the good
alternative, but of course the AONT that they depend upon is based on
symmetric AES-256. I think they're really trying to say that asymmetric
cryptography (RSA/ECDSA) is threatened by quantum computing, and that
per-file keys are either weak or hard to manage. Any notion of mutable
versus immutable files must be expressed in the (undocumented) server
access control mechanisms.

In Tahoe, our immutable files use AES and SHA256d, and if you have a
secure place to store the filecaps, you don't need anything else. (This
layer of Tahoe can be construed as a secure DHT with filecaps as keys
and file plaintext as values.) As the saying goes, cryptography is a
means of turning a large security problem into a small security problem:
you still need to keep the filecaps safe, but they're a lot smaller than
the original files.
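
(Here's a simplified sketch of that "small security problem", in Python
with the "cryptography" library. It is not Tahoe's actual CHK format or
key-derivation scheme; it just shows that the secret you must protect is
an AES key plus a SHA256d integrity hash, a few dozen bytes standing in
for an arbitrarily large file.)

    import os, hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def sha256d(data):
        # SHA256d: SHA-256 applied twice
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()

    def store_immutable(plaintext):
        key, nonce = os.urandom(32), os.urandom(16)   # random per-file AES-256 key
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        ciphertext = enc.update(plaintext) + enc.finalize()
        # the ciphertext (and nonce) can be public knowledge; the "filecap"
        # is the small secret needed to verify and decrypt it
        filecap = (key, sha256d(ciphertext))
        return filecap, nonce, ciphertext

    def retrieve(filecap, nonce, ciphertext):
        key, expected = filecap
        assert sha256d(ciphertext) == expected        # detect any bitflips
        dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
        return dec.update(ciphertext) + dec.finalize()

    cap, nonce, ct = store_immutable(b"my secret diary")
    assert retrieve(cap, nonce, ct) == b"my secret diary"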

If you want to keep track of fewer things, you can store those filecaps
in mutable directories, which are based upon mutable files that use AES,
SHA256d, and RSA2048. The use of an asymmetric algorithm like RSA makes
them vulnerable to more sorts of attacks, but their mutability makes
them far more flexible, allowing you to keep track of an arbitrary
number of changing files with a single 256ish-bit cryptovalue.
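
(Again a simplified sketch, not Tahoe's real mutable-file format: the
point is that the RSA private key acts as the write capability, so only
its holder can publish a new version, while anyone with the public key
can verify that what a server hands back was really signed by the
writer. The seqnum field is what a careful reader would remember in
order to defeat rollback.)

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH)

    write_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    verify_key = write_key.public_key()

    def publish(seqnum, contents):
        # sign (seqnum, contents) so readers can detect forgeries, and
        # rollbacks too if they remember the latest seqnum
        message = seqnum.to_bytes(8, "big") + contents
        return message, write_key.sign(message, PSS, hashes.SHA256())

    def check(message, signature):
        # raises InvalidSignature if a server returns something the
        # writer never published
        verify_key.verify(signature, message, PSS, hashes.SHA256())

    msg, sig = publish(1, b"version 1 of my directory")
    check(msg, sig)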

cheers,
 -Brian

