[tahoe-dev] AllMyData Architecture
Norman Hardy
norm at cap-lore.com
Thu Nov 1 23:53:20 PDT 2007
I have picked up a great deal on Friday mornings about your
architecture but there are still holes in my understanding.
I don't see a top level architecture at your web site.
I don't know at which audience the page at
(http://tahoebs1.allmydata.com:8011/) is aimed.
Perhaps it is meant to be a part of a presentation.
For instance there is no file named (/data1/tahoe/client1/start.html)
on my computer. Perhaps I came in late, but I cannot find the context
that would explain that file name.
Starting from the page at (http://allmydata.org/trac/tahoe) I am
invited to install the software.
I have from this page no inkling what the software will do for me, or
to me; not even a bald claim.
I trust you guys but I don't even understand what I am trusting you
to do for me.
I understand some of the top level goals of the software (only
because of Friday mornings) but not how I can ask the software to do
those things for me.
I have been confused for some time about which parts of the software
there are and where they run.
Comments on the installation instructions:
I am not a python wizard and the term "Python Package Index" leaves
me cold.
Perhaps a link would help.
Perhaps this mode is only for Python practitioners.
In the "Running-In-Place" scheme I presume that the code that I avoid
loading runs on some other machine instead thereby providing a service.
There are trust issues here. If I build and run code on my machine
then I know I can, in principle, read and understand it.
Otherwise I trust a service.
This may be OK, but I have as yet no way of knowing what I am
trusting it to do.
I know enough about your architecture to know that it provides
unprecedented security properties.
If you are trying to gain adherents as a result of these properties
then there must be ways to understand how these properties arise.
Reading the code is necessary for those who don't trust you.
But even in that extreme case it is necessary to understand the
claims and invariants upon which these unique security properties rest.
I see not even bald claims for these properties.
I gather that it works something along the following lines.
There is a body of code that runs on the data owner's machine. (I
don't know what you call it; I will call it the 'adapter' here.)
The adapter presents some sort of file system interface to whoever
addresses it as a file system.
The adapter requires access to Internet to access a set of other
machines whose nature is mostly abstracted in this note.
Files written to this file system occupy space on the other machines.
The written information is encrypted so that only the size and time
of the written data are available outside your computer; indeed the
content is known only to the adapter and the writer, and, of course,
their respective TCBs.
As a file system the adapter also supports reading files therein.
This activity is also visible to the other machines.
The data is represented on the other machines redundantly so that the
data remains available even when some of the other machines are not
available.
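
The storage claims above can be illustrated with a minimal sketch. It is
not Tahoe's actual code or algorithm: the keystream is a toy built from
SHA-256 standing in for a real cipher, and the 2-of-3 XOR parity scheme
stands in for whatever redundancy encoding the real system uses. All
function names are hypothetical.

```python
import hashlib
from typing import List, Optional


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key plus a counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    # The ciphertext has the same length as the plaintext, so the size
    # stays visible to the other machines while the content does not.
    return xor(plaintext, keystream(key, len(plaintext)))


# XOR with the same keystream undoes the encryption.
decrypt = encrypt


def make_shares(ciphertext: bytes) -> List[bytes]:
    """Two halves plus an XOR parity share; any two of the three suffice."""
    padded = ciphertext + b"\x00" * (len(ciphertext) % 2)
    half = len(padded) // 2
    a, b = padded[:half], padded[half:]
    return [a, b, xor(a, b)]


def recover(shares: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the ciphertext when at most one share is missing (None)."""
    if sum(s is None for s in shares) > 1:
        raise ValueError("need at least two of the three shares")
    a, b, parity = shares
    if a is None:
        a = xor(b, parity)
    if b is None:
        b = xor(a, parity)
    return (a + b)[:length]
```

In this sketch the adapter would call encrypt() before anything leaves
the owner's machine and hand one share to each of three other machines;
any single machine can then be unavailable without losing the file.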
With these bald claims I can already begin to reason about security
matters even without understanding or buying into capability discipline.
This could perhaps motivate a class of hackers to look closer. Here
are some further high-level, yet precise claims.
There is a software interface to the adapter whereby a program that
can read a file from the file system can instead acquire a token from
the adapter for that file.
That token (what I suppose you call the URI) is about 100 characters
long and is pure data.
If another computer somewhere, attached to the Internet, running an
adapter with access to the same(?) set of 'other machines' acquires
this token, it can be delivered to that adapter so as to create a
virtual hard-link (Unix speak) to the original file.
Alternatively there is available a token from the first adapter for
the file that affords only read access.
This token is cryptographically secure, as is the read-only restriction.
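
One way such a read-only restriction could be made cryptographic is to
derive the read-only token from the read-write token with a one-way
hash. The sketch below assumes exactly that; the "rw:" and "ro:"
prefixes, the token layout, and the function names are invented for
illustration and are not Tahoe's actual URI format.

```python
import base64
import hashlib
import os


def new_write_token() -> str:
    """Mint a fresh read-write token: pure data, just a printable string."""
    secret = os.urandom(32)
    return "rw:" + base64.b32encode(secret).decode("ascii").lower()


def read_only_token(token: str) -> str:
    """Derive the read-only token from a read-write token.

    The derivation is a one-way hash, so a holder of the read-only
    token cannot work backwards to the read-write token.
    """
    kind, _, body = token.partition(":")
    if kind == "ro":
        return token  # already read-only; nothing further to give up
    if kind != "rw":
        raise ValueError("unrecognized token")
    secret = base64.b32decode(body.upper())
    digest = hashlib.sha256(b"read-only:" + secret).digest()
    return "ro:" + base64.b32encode(digest).decode("ascii").lower()
```

The holder of a read-write token can always compute the matching
read-only token and pass it along, while the reverse direction would
require inverting the hash.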
Tokens for directories come in three flavors: (1) read-write, which
allows the token holder to add and delete entries from the directory
(add-only is another obvious candidate), (2) read-only, which does not
allow modification to the directory, and (3) transitive read, which
allows only read-only access to files therein, and transitive read
access to directories therein. (This text is wobbly.)
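
The three flavors can be modeled in a few lines. This is one reading of
the wobbly text above, not a description of the real software: it
assumes that a plain read-only directory token hands back whatever
tokens were stored in the directory, while a transitive read token
weakens everything fetched through it. In-memory object references
stand in for real tokens, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Union


class Access(Enum):
    READ_WRITE = "read-write"
    READ_ONLY = "read-only"
    TRANSITIVE_READ = "transitive-read"


@dataclass
class File:
    data: bytes


@dataclass
class Directory:
    # Each entry maps a name to the token that was stored under it.
    entries: Dict[str, "Token"] = field(default_factory=dict)


@dataclass
class Token:
    target: Union[File, Directory]
    access: Access


def attenuate(token: Token) -> Token:
    """Weaken a token as seen through a transitive read directory token."""
    if isinstance(token.target, Directory):
        return Token(token.target, Access.TRANSITIVE_READ)
    return Token(token.target, Access.READ_ONLY)


def add_entry(dir_token: Token, name: str, child: Token) -> None:
    """Only a read-write directory token may change the directory."""
    if dir_token.access is not Access.READ_WRITE:
        raise PermissionError("directory token does not allow modification")
    dir_token.target.entries[name] = child


def lookup(dir_token: Token, name: str) -> Token:
    """Fetch an entry; transitive read weakens whatever comes back."""
    child = dir_token.target.entries[name]
    if dir_token.access is Access.TRANSITIVE_READ:
        return attenuate(child)
    return child
```

Under this reading, handing someone a transitive read token for a
directory gives them a read-only view of the whole subtree, however
deep, without any way to modify it.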
Now some of the capability discipline leaks into the description;
some will recognize it and others will not.
Many more and perhaps most of the current security properties are now
made clear, at least as claims.
All without reading any code, if they trust you guys.
Another claim: secrecy and integrity of your data depend only on the
logic of your adapter; they do not depend on the logic of a possibly
modified adapter in the machines to which you send tokens.
Availability depends on the logic of code running in the other
computers and the availability of those other computers.
I have ignored here those important properties relating to data healing.
There is now a logical framework to begin understanding the more
detailed mechanisms inside that make the software feasible and the
claims thereby plausible.
I have been sloppy but it is possible to make precise claims with
little or no more bulk.
I think there should be such prominent claims near the front door for
the hackers and security professionals.
If your lawyers are queasy, then present these as aspirations, not
claims, and invite the customer to audit your architecture for
shortcomings.
I am excited about your technology. Good luck.