[tahoe-dev] "BitTorrent for storage" is a bad idea -- [p2p-hackers] P2P file storage systems
Zooko O'Whielacronx
zooko at zooko.com
Sun Feb 20 22:51:30 PST 2011
Folks:
For your information, there is discussion of Tahoe-LAFS on the
p2p-hackers list: [1]. Michael Militzer started a discussion on that
list asking for "P2P file storage systems" which were distributed
across a large number of participants and had security.
I mentioned his initial letter earlier on this list, in the thread
about Tahoe-LAFS being confusing and/or ill-documented, since he
appeared to think it didn't have "attack resistance" and I wondered
what he meant by that. His later posts to
the p2p-hackers list such as the one below indicate that he was
thinking of DoS-type attacks on grids with a large number of servers
which are allowed to join a grid and start acting as a server in some
sort of ad hoc fashion, similar to the way BitTorrent lets random
peers join and start acting as content deliverers.
He's right that Tahoe-LAFS doesn't have good defenses against
DoS-style attacks in the case that you entrust your data to a large
number of random hosts that connect to you over the network and offer
to hold your data for you. It isn't a use-case that Tahoe-LAFS is
designed for. In fact it is a use case that we explicitly rejected! I
suspect it is not worth trying to satisfy that use case. (Based partly
on my experiences when I participated in the Mojo Nation, Mnet, and
Allmydata projects which each attempted to support something like that
and failed.)
It works really well to rely on a large group of ad hoc strangers for
getting a copy of a file which they are also currently interested in
(BitTorrent). The same sort of social organization probably does not
work well for storing files persistently.
Regards,
Zooko
[1] http://lists.zooko.com/pipermail/p2p-hackers/2011-February/002891.html
---------- Forwarded message ----------
From: Michael Militzer <michael at xvid.org>
Date: Sun, Feb 20, 2011 at 7:45 AM
Subject: Re: [p2p-hackers] P2P file storage systems
To: p2p-hackers at lists.zooko.com
Dear Chris,
thanks for your thoughts. I had a look also at your Octavia filesystem
some time ago. While I don't agree that we should really drop erasure
coding, I however like your approach to keep things simple. Also today
indeed bandwidth should be the more precious resource in a P2P system
compared to storage, which is available in abundance to the home user.
So a simple replication strategy might not be so bad after all...
Quoting Chris Palmer <chris at noncombatant.org>:
> Michael Militzer writes:
>
>> Data availability, privacy and also censorship resistance must be
>> verifiable. In addition, a secure storage system must withstand
>> adversarial attacks. A direct consequence of this is that the peer
>> software and protocol must be open-source. A storage system built around a
>> secret protocol and proprietary software cannot be trusted.
>
> I'm not convinced that OSD-approved licensing (which is what I assume you
> mean by "open-source") necessarily or exclusively correlates with
> trustworthiness. Plenty of open source software is untrustworthy, and at
> least some proprietary software is at least as trustworthy as the most
> trustworthy open source software.
Hm, maybe "trustworthiness" is not the right term and "verifiability" is
better. If you have all the necessary source code (no matter if OSI approved
or not) to create the corresponding binary yourself you have every ability to
verify what the application will do at runtime. This mere possibility does
of course not automatically imply trustworthiness.
> Nor is closed-source software really as "closed" to security scrutiny as
> people believe, and nor is open source as open to security scrutiny as
> people believe.
That's true.
> That said, of course, I want open source software too. :)
>
>> Allmydata/Tahoe:
>>
>> The only true open-source contender I know of. Unfortunately, not really
>> targeted towards a global-scale network of untrusted nodes. Also, no
>> particular measures to withstand adversarial attacks (but is also not
>> needed when deployed in a trusted environment).
>
> I think Tahoe-LAFS has pretty good defenses against a range of attacks on
> confidentiality, integrity, and availability. What do you find insufficient
> about its defense measures?
Well, I have no clue about cryptography and what I know about Tahoe is solely
derived from descriptions and documentation I read. That said, I think data
confidentiality and integrity in Tahoe are sound as they are based on well
understood cryptographic primitives. I am more concerned about availability.
Not when used to set up a small grid (which is the targeted use-case of
Tahoe) but rather when trying to build up a large-scale network made from
untrusted nodes.
If I understood it right, Tahoe clients simply keep a connection with each
storage node in a storage cluster. Obviously, this doesn't scale. So for
a global, large-scale deployment the peer selection and lookups should be
performed based on nodeid and a DHT. So data availability then ultimately
depends on the robustness of the DHT. If adversarial nodes can compromise
the DHT, data still present on active storage nodes might not be found
anymore by clients and hence become unavailable.
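To make the dependency on DHT robustness concrete, here is a minimal Python
sketch of nodeid-based peer selection. It assumes a Kademlia-style XOR
distance metric and SHA-1-derived 160-bit ids; neither is specified in this
message, and the names node_id and closest_nodes are hypothetical. It also
assumes no admission control, so an adversary can choose node ids freely.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit identifier from a name (hypothetical id scheme)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def closest_nodes(key: int, node_ids: list, count: int) -> list:
    """Return the `count` ids with the smallest XOR distance to `key`."""
    return sorted(node_ids, key=lambda nid: nid ^ key)[:count]


# 1000 honest nodes with effectively random ids.
honest = [node_id(f"honest-{i}") for i in range(1000)]

# The key under which some share would be stored and looked up.
key = node_id("some-share")
holders = closest_nodes(key, honest, 3)

# Assumption: without admission control an adversary can pick ids at
# XOR distance 1, 2 and 3 from the key, closer than any honest node.
sybils = [key ^ distance for distance in (1, 2, 3)]
after_attack = closest_nodes(key, honest + sybils, 3)

print("lookup now lands only on sybils:", set(after_attack) == set(sybils))
print("original holders still reachable via lookup:",
      bool(set(after_attack) & set(holders)))
```

In this toy model the honest holders still have the share, but a lookup for
the key no longer returns them, which is the kind of unavailability
described above.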
So if the DHT is deployed on untrusted nodes we need to care about things
like admission control, Sybil attacks, routing and index poisoning, eclipse
attacks and so on. Any kind of denial of service attack against the DHT
could mean data becoming inaccessible and hence unavailable in the system
even though the data itself may physically still be present.
I am unaware of any counter-measures against these kinds of attacks in
Tahoe (but there's also no need for them within Tahoe's current use-case).
> You might also want to look at David Mazières' SFS. It was a bit ahead of
> its time, and so is sometimes forgotten. But it deserves a good look, and
> maybe resuscitation.
Thanks for the pointer. I have found some papers on SFS but the former
website seems to be down unfortunately: http://www.fs.net/sfswww/
>> I haven't found a P2P backup solution that has:
>>
>> - Deployability on a global scale with untrusted nodes
>> - Secure, private and persistent data storage
>> - Open-source protocol and software
>> - Censorship-resistance
>> - Resiliency to adversarial attacks
>> - Reasonably simple and manageable design
>
> Tahoe has all but the last item (and maybe that is fixable). SFS has all of
> them, but lacks (as far as I can tell) a maintained, recent implementation.
Interestingly, my thoughts were almost the opposite (for Tahoe). I think
the basic design is still reasonably simple but it is not ready for use in
a large-scale, untrusted network:
- As outlined above, it doesn't seem to scale to thousands or hundreds of
thousands of nodes.
- It may need further modification to be safely usable in a network
comprised of untrusted nodes (sybils, DHT robustness against denial of
service attacks, ...)
- To guarantee persistence in a P2P network of untrusted and unreliable
nodes Tahoe's information dispersal strategy needs to be adapted. The degree
of redundancy (n/k) must be increased, and so must the number of
erasure-coded fragments (k), for storage efficiency. I don't know if
this is practically doable within Tahoe's current structure (Galois-field
based Reed-Solomon coding is slow with large k and n) or what other side
effects this may have (size of the Merkle trees?).
- Further, an automatic repair mechanism is required to retain data
availability in the long term. The client-controlled repair strategy Tahoe
currently implements seems insufficient in a network with low availability
of individual nodes.
- Censorship-resistance obviously also depends on availability and data
persistence guarantees. If directed (or undirected) denial of service
attacks are possible on the DHT, the system cannot be said to be
censorship-resistant.
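The trade-off between redundancy (n/k) and fragment count (k) in the list
above can be illustrated with a short calculation. This is only a sketch
under a strong assumption: every node is reachable independently with the
same probability p, which a real P2P network does not guarantee. The
function name file_availability and the example parameters are
illustrative, not taken from Tahoe.

```python
from math import comb


def file_availability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n fragments are reachable.

    Assumes each fragment sits on a different node and that nodes are
    reachable independently with probability p (binomial model).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.5  # assumed availability of a single unreliable node

# (k, n) pairs: plain 3x replication, then two codes with similar
# expansion n/k but different fragment counts.
for k, n in [(1, 3), (3, 10), (30, 100)]:
    availability = file_availability(n, k, p)
    print(f"{k}-of-{n}: expansion {n / k:.2f}, availability {availability:.6f}")
```

Under this model, as long as p is above k/n, raising k at a similar
expansion factor pushes file availability closer to 1. That is the storage
efficiency argument for a larger k, and it has to be weighed against the
coding cost noted above.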
And there are other, less-obvious censorship risks too: If a third-party
can force specific node owners (e.g. by court order) to shut down their
storage nodes then certain data can become unavailable in the system. In
Tahoe, data is encrypted and erasure coded before being dispersed to
different storage nodes. However, the dispersal is a 1:1 mapping in an
information-theoretic (and legal) sense. Therefore, it will be easy to
determine which storage nodes are responsible for serving parts of the
original data.
One may argue that most of us live in societies with rule of law, so that
censorship ordered by independent courts would be OK and there is no need to
feel sorry about outlawed data. But I see it more from a practical point of
view: If by joining the storage network people risk being exposed to legal
hassle or punishment due to the actions of others, no one (apart from the
usual geeks) will use such a service. I think similar risks and fears already
hinder the wide-spread adoption of other P2P systems (-> Freenet, Tor)...
Regards,
Michael