﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
207	unit tests for failure modes of small mutable files	zooko	zooko	"This ticket is the successor to #197.  Here are all the pieces of the Small Distributed Mutable Files work that we need to finish:
 * recovery in case a colliding write is detected -- clients should probably take steps to minimize the chance that the colliding write results in all versions of the file being lost permanently.  We've designed and written down a recovery mechanism (in [source:docs/mutable.txt docs/mutable.txt]) that seems Good Enough.  There is a place for it to be implemented in allmydata.mutable.Publish._maybe_recover .
 * the client should create a new-style dirnode upon first boot instead of an old-style one
 * the old dirnode code should be removed, along with the vdrive client-side code and the vdrive server (and the vdrive.furl config file)
 * dirnode2.py should replace dirnode.py
 * URIs for the new mutable filenodes and dirnodes are a bit goofy-looking. See #102.
 * rollback-attacks: we chose a policy of ""first retrievable version wins"" on download, but for small grids and large expansion factors (i.e. small values of k) this makes it awfully easy for a single out-of-date server to effectively perform a rollback attack against you. I think we should define some parameter epsilon and use the highest-seqnum'ed retrievable version from k+epsilon servers.
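A sketch of the proposed k+epsilon policy (the names and data shapes here are illustrative assumptions, not the actual Tahoe code, which is Twisted-based):

```python
class NotEnoughResponses(Exception):
    pass

def choose_version(responses, k, epsilon=2):
    # responses: (seqnum, version_data) pairs from servers queried so far.
    # Rather than accepting the first retrievable version, wait until
    # k + epsilon servers have answered, then take the highest seqnum,
    # so a single out-of-date server cannot roll us back.
    if len(responses) < k + epsilon:
        raise NotEnoughResponses('have %d responses, want %d'
                                 % (len(responses), k + epsilon))
    return max(responses, key=lambda r: r[0])[1]
```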
 * analyze control flow to count the round trips. I was hoping we could get an update done in just one RTT but at the moment it's more like 3 or 4. It's much more challenging than I originally thought.
 * try to always push a share (perhaps an extra N+1'th share) to ourselves, so we'll have the private key around. It would be sad to have a directory that had contents that were unrecoverable but which we could no longer modify because we couldn't get the privkey anymore.
 * choose one likely server (specifically ourselves) during publish to use to fetch our encprivkey. This means doing an extra readv (or perhaps just an extra-large readv) for that one server in _query_peers: the rest can use pretty small reads, like 1000 bytes. This ought to save us a round-trip.
 * error-handling: peers throwing random remote exceptions should not cause our publish to fail unless it is a NotEnoughPeersError.
 * the notion of ""container size"" in the mutable-slot storage API is pretty fuzzy. One idea was to allow read vectors to refer to the end of the segment (like python string slices using negative index values), for which we'd need a well-defined container size. I'm not sure this is actually useful for anything, though. (maybe grabbing the encrypted privkey, since it's always at the end?). Probably not useful until MDMF where you'd want to grab the encprivkey without having to grab the whole share too.
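Negative read-vector offsets would resolve like Python slice indices; a sketch, assuming the well-defined container size mentioned above (a hypothetical helper, not the storage API):

```python
def resolve_readv_offset(offset, container_size):
    # Negative offsets count back from the end of the container,
    # like negative Python string-slice indices. This is only
    # well-defined if the server knows the container size.
    if offset < 0:
        return container_size + offset
    return offset
```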
 * tests, tests, tests. There are LOTS of corner cases that I want coverage on. The easy ones are what download does in the face of out-of-date servers. The hard ones are what upload does in the face of simultaneous writers.
 * Publish peer selection: rebalance shares on each publish, by noticing when there are multiple shares on a single peer and also unused peers in the permuted list. The idea is that shares created on a small grid should automatically spread out when updated after the grid has grown.
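The rebalancing idea might look something like this sketch (the data shapes are assumptions, not the actual peer-selection code):

```python
def plan_rebalance(share_map, permuted_peers):
    # share_map: peer -> set of share numbers that peer currently holds.
    # permuted_peers: the full permuted peer list for this storage index.
    # Returns (shnum, from_peer, to_peer) moves that spread doubled-up
    # shares onto peers that hold nothing yet.
    unused = [p for p in permuted_peers if p not in share_map]
    moves = []
    for peer in sorted(share_map):
        extras = sorted(share_map[peer])[1:]  # keep one share per peer
        for shnum in extras:
            if not unused:
                return moves
            moves.append((shnum, peer, unused.pop(0)))
    return moves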
 * RSA key generation takes an unfortunately long time (between 0.8 and 3.2 seconds in my casual tests). This will make an RW deepcopy of a large directory structure pretty slow. We should do some benchmarking of this to determine key size / speed tradeoffs, and maybe at some point consider ECC if it could be faster.
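A benchmarking harness for the key size / speed tradeoff could be as simple as the following, where the keygen callable is a placeholder for whatever RSA implementation we measure:

```python
import time

def benchmark_keygen(keygen, trials=5):
    # keygen: zero-argument callable that generates one keypair.
    # Returns (min, mean, max) wall-clock seconds over the trials.
    times = []
    for _ in range(trials):
        start = time.time()
        keygen()
        times.append(time.time() - start)
    return min(times), sum(times) / len(times), max(times)
```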
 * code terminology: share vs slot vs container, ""SSK"" vs mutable file vs slot. We need to nail down the meanings of some of these and clean up the code to match.  Zooko thinks that the name ""SSK"" -- ""Sub-Space Key"" -- is not directly applicable to our mutable file URIs."	enhancement	closed	major	1.1.0	code-encoding	0.7.0	fixed	mutable		
