Opened at 2013-12-01T19:48:53Z
Last modified at 2013-12-12T18:18:36Z
#2124 new enhancement
Add [storage].max_shares configuration option — at Version 2
Reported by: | amontero | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | undecided |
Component: | code-storage | Version: | 1.10.0 |
Keywords: | sneakernet space-efficiency | Cc: | |
Launchpad Bug: |
Description (last modified by zooko)
This feature is a proposed solution to #2107 and #2123.
Some weeks ago I discussed a possible feature with Daira and Warner to help me achieve #2107. The solution looks easy enough to be a newbie-coder task for someone like me. That is a constraint of this approach, but it also has merit because of its simplicity, IMO.
The solution would entail adding a "max_shares=n" setting in the "storage" config section of storage nodes (the proposed names were checked with Daira and Warner). For any given file, a node having this value set would gently refuse to accept shares in excess of "n" once it already stores "n" shares. Note that this means adding server-side storage configuration rather than changing the client-side share placement logic, AFAIK. Servers without this value set would continue to accept as many shares as they did previously.
The relevant code, as I learnt, is at this loop: https://github.com/tahoe-lafs/tahoe-lafs/blob/master/src/allmydata/storage/server.py#L284 By adding a bit of logic here that checks whether we already have at least "n" shares, excess shares could simply be dropped. The uploading nodes would then have to find another server to place those shares wherever they see fit, with no changes to the upload logic.
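The proposed check could be sketched roughly as follows. This is a standalone illustration, not the actual Tahoe-LAFS allocation loop; the function and parameter names here are hypothetical, and the real change would live inside StorageServer's bucket-allocation code.

```python
# Hypothetical sketch of the proposed [storage]max_shares check, loosely
# modeled on the allocation loop in src/allmydata/storage/server.py.
# All names here are illustrative, not the real Tahoe-LAFS API.

def allocate_buckets_sketch(sharenums, existing_shares, max_shares=None):
    """Return the subset of requested share numbers this server accepts.

    sharenums:       share numbers the client asks us to hold
    existing_shares: share numbers we already store for this storage index
    max_shares:      the proposed [storage]max_shares limit, or None (no limit)
    """
    accepted = set()
    held = set(existing_shares)
    for shnum in sorted(sharenums):
        if shnum in held:
            continue  # already stored; the client is told so separately
        if max_shares is not None and len(held) + len(accepted) >= max_shares:
            # Gently refuse: by not accepting this share, the server
            # forces the client to place it on some other server.
            continue
        accepted.add(shnum)
    return accepted
```

For example, a server already holding share 0 with max_shares=2 would accept exactly one more share out of a request for shares 0-3, and the client would have to place the rest elsewhere.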
Also, as further documentation, here are some excerpts (edited for brevity) of the chat with Daira and Warner, with my learnings and misunderstandings about implementation details that might be useful to others:
warner: the protocol carries a request to hold a set of shares, then the server responds with two sets: the first is the set of shares it will accept, the second is the set of shares it already has (at least for immutable shares)
warner: for mutable shares, the client just says "write to these shares", and I think the server gets to respond with "ok" or an error (I'm not sure there's a good way to signal that it accepted some shares but not others)
amontero: is this done here? https://github.com/tahoe-lafs/tahoe-lafs/blob/master/src/allmydata/storage/server.py#L284 it looks to me like the server even throws the data away (bw.throw_out_all_data = True)
warner: I'm not sure the server is in a great position to know the best policy, but it seems like a feasible thing to add. the "first set" I described is returned as a dictionary of "bucket writers" so you could probably just change line 284 to say like "for shnum in sharenums[:LIMIT]:"
amontero: I will have to go to the Py books to look for that [:LIMIT]: thingie
warner: maybe "for shnum in list(sharenums)[:LIMIT]:" if sharenums is passed as a set instead of a list. the last colon ends the loop condition. the [:foo] syntax is how you get the first "foo" elements of a list
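The slicing idiom Warner describes can be seen in isolation (plain Python, nothing Tahoe-specific; the variable names are just for illustration):

```python
# [:LIMIT] takes the first LIMIT elements of a sequence. A set has no
# defined order, so it must be converted to a list first; sorting makes
# the result deterministic for this example.
LIMIT = 2
sharenums = {3, 0, 5, 1}
first_two = sorted(sharenums)[:LIMIT]
print(first_two)  # [0, 1]
```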
amontero: I was thinking of some (surely less efficient) "if" to "pass" or set throw_out_all_data=True
warner: yeah, you probably don't want .throw_out_all_data: that will pretend to accept the share, but then silently discard the bytes. (we use that for unit tests)
warner: you want the server to explicitly reject the share, so the client puts it elsewhere. anything that doesn't add an entry to "bucketwriters" will reject the share. like "elif len(bucketwriters) > LIMIT: pass". with a comment saying "don't accept too many shares".
amontero: I now have to figure out how and where in the code to get the "ConfigEntry=num" from the file
warner: yeah, that part will probably take 90% of the code. it should be passed as an argument into the StorageServer constructor and should be read from the config file in src/allmydata/client.py in the init_storage() method
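A minimal sketch of that config-reading side, using Python's stdlib configparser purely for illustration (Tahoe-LAFS has its own config helpers in client.py; the class and function names below are assumptions, not the real API):

```python
# Hypothetical sketch of wiring a [storage]max_shares option from a
# tahoe.cfg-style file into a StorageServer-like constructor.
# Names are illustrative only; the real code lives in init_storage().
import configparser

class StorageServerSketch:
    def __init__(self, storedir, max_shares=None):
        self.storedir = storedir
        # None means "no limit", preserving current behaviour for
        # servers that do not set the option.
        self.max_shares = max_shares

def init_storage_sketch(config_text):
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    max_shares = None
    if cfg.has_option("storage", "max_shares"):
        max_shares = cfg.getint("storage", "max_shares")
    return StorageServerSketch("storage", max_shares=max_shares)

server = init_storage_sketch("[storage]\nenabled = true\nmax_shares = 1\n")
print(server.max_shares)  # 1
```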
amontero: the "client.py" filename makes me think it's the wrong place to read it; I was thinking of something inside the storage/ source files
warner: well, client.py is slightly misnamed. client.py is where we set up all clients and storage servers. it doesn't include the weird nodes (key-generator, stats-gatherer, etc). basically because it's so common to run a node that has both client stuff and storage-server stuff, we lumped them together (at least the high-level setup code) in a single file. we may fix that some day by moving the init_storage() method out of client.py and into something else, but init_storage() is definitely the right place to read storage-related config settings, and init_storage() currently lives in client.py
BTW, my newbie feature-branch attempt is at https://github.com/pataquets/tahoe-lafs/compare/1.10.0-storage-server--max-shares (no config yet, hardcoded to 1), but it is a proof-of-concept/WIP; more manual testing is pending.
Change History (2)
comment:1 Changed at 2013-12-01T19:57:04Z by amontero
- Description modified (diff)
comment:2 Changed at 2013-12-01T20:32:58Z by zooko
- Description modified (diff)