[tahoe-lafs-trac-stream] [tahoe-lafs] #2124: Add [storage].max_shares configuration option
tahoe-lafs
trac at tahoe-lafs.org
Sun Dec 1 20:32:58 UTC 2013
#2124: Add [storage].max_shares configuration option
------------------------------+------------------------
Reporter: amontero | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: undecided
Component: code-storage | Version: 1.10.0
Resolution: | Keywords: sneakernet
Launchpad Bug: |
------------------------------+------------------------
Description changed by zooko:
Old description:
> This feature is a proposed solution to #2107 and #2123.
>
> I discussed some weeks ago a possible feature with Daira and Warner to
> help me achieve #2107. The solutions looks like easy enough to make for a
> newbie-coder-task like me. This is a constraint of this approach, but
> also looks to have its merit also because of its simplicity, IMO.
>
> The solution would entail adding a "max_shares=n" setting in the
> "storage" config section of storage nodes (proposed names were checked
> with Daira and Warner).
> A node having this value set, for any given file, would gently refuse to
> accept shares in excess of "n" if its already stored shares already are
> "n".
> It is easily noted that this implies adding server-side storage
> configuration instead of adding anything in the share placement logic
> done at client side, AFAIK. Servers not having this value set, would
> continue to accept as many shares as they did prevously.
>
> The relevant code I learnt it's doing this is at this loop:
> https://github.com/tahoe-lafs/tahoe-
> lafs/blob/master/src/allmydata/storage/server.py#L284
> Adding here a bit of logic that checks if we already have at least "n"
> shares, excess shares could be simply dropped. The uploading nodes will
> have to find another server to place that share wherever they see fit, no
> changes in upload logic.
>
> Also, as further documentation, here are some excerpts (edited for
> brevity) of the chat with Daira and Warner with my learnings and
> misundertandings about implementation details that might be useful for
> others:
>
> warner:
> the protocol carries a request to hold a set of shares, then the server
> responds with two sets: the first is the set of shares it will accept,
> the second is the set of shares it already has at least for immutable
> shares
>
> warner:
> for mutable shares, the client just says "write to these shares", and I
> think the server gets to respond with "ok" or an error (I'm not sure
> there's a good way to signal that it accepted some shares but not others)
>
> amontero:
> it's this done here?
> https://github.com/tahoe-lafs/tahoe-
> lafs/blob/master/src/allmydata/storage/server.py#L284
> looks like to me even the server throws the data away
> (bw.throw_out_all_data = True)
>
> warner:
> I'm not sure the server is in a great position to know the best policy,
> but it seems like a feasible thing to add.
> the "first set" I described is returned as a dictionary of "bucket
> writers" so you could probably just change line 284 to say like "for
> shnum in sharenums[:LIMIT]:"
>
> amontero:
> I will have to go to the Py books to look for that [:LIMIT]: thingie
>
> warner:
> maybe "for shnum in list(sharenums)[:LIMIT]:" if sharenums is passed as
> a set instead of a list. the last colon ends the loop condition. the
> [:foo] syntax is how you get the first "foo" elements of a list
>
> amontero:
> I was thinking in some (surely more inefficient) "if" to "pass" or set
> throw_out_all_data=True
>
> warner:
> yeah, you probably don't want .throw_out_all_data: that will pretend to
> accept the share, but then silently discard the bytes. (we use that for
> unit tests)
>
> warner:
> you want the server to explicitly reject the share, so the client puts
> it elsewhere.
> anything that doesn't add an entry to "bucketwriters" will reject the
> share.
> like "elif len(bucketwriters) > LIMIT: pass".
> with a comment saying "don't accept too many shares".
>
> amontero:
> I have now to figure how and where in the code get the "ConfigEntry=num"
> from file
>
> warner: yeah, that part will probably take 90% of the code. it should be
> passed as an argument into the StorageServer constructor and should be
> read from the config file in src/allmydata/client.py in the
> init_storage() method
>
> amontero:
> "client.py" filename makes me think it's the wrong place to read it, I
> was thinking in something inside storage/source files
>
> warner:
> well, client.py is slightly misnamed. client.py is where we set up all
> clients and storage servers. it doesn't include the weird nodes (key-
> generator, stats-gatherer, etc) basically because it's so common to run a
> node that has both client stuff and storage-server stuff. we lumped them
> together (at least the high-level setup code) in a single file we may fix
> that some day by moving the init_storage() method out of client.py and
> into something else but init_storage() is definitely the right place to
> read storage-related config settings, and init_storage() currently lives
> in client.py
>
> BTW, my newbie feature branch attempt is at https://github.com/pataquets
> /tahoe-lafs/compare/1.10.0-storage-server--max-shares (no config,
> hardcoded to 1) but it is a proof/WIP, it's pending more manual testing.
New description:
This feature is a proposed solution to #2107 and #2123.
I discussed some weeks ago a possible feature with Daira and Warner to
help me achieve #2107. The solutions looks like easy enough to make for a
newbie-coder-task like me. This is a constraint of this approach, but also
looks to have its merit also because of its simplicity, IMO.
The solution would entail adding a "max_shares=n" setting in the "storage"
config section of storage nodes (proposed names were checked with Daira
and Warner).
A node having this value set, for any given file, would gently refuse to
accept shares in excess of "n" if its already stored shares already are
"n".
It is easily noted that this implies adding server-side storage
configuration instead of adding anything in the share placement logic done
at client side, AFAIK. Servers not having this value set, would continue
to accept as many shares as they did prevously.
The relevant code I learnt it's doing this is at this loop:
https://github.com/tahoe-lafs/tahoe-
lafs/blob/master/src/allmydata/storage/server.py#L284
Adding here a bit of logic that checks if we already have at least "n"
shares, excess shares could be simply dropped. The uploading nodes will
have to find another server to place that share wherever they see fit, no
changes in upload logic.
Also, as further documentation, here are some excerpts (edited for
brevity) of the chat with Daira and Warner with my learnings and
misundertandings about implementation details that might be useful for
others:
warner:
the protocol carries a request to hold a set of shares, then the server
responds with two sets: the first is the set of shares it will accept, the
second is the set of shares it already has at least for immutable shares
warner:
for mutable shares, the client just says "write to these shares", and I
think the server gets to respond with "ok" or an error (I'm not sure
there's a good way to signal that it accepted some shares but not others)
amontero:
it's this done here?
https://github.com/tahoe-lafs/tahoe-
lafs/blob/master/src/allmydata/storage/server.py#L284
looks like to me even the server throws the data away
(bw.throw_out_all_data = True)
warner:
I'm not sure the server is in a great position to know the best policy,
but it seems like a feasible thing to add.
the "first set" I described is returned as a dictionary of "bucket
writers" so you could probably just change line 284 to say like "for shnum
in sharenums[:LIMIT]:"
amontero:
I will have to go to the Py books to look for that [:LIMIT]: thingie
warner:
maybe "for shnum in list(sharenums)[:LIMIT]:" if sharenums is passed as a
set instead of a list. the last colon ends the loop condition. the [:foo]
syntax is how you get the first "foo" elements of a list
amontero:
I was thinking in some (surely more inefficient) "if" to "pass" or set
throw_out_all_data=True
warner:
yeah, you probably don't want .throw_out_all_data: that will pretend to
accept the share, but then silently discard the bytes. (we use that for
unit tests)
warner:
you want the server to explicitly reject the share, so the client puts it
elsewhere.
anything that doesn't add an entry to "bucketwriters" will reject the
share.
like "elif len(bucketwriters) > LIMIT: pass".
with a comment saying "don't accept too many shares".
amontero:
I have now to figure how and where in the code get the "!ConfigEntry=num"
from file
warner: yeah, that part will probably take 90% of the code. it should be
passed as an argument into the !StorageServer constructor and should be
read from the config file in src/allmydata/client.py in the init_storage()
method
amontero:
"client.py" filename makes me think it's the wrong place to read it, I
was thinking in something inside storage/source files
warner:
well, client.py is slightly misnamed. client.py is where we set up all
clients and storage servers. it doesn't include the weird nodes (key-
generator, stats-gatherer, etc) basically because it's so common to run a
node that has both client stuff and storage-server stuff. we lumped them
together (at least the high-level setup code) in a single file we may fix
that some day by moving the init_storage() method out of client.py and
into something else but init_storage() is definitely the right place to
read storage-related config settings, and init_storage() currently lives
in client.py
BTW, my newbie feature branch attempt is at https://github.com/pataquets
/tahoe-lafs/compare/1.10.0-storage-server--max-shares (no config,
hardcoded to 1) but it is a proof/WIP, it's pending more manual testing.
--
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2124#comment:2>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage
More information about the tahoe-lafs-trac-stream
mailing list