[tahoe-lafs-trac-stream] [tahoe-lafs] #2124: Add [storage].max_shares configuration option

Sun Dec 1 20:32:58 UTC 2013

#2124: Add [storage].max_shares configuration option
------------------------------+------------------------
     Reporter:  amontero      |      Owner:
         Type:  enhancement   |     Status:  new
     Priority:  normal        |  Milestone:  undecided
    Component:  code-storage  |    Version:  1.10.0
   Resolution:                |   Keywords:  sneakernet
Launchpad Bug:                |
------------------------------+------------------------
Description changed by zooko:

Old description:

> This feature is a proposed solution to #2107 and #2123.
>
> I discussed some weeks ago a possible feature with Daira and Warner to
> help me achieve #2107. The solutions looks like easy enough to make for a
> newbie-coder-task like me. This is a constraint of this approach, but
> also looks to have its merit also because of its simplicity, IMO.
>
> The solution would entail adding a "max_shares=n" setting in the
> "storage" config section of storage nodes (proposed names were checked
> with Daira and Warner).
> A node having this value set, for any given file, would gently refuse to
> accept shares in excess of "n" if its already stored shares already are
> "n".
> It is easily noted that this implies adding server-side storage
> configuration instead of adding anything in the share placement logic
> done at client side, AFAIK. Servers not having this value set, would
> continue to accept as many shares as they did prevously.
>
> The relevant code I learnt it's doing this is at this loop:
> https://github.com/tahoe-lafs/tahoe-
> lafs/blob/master/src/allmydata/storage/server.py#L284
> Adding here a bit of logic that checks if we already have at least "n"
> shares, excess shares could be simply dropped. The uploading nodes will
> have to find another server to place that share wherever they see fit, no
> changes in upload logic.
>
> Also, as further documentation, here are some excerpts (edited for
> brevity) of the chat with Daira and Warner with my learnings and
> misundertandings about implementation details that might be useful for
> others:
>
>  warner:
>  the protocol carries a request to hold a set of shares, then the server
> responds with two sets: the first is the set of shares it will accept,
> the second is the set of shares it already has at least for immutable
> shares
>
>  warner:
>  for mutable shares, the client just says "write to these shares", and I
> think the server gets to respond with "ok" or an error (I'm not sure
> there's a good way to signal that it accepted some shares but not others)
>
>  amontero:
>  it's this done here?
>  https://github.com/tahoe-lafs/tahoe-
> lafs/blob/master/src/allmydata/storage/server.py#L284
>  looks like to me even the server throws the data away
> (bw.throw_out_all_data = True)
>
>  warner:
>  I'm not sure the server is in a great position to know the best policy,
> but it seems like a feasible thing to add.
>  the "first set" I described is returned as a dictionary of "bucket
> writers" so you could probably just change line 284 to say like "for
> shnum in sharenums[:LIMIT]:"
>
>  amontero:
>  I will have to go to the Py books to look for that [:LIMIT]: thingie
>
>  warner:
>  maybe "for shnum in list(sharenums)[:LIMIT]:" if sharenums is passed as
> a set instead of a list. the last colon ends the loop condition. the
> [:foo] syntax is how you get the first "foo" elements of a list
>
>  amontero:
>  I was thinking in some (surely more inefficient) "if" to "pass" or set
> throw_out_all_data=True
>
>  warner:
>  yeah, you probably don't want .throw_out_all_data: that will pretend to
> accept the share, but then silently discard the bytes. (we use that for
> unit tests)
>
>  warner:
>  you want the server to explicitly reject the share, so the client puts
> it elsewhere.
>  anything that doesn't add an entry to "bucketwriters" will reject the
> share.
>  like "elif len(bucketwriters) > LIMIT: pass".
>  with a comment saying "don't accept too many shares".
>
>  amontero:
>  I have now to figure how and where in the code get the "ConfigEntry=num"
> from file
>
>  warner: yeah, that part will probably take 90% of the code. it should be
> passed as an argument into the StorageServer constructor and should be
> read from the config file in src/allmydata/client.py in the
> init_storage() method
>
>  amontero:
>  "client.py" filename makes me think it's the wrong place to read it, I
> was thinking in something inside storage/source files
>
>  warner:
>  well, client.py is slightly misnamed. client.py is where we set up all
> clients and storage servers. it doesn't include the weird nodes (key-
> generator, stats-gatherer, etc) basically because it's so common to run a
> node that has both client stuff and storage-server stuff. we lumped them
> together (at least the high-level setup code) in a single file we may fix
> that some day by moving the init_storage() method out of client.py and
> into something else but init_storage() is definitely the right place to
> read storage-related config settings, and init_storage() currently lives
> in client.py
>
> BTW, my newbie feature branch attempt is at https://github.com/pataquets
> /tahoe-lafs/compare/1.10.0-storage-server--max-shares (no config,
> hardcoded to 1) but it is a proof/WIP, it's pending more manual testing.

New description:

 This feature is a proposed solution to #2107 and #2123.

 I discussed some weeks ago a possible feature with Daira and Warner to
 help me achieve #2107. The solutions looks like easy enough to make for a
 newbie-coder-task like me. This is a constraint of this approach, but also
 looks to have its merit also because of its simplicity, IMO.

 The solution would entail adding a "max_shares=n" setting in the "storage"
 config section of storage nodes (proposed names were checked with Daira
 and Warner).
 A node having this value set, for any given file, would gently refuse to
 accept shares in excess of "n" if its already stored shares already are
 "n".
 It is easily noted that this implies adding server-side storage
 configuration instead of adding anything in the share placement logic done
 at client side, AFAIK. Servers not having this value set, would continue
 to accept as many shares as they did prevously.

 The relevant code I learnt it's doing this is at this loop:
 https://github.com/tahoe-lafs/tahoe-
 lafs/blob/master/src/allmydata/storage/server.py#L284
 Adding here a bit of logic that checks if we already have at least "n"
 shares, excess shares could be simply dropped. The uploading nodes will
 have to find another server to place that share wherever they see fit, no
 changes in upload logic.

 Also, as further documentation, here are some excerpts (edited for
 brevity) of the chat with Daira and Warner with my learnings and
 misundertandings about implementation details that might be useful for
 others:

  warner:
  the protocol carries a request to hold a set of shares, then the server
 responds with two sets: the first is the set of shares it will accept, the
 second is the set of shares it already has at least for immutable shares

  warner:
  for mutable shares, the client just says "write to these shares", and I
 think the server gets to respond with "ok" or an error (I'm not sure
 there's a good way to signal that it accepted some shares but not others)

  amontero:
  it's this done here?
  https://github.com/tahoe-lafs/tahoe-
 lafs/blob/master/src/allmydata/storage/server.py#L284
  looks like to me even the server throws the data away
 (bw.throw_out_all_data = True)

  warner:
  I'm not sure the server is in a great position to know the best policy,
 but it seems like a feasible thing to add.
  the "first set" I described is returned as a dictionary of "bucket
 writers" so you could probably just change line 284 to say like "for shnum
 in sharenums[:LIMIT]:"

  amontero:
  I will have to go to the Py books to look for that [:LIMIT]: thingie

  warner:
  maybe "for shnum in list(sharenums)[:LIMIT]:" if sharenums is passed as a
 set instead of a list. the last colon ends the loop condition. the [:foo]
 syntax is how you get the first "foo" elements of a list

  amontero:
  I was thinking in some (surely more inefficient) "if" to "pass" or set
 throw_out_all_data=True

  warner:
  yeah, you probably don't want .throw_out_all_data: that will pretend to
 accept the share, but then silently discard the bytes. (we use that for
 unit tests)

  warner:
  you want the server to explicitly reject the share, so the client puts it
 elsewhere.
  anything that doesn't add an entry to "bucketwriters" will reject the
 share.
  like "elif len(bucketwriters) > LIMIT: pass".
  with a comment saying "don't accept too many shares".

  amontero:
  I have now to figure how and where in the code get the "!ConfigEntry=num"
 from file

  warner: yeah, that part will probably take 90% of the code. it should be
 passed as an argument into the !StorageServer constructor and should be
 read from the config file in src/allmydata/client.py in the init_storage()
 method

  amontero:
  "client.py" filename makes me think it's the wrong place to read it, I
 was thinking in something inside storage/source files

  warner:
  well, client.py is slightly misnamed. client.py is where we set up all
 clients and storage servers. it doesn't include the weird nodes (key-
 generator, stats-gatherer, etc) basically because it's so common to run a
 node that has both client stuff and storage-server stuff. we lumped them
 together (at least the high-level setup code) in a single file we may fix
 that some day by moving the init_storage() method out of client.py and
 into something else but init_storage() is definitely the right place to
 read storage-related config settings, and init_storage() currently lives
 in client.py

 BTW, my newbie feature branch attempt is at https://github.com/pataquets
 /tahoe-lafs/compare/1.10.0-storage-server--max-shares (no config,
 hardcoded to 1) but it is a proof/WIP, it's pending more manual testing.

--

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2124#comment:2>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage