#1816 new defect

add a lease renewal method that only renews some shares in a shareset, to be used by repair/rebalancing

Reported by: davidsarah Owned by: warner
Priority: normal Milestone: undecided
Component: code-storage Version: 1.9.2
Keywords: forward-compatibility rebalancing repair RIStorageServer leases leasedb servers-of-happiness Cc:
Launchpad Bug:

Description (last modified by daira)

The current remote_add_lease and remote_renew_lease methods of RIStorageServer add or renew leases on all shares in a shareset. This is not ideal for rebalancing, because it provides no way to indicate which shnums are no longer needed on a given server.

The new method could also allow requesting a specific lease duration. (With accounting, we will at some point have a way to configure maximum lease durations for particular accounts.)

Change History (8)

comment:1 Changed at 2012-09-28T01:24:08Z by davidsarah

Note that the new method will not need arguments for lease renewal or cancel secrets, since those are not used by the LeaseDB (#666).

comment:2 Changed at 2012-09-28T01:55:34Z by davidsarah

Straw-man signature:

def add_or_renew_leases(storage_index=StorageIndex,
                        sharenums=SetOf(int, maxLength=MAX_BUCKETS),
                        requested_duration_seconds=int):
    """
    Renew leases on the specified shares, or add them where there is no existing
    lease, requesting the given lease duration in seconds. Raise IndexError (and
    do not renew any leases) if any of the specified shares are not held by this
    server.

    Returns the lease duration accepted by the server, in seconds, which may be
    smaller than the requested duration. (This value does not take into account
    that existing leases on one or more of the specified shares, added by this or
    other accounts, may have a longer duration.)

    Server expiration policy might result in shares being deleted before the
    accepted lease duration returned by this method, but this should not normally
    happen without intervention to change the configured policy (for this account
    or globally), or to explicitly delete shares.

    This method was added in Tahoe-LAFS v1.X.0. For backward compatibility with
    older servers, use remote_add_lease (which will also renew existing leases).
    """
    return int
Last edited at 2012-09-28T03:27:39Z by davidsarah
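
To make the semantics in the straw-man docstring concrete, here is a minimal in-memory sketch of how a server might honour it. It is not the leasedb implementation: the class name ToyLeaseStore, the single max_duration_seconds cap, the injectable clock, and the single-account model are all assumptions made for illustration only.

```python
import time


class ToyLeaseStore:
    """Hypothetical in-memory stand-in for a server's lease bookkeeping."""

    def __init__(self, max_duration_seconds, clock=time.time):
        # Assumed server-wide cap; a real server might configure this per account.
        self.max_duration_seconds = max_duration_seconds
        self.clock = clock
        self.shares = {}  # storage_index -> set of shnums held by this server
        self.leases = {}  # (storage_index, shnum) -> expiration time in seconds

    def add_or_renew_leases(self, storage_index, sharenums,
                            requested_duration_seconds):
        held = self.shares.get(storage_index, set())
        missing = set(sharenums) - held
        if missing:
            # All-or-nothing: validate every shnum before touching any lease.
            raise IndexError("shares not held: %s" % sorted(missing))

        # The server may accept less than was requested.
        accepted = min(requested_duration_seconds, self.max_duration_seconds)
        new_expiry = self.clock() + accepted
        for shnum in sharenums:
            key = (storage_index, shnum)
            # Never shorten an existing lease that already lasts longer.
            self.leases[key] = max(self.leases.get(key, 0), new_expiry)
        return accepted
```

Because validation happens before any lease is written, a request naming a share the server does not hold leaves every existing lease untouched, which matches the "do not renew any leases" wording above.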

comment:3 Changed at 2012-09-28T01:59:43Z by davidsarah

  • Keywords design-review-needed added
  • Owner set to warner

comment:4 Changed at 2012-09-28T03:25:55Z by davidsarah

  • Description modified (diff)

comment:5 Changed at 2012-12-06T22:32:26Z by davidsarah

Merged from #881 (cancel leases on extra shares in repairer, check-and-add-lease, upload, and publish):

The ideal state of a file is to have exactly N distinct shares on N distinct servers. Any shares beyond that are "extra": they might improve reliability, but they also consume extra storage space. We'd like to remove these extra shares to bring the total consumed storage space back down to the target implied by the user's choice of the N/k "expansion ratio".

For mutable files, anyone with a writecap can simply delete the extra shares. We should modify the "publish" operation to identify and delete the extra shares (after successfully updating the non-extra shares).

But there is no appropriate way to explicitly delete an immutable share: we intentionally do not provide a "destroycap". So the way to get rid of these shares is through garbage collection.

The operations that add leases (check --add-lease, and the repairer) should pay attention to how many shares have been seen, and identify the extra shares, and then cancel any leases that we can on them.

Check-and-add-lease pipelines both operations: it sends a DYHB and an add-lease-to-anything-you-have message together, ignoring the response from the add-lease message, and counting the DYHB responses to form the checker results. This speeds up the operation: if we allowed the code to have an unbounded number of outstanding messages in flight, the entire operation could be finished in one RTT.

Instead, this code should watch the DYHB responses and identify the extra shares, then send out cancel-lease messages for the extra shares. This increases the required time to two RTT (since we can't send out any cancel-lease messages until we've seen enough DYHB responses to correctly identify shares as being extra), but only in the (hopefully rare) case where there are extra shares. In the common case, check-and-add-lease should proceed at full speed and never need to send out additional messages.

Sending out cancel-lease messages is also easier than carefully refraining from sending out add-lease messages on the extra shares. To accomplish that, we'd have to do a full check run (i.e. DYHB messages to everyone), and only after most of those came back could we do the selective add-lease messages. By sending out cancel-lease messages instead, we're sending more messages (DYHB, add-lease, cancel-lease), but we can pipeline them more efficiently.

Extra shares can arise in a variety of ways. The most common is when a mutable file is modified while some of the servers are offline: new shares (to replace the unavailable ones) will be created and sent to new servers, and then on a subsequent publish, all shares will be updated. This typically results in e.g. sh1 being present on both servers A and B.

Another cause is the immutable repairer, which (because immutable upload is still pretty simplistic) will place a share on a server before checking to see if that same share is on a different server, or before seeing if there are any other shares on that server already. This typically results in e.g. sh1 and sh2 being present on server A, while sh2 is also present on server B.
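
Identifying extra shares from the collected DYHB responses could be sketched roughly as follows. This is a simple greedy heuristic written for illustration, not the servers-of-happiness algorithm and not existing Tahoe-LAFS code; the function name find_extra_shares and the shape of its input (a dict mapping a server name to the set of shnums it reported) are assumptions.

```python
def find_extra_shares(dyhb_responses):
    """Pick one holder per shnum and report every other copy as extra.

    dyhb_responses: dict mapping server name -> set of shnums it reported.
    Returns (keep, extra): keep maps shnum -> chosen server, and extra maps
    server -> set of shnums held there that are surplus copies.
    """
    # Invert the responses: which servers hold each shnum?
    holders = {}
    for server, shnums in dyhb_responses.items():
        for shnum in shnums:
            holders.setdefault(shnum, set()).add(server)

    keep = {}
    used = set()
    # Place the most constrained shares (fewest holders) first.
    for shnum in sorted(holders, key=lambda s: (len(holders[s]), s)):
        candidates = sorted(holders[shnum])
        # Prefer a server that is not already keeping another share.
        unused = [s for s in candidates if s not in used]
        chosen = (unused or candidates)[0]
        keep[shnum] = chosen
        used.add(chosen)

    extra = {}
    for server, shnums in dyhb_responses.items():
        surplus = {n for n in shnums if keep[n] != server}
        if surplus:
            extra[server] = surplus
    return keep, extra
```

For the second scenario described above (sh1 and sh2 on server A, sh2 also on server B), this heuristic keeps sh1 on A and sh2 on B, and reports the copy of sh2 on A as extra.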

The storage server's add/cancel lease operations need to be enhanced to allow clients to selectively manipulate leases on each share, not just the bucket as a whole. This is needed to allow the sh2 on server A to expire, while preserving the sh1 on server A. This also argues against some of the storage-server changes that I've recommended elsewhere (#600), in which the lease information would be pulled out of the per-share files and into a per-bucket structure, since that would make it impossible to cancel a lease on one share but not the other.

The above was written before leasedb (#1818) and also before the fix to #1528 which removed the remote methods to explicitly cancel leases. Now, it would have to be implemented in terms of a selective lease renewal operation as proposed in comment:2.
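
With no cancel operation available, "cancelling" a lease on an extra share amounts to not renewing it and letting garbage collection reclaim it. A hedged sketch of the client side, assuming the add_or_renew_leases method from comment:2 exists on the server: plan_selective_renewals, renew_selected, and the call_remote callable are hypothetical names, with call_remote standing in for whatever remote-call mechanism the client uses.

```python
def plan_selective_renewals(shares_by_server, extra_by_server):
    """Return server -> set of shnums whose leases should be renewed.

    Shares listed in extra_by_server are left out, so their leases are
    allowed to expire.
    """
    plan = {}
    for server, shnums in shares_by_server.items():
        wanted = set(shnums) - extra_by_server.get(server, set())
        if wanted:
            plan[server] = wanted
    return plan


def renew_selected(plan, storage_index, duration_seconds, call_remote):
    """Send one selective renewal per server.

    call_remote(server, method_name, *args) is a hypothetical callable.
    Returns server -> lease duration accepted by that server.
    """
    accepted = {}
    for server, shnums in sorted(plan.items()):
        accepted[server] = call_remote(
            server, "add_or_renew_leases",
            storage_index, shnums, duration_seconds)
    return accepted
```

A server that holds only extra shares receives no renewal message at all, so in the common case with no extra shares this sends the same number of messages as a plain add-lease pass.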

comment:7 Changed at 2013-07-12T16:55:36Z by zooko

  • Keywords design-review-needed removed

comment:8 Changed at 2013-12-02T03:11:54Z by daira

  • Keywords leases leasedb servers-of-happiness added