[tahoe-lafs-trac-stream] [tahoe-lafs] #1835: stop grovelling the whole storage backend looking for externally-added shares to add a lease to
tahoe-lafs
trac at tahoe-lafs.org
Fri Feb 28 13:38:56 UTC 2014
#1835: stop grovelling the whole storage backend looking for externally-added
shares to add a lease to
-----------------------------+-------------------------------------------------
     Reporter:  zooko        |      Owner:
         Type:  enhancement  |     Status:  new
     Priority:  normal       |  Milestone:  undecided
    Component:  code-storage |    Version:  1.9.2
   Resolution:               |   Keywords:  leases garbage-collection accounting
Launchpad Bug:               |
-----------------------------+-------------------------------------------------
Description:
Currently, storage server operators can manually add share files into the
storage backend, such as with "mv" or "rsync" or what have you, and a
crawler will eventually discover that share and add a lease to it.

I propose that we stop supporting this method of installing shares. If we
stop supporting it, that would leave three options if you want to add a
share to a server:
1. Send it through the front door — use a tool that speaks the LAFS
protocol, connects to the storage server over a network socket, and
delivers the share. This will make the server write the share out to
persistent storage, and also update the leasedb to reflect the share's
existence, so that the share can get garbage-collected when appropriate.
This would be a good way to do it if you have few shares or if they are on
a remote server that can connect to this storage server over a network.
2. Copy the shares directly into place in the storage backend and then
remove the leasedb. The next time the storage server starts, it will
initiate a crawl that will eventually reconstruct the leasedb, and the
newly reconstructed leasedb will include lease information about the new
share so that it can eventually be garbage collected. This might be a
reasonable thing to do when you are adding a large number of shares and it
is easier/more efficient for you to add them directly to the storage
backend, and you don't mind temporarily losing the lease information on
the shares that are already there.
3. Copy the shares into place, but don't do anything that would register
them in the leasedb. They are now immortal, unless a client subsequently
adds a lease to them.
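To make option 2 concrete, here is a minimal sketch of such a rebuild
crawl. The leasedb schema, the {{{<storage_dir>/<storage_index>/<shnum>}}}
share layout, and the function name are all hypothetical, not Tahoe-LAFS's
actual code:

```python
import os
import sqlite3
import time

def rebuild_leasedb(storage_dir, leasedb_path):
    """Rebuild the leasedb from scratch by crawling the storage backend.

    Every share file found on disk gets a fresh starter lease, including
    shares that were copied into place manually, at the cost of discarding
    whatever lease information was there before (option 2).
    """
    conn = sqlite3.connect(leasedb_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS leases (
                        storage_index TEXT,
                        shnum INTEGER,
                        renewal_time REAL,
                        PRIMARY KEY (storage_index, shnum))""")
    conn.execute("DELETE FROM leases")  # option 2: discard the old leases
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(storage_dir):
        storage_index = os.path.basename(dirpath)
        for fname in filenames:
            if fname.isdigit():  # share files are named by share number
                conn.execute("INSERT INTO leases VALUES (?, ?, ?)",
                             (storage_index, int(fname), now))
    conn.commit()
    conn.close()
```

Option 5 below would be essentially the same crawl minus the
{{{DELETE}}}, using {{{INSERT OR IGNORE}}} so that existing lease rows
survive.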

The combination of these three options ''might'' suffice for most real use
cases. If there are use cases where they aren't good enough, i.e. it is
too inconvenient or slow to send all of the shares through the LAFS
storage protocol, and you don't want to destroy the extant lease
information, and you don't want the new shares to possibly become
immortal, then we could invent other ways to do it:
4. Copy the shares into place and then use a newly added feature of the
storage server which tells it to notice the existence of each new share
(by storage index). This newly added feature doesn't need to be exported
over the network to remote foolscap clients; it could just be a "tahoe"
command-line tool that connects to the storage server's local WAPI. What
the server does when it is informed this way about the existence of a
share is check that the share is really there and then add it to the
leasedb.
5. Copy the shares into place and then use a newly added feature of the
storage server which performs a full crawl to update the leasedb without
first deleting it.

Option 4 would be a bit more efficient than option 5 when used, but a lot
more complicated for the server administrator, who has to figure out how
to call {{{tahoe add-share-to-lease-db $STORAGEINDEX}}} for each share
that he's added, or else that share will be immortal. It is also more
work for us to implement.
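A sketch of what the server-side handler behind such an option-4 command
might look like (hypothetical names, share layout, and schema; the WAPI
plumbing is omitted):

```python
import os
import sqlite3
import time

def register_share(storage_dir, leasedb_path, storage_index, shnum):
    """Register one externally-added share in the leasedb (option 4).

    Checks that the share file really exists before adding a lease row,
    so a mistyped storage index can't create a phantom entry.
    """
    share_path = os.path.join(storage_dir, storage_index, str(shnum))
    if not os.path.isfile(share_path):
        raise ValueError("no such share: %s/%d" % (storage_index, shnum))
    conn = sqlite3.connect(leasedb_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS leases (
                        storage_index TEXT,
                        shnum INTEGER,
                        renewal_time REAL,
                        PRIMARY KEY (storage_index, shnum))""")
    conn.execute("INSERT OR REPLACE INTO leases VALUES (?, ?, ?)",
                 (storage_index, shnum, time.time()))
    conn.commit()
    conn.close()
```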

Option 5 is really simple, both for us to implement and for storage
server operators to use. It is exactly like the current crawler code,
except that instead of continuously restarting itself and going to look
for new shares, it quiesces and doesn't restart unless the server
operator invokes {{{tahoe resync-lease-db}}}.

So my proposal boils down to: change the accounting crawler so that it
never runs unless the leasedb is missing or corrupted (which is also the
case the first time you upgrade your server to a leasedb-capable
version), or unless the operator has specifically indicated that the
accounting crawler should run.

This is part of an "overarching ticket" to eliminate most uses of the
crawler — ticket #1834.
--
Comment (by daira):
Replying to [comment:3 davidsarah]:
> Currently the server always does a list query to the backend. The
> leasedb allows us to skip that list query in the case where the share is
> present in the DB. If the leasedb is not authoritative, then we still do
> the query in the case where the share is not present in the DB, but this
> only prevents us from improving the latency of reporting that a server
> does ''not'' have a share. So, given that the downloader uses the first k
> servers to respond to a DYHB, it does not affect the performance of a
> (successful) download.
See [ticket:287#comment:29] for more information about what the downloader
does. I think it may wait for the 10-second timeout if there are servers
that haven't responded, rather than proceeding immediately after the first
k servers have responded -- in which case, my argument above isn't valid
unless that is fixed.
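The behaviour under discussion can be sketched with plain threads: fire
all the DYHB-style queries in parallel and return as soon as the first k
servers answer, instead of waiting out the full timeout. This is
illustrative only (the real downloader runs on foolscap/Twisted, not
threads), with hypothetical names:

```python
import concurrent.futures

def first_k_responses(query_fns, k, timeout=10.0):
    """Return the first k responses from a batch of parallel queries.

    query_fns: one zero-argument callable per server, each returning that
    server's response.  If fewer than k servers respond within `timeout`
    seconds, a timeout error propagates from as_completed().
    """
    results = []
    with concurrent.futures.ThreadPoolExecutor(len(query_fns)) as pool:
        futures = [pool.submit(fn) for fn in query_fns]
        for fut in concurrent.futures.as_completed(futures, timeout=timeout):
            results.append(fut.result())
            if len(results) >= k:
                break  # proceed with the first k responders; don't wait
    return results
```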
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1835#comment:4>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage