.. -*- coding: utf-8-with-signature -*-

===========================
Garbage Collection in Tahoe
===========================

1. `Overview`_
2. `Client-side Renewal`_
3. `Server Side Expiration`_
4. `Expiration Progress`_
5. `Future Directions`_

Overview
========

When a file or directory in a Tahoe-LAFS file store is no longer referenced,
the space that its shares occupied on each storage server can be freed,
making room for other shares. Tahoe currently uses a garbage collection
("GC") mechanism to implement this space-reclamation process. Each share has
one or more "leases", which are managed by clients who want the
file/directory to be retained. The storage server accepts each share for a
pre-defined period of time, and is allowed to delete the share if all of the
leases expire.

Garbage collection is not enabled by default: storage servers will not delete
shares without being explicitly configured to do so. When GC is enabled,
clients are responsible for renewing their leases on a periodic basis at
least frequently enough to prevent any of the leases from expiring before the
next renewal pass.

There are several tradeoffs to be considered when choosing the renewal timer
and the lease duration, and there is no single optimal pair of values. See
the following diagram to get an idea of the tradeoffs involved:

.. image:: lease-tradeoffs.svg


If lease renewal occurs quickly and with 100% reliability, then any renewal
time that is shorter than the lease duration will suffice, but a larger ratio
of duration-over-renewal-time will be more robust in the face of occasional
delays or failures.

The current recommended values for a small Tahoe grid are to renew the leases
once a week, and give each lease a duration of 31 days. In the current
release, there is not yet a way to create a lease with a different duration,
but the server can use the ``expire.override_lease_duration`` configuration
setting to increase or decrease the effective duration (when the lease is
processed) to something other than 31 days.

Renewing leases can be expected to take about one second per file/directory,
depending upon the number of servers and the network speeds involved.



Client-side Renewal
===================

If all of the files and directories which you care about are reachable from a
single starting point (usually referred to as a "rootcap"), and you store
that rootcap as an alias (via "``tahoe create-alias``" for example), then the
simplest way to renew these leases is with the following CLI command::

  tahoe deep-check --add-lease ALIAS:

This will recursively walk every directory under the given alias and renew
the leases on all files and directories. (You may want to add a ``--repair``
flag to perform repair at the same time.) Simply run this command once a week
(or whatever other renewal period your grid recommends) and make sure it
completes successfully. As a side effect, a manifest of all unique files and
directories will be emitted to stdout, as well as a summary of file sizes and
counts. It may be useful to track these statistics over time.
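
On a Unix host, the weekly run can be automated with cron. A sketch of a
crontab entry follows; the alias name ``backups:`` and the log path are
placeholders for your own values, and you should make sure the crontab's
environment can find the ``tahoe`` executable:

```shell
# Renew leases every Monday at 03:00; append output (including the
# manifest and size summary) to a log for later inspection.
0 3 * * 1   tahoe deep-check --add-lease backups: >>$HOME/tahoe-renew.log 2>&1
```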

Note that newly uploaded files (and newly created directories) get an initial
lease too: the ``--add-lease`` process is only needed to ensure that all
older objects have up-to-date leases on them.

A separate "rebalancing manager/service" is also planned -- see ticket
`#543`_. The exact details of what this service will do are not settled, but
it is likely to work by acquiring manifests from rootcaps on a periodic
basis, keeping track of checker results, managing lease-addition, and
prioritizing repair and rebalancing of shares. Eventually it may use multiple
worker nodes to perform these jobs in parallel.

.. _#543: http://tahoe-lafs.org/trac/tahoe-lafs/ticket/543


Server Side Expiration
======================

Expiration must be explicitly enabled on each storage server, since the
default behavior is to never expire shares. Expiration is enabled by adding
config keys to the ``[storage]`` section of the ``tahoe.cfg`` file (as
described below) and restarting the server node.

Each lease has two parameters: a create/renew timestamp and a duration. The
timestamp is set when the share is first uploaded (i.e. the file or
directory is created), and updated again each time the lease is renewed (i.e.
"``tahoe check --add-lease``" is performed). The duration is currently fixed
at 31 days, and the "nominal lease expiration time" is simply $duration
seconds after the $create_renew timestamp. (In a future release of Tahoe, the
client will get to request a specific duration, and the server will accept or
reject the request depending upon its local configuration, so that servers
can achieve better control over their storage obligations.)

The lease-expiration code has two modes of operation. The first is age-based:
leases are expired when their age is greater than their duration. This is the
preferred mode: as long as clients consistently update their leases on a
periodic basis, and that period is shorter than the lease duration, then all
active files and directories will be preserved, and the garbage will be
collected in a timely fashion.

Since there is not yet a way for clients to request a lease duration other
than 31 days, there is a ``tahoe.cfg`` setting to override the duration of
all leases. If, for example, this alternative duration is set to 60 days,
then clients could safely renew their leases with an add-lease operation
perhaps once every 50 days: even though nominally their leases would expire
31 days after the renewal, the server would not actually expire the leases
until 60 days after renewal.

The other mode is an absolute-date-cutoff: it compares the create/renew
timestamp against some absolute date, and expires any lease which was not
created or renewed since the cutoff date. If all clients have performed an
add-lease some time after March 20th, you could tell the storage server to
expire all leases that were created or last renewed on March 19th or earlier.
This is most useful if you have a manual (non-periodic) add-lease process.
Note that there is not much point to running a storage server in this mode
for a long period of time: once the lease-checker has examined all shares and
expired whatever it is going to expire, the second and subsequent passes are
not going to find any new leases to remove.

The ``tahoe.cfg`` file uses the following keys to control lease expiration:

``[storage]``

``expire.enabled = (boolean, optional)``

    If this is ``True``, the storage server will delete shares on which all
    leases have expired. Other controls dictate when leases are considered to
    have expired. The default is ``False``.

``expire.mode = (string, "age" or "cutoff-date", required if expiration enabled)``

    If this string is "age", the age-based expiration scheme is used, and the
    ``expire.override_lease_duration`` setting can be provided to influence
    the lease ages. If it is "cutoff-date", the absolute-date-cutoff mode is
    used, and the ``expire.cutoff_date`` setting must be provided to specify
    the cutoff date. The mode setting currently has no default: you must
    provide a value.

    In a future release, this setting is likely to default to "age", but in
    this release it was deemed safer to require an explicit mode
    specification.

``expire.override_lease_duration = (duration string, optional)``

    When age-based expiration is in use, a lease will be expired if its
    ``lease.create_renew`` timestamp plus its ``lease.duration`` time is
    earlier/older than the current time. This key, if present, overrides the
    duration value for all leases, changing the algorithm from::

      if (lease.create_renew_timestamp + lease.duration) < now:
          expire_lease()

    to::

      if (lease.create_renew_timestamp + override_lease_duration) < now:
          expire_lease()

    The value of this setting is a "duration string", which is a number of
    days, months, or years, followed by a units suffix, and optionally
    separated by a space, such as one of the following::

      7days
      31day
      60 days
      2mo
      3 month
      12 months
      2years

    This key is meant to compensate for the fact that clients do not yet have
    the ability to ask for leases that last longer than 31 days. A grid which
    wants to use faster or slower GC than a 31-day lease timer permits can
    use this parameter to implement it.

    This key is only valid when age-based expiration is in use (i.e. when
    ``expire.mode = age`` is used). It will be rejected if cutoff-date
    expiration is in use.
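
To make the duration-string format concrete, here is a small parser sketch.
This is an illustration only, not Tahoe's actual parsing code, and the unit
conversions (a month as 30 days, a year as 365 days) are assumptions:

```python
import re

DAY = 24 * 60 * 60
# Assumed unit conversions; Tahoe's real parser may use different values.
UNITS = {
    "day": DAY, "days": DAY,
    "mo": 30 * DAY, "month": 30 * DAY, "months": 30 * DAY,
    "year": 365 * DAY, "years": 365 * DAY,
}

def parse_duration(s):
    """Convert a duration string like '31day' or '60 days' into seconds."""
    m = re.match(r"^\s*(\d+)\s*([a-z]+)\s*$", s.lower())
    if not m or m.group(2) not in UNITS:
        raise ValueError("unparseable duration string: %r" % (s,))
    return int(m.group(1)) * UNITS[m.group(2)]
```

Under these assumptions, ``parse_duration("2mo")`` yields the same number of
seconds as ``parse_duration("60 days")``.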

``expire.cutoff_date = (date string, required if mode=cutoff-date)``

    When cutoff-date expiration is in use, a lease will be expired if its
    create/renew timestamp is older than the cutoff date. This string will be
    a date in the following format::

      2009-01-16  (January 16th, 2009)
      2008-02-02
      2007-12-25

    The actual cutoff time shall be midnight UTC at the beginning of the
    given day. Lease timers should naturally be generous enough to not depend
    upon differences in timezone: there should be at least a few days between
    the last renewal time and the cutoff date.

    This key is only valid when cutoff-based expiration is in use (i.e. when
    ``expire.mode = cutoff-date``). It will be rejected if age-based
    expiration is in use.
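
The comparison described above can be sketched as follows. This is a minimal
illustration under the stated midnight-UTC interpretation; ``lease_is_expired``
is a hypothetical helper, not part of Tahoe's API:

```python
from datetime import datetime, timezone

def lease_is_expired(create_renew_timestamp, cutoff_date):
    """Return True if a lease's create/renew time (a Unix timestamp) falls
    before midnight UTC at the beginning of cutoff_date ('YYYY-MM-DD')."""
    cutoff = datetime.strptime(cutoff_date, "%Y-%m-%d").replace(
        tzinfo=timezone.utc)
    return create_renew_timestamp < cutoff.timestamp()
```

Note that a lease renewed at any point on or after the cutoff day itself is
kept: only strictly older create/renew timestamps expire.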

``expire.immutable = (boolean, optional)``

    If this is ``False``, then immutable shares will never be deleted, even
    if their leases have expired. This can be used in special situations to
    perform GC on mutable files but not immutable ones. The default is
    ``True``.

``expire.mutable = (boolean, optional)``

    If this is ``False``, then mutable shares will never be deleted, even if
    their leases have expired. This can be used in special situations to
    perform GC on immutable files but not mutable ones. The default is
    ``True``.
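
Putting these keys together: a storage server that garbage-collects with a
60-day effective lease duration might use a ``tahoe.cfg`` section like the
following. The values shown are examples, not recommendations:

```ini
[storage]
enabled = true
expire.enabled = true
expire.mode = age
expire.override_lease_duration = 60 days
```

With ``expire.mode = cutoff-date`` you would instead supply an
``expire.cutoff_date = 2009-01-16`` style value, and omit the override key,
which is rejected in that mode.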

Expiration Progress
===================

In the current release, leases are stored as metadata in each share file, and
no separate database is maintained. As a result, checking and expiring leases
on a large server may require multiple reads from each of several million
share files. This process can take a long time and be very disk-intensive, so
a "share crawler" is used. The crawler limits the amount of time spent
looking at shares to a reasonable percentage of the storage server's overall
usage: by default it uses no more than 10% CPU, and yields to other code
after 100ms. A typical server with 1.1M shares was observed to take 3.5 days
to perform this rate-limited crawl through the whole set of shares, with
expiration disabled. It is expected to take perhaps 4 or 5 days to do the
crawl with expiration turned on.
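
The general time-slicing technique can be sketched as follows. This is an
illustration of the idea rather than Tahoe's actual crawler code; the slice
length and CPU fraction mirror the defaults described above:

```python
import time

def rate_limited_crawl(shares, process, slice_seconds=0.1, cpu_fraction=0.1):
    """Run process() over shares in short bursts, sleeping between bursts so
    that the work consumes roughly cpu_fraction of wall-clock time."""
    it = iter(shares)
    examined = 0
    while True:
        start = time.monotonic()
        for share in it:
            process(share)
            examined += 1
            if time.monotonic() - start >= slice_seconds:
                break        # slice used up: yield to other work
        else:
            return examined  # all shares have been examined
        worked = time.monotonic() - start
        # Sleep so that worked / (worked + slept) <= cpu_fraction.
        time.sleep(worked * (1.0 - cpu_fraction) / cpu_fraction)
```

A real crawler would also persist its position between bursts so a restart
can resume, which is what the state files described below provide.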

The crawler's status is displayed on the "Storage Server Status Page", a web
page dedicated to the storage server. This page resides at $NODEURL/storage,
and there is a link to it from the front "welcome" page. The "Lease
Expiration crawler" section of the status page shows the progress of the
current crawler cycle, expected completion time, amount of space recovered,
and details of how many shares have been examined.

The crawler's state is persistent: restarting the node will not cause it to
lose significant progress. The state is stored in two files
($BASEDIR/storage/lease_checker.state and lease_checker.history), and the
crawler can be forcibly reset by stopping the node, deleting these two files,
then restarting the node.

Future Directions
=================

Tahoe's GC mechanism is undergoing significant changes. The global
mark-and-sweep garbage-collection scheme can require considerable network
traffic for large grids, interfering with the bandwidth available for regular
uploads and downloads (and for non-Tahoe users of the network).

A preferable method might be to have a timer-per-client instead of a
timer-per-lease: the leases would not be expired until/unless the client had
not checked in with the server for a pre-determined duration. This would
reduce the network traffic considerably (one message per week instead of
thousands), but retain the same general failure characteristics.

In addition, using timers is not fail-safe (from the client's point of view),
in that a client which leaves the network for an extended period of time may
return to discover that all of its files have been garbage-collected. (It
*is* fail-safe from the server's point of view, in that a server is not
obligated to provide disk space in perpetuity to an unresponsive client.) It
may be useful to create a "renewal agent" to which a client can pass a list
of renewal-caps: the agent then takes on the responsibility for keeping these
leases renewed, so the client can go offline safely. Of course, this requires
a certain amount of coordination: the renewal agent should not be keeping
files alive that the client has actually deleted. The client can send the
renewal agent a manifest of renewal caps, and each new manifest should
replace the previous set.

The GC mechanism is also not immediate: a client which deletes a file will
nevertheless be consuming extra disk space (and might be charged or otherwise
held accountable for it) until the ex-file's leases finally expire on their
own.

In the current release, these leases are each associated with a single "node
secret" (stored in $BASEDIR/private/secret), which is used to generate
renewal-secrets for each lease. Two nodes with different secrets will produce
separate leases, and will not be able to renew each others' leases.

Once the Accounting project is in place, leases will be scoped by a
sub-delegatable "account id" instead of a node secret, so clients will be
able to manage multiple leases per file. In addition, servers will be able to
identify which shares are leased by which clients, so that clients can safely
reconcile their idea of which files/directories are active against the
server's list, and explicitly cancel leases on objects that aren't on the
active list.

By reducing the size of the "lease scope", the coordination problem is made
easier. In general, mark-and-sweep is easier to implement (it requires mere
vigilance, rather than coordination), so unless the space used by deleted
files is not being reclaimed quickly enough, the renew/expire timed-lease
approach is recommended.