[tahoe-lafs-trac-stream] [tahoe-lafs] #1018: backup manager task (inside the node)

Sat Oct 6 18:02:15 UTC 2012

#1018: backup manager task (inside the node)
-------------------------------+------------------------------------------
     Reporter:  warner         |      Owner:
         Type:  enhancement    |     Status:  new
     Priority:  major          |  Milestone:  undecided
    Component:  code-frontend  |    Version:  1.6.1
   Resolution:                 |   Keywords:  backup performance usability
Launchpad Bug:                 |
-------------------------------+------------------------------------------
Changes (by jg71):

 * cc: jg71@… (added)

New description:

 So, after finally wrangling the hardware into the right places,
 I've finished setting up my personal backup grid, and have
 started to upload my large photo archives into it with the "tahoe
 backup" CLI tool. It's a large archive, and my rough estimate is
 that it will take about 45 hours of continuous uploading to
 complete.

 One of the nodes is in my parents' house, and their downstream
 DSL is not very fast, so I expect that I'm maxing it out while
 I'm doing the upload. I don't want to impact their email and web
 browsing, so I'm trying to only run the upload at night. At this
 rate (8 hours a day) my backup is likely to take about 6 days.

 Each night I start "tahoe backup", and each morning I kill it.
 The backupdb is working perfectly, and it only takes a few
 seconds to skip over the 10k-ish files that have already been
 uploaded.

 But, what I'm starting to want is something to automate all this.
 I'd like to have a "backup manager" task, inside the node, which
 knows the source directory and target dirnode, and is configured
 with some timing information. Maybe something in tahoe.cfg like
 this:

 {{{
 [backups]
 b1.source = /Volumes/BackupDrive/Pictures
 b1.target = backup:pictures
 b1.frequency = 1week
 b1.allowed_times = 2200-2400,0000-0800
 }}}

 The node would use a small DB to remember how long it's been
 since the last backup completed, and wouldn't start a new one
 until the ".frequency" duration had elapsed. It would look at
 ".allowed_times" to figure out whether it's allowed to start a
 backup right now or not, and would wait until the window begins.
 At that point, it would start a node-side "tahoe backup"
 equivalent, and let it run until either it completes or the
 window closes, at which point the process would be suspended
 until the next window.
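
As a rough illustration of the window and frequency check described above, here is a minimal Python sketch. The function names (`parse_allowed_times`, `in_window`, `may_run`) are hypothetical and not part of Tahoe-LAFS, and it assumes ".allowed_times" is a comma-separated list of HHMM-HHMM ranges in local time, with "2400" meaning end of day.

```python
from datetime import datetime, timedelta


def _to_minutes(hhmm):
    # Convert 'HHMM' to minutes since midnight; '2400' becomes 1440 (end of day).
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


def parse_allowed_times(spec):
    """Parse a spec like '2200-2400,0000-0800' into [(start_min, end_min), ...]."""
    windows = []
    for part in spec.split(","):
        start, end = part.strip().split("-")
        windows.append((_to_minutes(start), _to_minutes(end)))
    return windows


def in_window(windows, now):
    """True if the wall-clock time of 'now' falls inside any allowed window."""
    minute = now.hour * 60 + now.minute
    return any(start <= minute < end for start, end in windows)


def may_run(windows, frequency, last_completed, now):
    """True if a backup is due and we are currently inside an allowed window.

    'last_completed' is None if no backup has ever finished. An interrupted
    backup leaves it unchanged, so the job resumes in the next window.
    """
    due = last_completed is None or now - last_completed >= frequency
    return due and in_window(windows, now)
```

A manager loop could poll `may_run` periodically, start the job when it turns true, and suspend it as soon as `in_window` turns false.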

 The "b1" prefix is just an .ini-format trick to let you specify
 multiple jobs.
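
To show how the prefix trick could be read back, here is a small sketch that groups the keys of a `[backups]` section by job name. It uses Python 3's standard `configparser`; the helper name `load_backup_jobs` is hypothetical, and nothing here reflects how the node actually parses tahoe.cfg.

```python
import configparser
from collections import defaultdict


def load_backup_jobs(text):
    """Group 'name.option = value' keys from [backups] into {name: {option: value}}."""
    # Only '=' is treated as a delimiter so values like 'backup:pictures' stay intact;
    # interpolation is disabled so '%' in paths is not treated specially.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    jobs = defaultdict(dict)
    if parser.has_section("backups"):
        for key, value in parser.items("backups"):
            name, _, option = key.partition(".")
            jobs[name][option] = value
    return dict(jobs)
```

Each distinct prefix ("b1", "b2", ...) then becomes one independent job with its own source, target, and timing settings.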

 Once Foolscap learns how to perform bandwidth management
 ([http://foolscap.lothar.com/trac/ticket/41 Foolscap#41]), it
 would be nice to add a "b1.bandwidth" value, which would tell the
 backup manager that this job is not allowed to use more than a
 certain amount. I can imagine refinements to that specification,
 to say something like "don't send more than X bps to Tub 1234",
 to specifically protect my parents' downstream (while not
 directly limiting anything else). Another option is to tell the
 node what percentage of our resources (upstream/downstream
 bandwidth, CPU time) we're willing to put into this task, and
 have it throttle the backup job when the usage goes above that
 threshold.
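
One common way to enforce a per-job cap like the hypothetical "b1.bandwidth" is a token bucket. The sketch below is only an illustration of that technique: it is not Foolscap's API (Foolscap#41 is still open), the class and method names are made up, and the rate is assumed to be in bytes per second.

```python
import time


class TokenBucket:
    """Token-bucket limiter: tokens are bytes, refilled at a fixed rate."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # bytes added per second
        self.capacity = float(capacity)  # maximum burst size, in bytes
        self.tokens = self.capacity      # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def delay_for(self, nbytes):
        """Charge nbytes and return how many seconds to wait before sending them."""
        self._refill()
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # The bucket is in debt; the wait is the time needed to refill the deficit.
        return -self.tokens / self.rate
```

A sender would call `delay_for(len(chunk))` before each write and sleep for the returned time. Keeping one bucket per remote Tub would give the "no more than X to Tub 1234" refinement.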

 Later, when we get a similar "checker/repair/rebalancing manager"
 in the node (#450, #543, #483, #661), we could configure it in a
 similar way, to control how much time/disk/IO it spends on the
 repair task. Because a tahoe-side deep-traversal is so much more
 expensive than a local disk walk (where the OS caches a lot of
 data), the repair manager probably wants to use a fairly large DB
 to keep track of which dirnodes have been visited or not, and
 which files haven't been checked in a while, etc. The backup
 manager can afford to simply kill and restart the "tahoe backup"
 job each time, because the backupdb does a good job of letting it
 skip over earlier work.

 I'm not entirely sure how to best display the status of this
 task. Probably a web page that shows some estimates of total
 files seen and how many have been uploaded or skipped so far. But
 I don't know how this page needs to be protected. If we don't put
 any controls on it, and don't display anything too secret (like
 dircaps), then maybe we can afford to put it at a guessable URL
 (like we currently do with the storage server status page). If we
 decide that it contains sensitive data, or we want to add
 controls (like "pause backup", or maybe let you twiddle config
 settings right from the web page), then it needs to be
 unguessable. #674 is about having private WUI pages like this.

--

Comment:

 Replying to [ticket:1018 warner]:

 I'd like that as well!

 Additionally, some kind of throttle config option for the uploading
 node/client would be great, so that the uploader's connection is still
 usable and not maxed out constantly during large uploads, even if a helper
 service is used.

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1018#comment:3>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage