[tahoe-dev] Errors, lost connection

Brian Warner warner-tahoe at allmydata.com
Mon Feb 9 17:25:32 PST 2009


On Tue, 10 Feb 2009 11:33:24 +1100
Andrej Falout <andrej at falout.org> wrote:

> 1) Is there a particular reason why are errors reported by CLI client
> in 3000 line HTML format? Makes it very hard to read.

The CLI tools use the webapi, so it's basically just copying the
HTML-formatted exception traceback (which the webapi server creates) to
stderr. Ideally, the CLI tools would use a slightly different API (perhaps a
webapi flag to say "I really don't want HTML exceptions"). Another approach
which I've considered is to change the CLI tools to use a foolscap connection
to their tahoe node, rather than an HTTP connection, which would let us see
better-structured exceptions and thus present better error messages.
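
As a rough illustration of the problem (not something the Tahoe CLI does),
a client that receives an HTML-formatted traceback could strip the markup
and keep only the last few text lines, where a Python traceback normally
names the exception. The function name summarize_html_error and the
max_lines parameter are hypothetical, chosen for this sketch only:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, ignoring anything inside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def summarize_html_error(body, max_lines=3):
    """Return the last `max_lines` non-empty text lines of an HTML page.

    Hypothetical helper: assumes the interesting part of the traceback
    (the exception type and message) is at the end of the page.
    """
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()
    text = "".join(parser.chunks)
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return "\n".join(lines[-max_lines:])
```

A CLI wrapper could then write summarize_html_error(response_body) to stderr
instead of the raw page, at the cost of hiding the full traceback.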

> 2) Just lost power for about 10 seconds, and despite UPS-es everywhere
> on my side, I lost Internet connectivity. Not on my side, so must be
> telco's fault. Regardless. Lost 2 days uploading.
> Is there a way to increase Tahoe's tolerance to this kind of issues,
> or have it retry failed operation if it's time out cant be extended?
> I'd say it is to be expected that in any network operation that takes
> days, 10 second connectivity losses have to be expected?

What operation were you doing? It's likely that you didn't actually lose 2
days of progress, but given the way the tools present status/progress
information, it could certainly look that way.

We have several mechanisms to improve tolerance to network interruption. The
"tahoe backup" command, once it finishes backing up a single file, adds an
entry to its database (the backupdb) so that the file is not re-uploaded
later.

On a per-file basis, if your client is using a "helper", then the helper
upload protocol stores the partially-uploaded ciphertext for a while (a few
months) so it can resume the upload without losing the progress made on that
file so far. Your tahoe node's "Welcome Page" will tell you whether it's
connected to a helper or not.

If you were using "tahoe backup", add the --verbose option to see which files
it is uploading and which files it is skipping. You should find that
interrupting the process will cause it to re-upload whatever one file it was
working on at the time, but all previously uploaded files should be skipped
in the future.
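
The skip-if-already-uploaded behaviour can be sketched as follows. This is
not the real backupdb code: the table layout, the open_db and backup_file
names, and the choice to detect changes by size and mtime are all
assumptions made for the sketch, and `upload` stands in for whatever
actually sends the file to the grid.

```python
import os
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a small database of already-uploaded files."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS uploaded ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, cap TEXT)"
    )
    return db


def backup_file(db, path, upload):
    """Upload `path` unless an unchanged copy is already recorded.

    Returns (cap, uploaded), where `uploaded` is False when the file was
    skipped. The entry is committed only after `upload` returns, so an
    interrupted upload leaves no record and is retried next time.
    """
    st = os.stat(path)
    row = db.execute(
        "SELECT size, mtime_ns, cap FROM uploaded WHERE path = ?", (path,)
    ).fetchone()
    if row and row[0] == st.st_size and row[1] == st.st_mtime_ns:
        return row[2], False
    cap = upload(path)
    db.execute(
        "INSERT OR REPLACE INTO uploaded VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime_ns, cap),
    )
    db.commit()
    return cap, True
```

Because the record is written per file, an interruption costs at most the
one file that was in flight; everything recorded before it is skipped on
the next run.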

> 3) I was testing new 'tahoe backup ...'. I see no way to resume half
> complete backup. "Archives" directory is empty - where did already
> uploaded data go?

The Archives/ directory is only populated when the backup is complete
(likewise the 'Latest' reference is only updated when the backup is
complete), so an interrupted backup will not leave you with any way to access
the partial contents. There were intermediate directory nodes being created,
but they get orphaned when the process is interrupted before completion.

This is basically a consequence of the design: we might be able to re-use the
previous backup directory if nothing changed, but we won't know until we've
uploaded everything below that point. Since we never modify an existing
directory, we never create a partial one: either it reuses an old directory,
or it creates a new one fully-populated (using set_children). This also
improves performance, since it reduces the number of read-modify-write cycles
that it must go through to a bare minimum.
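
A minimal sketch of that bottom-up rule, using an in-memory stand-in for
the grid: FakeGrid, create_directory and backup_tree are hypothetical names
for this example (create_directory plays the role of creating a directory
and populating it in one step), and file entries are assumed to be caps of
files that have already been uploaded.

```python
class FakeGrid:
    """In-memory stand-in for directory storage (hypothetical)."""

    def __init__(self):
        self.dirs = {}      # directory cap -> {child name: child cap}
        self.created = 0    # number of directories created so far

    def create_directory(self, children):
        # Create a directory fully populated in a single step.
        self.created += 1
        cap = "dir-%d" % self.created
        self.dirs[cap] = dict(children)
        return cap


def backup_tree(grid, tree, old_cap=None):
    """Back up `tree` (nested dicts; leaves are file caps) bottom-up.

    Children are processed first, so a directory's contents are fully
    known before deciding whether the previous directory can be reused.
    Existing directories are never modified.
    """
    old_children = grid.dirs.get(old_cap)
    new_children = {}
    for name, entry in sorted(tree.items()):
        if isinstance(entry, dict):
            old_child = (old_children or {}).get(name)
            new_children[name] = backup_tree(grid, entry, old_child)
        else:
            new_children[name] = entry
    if old_children is not None and new_children == old_children:
        return old_cap  # nothing changed below this point: reuse
    return grid.create_directory(new_children)
```

In this sketch the root directory is created last, which is why an
interrupted run leaves only orphaned intermediate directories and nothing
that points at them.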

Note however that the backupdb is filled with file entries as they are
uploaded, and uploading the files is what consumes the majority of the backup
time. So even an interrupted backup is making significant progress towards
making your next backup faster.

If you want to experiment with 'tahoe backup' in smaller batches, try backing
up a smaller set of files. It should behave the same way.


hope that helps,
 -Brian

