[tahoe-dev] this week in tahoe development
Brian Warner
warner at allmydata.com
Tue Oct 30 12:10:54 PDT 2007
Just a random status report:
Last Friday's HackFest was a great success. We had probably 15 or 20 people
present. Almost everybody there with a laptop installed Tahoe and joined the
ad-hoc storage grid. It functioned pretty well.. we had a couple of
exceptions and problems, which resulted in a few new tickets and some
ideas for how to monitor/diagnose these things better:
* the node should have a "log-gatherer.furl" config setting. If provided,
the node should deliver log messages to this destination. (specifically it
should contact that object and offer it a reference to the node's logging
code). If we'd had such a thing running on Friday, the handful of
exceptions we saw would have been automatically logged and easier to
analyze afterwards
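To make the idea concrete, here is a minimal sketch of how a node might honor
such a setting. The function name, config layout, and the tub/remote-call
interface are all hypothetical stand-ins, not the real Tahoe or foolscap API:

```python
import os

# Hypothetical sketch: if the node's base directory contains a
# "log-gatherer.furl" file, contact that object at startup and offer
# it a reference to the node's log publisher so it can pull events.
def maybe_connect_log_gatherer(basedir, tub, log_publisher):
    furlfile = os.path.join(basedir, "log-gatherer.furl")
    if not os.path.exists(furlfile):
        return None  # the setting is optional: do nothing if absent
    with open(furlfile) as f:
        gatherer_furl = f.read().strip()
    # contact the gatherer and hand over our logging code; the
    # getReference/callRemote names are illustrative only
    d = tub.getReference(gatherer_furl)
    d.addCallback(lambda gatherer:
                  gatherer.callRemote("logport", log_publisher))
    return d
```

The important property is that the node does nothing unless the operator
explicitly drops the furl into place.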
* streaming video and audio from the grid worked great. However the download
algorithm is not currently tolerant of server nodes going away during the
download itself. We need to change the peer-selection code to be able to
acquire new peers during the download, not just at the start. It should
probably also keep a "hot spare" available for faster failure-handling.
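One shape the peer-selection change could take, sketched with made-up names
(this is not Tahoe's actual download code): keep a working set of peers plus
one pre-warmed spare, and refill from the remaining candidates on failure.

```python
class DownloadPeerSet:
    """Illustrative sketch: maintain k active peers plus one hot
    spare, and replace any peer that dies mid-download instead of
    only choosing peers at the start."""

    def __init__(self, candidates, k):
        self.k = k
        self.candidates = list(candidates)
        self.active = [self.candidates.pop(0) for _ in range(k)]
        # keep one replacement warmed up for faster failure-handling
        self.hot_spare = self.candidates.pop(0) if self.candidates else None

    def peer_failed(self, peer):
        # promote the hot spare immediately, then warm up a new one
        self.active.remove(peer)
        if self.hot_spare is not None:
            self.active.append(self.hot_spare)
            self.hot_spare = (self.candidates.pop(0)
                              if self.candidates else None)
```

The hot spare turns a peer failure into a list swap rather than a fresh
round of peer negotiation in the middle of a stream.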
* sometimes we observed a 30-ish second pause in streaming data. We don't
know why.. I suspect we'd need detailed logs from all peers involved (and
more detailed logging in general) to determine the cause. I suspect one
peer being a bit slow. The download algorithm could probably be enhanced
to pay attention to how long each peer is taking to provide the share data
and prefer ones who give faster service. Or, simply kicking a peer into
the "hot standby" category if they take too long to provide data might do
the job.
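The "prefer faster peers" idea could be as simple as keeping a small window
of recent response times per peer. This is a hypothetical sketch, not the
actual downloader; the threshold and window values are arbitrary:

```python
class PeerLatencyTracker:
    """Hypothetical sketch: track how long each peer takes to return
    share data, prefer the fast ones, and flag the slow ones for
    demotion to the hot-standby category."""

    def __init__(self, threshold_s=5.0, window=10):
        self.threshold_s = threshold_s  # slower than this = demote
        self.window = window            # how many recent samples to keep
        self.samples = {}               # peerid -> recent durations

    def record(self, peerid, duration_s):
        history = self.samples.setdefault(peerid, [])
        history.append(duration_s)
        del history[:-self.window]      # forget all but the newest samples

    def _avg(self, peerid):
        history = self.samples.get(peerid)
        return sum(history) / len(history) if history else 0.0

    def is_slow(self, peerid):
        return (bool(self.samples.get(peerid))
                and self._avg(peerid) > self.threshold_s)

    def fastest_first(self, peerids):
        # order peers by recent average latency, best first
        return sorted(peerids, key=self._avg)
```

A 30-second pause would show up here as one peer's average blowing past the
threshold, which is exactly the signal needed to kick it to standby.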
* more visibility into where each file gets stored would make it easier to
reason about failures that result from peers going away. As the party
broke up and people started going home, some files naturally became
unretrievable. But it would be nice to be able to predict which files were
about to die because of specific nodes going away. And obviously a
repair/rebalance mechanism could have allowed the remaining nodes to retain
enough shares to keep the files alive until the bitter end.
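Since any k of a file's N erasure-coded shares suffice to recover it (3-of-10
in the default configuration), predicting which files are about to die is
just share counting. A toy illustration, with an invented share-map shape:

```python
def files_at_risk(share_map, departing, k):
    """share_map: {filename: {peerid: share_count}} (hypothetical
    layout).  Return the filenames that would fall below k surviving
    shares if the peers in `departing` went home."""
    doomed = []
    for filename, placements in share_map.items():
        surviving = sum(count for peer, count in placements.items()
                        if peer not in departing)
        if surviving < k:
            doomed.append(filename)
    return doomed
```

With that visibility, a repairer could regenerate shares for the at-risk
files onto the nodes that are staying.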
* the "easy_install tahoe" approach seemed to be the most popular. I think
at least one person used the .deb install, but mostly I saw a lot of
OS-X boxes.
* distributing the introducer.furl/vdrive.furl was a bit of a drag, since
(as unguessable capabilities) they're too long to conveniently dictate. We
put a copy on the wiki page, and I pasted them via IRC to a few people.
The distributed dirnodes project will get rid of one of the two furls. The
Invitation project will affect this but probably for the worse (since the
invitations are single-use.. we'll have to build a multi-use form for
posting to some common space). For spatially-localized grids that are
intended to be public (like at the HackFest), we'll probably want to make
it easy to publish the contact info via Bonjour or something (losing the
verification aspect of the furl, of course).
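To see why the furls resist dictation: the swissnum has to be an unguessable
capability, so it needs real entropy, and encoding that entropy makes the
string long. A rough illustration (the furl layout below only approximates
foolscap's "pb://tubid@host:port/swissnum" shape, and the tubid is fake):

```python
import os
import base64

# ~160 bits of entropy base32-encodes to 32 characters before you
# even add the tub id, host, and port -- not dictation-friendly.
swissnum = base64.b32encode(os.urandom(20)).decode("ascii").lower()
furl = "pb://%s@example.com:56501/%s" % ("x" * 32, swissnum)  # fake tubid
```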
* David Reid proved that tahoe works correctly with unicode filenames. I had
no idea it could do that. Thanks David!
This week I'm focussing on implementing the small-mutable-file project (#197)
documented in source:docs/mutable.txt . When complete, this will replace the
current non-distributed dirnodes (#115), making them more reliable and giving
them shorter URIs. The hope is to have this done in a week or two.. we'll
see, there's a lot of code involved. Zooko is lined up to split the work with
me, but he's currently tracking down a more important problem, an apparent
bug in our implementation of SHA-256 that causes occasional bad hashes.
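A bug like that SHA-256 one is the kind of thing a cross-check harness
catches: hash random inputs with both the suspect implementation and a
known-good reference and flag any disagreement. A sketch using Python's
hashlib as the reference (suspect_sha256 is a placeholder for whichever
implementation is under suspicion):

```python
import os
import hashlib

def cross_check_sha256(suspect_sha256, trials=1000):
    """Feed random inputs of varying lengths to a suspect SHA-256
    implementation and compare hex digests against hashlib's.
    Returns True if every trial matched, False on the first mismatch."""
    for i in range(trials):
        data = os.urandom(i % 257)  # vary lengths across block boundaries
        if suspect_sha256(data) != hashlib.sha256(data).hexdigest():
            return False
    return True
```

Varying the input length matters because padding and block-boundary handling
are where hash implementations most often go wrong.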
that's the news from Lake Tahoebegon, where all the hashes are strong, all
the URIs are good-looking, and all the reliability metrics are above
average..
-Brian