[tahoe-dev] this week in tahoe development
Brian Warner
warner at allmydata.com
Tue Oct 30 12:10:54 PDT 2007
Just a random status report:
Last Friday's HackFest was a great success. We had probably 15 or 20 people
present. Almost everybody there with a laptop installed Tahoe and joined the
ad-hoc storage grid. It functioned pretty well.. we had a couple of
exceptions and problems, which resulted in a few new tickets and some
ideas for how to monitor/diagnose these things better:
* the node should have a "log-gatherer.furl" config setting. If provided,
the node should deliver log messages to this destination. (specifically it
should contact that object and offer it a reference to the node's logging
code). If we'd had such a thing running on Friday, the handful of
exceptions we saw would have been automatically logged and easier to
analyze afterwards
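To make the idea concrete, here is a minimal sketch of how a node might honor
such a setting. The function name, config layout, and the tub/remote-call
interface are all hypothetical stand-ins, not the real Tahoe or foolscap API:

```python
import os

# Hypothetical sketch: if the node's base directory contains a
# "log-gatherer.furl" file, contact that object at startup and offer
# it a reference to the node's log publisher so it can pull events.
def maybe_connect_log_gatherer(basedir, tub, log_publisher):
    furlfile = os.path.join(basedir, "log-gatherer.furl")
    if not os.path.exists(furlfile):
        return None  # the setting is optional: do nothing if absent
    with open(furlfile) as f:
        gatherer_furl = f.read().strip()
    # contact the gatherer and hand over our logging code; the
    # getReference/callRemote names are illustrative only
    d = tub.getReference(gatherer_furl)
    d.addCallback(lambda gatherer:
                  gatherer.callRemote("logport", log_publisher))
    return d
```

The important property is that the node does nothing unless the operator
explicitly drops the furl into place.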
* streaming video and audio from the grid worked great. However the download
algorithm is not currently tolerant of server nodes going away during the
download itself. We need to change the peer-selection code to be able to
acquire new peers during the download, not just at the start. It should
probably also keep a "hot spare" available for faster failure-handling.
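One shape the peer-selection change could take, sketched with made-up names
(this is not Tahoe's actual download code): keep a working set of peers plus
one pre-warmed spare, and refill from the remaining candidates on failure.

```python
class DownloadPeerSet:
    """Illustrative sketch: maintain k active peers plus one hot
    spare, and replace any peer that dies mid-download instead of
    only choosing peers at the start."""

    def __init__(self, candidates, k):
        self.k = k
        self.candidates = list(candidates)
        self.active = [self.candidates.pop(0) for _ in range(k)]
        # keep one replacement warmed up for faster failure-handling
        self.hot_spare = self.candidates.pop(0) if self.candidates else None

    def peer_failed(self, peer):
        # promote the hot spare immediately, then warm up a new one
        self.active.remove(peer)
        if self.hot_spare is not None:
            self.active.append(self.hot_spare)
            self.hot_spare = (self.candidates.pop(0)
                              if self.candidates else None)
```

The hot spare turns a peer failure into a list swap rather than a fresh
round of peer negotiation in the middle of a stream.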
* sometimes we observed a 30-ish second pause in streaming data. We don't
know why.. I suspect we'd need detailed logs from all peers involved (and
more detailed logging in general) to determine the cause. I suspect one
peer being a bit slow. The download algorithm could probably be enhanced
to pay attention to how long each peer is taking to provide the share data
and prefer ones who give faster service. Or, simply kicking a peer into
the "hot standby" category if they take too long to provide data might do
the job.
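The "prefer faster peers" idea could be as simple as keeping a small window
of recent response times per peer. This is a hypothetical sketch, not the
actual downloader; the threshold and window values are arbitrary:

```python
class PeerLatencyTracker:
    """Hypothetical sketch: track how long each peer takes to return
    share data, prefer the fast ones, and flag the slow ones for
    demotion to the hot-standby category."""

    def __init__(self, threshold_s=5.0, window=10):
        self.threshold_s = threshold_s  # slower than this = demote
        self.window = window            # how many recent samples to keep
        self.samples = {}               # peerid -> recent durations

    def record(self, peerid, duration_s):
        history = self.samples.setdefault(peerid, [])
        history.append(duration_s)
        del history[:-self.window]      # forget all but the newest samples

    def _avg(self, peerid):
        history = self.samples.get(peerid)
        return sum(history) / len(history) if history else 0.0

    def is_slow(self, peerid):
        return (bool(self.samples.get(peerid))
                and self._avg(peerid) > self.threshold_s)

    def fastest_first(self, peerids):
        # order peers by recent average latency, best first
        return sorted(peerids, key=self._avg)
```

A 30-second pause would show up here as one peer's average blowing past the
threshold, which is exactly the signal needed to kick it to standby.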
* more visibility into where each file gets stored would make it easier to
reason about failures that result from peers going away. As the party
broke up and people started going home, some files naturally became
unretrievable. But it would be nice to be able to predict which files were
about to die because of specific nodes going away. And obviously a
repair/rebalance mechanism could have allowed the remaining nodes to retain
enough shares to keep the files alive until the bitter end.
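Since any k of a file's N erasure-coded shares suffice to recover it (3-of-10
in the default configuration), predicting which files are about to die is
just share counting. A toy illustration, with an invented share-map shape:

```python
def files_at_risk(share_map, departing, k):
    """share_map: {filename: {peerid: share_count}} (hypothetical
    layout).  Return the filenames that would fall below k surviving
    shares if the peers in `departing` went home."""
    doomed = []
    for filename, placements in share_map.items():
        surviving = sum(count for peer, count in placements.items()
                        if peer not in departing)
        if surviving < k:
            doomed.append(filename)
    return doomed
```

With that visibility, a repairer could regenerate shares for the at-risk
files onto the nodes that are staying.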
* the "easy_install tahoe" approach seemed to be the most popular. I think
at least one person used the .deb install, but mostly I saw a lot of
OS-X boxes.
* distributing the introducer.furl/vdrive.furl was a bit of a drag, since
(as unguessable capabilities) they're too long to conveniently dictate. We
put a copy on the wiki page, and I pasted them via IRC to a few people.
The distributed dirnodes project will get rid of one of the two furls. The
Invitation project will affect this but probably for the worse (since the
invitations are single-use.. we'll have to build a multi-use form for
posting to some common space). For spatially-localized grids that are
intended to be public (like at the HackFest), we'll probably want to make
it easy to publish the contact info via Bonjour or something (losing the
verification aspect of the furl, of course).
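To see why the furls resist dictation: the swissnum has to be an unguessable
capability, so it needs real entropy, and encoding that entropy makes the
string long. A rough illustration (the furl layout below only approximates
foolscap's "pb://tubid@host:port/swissnum" shape, and the tubid is fake):

```python
import os
import base64

# ~160 bits of entropy base32-encodes to 32 characters before you
# even add the tub id, host, and port -- not dictation-friendly.
swissnum = base64.b32encode(os.urandom(20)).decode("ascii").lower()
furl = "pb://%s@example.com:56501/%s" % ("x" * 32, swissnum)  # fake tubid
```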
* David Reid proved that tahoe works correctly with unicode filenames. I had
no idea it could do that. Thanks David!
This week I'm focussing on implementing the small-mutable-file project (#197)
documented in source:docs/mutable.txt . When complete, this will replace the
current non-distributed dirnodes (#115), making them more reliable and giving
them shorter URIs. The hope is to have this done in a week or two.. we'll
see, there's a lot of code involved. Zooko is lined up to split the work with
me, but he's currently tracking down a more important problem, an apparent
bug in our implementation of SHA-256 that causes occasional bad hashes.
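A bug like that SHA-256 one is the kind of thing a cross-check harness
catches: hash random inputs with both the suspect implementation and a
known-good reference and flag any disagreement. A sketch using Python's
hashlib as the reference (suspect_sha256 is a placeholder for whichever
implementation is under suspicion):

```python
import os
import hashlib

def cross_check_sha256(suspect_sha256, trials=1000):
    """Feed random inputs of varying lengths to a suspect SHA-256
    implementation and compare hex digests against hashlib's.
    Returns True if every trial matched, False on the first mismatch."""
    for i in range(trials):
        data = os.urandom(i % 257)  # vary lengths across block boundaries
        if suspect_sha256(data) != hashlib.sha256(data).hexdigest():
            return False
    return True
```

Varying the input length matters because padding and block-boundary handling
are where hash implementations most often go wrong.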
that's the news from Lake Tahoebegon, where all the hashes are strong, all
the URIs are good-looking, and all the reliability metrics are above
average..
-Brian