[tahoe-dev] Hack Fest Day One report
Zooko Wilcox-O'Hearn
zooko at zooko.com
Sat Mar 31 20:15:16 UTC 2012
Folks:
Here are some notes about Hack Fest Day One.
• David-Sarah did a lot of trac ticket gardening. You can see their
changes by browsing the Trac Timeline ¹ (which, by the way, is one of
my favorite ways of keeping up with Tahoe-LAFS development). I haven't
looked at all of their updates to tickets yet.
• Andrew Miller wrote a unit test for his path to make "tahoe backup"
follow a limited number of symlinks, and updated his patch on #641. He
didn't set the "review-needed" flag. I'm not sure why not -- perhaps
that means he doesn't think the patch is (yet?) suitable for trunk.
We have a conversation about how to handle symlinks. There are three
possible features:
1. Traverse symlinks when doing a "tahoe backup". That's what
amiller's patch does, and it raises the question of how to handle
cycles encountered while traversing the local filesystem to do a
backup. The main use case for this is amiller's desire to use symlinks
as a way to specify which directories should be backed up -- he can
put symlinks into one directory pointing to several other directories,
then ask tahoe to backup that one directory.
2. Have a special file to notify "tahoe backup" to skip the current
directory. It could be named ".tahoeignore" (analogous to .gitignore)
and "tahoe backup" would refuse to backup any directory containing a
.tahoeignore file, nor traverse through that one to others.
3. A way to restore filesystem details other than the names and
contents of files, such as symlinks, ownership, permissions,
timestamps, etc. This is the topic of ticket #1325.
• There was some discussion of whether people wanted "tahoe backup"
(which stores your files and directories individually so that you can
browse, restore, and share access to them separately) to handle more
and more of the local filesystem metadata (#1325) or if instead people
who want that should just switch to using a more "archive oriented"
backup tool like duplicity, which bundles all of your files and
directories, along with their metadata, into a tarball before storing
it on Tahoe-LAFS. I tend to "not want to go there" and send people off
to use duplicity or Shawn Willden's Grid Backup or something if that's
what they want, but everyone else seemed to think it would be a good
idea to add more of these features to "tahoe backup".
The reason I gave at the time for not wanting to go there is that
there is such a mismatch between the Tahoe-LAFS model, with immutable
files, capability access control, metadata on edges rather than on
nodes, a full graph instead of a tree, etc., and the traditional Unix
filesystem with its mutable files, ownership and permission bits,
metadata on nodes instead of on edges, tree-plus-symlinks, etc.
I later thought that there is another reason I don't like the idea
extending "tahoe backup" in that direction, which is that I like for
the data which lives in Tahoe-LAFS to be meaningful independent of any
given "source machine" (for example things like "UID of owner" are
either meaningless or incorrect unless you decide to interpret them in
the context of a "source system" which happens to have the right
meaning for that UID), and also independent of any given source
operating system type. For example a long-standing bug that really
annoys me (someone fix it during Hack Fest!) is that "tahoe backup"
reads the "metadata-change-time" if on Unix, or else reads the
"file-creation-time" if on Windows, and then sticks the resulting
timestamp into an field named "ctime" in Tahoe-LAFS (#897). The result
is meaningless unless you choose to interpret in terms of being either
"in Unix" or "in Windows". Annoying! The thing is in the cloud -- it
isn't in either Unix or Windows while it is there.
I guess in general, I think of the data stored in Tahoe-LAFS as being
shareable among multiple people, and meaningful within different
contexts -- different "source machines" and even different "source
operating system types", and being meaningful outside of the context
of any source machine at all -- just purely "in the cloud". It is
perhaps not impossible, but certainly more work, to handle extended
filesystem metadata correctly, compared to making a traditional
"backup and restore" app. In a traditional "backup and restore" app
like duplicity, the data "in the cloud" can't be shared or used other
than by first restoring it to a system that is of a sufficiently
similar type to its source system.
• On the topic of issue 1., above, I suggested an alternative way to
prevent cycling endlessly through the local filesystem. The primary
proposal -- already partially implemented by amiller -- is to have a
limit on how many symlinks you'll cross over before you stop. If you
stop, you should probably print out a complete listing of how you got
to where you are, and which steps were traversing symlinks, so that
the user can figure out if this was really a cycle and if so how they
want to break the cycle.
My alternative proposal is that for each directory that you visit,
learn its device id and inode number and use those as a unique
identifier for this directory in the context of this run of "tahoe
backup", and if you encounter a directory for the second time then use
the same within-tahoe-lafs-directory that you created for it the first
time.
That's way better! It is almost elegant. But as Brian then pointed
out, it can't work. Because all directories (except for the root dir)
that are produced by a "tahoe backup" are immutable. You can't have a
cycle of immutable directories, because it is impossible to get a link
to an immutable directory before providing the initial and final
contents of it.
• This led me to off-handedly comment that such a thing would be
possible with Petrification a.k.a. revocation of write authority
(#954) which inspired David-Sarah to announce that they were going to
implement #954 during Hack Fest. That would be very ambitious, but
David-Sarah often says that they will do ambitious things and then
does them. I remember that the "Drop-Upload" feature was implemented
in a single afternoon by David-Sarah at The Second Tahoe-LAFS Summit.
Anyway, I'm pretty excited about the possibility of getting #954 soon.
It would be the first big addition to the vocabulary of what you can
express with Tahoe-LAFS caps.
• I started this tutorial on "How To Write Tests": ², because of the
increasing number of patches that are ready except for tests ³. The
tutorial is already enough to get you started, but you'll probably
come to a spot where you say "Hey NOW WHAT?". Please do that, so that
I -- or someone -- will extend the tutorial.
• I did a lot of tinkering with packaging and build system and docs. I
care a lot about this stuff, and I hope you do too, and I could use
help, but I'm not going to explain it all in this letter because
currently new work is being done at Hack Fest faster than I'm
finishing this letter describing the work that is being done. :-) You
can find out all about it by reading the trac Timeline. One thing I
definitely need help with is building binary packages of dependencies
on various platforms, especially with Python 2.7. Please see the table
of precompiled packages hosted on tahoe-lafs.org: ⁴.
• Lebek did some wiki gardening and fixed a bug in FTP -- #1688.
• Brian posted a fix for #1689, which is good to see as #1689 was one
of those regressions in handling of mutables that we shipped in
v1.9.x.
• Marlowe reviewed some docs patches and resumed working on the
Tahoe-LAFS Weekly News. Yay!
• John Dougherty posted his first review. ☺
Okay, that's at least part of what we've done so far. There are still
approximately 48 hours of Hack Fest to go. Jump in!
Regards,
Zooko
¹ https://tahoe-lafs.org/trac/tahoe-lafs/timeline?from=03%2F31%2F12&daysback=3&authors=&ticket=on&ticket_details=on&changeset=on&milestone=on&wiki=on&update=Update
² https://tahoe-lafs.org/trac/tahoe-lafs/wiki/HowToWriteTests
³ https://tahoe-lafs.org/pipermail/tahoe-dev/2012-March/007205.html
⁴ https://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-lafs-dep-eggs/README.html
tickets mentioned in this email:
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/641# tahoe backup should
be able to backup symlinks
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/954# revocable write authority
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/897# "tahoe backup"
thinks "ctime" means "creation time"
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1325# make `tahoe
backup` keep more filesystem metadata
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1688# ftpd returns 0 for
all timestamps
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1689# assertion failure
More information about the tahoe-dev
mailing list