#1429 closed enhancement (fixed)
automatically upload a file when it is put in a given local directory
Reported by: | davidsarah | Owned by: | warner |
---|---|---|---|
Priority: | major | Milestone: | 1.9.0 |
Component: | code-frontend | Version: | 1.8.2 |
Keywords: | drop-upload inotify usability review-needed news | Cc: | |
Launchpad Bug: |
Description (last modified by davidsarah)
During the Tahoe-LAFS summit, I (David-Sarah) implemented a prototype of a dropbox-like uploader: if you write a file into a given directory, it will upload it to a given Tahoe mutable directory (with the same name as its name in the local filesystem).
Its current limitations are:
- it only handles one directory, not including subdirectories
- the behaviour when multiple files are put in the directory (or the same file more than once) has not been well-tested so far
- it doesn't have unit tests or docs (I will write those)
- it must be given the URI of a mutable directory to upload to; it would be more usable to allow that to be a path that may start with an alias
- it uses the twisted.internet.inotify API, which depends on Twisted 10.1 and is only supported on platforms with inotify.
Attachments (9)
Change History (68)
comment:1 Changed at 2011-07-16T16:28:10Z by davidsarah
- Description modified (diff)
- Status changed from new to assigned
comment:2 Changed at 2011-07-16T16:30:22Z by davidsarah
- Description modified (diff)
- Keywords inotify usability added
Changed at 2011-07-16T16:44:07Z by davidsarah
comment:3 Changed at 2011-07-16T16:45:47Z by davidsarah
- Keywords design-review-needed added
- Owner davidsarah deleted
- Status changed from assigned to new
comment:4 Changed at 2011-07-19T00:54:15Z by davidsarah
http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1435/dependency-updates.darcs.patch updates the dependency for Twisted to 10.1, which ensures that twisted.internet.inotify is available. (Note that it conflicts with drop-upload.darcs.patch. I'll rebase the latter when I have tests and docs.)
comment:5 Changed at 2011-07-22T05:26:54Z by david-sarah@…
In 8b4082677477daf1:
comment:6 follow-up: ↓ 7 Changed at 2011-07-22T20:40:32Z by zooko
Here is a review of attachment:drop-upload.darcs.patch .
First of all: yay! This feature is neat. :-)
- What is the "FIXME: Unicode paths"? I don't see what needs to change there to handle non-ASCII paths.
- Why are we catching Exception--was it just because we might have Twisted < 10.1? We should remove any checks for if we have Twisted < 10.1 (except, of course, for the checks we do for all dependencies in src/allmydata/__init__.py).
- "FIXME: error reporting" sounds like a serious incomplete part, but what exactly should be added? Logging to the Tahoe-LAFS logging system of certain sorts of failures? What could fail here?
- Perhaps the logging in _notify() is too verbose.
- the "dump event to file" part of _notify() is just for debugging and should be removed.
- "FIXME: what happens if we get another event for the same file while it is uploading?" Well, I looked at the code of dirnode.add_file() and Uploader.upload() and I guess it just redundantly uploads it a second time. (See also #952, which is about what happens when you upload the bytewise-identical file more than once at a time through the WAPI.) What should it do? Well, what should the user experience look like? You drag and drop a file into this directory, and then you drag and drop a new version of it one second later, and then what? Maybe the visual display of the directory should have been updated as soon as you dropped the first one to show that it is in the middle of uploading that file. Then, if you drop another version on it, the display should update to show that there is another update scheduled to happen after this one completes. So I guess we need a new ticket for a GUI frontend to this functionality (integrated into some extant GUI like Nautilus, I suppose). But at this layer, I guess what it should do is schedule another upload for as soon as this upload is finished unless another such upload is already scheduled, in which case it should do nothing.
- "FIXME: should all notified events cause an upload?": from http://www.kernel.org/doc/man-pages/online/pages/man7/inotify.7.html and http://inotify.aiken.cz/?section=inotify&page=faq&lang=en it looks like we should respond to IN_CLOSE_WRITE | IN_CREATE | IN_MOVE_SELF | IN_MOVE_TO
- Also of course the other FIXMEs deserve to be investigated more. :-)
Other than that, it looks good. It is neat how few lines of code are needed to add this functionality! :-)
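The scheduling rule suggested in the review (queue one more upload if an upload of the same file is already running, and do nothing if one is already queued) could be sketched roughly as below. This is only an illustration, not the code in the patch: the class name `CoalescingUploader`, its methods, and the `start_upload` callable are hypothetical.

```python
class CoalescingUploader:
    """Run at most one upload per file at a time and remember at most one rerun.

    Hypothetical sketch: start_upload(name) stands in for whatever actually
    begins an upload; the caller must call upload_finished(name) when it ends.
    """

    def __init__(self, start_upload):
        self._start_upload = start_upload
        self._in_progress = set()  # names currently being uploaded
        self._pending = set()      # names with exactly one queued rerun

    def notify(self, name):
        """Handle a filesystem event for the file called `name`."""
        if name in self._in_progress:
            # An upload is running: queue a single rerun. Further events
            # for the same name collapse into this one entry.
            self._pending.add(name)
        else:
            self._in_progress.add(name)
            self._start_upload(name)

    def upload_finished(self, name):
        """Mark the running upload of `name` as done and start a queued rerun."""
        self._in_progress.discard(name)
        if name in self._pending:
            self._pending.discard(name)
            self.notify(name)
```

With this shape, any number of events arriving during one upload result in exactly one follow-up upload, which picks up the newest contents of the file.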
comment:7 in reply to: ↑ 6 ; follow-up: ↓ 8 Changed at 2011-07-22T21:03:44Z by davidsarah
Replying to zooko:
Here is a review of attachment:drop-upload.darcs.patch .
First of all: yay! This feature is neat. :-)
Thanks :-)
- What is the "FIXME: Unicode paths"? I don't see what needs to change there to handle non-ASCII paths.
tahoe.cfg is encoded in UTF-8, but the path wants to be in sys.getfilesystemencoding().
- Why are we catching Exception--was it just because we might have Twisted < 10.1? We should remove any checks for if we have Twisted < 10.1 (except, of course, for the checks we do for all dependencies in src/allmydata/__init__.py).
If there is any exception here, the node will silently fail to start. (The same is true of exceptions during initialization of the other frontends.) We should fix that in a better way -- it seems relevant to #355 and #1360 for example -- but I wanted to avoid this failure mode while I was debugging.
- "FIXME: error reporting" sounds like a serious incomplete part, but what exactly should be added? Logging to the Tahoe-LAFS logging system of certain sorts of failures? What could fail here?
We should raise an exception with a proper message (and ensure that gets logged) rather than failing an assert.
- Perhaps the logging in _notify() is too verbose.
- the "dump event to file" part of _notify() is just for debugging and should be removed.
Yes, it was just easier to see the events that way while prototyping. Logging to the Twisted log is sufficient, and the message could be shorter.
- "FIXME: what happens if we get another event for the same file while it is uploading?" Well, I looked at the code of dirnode.add_file() and Uploader.upload() and I guess it just redundantly uploads it a second time.
Yes, I think this is fairly harmless, modulo the fact that we should probably be using a more restrictive event mask.
(See also #952, which is about what happens when you upload the bytewise-identical file more than once at a time through the WAPI.) What should it do? Well, what should the user experience look like? You drag and drop a file into this directory, and then you drag and drop a new version of it one second later, and then what? Maybe the visual display of the directory should have been updated as soon as you dropped the first one to show that it is in the middle of uploading that file. Then, if you drop another version on it, the display should update to show that there is another update scheduled to happen after this one completes. So I guess we need a new ticket for a GUI frontend to this functionality (integrated into some extant GUI like Nautilus, I suppose). But at this layer, I guess what it should do is schedule another upload for as soon as this upload is finished unless another such upload is already scheduled, in which case it should do nothing.
Right, I'll open another ticket about suppressing redundant uploads.
- "FIXME: should all notified events cause an upload?": from http://www.kernel.org/doc/man-pages/online/pages/man7/inotify.7.html and http://inotify.aiken.cz/?section=inotify&page=faq&lang=en it looks like we should respond to IN_CLOSE_WRITE | IN_CREATE | IN_MOVE_SELF | IN_MOVE_TO
That looks about right, although IN_MOVE_SELF should be treated differently, since it's moving the local directory away from its current path. I don't know what that should do -- probably the safest thing is to disable the drop-upload feature entirely until the node is restarted (and say "don't do that" in the docs).
- Also of course the other FIXMEs deserve to be investigated more. :-)
Other than that, it looks good. It is neat how few lines of code are needed to add this functionality! :-)
Yeah, twisted.internet.inotify rocks!
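As a rough sketch of the mask discussed above, the following separates the events that should trigger an upload from IN_MOVE_SELF, which is treated as a signal to disable the feature. The numeric values are the ones from the Linux inotify header; note that the kernel's name for the "moved into the directory" event is IN_MOVED_TO. The `classify` helper and its return strings are hypothetical and not part of the patch.

```python
# Event bits as defined in Linux <sys/inotify.h>.
IN_CLOSE_WRITE = 0x00000008  # a file opened for writing was closed
IN_MOVED_TO = 0x00000080     # a file was moved into the watched directory
IN_CREATE = 0x00000100       # a file was created in the watched directory
IN_MOVE_SELF = 0x00000800    # the watched directory itself was moved

# Events that should cause an upload.
UPLOAD_EVENTS = IN_CLOSE_WRITE | IN_CREATE | IN_MOVED_TO

# Mask to register with the notifier: upload events plus the one
# event that tells us the watched path is no longer where it was.
WATCH_MASK = UPLOAD_EVENTS | IN_MOVE_SELF


def classify(event_mask):
    """Map an inotify event mask to a hypothetical action name."""
    if event_mask & IN_MOVE_SELF:
        # The local directory moved away from its configured path;
        # the safest response is to stop uploading until restart.
        return "disable"
    if event_mask & UPLOAD_EVENTS:
        return "upload"
    return "ignore"
```

Checking IN_MOVE_SELF first means a combined mask never results in an upload from a directory that has been moved.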
comment:8 in reply to: ↑ 7 Changed at 2011-07-23T04:52:05Z by davidsarah
Replying to davidsarah:
Replying to zooko:
- What is the "FIXME: Unicode paths"? I don't see what needs to change there to handle non-ASCII paths.
tahoe.cfg is encoded in UTF-8, but the path wants to be in sys.getfilesystemencoding().
Also, in _notify the conversion to a Tahoe Unicode filename is wrong (unicode(filepath.basename()) will use the default ASCII encoding, when it should be converting from sys.getfilesystemencoding()).
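To illustrate the point about encodings, here is a minimal sketch in Python 3 syntax (the patch itself targets Python 2, where the same idea would be spelled with `str.decode`). The helper name `to_unicode_name` is hypothetical. It decodes a raw filename using the filesystem encoding rather than relying on an implicit ASCII default.

```python
import sys


def to_unicode_name(raw_name, encoding=None):
    """Decode a filename given as bytes into a Unicode string.

    If no encoding is passed, use the filesystem encoding reported by
    the interpreter instead of an implicit ASCII default.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return raw_name.decode(encoding)
```

Decoding the UTF-8 bytes of a non-ASCII name with the "ascii" codec raises UnicodeDecodeError, which is the failure mode described in the comment above.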
Changed at 2011-07-25T04:40:02Z by davidsarah
Drop-upload frontend, with tests (but no documentation). refs #1429
comment:9 Changed at 2011-07-25T04:44:30Z by davidsarah
- Keywords review-needed added; design-review-needed removed
- Milestone changed from soon to 1.9.0
The tests could do with covering a few more corner cases and error conditions, but I'm pretty sure there will be enough time for that before the beta.
comment:10 follow-up: ↓ 11 Changed at 2011-07-25T06:14:03Z by warner
- Type changed from defect to enhancement
This is pretty cool stuff. I'm hesitant about landing it in Tahoe core, though.. at least in its present form, it feels more like something that wants to be in a plugin, or in an extensions/ sort of directory. If it were a more complete Dropbox-ish replacement, I'd feel better about it: watching all directories under a root, handling modification of existing files not just new ones, and ideally some kind of multiple-client support. (as is, it's more like an automatic backup tool than a sync-a-virtual-directory-across-multiple-machines tool).
Can you envision maintaining its current UI for a couple years? Or do you think you'll look at it in six months and go "oh, that should really look like X instead".
comment:11 in reply to: ↑ 10 Changed at 2011-07-25T12:24:40Z by davidsarah
Replying to warner:
This is pretty cool stuff. I'm hesitant about landing it in Tahoe core, though.. at least in its present form, it feels more like something that wants to be in a plugin, or in an extensions/ sort of directory.
We don't have a plugin mechanism. In any case, I'm not encouraged by the fate of things that were previously out-of-core, like the FUSE implementations. I'd prefer it to be in the core, tested by default, and available for all users to try out without installing anything else.
If it introduced new dependencies that would be a different matter, but it doesn't (and we'd decided that it was OK to depend on Twisted 10.1, which has other advantages).
If it were a more complete Dropbox-ish replacement, I'd feel better about it: watching all directories under a root, handling modification of existing files not just new ones, and ideally some kind of multiple-client support. (as is, it's more like an automatic backup tool than a sync-a-virtual-directory-across-multiple-machines tool).
I didn't intend it to be a sync-a-virtual-directory-across-multiple-machines tool. It just uploads things that you drop into a particular directory. Perhaps the name 'drop-upload' suggests that it is more similar to Dropbox than it is. We have time before the 1.9 release to rename it, if a better name is suggested.
It already handles modification of existing files (I'll add that to the tests). Watching all directories under a root is #1433, and is a relatively straightforward evolution of the current code.
Can you envision maintaining its current UI for a couple years?
Yes, absolutely. It'll be easy to maintain compatibility with the current UI, that's only a few lines of code and documentation. It also needs to be in core to get sufficient feedback on the UI to improve it.
comment:12 Changed at 2011-07-27T01:07:04Z by davidsarah
- Description modified (diff)
Changed at 2011-07-27T01:08:57Z by davidsarah
attachment:drop-upload-3.darcs.patch fixes some bugs in error paths, and has better test coverage. The tests also pass on Windows now.
Changed at 2011-07-27T03:31:46Z by davidsarah
drop-upload: make counts visible on the statistics page, and disable some debugging. refs #1429
comment:13 Changed at 2011-07-31T18:52:17Z by davidsarah
- Description modified (diff)
comment:14 Changed at 2011-07-31T19:09:55Z by davidsarah
There are some lines > 80 characters in drop-upload-docs.darcs.patch; I'll rewrap those before committing it.
comment:15 Changed at 2011-08-01T15:54:33Z by nejucomo
- Owner set to nejucomo
- Status changed from new to assigned
comment:16 Changed at 2011-08-01T16:08:04Z by nejucomo
I just read over these ticket comments and I'm about to review each patch in sequence. On success I can say: "This code makes sense to me and the tests work and the docs are clear.", but I cannot say: "We should / should not include this in trunk."
I will bug repository writers in IRC about that policy decision after reviewing the ticket, but before I change "review-needed" to "reviewed" to avoid an unintended 1.9 merge.
Changed at 2011-08-01T16:40:55Z by nejucomo
Fix requirement justification comments for Twisted >= 10.1.0
comment:17 Changed at 2011-08-01T16:48:45Z by nejucomo
There's a comment collision for attachment:drop-upload.darcs.patch in _auto_deps.py which causes no change, just more justifications, for the requirement "Twisted>=10.1.0". I've attached attachment:requirements-comment-merge.darcs.patch with the merge of the requirements comments.
comment:18 Changed at 2011-08-01T16:50:59Z by nejucomo
- Owner changed from nejucomo to davidsarah
- Status changed from assigned to new
When applying attachment:drop-upload-2.darcs.patch there are merge conflicts in ./src/allmydata/client.py, ./src/allmydata/scripts/create_node.py, and ./src/allmydata/test/test_runner.py.
I'm going to punt understanding and resolving these conflicts back to davidsarah and move on to reviewing other patches.
comment:19 Changed at 2011-08-01T17:24:17Z by davidsarah
attachment:drop-upload-2.darcs.patch is obsolete. Use attachment:drop-upload-4.darcs.patch .
comment:20 Changed at 2011-08-01T17:35:23Z by davidsarah
The wording in attachment:requirements-comment-merge.darcs.patch isn't quite equivalent, because the inotify requirement is specific to Linux and the FTP server one is cross-platform. I think the wording as attachment:drop-upload-4.darcs.patch leaves it is fine.
comment:21 Changed at 2011-08-01T17:40:11Z by nejucomo
- Owner changed from davidsarah to nejucomo
- Status changed from new to assigned
I am resuming this ticket after I realized later patch attachments supersede earlier ones.
comment:22 follow-up: ↓ 23 Changed at 2011-08-01T17:50:58Z by nejucomo
In the DropUploader constructor there may be a race condition between:
if not self._local_path.isdir(): raise AssertionError("The drop-upload local path %r was not an existing directory." % quote_output(local_dir))
-and later-
self._notifier.watch(self._local_path, mask=mask, callbacks=[self._notify])
What happens if self._local_path is deleted between these lines?
Would it be better to catch and handle the error of self._local_path not existing in the call to .watch ? (Would that call signal the error?)
I do not consider this issue significant enough to require another iteration on this ticket; but a new ticket may be necessary. I haven't thought of any security problems as of yet, and only a rare usability problem.
comment:23 in reply to: ↑ 22 Changed at 2011-08-01T18:54:17Z by davidsarah
Replying to nejucomo:
In the DropUploader constructor there may be a race condition between:
if not self._local_path.isdir(): raise AssertionError("The drop-upload local path %r was not an existing directory." % quote_output(local_dir))
-and later-
self._notifier.watch(self._local_path, mask=mask, callbacks=[self._notify])
What happens if self._local_path is deleted between these lines?
If the directory doesn't exist, the call to watch seems to succeed but not work as intended (even if the directory is later created). Similarly if the path points to a file rather than a directory. So the isdir() check is necessary for proper error reporting. In general I agree that it is "Better To Ask Forgiveness Than Permission" as opposed to "Looking Before You Leap", but it doesn't seem to be possible to do that here.
There is no security problem if the directory is deleted, that will just cause there to be no further notifications.
comment:24 Changed at 2011-08-01T18:59:30Z by davidsarah
- Keywords reviewed added; review-needed removed
- Owner changed from nejucomo to warner
- Status changed from assigned to new
Reassigning to warner to decide whether this goes in the beta release.
comment:25 Changed at 2011-08-01T21:23:02Z by nejucomo
I concur that this is reviewed. davidsarah answered the question about the _local_path race condition in IRC: The inotify interface will not indicate if the target directory does not exist.
It still may be possible to remove the race condition by calling isdir() after the call to watch() but this detail does not seem important enough to prevent inclusion.
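The ordering suggested here (register the watch first, then check that the directory is there) might look something like the sketch below. It is an assumption-laden illustration: `register_watch` is a hypothetical callable standing in for the notifier's watch call, and `os.path.isdir` stands in for the FilePath check. It narrows the window rather than eliminating it, since a directory that was missing when the watch was registered could still be created before the check runs.

```python
import os


def watch_then_verify(path, register_watch):
    """Register a watch on `path`, then confirm it is an existing directory.

    register_watch(path) is a hypothetical stand-in for the real watch
    call, which per the discussion above does not report a missing target.
    """
    register_watch(path)
    # Checking after registration catches a directory that was deleted
    # between an earlier check and the watch call.
    if not os.path.isdir(path):
        raise ValueError(
            "The drop-upload local path %r was not an existing directory." % (path,)
        )
```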
comment:26 Changed at 2011-08-02T00:18:17Z by davidsarah
- Keywords news added
Proposed NEWS:
New Features
''''''''''''

- A "drop-upload" feature has been added, which allows you to upload files to a Tahoe-LAFS directory just by writing them to a local directory. This feature is experimental and should not be relied on to store the only copy of valuable data. It is currently available only on Linux. See `<docs/frontends/drop-upload.rst>`_ for documentation. (`#1429`_)

Compatibility and Dependencies
''''''''''''''''''''''''''''''

- The Twisted dependency has been raised to version 10.1. This ensures that we no longer require pywin32 on Windows even when using older versions of Twisted, that the new drop-upload feature has the required support from Twisted on Linux, and that it is never necessary to patch Twisted in order to use the FTP frontend. (`#1274`_, `#1429`_, `#1438`_)
comment:27 Changed at 2011-08-02T06:14:31Z by zooko
Brian mentioned on IRC that he was uncertain about including this feature in 1.9. His reasoning persuaded me that we might want to be a bit conservative about distributing this feature in a way that makes people think of it as being a fully supported feature of Tahoe-LAFS. It is new and we haven't thought about it that much, and we don't want to take on the burden of backward compatibility for this feature the way we do for all of the supported features in Tahoe-LAFS.
I think maybe we should go ahead and include this feature in Tahoe-LAFS v1.9 but mark it as an experimental feature which we don't necessarily commit to supporting at this time.
Having a new feature in a release seems like a great way to learn about whether we want to promote it to a fully supported feature in a future release.
If this is agreeable to everyone, we should make sure to flag it as experimental in the release announcement and docs.
comment:28 Changed at 2011-08-08T02:53:13Z by davidsarah
http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1431/drop-upload-docs-including-windows.darcs.patch is an alternative docs patch if the Windows implementation is included in 1.9.
Changed at 2011-08-09T00:47:39Z by davidsarah
Drop-upload frontend, rerecorded for 1.9 beta (and correcting a minor mistake). Includes some fixes for Windows but not the Windows inotify implementation. fixes #1429
comment:29 Changed at 2011-08-09T00:52:24Z by david-sarah@…
- Resolution set to fixed
- Status changed from new to closed
In 32a7717205ed824a:
comment:30 Changed at 2011-08-09T00:52:24Z by david-sarah@…
In 667b086b59ee37d3:
comment:31 Changed at 2011-08-09T01:02:46Z by david-sarah@…
In 08af9cea50c3c3cd:
Changed at 2011-08-09T19:57:11Z by davidsarah
drop-upload: rename the 'upload.uri' parameter to 'upload.dircap', and a couple of cleanups to error messages. refs #1429
Changed at 2011-08-09T22:24:56Z by zooko
comment:32 Changed at 2011-08-09T22:32:01Z by zooko
- Keywords review-needed added; reviewed removed
- Resolution fixed deleted
- Status changed from closed to reopened
Reviewed attachment:rename-upload-uri-to-dircap.darcs.patch. There are some issues. I'm attaching patches that fix a couple of them but not all.
Out of curiosity I rerecorded the patch with darcs record; this helped me find issues in the patch; see below. The resulting patches are in the latest attachment -- they redo the renaming with darcs replace and would be an acceptable variant to commit to trunk (also a suitably fixed version with hunks instead of darcs replace would be acceptable to me).
issues fixed by the latest attachment:
- There was a missed rename in Client.init_drop_uploader which would cause an exception if the code were executed.
- In init_drop_uploader the local variable could be changed from upload_uri to upload_cap, the way this patch changed such names elsewhere. This is changed by the darcs record rename in the attached patches.
- In DropUploader.__init__ it uses FilePath.is_dir as if it will return False when the thing doesn't exist or the thing is a non-directory, but FilePath.is_dir actually raises an exception when the thing doesn't exist. The attached patches fix it to use FilePath.exists and change it to report those two cases separately.
issues not fixed:
- The presence of the missed rename also means that the code in Client.init_drop_uploader isn't exercised by unit tests.
- The "was not valid utf-8" and the "could not be represented in your local filesystem" exceptions could be caught and reported separately in DropUploader.__init__ so that the user would know which one happened. Not necessary for this ticket, but could be nice.
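Reporting the "does not exist" and "is not a directory" cases separately, as described in the fixed issues above, could be sketched as follows. This uses `os.path` rather than Twisted's FilePath so that it stands alone; the function name `check_drop_directory` and the exact messages are hypothetical.

```python
import os


def check_drop_directory(path):
    """Raise ValueError with a distinct message for each failure case."""
    if not os.path.exists(path):
        raise ValueError(
            "The drop-upload local path %r does not exist." % (path,)
        )
    if not os.path.isdir(path):
        raise ValueError(
            "The drop-upload local path %r is not a directory." % (path,)
        )
```

The same pattern would apply to the two encoding errors mentioned in the last item: catching each one separately lets the message say which problem actually occurred.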
comment:33 Changed at 2011-08-09T22:38:40Z by zooko
Specifically it is the code inside the "if drop-upload is configured" block in Client.init_drop_uploader. The unit tests instantiate a DropUploader instance directly, but don't test what happens when you instantiate a Client instance which is configured to create a DropUploader.
comment:34 Changed at 2011-08-09T22:41:05Z by zooko
I use the following bash script (executed by emacs) to generate code coverage results for just test_drop_upload:
cd ~/playground/tahoe-lafs/1429-rerecord
rm -rf ./.cover* ./htmlcov*
coverage run --branch --include="`pwd`/src/*" /usr/local/bin/trial allmydata.test.test_drop_upload
comment:35 Changed at 2011-08-10T04:26:26Z by zooko@…
In 720bc2433b9bd16d:
(The changeset message doesn't reference this ticket)
comment:36 Changed at 2011-08-10T04:26:27Z by zooko@…
In 5633375d267e5728:
(The changeset message doesn't reference this ticket)
comment:37 Changed at 2011-08-10T04:26:29Z by zooko@…
In b7683d9b83a23cdd:
(The changeset message doesn't reference this ticket)
comment:38 Changed at 2011-08-10T04:26:29Z by zooko@…
In 612abca271703508:
comment:39 Changed at 2011-08-10T04:26:30Z by david-sarah@…
In 369e30b1dfa2b77f:
comment:40 Changed at 2011-08-10T04:26:31Z by david-sarah@…
In f157b733676b33f5:
comment:41 Changed at 2011-08-10T04:26:32Z by david-sarah@…
In 10ee22f50e5bdfd3:
comment:42 Changed at 2011-08-10T04:26:33Z by david-sarah@…
In c102056ac1df1784:
comment:43 Changed at 2011-08-10T04:26:34Z by david-sarah@…
In db22fdc20dc93a3e:
comment:4 Changed at 2011-08-10T06:32:21Z by david-sarah@…
In ab9eb12f7006322f:
comment:5 Changed at 2011-08-10T17:28:00Z by david-sarah@…
- Resolution set to fixed
- Status changed from reopened to closed
comment:6 follow-up: ↓ 7 Changed at 2011-08-10T17:28:04Z by david-sarah@…
comment:7 follow-up: ↓ 8 Changed at 2011-08-10T17:28:04Z by david-sarah@…
comment:8 Changed at 2011-08-10T17:28:06Z by zooko@…
comment:9 Changed at 2011-08-10T17:28:07Z by zooko@…
comment:10 follow-up: ↓ 11 Changed at 2011-08-10T17:28:07Z by zooko@…
comment:11 Changed at 2011-08-10T17:28:08Z by zooko@…
comment:12 Changed at 2011-08-10T17:28:09Z by david-sarah@…
comment:13 Changed at 2011-08-10T17:28:10Z by david-sarah@…
comment:14 Changed at 2011-08-10T17:28:11Z by david-sarah@…
comment:15 Changed at 2011-08-10T17:28:12Z by david-sarah@…
comment:16 Changed at 2011-08-10T17:28:13Z by david-sarah@…
comment:17 Changed at 2011-08-10T17:28:15Z by david-sarah@…
comment:18 Changed at 2011-08-10T20:12:02Z by davidsarah
Although this was auto-closed for the wrong reason (a commit of the original patch on a branch), it is in fact fixed.
comment:19 Changed at 2011-11-01T06:20:41Z by warner
just to be clear, this made it into 1.9.0 (released yesterday). It's labeled as "experimental", which means we aren't committed to supporting it long-term yet, and it may get pulled out if it misbehaves :).
Prototype implementation of drop-upload from Tahoe-LAFS summit. No tests or docs. (Corrected to include new file.)