#104 closed task (invalid)

does cp -r work as expected?

Reported by: zooko
Owned by: warner
Priority: major
Milestone: soon
Component: code-frontend-cli
Version: 0.7.0
Keywords: usability tahoe-cp docs
Cc:
Launchpad Bug:

Description (last modified by daira)

It would be good if the command-lines

allmydata-tahoe get

and

allmydata-tahoe put

supported the --recursive or -r option so that you could upload or download an entire collection of files with one command-line.

There are actually a host of issues that arise in implementing this, such as those mentioned in the "names versus identifiers" section of webapi.txt, and quoted here:

For example, suppose you are writing code which recursively downloads the
contents of a directory. The first thing your code does is fetch the listing
of the contents of the directory. For each child that it fetched, if that
child is a file then it downloads the file, and if that child is a directory
then it recurses into that directory. Now, if the download and the recurse
actions are performed using the child's name, then the results might be
wrong, because for example a child name that pointed to a sub-directory when
you listed the directory might have been changed to point to a file, in which
case your attempt to recurse into it would result in an error and the file
would be skipped, or a child name that pointed to a file when you listed the
directory might now point to a sub-directory, in which case your attempt to
download the child would result in a file containing HTML text describing the
sub-directory!
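
The race in the quoted passage can be reproduced with a small in-memory model. The sketch below is illustrative only: Grid, kinds_seen, relink, and the identifier strings are hypothetical names, not Tahoe APIs. It lists a directory, lets another writer relink a child name, and then compares what the name resolves to against what the identifier recorded at listing time resolves to.

```python
# Hypothetical in-memory model; none of these names are Tahoe APIs.
FILE, DIR = "file", "dir"


class Grid:
    """Maps identifiers to objects and holds one mutable root listing."""

    def __init__(self):
        self.objects = {
            "id-notes": (FILE, b"hello"),
            "id-sub": (DIR, {}),
        }
        self.root = {"child": "id-sub"}  # name -> identifier

    def list_root(self):
        # Each entry records the identifier and kind seen at listing time.
        return [(name, ident, self.objects[ident][0])
                for name, ident in self.root.items()]

    def kind_by_name(self, name):
        # Resolves the name again, so it sees later changes to the listing.
        return self.objects[self.root[name]][0]


def kinds_seen(grid, mutate):
    listing = grid.list_root()
    mutate(grid)  # another writer changes the directory after the listing
    results = []
    for name, ident, kind_at_listing in listing:
        kind_via_name = grid.kind_by_name(name)
        kind_via_ident = grid.objects[ident][0]
        results.append((name, kind_at_listing, kind_via_name, kind_via_ident))
    return results


def relink(grid):
    # The name "child" now points at a file instead of a sub-directory.
    grid.root["child"] = "id-notes"


result = kinds_seen(Grid(), relink)
print(result)
```

In this model the name "child" was a sub-directory at listing time but resolves to a file afterwards, while the identifier recorded in the listing still refers to the same sub-directory.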

These problems can be avoided by traversing identifiers instead of names, but the next problems can't. The next problems are that dirnodes can recurse (a dirnode can contain an entry pointing to another dirnode which contains an entry pointing to the first), or can converge (two entries in the same or different dirnodes can point to the same object). We could implement a recursive download of such things by (perhaps arbitrarily) choosing one path to be a real link and the other to be a symlink. But Windows doesn't have symlinks. Another option would be to abort and print an error message if such a pattern is encountered.
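
A minimal sketch of how a traversal by identifier could detect both patterns, assuming a directory graph held in a plain dict. The function classify_links, the dict layout, and the identifiers D1, D2, F1 are hypothetical and are not taken from the Tahoe code. A link back to a directory that is still being explored is reported as a cycle; a second link to an object that has already been reached is reported as convergent.

```python
def classify_links(dirs, root):
    """dirs maps a directory identifier to {name: child identifier}.

    Identifiers that are not keys of dirs are treated as files.
    Returns (cycles, convergent), each a list of paths.
    """
    cycles, convergent = [], []
    finished = set()    # directories whose children have all been visited
    seen_files = set()  # file identifiers already reached once

    def visit(ident, path, ancestors):
        for name, child in sorted(dirs[ident].items()):
            child_path = path + "/" + name
            if child not in dirs:
                if child in seen_files:
                    convergent.append(child_path)
                seen_files.add(child)
            elif child in ancestors:
                cycles.append(child_path)      # points back up the current path
            elif child in finished:
                convergent.append(child_path)  # second link to a known directory
            else:
                visit(child, child_path, ancestors | {child})
        finished.add(ident)

    visit(root, "", {root})
    return cycles, convergent


dirs = {
    "D1": {"a": "D2", "f": "F1"},
    "D2": {"back": "D1", "g": "F1"},
}
cycles, convergent = classify_links(dirs, "D1")
print(cycles, convergent)
```

A caller could then either abort with an error when either list is non-empty, or use the reported paths to decide which link is written as the real copy and which as a symlink.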

Change History (25)

comment:1 Changed at 2007-08-15T21:34:26Z by zooko

  • Milestone changed from undecided to 0.6.0
  • Status changed from new to assigned

comment:2 Changed at 2007-09-19T22:55:45Z by zooko

  • Milestone changed from 0.6.0 to 0.7.0

comment:3 Changed at 2007-10-01T18:17:13Z by zooko

  • Summary changed from recursive get and recursive put to command-line: recursive get and recursive put

comment:4 Changed at 2007-10-01T19:25:42Z by zooko

  • Milestone changed from 0.7.0 to 0.6.1
  • Version changed from 0.4.0 to 0.6.0

Promoting this to Milestone 0.6.1 because my favorite customer, Peter, wants it.

comment:5 Changed at 2007-10-13T06:50:48Z by zooko

  • Milestone changed from 0.6.1 to 0.7.0

bumping this to v0.7

comment:6 Changed at 2007-11-01T18:13:48Z by zooko

  • Milestone changed from 0.7.0 to 0.7.1

We're focussing on an imminent v0.7.0 (see the roadmap) which hopefully has #197 -- Small Distributed Mutable Files and also a fix for #199 -- bad SHA-256. So I'm bumping less urgent tickets to v0.7.1.

comment:7 Changed at 2007-11-01T18:14:13Z by zooko

  • Version changed from 0.6.0 to 0.6.1

comment:8 Changed at 2007-11-13T18:22:08Z by zooko

  • Milestone changed from 0.7.1 to 0.7.2
  • Version changed from 0.6.1 to 0.7.0

We need to choose a manageable subset of desired improvements for v0.7.1, scheduled for two weeks hence, so I'm bumping this one into v0.7.2, scheduled for mid-December.

comment:9 Changed at 2008-01-15T21:36:41Z by zooko

  • Component changed from code-frontend to code-frontend-cli

comment:10 Changed at 2008-01-23T04:19:03Z by zooko

  • Milestone changed from 0.7.2 to undecided

comment:11 Changed at 2008-03-10T01:31:01Z by zooko

  • Owner changed from zooko to nobody
  • Status changed from assigned to new

comment:12 Changed at 2008-06-01T21:02:33Z by warner

  • Milestone changed from eventually to 1.2.0

this is being replaced by "cp -r", and might be sufficiently done by now (although we may wish to put off closing this until "cp -r" works a bit better). Moving this to 1.2.0 with the idea that it might be closed by the 1.1.0 release.

comment:13 Changed at 2008-06-07T19:34:48Z by zooko

  • Milestone changed from 1.2.0 to 1.1.0

I don't understand why you put it into Milestone 1.2.0 if you think it is ready to be closed as a feature added to 1.1.0.

Also, what did you do about convergent links (as mentioned in the initial note on this ticket), and what did you do about link cycles? And did you avoid the weirdness of race conditions, as described in the initial note of this ticket, by using caps instead of names as the "next links"?

Thanks!

comment:14 Changed at 2008-06-07T19:35:02Z by zooko

  • Owner changed from nobody to warner

comment:15 Changed at 2008-06-09T18:30:16Z by zooko

  • Milestone changed from 1.1.0 to 1.2.0

Okay, there is a complete implementation of cp -r, but we haven't analyzed some of the potential issues mentioned in this ticket, or whether this UI is sufficient, or whether the implementation is in fact complete. So, later we'll consider these questions, and we're leaving this ticket open to remind us to do that.

comment:16 Changed at 2009-06-30T12:39:27Z by zooko

  • Milestone changed from 1.5.0 to eventually

comment:17 Changed at 2009-12-13T03:55:23Z by davidsarah

  • Keywords usability added
  • Summary changed from command-line: recursive get and recursive put to does cp -r work as expected?

comment:18 Changed at 2009-12-13T03:55:52Z by davidsarah

  • Keywords cp added

comment:19 Changed at 2009-12-13T03:56:16Z by davidsarah

  • Type changed from enhancement to task

comment:20 Changed at 2010-02-02T03:17:39Z by davidsarah

  • Milestone changed from eventually to 1.7.0

comment:21 Changed at 2010-02-12T05:11:11Z by davidsarah

  • Keywords tahoe-cp added; cp removed

comment:22 Changed at 2010-02-12T05:11:20Z by davidsarah

  • Keywords docs added

comment:23 Changed at 2010-06-16T03:59:31Z by davidsarah

  • Milestone changed from 1.7.0 to soon

comment:24 Changed at 2012-11-26T00:36:58Z by davidsarah

This ticket is way too vague.

TahoeDirectorySource and TahoeDirectoryTarget in git/src/allmydata/scripts/tahoe_cp.py have cache dictionaries that seem as though they might have the effect of copying cycles correctly between two Tahoe directories, but I don't see a unit test for that in allmydata.test.test_cli.Cp.
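
For illustration, here is a sketch of how a cache keyed by source identifier can make a recursive copy preserve cycles. This is not the code in tahoe_cp.py; copy_tree, the dict layout, and the "copy-of-" naming are hypothetical. The key point is that the target directory is registered in the cache before its children are copied, so a link back to it is resolved from the cache instead of recursing forever.

```python
def copy_tree(src_dirs, root):
    """src_dirs maps a directory identifier to {name: child identifier}.

    Identifiers that are not keys of src_dirs are treated as files and
    are linked as-is. Returns (new_dirs, new_root).
    """
    cache = {}     # source directory identifier -> target directory identifier
    new_dirs = {}

    def copy_dir(ident):
        if ident in cache:
            return cache[ident]
        target = "copy-of-" + ident
        cache[ident] = target  # register before recursing so cycles terminate
        new_dirs[target] = {}
        for name, child in src_dirs[ident].items():
            if child in src_dirs:
                new_dirs[target][name] = copy_dir(child)
            else:
                new_dirs[target][name] = child
        return target

    new_root = copy_dir(root)
    return new_dirs, new_root


src = {
    "D1": {"a": "D2", "f": "F1"},
    "D2": {"back": "D1"},
}
new_dirs, new_root = copy_tree(src, "D1")
print(new_root, new_dirs)
```

A unit test for the cyclic case could compare the shape of the copied graph against the source in this way.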

#712 is one way in which tahoe cp -r does not do the right thing. I also don't think it will do the right thing when copying a cyclic Tahoe directory, although perhaps #712 obscures that. I filed #1878 for this.

OTOH, TahoeDirectorySource does not have the following bug:

Now, if the download and the recurse actions are performed using the child's name, then the results might be wrong, because for example a child name that pointed to a sub-directory when you listed the directory might have been changed to point to a file, [...] or a child name that pointed to a file when you listed the directory might now point to a sub-directory...

Is there anything more to do on this ticket, or is it covered by #712 and #1878?

Version 0, edited at 2012-11-26T00:36:58Z by davidsarah

comment:25 Changed at 2013-08-28T16:47:41Z by daira

  • Description modified (diff)
  • Resolution set to invalid
  • Status changed from new to closed

Closed for vagueness.
