#731 new defect

what to do with filenames that are illegal on some systems

Reported by: zooko Owned by:
Priority: major Milestone: eventually
Component: code-dirnodes Version: 1.4.1
Keywords: forward-compatibility i18n unicode names Cc:
Launchpad Bug:

Description

If someone copies a file from system A into Tahoe-LAFS and then later someone tries to copy that file from Tahoe-LAFS into system B, then a problem could arise if the filename from system A is illegal on system B. This can happen in a few ways:

  1. The filename could be illegal on Windows (http://msdn.microsoft.com/en-us/library/aa365247.aspx ), and system B could be Windows and system A non-Windows.
  1. The filename could be illegal on Mac (http://developer.apple.com/technotes/tn/tn1150table.html ), and system B could be a Mac.
  1. The filename could case-collide with another filename in the same directory, and system B could be a case-insensitive filesystem. (Note that Tahoe's current naïve approach will result in a randomly chosen one of the files overwriting the other if the target system is Windows or Macintosh.)
  1. If we allowed undecodable bytestring filenames from a POSIX system A, either by storing bytestring (non-unicode) filenames or by some escaping mechanism such as utf8b, then a non-POSIX system B would not be able to accept that name (or at least we should not write that name into that system). Likewise, some POSIX users have a policy that only correctly encoded unicode filenames should be stored in their filesystem, so for them we should not write that name even though we could do so by using the POSIX byte-oriented APIs.
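The Windows, case-collision and undecodable-bytes cases can all be detected mechanically before anything is written. Below is a minimal Python sketch, assuming the rules from the MSDN page linked above (reserved characters `<>:"/\|?*`, ASCII control characters, reserved device names such as `CON` and `COM1`, no trailing space or period), UTF-8 as the POSIX filename encoding, and Unicode "canonical caseless matching" as a stand-in for the case folding NTFS and HFS+ really perform (each uses its own table, so this is an approximation). The function names are hypothetical, not Tahoe-LAFS APIs, and the Mac-specific rules from TN1150 are not covered.

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict

# Reserved characters per the MSDN naming page, plus control chars 0-31.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*') | {chr(i) for i in range(32)}

# Reserved device names; illegal even with an extension ("NUL.txt").
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def windows_name_problems(name: str) -> list[str]:
    """Return the reasons `name` cannot be used as a Windows filename."""
    problems = []
    bad = sorted({c for c in name if c in WINDOWS_RESERVED_CHARS})
    if bad:
        problems.append(f"reserved characters: {bad!r}")
    stem = name.split(".", 1)[0].upper()
    if stem in WINDOWS_RESERVED_NAMES:
        problems.append(f"reserved device name: {stem}")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    return problems


def fold(name: str) -> str:
    """Canonical caseless form: NFD(casefold(NFD(name)))."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())


def case_collisions(names: list[str]) -> list[list[str]]:
    """Group names in one directory that would collide on a case-insensitive FS."""
    groups: dict[str, list[str]] = defaultdict(list)
    for n in names:
        groups[fold(n)].append(n)
    return [g for g in groups.values() if len(g) > 1]


def decode_posix_name(raw: bytes) -> str | None:
    """Return the name as text, or None if it is not valid UTF-8 (assumed encoding)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

For example, `case_collisions(["README", "readme", "other"])` returns `[["README", "readme"]]`, and `"caf\u00e9"` collides with `"CAFE\u0301"` because normalization is applied before comparing.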

Here are someone else's notes about these sorts of issues:

http://www.portfoliofaq.com/pfaq/FAQ00352.htm

See also David A. Wheeler's excellent article arguing that we should start being pickier about filenames in POSIX systems:

http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

There are various ways Tahoe can deal with this. It can do something about it on the Tahoe -> system B leg of the trip, such as stopping with an error, offering to rename the offending files, etc. It could also do something about it on the system A -> Tahoe leg of the trip.
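For the Tahoe -> system B leg, the "rename the offending files" option could look roughly like the following sketch. It assumes a Windows-like, case-insensitive target; the `_` substitution and the ` (2)` suffix are arbitrary illustrative choices, and `sanitize` and `plan_export` are hypothetical helpers, not Tahoe-LAFS APIs. Processing names in sorted order makes the outcome deterministic, unlike the random overwrite described above.

```python
from __future__ import annotations

import unicodedata

# Assumed Windows rules: reserved characters, control chars, device names.
RESERVED_CHARS = set('<>:"/\\|?*') | {chr(i) for i in range(32)}
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def fold(name: str) -> str:
    # Approximate case-insensitive comparison: NFD(casefold(NFD(name))).
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())


def sanitize(name: str) -> str:
    """Make one name acceptable to Windows by substituting '_' for bad parts."""
    cleaned = "".join("_" if c in RESERVED_CHARS else c for c in name)
    cleaned = cleaned.rstrip(" .") or "_"  # no trailing space/period
    stem, dot, ext = cleaned.partition(".")
    if stem.upper() in RESERVED_NAMES:  # e.g. "CON.txt" -> "CON_.txt"
        cleaned = f"{stem}_{dot}{ext}"
    return cleaned


def plan_export(names: list[str]) -> dict[str, str]:
    """Map each Tahoe name to a unique, Windows-safe local name."""
    plan: dict[str, str] = {}
    taken: set[str] = set()
    for original in sorted(names):  # deterministic order
        candidate = sanitize(original)
        base, dot, ext = candidate.partition(".")
        n = 2
        while fold(candidate) in taken:  # disambiguate case collisions
            candidate = f"{base} ({n}){dot}{ext}"
            n += 1
        taken.add(fold(candidate))
        plan[original] = candidate
    return plan
```

With `["README", "readme", "a:b", "NUL"]` this plan maps `readme` to `readme (2)`, `a:b` to `a_b` and `NUL` to `NUL_`. On a case-sensitive POSIX target the fold-based check would rename more than necessary, so a real implementation would need to know the target filesystem's rules, or stop with an error instead.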

I think in the short term it might be better if Tahoe rejected non-portable filenames in the system A -> Tahoe leg of the trip, because we don't yet know how we want to handle them. By rejecting them, we avoid the current random-overwrite issue and we don't constrain future versions of Tahoe-LAFS as much in terms of what sorts of filenames it has to support. (There might already be some problematic filenames stored in Tahoe, and we might want to extend Tahoe to deal with these better later, but if Tahoe-v1.5 starts rejecting new ones, the problem will probably be less widespread and less severe.)

On the other hand, rejecting them would be a UI/API regression, so we would probably want to add a --force-nonportable-filenames option to make it behave like Tahoe-v1.4 currently does.
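For the system A -> Tahoe leg, rejecting non-portable names by default while keeping an escape hatch might be wired up as in the sketch below. `--force-nonportable-filenames` is only the option proposed above, not an existing Tahoe-LAFS flag; `check_upload_name`, `NonPortableFilenameError` and the `upload-sketch` program name are hypothetical. The portability test is deliberately reduced to reserved Windows characters and trailing spaces or periods.

```python
from __future__ import annotations

import argparse
import sys

# Assumed subset of the Windows rules, for illustration only.
RESERVED_CHARS = set('<>:"/\\|?*') | {chr(i) for i in range(32)}


class NonPortableFilenameError(ValueError):
    """Raised when a name would be illegal on some target system."""


def is_portable(name: str) -> bool:
    return not any(c in RESERVED_CHARS for c in name) and not name.endswith((" ", "."))


def check_upload_name(name: str, force_nonportable: bool = False) -> str:
    """Return `name` if it may be stored, else raise NonPortableFilenameError."""
    if not force_nonportable and not is_portable(name):
        raise NonPortableFilenameError(
            f"{name!r} is not portable; rerun with --force-nonportable-filenames"
        )
    return name


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="upload-sketch")
    parser.add_argument("names", nargs="+")
    # Proposed opt-out that restores the old (v1.4) permissive behavior.
    parser.add_argument("--force-nonportable-filenames", action="store_true")
    args = parser.parse_args(argv)
    try:
        for name in args.names:
            check_upload_name(name, args.force_nonportable_filenames)
    except NonPortableFilenameError as e:
        print(e, file=sys.stderr)
        return 1
    return 0
```

`main(["a?b"])` prints the error and returns 1, while `main(["a?b", "--force-nonportable-filenames"])` returns 0, matching what Tahoe-v1.4 does today.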

Help!?
