Opened at 2009-06-09T21:19:25Z
Last modified at 2016-03-01T15:11:28Z
#731 new defect
what to do with filenames that are illegal on some systems — at Initial Version
Reported by: | zooko | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | eventually |
Component: | code-dirnodes | Version: | 1.4.1 |
Keywords: | forward-compatibility i18n unicode names | Cc: | |
Launchpad Bug: |
Description
If someone copies a file from system A into Tahoe-LAFS and then later someone tries to copy that file from Tahoe-LAFS into system B, then a problem could arise if the filename from system A is illegal on system B. This can happen in a few ways:
- The filename could be illegal on Windows (http://msdn.microsoft.com/en-us/library/aa365247.aspx ), and system B could be Windows and system A non-Windows.
- The filename could be illegal on Mac (http://developer.apple.com/technotes/tn/tn1150table.html ).
- The filename could case-collide with another filename in the same directory, and system B could be a case-insensitive filesystem. (Note that Tahoe's current naïve approach will result in a randomly-chosen one of the files overwriting the other if the target system is Windows or Macintosh.)
- If we allowed undecodable bytestring filenames from a POSIX system A, either by storing bytestring (non-unicode) filenames or by some escaping mechanism such as UTF-8b, then a non-POSIX system B would not be able to accept that name (or at least we should not write that name into that system). Likewise, some POSIX users have a policy that only correctly encoded unicode filenames should be stored in their filesystem, so for them we should not write that name even though we could do so by using the POSIX byte-oriented APIs.
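To make the cases above concrete, here is a rough sketch of what a portability check might look like. It is not Tahoe-LAFS code: the function names `portability_problems` and `case_collisions` are hypothetical, the rule set is only a partial reading of the Windows naming rules linked above (the colon rule also covers classic Mac), and it assumes undecodable bytes reach us as Python 3 surrogate escapes (PEP 383, which is closely related to UTF-8b).

```python
import unicodedata

# Assumption: a partial list taken from the Windows naming rules; not exhaustive.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def portability_problems(name):
    """Return a list of reasons why a single filename may not be portable."""
    problems = []
    try:
        # Lone surrogates (from surrogateescape decoding) cannot be encoded
        # as strict UTF-8, so this detects originally undecodable bytes.
        name.encode("utf-8")
    except UnicodeEncodeError:
        problems.append("undecodable bytes")
    if any(ch in WINDOWS_RESERVED_CHARS or ord(ch) < 32 for ch in name):
        problems.append("reserved character")
    # Windows treats device names as reserved even with an extension.
    if name.split(".")[0].upper() in WINDOWS_RESERVED_NAMES:
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("trailing space or dot")
    return problems


def case_collisions(names):
    """Group names from one directory that collide when case is ignored."""
    groups = {}
    for name in names:
        # Assumption: NFC plus casefold approximates a case-insensitive
        # filesystem; real filesystems differ in the details.
        key = unicodedata.normalize("NFC", name).casefold()
        groups.setdefault(key, []).append(name)
    return [group for group in groups.values() if len(group) > 1]
```

For example, `portability_problems("nul.txt")` flags the reserved device name, and `case_collisions(["README", "readme"])` reports the pair that would overwrite each other on a case-insensitive target.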
Here are someone else's notes about these sorts of issues:
http://www.portfoliofaq.com/pfaq/FAQ00352.htm
See also David A. Wheeler's excellent article arguing that we should start being pickier about filenames in POSIX systems:
http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
There are various ways Tahoe can deal with this. It can do something about it on the Tahoe -> system B leg of the trip, such as stopping with an error or offering to rename the offending files. It could also do something about it on the system A -> Tahoe leg of the trip.
I think in the short term it might be better if Tahoe rejected non-portable filenames in the system A -> Tahoe leg of the trip, because we don't yet know how we want to handle them. By rejecting them, we avoid the current random-overwrite issue and we don't constrain future versions of Tahoe-LAFS as much in terms of what sorts of filenames it has to support. (There might already be some problematic filenames stored in Tahoe and we might want to extend Tahoe to deal with these better in the future, but if Tahoe-v1.5 starts rejecting new ones then the problem will probably be less widespread and less severe in the future.)
On the other hand, rejecting them would be a UI/API regression, so we would probably want to add a --force-nonportable-filenames option to make it behave like Tahoe-v1.4 currently does.
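A minimal sketch of that reject-by-default policy with an override, to show the intended behaviour rather than an implementation. Everything here is hypothetical: `NonPortableFilenameError`, `looks_nonportable`, and `filter_upload_names` are made-up names, the `force_nonportable` parameter stands in for the proposed --force-nonportable-filenames option (which does not exist yet), and the check itself is deliberately simplistic.

```python
class NonPortableFilenameError(ValueError):
    """Hypothetical error for an upload that contains non-portable names."""


def looks_nonportable(name):
    # Stand-in check only: characters Windows forbids, or control characters.
    return any(ch in '<>:"/\\|?*' or ord(ch) < 32 for ch in name)


def filter_upload_names(names, force_nonportable=False):
    """Reject non-portable names on the system A -> Tahoe leg unless forced."""
    bad = [name for name in names if looks_nonportable(name)]
    seen = {}
    for name in names:
        key = name.casefold()
        # A different name with the same case-folded form would collide on
        # a case-insensitive target, so report it as well.
        if key in seen and seen[key] != name:
            bad.append(name)
        seen.setdefault(key, name)
    if bad and not force_nonportable:
        raise NonPortableFilenameError(
            "refusing non-portable filenames: %r" % (bad,)
        )
    # With the override set, behave as Tahoe-v1.4 does and accept everything.
    return list(names)
```

With the default, `filter_upload_names(["Readme", "README"])` raises; with `force_nonportable=True` the same call returns both names unchanged, which preserves the v1.4 behaviour for users who need it.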
Help!?