[tahoe-dev] [tahoe-lafs] #731: what to do with filenames that are illegal on some systems

tahoe-lafs trac at allmydata.org
Tue Jun 9 14:19:25 PDT 2009


#731: what to do with filenames that are illegal on some systems
---------------------------+------------------------------------------------
 Reporter:  zooko          |           Owner:       
     Type:  defect         |          Status:  new  
 Priority:  major          |       Milestone:  1.5.0
Component:  code-dirnodes  |         Version:  1.4.1
 Keywords:                 |   Launchpad_bug:       
---------------------------+------------------------------------------------
 If someone copies a file from system A into Tahoe-LAFS and then later
 someone tries to copy that file from Tahoe-LAFS into system B, then a
 problem could arise if the filename from system A is illegal on system B.
 This can happen in a few ways:

 1.  The filename could be illegal on Windows
 (http://msdn.microsoft.com/en-us/library/aa365247.aspx ), and system B
 could be Windows and system A non-Windows.

 2.  The filename could be illegal on Mac
 (http://developer.apple.com/technotes/tn/tn1150table.html ).

 3.  The filename could case-collide with another filename in the same
 directory, and system B could be a case-insensitive filesystem.  (Note
 that Tahoe's current naïve approach will result in a randomly-chosen one
 of the files overwriting the other if the target system is Windows or
 Macintosh.)

 4.  If we allowed undecodable bytestring filenames from a POSIX system A,
 either by storing bytestring (non-unicode) filenames, or by some escaping
 mechanism such as {{{utf8b}}}, then a non-POSIX
 system B would not be able to accept that name (or at least we ''should''
 not write that name into that system).  Likewise some users of POSIX have
 a policy that only correctly encoded unicode filenames should be stored in
 their filesystem, so for them we should not write that name even though we
 can do so by using the POSIX byte-oriented APIs.
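
 As a rough illustration of cases 1, 3 and 4, here is a minimal Python
 sketch of the checks involved.  The function names are hypothetical and
 nothing here is existing Tahoe code.  It assumes the Windows rules from
 the MSDN page cited in item 1, approximates case-insensitive comparison
 with str.casefold() (real filesystems use their own folding tables), and
 uses Python's "surrogateescape" error handler as a stand-in for a
 utf8b-style escaping mechanism.  The Mac rules from item 2 are not
 covered.

```python
import collections

# Characters Windows does not allow in a file name component, plus the
# control characters 0-31.
WINDOWS_ILLEGAL_CHARS = set('<>:"/\\|?*') | {chr(c) for c in range(32)}

# Reserved DOS device names; they stay reserved even with an extension.
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def windows_problems(name):
    """Return a list of reasons why `name` is not a legal Windows name."""
    problems = []
    bad = sorted(set(name) & WINDOWS_ILLEGAL_CHARS)
    if bad:
        problems.append("illegal characters: %r" % bad)
    if name.split(".")[0].upper() in WINDOWS_RESERVED_NAMES:
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    return problems


def case_collisions(names):
    """Group names that differ only by case (approximation via casefold)."""
    groups = collections.defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


def is_undecodable(name):
    """True if `name` carries escaped raw bytes (lone surrogates)."""
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False
```

 For example, windows_problems("nul.txt") reports a reserved device name,
 and case_collisions(["README", "readme", "other"]) returns the single
 group ["README", "readme"].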

 Here are someone else's notes about these sorts of issues:

 http://www.portfoliofaq.com/pfaq/FAQ00352.htm

 See also David A. Wheeler's excellent article arguing that we should start
 being pickier about filenames in POSIX systems:

 http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

 There are various ways Tahoe can deal with this.  It can do something
 about it on the Tahoe -> system B leg of the trip, such as by stopping
 with an error, offering to rename the offending files, etc.  It could
 also do something about it on the system A -> Tahoe leg of the trip.
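
 To make the Tahoe -> system B option concrete, here is one possible
 shape for it, sketched in Python.  Everything in it is an assumption for
 illustration: the names plan_download, portable_name and
 NonPortableFilenameError are hypothetical, the percent-escaping scheme is
 just one way to rename, and only the Windows illegal-character rule and
 case collisions are checked.

```python
# Characters Windows rejects in a file name component (assumed target).
WINDOWS_ILLEGAL_CHARS = set('<>:"/\\|?*')


class NonPortableFilenameError(Exception):
    """Raised when a stored name cannot safely be written to the target."""


def portable_name(name):
    """Replace each illegal or control character with a %XX escape."""
    return "".join(
        "%%%02X" % ord(ch) if ch in WINDOWS_ILLEGAL_CHARS or ord(ch) < 32 else ch
        for ch in name
    )


def plan_download(names, offer_rename=False):
    """Map each stored name to the local name it would be written under.

    Without offer_rename, any illegal name stops the copy with an error.
    Names that would land on the same case-insensitive local name always
    raise, instead of letting one file silently overwrite the other.
    """
    plan = {}
    seen = {}  # case-folded local name -> stored name that claimed it
    for name in names:
        local = portable_name(name)
        if local != name and not offer_rename:
            raise NonPortableFilenameError(
                "%r is not legal on the target system" % name)
        key = local.casefold()
        if key in seen:
            raise NonPortableFilenameError(
                "%r collides with %r on the target system" % (name, seen[key]))
        seen[key] = name
        plan[name] = local
    return plan
```

 With offer_rename=True, plan_download(["a:b.txt", "c.txt"]) maps
 "a:b.txt" to "a%3Ab.txt" and leaves "c.txt" alone.  The returned dict is
 the record of what was renamed, so the escaping itself does not need to
 be reversible.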

 I think in the short term it might be better if Tahoe rejected
 non-portable filenames in the system A -> Tahoe leg of the trip, because we
 don't yet know how we want to handle them.  By rejecting them, we avoid
 the current random-overwrite issue and we don't constrain future versions
 of Tahoe-LAFS as much in terms of what sorts of filenames it has to
 support.  (There ''might'' already be some problematic filenames stored in
 Tahoe and we might want to extend Tahoe to deal with these better in the
 future, but if Tahoe-v1.5 starts rejecting new ones then the problem will
 probably be less widespread and less severe in the future.)

 On the other hand, rejecting them would be a UI/API regression, so we
 would probably want to add a {{{--force-nonportable-filenames}}} option to
 make it behave like Tahoe-v1.4 currently does.
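
 Here is a sketch of how reject-by-default plus an override could look.
 This is not Tahoe's real option parsing: it uses Python's argparse purely
 for illustration, the program name and the names_to_upload and
 is_portable helpers are hypothetical, and the portability test is reduced
 to the Windows illegal-character rule.  Only the option name
 {{{--force-nonportable-filenames}}} comes from the proposal above.

```python
import argparse

# Characters Windows rejects in a file name component.
WINDOWS_ILLEGAL_CHARS = set('<>:"/\\|?*')


def is_portable(name):
    """Simplified check: no character that Windows would reject."""
    return not (set(name) & WINDOWS_ILLEGAL_CHARS)


def build_parser():
    parser = argparse.ArgumentParser(prog="tahoe-cp-sketch")
    parser.add_argument(
        "--force-nonportable-filenames",
        action="store_true",
        help="accept non-portable filenames, as v1.4 does",
    )
    parser.add_argument("names", nargs="+", help="filenames to upload")
    return parser


def names_to_upload(argv):
    """Return the names to upload, refusing non-portable ones by default."""
    args = build_parser().parse_args(argv)
    rejected = [name for name in args.names if not is_portable(name)]
    if rejected and not args.force_nonportable_filenames:
        raise SystemExit(
            "refusing non-portable filenames: %s "
            "(use --force-nonportable-filenames to override)"
            % ", ".join(rejected)
        )
    return args.names
```

 Called with ["a:b"], names_to_upload exits with an error naming the
 offending file; called with ["--force-nonportable-filenames", "a:b"], it
 returns ["a:b"] unchanged.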

 Help!?

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/731>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

