[tahoe-lafs-trac-stream] [Tahoe-LAFS] #731: what to do with filenames that are illegal on some systems

Tahoe-LAFS trac at tahoe-lafs.org
Tue Mar 1 15:09:16 UTC 2016


#731: what to do with filenames that are illegal on some systems
------------------------------+------------------------------------------------------
     Reporter:  zooko         |      Owner:
         Type:  defect        |     Status:  new
     Priority:  major         |  Milestone:  eventually
    Component:  code-dirnodes |    Version:  1.4.1
   Resolution:                |   Keywords:  forward-compatibility i18n unicode names
Launchpad Bug:                |
------------------------------+------------------------------------------------------

Description:

 If someone copies a file from system A into Tahoe-LAFS and then later
 someone tries to copy that file from Tahoe-LAFS into system B, then a
 problem could arise if the filename from system A is illegal on system B.
 This can happen in a few ways:

 1.  The filename could be illegal on Windows
 (http://msdn.microsoft.com/en-us/library/aa365247.aspx), and system B
 could be Windows and system A non-Windows.

 2.  The filename could be illegal on Mac
 (http://developer.apple.com/technotes/tn/tn1150table.html), and system B
 could be a Mac.

 3.  The filename could case-collide with another filename in the same
 directory, and system B could be a case-insensitive filesystem.  (Note
 that with Tahoe's current naïve approach, a randomly chosen one of the
 files will overwrite the other if the target system is Windows or
 Macintosh.)

 4.  If we allowed undecodable bytestring filenames from a POSIX system A,
 either by storing bytestring (non-unicode) filenames or by some escaping
 mechanism such as {{{utf8b}}}, then a non-POSIX system B would not be able
 to accept that name (or at least we ''should'' not write that name into
 that system).  Likewise, some users of POSIX have a policy that only
 correctly encoded unicode filenames should be stored in their filesystem,
 so for them we should not write that name even though we could do so by
 using the POSIX byte-oriented APIs.
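
 To make item 1 concrete, here is a minimal Python 3 sketch that checks a
 name against some of the Windows naming rules on the MSDN page cited
 above: forbidden characters, control characters, reserved device names
 (including forms like NUL.txt), and a trailing space or period.  The
 function `windows_name_problems` is hypothetical and not part of
 Tahoe-LAFS, and the checks are deliberately incomplete (no path-length
 limits, nothing from the Mac rules in item 2).

```python
# Hypothetical helper (not a Tahoe-LAFS API); covers only part of the
# Windows naming rules for a single path component.

# Characters the Windows naming rules forbid in a file name component.
_WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

# Reserved DOS device names; also reserved when followed by an extension.
_WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def windows_name_problems(name: str) -> list[str]:
    """Return reasons why `name` is not a legal Windows file name component.

    An empty list only means these (incomplete) checks found nothing.
    """
    problems = []
    # Forbidden punctuation plus control characters 0-31.
    bad_chars = sorted({c for c in name if c in _WINDOWS_FORBIDDEN or ord(c) < 32})
    if bad_chars:
        problems.append(f"forbidden characters: {bad_chars!r}")
    # "NUL.txt" is as reserved as "NUL"; the comparison is case-insensitive.
    stem = name.split(".", 1)[0].upper()
    if stem in _WINDOWS_RESERVED:
        problems.append(f"reserved device name: {stem}")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    return problems
```

 For example, `windows_name_problems("nul.txt")` reports the reserved
 device name even though the name is perfectly legal on POSIX systems.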

 Here are someone else's notes about these sorts of issues:

 http://www.portfoliofaq.com/pfaq/FAQ00352.htm

 See also David A. Wheeler's excellent article arguing that we should start
 being pickier about filenames in POSIX systems:

 http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

 There are various ways Tahoe can deal with this.  It can do something
 about it on the Tahoe -> system B leg of the trip, for example by stopping
 with an error or offering to rename the offending files.  It could also
 do something about it on the system A -> Tahoe leg of the trip.
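
 On the Tahoe -> system B leg, stopping with an error (or offering to
 rename) first requires detecting the collisions from item 3.  The Python
 sketch below groups the entries of one directory by a normalized key.
 `case_collisions` is a hypothetical helper, and the key (NFD
 normalization, roughly what HFS+ stores, followed by `str.casefold()`) is
 only an approximation of what NTFS or HFS+ actually compare, so its
 output is a warning list rather than a guarantee.

```python
import unicodedata
from collections import defaultdict


def case_collisions(names: list[str]) -> list[list[str]]:
    """Group directory entries that would collide on a case-insensitive target.

    Assumption: NFD normalization plus Unicode case folding approximates the
    target filesystem's comparison.  Real filesystems use their own tables,
    so this can over- or under-report collisions.
    """
    groups = defaultdict(list)
    for name in names:
        key = unicodedata.normalize("NFD", name).casefold()
        groups[key].append(name)
    # Only keys shared by two or more names are collisions.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

 A caller could refuse to write any group with two or more members, or
 offer to rename all but one of them, instead of letting one file silently
 overwrite another.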

 I think in the short term it might be better if Tahoe rejected non-
 portable filenames in the system A -> Tahoe leg of the trip, because we
 don't yet know how we want to handle them.  By rejecting them, we avoid
 the current random-overwrite issue and we don't constrain future versions
 of Tahoe-LAFS as much in terms of what sorts of filenames it has to
 support.  (There ''might'' already be some problematic filenames stored in
 Tahoe and we might want to extend Tahoe to deal with these better in the
 future, but if Tahoe-v1.5 starts rejecting new ones then the problem will
 probably be less widespread and less severe in the future.)

 On the other hand, rejecting them would be a UI/API regression, so we
 would probably want to add a {{{--force-nonportable-filenames}}} option to
 make it behave like Tahoe-v1.4 currently does.
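
 The short-term policy proposed above (reject non-portable names on the
 system A -> Tahoe leg, with an override) might look roughly like the
 Python 3 sketch below.  Everything in it is hypothetical:
 `admit_filename`, `NonPortableFilenameError`, and the `force` parameter,
 which stands in for the proposed {{{--force-nonportable-filenames}}}
 option, are illustrations rather than existing Tahoe-LAFS APIs.  The only
 built-in check covers item 4: on POSIX, Python's `surrogateescape` error
 handler turns undecodable bytes into lone surrogates (U+DC80..U+DCFF),
 which the sketch detects; other checks can be passed in via `checks`.

```python
from collections.abc import Callable, Iterable


class NonPortableFilenameError(ValueError):
    """Hypothetical error raised when a name is refused on upload."""


def undecodable_problems(name: str) -> list[str]:
    """Detect bytes that were not valid in the filesystem encoding.

    os.listdir() and os.fsdecode() on POSIX carry such bytes through as
    lone surrogates in U+DC80..U+DCFF (the "surrogateescape" handler).
    """
    if any("\udc80" <= c <= "\udcff" for c in name):
        return ["contains bytes that are not valid in the filesystem encoding"]
    return []


def admit_filename(
    name: str,
    force: bool = False,
    checks: Iterable[Callable[[str], list[str]]] = (undecodable_problems,),
) -> str:
    """Accept `name` for upload, or raise unless `force` is set.

    `force` plays the role of the proposed --force-nonportable-filenames.
    """
    problems = [p for check in checks for p in check(name)]
    if problems and not force:
        raise NonPortableFilenameError(f"{name!r}: {'; '.join(problems)}")
    return name
```

 For instance, `b"caf\xe9".decode("utf-8", "surrogateescape")` yields a
 name that `admit_filename` rejects by default and accepts with
 `force=True`.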

 Help!?

--

Comment (by zooko):

 Here's a good summary of Windows paths:
 https://googleprojectzero.blogspot.co.uk/2016/02/the-definitive-guide-on-win32-to-nt.html

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/731#comment:20>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage

