#1437 new defect

Replace os.path (etc.) with twisted.python.filepath

Reported by: arch_o_median Owned by: zancas
Priority: minor Milestone: undecided
Component: code Version: 1.8.2
Keywords: unicode cleanup Cc:
Launchpad Bug:

Description (last modified by zooko)

The filepath file system interface offers a single consistent object oriented interface for the various file system operations we need. Initially we just wanted to know if a file was the descendant of some directory, but this seemingly trivial question is not easily answered with the traditional python file system libraries (e.g. shutil, os.path, the open builtin, ... etc). Moreover these functions are not to be found in a single place, and lack a consistent interface. Additionally we think filepath is unique in offering atomic overwrite, (through setContent). So this ticket is to replace all the older ways of manipulating the filesystem with shiny new FilePath.

Change History (10)

comment:1 Changed at 2011-07-19T04:24:42Z by arch_o_median

  • Description modified (diff)

comment:2 Changed at 2011-07-19T12:30:37Z by zooko

Here is Glyph's advertisement for why you should use filepath:

http://glyph.twistedmatrix.com/2008/02/highlighting-buried-treasure-in-twisted.html

comment:3 Changed at 2011-07-19T16:18:29Z by davidsarah

  • Component changed from unknown to code
  • Keywords unicode cleanup added

When I looked at the code for FilePath recently, it wasn't clear to me whether it properly supported Unicode paths. (It might to the extent that the os.path operations it's using do, but it didn't seem to have been explicitly considered, i.e. if the underlying path can be a Unicode string then that would only work by coincidence.) Unicode path support is critical for Windows; if this is not supported then it would be a regression to switch from unicode strings to FilePath.

comment:4 Changed at 2011-07-19T16:25:33Z by davidsarah

To clarify my previous comment, Windows filesystems -- and the Python APIs -- are capable of representing characters in paths that are not representable as a string in sys.getfilesystemencoding() (which is always "mbcs" on Windows, i.e. the system's "ANSI" encoding). Currently we do handle such paths correctly in most cases (except that a Tahoe-LAFS source tree won't build in a Unicode directory).

comment:5 Changed at 2011-07-26T04:13:49Z by arch_o_median

  • Owner changed from arch_o_median to Zancas

comment:6 Changed at 2011-08-11T04:42:35Z by Zancas

  • Owner changed from Zancas to zancas

comment:7 Changed at 2014-12-19T18:13:24Z by zooko

  • Description modified (diff)

I really want to use filepath. It results in more readable and less error-prone code than using the smorgasbord of open(), os.whatever, shutil.whatever, and so on. I think we should do it. However, there are several issues that the maintainers of filepath are not going to fix anytime soon, or even one critical issue that they refuse to fix in the way that I want it done, so I think we'll need to do it by forking or embedding filepath into our codebase.

The critical issue about which they and I disagree is: what do you do if you go to read a filename from the operating system, and you get a sequence of bytes which cannot be decoded in the nominal encoding? Their answer is: store the bytes, because if you write them back out to the same filesystem later, later users will presumably treat them as having the same encoding, whatever it was, even though our code doesn't know what it was. My answer is: raise an error, because we can't safely transport these bytes out to other systems, or process these bytes, and anyway this is probably a rare condition or even an outright error in the user's system. (That is what we long ago decided to do in Tahoe-LAFS, and we've never had a complaint about it. There are quite extensive discussion threads in old closed tickets and in the tahoe-dev email archive about this design decision.)

That discussion is https://twistedmatrix.com/trac/ticket/5203 . Other tickets are:

comment:8 Changed at 2014-12-20T04:17:12Z by daira

I tried to post this at https://twistedmatrix.com/trac/ticket/5203 but couldn't get past the spam filter:


It doesn't particularly surprise me that this issue has gone for three years without comments, because the Description is quite rambling and doesn't focus on what Zooko wants done. (In fact I'm not clear on what Zooko is asking for either.)

What I would like in order to be able to use FilePath in Tahoe-LAFS is either:

  • for FilePath(unicode_path) to be specified to work, and to return Unicode strings from methods/attributes that currently return path components;
  • or alternatively, a UnicodeFilePath class that works that way.

Currently I think the status is that FilePath(unicode_path) mostly works by coincidence of the stdlib file APIs accepting Unicode paths, but that doesn't fill me with confidence.

Note that representing paths as byte strings can't possibly work correctly in general on Windows. It is not sufficient to only be able to represent paths that have characters in the "ANSI" encoding.

The behaviour around undecodable paths is a secondary issue that we could work around; I wouldn't care if we needed wrapper functions to get the behaviour we wanted there.

Last edited at 2014-12-20T04:17:59Z by daira (previous) (diff)

comment:9 Changed at 2014-12-20T23:05:53Z by daira

Oh, it seems like what I want is https://twistedmatrix.com/trac/ticket/2366. (If anyone pastes my comment above to https://twistedmatrix.com/trac/ticket/5203, please mention that as well.)

comment:10 Changed at 2015-04-05T15:36:40Z by daira

https://twistedmatrix.com/trac/ticket/7805 seems to address my concerns, at least at the API level. It will be a while before we depend on a version of Twisted that has this, though.

Note: See TracTickets for help on using tickets.