#1135 closed defect (fixed)

reduce unnecessary escaping in quote_output

Reported by: davidsarah Owned by: davidsarah
Priority: major Milestone: 1.8β
Component: code-frontend-cli Version: 1.7.1
Keywords: usability unicode reviewed Cc:
Launchpad Bug:

Description (last modified by davidsarah)

The quote_output function, which is used to output possibly-Unicode strings to stdout and stderr, does more backslash-escaping than is necessary for the output to be interpreted unambiguously. In particular, it will double any backslashes in the string, which makes Windows paths look ugly.

We can reduce this by making it follow POSIX-like escaping rules: within single quotes, backslashes are not special, and the only printable character that cannot appear is a single quote. If we do have a single quote or control characters in the string, then we use double quotes with backslash-escaping.

Attachments (1)

encodingutil-reduce-quote-escaping.dpatch (10.3 KB) - added by davidsarah at 2010-07-23T08:17:42Z.
util.encodingutil: change quote_output to do less unnecessary escaping, and to use double-quotes more consistently when needed. This version avoids u-escaping for characters that are representable in the output encoding, when double quotes are used; and includes tests. fixes #1135

Download all attachments as: .zip

Change History (23)

comment:1 Changed at 2010-07-22T01:51:32Z by davidsarah

Please hold off reviewing this; the _double_quote function is broken for some inputs.

comment:2 Changed at 2010-07-22T02:00:21Z by davidsarah

  • Owner set to davidsarah
  • Status changed from new to assigned

comment:3 Changed at 2010-07-22T02:01:18Z by davidsarah

  • Description modified (diff)

Changed at 2010-07-23T08:17:42Z by davidsarah

util.encodingutil: change quote_output to do less unnecessary escaping, and to use double-quotes more consistently when needed. This version avoids u-escaping for characters that are representable in the output encoding, when double quotes are used; and includes tests. fixes #1135

comment:4 Changed at 2010-07-23T08:18:29Z by davidsarah

  • Keywords review-needed added

comment:5 Changed at 2010-07-23T08:18:46Z by davidsarah

  • Owner davidsarah deleted
  • Status changed from assigned to new

comment:6 Changed at 2010-07-24T04:49:22Z by zooko

This seems like a utility which is not at all Tahoe-LAFS-specific and could be useful to lots of other Python programmers.

Do you have any ideas about how to make it available to them?

I've been trying to do that sort of thing by publishing the utility in my pyutil library. (See also #47 use pyutil as a separate package and contribute src/allmydata/util/* into pyutil) I'm not sure if this has helped anyone! Nobody uses pyutil as far as I am aware. (Hm, although http://pypi.python.org/pypi/pyutil says that it has been download more than a thousand times. I wonder.) Recently when I added a new feature to pyutil that I wanted people to be aware of I also made a new separate package named jsonutil and separately uploaded it to pypi and announced it on the python mailing list: http://pypi.python.org/pypi/jsonutil .

Maybe the Right thing to do for this is to package up everything into pyutil (per #47) and then to open requests of the Python upstream maintainers that they consider some (or all) of those functions as additions to the Python Standard Library.

Over the years (almost ten years, now) of my maintaining pyutil, I've seen many of the features of pyutil get added as standard features of the Python Standard Library, but in every case because someone else independently discovered this idea, independently implemented it, and pushed it into the Python Standard Library. I guess if we want the Python folks to notice features like this in pyutil we have to actually tell them. :-)

comment:7 Changed at 2010-07-31T00:40:03Z by davidsarah

For reviewers: the test_quote_output_mock method (which tests quote_output with the default output encoding) has been simplified on the ticket1074 branch to not use mocks:

    def test_quote_output_default(self):
        encodingutil.output_encoding = 'ascii'
        self.test_quote_output_ascii(None)

        encodingutil.output_encoding = 'latin1'
        self.test_quote_output_latin1(None)

        encodingutil.output_encoding = 'utf-8'
        self.test_quote_output_utf8(None)

comment:8 Changed at 2010-07-31T00:46:23Z by davidsarah

It's probably easier to review this code as it is on the ticket1074 branch (tests here), rather than the patch.

comment:9 follow-up: Changed at 2010-08-01T20:54:29Z by david-sarah@…

  • Owner set to david-sarah@…
  • Resolution set to fixed
  • Status changed from new to closed

In [4600/ticket798]:

util.encodingutil: change quote_output to do less unnecessary escaping, and to use double-quotes more consistently when needed. This version avoids u-escaping for characters that are representable in the output encoding, when double quotes are used, and includes tests. fixes #1135

comment:10 in reply to: ↑ 9 Changed at 2010-08-02T19:34:15Z by davidsarah

  • Resolution fixed deleted
  • Status changed from closed to reopened

Replying to david-sarah@…:

In [4600/ticket798]:

util.encodingutil: change quote_output to do less unnecessary escaping, and to use double-quotes more consistently when needed. This version avoids u-escaping for characters that are representable in the output encoding, when double quotes are used, and includes tests. fixes #1135

This ticket should not have been auto-closed yet, because the commit was on a branch.

comment:11 Changed at 2010-08-07T21:37:27Z by davidsarah

  • Owner changed from david-sarah@… to somebody
  • Status changed from reopened to new

Applied to trunk in 28e6ad51a794067f. Still could do with review.

comment:12 Changed at 2010-12-30T23:43:48Z by davidsarah

For the benefit of reviewers, the POSIX shell escaping that this is approximately trying to emulate is specified here, except that that spec doesn't discuss Unicode. Generally when a Unicode character can be represented in the output encoding as itself, we do so, otherwise we use an \x, \u or \U escape (whichever is shortest). Interesting questions are:

  • is this a sensible way of escaping strings?
  • does it have any gratuitous inconsistencies with POSIX, C99, Python, or other escaping conventions that users might be used to, that could easily be fixed?
  • does the implementation correspond to the comment?
  • are there important cases missing in the tests?
Last edited at 2010-12-30T23:47:13Z by davidsarah (previous) (diff)

comment:13 Changed at 2011-01-08T17:10:26Z by Zarutian

  • Keywords review-needed removed
  • Owner changed from somebody to davidsarah

Minimalizing, or omitting, the quoteing of unicode characters that can be represented in the output encoding is pretty neat.

However regarding backslash quoting behaviour: wont this break other stuff due to things such as "C:\new folder\fav_pet\targh42" being interpreted as "C:" followed by a newline, followed by "ew folder", followed by formfeed, followed by "av_pet", followed by tab and finally ending with "argh42"

I think doubling the backslashes in Windows pathnames is unavoidable.

comment:14 Changed at 2011-01-08T17:26:07Z by zooko

Perhaps we (probably David-Sarah) could add documentation (in the form of doc file, docstring, or comment) to the patch addressing Zarutian's concern.

comment:15 Changed at 2011-01-09T01:34:21Z by david-sarah@…

In 6ce3ec6d0d363e68:

docs/frontends/CLI.rst: discuss commandline/output quoting issues and wildcards. refs #1135

comment:16 follow-up: Changed at 2011-01-09T01:35:57Z by davidsarah

Replying to Zarutian:

Minimalizing, or omitting, the quoteing of unicode characters that can be represented in the output encoding is pretty neat.

However regarding backslash quoting behaviour: wont this break other stuff due to things such as "C:\new folder\fav_pet\targh42" being interpreted as "C:" followed by a newline, followed by "ew folder", followed by formfeed, followed by "av_pet", followed by tab and finally ending with "argh42"

No, because single-quote escaping single-quoting doesn't give a special meaning to backslashes.

This is described in the comment at src/allmydata/util/encodingutil.py@4616#L195, but not in any end-user documentation. I've now added some in docs/frontends/CLI.rst.

Last edited at 2011-01-09T02:04:02Z by davidsarah (previous) (diff)

comment:17 Changed at 2011-01-09T02:05:09Z by davidsarah

  • Keywords review-needed added
  • Owner changed from davidsarah to zarutian

comment:18 Changed at 2011-01-15T00:07:33Z by zooko

  • Owner changed from zarutian to Zarutian

comment:19 in reply to: ↑ 16 ; follow-up: Changed at 2011-01-15T00:39:10Z by Zarutian

  • Keywords reviewed added; review-needed removed
  • Owner changed from Zarutian to davidsarah

Replying to davidsarah:

-snip-

No, because single-quote escaping single-quoting doesn't give a special meaning to backslashes.

This is described in the comment at src/allmydata/util/encodingutil.py@4616#L195, but not in any end-user documentation. I've now added some in docs/frontends/CLI.rst.

I say that having consistant handling of backslashes and having to deal with some doubling of them is preferable to other issues such as subtle modal context changes. Because I know in the future it will happen that some poor sod will miss the subtle distinction between single- and double-quotes in the quoted output. (I know I have, in the past elsewhere)

However if you want to use single quotes to turn of backslash escaping when it might not be warrented then feel free to do so. (But dont blame me when the poor sod comes along ;)

comment:20 in reply to: ↑ 19 Changed at 2011-01-15T01:10:12Z by davidsarah

Replying to Zarutian:

... However if you want to use single quotes to turn of backslash escaping when it might not be warrented then feel free to do so. (But dont blame me when the poor sod comes along ;)

I guess that's a -0 on this change (which was made in 1.8) :-/. FWIW, I think that casual users will probably not notice that the single-quoted strings are quoted at all, and so they won't be confused by them being different to double-quoted strings.

(Previous to quote_output being added 80252debcd94fc28, in many cases we just had names being printed to stdout/err using "... '%s' ..." % (name,), which was definitely wrong.)

I think we can probably close this ticket now, unless anyone wants to do a more technical code review.

comment:21 Changed at 2011-01-16T17:41:57Z by Zarutian

A quick note: I cant see any obvious bugs (such as off by one, fence posting or logic ones) in this patch so I think it is most propably okay.

(Please note that I do not know Python and have to infer what I have seen in other similiar languages both syntactically and semantically.)

Last edited at 2011-01-16T17:42:52Z by Zarutian (previous) (diff)

comment:22 Changed at 2011-01-22T04:50:49Z by davidsarah

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.