#3581 closed defect (fixed)

Remove argv unicode "mangling" complexity

Reported by: exarkun Owned by: GitHub <noreply@…>
Priority: normal Milestone: undecided
Component: unknown Version: n/a
Keywords: Cc:
Launchpad Bug:


Long ago, setup.py was arranged to generate a "tahoe-script" from a template

The template was a Python program that did not actually run Tahoe-LAFS code itself but instead launched a child process with the `subprocess` module where Tahoe-LAFS code actually ran.

Some time afterwards, the tahoe-script template was altered to better account for some quirk of Unicode on Windows (note particularly mangle). It began to apply a slightly lossy transformation to each argv value. At essentially the same time, a helper to perform the same encoding (note particularly unicode_to_argv. Note that while comments near the mangling code suggest the unmangling is done in src/allmydata/scripts/runner.py, unmangling actually appears to be implemented in https://github.com/tahoe-lafs/tahoe-lafs/commit/9d04b2a317c2ecf4a8138cca93b66d043ad79a6a#diff-7256465b4351f3419c706d268e37f02b581719b6ca510794b420581fc5ec3786 alongside a raft of Windows-specific unicode handling logic.

Fast-forward to present day.

There is no tahoe-script.template anymore. There is no Python wrapper that uses subprocess to launch Tahoe at all. Use of unicode_to_argv has infested more and more of the codebase and it's unclear whether the unmangling ever happens (it only happens at most *once* per process, thanks to global state, and it only happens that one time if the windows "fixups" are initialized, and it's not clear they ever are, at least in the test suite - so all of this behavior may just be completely untested).

Is any of this complexity necessary anymore? I can't say for sure yet. However, it is in the way and it is hard to understand. At best, it is all obsolete and we can delete it and greatly simplify our argv and stdio interactions. At worst, we can update it to account for other changes in the codebase, add carefully targeted automated tests with good documentation to explain what it does and demonstrate that it still works, and gain the understanding of the issues it is grappling with sufficient to let us work around it.

Change History (3)

comment:1 Changed at 2021-01-05T01:13:16Z by exarkun

  • Summary changed from Get rid of argv unicode "mangling" to Remove argv unicode "mangling" complexity

comment:3 Changed at 2021-02-12T18:29:46Z by GitHub <noreply@…>

  • Owner set to GitHub <noreply@…>
  • Resolution set to fixed
  • Status changed from new to closed

In 33d566e/trunk:

Merge pull request #965 from LeastAuthority?/3581.unicode_to_argv.1

Remove unicode_to_argv, argv_to_unicode and weird unicode mangling

Fixes: ticket:3581

Note: See TracTickets for help on using tickets.