id summary reporter owner description type status priority milestone component version resolution keywords cc launchpad_bug
3581 "Remove argv unicode ""mangling"" complexity" exarkun GitHub "Long ago, `setup.py` was arranged [https://github.com/tahoe-lafs/tahoe-lafs/blob/083795ddd6b286d75e398e0bccf5a550f7a93b48/setup.py#L367 to generate a ""tahoe-script""] from [https://github.com/tahoe-lafs/tahoe-lafs/blob/083795ddd6b286d75e398e0bccf5a550f7a93b48/bin/tahoe-script.template a template]. The template was a Python program that did not run Tahoe-LAFS code itself. Instead, it [https://github.com/tahoe-lafs/tahoe-lafs/blob/083795ddd6b286d75e398e0bccf5a550f7a93b48/bin/tahoe-script.template#L51 launched a child process with the `subprocess` module], and the Tahoe-LAFS code ran in that child. Some time afterwards, [https://github.com/tahoe-lafs/tahoe-lafs/commit/37b07a545f17f8bb76ad024de7cf81f7e81f36de the tahoe-script template was altered to better account for some quirk of Unicode on Windows] (note particularly `mangle`). From then on, it applied a slightly lossy transformation to each argv value. At essentially the same time, [https://github.com/tahoe-lafs/tahoe-lafs/commit/f036dfaa4bfb9f90d6953cd567506cff758841a8 a helper to perform the same encoding] was added (note particularly `unicode_to_argv`). Although comments near the mangling code suggest that the unmangling is done in `src/allmydata/scripts/runner.py`, it actually appears to be implemented in [https://github.com/tahoe-lafs/tahoe-lafs/commit/9d04b2a317c2ecf4a8138cca93b66d043ad79a6a#diff-7256465b4351f3419c706d268e37f02b581719b6ca510794b420581fc5ec3786 this commit], alongside a raft of Windows-specific Unicode handling logic. Fast-forward to the present day. There is no `tahoe-script.template` anymore. There is no Python wrapper that uses `subprocess` to launch Tahoe at all. 
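To make the lossy round trip and the once-per-process limit concrete, here is a minimal Python 3 sketch. The names `mangle_arg`, `unmangle_arg`, and `fixup_argv_once` are hypothetical. The escaping rule is also an illustrative assumption in the spirit of the linked commits, not the actual Tahoe-LAFS code: each character outside 0x20 to 0x7F becomes DEL (0x7F), its hex code point, and a semicolon. The module-level flag stands in for the global state that limits unmangling to a single call per process.

```python
import re

# Hypothetical sketch: these names and this exact escaping rule are
# illustrative assumptions, not Tahoe-LAFS's real mangle/unmangle code.

# Characters outside 0x20-0x7F get escaped. DEL (0x7F) itself is NOT
# escaped, which is what makes this particular scheme slightly lossy.
_NEEDS_ESCAPE = re.compile(r'[^\x20-\x7f]')
_ESCAPED = re.compile(r'\x7f([0-9a-f]+);')

# Stand-in for the global state: unmangling happens at most once per process.
_already_unmangled = False


def mangle_arg(arg):
    # Replace each escaped character with DEL + lowercase hex + ';'.
    return _NEEDS_ESCAPE.sub(lambda m: '\x7f%x;' % ord(m.group(0)), arg)


def unmangle_arg(arg):
    # Turn every DEL + hex + ';' sequence back into the character it encodes.
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), arg)


def fixup_argv_once(argv):
    # Hypothetical stand-in for the Windows fixups. If this is never called,
    # or is called a second time, argv is returned still mangled.
    global _already_unmangled
    if _already_unmangled:
        return argv
    _already_unmangled = True
    return [unmangle_arg(a) for a in argv]


if __name__ == '__main__':
    # A non-ASCII argument survives the round trip.
    assert unmangle_arg(mangle_arg('caf\u00e9')) == 'caf\u00e9'
    # An argument that already contains DEL + hex + ';' does not.
    tricky = '\x7fe9;'
    assert mangle_arg(tricky) == tricky
    assert unmangle_arg(mangle_arg(tricky)) == '\u00e9'
```

Under this assumed rule, DEL is left unescaped. An argument that already contains DEL followed by hex digits and a semicolon therefore comes back altered, which is one way such a scheme can end up slightly lossy. The guard shows the other concern: if `fixup_argv_once` is never called, as may be the case in the test suite, the program sees the escaped text verbatim.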
Use of `unicode_to_argv` has infested more and more of the codebase, and it's unclear whether the unmangling ever happens. It happens at most *once* per process, thanks to global state. It also happens only if the Windows ""fixups"" are initialized, and it's not clear they ever are, at least in the test suite. All of this behavior may therefore be completely untested. Is any of this complexity still necessary? I can't say for sure yet. However, it is in the way and it is hard to understand. At best, it is all obsolete, and we can delete it and greatly simplify our argv and stdio interactions. At worst, we can update it to account for other changes in the codebase and add carefully targeted automated tests, with good documentation, that explain what it does and demonstrate that it still works. That would also give us enough understanding of the issues it grapples with to work around it. " defect closed normal undecided unknown n/a fixed