= Porting to Python 3 = == Motivation == * Make code behave the same on Python 2 and Python 3, insofar as one can, so e.g. `map()` is the same on Python 2 and Python 3 (i.e. lazy). * Reduce errors by relying on Python 2 behavior and tests as well as manual review. * Try to reduce grunt work. == How to set up your development environment == We use [https://tox.readthedocs.io/en/latest/ tox] to standardize environments across developers and CI. 1. Install tox (globally, probably). 2. In your Tahoe-LAFS working copy, run `tox -e py36 --notest` to bootstrap the `py36` virtualenv. 3. Activate the environment with `source .tox/py36/bin/activate` or equivalent and run `trial allmydata.test.test_python3` as a smoke test. 4. `deactivate` the virtualenv (or switch shells) and run `tox -e py36` to exercise the whole suite. == How to choose a module to port == TBD, something involving core abstractions first, then dependency graph topological traversal. At the moment we're focusing on just porting `allmydata.util`, since it's necessary for other packages. == The porting process, big picture == For a module M, there is also a corresponding module T, the unittests for M. If the tests for M are embedded into a module that tests multiple modules, step one is to split off the tests so there's T that only tests M. Then: 1. Update T to run on both 2+3 (see below for what that looks like). 2. Run T's tests on Python 2. They should still pass! If they don’t, something broke. 3. Port the code module M. 4. Now run T's tests on Python 3. 5. Fix any problems caught by the tests. 6. Add both M and T to `allmydata/util/_python3.py`. 7. Run `tox -e py36` (or equivalent) to update the should-be-passing-on-Python-3 tests list at `misc/python3/ratchet-passing` to include the tests in T plus any other newly passing tests, so that future development doesn't regress Python 3 support. 8. Submit for code review. 9. Check coverage report. If there are uncovered lines, see if you can add tests, or at least file a separate ticket for adding coverage. === Porting a specific Python file === **First**, add explicit byte or unicode annotations for strings where needed. **Second**, run `futurize --write --both-stages --all-imports path/to/file.py`. **Third**, fix the imports (TODO this can probably be automated). Delete this bit: {{{ #!python from future import standard_library standard_library.install_aliases() }}} And replace the `from builtins import *` variant, if any, with: {{{ #!python from future.utils import PY2 if PY2: from future.builtins import filter, map, zip, ascii, chr, hex, input, next, oct, open, pow, round, super, bytes, dict, int, list, object, range, str, max, min # noqa: F401 }}} === When things get complicated === In practice, the methodology above is somewhat idealized: a sufficiently important module might have multiple test files, and might not be easily splittable. This is where the test ratchet comes in. The test ratchet ensures that once a specific test is marked as passing in Python 3, it can't stop passing on Python 3. As a result, progress in porting need not involve a module being fully ported in one PR, or all tests being made to pass. Thus, complex modules can be ported over multiple PRs by just increasing the list of passing tests in each PR, and then only marking the module as fully ported in the final PR. This adds builtins that match Python 3's semantics. The `#noqa: F401` keeps flake8/pyflakes from complaining about unused imports. We do unused imports so that people changing code later don't have to manually check if `map()` is old style or new style. **Fourth**, manually review the code. Futureize is nice, but it very definitely doesn't catch everything, or it makes wrong decisions. In particular: * `map()`, `filter()`, etc. are now lazy. * `dict.keys()` and friends now return a view of the underlying data, rather than a list with a copy. **Fifth**, add a note to the module docstring saying it was ported to Python 3. == Known issues with `future` == The `from builtins import ` thing gives a decent Python 3 layer for Python 2. For example it'll automatically create `__nonzero__` to wrap a `__bool__`. But there are caveats. One of them is the `bytes` objects: 1. `builtins.bytes.translate` are `builtins.bytes.maketrans` buggy on PyPy. One way to fix this is with a `if PY2: translate = string.translate else: translate = bytes.translate`. 2. The behavior with `b"%s" % some_bytes_object` works fine if both objects are Future `builtins.bytes`, or both objects are native Python 2 strings/bytes, but not if you combine them. This has caused bugs. One way to fix this is by exposing only native byte strings for now, see e.g. `allmydata.util.base32`. == Don't leak Future objects == Leaking Future objects (newints, new dicts, new bytes) in module API can break existing code on Python 2. So need to be careful not to do that. For that reason int isn't in the suggested list above. == Other notes == If you just want to run the tests from the explicitly ported test modules, you can do `python -m allmydata.util._python3`.