Version 30 (modified by chadwhitacre, at 2020-10-06T11:18:04Z)

--

Porting to Python 3

Motivation

  • Make code behave the same on Python 2 and Python 3 wherever possible, so that e.g. map() is lazy on both.
  • Reduce errors by relying on Python 2 behavior and tests as well as manual review.
  • Try to reduce grunt work.

How to set up your development environment

We use tox to standardize environments across developers and CI.

  1. Install tox (globally, probably; consider pipx).
  2. In your Tahoe-LAFS working copy, run tox -e py36 --notest to bootstrap the py36 virtualenv.
  3. Activate the environment with source .tox/py36/bin/activate or equivalent.
  4. Wire up for local dev with pip install -e .
  5. Run trial allmydata.test.test_python3 as a smoke test.
  6. Options for exercising the whole suite of ported tests (NB: test_python3 != python3_tests):
    1. trial allmydata.test.python3_tests
    2. python -m allmydata.test.python3_tests
    3. deactivate the virtualenv (or switch shells) and run tox -e py36

Worklist

Submodule† Status Assignee
__init__ todo
__main__ todo
_auto_deps todo
_version todo
blacklist todo
check_results todo
client todo
codec done
control todo
crypto done
deep_stats todo
dirnode todo
frontends todo
hashtree done
history todo
immutable doing Chad
interfaces done
introducer doing
monitor done
mutable todo
node doing Ross
nodemaker todo
scripts todo
stats todo
storage doing Itamar
storage_client todo
test doing
testing todo
unknown todo
uri done
util doing
version_checks todo
web todo
webish todo
windows todo

† of allmydata

‡ Expect spaghetti (see below).

The porting process, big picture

For a module M, there is also a corresponding module T containing the unit tests for M. If the tests for M are embedded in a module that tests multiple modules, step one is to split them off so that there is a T that tests only M.

Then:

  1. Update T to run on both 2+3 (see below for what that looks like).
  2. Run T's tests on Python 2. They should still pass! If they don’t, something broke.
  3. Port the code module M.
  4. Now run T's tests on Python 3.
  5. Fix any problems caught by the tests.
  6. Add both M and T to allmydata/util/_python3.py.
  7. Run tox -e py36 (or equivalent) and verify that the module you ported is included and passing.
  8. Submit for code review.
  9. Check the coverage report. If there are uncovered lines, see if you can add tests, or at least file a separate ticket for adding coverage.
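Step 6 above amounts to registering two module names. A minimal sketch, assuming allmydata/util/_python3.py keeps two plain lists; the list names PORTED_MODULES and PORTED_TEST_MODULES, the entries, and the is_registered helper are all assumptions for illustration, so check the real file before copying anything:

```python
# Hypothetical shape of allmydata/util/_python3.py.
# The list names and entries below are assumed, not taken from this page.
PORTED_MODULES = [
    "allmydata.hashtree",
    "allmydata.util.base32",
]

PORTED_TEST_MODULES = [
    "allmydata.test.test_hashtree",
    "allmydata.test.test_base32",
]


def is_registered(module, test_module):
    """Return True if both M and T appear in the ported lists."""
    return module in PORTED_MODULES and test_module in PORTED_TEST_MODULES
```

A check like is_registered("allmydata.hashtree", "allmydata.test.test_hashtree") is one quick way to confirm step 6 before running tox in step 7.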

When ports get harder due to spaghetti dependencies

As the port progresses, the simple "port a module plus its test module" approach gets difficult, since everything ends up depending on everything else. Here's one way to approach this:

  1. Port only the test module. This involves many Python 3 fixes to lots of other modules, but those modules are not officially ported; they are inched along just far enough to make the tests pass. Since the test module is officially ported, regressions in the Python 3 port are still prevented.
  2. Then, port the corresponding module.

Porting a specific Python file

Zeroth, file a new ticket in milestone "Python 3" and assign it to yourself.

First, add explicit byte or unicode annotations for strings where needed.
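As an illustration of what such annotations look like, here is a small sketch. The names SHARE_HEADER, GREETING, and describe are hypothetical and not taken from the Tahoe-LAFS code base:

```python
# An unprefixed "..." literal is bytes on Python 2 but text on Python 3,
# so each literal gets an explicit prefix stating what it is meant to be.

SHARE_HEADER = b"\x00\x00\x00\x01"  # binary data: bytes on both versions
GREETING = u"hello"                 # human-readable text: unicode on both


def describe(name):
    """Build a text message, decoding at the boundary if given bytes."""
    if isinstance(name, bytes):
        name = name.decode("utf-8")
    return GREETING + u", " + name
```

With the prefixes in place, futurize and later reviewers no longer have to guess which kind of string was intended.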

Second, run futurize --write --both-stages --all-imports path/to/file.py.

Third, fix the imports (TODO: this can probably be automated).

Delete this bit:

from future import standard_library
standard_library.install_aliases()

And replace the from builtins import * variant, if any, with:

from future.utils import PY2
if PY2:
    from future.builtins import filter, map, zip, ascii, chr, hex, input, next, oct, open, pow, round, super, bytes, dict, list, object, range, str, max, min  # noqa: F401

This adds builtins that match Python 3's semantics. The # noqa: F401 comment keeps flake8/pyflakes from complaining about unused imports. We import everything, used or not, so that people changing the code later don't have to manually check whether map() is old style or new style.

Fourth, manually review the code. Futurize is nice, but it doesn't catch everything, and it sometimes makes wrong decisions.

In particular:

  • map(), filter(), etc. are now lazy.
  • dict.keys() and friends now return a view of the underlying data, rather than a list with a copy.
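Both points are easy to trip over during review. The following sketch shows the behavior on Python 3; the variable names are illustrative only:

```python
# map() returns a one-shot iterator on Python 3, not a list.
numbers = [1, 2, 3]
doubled = map(lambda n: n * 2, numbers)
first_pass = list(doubled)   # consumes the iterator
second_pass = list(doubled)  # nothing left to consume

# dict.keys() returns a live view, so later changes to the dict show up in it.
settings = {"a": 1}
keys_view = settings.keys()            # view: tracks the dict
keys_snapshot = list(settings.keys())  # explicit copy: frozen at this point
settings["b"] = 2
```

Code that iterates a map() result twice, or that mutates a dict while looping over keys(), needs an explicit list(...) around the call to keep its Python 2 behavior.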

Fifth, add a note to the module docstring saying it was ported to Python 3.

Sixth, open a PR with the Python 3 Port label.

Known issues with future

The from builtins import <every builtin ever> thing gives a decent Python 3 layer for Python 2. For example it'll automatically create __nonzero__ to wrap a __bool__.
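To make the __nonzero__ example concrete, here is a hand-written sketch of roughly the effect the future object base class provides. The Bucket class is hypothetical, and the explicit alias stands in for what future arranges automatically on Python 2:

```python
class Bucket(object):
    """Container that is truthy only when it holds at least one item."""

    def __init__(self, items):
        self.items = list(items)

    def __bool__(self):
        # Python 3 consults __bool__ for truth testing.
        return len(self.items) > 0

    # Python 2 consults __nonzero__ instead; this alias is the manual
    # equivalent of the wrapper described above.
    __nonzero__ = __bool__
```

With the future builtins imported, the alias line is unnecessary because the replacement object supplies it.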

But there are caveats.

One of them is the bytes objects:

  1. builtins.bytes.translate and builtins.bytes.maketrans are buggy on PyPy. One way to fix this is with an if PY2: translate = string.translate else: translate = bytes.translate.
  2. Formatting with b"%s" % some_bytes_object works fine if both objects are Future builtins.bytes, or both objects are native Python 2 strings/bytes, but not if you combine them. This has caused bugs. One way to fix this is by exposing only native byte strings for now; see e.g. allmydata.util.base32.
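The translate workaround from item 1 can be sketched as follows. To stay self-contained, this sketch derives PY2 from sys.version_info as a stand-in for future.utils.PY2, and it applies the same switch to maketrans, which is an assumption beyond what the item spells out:

```python
import sys

# Stand-in for `from future.utils import PY2`, so the sketch runs without future.
PY2 = sys.version_info[0] == 2

if PY2:
    import string
    # Native Python 2 functions operating on native byte strings.
    translate = string.translate
    maketrans = string.maketrans
else:
    # On Python 3 the native bytes methods are used directly.
    translate = bytes.translate
    maketrans = bytes.maketrans

# Usage: both helpers take and return native byte strings.
table = maketrans(b"abc", b"xyz")
result = translate(b"aabbcc", table)
```

Callers then use translate(data, table) everywhere instead of data.translate(table), so the Future bytes implementation is never involved.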

Don't leak Future objects

Leaking Future objects (new ints, new dicts, new bytes) through a module's API can break existing code on Python 2, so be careful not to do that. For that reason int isn't in the suggested from builtins import ... list above.
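One possible way to guard an API boundary is the native() helper from future.utils, which returns the native Python 2 equivalent of a Future object and is a no-op on Python 3. The sketch below is an illustration, not the project's established practice; the function make_token is hypothetical, and the ImportError fallback assumes that an environment without future is running Python 3:

```python
try:
    from future.utils import native
except ImportError:
    # Assumption for this sketch: no future installed means Python 3,
    # where native() would return its argument unchanged anyway.
    def native(obj):
        return obj


def make_token(prefix, body):
    """Hypothetical public function that returns only native bytes."""
    # Internally this may be a Future bytes object on Python 2, if the
    # module imported bytes from future.builtins.
    token = bytes(prefix) + b"-" + bytes(body)
    # Convert at the boundary so callers never see a Future type.
    return native(token)
```

Converting once at the return statement keeps the rest of the module free to use Python 3 semantics internally.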