= coding standards =

Here are some Python code style guidelines. We also include official Python guidelines by reference (see the section on "official Python standards" below). This document overrides the official standards whenever there is a conflict.

== basic standards ==

=== compatibility ===

Tahoe requires Python v2.4.2 or greater. No effort should be made to offer compatibility with versions of Python older than 2.4.2. Effort should be made to work with the most recent release of Python v2.x, and with every release between v2.4.2 and the most recent 2.x.

=== naming and layout ===

 * Use {{{underscore_separated_names}}} for functions, {{{CamelCapNames}}} for classes, {{{alllowercasenames}}} for modules, and {{{ALL_CAPS_NAMES}}} for constants. Use all lower-case variable names (e.g. {{{variable_name}}} or {{{variablename}}}). Prefix private names with an underscore.
 * Put parentheses around tuples if it helps make the code more readable; leave them off if not.

=== comments, idioms, miscellany, license, imports, docstrings, line widths ===

Here is a useful header for starting new Python files:

{{{
# Copyright (c) 2011 The Tahoe-LAFS Software Foundation
# This file is part of Tahoe-LAFS; see doc/about.html for licensing terms.

"""
doc string describing the module here
"""

# import Python Standard Library modules here

from allmydata.util.assertutil import _assert, precondition, postcondition

# import from other libraries, with a blank line between each library

# your code here
}}}

 * Put two blank lines between classes.
 * Put one blank line before a block comment if the preceding line is code at the same indent level (this makes it harder to mistake the code for part of the comment, and makes the comment easier to read).
 * Feel free to ignore the part of PEP-8 that says to put each module import on a separate line, but don't import modules from multiple separate packages on the same line.
 * Ignore the part of PEP-257 which says to put the trailing {{{"""}}} of a multi-line docstring on a separate line separated by a blank line. (That rule appears to have been motivated by a limitation of Emacs which has been fixed.)
 * Ignore the part of PEP-8 which specifies 79- or 72-char line widths. We use 77 columns. In Emacs set {{{fill-column}}} to 77 (e.g. {{{M-x set-variable fill-column 77}}}).
 * PEP 8 says: "If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies)." This is a good rule; note that it also applies to some non-obvious low-priority operators, like '{{{:}}}' for list slicing. (Example: {{{a[b-c : d]}}} good, {{{a[b - c:d]}}} bad. If a slice is from the start or to the end of the array, put the '{{{:}}}' immediately next to the bracket on that side.)

=== truths and falsehoods ===

 * Don't use the literals {{{True}}} or {{{False}}} in conditional expressions -- instead just write the expression which will evaluate to true or false. For example, write {{{if expr:}}} instead of {{{if expr == True:}}} and {{{if not expr:}}} instead of {{{if expr == False:}}}.
 * ''Disputed'': Use the fact that empty sequences, empty strings, empty dicts, {{{0}}}, and {{{None}}} all evaluate to false. Write {{{if not items:}}} instead of {{{if len(items) == 0:}}}.
   * I disagree with relying on implicit conversion to boolean; I think it's error-prone (and we have had real bugs because of it). -- David-Sarah
 * But if your intent is to test for {{{None}}} rather than for "any false thing", then write it out as {{{if thing is None:}}}.

== advanced idioms ==

=== preconditions and assertions ===

==== basic preconditions and assertions ====

Make sure you have {{{from allmydata.util.assertutil import _assert, precondition, postcondition}}} in your imports (as shown in the template above).
Now design preconditions for your methods and functions, and assert them like this:

{{{
def oaep(m, emLen, p=""):
    precondition(emLen >= (2 * SIZE_OF_UNIQS) + 1,
                 "emLen is required to be big enough.",
                 emLen=emLen, SIZE_OF_UNIQS=SIZE_OF_UNIQS)
    ...
}}}

Notice how you pass in any values that ought to be printed out in the error message if the assertion fails. In the example, we pass the values {{{emLen}}} and {{{SIZE_OF_UNIQS}}}. You can pass these as normal args or keyword args. If you use keyword args then the name of the argument will also appear in the error message, which can be helpful. For example, if the assertion above fails, then a debug message will appear at the end of the stack trace, like this:

{{{
>>> oaep("some secret thingie", 20)
Traceback (most recent call last):
  File "", line 1, in ?
  File "", line 2, in oaep
  File "/home/zooko/playground/pyutil/pyutil/assertutil.py", line 47, in precondition
    raise preconditionfailureexception
AssertionError: precondition: emLen is required to be big enough. -- emLen: 20 , 'SIZE_OF_UNIQS': 20
}}}

The "error message" that accompanies a failed assertion should be a statement of what is required for correct operation. Don't write something like "Spam isn't firm.", because that is ambiguous: the error could be that the spam is supposed to be firm and it isn't, or the error could be that spam isn't supposed to be firm and it is! The same ambiguity can apply to the sentence "Spam must be firm.". It helps to use the words "required to" in your message, for example "Spam is required to be firm.".

==== class invariants ====

If your class has internal state which is complicated enough that a bug in the class's implementation could lead to garbled internal state, then you should have a class invariant. A class invariant is a method like this (an actual example from !BlockWrangler, but truncated for space):

{{{
def _assert_invariants(self):
    # All of the keys in all of these dicts are required to be ids.
    for d in (self.bId2chunkobj, self.bId2peers, self.Idsofwantedblocks,
              self.Idsoflocatedblocks,):
        _assert(not [key for key in d.keys() if not idlib.is_id(key)],
                "All of the keys in these dicts are required to be ids.",
                listofnonIds=[key for key in d.keys() if not idlib.is_id(key)])

    # For each (peer, blockId,) tuple in peerclaimedblock, if the peer *has*
    # claimed the block, then the peer is required to appear in
    # bId2peers[blockId], and if the peer has claimed *not* to have the block
    # then the peer is required *not* to appear in bId2peers[blockId].
    for ((peer, blockId,), claim,) in self.peerclaimedblock.items():
        _assert((claim == "yes") == (peer in self.bId2peers.get(blockId, ())),
                "The peer is required to appear in bId2peers[blockId] if and only if the peer has claimed the block.",
                claim=claim, peer=peer,
                bId2peersentry=self.bId2peers.get(blockId, ()))

    # Return a true value so that callers can write
    # "assert self._assert_invariants()".
    return True
}}}

Now you can put {{{assert self._assert_invariants()}}} everywhere in your class where the class ought to be in an internally consistent state, for example at the beginning of every externally-callable method. (For that {{{assert}}} to pass, the method has to return a true value once all of its checks have succeeded.) This technique can be very valuable in developing a complex class -- it catches bugs early, it isolates bugs into specific code paths, and it clarifies the internal structure of the class so that other developers can hack on it without subtle misunderstandings.

 * We actually appear to have only one instance of this pattern in Tahoe at time of writing, in {{{allmydata.util.dictutil}}}. It has the disadvantage of cluttering up the logic with calls to {{{_assert_invariants}}}, and should probably be used sparingly. -- David-Sarah

==== assertion policy ====

One axis of interest is how time-consuming the checks are. Many precondition checks can cause typical runtime to explode to O(n^2^) or O(n^3^). For example, {{{SortedList.__contains__}}} called {{{_assert_invariants}}}, which took O(n log n) each time, when {{{__contains__}}} ought to be O(log n).
A caller who was expecting {{{if b in list}}} to take O(log n) could easily wind up turning their O(n log n) routine into O(n^2^) or worse.

Another axis is "who could cause it to fail": some checks are looking only at internal state. For example, if {{{SortedList._assert_invariants}}} fails, it indicates a problem in some {{{SortedList}}} method. Other checks are enforcing the external API, like those which do typechecks on input arguments. Even after the {{{SortedList}}} developer has gained confidence in the code and decides that internal checks are no longer necessary, it may be useful to retain the external checks to isolate usage problems that exist in callers.

 * The general rule is that nodes must be functional for light traffic even when the assertions are turned on. When assertions are turned off (-O), nodes must be functional for heavy traffic.
 * Time-consuming internal checks: once the code is working properly, consider removing them, but they may be left in place as long as they use {{{assert}}} (the form which gets turned off when -O is used).
 * Cheap internal checks: once the code is working properly, consider removing them, but it is less of a concern than the time-consuming ones. If they really are cheap, use {{{_assert}}} (the unconditional form that gets used even with -O).
 * Time-consuming external checks: maybe leave them in place, but always use {{{assert}}} so they will not be used with -O.
 * Cheap external checks: leave them in place, using the unconditional {{{_assert}}}.
 * Production grids could run with -O (in practice, the allmydata.com production grid runs without -O, because there are no expensive checks in the current codebase).
 * Testing grids might run without -O in order to detect more bugs.
 * Local developer tests will probably not use -O, and developers should be prepared to experience the same CPU load problems if they subject their nodes to real traffic levels. Developers can use -O to turn off everyone else's checks, use {{{_assert}}} on their own code to enable their own assertions, and then subject their nodes to heavy traffic, as long as they are sure to change their checks to use {{{assert}}} (or remove them altogether) before committing.

=== configuration ===

==== minimizing configuration ====

 * Do not implement configuration files for modules or libraries -- code that is going to be used by other code. Only applications -- code that is going to be used by humans -- have configuration files. Modules and libraries get "configured" by the code that calls them, for example by passing arguments to their constructors.
 * If there are constant values which end-users do not need to modify, then do not make them configurable, but put them in all-caps variables at the beginning of the Python file in which they are used.
 * Design algorithms so that they have as few "voodoo constants" and "tweakable parameters" as possible.

==== how to implement configuration ====

Whether in application code or in library code, never pass configuration values via a configuration object. Instead use Python parameters. For example (another real-life one), do not write:

{{{
class BlockStore:
    def __init__(self, confdict={}, recoverdb=True, name='*unnamed*'):
        if confdict.has_key('MAX_MEGABYTES'):
            self.maxspace = (2**20) * int(confdict.get('MAX_MEGABYTES'))
        else:
            self.maxspace = None
        self.basepath = os.path.abspath(confdict.get("PATH", ""))
        self.maintainertype = confdict.get("MAINTAINER", "rnd").lower()
        self.backendtype = confdict.get("BACKEND", "flat").lower()
}}}

Instead, write:

{{{
class BlockStore:
    def __init__(self, maxspace=None, path="", maintainertype="rnd",
                 backendtype="flat", recoverdb=True, name='*unnamed*'):
        self.maxspace = maxspace
        self.basepath = os.path.abspath(path)
        self.maintainertype = maintainertype
        self.backendtype = backendtype
}}}
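
As a rough sketch of how the two configuration rules above fit together (this is not code from Tahoe): only the application layer knows that user-facing settings exist, and it translates them into ordinary keyword arguments before calling the library. The helper {{{make_blockstore}}}, the constant {{{BYTES_PER_MEGABYTE}}}, and the use of a plain dict of strings for the settings are assumptions made for illustration; the {{{BlockStore}}} class is a cut-down stand-in for the one shown above.

{{{
import os

# Hypothetical constant: end-users never need to change it, so it is an
# all-caps name at the top of the file rather than a configuration value.
BYTES_PER_MEGABYTE = 2**20


class BlockStore(object):
    """Library code: configured only through constructor parameters."""
    def __init__(self, maxspace=None, path="", maintainertype="rnd",
                 backendtype="flat"):
        self.maxspace = maxspace
        self.basepath = os.path.abspath(path)
        self.maintainertype = maintainertype
        self.backendtype = backendtype


def make_blockstore(settings):
    """Application code (hypothetical helper): translate user-facing
    settings, given as a dict of strings, into explicit parameters."""
    kwargs = {}
    if "MAX_MEGABYTES" in settings:
        megabytes = int(settings["MAX_MEGABYTES"])
        kwargs["maxspace"] = BYTES_PER_MEGABYTE * megabytes
    if "PATH" in settings:
        kwargs["path"] = settings["PATH"]
    if "MAINTAINER" in settings:
        kwargs["maintainertype"] = settings["MAINTAINER"].lower()
    if "BACKEND" in settings:
        kwargs["backendtype"] = settings["BACKEND"].lower()
    # Settings that are absent fall back to the constructor's defaults.
    return BlockStore(**kwargs)


store = make_blockstore({"MAX_MEGABYTES": "10", "BACKEND": "Flat"})
}}}

With this split, tests and other callers can construct {{{BlockStore(maxspace=1000)}}} directly, without fabricating a configuration object, and the defaults are visible in one place: the constructor signature.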

== official Python standards ==

These are listed in decreasing order of priority, so if a point in one of the later guidelines contradicts a point in one of the earlier ones, then go with the earlier. The Tahoe-LAFS-specific guidelines above override all else, of course.

=== PEP 290 ===

[http://www.python.org/peps/pep-0290.html PEP 290: Code Migration and Modernization]

=== PEP 8 ===

[http://www.python.org/peps/pep-0008.html PEP 8: Style Guide for Python Code]

=== PEP 257 ===

[http://www.python.org/peps/pep-0257.html PEP 257: Docstring Conventions]