Changes between Initial Version and Version 1 of CodingStandards


Ignore:
Timestamp:
2009-02-22T02:06:14Z (16 years ago)
Author:
zooko
Comment:

import and lightly fix coding standards from the old amdlib project

Legend:

Unmodified
Added
Removed
Modified
  • CodingStandards

    v1 v1  
     1= coding standards =
     2
     3Here are some Python code style guidelines. We also include official Python guidelines by reference (see the section on "official Python standards" below). This document overrides the official standards whenever there is a conflict.
     4
     5== basic standards ==
     6=== compatibility ===
     7
     8Tahoe requires Python v2.4 or greater. No effort should be made to offer compatibility with versions of Python older than 2.4. Effort should be made to work with the most recent release of Python v2.x, and with every release between v2.4 and the most recent release.
     9
     10=== naming and layout ===
     11
     12 * Use {{{underscore_separated_names}}} for functions, {{{CamelCapNames}}} for classes, {{{alllowercasenames}}} for modules, and {{{ALL_CAPS_NAMES}}} for constants. Use all lower-case variable names (e.g. {{{variable_name}}} or {{{variablename}}}). Prepend a leading underscore to private names.
     13 * Always put parenthesis around tuples. Write {{{(spam, eggs) = fetch_breakfast()}}} instead of {{{spam, eggs = fetch_breakfast()}}}.
     14
     15=== comments, idioms, miscellany, license ===
     16
     17Here is a useful header for starting new Python files:
     18
     19{{{
     20# Copyright (c) 2009 Allmydata, Inc.
     21__copyright__ = "Copyright (c) 2009 Allmydata, Inc."
     22
     23"""
     24doc string describing the module here
     25"""
     26
     27# import Python Standard Library modules here
     28
     29from amdlib.util.assertutil import _assert, precondition, postcondition
     30
     31# import from other libraries, with a blank line between each library
     32
     33# your code here
     34}}}
     35
     36 * Files should begin with a copyright notice then a docstring about that module.
     37
     38=== truths and falsehoods ===
     39
     40 * Always use {{{True}}} or {{{False}}} if that is what you mean -- never write a literal {{{0}}}, {{{1}}}, or {{{None}}} to indicate a boolean value.
     41 * Never use the literals {{{True}}} or {{{False}}} in conditional expressions -- instead just write the expression which will evaluate to true or false. For example, write {{{if expr:}}} instead of {{{if expr == True:}}} and {{{if not expr:}}} instead of {{{if expr == False:}}}.
     42 * Use the fact that empty sequences, empty strings, empty dicts, {{{0}}}, and {{{None}}} all evaluate to false. Write {{{if not items:}}} instead of {{{if len(items) == 0:}}}.
     43 * But if your intent is to test for {{{None}}} instead of to test for "any false thing", then write it out as {{{if thing is None:}}}.
     44
     45== advanced idioms ==
     46
     47=== assertion policy ===
     48
     49One axis of interest is how time-consuming the checks are. Many precondition
     50checks can cause typical runtime to explode to O(n^2^) or O(n^3^), for example
     51{{{SortedList.__contains__}}} called {{{_assert_invariants}}} which took
     52O(n log n) each time, when {{{__contains__}}} ought to be O(log n). A caller who
     53was expecting {{{if b in list}}} to take O(log n) could easily wind up turning
     54their O(n log n) routine into O(n^2^) or worse.
     55
     56Another axis is "who could cause it to fail": some checks are looking only at
     57internal state. For example, if {{{SortedList._assert_invariants}}} fails, it
     58indicates a problem in some {{{SortedList}}} method. Other checks are
     59enforcing the external API, like those which do typechecks on input
     60arguments. Even after the {{{SortedList}}} developer has gained confidence in
     61the code and decides that internal checks are no longer necessary, it may be
     62useful to retain the external checks to isolate usage problems that exist in
     63callers.
     64
     65We decided upon a couple of policies for the future:
     66
     67 * The general rule is that nodes must be functional for light traffic even
     68   when the assertions are turned on. When assertions are turned off (-O),
     69   nodes must be functional for heavy traffic.
     70
     71 * Time-consuming internal checks: once the code is working properly,
     72   consider removing them, but they may be left in place as long as they
     73   use {{{assert}}} (the form which gets turned off when -O is used).
     74
     75 * Cheap internal checks: once the code is working properly, consider
     76   removing them, but it is less of a concern than the time-consuming ones.
     77   If they really are cheap, use {{{_assert}}} (the unconditional form
     78   that gets used even with -O).
     79
     80 * Time-consuming external checks: maybe leave them in place, but always
     81   use {{{assert}}} so they will not be used with -O.
     82
     83 * Cheap external checks: leave them in place, using the unconditional
     84   {{{_assert}}}
     85
     86 * Production grids could run with -O (in practice, the allmydata.com production grid runs without -O, because there are no expensive checks in the current codebase).
     87
     88 * Testing grids might run without -O in order to detect more bugs.
     89
     90 * Local developer tests will probably not use -O, and developers should be
     91   prepared to experience the same CPU load problems if they subject their
     92   nodes to real traffic levels. Developers can use -O to turn off everyone
     93   else's checks, use {{{_assert}}} on their own code to enable their own
     94   assertions, and then subject their nodes to heavy traffic, as long as they
     95   are sure to change their checks to use {{{assert}}} (or remove them
     96   altogether) before committing.
     97
     98
     99=== preconditions and assertions ===
     100
     101==== basic preconditions and assertions ====
     102
     103Make sure you have {{{from allmydata.util.assertutil import _assert, precondition, postcondition}}} in your imports (as shown in the template above). Now design preconditions for your methods and functions, and assert them like this:
     104
     105{{{
     106def oaep(m, emLen, p=""):
     107    precondition(emLen >= (2 * SIZE_OF_UNIQS) + 1, "emLen is required to be big enough.", emLen=emLen, SIZE_OF_UNIQS=SIZE_OF_UNIQS)
     108    ...
     109}}}
     110
     111Notice how you pass in any values that ought to be printed out in the error message if the assertion fails. In the example, we pass the values {{{emLen}}} and {{{SIZE_OF_UNIQS}}}. You can pass these as normal args or keyword args. If you use keyword args then the name of the argument will also appear in the error message, which can be helpful. For example, if the assertion above fails, then a debug message will appear at the end of the stack trace, like this:
     112
     113{{{
     114>>> oaep("some secret thingie", 20)
     115Traceback (most recent call last):
     116  File "<stdin>", line 1, in ?
     117  File "<stdin>", line 2, in oaep
     118  File "/home/zooko/playground/pyutil/pyutil/assertutil.py", line 47, in precondition
     119    raise preconditionfailureexception
     120AssertionError: precondition: emLen is required to be big enough. -- emLen: 20 <type 'int'>, 'SIZE_OF_UNIQS': 20 <type 'int'>
     121}}} 
     122
     123The "error message" that will accompany a failed expression should be a statement of what is required for correct operation. Don't write something like "Spam isn't firm.", because that is ambiguous: the error could be that the spam is supposed to be firm and it isn't, or the error could be that spam isn't supposed to be firm and it is! The same ambiguity can apply to the sentence "Spam must be firm.".  It helps to use the words "required to" in your message, for example "Spam is required to be firm.".
     124
     125==== class invariants ====
     126
     127If your class has internal state which is complicated enough that a bug in the class's implementation could lead to garbled internal state, then you should have a class invariant. A class invariant is a method like this (an actual example from BlockWrangler, but truncated for space):
     128
     129{{{
     130def _assert_consistency(self):
     131    # All of the keys in all of these dicts are required to be ids.
     132    for d in (self.bId2chunkobj, self.bId2peers, self.Idsofwantedblocks, self.Idsoflocatedblocks,):
     133        _assert(not [key for key in d.keys() if not idlib.is_id(key)], "All of the keys in these dicts are required to be ids.", listofnonIds=[key for key in d.keys() if not idlib.is_id(key)])
     134
     135    # For each (peer, blockId,) tuple in peerclaimedblock, if the peer *has*
     136    # claimed the block, then the blockId is required to appear in bId2peers[blockId],
     137    # and if the peer has claimed *not* to have the block then the blockId
     138    # is required *not* to appear in bId2peers[blockId].
     139    for ((peer, blockId,), claim,) in self.peerclaimedblock.items():
     140        _assert((claim == "yes") == (peer in self.bId2peers.get(blockId, ())), "The blockId must appear in bId2peers if and only if the peer has claimed the block.", claim=claim, peer=peer, bId2peersentry=self.bId2peers.get(blockId, ()))
     141}}}     
     142
     143Now you can put {{{assert self._assert_consistency()}}} everywhere in your class where the class ought to be in an internally consistent state. For example, at the beginning of every externally-callable method. This technique can be very valuable in developing a complex class -- it catches bugs early, it isolates bugs into specific code paths, and it clarifies the internal structure of the class so that other developers can hack on it without subtle misunderstandings.
     144
     145=== configuration ===
     146
     147==== minimizing configuration ====
     148
     149 * Do not implement configuration files for modules or libraries -- code that is going to be used by other code. Only applications -- code that is going to be used by humans -- have configuration files. Modules and libraries get "configured" by the code that calls them, for example by passing arguments to their constructors.
     150 * If there are constant values which end-users do not need to modify, then do not make them configurable, but put them in all-caps variables at the beginning of the Python file in which they are used.
     151 * Design algorithms so that they have as few "voodoo constants" and "tweakable parameters" as possible. If you find yourself needing to add more and more special cases to handle failures of the basic algorithm, this might indicate that the basic algorithm ought to be replaced with one that doesn't have so many edge cases.
     152
     153==== how to implement configuration ====
     154
     155Whether in application code or in library code, never pass configuration values via a configuration object. Instead use Python parameters. For example -- here's another real-life example -- do not write
     156
     157{{{
     158class BlockStore:
     159    def __init__(self, confdict={}, recoverdb=True, name='*unnamed*'):
     160        if confdict.has_key('MAX_MEGABYTES'):
     161            self.maxspace = (2**20) * int(confdict.get('MAX_MEGABYTES'))
     162        else:
     163            self.maxspace = None
     164        self.basepath = os.path.abspath(confdict.get("PATH", ""))
     165        self.maintainertype = confdict.get("MAINTAINER", "rnd").lower()
     166        self.backendtype = confdict.get("BACKEND", "flat").lower()
     167
     168blockstore = BlockStore(confdict)
     169}}}
     170
     171, but instead write
     172
     173{{{
     174class BlockStore:
     175    def __init__(self, maxspace=None, path="", maintainertype="rnd", backendtype="flat", recoverdb=True, name='*unnamed*'):
     176        self.basepath = os.path.abspath(path)
     177        self.maintainertype = maintainertype
     178        self.backendtype = backendtype
     179
     180maxspace = confdict.get('MAX_MEGABYTES')
     181if maxspace is not None:
     182    maxspace = int(maxspace) * 2**20
     183blockstore = BlockStore(maxspace, confdict.get('PATH', ""), confdict.get('MAINTAINER', 'rnd').lower(), confdict.get('BACKEND', 'flat').lower())
     184}}}
     185.
     186
     187== Twisted gotchas ==
     188
     189The standard-library os.popen() function is not compatible with the way Twisted uses SIGCHLD. The usual symptom is that a read() or readlines() on the filehandle returned by
     190os.popen() fails with EINTR "Interrupted System Call". I'm not sure exactly why this happens, and worse yet it only happens intermittently (-warner 06-Dec-2005).
     191
     192Instead of os.popen(), you should use twisted.internet.utils.getProcessOutput(). This function returns a Deferred which fires (with the contents of stdout) when the process
     193completes. This means, of course, that you must attach a callback to handle the output, rather than blocking the entire process while you wait for a synchronous result.
     194
     195== official Python standards ==
     196
     197These are listed in decreasing order of priority, so if a point in one of the latter guidelines contradicts a point in one of the earlier ones, then go with the earlier. The Allmydata-specific guidelines above override all else, of course.
     198
     199=== PEP 290 ===
     200
     201[http://www.python.org/peps/pep-0290.html PEP 290: Code Migration and Modernization]
     202
     203=== PEP 8 ===
     204
     205[http://www.python.org/peps/pep-0008.html PEP 8: Style Guide for Python Code]
     206
     207=== PEP 257 ===
     208
     209[http://www.python.org/peps/pep-0257.html PEP 257: Docstring Conventions]