change policies on implicit coercion to boolean and -O

[Imported from Trac: page CodingStandards, version 23]
daira 2013-05-23 17:59:30 +00:00
parent b721328d47
commit c30f5295cb

@ -43,9 +43,7 @@ from allmydata.util.assertutil import _assert, precondition, postcondition
### truths and falsehoods ### truths and falsehoods
* Don't use the literals `True` or `False` in conditional expressions -- instead just write the expression which will evaluate to true or false. For example, write `if expr:` instead of `if expr == True:` and `if not expr:` instead of `if expr == False:`. * Don't use the literals `True` or `False` in conditional expressions -- instead just write the expression which will evaluate to true or false. For example, write `if expr:` instead of `if expr == True:` and `if not expr:` instead of `if expr == False:`.
* *Disputed*: Use the fact that empty sequences, empty strings, empty dicts, `0`, and `None` all evaluate to false. Write `if not items:` instead of `if len(items) == 0:`. * Avoid relying on the fact that empty sequences, empty strings, empty dicts, `0`, and `None` are treated as false. Write `if len(items) == 0:`, `if thing is None:`, etc.
* I disagree with relying on implicit conversion to boolean; I think it's error-prone (and we have had real bugs because of it). -- David-Sarah
* But if your intent is to test for `None` instead of to test for "any false thing", then write it out as `if thing is None:`.
## advanced idioms ## advanced idioms
@ -77,6 +75,12 @@ AssertionError: precondition: emLen is required to be big enough. -- emLen: 20 <
The "error message" that will accompany a failed expression should be a statement of what is required for correct operation. Don't write something like "Spam isn't firm.", because that is ambiguous: the error could be that the spam is supposed to be firm and it isn't, or the error could be that spam isn't supposed to be firm and it is! The same ambiguity can apply to the sentence "Spam must be firm.". It helps to use the words "required to" in your message, for example "Spam is required to be firm.". The "error message" that will accompany a failed expression should be a statement of what is required for correct operation. Don't write something like "Spam isn't firm.", because that is ambiguous: the error could be that the spam is supposed to be firm and it isn't, or the error could be that spam isn't supposed to be firm and it is! The same ambiguity can apply to the sentence "Spam must be firm.". It helps to use the words "required to" in your message, for example "Spam is required to be firm.".
Assertions are not a substitute for proper error handling! An assertion, precondition or postcondition should only be used for cases that "cannot happen" unless the code is incorrect. They should not be used to check for invalid inputs; in that case, raise an exception instead.
==== avoid "bare assert" ====
Python's built-in `assert` statement, unlike `allmydata.util.assertutil._assert`, can be switched off by the `-O` option to `python` or the `PYTHONOPTIMIZE` environment variable. Although this might sound useful to reduce the overhead of assertions, in practice that overhead is negligable, and conditional assertions are more trouble than they're worth (partly because they create a configuration that is mostly untested). We are in the process of removing all bare asserts from the codebase (#1968).
==== class invariants ==== ==== class invariants ====
If your class has internal state which is complicated enough that a bug in the class's implementation could lead to garbled internal state, then you should have a class invariant. A class invariant is a method like this (an actual example from !BlockWrangler, but truncated for space): If your class has internal state which is complicated enough that a bug in the class's implementation could lead to garbled internal state, then you should have a class invariant. A class invariant is a method like this (an actual example from !BlockWrangler, but truncated for space):
@ -97,7 +101,7 @@ def _assert_invariants(self):
Now you can put `assert self._assert_invariants()` everywhere in your class where the class ought to be in an internally consistent state. For example, at the beginning of every externally-callable method. This technique can be very valuable in developing a complex class -- it catches bugs early, it isolates bugs into specific code paths, and it clarifies the internal structure of the class so that other developers can hack on it without subtle misunderstandings. Now you can put `assert self._assert_invariants()` everywhere in your class where the class ought to be in an internally consistent state. For example, at the beginning of every externally-callable method. This technique can be very valuable in developing a complex class -- it catches bugs early, it isolates bugs into specific code paths, and it clarifies the internal structure of the class so that other developers can hack on it without subtle misunderstandings.
* we actually appear to only have one instance of this pattern in Tahoe at time of writing, in `allmydata.util.dictutil`. It has the disadvantage of cluttering up the logic with calls to `_assert_invariants`, and should probably be used sparingly. -- David-Sarah * we actually appear to only have one instance of this pattern in Tahoe at time of writing, in `allmydata.util.dictutil`. It has the disadvantage of cluttering up the logic with calls to `_assert_invariants`, and should probably be used sparingly. -- Daira
==== assertion policy ==== ==== assertion policy ====
@ -117,47 +121,29 @@ the code and decides that internal checks are no longer necessary, it may be
useful to retain the external checks to isolate usage problems that exist in useful to retain the external checks to isolate usage problems that exist in
callers. callers.
* The general rule is that nodes must be functional for light traffic even * The general rule is that assertions must not prevent nodes from being functional,
when the assertions are turned on. When assertions are turned off (-O), even for heavy traffic.
nodes must be functional for heavy traffic.
* Time-consuming internal checks: once the code is working properly, * Time-consuming internal checks: once the code is working properly, consider
consider removing them, but they may be left in place as long as they removing them, or making them conditional on a module-level DEBUG constant.
use `assert` (the form which gets turned off when -O is used).
* Cheap internal checks: once the code is working properly, consider * Cheap internal checks: keep them even after the code is working properly.
removing them, but it is less of a concern than the time-consuming ones.
If they really are cheap, use `_assert` (the unconditional form
that gets used even with -O).
* Time-consuming external checks: maybe leave them in place, but always * External checks: keep them if they are necessary for security or robustness
use `assert` so they will not be used with -O. (but probably by raising an exception rather than as an assertion). If they
are not necessary for security or robustness, treat in the same way as internal
* Cheap external checks: leave them in place, using the unconditional checks.
`_assert`
* Production grids could run with -O (in practice, the allmydata.com production grid runs without -O, because there are no expensive checks in the current codebase).
* Testing grids might run without -O in order to detect more bugs.
* Local developer tests will probably not use -O, and developers should be
prepared to experience the same CPU load problems if they subject their
nodes to real traffic levels. Developers can use -O to turn off everyone
else's checks, use `_assert` on their own code to enable their own
assertions, and then subject their nodes to heavy traffic, as long as they
are sure to change their checks to use `assert` (or remove them
altogether) before committing.
### configuration === configuration ===
#### minimizing configuration ==== minimizing configuration ====
* Do not implement configuration files for modules or libraries -- code that is going to be used by other code. Only applications -- code that is going to be used by humans -- have configuration files. Modules and libraries get "configured" by the code that calls them, for example by passing arguments to their constructors. * Do not implement configuration files for modules or libraries -- code that is going to be used by other code. Only applications -- code that is going to be used by humans -- have configuration files. Modules and libraries get "configured" by the code that calls them, for example by passing arguments to their constructors.
* If there are constant values which end-users do not need to modify, then do not make them configurable, but put them in all-caps variables at the beginning of the Python file in which they are used. * If there are constant values which end-users do not need to modify, then do not make them configurable, but put them in all-caps variables at the beginning of the Python file in which they are used.
* Design algorithms so that they have as few "voodoo constants" and "tweakable parameters" as possible. * Design algorithms so that they have as few "voodoo constants" and "tweakable parameters" as possible.
#### how to implement configuration ==== how to implement configuration ====
Whether in application code or in library code, never pass configuration values via a configuration object. Instead use Python parameters. For example -- here's another real-life example -- do not write Whether in application code or in library code, never pass configuration values via a configuration object. Instead use Python parameters. For example -- here's another real-life example -- do not write