diff --git a/CHANGES.rst b/CHANGES.rst index 0cee1a09..7a869e09 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -5,7 +5,8 @@ 5.1.3 (unreleased) ================== -- Nothing changed yet. +- Add documentation section ``Persistency and Equality`` + (`#218 `_). 5.1.2 (2020-10-01) diff --git a/docs/README.rst b/docs/README.rst index 1b4fd4ec..bfb7712e 100644 --- a/docs/README.rst +++ b/docs/README.rst @@ -1,6 +1,6 @@ -========== -Interfaces -========== +============ + Interfaces +============ .. currentmodule:: zope.interface @@ -1046,6 +1046,208 @@ functionality for particular interfaces. how to override functions in interface definitions and why, prior to Python 3.6, the zero-argument version of `super` cannot be used. +.. _global_persistence: + +Persistence, Sorting, Equality and Hashing +========================================== + +.. tip:: For the practical implications of what's discussed below, and + some potential problems, see :ref:`spec_eq_hash`. + +Just like Python classes, interfaces are designed to inexpensively +support persistence using Python's standard :mod:`pickle` module. This +means that one process can send a *reference* to an interface to another +process in the form of a byte string, and that other process can load +that byte string and get the object that is that interface. The processes +may be separated in time (one after the other), in space (running on +different machines) or even be parts of the same process communicating +with itself. + +We can demonstrate this. Observe how small the byte string needed to +capture the reference is. Also note that since this is the same +process, the identical object is found and returned: + +.. doctest:: + + >>> import sys + >>> import pickle + >>> class Foo(object): + ... pass + >>> sys.modules[__name__].Foo = Foo # XXX, see below + >>> pickled_byte_string = pickle.dumps(Foo, 0) + >>> len(pickled_byte_string) + 21 + >>> imported = pickle.loads(pickled_byte_string) + >>> imported == Foo + True + >>> imported is Foo + True + >>> class IFoo(zope.interface.Interface): + ... pass + >>> sys.modules[__name__].IFoo = IFoo # XXX, see below + >>> pickled_byte_string = pickle.dumps(IFoo, 0) + >>> len(pickled_byte_string) + 22 + >>> imported = pickle.loads(pickled_byte_string) + >>> imported is IFoo + True + >>> imported == IFoo + True + +The eagle-eyed reader will have noticed the two funny lines like +``sys.modules[__name__].Foo = Foo``. What's that for? To understand, +we must know a bit about how Python "pickles" (``pickle.dump`` or +``pickle.dumps``) classes or interfaces. + +When Python pickles a class or an interface, it does so as a "global +object" [#global_object]_. Global objects are expected to already +exist (contrast this with pickling a string or an object instance, +which creates a new object in the receiving process) with all their +necessary state information (for classes and interfaces, the state +information would be things like the list of methods and defined +attributes) in the receiving process; the pickled byte string needs +only contain enough data to look up that existing object; this is a +*reference*. Not only does this minimize the amount of data required +to persist such an object, it also facilitates changing the definition +of the object over time: if a class or interface gains or loses +methods or attributes, loading a previously pickled reference will use +the *current definition* of the object. + +The *reference* to a global object that's stored in the byte string +consists only of the object's ``__name__`` and ``__module__``. Before +a global object *obj* is pickled, Python makes sure that the object being +pickled is the same one that can be found at +``getattr(sys.modules[obj.__module__], obj.__name__)``; if there is no +such object, or it refers to a different object, pickling fails. The +two funny lines make sure that holds, no matter how this example is +run (using some doctest runners, it doesn't hold by default, unlike it +normally would). + +We can show some examples of what happens when that condition doesn't +hold. First, what if we change the global object and try to pickle the +old one? + +.. doctest:: + + >>> sys.modules[__name__].Foo = 42 + >>> pickle.dumps(Foo) + Traceback (most recent call last): + ... + _pickle.PicklingError: Can't pickle : it's not the same object as builtins.Foo + +A consequence of this is that only one object of the given name can be +defined and pickled at a time. If we were to try to define a new ``Foo`` +class (remembering that normally the ``sys.modules[__name__].Foo =`` +line is automatic), we still cannot pickle the old one: + +.. doctest:: + + >>> orig_Foo = Foo + >>> class Foo(object): + ... pass + >>> sys.modules[__name__].Foo = Foo # XXX, see below + >>> pickle.dumps(orig_Foo) + Traceback (most recent call last): + ... + _pickle.PicklingError: Can't pickle : it's not the same object as builtins.Foo + +Or what if there simply is no global object? + +.. doctest:: + + >>> del sys.modules[__name__].Foo + >>> pickle.dumps(Foo) + Traceback (most recent call last): + ... + _pickle.PicklingError: Can't pickle : attribute lookup Foo on builtins failed + +Interfaces and classes behave the same in all those ways. + +.. rubric:: What's This Have To Do With Sorting, Equality and Hashing? + +Another important design consideration for interfaces is that they +should be sortable. This permits them to be used, for example, as keys +in a (persistent) `BTree `_. As such, +they define a total ordering, meaning that any given interface can +definitively said to be greater than, less than, or equal to, any +other interface. This relationship must be *stable* and hold the same +across any two processes. + +An object becomes sortable by overriding the equality method +``__eq__`` and at least one of the comparison methods (such as +``__lt__``). + +Classes, on the other hand, are not sortable [#class_sort]_. +Classes can only be tested for equality, and they implement this using +object identity: ``class_a == class_b`` is equivalent to ``class_a is class_b``. + +In addition to being sortable, it's important for interfaces to be +hashable so they can be used as keys in dictionaries or members of +sets. This is done by implementing the ``__hash__`` method [#hashable]_. + +Classes are hashable, and they also implement this based on object +identity, with the equivalent of ``id(class_a)``. + +To be both hashable and sortable, the hash method and the equality and +comparison methods **must** `be consistent with each other +`_. +That is, they must all be based on the same principle. + +Classes use the principle of identity to implement equality and +hashing, but they don't implement sorting because identity isn't a +stable sorting method (it is different in every process). + +Interfaces need to be sortable. In order for all three of hashing, +equality and sorting to be consistent, interfaces implement them using +the same principle as persistence. Interfaces are treated like "global +objects" and sort and hash using the same information a *reference* to +them would: their ``__name__`` and ``__module__``. + +In this way, hashing, equality and sorting are consistent with each +other, and consistent with pickling: + +.. doctest:: + + >>> class IFoo(zope.interface.Interface): + ... pass + >>> sys.modules[__name__].IFoo = IFoo + >>> f1 = IFoo + >>> pickled_f1 = pickle.dumps(f1) + >>> class IFoo(zope.interface.Interface): + ... pass + >>> sys.modules[__name__].IFoo = IFoo + >>> IFoo == f1 + True + >>> unpickled_f1 = pickle.loads(pickled_f1) + >>> unpickled_f1 == IFoo == f1 + True + +This isn't quite the case for classes; note how ``f1`` wasn't equal to +``Foo`` before pickling, but the unpickled value is: + +.. doctest:: + + >>> class Foo(object): + ... pass + >>> sys.modules[__name__].Foo = Foo + >>> f1 = Foo + >>> pickled_f1 = pickle.dumps(Foo) + >>> class Foo(object): + ... pass + >>> sys.modules[__name__].Foo = Foo + >>> f1 == Foo + False + >>> unpickled_f1 = pickle.loads(pickled_f1) + >>> unpickled_f1 == Foo # Surprise! + True + >>> unpickled_f1 == f1 + False + +For more information, and some rare potential pitfalls, see +:ref:`spec_eq_hash`. + +.. rubric:: Footnotes + .. [#create] The main reason we subclass ``Interface`` is to cause the Python class statement to create an interface, rather than a class. @@ -1070,3 +1272,19 @@ functionality for particular interfaces. The interface implementation doesn't enforce this, but maybe it should do some checks. + +.. [#class_sort] In Python 2, classes could be sorted, but the sort + was not stable (it also used the identity principle) + and not useful for persistence; this was considered a + bug that was fixed in Python 3. + +.. [#hashable] In order to be hashable, you must implement both + ``__eq__`` and ``__hash__``. If you only implement + ``__eq__``, Python makes sure the type cannot be + used in a dictionary, set, or with :func:`hash`. In + Python 2, this wasn't the case, and forgetting to + override ``__hash__`` was a constant source of bugs. + +.. [#global_object] From the name of the pickle bytecode operator; it + varies depending on the protocol but always + includes "GLOBAL". diff --git a/docs/api/specifications.rst b/docs/api/specifications.rst index 45cf5e18..8a1f21f7 100644 --- a/docs/api/specifications.rst +++ b/docs/api/specifications.rst @@ -161,11 +161,13 @@ Exmples for :meth:`.Specification.extends`: >>> I2.extends(I2, strict=False) True +.. _spec_eq_hash: + Equality, Hashing, and Comparisons ---------------------------------- Specifications (including their notable subclass `Interface`), are -hashed and compared based solely on their ``__name__`` and +hashed and compared (sorted) based solely on their ``__name__`` and ``__module__``, not including any information about their enclosing scope, if any (e.g., their ``__qualname__``). This means that any two objects created with the same name and module are considered equal and @@ -191,13 +193,22 @@ map to the same value in a dictionary. >>> I1 == orig_I1 == nested_I1 True -Because weak references hash the same as their underlying object, -this can lead to surprising results when weak references are involved, -especially if there are cycles involved or if the garbage collector is -not based on reference counting (e.g., PyPy). For example, if you -redefine an interface named the same as an interface being used in a -``WeakKeyDictionary``, you can get a ``KeyError``, even if you put the -new interface into the dictionary. +Caveats +~~~~~~~ + +While this behaviour works will with :ref:`pickling (persistence) +`, it has some potential downsides to be aware of. + +.. rubric:: Weak References + +The first downside involves weak references. Because weak references +hash the same as their underlying object, this can lead to surprising +results when weak references are involved, especially if there are +cycles involved or if the garbage collector is not based on reference +counting (e.g., PyPy). For example, if you redefine an interface named +the same as an interface being used in a ``WeakKeyDictionary``, you +can get a ``KeyError``, even if you put the new interface into the +dictionary. .. doctest:: @@ -225,6 +236,84 @@ interfaces, you may find surprising ``KeyError`` exceptions. For this reason, it is best to use distinct names for local interfaces within the same test module. +.. rubric:: Providing Dynamic Interfaces + +If you return an interface created inside a function or method, or +otherwise let it escape outside the bounds of that function (such as +by having an object provide it), it's important to be aware that it +will compare and hash equal to *any* other interface defined in that +same module with the same name. This includes interface objects +created by other invocations of that function. + +This can lead to surprising results when querying against those +interfaces. We can demonstrate by creating a module-level interface +with a common name, and checking that it is provided by an object: + +.. doctest:: + + >>> from zope.interface import Interface, alsoProvides, providedBy + >>> class ICommon(Interface): + ... pass + >>> class Obj(object): + ... pass + >>> obj = Obj() + >>> alsoProvides(obj, ICommon) + >>> len(list(providedBy(obj))) + 1 + >>> ICommon.providedBy(obj) + True + +Next, in the same module, we will define a function that dynamically +creates an interface of the same name and adds it to an object. + +.. doctest:: + + >>> def add_interfaces(obj): + ... class ICommon(Interface): + ... pass + ... class I2(Interface): + ... pass + ... alsoProvides(obj, ICommon, I2) + ... return ICommon + ... + >>> dynamic_ICommon = add_interfaces(obj) + +The two instances are *not* identical, but they are equal, and *obj* +provides them both: + +.. doctest:: + + >>> ICommon is dynamic_ICommon + False + >>> ICommon == dynamic_ICommon + True + >>> ICommon.providedBy(obj) + True + >>> dynamic_ICommon.providedBy(obj) + True + +At this point, we've effectively called ``alsoProvides(obj, ICommon, +dynamic_ICommon, I2)``, where the last two interfaces were locally +defined in the function. So checking how many interfaces *obj* now +provides should return three, right? + +.. doctest:: + + >>> len(list(providedBy(obj))) + 2 + +Because ``ICommon == dynamic_ICommon`` due to having the same +``__name__`` and ``__module__``, only one of them is actually provided +by the object, for a total of two provided interfaces. (Exactly which +one is undefined.) Likewise, if we run the same function again, *obj* +will still only provide two interfaces + +.. doctest:: + + >>> _ = add_interfaces(obj) + >>> len(list(providedBy(obj))) + 2 + Interface =========