From 3b99e6d0972af28a1547262ccc3ee5519ed9ff68 Mon Sep 17 00:00:00 2001 From: dieter Date: Fri, 16 Oct 2020 09:28:09 +0200 Subject: [PATCH 1/6] Document persistency and equality --- CHANGES.rst | 3 ++- docs/README.rst | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+), 1 deletion(-) diff --git a/CHANGES.rst b/CHANGES.rst index 0cee1a09..7a869e09 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -5,7 +5,8 @@ 5.1.3 (unreleased) ================== -- Nothing changed yet. +- Add documentation section ``Persistency and Equality`` + (`#218 `_). 5.1.2 (2020-10-01) diff --git a/docs/README.rst b/docs/README.rst index 1b4fd4ec..2bcc6229 100644 --- a/docs/README.rst +++ b/docs/README.rst @@ -1046,6 +1046,43 @@ functionality for particular interfaces. how to override functions in interface definitions and why, prior to Python 3.6, the zero-argument version of `super` cannot be used. + +Persistency and Equality +======================== + +An important design goal has been persistency support for interfaces. +This allows different processes to use the same interface and +to create persistent associations between (persistent) objects +and interfaces. For example, an application can store an object +together with its provided interfaces in a database; later, potentially +in a different invocation, it can search the database for objects +providing a given interface. + +To make an object persistent, it must have a persistent identifier, +PID for short. +It is this PID which identifies the object across different +processes. In the context of interfaces, we want to support +evolution similarly to the evolution of classes: even though +we change the interface (e.g. add a method, change documentation), we +still want to consider it as the same interface. + +Python's main persistency support comes from its ``pickle`` module. +It uses as PID for code objects (such as classes and functions) +the combinations of their ``__module__`` and ``name`` attributes. 
+In analogy, the PID for an interface is defined +as the combination of its +``__module__`` and ``__name__`` giving persistency behavior similar +to classes, including evolution support. + +Unlike classes, interfaces define their +(runtime) equality in terms of their PID, i.e. two interfaces are +considered equal if they have equal PIDs. Especially, +two interfaces defined in the same module are considered equal +if they have the same name, even if they are otherwise completely +unrelated. In rare cases, this may lead to surprises - especially, +when interfaces are defined dynamically (e.g. inside functions). + + .. [#create] The main reason we subclass ``Interface`` is to cause the Python class statement to create an interface, rather than a class. From c49e8b27b6769904ae3bfc779bb4565edcc7053a Mon Sep 17 00:00:00 2001 From: dieter Date: Mon, 19 Oct 2020 08:16:55 +0200 Subject: [PATCH 2/6] add example --- docs/README.rst | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/docs/README.rst b/docs/README.rst index 2bcc6229..8cdf107a 100644 --- a/docs/README.rst +++ b/docs/README.rst @@ -1079,8 +1079,35 @@ Unlike classes, interfaces define their considered equal if they have equal PIDs. Especially, two interfaces defined in the same module are considered equal if they have the same name, even if they are otherwise completely -unrelated. In rare cases, this may lead to surprises - especially, +unrelated. In rare cases, this can lead to surprises - especially, when interfaces are defined dynamically (e.g. inside functions). +This is demonstrated by the following example where the locally +defined interface ``I`` is identified with the globally defined +interface of the same name and therefore not added by ``alsoProvides``. + +.. doctest:: + + >>> from zope.interface import Interface, alsoProvides, providedBy + >>> + >>> class I(Interface): + ... pass + ... + >>> class Obj(object): + ... pass + ... 
+ >>> obj = Obj() + >>> alsoProvides(obj, I) + >>> def add_interfaces(obj): + ... class I(Interface): + ... pass + ... class I2(Interface): + ... pass + ... alsoProvides(obj, I, I2) + ... + >>> add_interfaces(obj) + >>> # we would expect that *obj* provides 3 interfaces at this place but + ... len(list(providedBy(obj))) + 2 .. [#create] The main reason we subclass ``Interface`` is to cause the From 4f76a54239ee1537040238c5af4a35dd189fbc4c Mon Sep 17 00:00:00 2001 From: dieter Date: Mon, 19 Oct 2020 08:28:18 +0200 Subject: [PATCH 3/6] cosmetics --- docs/README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/README.rst b/docs/README.rst index 8cdf107a..c5471648 100644 --- a/docs/README.rst +++ b/docs/README.rst @@ -1097,6 +1097,7 @@ interface of the same name and therefore not added by ``alsoProvides``. ... >>> obj = Obj() >>> alsoProvides(obj, I) + >>> >>> def add_interfaces(obj): ... class I(Interface): ... pass From 27347de8773b42cb3b9c33b2cd95b72b9d31487c Mon Sep 17 00:00:00 2001 From: Jason Madden Date: Fri, 23 Oct 2020 15:18:16 -0500 Subject: [PATCH 4/6] Add example from #220 to the specification docs and expand it. --- docs/api/specifications.rst | 105 +++++++++++++++++++++++++++++++++--- 1 file changed, 97 insertions(+), 8 deletions(-) diff --git a/docs/api/specifications.rst b/docs/api/specifications.rst index 45cf5e18..8a1f21f7 100644 --- a/docs/api/specifications.rst +++ b/docs/api/specifications.rst @@ -161,11 +161,13 @@ Exmples for :meth:`.Specification.extends`: >>> I2.extends(I2, strict=False) True +.. _spec_eq_hash: + Equality, Hashing, and Comparisons ---------------------------------- Specifications (including their notable subclass `Interface`), are -hashed and compared based solely on their ``__name__`` and +hashed and compared (sorted) based solely on their ``__name__`` and ``__module__``, not including any information about their enclosing scope, if any (e.g., their ``__qualname__``). 
This means that any two objects created with the same name and module are considered equal and @@ -191,13 +193,22 @@ map to the same value in a dictionary. >>> I1 == orig_I1 == nested_I1 True -Because weak references hash the same as their underlying object, -this can lead to surprising results when weak references are involved, -especially if there are cycles involved or if the garbage collector is -not based on reference counting (e.g., PyPy). For example, if you -redefine an interface named the same as an interface being used in a -``WeakKeyDictionary``, you can get a ``KeyError``, even if you put the -new interface into the dictionary. +Caveats +~~~~~~~ + +While this behaviour works well with :ref:`pickling (persistence) +`, it has some potential downsides to be aware of. + +.. rubric:: Weak References + +The first downside involves weak references. Because weak references +hash the same as their underlying object, this can lead to surprising +results when weak references are involved, especially if there are +cycles involved or if the garbage collector is not based on reference +counting (e.g., PyPy). For example, if you redefine an interface named +the same as an interface being used in a ``WeakKeyDictionary``, you +can get a ``KeyError``, even if you put the new interface into the +dictionary. .. doctest:: @@ -225,6 +236,84 @@ interfaces, you may find surprising ``KeyError`` exceptions. For this reason, it is best to use distinct names for local interfaces within the same test module. +.. rubric:: Providing Dynamic Interfaces + +If you return an interface created inside a function or method, or +otherwise let it escape outside the bounds of that function (such as +by having an object provide it), it's important to be aware that it +will compare and hash equal to *any* other interface defined in that +same module with the same name. This includes interface objects +created by other invocations of that function. 
+ +This can lead to surprising results when querying against those +interfaces. We can demonstrate by creating a module-level interface +with a common name, and checking that it is provided by an object: + +.. doctest:: + + >>> from zope.interface import Interface, alsoProvides, providedBy + >>> class ICommon(Interface): + ... pass + >>> class Obj(object): + ... pass + >>> obj = Obj() + >>> alsoProvides(obj, ICommon) + >>> len(list(providedBy(obj))) + 1 + >>> ICommon.providedBy(obj) + True + +Next, in the same module, we will define a function that dynamically +creates an interface of the same name and adds it to an object. + +.. doctest:: + + >>> def add_interfaces(obj): + ... class ICommon(Interface): + ... pass + ... class I2(Interface): + ... pass + ... alsoProvides(obj, ICommon, I2) + ... return ICommon + ... + >>> dynamic_ICommon = add_interfaces(obj) + +The two instances are *not* identical, but they are equal, and *obj* +provides them both: + +.. doctest:: + + >>> ICommon is dynamic_ICommon + False + >>> ICommon == dynamic_ICommon + True + >>> ICommon.providedBy(obj) + True + >>> dynamic_ICommon.providedBy(obj) + True + +At this point, we've effectively called ``alsoProvides(obj, ICommon, +dynamic_ICommon, I2)``, where the last two interfaces were locally +defined in the function. So checking how many interfaces *obj* now +provides should return three, right? + +.. doctest:: + + >>> len(list(providedBy(obj))) + 2 + +Because ``ICommon == dynamic_ICommon`` due to having the same +``__name__`` and ``__module__``, only one of them is actually provided +by the object, for a total of two provided interfaces. (Exactly which +one is undefined.) Likewise, if we run the same function again, *obj* +will still only provide two interfaces: + +.. 
doctest:: + + >>> _ = add_interfaces(obj) + >>> len(list(providedBy(obj))) + 2 + Interface ========= From 2ae267b9101efef4913d4875b8fd2c3ffa74beb8 Mon Sep 17 00:00:00 2001 From: Jason Madden Date: Fri, 23 Oct 2020 15:18:55 -0500 Subject: [PATCH 5/6] Explore more details of persistence and expand on the motivation for how this relates to equality/hashing/sorting. --- docs/README.rst | 227 ++++++++++++++++++++++++++++++++++-------------- 1 file changed, 162 insertions(+), 65 deletions(-) diff --git a/docs/README.rst b/docs/README.rst index c5471648..42576bcf 100644 --- a/docs/README.rst +++ b/docs/README.rst @@ -1,6 +1,6 @@ -========== -Interfaces -========== +============ + Interfaces +============ .. currentmodule:: zope.interface @@ -1046,70 +1046,151 @@ functionality for particular interfaces. how to override functions in interface definitions and why, prior to Python 3.6, the zero-argument version of `super` cannot be used. +.. _global_persistence: + +Persistence, Sorting, Equality and Hashing +========================================== + +.. tip:: For the practical implications of what's discussed below, and + some potential problems, see :ref:`spec_eq_hash`. + +Just like Python classes, interfaces are designed to inexpensively +support persistence using Python's standard :mod:`pickle` module. This +means that one process can send a *reference* to an interface to another +process in the form of a byte string, and that other process can load +that byte string and get the object that is that interface. The processes +may be separated in time (one after the other), in space (running on +different machines) or even be parts of the same process communicating +with itself. + +We can demonstrate this. Observe how small the byte string needed to +capture the reference is. Also note that since this is the same +process, the identical object is found and returned: + +.. doctest:: + + >>> import sys + >>> import pickle + >>> class Foo(object): + ... 
pass + >>> sys.modules[__name__].Foo = Foo # XXX, see below + >>> pickled_byte_string = pickle.dumps(Foo, 0) + >>> len(pickled_byte_string) + 21 + >>> imported = pickle.loads(pickled_byte_string) + >>> imported == Foo + True + >>> imported is Foo + True + >>> class IFoo(zope.interface.Interface): + ... pass + >>> sys.modules[__name__].IFoo = IFoo # XXX, see below + >>> pickled_byte_string = pickle.dumps(IFoo, 0) + >>> len(pickled_byte_string) + 22 + >>> imported = pickle.loads(pickled_byte_string) + >>> imported is IFoo + True + >>> imported == IFoo + True + +The eagle-eyed reader will have noticed the two funny lines like +``sys.modules[__name__].Foo = Foo``. What's that for? To understand, +we must know a bit about how Python "pickles" (``pickle.dump`` or +``pickle.dumps``) classes or interfaces. + +When Python pickles a class or an interface, it does so as a "global +object" [#global_object]_. Global objects are expected to already +exist (contrast this with pickling a string or an object instance, +which creates a new object in the receiving process) with all their +necessary state information (for classes and interfaces, the state +information would be things like the list of methods and defined +attributes) in the receiving process; the pickled byte string needs +only contain enough data to look up that existing object; this is a +*reference*. Not only does this minimize the amount of data required +to persist such an object, it also facilitates changing the definition +of the object over time: if a class or interface gains or loses +methods or attributes, loading a previously pickled reference will use +the *current definition* of the object. + +The *reference* to a global object that's stored in the byte string +consists only of the object's ``__name__`` and ``__module__``. 
Before +a global object *obj* is pickled, Python makes sure that the object being +pickled is the same one that can be found at +``getattr(sys.modules[obj.__module__], obj.__name__)``; if there is no +such object, or it refers to a different object, pickling fails. The +two funny lines make sure that holds, no matter how this example is +run (using some doctest runners, it doesn't hold by default, unlike it +normally would). + +We can show some examples of what happens when that condition doesn't +hold. First, what if we change the global object and try to pickle the +old one? -Persistency and Equality -======================== - -An important design goal has been persistency support for interfaces. -This allows different processes to use the same interface and -to create persistent associations between (persistent) objects -and interfaces. For example, an application can store an object -together with its provided interfaces in a database; later, potentially -in a different invocation, it can search the database for objects -providing a given interface. - -To make an object persistent, it must have a persistent identifier, -PID for short. -It is this PID which identifies the object across different -processes. In the context of interfaces, we want to support -evolution similarly to the evolution of classes: even though -we change the interface (e.g. add a method, change documentation), we -still want to consider it as the same interface. - -Python's main persistency support comes from its ``pickle`` module. -It uses as PID for code objects (such as classes and functions) -the combinations of their ``__module__`` and ``name`` attributes. -In analogy, the PID for an interface is defined -as the combination of its -``__module__`` and ``__name__`` giving persistency behavior similar -to classes, including evolution support. - -Unlike classes, interfaces define their -(runtime) equality in terms of their PID, i.e. 
two interfaces are -considered equal if they have equal PIDs. Especially, -two interfaces defined in the same module are considered equal -if they have the same name, even if they are otherwise completely -unrelated. In rare cases, this can lead to surprises - especially, -when interfaces are defined dynamically (e.g. inside functions). -This is demonstrated by the following example where the locally -defined interface ``I`` is identified with the globally defined -interface of the same name and therefore not added by ``alsoProvides``. - -.. doctest:: - - >>> from zope.interface import Interface, alsoProvides, providedBy - >>> - >>> class I(Interface): - ... pass - ... - >>> class Obj(object): - ... pass - ... - >>> obj = Obj() - >>> alsoProvides(obj, I) - >>> - >>> def add_interfaces(obj): - ... class I(Interface): - ... pass - ... class I2(Interface): - ... pass - ... alsoProvides(obj, I, I2) - ... - >>> add_interfaces(obj) - >>> # we would expect that *obj* provides 3 interfaces at this place but - ... len(list(providedBy(obj))) - 2 +.. doctest:: + + >>> sys.modules[__name__].Foo = 42 + >>> pickle.dumps(Foo) + Traceback (most recent call last): + ... + _pickle.PicklingError: Can't pickle : it's not the same object as builtins.Foo + +Or what if there simply is no global object? + +.. doctest:: + + >>> del sys.modules[__name__].Foo + >>> pickle.dumps(Foo) + Traceback (most recent call last): + ... + _pickle.PicklingError: Can't pickle : attribute lookup Foo on builtins failed + +Interfaces and classes behave the same in all those ways. + +.. rubric:: What's This Have To Do With Sorting, Equality and Hashing? + +Another important design consideration for interfaces is that they +should be sortable. This permits them to be used, for example, as keys +in a (persistent) `BTree `_. As such, +they define a total ordering, meaning that any given interface can +definitively be said to be greater than, less than, or equal to, any +other interface. 
This relationship must be *stable* and hold the same +across any two processes. +An object becomes sortable by overriding the equality method +``__eq__`` and at least one of the comparison methods (such as +``__lt__``). + +Classes, on the other hand, are not sortable [#class_sort]_. +Classes can only be tested for equality, and they implement this using +object identity: ``class_a == class_b`` is equivalent to ``class_a is class_b``. + +In addition to being sortable, it's important for interfaces to be +hashable so they can be used as keys in dictionaries or members of +sets. This is done by implementing the ``__hash__`` method [#hashable]_. + +Classes are hashable, and they also implement this based on object +identity, with the equivalent of ``id(class_a)``. + +To be both hashable and sortable, the hash method and the equality and +comparison methods **must** `be consistent with each other +`_. +That is, they must all be based on the same principle. + +Classes use the principle of identity to implement equality and +hashing, but they don't implement sorting because identity isn't a +stable sorting method (it is different in every process). + +Interfaces need to be sortable. In order for all three of hashing, +equality and sorting to be consistent, interfaces implement them using +the same principle as persistence. Interfaces are treated like "global +objects" and sort and hash using the same information a *reference* to +them would: their ``__name__`` and ``__module__``. + +For more information, and some rare potential pitfalls, see +:ref:`spec_eq_hash`. + +.. rubric:: Footnotes .. [#create] The main reason we subclass ``Interface`` is to cause the Python class statement to create an interface, rather @@ -1135,3 +1216,19 @@ interface of the same name and therefore not added by ``alsoProvides``. The interface implementation doesn't enforce this, but maybe it should do some checks. + +.. 
[#class_sort] In Python 2, classes could be sorted, but the sort + was not stable (it also used the identity principle) + and not useful for persistence; this was considered a + bug that was fixed in Python 3. + +.. [#hashable] In order to be hashable, you must implement both + ``__eq__`` and ``__hash__``. If you only implement + ``__eq__``, Python makes sure the type cannot be + used in a dictionary, set, or with :func:`hash`. In + Python 2, this wasn't the case, and forgetting to + override ``__hash__`` was a constant source of bugs. + +.. [#global_object] From the name of the pickle bytecode operator; it + varies depending on the protocol but always + includes "GLOBAL". From 34085388b34738c094b30a65c7d0abbcd05ee421 Mon Sep 17 00:00:00 2001 From: Jason Madden Date: Fri, 23 Oct 2020 15:32:46 -0500 Subject: [PATCH 6/6] Add more persistence examples. --- docs/README.rst | 56 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) diff --git a/docs/README.rst b/docs/README.rst index 42576bcf..bfb7712e 100644 --- a/docs/README.rst +++ b/docs/README.rst @@ -1135,6 +1135,22 @@ old one? ... _pickle.PicklingError: Can't pickle : it's not the same object as builtins.Foo +A consequence of this is that only one object of the given name can be +defined and pickled at a time. If we were to try to define a new ``Foo`` +class (remembering that normally the ``sys.modules[__name__].Foo =`` +line is automatic), we still cannot pickle the old one: + +.. doctest:: + + >>> orig_Foo = Foo + >>> class Foo(object): + ... pass + >>> sys.modules[__name__].Foo = Foo # XXX, see below + >>> pickle.dumps(orig_Foo) + Traceback (most recent call last): + ... + _pickle.PicklingError: Can't pickle : it's not the same object as builtins.Foo + Or what if there simply is no global object? .. doctest:: @@ -1187,6 +1203,46 @@ the same principle as persistence. 
Interfaces are treated like "global objects" and sort and hash using the same information a *reference* to them would: their ``__name__`` and ``__module__``. +In this way, hashing, equality and sorting are consistent with each +other, and consistent with pickling: + +.. doctest:: + + >>> class IFoo(zope.interface.Interface): + ... pass + >>> sys.modules[__name__].IFoo = IFoo + >>> f1 = IFoo + >>> pickled_f1 = pickle.dumps(f1) + >>> class IFoo(zope.interface.Interface): + ... pass + >>> sys.modules[__name__].IFoo = IFoo + >>> IFoo == f1 + True + >>> unpickled_f1 = pickle.loads(pickled_f1) + >>> unpickled_f1 == IFoo == f1 + True + +This isn't quite the case for classes; note how ``f1`` wasn't equal to +``Foo`` before pickling, but the unpickled value is: + +.. doctest:: + + >>> class Foo(object): + ... pass + >>> sys.modules[__name__].Foo = Foo + >>> f1 = Foo + >>> pickled_f1 = pickle.dumps(Foo) + >>> class Foo(object): + ... pass + >>> sys.modules[__name__].Foo = Foo + >>> f1 == Foo + False + >>> unpickled_f1 = pickle.loads(pickled_f1) + >>> unpickled_f1 == Foo # Surprise! + True + >>> unpickled_f1 == f1 + False + For more information, and some rare potential pitfalls, see :ref:`spec_eq_hash`.
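The consistency rule described in the README changes above (hashing, equality and sorting all derived from ``__name__`` and ``__module__``) can be illustrated without zope.interface. What follows is only a minimal standard-library sketch under stated assumptions: ``NamedSpec`` and the module string ``app.interfaces`` are hypothetical names invented for this illustration, and the class is not how zope.interface actually implements its specifications.

```python
from functools import total_ordering


@total_ordering
class NamedSpec:
    """Hypothetical stand-in for an interface (NOT zope.interface code)."""

    def __init__(self, module, name):
        self.module = module
        self.name = name

    def _key(self):
        # The same two pieces of information a pickled reference carries.
        return (self.name, self.module)

    def __eq__(self, other):
        if not isinstance(other, NamedSpec):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, NamedSpec):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Must agree with __eq__: equal objects hash equally.
        return hash(self._key())


# Two distinct objects sharing a name and module, plus an unrelated one.
module_level = NamedSpec("app.interfaces", "ICommon")
function_local = NamedSpec("app.interfaces", "ICommon")
other = NamedSpec("app.interfaces", "I2")

# Equal keys collapse into a single set member.
provided = {module_level, function_local, other}
```

As with the ``ICommon`` example in the specification docs, the two equally named objects are distinct by identity yet compare and hash equal, so the set built at the end holds two members rather than three.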