New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a WeakKeyDictionary to store session => status map #23
Use a WeakKeyDictionary to store session => status map #23
Conversation
Stale entries for old sessions could be left in the `id(session) => status` map if an application failed to end a transaction after a session event, from the set used by `zope.sqlalchemy.register`, caused the session to be joined to the transaction. While such issues often indicate an error on the part of the application, this can lead to difficult-to-debug issues because IDs can be re-used in the lifetime of a Python process, so a stale entry for an old Session may affect what happens to a future, unrelated Session. Using a weak dict of `session => state` avoids this hazard.
The flip side is that garbage collection is not deterministic in PyPy (or in CPython when there is a cycle involved), so this may not actually achieve the desired result. |
The doctest failing on Travis is also failing on master for me. I'll look into it. |
In the scenario I described, it is OK if stale entries are left for some arbitrary amount of time after the request ends (until GC happens), as long as a lookup for a different Session won't return that entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
The tests should be easy to fix (add a ...
at the end of the expected exception message).
Additional explanatory text has been added to the end of the exception message in newer SQLAlchemy releases.
Thanks for the tip @mgedmin, that worked. |
How are session objects hashed? If they have the default hash, then that's the same as an id (on cpython), and since the GC on PyPy is lazy, it's possible that hashing a different object will still have the same ID as the dead one that's still in the map. I admit that I'm not a fan of introducing non-determinism in this way, or hiding programming bugs in this way. Can we make the bug more explicitly obvious? Alternately, (and I know nothing about Session or really what the cache is used for beyond what I see here), could we piggyback on the session object to store what's currently in the cache? Instead of |
My understanding is that entries are automatically removed from the |
A private attr would solve the problem equally well from my point of view. Could that cause problems for any other users? |
On PyPy, WeakKeyDictionary isn't cleaned up immediately when objects die, so iterating it and listing its keys and values can return stale entries or even contradict each other: Python 2.7.13 (84a2f3e6a7f8, Oct 03 2017, 18:46:00)
[PyPy 5.9.0 with GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>> class Foo(object):
.... def __hash__(self):
.... return 42
.... def __eq__(self, other):
.... return isinstance(other, Foo)
....
>>>> from weakref import WeakKeyDictionary
>>>> d = WeakKeyDictionary()
>>>> d[Foo()] = 1
>>>> d[d.keys()[0]]
1
>>>> d.keys() # We resurrected a dead object!
[<__main__.Foo object at 0x00000001106fe090>]
>>>> d = WeakKeyDictionary() # start over
>>>> d[Foo()] = 1
>>>> d.items()
[]
>>>> d[Foo()] = 1
>>>> d.values() # Contradiction!
[1, 1]
>>>> import gc
>>>> gc.collect()
0
>>>> d.values()
[] (Compare to CPython: Python 2.7.14 (default, Sep 27 2017, 12:15:00)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo(object):
... def __hash__(self):
... return 42
... def __eq__(self, other):
... return isinstance(other, Foo)
...
>>> from weakref import WeakKeyDictionary
>>> d = WeakKeyDictionary()
>>> d[Foo()] = 1
>>> d.keys()
[] ) Fortunately, the weak refs themselves that are used internally are guaranteed not to compare equal, even if they hash equal, when one of them is dead, so the trivial exercise of looking one up exactly always does work: >>>> d = WeakKeyDictionary()
>>>> d[Foo()] = 42
>>>> d[Foo()]
Traceback (most recent call last):
File "//pypy/lib-python/2.7/weakref.py", line 356, in __getitem__
return self.data[ref(key)]
KeyError: <weakref at 0x00000001106e3720; to 'Foo'> But if you do anything other than that, watch out. |
Thanks for the insights 🙂. All the lookups in this PR are direct reads or writes with |
Update zope.sqlalchemy to a fork which includes zopefoundation/zope.sqlalchemy#23. Once that is released we can revert to an upstream release. The fork is referenced only in requirements.txt and not requirements.in due to pip-compile not currently supporting package URLs.
As this pull request has been approved and I think we have resolved potential issues with PyPy above, would it be possible to get this merged and released? We have some hacks in our codebase that we would like to get rid of. |
Have you signed the Zope Foundation Committer Agreement? I'm not allowed to merge code from people who haven't signed it. (And once you sign it, you'll get the committer bit that allows you to hit the nice big green Merge button yourself.) |
I have now. Thanks! |
We encountered an issue in our app which uses Pyramid + pyramid_tm + zope.sqlalchemy where requests would occasionally complete successfully yet the DB changes from that request did not get committed.
It turned out that what was happening is that sometimes our logging code caused accesses to the DB session after the transaction commit/abort performed by pyramid_tm, and these DB accesses caused stale entries (IDs of now-dead Session objects) to be left in zope.sqlalchemy's internal
_SESSION_STATE
cache after request processing ended.These stale entries caused future Sessions which happened to share the same ID (because they happened to be allocated at the same address in memory by the CPython runtime), not to be committed.
Although this issue ultimately boils down to a programming error in our app, it took a lot of time to debug and I can imagine other users getting caught out. This PR removes the hazard by replacing the
id(session)
map with aWeakKeyDictionary
so that Python automatically removes entries to dead sessions from the map.