use anyio in "core" psycopg #146
Conversation
What is the advantage of using anyio? Last time I checked, anyio was adding a lot of overhead. Can we see flame graphs to show what changes during querying and during copy? It wouldn't be bad to put up something to create them automatically, actually. When I played with them they were a great guide to improving performance. Ref: https://www.varrazzo.com/blog/2020/05/19/a-trip-into-optimisation/ |
The advantage is that we have a single implementation on our side that's independent of the underlying async library (asyncio or trio or anything that anyio might support in the future). For users, it works transparently: they can use
I haven't checked. But... FastAPI recently moved to anyio as well (https://github.com/tiangolo/fastapi/releases/tag/0.69.0), following starlette move https://www.starlette.io/release-notes/#0150
That would be very helpful, indeed. |
I have run MagicStack's benchmark against 3.0.3 and this branch. The source code of the benchmark tool handling psycopg 3 is available at https://github.com/dlax/pgbench (branch Running
And then, on this branch, with anyio:
|
Why is your branch using internal functions instead of the public interface? About a year ago I made a psycopg3 pgbench branch too: https://github.com/dvarrazzo/pgbench/tree/psycopg3. I don't know if it's still good or if it has rotted.
Curious results, no? Do tests hit different paths, so that anyio is faster on the first two and slower on the other ones? Or am I reading something wrong? Without changing too much, but only dropping in your Futures patch in the 3.0.3 wait loop, how does it compare? Thank you for this! |
You mean
Didn't know about this; it's close to my branch, only the "copy" benchmarks seem different (you used an How about we coordinate with a common fork in the psycopg organization?
Yeah, that looked strange to me as well. But I did not investigate much yet.
Good idea, will do. |
In fact, I cannot get reliable results with these benchmarks. For instance, running them twice on the 3.0.3 tag, I get:
and
Even running each benchmark alone does not produce consistent results... That's all I can tell with the hardware I have.
I did not observe any speedup with the Future-based waiting loop (see commit dlax/psycopg@4630e68); rather a slight slowdown, actually. In addition to the above benchmarks (and their inconsistency), I ran the "pipeline demo" test (https://github.com/psycopg/psycopg/pull/116/files#diff-7c5c69215e01c35a4a21c087ec14a2581abf18209386d7522b685ca8af20961dR46, takes about 2.5s to complete on my machine) as a benchmark with the Event-based and Future-based loops: no obvious winner/loser there either... |
Reworked this to make the dependency on anyio optional. (Still needs some work on copy.) |
Force-pushed from 7632f8e to 597c1cf |
Probably anyio uses that module. The test runs with the assumption that the network module is not imported when psycopg is. |
I see, makes sense; thanks. |
I understand why this PR was created... But does it still make sense to have |
I've reworked this PR to make it as minimal as possible by:
So it should be easier to review now... |
All references to asyncio in comments are turned into a more generic 'async libraries' term.
In preparation for adding support for anyio as asynchronous I/O library, this makes it clearer that these functions are asyncio-specific.
This is to support other async libraries than asyncio. Functions fake_resolve() and fail_resolve() in conninfo tests are no longer fixtures monkeypatching asyncio but simply implement a fake getaddrinfo(). On the other hand, fake_resolve() in test_connection_async.py (previously imported from test_conninfo.py) still monkeypatches asyncio as this is an indirect test for resolve_hostaddr_async().
Trio will raise this ResourceWarning: "Async generator 'psycopg.cursor_async.AsyncCursor.__aiter__' was garbage collected before it had been exhausted. Surround its use in 'async with aclosing(...):' to ensure that it gets cleaned up as soon as you're done using it." So we explicitly close async generators when we're partially consuming them (because of a 'break' in the loop).
This adds a new AnyIOConnection class to be used instead of AsyncConnection in combination with the 'anyio' async library. The class essentially uses an anyio.Lock instead of an asyncio.Lock (and the same for getaddrinfo()) and relies on AnyIO-specific waiting functions, also introduced here. The same is done for the crdb connection class, though with more repetition due to typing issues mentioned in inline comments.

All anyio-related code lives in the _anyio sub-package. An 'anyio' setuptools extra is defined to pull required dependencies. AnyIOConnection is exposed on the psycopg namespace; a runtime check is performed when instantiating, possibly producing an informative message about missing dependencies.

In tests, overall, the previous anyio_backend fixture is now parametrized with both asyncio and trio backends, and 'aconn_cls' returns either AsyncConnection or AnyIOConnection depending on the backend name. Test dependencies now include 'anyio[trio]'. In "waiting" tests, we define 'wait_{conn_,}_async' fixtures that pick either asyncio or anyio waiting functions depending on the value of the 'anyio_backend' fixture.

Concurrency tests (e.g. test_concurrency_async.py or the respective crdb ones) are not run with the trio backend as they explicitly use the asyncio API. Porting them does not seem strictly needed, at least for now, so they get marked with asyncio_backend.

Finally, we ignore an invalid error (raised by a deprecation warning during tests) about usage of the 'loop' parameter in the asyncio API, which is due to a Python bug as mentioned in a comment.
We add a --anyio={asyncio,trio} option to pytest to select the AnyIO backend to run async tests. When this option is set, we'll use the AnyIO implementations of the psycopg API (i.e. AnyIOConnection, resp. waiting functions, etc.); otherwise (the default), we'll use plain asyncio implementations (i.e. AsyncConnection). This way, we can now run the test suite with:

- plain asyncio implementations (previously achieved through the asyncio backend for the AnyIO pytest plugin),
- AnyIO implementations, with the asyncio backend (new from this commit),
- AnyIO implementations, with the trio backend (previously achieved through the Trio backend for the AnyIO pytest plugin).

Accordingly, the anyio_backend fixture is no longer parametrized and its value depends on the --anyio option. Selection of whether to use AnyIO or plain asyncio implementations is done through the new 'use_anyio' fixture, which simply detects if no --anyio option got specified. This new fixture is used everywhere we previously used the anyio_backend_name fixture to select either plain asyncio or anyio variants of test cases. The fixture pulls (but does not use) anyio_backend so that all tests using it are detected as async.

In CI, we add a new pytest_opts axis to the matrix to cover all configurations in various environments.
What is the status on this? The one test that failed seems to be a PipelineAborted exception? Is it possible to rerun the tests? As a user, psycopg3 / sqlalchemy is the single library that I am most interested in seeing have an anyio backend / use structured concurrency. |
What happened to this? This was a really nice PR that I was excited about |
@dvarrazzo, can we make a decision about this? (The lack of response makes me feel you are not inclined to integrate the feature, but maybe not? I'd be a bit sad to give up on this work, sure; but staying in limbo is worse.) |
Hello @dlax I believe I have dropped the ball on this: trying to figure out the advantage (because I don't know much about the frameworks outside asyncio). I think I was waiting on a follow-up about the inconsistencies in the benchmark. Let's not consider this closed. |
Hello @dlax of course this branch now has several conflicts with master, and I have no problem helping to rebase it, if we think it's the right thing to do - as I may know better where things have gone, to which aisle of the supermarket the tuna was moved, etc. It's also not fair on your work so I'll give you a hand with it.

To be honest, adding a core dependency on anyio concerns me. Recently, the release of anyio 4 changed interfaces, broke tests, and I had to pin it to < 4 (e847f3c) until dropping Python 3.7. Now that 3.7 is gone we can move to use >= 4 instead; opened #662 to remember to address that. As such I would like to decouple the release cycle that anyio or trio might impose into an optional component that can better follow the release cycle of the components it depends on. This was the same choice made for the connection pool.

So what I propose is to follow the same pattern used with the pool, so that people can

Now, I see that anyio has a concept of global backend, which means that we don't expect people to use both

So, what about using a global switch too, something like

```python
try:
    import psycopg_anyio
except ImportError:
    raise ImportError(
        "anyio support is not available."
        " Please install it using `pip install psycopg[anyio]`"
    )

enable = psycopg_anyio.enable
...  # other objects if needed
```

Fast backwards many years, to the relation between psycopg2 and eventlet/gevent. I had the same concern of "moving targets", although the interface with these libraries proved pretty stable (but the interaction with them is only in the wait mechanism). The implementation was satisfactory for those times: providing the psycogreen external module, offering a

As we are being requested to provide greenlet integration with psycopg 3 (which used to work out-of-the-box until 3.1.4, then broke, because we started using a C wait function - #527), I would think about a similar mechanism for greenlet libraries:
Differences between psycopg2 and 3:
As an implementation, we could have a handful of global functions to produce a Lock, an Event, the wait() function, the getaddrinfo() function, and switching to anyio would replace these functions with anyio-friendly ones. A cleaner option would be to create an

What do you think? I apologise again for the lack of interaction from my part on this feature. Many other things got my attention. |
Just so we're on the same page, what do you understand by the concept of "global backend"? |
What I linked, i.e. that if someone runs their program with:

```python
from anyio import run

...

run(main, backend='trio')
```

then they don't expect to run

As a consequence, my understanding is that psycopg users using trio have no reason to use

Is my understanding on the page you expect it to be?
The reason your wording piqued my interest is that there is indeed something potentially coming in the AnyIO space called a "global event loop". This would allow one to provide both an async and a sync interface to a library by making the synchronous version into a shim that runs the async version in a dedicated event loop thread. This approach is demonstrated in the latest alpha of APScheduler. |
I don't think it's necessary (and I think we want to avoid) having duplicate implementations. My understanding is that if you implement i.e. if you do have an This makes duplicate implementations redundant at best, from an end-user perspective, and in practice, doubling the api surface area and forcing the user to choose between implementations is a worse UX. I'm not an |
Sure, ideally that is how it would work. But, I've seen some prominent downstream projects that are averse to having an unconditional dependency on AnyIO (at least httpcore comes to mind). |
@dhirschfeld read my message above: anyio has its own release cycle, on which I don't want psycopg to depend. I don't want to drop Python 3.x support when anyio decides it's time, and I don't want to put an upper limit on the anyio version used because a backward-incompatible change on their part would make our support difficult. Furthermore: if the anyio project changes in a direction we don't like, or if anyio goes out of fashion because another abstraction API comes up making it easier to work with N > 2 async libraries, I don't want psycopg to depend on it. |
AnyIO supports the Python releases supported by the PSF. That is to say, it won't drop support until a Python release reaches EOL. If you intend to keep supporting old Python versions longer, then this could potentially be an issue.
I am quite skeptical that another such abstraction API would surface. It's a tremendous amount of work. |
I'm totally in favour of avoiding classes duplication for each async library. (In fact, I seem to remember this was an option initially but we decided not to go that way; anyway...)
This option is worth a try; it's also quite similar to what's done by httpcore I think.
I think the first commits of this branch (or some of them), i.e. all but the last two, might be worth considering for integration now as they are just preparatory refactorings; they need a rebase and they indeed conflict. The last-but-one commit, the one introducing the feature, would need a complete rework if we actually go the "AsyncBackend" route, so it's not worth rebasing IMO. @dvarrazzo, if you can help rebasing (possibly only some of) the first commits, that'd indeed be appreciated. I would then be happy to take over the rest. |
Unconditionally depending on Having two separate code paths, one for an Maintaining separate code paths is pessimistically paying a cost for an eventuality which may never happen. It's like an insurance policy against At the end of the day, I'll be happy if I can just use |
@dhirschfeld thank you for your input. API breakage with anyio already happened in 3 -> 4; luckily it just affected the test suite. Together with the change of interface, anyio dropped support for Python 3.7. This is all very legitimate, nothing bad on their part. We plan to drop Python 3.7 support too, but in psycopg 3.2, possibly in a few months' time. So we would have had to do something awkward to support the possibly incompatible 3 and 4, or drop support for Python 3.7 before planned, or set an upper boundary < 4, which is a very bad idea. Incidentally, in the test suite for the moment I did exactly that, but just because it's a test dependency, and hopefully temporary (#662 will be easier now that Python 3.7 is out of CI).
I have also been considering that AFAICS the real feature is not supporting anyio, it is supporting trio.
|
AnyIO started with 3 back-ends, ended up dropping Curio as it was untenable to support and its author explicitly requested this action. One prospect has been supporting Twisted, but in all honesty, I don't think that's ever going to happen as work on sniffio support in Twisted has pretty much stalled, and there seems to be comparably little interest in Twisted support in AnyIO in the first place.
Whether using AnyIO makes sense depends on what you're doing. It offers a single API for task management, networking, IPC, subprocesses, async file access among other things. I personally consider the structural concurrency APIs perhaps the most important bit. |
The concurrency is mostly left to the users of psycopg: the core psycopg query loop is a sequence of non-blocking calls interspersed with waiting for sockets to be ready, which is where we try to play fair with whatever is running concurrently. We allow connections to be used by more than one thread, but they get effectively serialized in their request-response cycle. So, to implement all this we need a wait function and a Lock. This part of psycopg is a library, not a program that is massively concurrent on its own. I think there was an issue in the past with anyio not being able to wait for read and write at the same time on a socket, and that's a deal breaker for us (psycopg/psycopg2#1057 (comment)). I assume that if @dlax has gone so far with this MR then that's a solved problem? There is a bit more action in the pool, which has a scheduling task and a few worker tasks waiting on a queue. In the pool we create and destroy tasks, use Queues and Conditions. I see that Trio doesn't implement a Queue object, it uses sending/receiving channels, and that anyio doesn't offer a Queue-type object either, so we should create that interface ourselves anyway, if we want to keep the similarities with the sync code. Trio doesn't have a Condition object, anyio wraps it, but it's a simple object. Comparatively, there is way more concurrency in the test suite to simulate many different concurrency scenarios. AFAICS the concurrency needs in our library seem simple; what we want the most is to make the library available to people who have made their architecture considerations and have decided that anyio is the tool they want to use, or that trio is. |
Upon reflection, and another look at the code and discussions recently, I don't see a clear way forward for this change proposal. As it does not seem actionable by me, as the author, I am closing the PR rather than keeping it in my todo list. |
This adds support for anyio in psycopg "core" (i.e. not psycopg_pool) through an AnyIOConnection class, associated waiting functions, and a custom AnyIOLibpqWriter implementation. This uses conditional imports and pulls no explicit dependency on anyio.
The test suite now uses the anyio pytest plugin, instead of pytest-asyncio (this change has been submitted as another PR, #352, which this one is based on).
ticket #29