Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cython version of Deferred #12124

Open
itamarst opened this issue Apr 2, 2024 · 32 comments
Open

Cython version of Deferred #12124

itamarst opened this issue Apr 2, 2024 · 32 comments

Comments

@itamarst
Copy link
Contributor

itamarst commented Apr 2, 2024

An inspection of Twisted application's profiling tends to show lots of Deferred APIs.

As an experiment, I threw together a Deferred version in Cython. Using the HTTP client benchmark I've been using in some previous tickets, I got a 10% speedup. Given Deferreds are only part of the benchmark's work, this suggests a very significant improvement in the Deferred speed!

The two main questions on how to proceed are:

  1. Maintainability: can we avoid two copies of the code? Or at least, minimize how much code is duplicated.
  2. Distribution: Twisted is a pure Python package. Including this would either mean switching to binary wheels, or making it an optional dependency; the latter would mean people would need to opt-in, which would be unfortunate.
@glyph
Copy link
Member

glyph commented Apr 2, 2024

I think this would be a good opportunity to split Deferred into a separate package and treating it as more of a self-contained/top-level tool. There are an increasing number of contexts where a stand-alone Deferred would be handy; sometimes I don't want to pull in all of Twisted, but in the halcyon days of async/await python we now occupy, I always want synchronousResultOf and Deferred.fromCoroutine.

We've already got https://github.com/twisted/deferred , and we could do the work there.

The reason to put it in a separate package would be that having a small repo that has a bit of packaging shenanigans to make deferred and an optional fast-deferred dependency would be a lot more straightforward than having twisted and fast-twisted.

So, re: distribution; do something like psycopg-binary or twisted-iocpsupport, albeit with a single repo containing both a pure-python package and a wheel-built package.

Regarding maintainability: I am given to understand that Cython now has multiple syntaxes, one which uses the old style Cython AST extensions, and another which uses Mypy-style type annotations. Can we get away with a single module using annotations which either is or is not built with Cython, so we have a pure-Python fallback?

@adiroiban
Copy link
Member

Thanks for looking into this.

This looks like a duplicate of #2245

But maybe we can use this ticket to discuss the general implementation plan for mantaining optimized non-Python source in twisted/twisted repo.

I am +1 for having a separate twisted-deferred package that is distributed as a Python only or binary wheel.

Rather than having the code a separate repo, maybe it would help just to move the code to src/twisted_deferred or something similar.
This will result in twisted/twisted being a (mini) mono repo.


I remember there was atempts in the past to have twisted/twisted as a mono repo, with things like twisted.conch and twisted.web in separate packages.

I guess that the overal experience for that work was that separate packages are not really wanted.

I think that at least for the initial release, it would be easier to have the new deferred code in the same repo.


As a long term goal, I think that it would help to add Cython optimization to other parts of Twisted.

For example if you transfer a file over HTTPS vs SSH+SFTP the twisted.conch implementation is so much slower.
So I think that for example twisted.conch could also benefit from some C optimization.
And I think that Cython is the cleanest way to get some performance optimizations.

@glyph
Copy link
Member

glyph commented Apr 2, 2024

I am +1 for having a separate twisted-deferred package that is distributed as a Python only or binary wheel.

There's already https://pypi.org/project/deferred/ , which we own, and does contain a version of Deferred.

@glyph
Copy link
Member

glyph commented Apr 2, 2024

This will result in twisted/twisted being a (mini) mono repo.

I am -1 on this because it breaks VCS pins in pip. I am open to being convinced, but I think that rather than develop monorepo tooling we should develop config-replication tooling to copy CI stuff between repos.

@glyph
Copy link
Member

glyph commented Apr 2, 2024

I remember there was atempts in the past to have twisted/twisted as a mono repo, with things like twisted.conch and twisted.web in separate packages.

I guess that the overal experience for that work was that separate packages are not really wanted.

It was a different time and different levels of tooling existed. It was exponetnially more painful to add a dependency then than it was now.

But also, the axis we chose to split upon was horizontal (i.e.: by protocol) rather than what we are talking about here, which is vertical (split out Deferred, FilePath, etc).

I think being somewhat monolithic by protocol still makes sense, as having integrated protocol implementations is a big part of the value proposition of Twisted. But these smaller utility packages that can be self-contained make sense to split out.

@itamarst
Copy link
Contributor Author

itamarst commented Apr 2, 2024

I dislike the idea of psycopg-binary equivalent; libpq interaction is the reason there, and I don't see why it would apply here. I would just have binary wheels by default and a tarball if people don't want to use that.

@glyph
Copy link
Member

glyph commented Apr 2, 2024

I dislike the idea of psycopg-binary equivalent; libpq interaction is the reason there, and I don't see why it would apply here. I would just have binary wheels by default and a tarball if people don't want to use that.

OK, to back up and explain my motivation for that suggestion: I don't want people to pip install twisted on an unusual architecture or a new Python version and end up with a build failure because they don't have a C compiler, and I don't want to wait to deploy this until we have wheels for every distro for every weird single-board computer ever shipped in the world.

Is it possible to just ship a pure-python fallback wheel, with the same distribution name? If we do that, how does the end user specify "no, I absolutely want the fast version and I want the build to fail if I don't get it"?

@adiroiban
Copy link
Member

I am +1 for having a separate twisted-deferred package that is distributed as a Python only or binary wheel.

There's already https://pypi.org/project/deferred/ , which we own, and does contain a version of Deferred.

I am +1 on having the package, not the repo :)

But it's ok to have it as a separate repo... we alrady have twisted-iocpsupport

I feel that somehow the twisted-iocpsupport is a 2nd class repo.

My goal with having the twisted-deferred package in Twisted is to make it more visible.

I also expect that it will be easier to test "twisted-defered" agains the twisted trunk branch... basically, we will be able to test the deferred code as part of any twisted/twisted PR

I am ok to have a separate repo.

The main reason to have a mono repo is to make it easier to run the automated tests.


OK, to back up and explain my motivation for that suggestion: I don't want people to pip install twisted on an unusual architecture or a new Python version and end up with a build failure because they don't have a C compiler,

There is the option of a non-binary wheel fallback.

One example is the pyrsistent project.
There is a py3-none-any.whl that might be be used on OpenBSD and then there are the usual Linux Glic/musl, Windows, macOS binary wheels

@itamarst
Copy link
Contributor Author

itamarst commented Apr 3, 2024

I tried pyrsistent and yeah it does the right thing. Specifically I tested 3.13 as the version lacking a binary wheel, and it installed the any pure Python wheel for that. So that solves that problem, no need for separate binary package, and Twisted can just use the fast version by default.

@itamarst
Copy link
Contributor Author

itamarst commented Apr 3, 2024

As far as separate package... this feels like a bunch of duplicated work (packaging, linting, etc) and a bunch of integration work, for a very minor benefit. https://pypistats.org/packages/deferred shows 16 downloads a month, and many of them are probably just mirroring. The fact that @glyph may want Deferred standalone occasionally doesn't mean it's a common use case. And the existence of Future in the stdlib means it's that much less likely to be something someone wants to do.

So I'd rather just have it be part of Twisted, make Twisted have binary wheels, and do the thing where if you can't use a binary wheel it still works with pure Python wheel.

@glyph
Copy link
Member

glyph commented Apr 3, 2024

So I'd rather just have it be part of Twisted, make Twisted have binary wheels, and do the thing where if you can't use a binary wheel it still works with pure Python wheel.

OK, I'm sold. If we're going to spin up cython binary wheel infrastructure, I guess it doesn't much matter where we do this. The fact that we don't need separate packaging infrastructure invalidates my desire to mix the two things together.

I still kinda want to split out Deferred for other reasons, but I no longer have any valid reason to conflate it with this issue, so carry on :)

@glyph
Copy link
Member

glyph commented Apr 3, 2024

When we start building binary wheels again, please do make sure to build universal2 and only universal2 wheels on macOS, so we can avoid this problem: pypa/pip#11573

@itamarst
Copy link
Contributor Author

itamarst commented Apr 3, 2024

My last foray into this involved the issue that older versions of Python don't do universal2, but I guess... 3.8 has it, excellent.

@itamarst
Copy link
Contributor Author

itamarst commented Apr 3, 2024

As far as sharing code between pure Python and Cython... pretty sure that's doable, will just need to shift part of the type annotations into a .pyi file since the Python code annotations will use Cython-specific junk.

@itamarst
Copy link
Contributor Author

itamarst commented Apr 3, 2024

(Although lack of symlink support on Windows is gonna make it annoying.)

@glyph
Copy link
Member

glyph commented Apr 3, 2024

As far as sharing code between pure Python and Cython... pretty sure that's doable, will just need to shift part of the type annotations into a .pyi file since the Python code annotations will use Cython-specific junk.

This seems wrong to me, but then, I guess you already have a prototype so you'd know better. What Cython specifics do you have in mind here, and what would live in a .pyi? Couldn't we just do everything with conditional imports and/or TYPE_CHECKING aliases?

@glyph
Copy link
Member

glyph commented Apr 3, 2024

(Although lack of symlink support on Windows is gonna make it annoying.)

Hang on, in what way are symlinks useful here?

@adiroiban
Copy link
Member

Happy to go with a single package and without a separate repo :)

Let me know if you need help with the build system.

I see pyrsistent uses cibuildwheel for binary wheels

Non-binary/pure-python is done via python -m build and an env variable that disalbes the extension.

Linux ARM64 done via docker+qemu

macOS is done only via universal2

There is also a PR to build for Alpine/Musl

Let me know if you need help with the GitHub Actions part.

@itamarst
Copy link
Contributor Author

itamarst commented Apr 3, 2024

Never mind, I can set the extension name separately from the file name.

@itamarst
Copy link
Contributor Author

itamarst commented Apr 11, 2024

One problem here is that Twisted will need to switch from hatch to setuptools, unless I can find some way to bridge the gap (there's hatch-cython but I have so far failed to get it to work; I asked author for help).

@glyph
Copy link
Member

glyph commented Apr 11, 2024

cc @ofek since maybe he will have interest in not having all his work on #11655 reverted

@glyph
Copy link
Member

glyph commented Apr 11, 2024

@itamarst Black uses Hatch and manages to build with mypyc, so maybe mypyc would be a better option here than Cython? I don't believe we're doing a ton of optimizable integer math here, and mypyc doesn't carry the runtime dependency baggage

@ofek
Copy link
Contributor

ofek commented Apr 12, 2024

Hey, thanks for the ping! My personal preference nowadays is to stay very far away from Cython and only use Mypyc or write extension modules in Rust. You can get very good speedups with the former while the latter is better (when I have to) for me personally because I dislike writing C and very much enjoy writing Rust.

That being said, I would like to help here. What issue did you run into with that plugin? If it's something on my side I will fix it!

@itamarst
Copy link
Contributor Author

I couldn't figure out how to use the plugin at all. Basic functionality like "here are the list of files to use for this extension" was completely opaque. It tried to compile every single .py file in the repo. Maybe the author can help, I asked.

I can try mypyc, I guess, as an alternative, it's definitely not very interesting cython code, most of the benefit is probably having fast if statements on bools, plus cheaper function calls.

@ofek
Copy link
Contributor

ofek commented Apr 12, 2024

Sure let's continue the discussion here: joshua-auchincloss/hatch-cython#45

The docs for the plugin are surprisingly good actually, there are just so many options!

@itamarst
Copy link
Contributor Author

mypyc is probably blocked on mypyc/mypyc#1051

@ofek
Copy link
Contributor

ofek commented Apr 12, 2024

I don't have full context about Deferred but I am always in favor of stripping stuff out into isolated packages; I think that is a great idea.

Also, I think there was some confusion in this thread about the example of psycopg that Glyph brought up. One of the responses was talking about psycopg-binary and that it was a train wreck essentially which I agree with. Basically, what happened was at that point they didn't know how to properly build the extension module for distribution so they had an entirely separate package just for compiled wheels because they were broken in subtle ways.

What I think Glyph is talking about (which I think is indeed a great example) is the modern/rewritten psycopg package: https://github.com/psycopg/psycopg

The main package is pure Python which then depends on specific compiled packages for performance and extra features, which all live in the same repository.

@itamarst
Copy link
Contributor Author

itamarst commented Apr 23, 2024

Another place that could do well with Cython version, probably, is twisted.internet.base.DelayedCall. And there are likely others. I feel like the argument for lots of little compiled packages is not compelling, and personally would argue for either switching Twisted back to setuptools, or having one single compiled-extensions add-on package, not one per feature.

@ofek
Copy link
Contributor

ofek commented Apr 23, 2024

having one single compiled-extensions add-on package

This is what I was imagining, yes.

@itamarst
Copy link
Contributor Author

I wonder what's more work, doing that or switching back to setuptools.

@ofek
Copy link
Contributor

ofek commented Apr 23, 2024

I think the build backend choice is irrelevant here (if you tell me what you want Cython-wise I will provide you an example configuration of how to achieve it) but rather I think the original points are still valid reasons to desire a separate package #12124 (comment)

Do what you wish but please understand that what you want is already possible with the existing tooling. I believe your feature request is this which for the time being you actually can very easily do yourself with a custom build hook that inherits from that plugin and simply overrides this property.

edit: there is also built-in functionality for conditional execution of build hooks so that you can build a pure Python wheel as an option

@itamarst
Copy link
Contributor Author

I don't see how can you have multiple extensions, though. Which isn't strictly necessary I guess but the more one adds the more attractive it is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

4 participants