Tracker: array types (CuPy, PyTorch & co) and array API standard support #18867
Comments
I have a question that probably falls under "desirable for later". Regarding scipy.sparse, is the intention to update the various sparse array / matrix classes to use the array API? Or has there been some discussion about making use of the sparse classes that already exist in CuPy, PyTorch, etc?
Good question @izaid. For ...
I've opened #18915 to discuss updating the sparse array class to adopt the array API as a provider. I am curious what functionality in ...
We could add that partial coverage of ...
I think this should be relatively straightforward: we just need to maintain two code branches, one in pure Python and one in whichever compiled language we want for NumPy arrays. In fact, I think it would be useful to ensure that no pure-Python code (including submodules which do not yet have array API support) is removed from now on. For example, if a Python file gets Cythonized, the pure-Python file should be kept (even if it is 'dead code' at the time).
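To make the two-branch idea concrete, here is a minimal sketch. The function name and the cumulative-sum example are invented for illustration; this is not SciPy code, just the shape of the proposal:

```python
import numpy as np

def cumsum_dispatch(x):
    """Sketch of keeping two code paths alive: a NumPy-only fast path
    (which could later be Cythonized/compiled) and a pure-Python,
    array-agnostic fallback that should never be deleted."""
    if isinstance(x, np.ndarray):
        # Fast path: free to use NumPy internals or compiled kernels.
        return np.cumsum(x)
    # Pure-Python fallback: works for any iterable of numbers, so
    # other array-like containers keep working even after the fast
    # path is rewritten in a compiled language.
    out, total = [], 0
    for v in x:
        total += v
        out.append(total)
    return out
```

The point is that the `else` branch is the functionality we cannot afford to lose once non-NumPy backends start relying on it.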
That doesn't sound like a reasonable restriction to me. We have lots of compiled code around, and there isn't a real need to preemptively make it harder to accelerate code.
This I agree with. The point being that we want to avoid a regression in functionality - once something works with a GPU library for example, we can't break that. Which means keeping pure Python code around (it could be rewritten of course if that'd make it better/faster - a lot of our code is really old and suboptimal).
In that case I would suggest that any new translations at least add a comment into the code to state that they are translations from pure Python. That should avoid a scenario where we mistakenly 'dismiss' a module as compiled code rather than going into the git history and restoring (an array-agnostic version of) the pure Python implementation.
Sure, a code comment won't hurt.
One general question: is there a best practice for installing the additional dependencies in the SciPy dev environment? The thought of adding PyTorch, CuPy, and potentially JAX to my conda environment is a little scary and likely to mess everything up. Might be worth documenting as well.
This is why conda environments are good, you can just remove them if you mess them up. You just have to remember to not install things into the base environment. |
True, but my issue is that in the past I sometimes had to completely nuke and recreate the SciPy development environment for SciPy itself alone. Getting the SciPy development environment working is also a big issue for many newcomers. These additional heavy dependencies will make the whole dev environment even more prone to build problems. That's why I asked whether there is a guideline.
One Conda environment with every package is... challenging. For me, on Ubuntu, installing the correct Nvidia drivers and compatible CuPy/PyTorch/JAX versions in one environment is something I haven't managed to do yet. See for example google/jax#18032 (comment), which shows that you'll want CUDA 12.1. But I still haven't been able to figure out which parts are supposed to be downloaded from the Nvidia website, and which things should come straight from Conda. This is further complicated by the fact that I use a (potentially different) driver version for my system, and the process to uninstall/switch versions is not clear. One thing that I have found quite easy is to have separate environments (SciPy + JAX, SciPy + PyTorch etc.). That wasn't too difficult to get working, but it's not ideal.
With that said, I agree that this needs documentation and a streamlined process. There are too many moving parts at the minute though IMO, especially on GPU, and covering GPU drivers across different platforms is a big task.
What I do is this:
Results:
Everything in a single env may work, but is indeed likely to be fragile. No reason not to have a bunch of separate envs. In a similar vein: I also have a couple of repo clones, so it's easy to compare between different branches or build configs.
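For what it's worth, the one-env-per-library workflow described above can be scripted along these lines. The environment names and CPU-only package selections are illustrative, not an official recipe; check each library's install docs for the current package names:

```shell
# One environment per array library; CPU-only builds sidestep the
# driver/CUDA-version matrix entirely. Environment names are arbitrary.
mamba create -n scipy-dev-torch -c conda-forge python=3.11 pytorch-cpu
mamba create -n scipy-dev-jax   -c conda-forge python=3.11 jax
mamba create -n scipy-dev-cupy  -c conda-forge python=3.11 cupy

# Then build/test SciPy inside whichever backend env you need:
mamba activate scipy-dev-torch
```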
Ok, so you also do not have the silver bullet ;). One env per array library sounds sensible. Not a great developer experience but unavoidable it seems. If someone with experience could add something to the developer docs, that would be great. GPU stuff could be left out in the beginning as that is a minefield we wanna avoid for some time.
I'm happy to tackle this properly in the summer (in terms of separate environments), but we should probably include instructions for venvs as well. Anyone else should feel free to write something before then. Maybe we'll even get the GPU drivers figured out (I may play around with wiping my machine / starting new containers from scratch like Andrew has talked about before). The GPU stuff would probably be useful more widely for people who want to use a combination of CuPy/PyTorch/JAX, and I wouldn't be surprised if there are already efforts to document how to do it by people with a lot more knowledge than me.
Also, have some mercy for the Windows devs, please. Most of the tools, JAX in particular, have very little care for native Windows environments. And I would really prefer that the array API tests are optional, behind a ...
Interested to know more in that regard. E.g. are there things that escape an environment?
They are, there's an ...
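The opt-in mechanism being alluded to in the truncated reply is typically an environment-variable gate. A minimal sketch of that pattern (the flag name `SCIPY_ARRAY_API` is an assumption on my part; check the actual developer docs for the real switch):

```python
import os

def array_api_tests_enabled():
    # Tests touching alternative array backends only run when the
    # developer opts in. The flag name below is an assumption, not
    # something confirmed in this thread.
    return os.environ.get("SCIPY_ARRAY_API", "0") == "1"

# In a pytest suite, this would back a skip marker, e.g.:
# @pytest.mark.skipif(not array_api_tests_enabled(),
#                     reason="array API tests are opt-in")
```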
Nothing in particular; JAX on Windows is experimental and only available for ... So if we decide to go a bit more ambitious with testing on GPU locally for devs, then Windows is out.
We should probably also declare ... Furthermore, I think it's time to finalize the decision on ...
Agreed, I recently added "remove ..."
No such plan I think - legacy in particular means "not deprecated, we're keeping it but please don't use it for new code". Of course, once something has been in legacy for a long time and usage fades away, we are always free to reconsider and actually deprecate it. That'd be a new discussion though.
See gh-18286 for the proposed design changes, and gh-18668 for the main PR that laid the foundations (it supported all of `scipy.cluster`, added a CI job that used both `pytorch-cpu` and `numpy.array_api`, and documented the design patterns).

This issue is to track the current status of support across submodules, and other larger tasks/TODOs.

Submodules:

- `scipy.cluster`
- `scipy.constants` (ENH: constants: add array api support #20593)
- `scipy.datasets` (ENH: datasets: array API standard support #20594; no changes needed)
- `scipy.fft`
- `scipy.fftpack` (# Don't do, `fftpack` is legacy)
- `scipy.integrate`
- `scipy.interpolate`
- `scipy.io`
- `scipy.linalg` (in progress, see ENH: linalg: array library interoperability #19068)
- `scipy.misc` (# Don't do, will be removed)
- `scipy.ndimage`
- `scipy.odr`
- `scipy.optimize`
- `scipy.signal` (ENH: signal: add array API support #20678)
- `scipy.sparse` (# TBD on whether to update functions in that namespace; don't touch the data structures)
- `scipy.sparse.csgraph`
- `scipy.sparse.linalg`
- `scipy.spatial`
- `scipy.special` (initial set of functions done by @mdhaber)
- `scipy.stats` (see ENH: stats: add array API support #20544; in progress by @mdhaber, PRs welcome from others)

Desirable for later:

- ... (`numpy`). See this comment.
- ... the `array-api-compat` package: add support for them in SciPy (both can be tested in CI too). (JAX in progress, see ENH: array types: add JAX support #20085)
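The "design patterns" mentioned above center on fetching the array's namespace from the array itself instead of hard-coding `numpy`. A toy, self-contained illustration of that idiom (`MiniArray`, `_mini_ns`, and `total` are invented for this sketch; SciPy's real helper machinery is different):

```python
import types

# Stand-in namespace playing the role of numpy/cupy/torch in this sketch.
_mini_ns = types.SimpleNamespace(sum=lambda a: sum(a.data))

class MiniArray:
    """Minimal array type exposing the array API standard's
    __array_namespace__ hook."""
    def __init__(self, data):
        self.data = list(data)

    def __array_namespace__(self, api_version=None):
        return _mini_ns

def total(x):
    # Array-agnostic library code: ask the array for its namespace,
    # then use only standard functions from that namespace.
    xp = x.__array_namespace__()
    return xp.sum(x)
```

Any array type implementing `__array_namespace__` would flow through `total` unchanged, which is exactly why the tracked submodules above can support CuPy/PyTorch/JAX without per-library code.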