Tracker: array types (CuPy, PyTorch & co) and array API standard support #18867

rgommers · 2023-07-12T12:58:36Z

izaid · 2023-07-12T15:37:16Z

I have a question that probably falls under "desirable for later". Regarding scipy.sparse, is the intention to update the various sparse array / matrix classes to use the array api? Or has there been some discussion about making use of the sparse classes that already exist in CuPy, PyTorch, etc?

rgommers · 2023-07-13T09:03:36Z

Good question @izaid. For sparse it's mainly the functionality on top of the data structures themselves that can support multiple array types - I've split off sparse.csgraph and sparse.linalg in the list above. Major changes to the sparse API are a separate topic; that's not in the cards I think. There's some desire from some folks to deprecate *_matrix in favor of *_array; whether that happens and on what timeline is still TBD and waiting for a concrete proposal. The *_array API should see some additions, but I can't see it becoming array API standard compatible (that'd be pydata/sparse).

ivirshup · 2023-07-19T19:29:24Z

I've opened #18915 to discuss updating the sparse array class to adopt the array api as a provider.

I am curious what functionality in sparse and its submodules could be targeted for becoming array API consumers given the broad reliance on native code. Has anyone looked into this yet?

rgommers · 2023-07-19T19:34:09Z

Thanks @ivirshup! I did think about that quite a bit already. I'm a bit swamped right this minute, but will reply on gh-18915 within 1-2 days.

lucascolley · 2023-09-07T12:46:01Z

We could add that partial coverages of signal and linalg are in progress by Tyler and me respectively.

lucascolley · 2023-09-20T11:33:41Z

fft can be ticked off now!

lucascolley · 2024-02-08T00:33:04Z

> Write a policy for cases where one wants to change existing pure Python code with array API support to compiled code (for performance improvements with numpy)

I think this should be relatively straightforward - we just need to maintain two code branches, one which is pure python and one which is in whichever language we want for NumPy arrays.

In fact, I think it would be useful to ensure that no pure python code (including submodules which do not yet have array API support) is removed from now on. For example, if a Python file gets Cythonized, the pure Python file should be kept (even if it is 'dead code' at the time).

rgommers · 2024-02-09T11:39:50Z

> the pure Python file should be kept (even if it is 'dead code' at the time).

That doesn't sound like a reasonable restriction to me. We have lots of compiled code around, and there isn't a real need to preemptively make it harder to accelerate code.

> I think this should be relatively straightforward - we just need to maintain two code branches, one which is pure python and one which is in whichever language we want for NumPy arrays.

This I agree with. The point being that we want to avoid a regression in functionality - once something works with a GPU library for example, we can't break that. Which means keeping pure Python code around (it could be rewritten of course if that'd make it better/faster - a lot of our code is really old and suboptimal).

lucascolley · 2024-02-09T11:45:49Z

> That doesn't sound like a reasonable restriction to me. We have lots of compiled code around, and there isn't a real need to preemptively make it harder to accelerate code.

In that case I would suggest that any new translations at least add a comment into the code to state that they are translations from pure Python. That should avoid a scenario where we mistakenly 'dismiss' a module as compiled code rather than going into the git history and restoring (an array-agnostic version of) the pure Python implementation.

rgommers · 2024-02-09T11:48:50Z

Sure, a code comment won't hurt.

dschmitz89 · 2024-04-23T06:51:13Z

One general question: is there a best practice for installing the additional dependencies in the scipy dev environment? The thought of adding pytorch, cupy and potentially jax to my conda environment is a little scary and likely to mess everything up. It might be worth documenting this too.

andyfaff · 2024-04-23T06:53:12Z

This is why conda environments are good, you can just remove them if you mess them up. You just have to remember to not install things into the base environment.

dschmitz89 · 2024-04-23T06:59:01Z

True, but my issue is that in the past I sometimes had to completely nuke and recreate the scipy development environment only for scipy itself. Getting the scipy development environment working is also a big issue for many newcomers.

These additional heavy dependencies will make the whole dev environment even more prone to build problems. That's why I asked for a guideline if there is one.

lucascolley · 2024-04-23T07:06:02Z

One Conda environment with every package is... challenging. For me, on Ubuntu, installing the correct Nvidia drivers and compatible CuPy/PyTorch/JAX versions in one environment is something I haven't managed to do yet.

See for example google/jax#18032 (comment), which shows that you'll want CUDA 12.1. But I still haven't been able to figure out which parts are supposed to be downloaded from the Nvidia website, and which things should come straight from Conda.

This is further complicated by the fact that I use a (potentially different) driver version for my system, and the process to uninstall/switch versions is not clear.

One thing that I have found quite easy is to have separate environments (SciPy + JAX, SciPy + PyTorch etc.). That wasn't too difficult to get working, but it's not ideal.

lucascolley · 2024-04-23T07:09:52Z

With that said, I agree that this needs documentation and a streamlined process. There are too many moving parts at the minute though IMO, especially on GPU, and covering GPU drivers across different platforms is a big task.

rgommers · 2024-04-23T07:14:24Z

What I do is this:

Edit `environment.yml` to add the desired dependencies for any custom testing, and append something descriptive to the default `scipy-dev` environment name (e.g., `scipy-dev-cupy`)
Run `mamba env create -f environment.yml`
Undo changes to `environment.yml`

Results:

$ mamba env list | rg scipy
scipy-dev             *  /home/rgommers/mambaforge/envs/scipy-dev
scipy-dev-icx            /home/rgommers/mambaforge/envs/scipy-dev-icx
scipy-dev-jax            /home/rgommers/mambaforge/envs/scipy-dev-jax
scipy-dev-sparse         /home/rgommers/mambaforge/envs/scipy-dev-sparse
....

Everything in a single env may work, but is indeed likely to be fragile. No reason not to have a bunch of separate envs.

In a similar vein: I also have a couple of repo clones, so it's easy to compare between different branches or build configs.
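The edit / create / undo steps above could also be scripted so that `environment.yml` is never modified. The sketch below is one possible variant, not the workflow described in this comment: `write_env_variant` is a hypothetical helper, and it assumes the file has a top-level `name:` line and that the line right after `dependencies:` is the first list entry.

```python
from pathlib import Path


def write_env_variant(base_file, suffix, extra_deps, out_file):
    """Copy a conda environment file, renaming the env and adding dependencies."""
    lines = Path(base_file).read_text().splitlines()
    result = []
    for i, line in enumerate(lines):
        if line.startswith("name:"):
            # e.g. "scipy-dev" becomes "scipy-dev-jax"
            base_name = line.split(":", 1)[1].strip()
            line = f"name: {base_name}-{suffix}"
        result.append(line)
        if line.strip() == "dependencies:":
            # Reuse the indentation of the first existing list entry.
            nxt = lines[i + 1] if i + 1 < len(lines) else "  - "
            indent = nxt[: len(nxt) - len(nxt.lstrip())]
            result.extend(f"{indent}- {dep}" for dep in extra_deps)
    Path(out_file).write_text("\n".join(result) + "\n")
    return out_file
```

For example, `write_env_variant("environment.yml", "jax", ["jax"], "environment-jax.yml")` followed by `mamba env create -f environment-jax.yml` would give a separate `scipy-dev-jax` environment while leaving the tracked file untouched.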

dschmitz89 · 2024-04-23T10:37:20Z

Ok, so you also do not have the silver bullet ;). One env per array library sounds sensible. Not a great developer experience but unavoidable it seems. If someone with experience could add something to the developer docs, that would be great. GPU stuff could be left out in the beginning as that is a minefield we wanna avoid for some time.

lucascolley · 2024-04-23T10:49:08Z

I'm happy to tackle this properly in the summer (in terms of separate environments), but we should probably include instructions for venvs as well. Anyone else should feel free to write something before then. Maybe we'll even get the GPU drivers figured out (I may play around with wiping my machine / starting new containers from scratch like Andrew has talked about before).

The GPU stuff would probably be useful more widely for people who want to use a combination of CuPy/PyTorch/JAX, and I wouldn't be surprised if there are already efforts from people to document how to do it with a lot more knowledge than me.

ilayn · 2024-04-23T11:53:14Z

Also, have some mercy for the windows devs please. Most of the tools, in particular Jax has very little care for native environments. And I would really prefer that the array API tests are optional with a dev.py flag.

andyfaff · 2024-04-23T12:05:08Z

> Most of the tools, in particular Jax has very little care for native environments.

Interested to know more in that regard. E.g. Are there things that escape an environment?

rgommers · 2024-04-23T12:49:54Z

> And I would really prefer that the array API tests are optional with a dev.py flag.

They are, there's an --array-api-backend flag to dev.py test, and it requires setting an environment variable SCIPY_ARRAY_API to even enable any of this. It's also a single separate CI job only at this point.
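As a rough illustration of the opt-in described above, the sketch below builds the command line and environment for such a test run. The helper name is hypothetical, and two details are assumptions rather than something stated in this thread: that `SCIPY_ARRAY_API` is enabled by setting it to `"1"`, and which backend names `--array-api-backend` accepts.

```python
import os
import sys


def array_api_test_invocation(backend):
    """Return (argv, env) for running the test suite with one array API backend."""
    # Assumption: the value "1" switches array API support on.
    env = dict(os.environ, SCIPY_ARRAY_API="1")
    # The backend string is passed through unchanged; valid names are not checked here.
    argv = [sys.executable, "dev.py", "test", "--array-api-backend", backend]
    return argv, env
```

From the root of a SciPy clone, the result could be handed to `subprocess.run(argv, env=env, check=True)`. Without the environment variable, the regular NumPy-only test run is unaffected.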

ilayn · 2024-04-24T09:01:40Z

> Interested to know more in that regard. E.g. Are there things that escape an environment?

Nothing in particular; JAX on Windows is experimental and only available for CPU. I'm still a bit at a loss as to why certain things are not running on my laptop, but that's probably on me.

So if we decide to be a bit more ambitious with local GPU testing for devs, then Windows is out.

rgommers · 2024-04-29T09:25:24Z

We should probably also declare scipy.io as out of scope, same as scipy.datasets (xref #20594 (comment)). Given that the supported file formats are all living on disk and the I/O is inherently going through host memory, there's not really anything to do. Changing to a non-numpy array type can just as easily be done by the user with a function call like xp.asarray as with a keyword to scipy.io functions.
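A small sketch of the user-side conversion mentioned above: load on the host with `scipy.io`, then convert with `xp.asarray`. The helper `load_mat_as` and its `loader` parameter are hypothetical and exist only to keep the example self-contained; `xp` is assumed to be any namespace that provides `asarray` (for instance `cupy` or `torch`).

```python
def load_mat_as(path, xp, loader=None):
    """Load a MATLAB file into host memory, then convert each variable via xp.asarray."""
    if loader is None:
        # Imported lazily so the helper can be exercised without SciPy installed.
        from scipy.io import loadmat as loader
    contents = loader(path)
    # loadmat also returns metadata entries such as "__header__"; skip those.
    return {
        name: xp.asarray(value)
        for name, value in contents.items()
        if not name.startswith("__")
    }
```

This is the same amount of work for the user as a hypothetical keyword on the `scipy.io` functions would be, which is the argument for leaving `scipy.io` out of scope.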

Furthermore, I think it's time to finalize the decision on scipy.fftpack. It's legacy and we want everyone to use scipy.fft instead - so let's not touch fftpack at all.

lucascolley · 2024-04-29T09:45:07Z

> Furthermore, I think it's time to finalize the decision on scipy.fftpack. It's legacy and we want everyone to use scipy.fft instead - so let's not touch fftpack at all.

Agreed, I recently added "remove fftpack??" to the SciPy 2.0 wiki. IIUC the plan with legacy things is to break them out into their own repo and archive them?

rgommers · 2024-04-29T16:38:52Z

No such plan I think - legacy in particular means "not deprecated, we're keeping it but please don't use it for new code". Of course, once something has been in legacy for a long time and usage fades away, we are always free to reconsider and actually deprecate it. That'd be a new discussion though.

rgommers added labels Jul 12, 2023: enhancement (A new feature or improvement); array types (Items related to array API support and input array validation, see gh-18286)

rgommers mentioned this issue Jul 12, 2023

RFC: SciPy array types & libraries support #18286

Open

lucascolley mentioned this issue Jul 24, 2023

WIP: ENH: fft: support array API lucascolley/scipy#2

Closed

This was referenced Aug 1, 2023

MAINT: fft: rename test_numpy.py to test_basic.py #19000

Merged

ENH: fft: support array API standard #19005

Merged

This was referenced Aug 15, 2023

ENH: linalg: array library interoperability lucascolley/scipy#6

Closed

ENH: linalg: array library interoperability #19068

Open

lucascolley mentioned this issue Sep 19, 2023

ENH: linalg: support array API for standard extension functions #19260

Open

lucascolley mentioned this issue Oct 23, 2023

MAINT: Update fft.helper import #19426

Merged

This was referenced Dec 11, 2023

DOC: array types: mention partial support in special #19677

Merged

ENH: Using Array API standard for functions implemented using pure Python and NumPy API #15354

Closed

lucascolley mentioned this issue Mar 19, 2024

WIP: stats.gstd: add array API support #20285

Closed

mdhaber mentioned this issue Mar 20, 2024

ENH: stats.moment: add array API support #20292

Merged

lucascolley mentioned this issue Mar 20, 2024

ENH: array types: add JAX support #20085

Open

3 tasks

lucascolley mentioned this issue Apr 21, 2024

ENH: stats: add array API-support #20544

Open

69 tasks

j-bowhay mentioned this issue Apr 27, 2024

ENH: constants: add array api support #20593

Merged

j-bowhay pinned this issue Apr 27, 2024

lucascolley mentioned this issue Apr 27, 2024

ENH: datasets: array API standard support #20594

Closed

j-bowhay mentioned this issue Apr 28, 2024

ENH: stats.skewtest: add array-API support #20597

Merged

lucascolley mentioned this issue May 9, 2024

ENH: signal: add array API support #20678

Open

19 tasks
