BUG: csr_array(int()) errors #20128

Closed
ilan-gold opened this issue Feb 21, 2024 · 20 comments · Fixed by #20490
Labels
array types (items related to array API support and input array validation, see gh-18286), defect (a clear bug or issue that prevents SciPy from being installed or used as expected), scipy.sparse
Comments

@ilan-gold
Contributor

Describe your issue.

It seems like the described behavior does not work, while np.array(int()) does seem to work. This comes from the array API test suite setup, so presumably it should work, and indeed it does for csr_matrix. I think the error is recent (or only on main), because it does not occur in 1.11.4. I know there has been a lot of work around 1-D support, so sorry if this is expected! Happy to fix it if it's a real bug!

Reproducing Code Example

from scipy import sparse
import numpy as np

sparse.csr_array(int()) # errors
sparse.csr_matrix(int()) # does not error
np.array(int()) # does not error

Error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ilangold/Projects/Theis/scipy/scipy/sparse/_compressed.py", line 86, in __init__
    self._coo_container(arg1, dtype=dtype)
  File "/Users/ilangold/Projects/Theis/scipy/scipy/sparse/_coo.py", line 81, in __init__
    self._shape = check_shape(M.shape, allow_1d=is_array)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/Projects/Theis/scipy/scipy/sparse/_sputils.py", line 317, in check_shape
    raise TypeError("function missing 1 required positional argument: "
TypeError: function missing 1 required positional argument: 'shape'

SciPy/NumPy/Python version and system information

1.13.0.dev0+1365.cd9d20b 1.26.4 sys.version_info(major=3, minor=11, micro=8, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /Users/ilangold/micromamba/envs/scipy-dev-array-api/include
    lib directory: /Users/ilangold/micromamba/envs/scipy-dev-array-api/lib
    name: openblas
    openblas configuration: USE_64BITINT=0 DYNAMIC_ARCH=0 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK=0 NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP=1 VORTEX MAX_THREADS=128
    pc file directory: /Users/ilangold/micromamba/envs/scipy-dev-array-api/lib/pkgconfig
    version: 0.3.26
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /Users/ilangold/micromamba/envs/scipy-dev-array-api/include
    lib directory: /Users/ilangold/micromamba/envs/scipy-dev-array-api/lib
    name: openblas
    openblas configuration: USE_64BITINT=0 DYNAMIC_ARCH=0 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK=0 NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP=1 VORTEX MAX_THREADS=128
    pc file directory: /Users/ilangold/micromamba/envs/scipy-dev-array-api/lib/pkgconfig
    version: 0.3.26
  pybind11:
    detection method: pkgconfig
    include directory: /Users/ilangold/micromamba/envs/scipy-dev-array-api/include
    name: pybind11
    version: 2.11.1
Compilers:
  c:
    args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem,
      /Users/ilangold/micromamba/envs/scipy-dev/include, -D_FORTIFY_SOURCE=2, -isystem,
      /Users/ilangold/micromamba/envs/scipy-dev/include
    commands: arm64-apple-darwin20.0.0-clang
    linker: ld64
    linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/ilangold/micromamba/envs/scipy-dev/lib,
      -L/Users/ilangold/micromamba/envs/scipy-dev/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong,
      -O2, -pipe, -isystem, /Users/ilangold/micromamba/envs/scipy-dev/include, -D_FORTIFY_SOURCE=2,
      -isystem, /Users/ilangold/micromamba/envs/scipy-dev/include
    name: clang
    version: 16.0.6
  c++:
    args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++,
      -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/ilangold/micromamba/envs/scipy-dev/include,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/ilangold/micromamba/envs/scipy-dev/include
    commands: arm64-apple-darwin20.0.0-clang++
    linker: ld64
    linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/ilangold/micromamba/envs/scipy-dev/lib,
      -L/Users/ilangold/micromamba/envs/scipy-dev/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong,
      -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0,
      -isystem, /Users/ilangold/micromamba/envs/scipy-dev/include, -D_FORTIFY_SOURCE=2,
      -isystem, /Users/ilangold/micromamba/envs/scipy-dev/include
    name: clang
    version: 16.0.6
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.8
  fortran:
    args: -march=armv8.3-a, -ftree-vectorize, -fPIC, -fno-stack-protector, -O2, -pipe,
      -isystem, /Users/ilangold/micromamba/envs/scipy-dev/include
    commands: /Users/ilangold/micromamba/envs/scipy-dev/bin/arm64-apple-darwin20.0.0-gfortran
    linker: ld64
    linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/ilangold/micromamba/envs/scipy-dev/lib,
      -L/Users/ilangold/micromamba/envs/scipy-dev/lib, -march=armv8.3-a, -ftree-vectorize,
      -fPIC, -fno-stack-protector, -O2, -pipe, -isystem, /Users/ilangold/micromamba/envs/scipy-dev/include
    name: gcc
    version: 12.3.0
  pythran:
    include directory: ../../../../micromamba/envs/scipy-dev-array-api/lib/python3.11/site-packages/pythran
    version: 0.15.0
Machine Information:
  build:
    cpu: aarch64
    endian: little
    family: aarch64
    system: darwin
  cross-compiled: false
  host:
    cpu: aarch64
    endian: little
    family: aarch64
    system: darwin
Python Information:
  path: /Users/ilangold/micromamba/envs/scipy-dev-array-api/bin/python3.11
  version: '3.11'
@ilan-gold added the defect label on Feb 21, 2024
@ilan-gold
Contributor Author

Just checked 1.12.0 as well, and it does not error there either.

@lucascolley added the scipy.sparse and array types labels on Feb 21, 2024
@lucascolley
Member

sparse.csr_array(int()) is equivalent to sparse.csr_array(np.array(0)). So I suppose the question is whether that should be allowed.
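
For illustration, a minimal sketch (plain numpy only) of why the two spellings are equivalent: int() is just the scalar 0, which numpy coerces to a 0-D array of shape ().

import numpy as np

a = np.asarray(int())        # int() is the scalar 0; numpy coerces it to a 0-D array
b = np.array(0)
print(a.shape, a.ndim)       # -> () 0
print(np.array_equal(a, b))  # -> True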

@dschult
Contributor

dschult commented Feb 21, 2024

We don't support 0-D sparse arrays. So I think this slipped through our shape checking changes.
That will still not make it conform to array_api, but it will be a different error message.

I'm not sure what a sparse version of a 0-D array would look like. Probably raising is the right thing to do even if that breaks array_api. @ilan-gold what do you think should be returned in this case?

@ilan-gold
Contributor Author

@dschult Let me do some fiddling and inspection. This isn't even in one of the array API tests themselves; it's just in the lead-up to starting the tests. So maybe it's not even in there (although I somewhat doubt this).

We don't support 0-D sparse arrays.

Is there a design document or the like I could look at as a reference for this sort of thing?

@rgommers
Member

Indexing a 2-D CSR array yields a numpy scalar:

>>> csr_2d[1, 1]
np.float64(1.0)

That is the same behavior as numpy, and given that numpy scalars duck type as 0-D arrays, this is okay. So there shouldn't be an urgent need for 0-D sparse arrays I'd think.
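
For context, a minimal sketch reproducing that behavior; csr_2d is not defined in the comment, so here it is just a small identity-matrix example:

import numpy as np
from scipy import sparse

csr_2d = sparse.csr_array(np.eye(2))  # a small 2-D CSR array
element = csr_2d[1, 1]                # indexing one element returns a numpy scalar
print(element, type(element))         # -> 1.0 <class 'numpy.float64'>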

That said, if 1-D sparse arrays are a thing, I don't see why 0-D sparse arrays shouldn't be a thing also. It's probably more a "is it worth it" question.

@ilan-gold
Contributor Author

In terms of "is it worth it" (since we will be getting 1-D sparse arrays, from what I understand), I would think maybe yes? Beyond the goal of matching numpy's behavior (which is to return an array with shape ()), this behavior is "officially" tested in the array API test suite. However, I don't see anything official in the spec. Could you point me to it if it is there?

I think the overriding factor should be the array API's demands, and then secondarily we can think about numpy.

@rgommers
Member

However, I don't see anything officially in the spec. Could you point me to it if it is there?

From https://data-apis.org/array-api/draft/API_specification/array_object.html: "Furthermore, a conforming implementation of the array API standard must support, at minimum, array objects of rank (i.e., number of dimensions) 0, 1, 2, 3, and 4 and must explicitly document their maximum supported rank N. ... Conforming implementations must support zero-dimensional arrays."

It is pretty weird indeed to return numpy scalars (i.e., a more-or-less 0-D array from another library) for a standard operation like indexing a 1-D or 2-D array.
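
A minimal sketch of that distinction in plain numpy: indexing returns a scalar object, whereas a true 0-D array has shape () and ndim 0.

import numpy as np

x = np.arange(3.0)
elem = x[1]                # numpy returns a scalar (np.float64), not an array
zero_d = np.asarray(elem)  # an actual 0-D array: shape (), ndim 0
print(type(elem).__name__, zero_d.shape, zero_d.ndim)  # -> float64 () 0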

@ilan-gold
Contributor Author

Thanks! So it seems we do indeed need this not to error for the array API to work. And more generally, beyond being able to construct these arrays as this issue describes, the values returned by indexing scipy sparse arrays would then also need to be of this 0-D type.

@ilan-gold
Contributor Author

This is somewhat related to #19919 as well. The array api test suite actually provides a nice barrage of tests by which we could try to standardize some of this behavior. I will do some more digging to formalize a list or at least some more concrete points of what exactly is needed/missing.

But that is separate from this issue, which I think is focused on construction, not return types.

@lucascolley
Member

So it seems we do indeed need this behavior to not error for array api to work then

Note the discussion in gh-18915; it seems unlikely that we'll reach a point where scipy.sparse arrays will "work" with the array API standard. (Maybe you just mean 'work, as far as we can with what is in scope'.)

@ilan-gold
Contributor Author

ilan-gold commented Feb 22, 2024

@lucascolley I agree, but it just seemed that this case was special given that covering it is apparently needed just to get the tests to run at all. Otherwise I don't think I would have posted it.

One thing I can see coming out of thinking about this would be some level of customization for the hypothesis tests in the array API test suite. This would allow us to pass the tests we want to, while being clear about what we don't support. I am not really sure it makes sense to have a testing suite where some things pass and others error, which we then have to track by hand somehow. I imagine we will not be the only ones with this issue. Then again, tracking by hand isn't the worst thing, although it would not let us customize, for example, running only 2-D tests. So we might want __getitem__ to pass for 2-D but not 3-D.

@lucascolley
Member

One thing I can see as a result of thinking about this would be some level of customization for hypothesis tests in the array api test suite. This would allow us to pass tests we want to, while being clear about what we don't support.

Have you seen https://github.com/numpy/numpy/blob/main/tools/ci/array-api-skips.txt?

@ilan-gold
Contributor Author

Oh very cool @lucascolley thanks! I had not!

@dschult
Contributor

dschult commented Feb 23, 2024

Additional tests (and maybe tests of what we don't support) would naturally go into the scipy.sparse module, in scipy/sparse/tests/test_array_api.py.

@lucascolley
Member

NumPy just installs array-api-tests in a CI job and passes in the list of skips: https://github.com/numpy/numpy/blob/18c13458d544d9b935b63e7b495a346ec14f5468/.github/workflows/linux.yml#L212-L249

@dschult
Contributor

dschult commented Feb 23, 2024

I'm thinking about what a 0-D sparse array would look like. I think we would just want to make numpy 0-D ndarrays be the sparse 0-D arrays. They aren't identified as scipy.sparse in any way, but is that even needed? I guess what I mean is: to support the array API we can combine array types with other libraries, right? (There is already at least one place where we return a dense ndarray for n > 2. I can't quite recall, but I think it is after multiplying a sparse array by an n-D dense array.)

@ilan-gold
Contributor Author

ilan-gold commented Mar 26, 2024

@dschult After our discussion I did some work on this. I am not sure I am actually that comfortable just sort of saying "np arrays are 0d arrays". I have a branch for this and I'll make a PR, but given the low turnout at the last WG meeting, I'll just keep my branch (which makes sparse.csr_array(int()) return a 1x1 sparse array) and then we can discuss at the next one. I do think np.array would be easiest, but I am not sure it makes programming sense (i.e., you declare a class and then just get a completely unrelated one back).

@dschult
Contributor

dschult commented Mar 26, 2024

I guess you mean: you're not comfortable saying "np scalars are 0d arrays".

The option of sparse.csr_array(int()) -> 1x1 sparse array matches what the sparse matrix classes do: treating 0-D input as 2-D by default. But I thought that the array API says that array libraries should provide a 0-D option.

Can you show us the disadvantages of using 0-D numpy arrays? You mention that it doesn't make sense to "declare a class and then just get a completely unrelated one" -- but these are certainly not completely unrelated, and that objection doesn't seem to fit with the array API ethos. I'm not saying we really should use numpy 0-D arrays for scipy sparse; I'm trying to figure out what the disadvantages and advantages of that approach are in terms of the array API, and in terms of users not having to change their code much when switching libraries (which I think of as the driver of the array API approach).
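
For reference, a minimal sketch of the existing csr_matrix behavior mentioned above (scalar input accepted and treated as a 1x1, i.e. 2-D, matrix); this only illustrates current behavior, not any proposed csr_array change:

from scipy import sparse

m = sparse.csr_matrix(int())  # scalar (0-D) input is treated as a 1x1 matrix
print(m.shape, m.ndim)        # -> (1, 1) 2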

@ilan-gold
Contributor Author

ilan-gold commented Mar 27, 2024

But I thought that the array-api says that array libraries should provide a 0d option.

I think we are not going to support everything in the array API anyway, but since this is needed to get the tests running, we need to return something that has a dtype attribute.

For advantages/disadvantages, I am not sure there's really a disadvantage of using a numpy array besides the semantics -- they are not completely unrelated, but returning a class from a different package out of the constructor is just kind of strange to me. It is true that we'll need to "support" numpy arrays as a returned class anyway (as far as I can tell) for mixed ops that might produce a dense array.

It seems to me that a 1x1 sparse array would be nice, with the disadvantage being that it might make more sense to implement 0-D sparse arrays if/when 1-D arrays are merged, in which case this doesn't matter so much. But I don't think that's a good idea: as @ivirshup pointed out, despite the 1-D array work, these are fundamentally 2-D classes... So I'm still in favor of a 1x1 sparse array.

@ilan-gold
Contributor Author

Decision: throw an error uniformly across all sparse array classes.
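
A minimal sketch of what that decision means for callers; the exact exception type and message depend on the linked fix, so the broad except clause here is only illustrative:

from scipy import sparse

try:
    sparse.csr_array(int())  # scalar (0-D) input is rejected by the sparse array classes
except (TypeError, ValueError) as exc:
    print(f"rejected scalar input: {exc}")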

5 participants: @rgommers, @dschult, @ilan-gold, @lucascolley, and others