Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: ENH: fft: support array API #2

Closed
wants to merge 93 commits into from
Closed

WIP: ENH: fft: support array API #2

wants to merge 93 commits into from

Conversation

lucascolley
Copy link
Owner

@lucascolley lucascolley commented Jul 18, 2023

Reference issue

RFC for array API support in SciPy: scipy#18286
Tracker for array API support in SciPy: scipy#18867
Merged PR adding machinery and covering scipy.cluster: scipy#18668

(This PR follows on from issue tupui#26)

What does this implement/fix?

This PR extends the array API support of scipy#18668 to also cover scipy.fft.

fft is an array API standard extension module and has matching APIs in both CuPy and PyTorch. The goal is to fully cover scipy.fft to conform to the array API standard, as specified in the array API standard documentation:

A conforming implementation of the array API standard must provide and support all the functions, arguments, data types, syntax, and semantics described in this specification.

A conforming implementation of the array API standard may provide additional values, objects, properties, data types, and functions beyond those described in this specification.

For more context on the standard and the goal of SciPy adoption, please see the RFC.

For explanation of the machinery of which this PR makes use, please see the cluster PR.

Changes:

  • The uarray dispatch structure of _basic.py has been moved to a new file, _basic_uarray.py, which is used for NumPy arrays and array-likes. All input arrays are now validated by array_namespace, so all valid input arrays either support the array API standard or are converted to NumPy arrays. The functions of _basic.py fall into two categories in their handling of non-NumPy arrays:
  1. Those which are part of the standard first raise exceptions if unsupported parameters are provided. They then check whether the array's namespace has an fft module. If it does, then that is used. If not then a conversion to NumPy is attempted in order to use uarray. If successful, the result is converted back to the same namespace as the input array. For an example, see fft.fft.
  2. For those which are not part of the standard, a conversion to NumPy is attempted in order to use uarray. If successful, the result is converted back to the same namespace as the input array. For an example, see fft.fft2. This means that GPU support is restricted to functions which are part of the standard for now, as otherwise they are hitting compiled code made for NumPy arrays. This design decision is explained in the RFC:
RFC excerpt (expandable)

When a non-NumPy array type sees compiled code in SciPy (which tends to use the NumPy C API), we have a couple of options:

  1. dispatch back to the other library (PyTorch, CuPy, etc.).

  2. convert to a NumPy array when possible (i.e., on CPU via the buffer protocol, DLPack, or array), use the compiled code in question, then convert back.

  3. don't support this at all

I'll note that (1) is the long-term goal; how to implement this is out of scope for this proposal - for more on that, see the Appendix section. For now we choose to do (2) when possible, and (3) otherwise. Switching from that approach to (1) in the future will be backwards-compatible.

  • The same design has been applied to _realtransforms.py and _realtransforms_uarray.py, save for the fact that there are no standard functions in these files.
  • fftfreq, rfftfreq, fftshift and ifftshift were previously just imported from numpy. They now have implementations in _helper.py, and follow the pattern for standard functions.
  • The NumPy backend of _fftlog.py has been moved to a new file, _fftlog_np.py. Its uarray dispatch structure remains in the file _fftlog_multimethods.py. The functions now follow the pattern for non-standard functions.
  • Tests for fftfreq etc. have been copied over from NumPy to test_helper.py.
  • Tests have been modified to be array API compatible where possible.
  • TestNamespaces has been added to test_basic.py and test_helper.py to check that output arrays are of the same namespace as input arrays.
  • test_non_standard_params has been added to test_basic.py to check that exceptions are raised for unsupported parameters.
  • A new helper has been added to conftest.py: set_assert_allclose(xp), which returns an assert_allclose equivalent for the namespace xp (currently supports cupy and torch), to allow testing on GPU.
  • Two new helpers have been added to _array_api.py: is_numpy(xp) which checks whether xp.__name__ is "scipy._lib.array_api_compat.array_api_compat.numpy", and _assert_matching_namespace by Tyler Reddy.

Additional information

To-do:

Difficulties:

  • Some errors have come up where the floating point precision hasn't quite been good enough. If somebody could check that any introductions/modifications of rtol are appropriate, that would be greatly appreciated.
  • test_basic.py::test_multiprocess is currently skipped for array API since it hangs for any array API library (even numpy.array_api). Generally, multithreading tests (including those using the unsupported workers keyword in test_multithreading.py are skipped apart from TestFFTThreadSafe in test_basic.py.
  • @skip_if_array_api_backend and @pytest.mark.parametrize don't seem to work together, raising exceptions for the backends which aren't skipped, no matter which decorator comes first. If anyone could get this working, that would be great!
  • @skip_if_array_api_gpu currently skips on every backend when python dev.py test -b all is used, due to a fault in its logic. This is not critical since the tests can be run separately for each backend, but this is unintended behaviour which should be fixed.

Implementation Details:

  • Any tests of non-standard functions (which involve attempted conversion to NumPy upon reaching compiled code) are skipped on GPU backends.
  • The signatures for fftfreq and rfftfreq have changed to (n, d=1.0, *, xp=None, device=None). The addition of device is to match the signature in the standard. The addition of xp is to specify the namespace for the output array. This is required since these are "array-generating functions" which output an array without taking one as input. Since array-api-compat and numpy.array_api are yet to implement fft, these functions and their tests currently require workarounds/pytest.xfail's. A lot of cleanup will be possible at a later date.

    scipy/scipy/fft/_helper.py

    Lines 147 to 152 in 2cb6268

    xp = np if xp is None else xp
    # numpy does not yet support the `device` keyword
    # `xp.__name__ != 'numpy'` should be removed when numpy is compatible
    if hasattr(xp, 'fft') and xp.__name__ != 'numpy':
    return xp.fft.fftfreq(n, d=d, device=device)
    return np.fft.fftfreq(n, d=d)
    if xp.__name__ == 'numpy.array_api':
    pytest.xfail("`device` keyword has not yet been implemented\
    in `numpy.array_api.fft.fftfreq`")
    elif xp.__name__ == 'cupy':
    pytest.xfail("`device` keyword has not yet been implemented\
    in `array_api_compat.cupy.fft.fftfreq`")
    These implementations will close fftfreq uses numpy module as example scipy/scipy#13887.
    Perhaps a warning would be good to state that a NumPy array is returned unless the xp parameter is provided?
  • test_basic.py::test_fft_with_order is skipped since orders are not supported in the array API. Likewise, test_identity_1d_overwrite and test_identity_nd_overwrite in test_real_transforms.py are skipped since the overwrite_x keyword is not supported.
  • Some of the tests in test_basic.py::TestFFT1D and test_helper.py::TestFFTShift are failing on torch backend due to differences in the function signatures, but they will pass once array-api-compat has implemented torch.fft. Currently either @skip_if_array_api_backend('torch') or pytest.xfail is used.
  • Justification for why we have set_assert_allclose instead of just converting to NumPy and using assert_allclose: to_device() -- any way to force back to host "portably?" data-apis/array-api#626. Additional dtype casts are needed for torch.assert_close e.g. in test_float32 (a part of the previous test_dtypes) and in test_basic.py::TestFFTThreadSafe.

Follow-up:

  • assert_raises(ValueError, func, x, overwrite_x=True)
    assert_raises(ValueError, func, x, workers=2)
    # `plan` param is not tested since SciPy does not use it currently
    # but should be tested if it comes into use
  • Docstrings need to be updated for the array API. This includes the new keywords for fftfreq and rfftfreq.
  • The device keyword of fftfreq and rfftfreq are not currently tested since the keyword only works with PyTorch currently. But this should probably be tested once more array libraries support it.
  • We may want to rename _fftlog_multimethods.py to _fftlog_uarray.py. Currently omitted to keep the diff clean.
  • Functions outside of the array API like fft2 could be rewritten in terms of the standard functions in order to avoid conversion to NumPy, thus allowing them to run on GPU.

Many thanks to Pamphile, Irwin and Ralf who have given really helpful guidance in the making of this PR!

References:

@lucascolley lucascolley self-assigned this Jul 20, 2023
@lucascolley lucascolley added the enhancement New feature or request label Jul 20, 2023
scipy/_lib/_testutils.py Outdated Show resolved Hide resolved
scipy/fft/_basic.py Show resolved Hide resolved
scipy/fft/_basic.py Show resolved Hide resolved
scipy/fft/_helper.py Outdated Show resolved Hide resolved
scipy/fft/tests/test_basic.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lucascolley 🚀 This is moving good.

I am thinking about the real PR you are going to open. I anticipate that the main critic you will get will be around the refactoring (move/rename) part. Ideally, the move part would be done in a single commit, the first one. This way we can tell maintainers to review this commit separately from the rest (in GitHub you can filter the commits you are looking at and here it will greatly help.)

scipy/_lib/_testutils.py Outdated Show resolved Hide resolved
scipy/fft/tests/test_basic.py Outdated Show resolved Hide resolved
scipy/fft/tests/test_basic.py Show resolved Hide resolved
scipy/fft/tests/test_basic.py Outdated Show resolved Hide resolved
scipy/fft/_helper.py Outdated Show resolved Hide resolved
@rgommers
Copy link

rgommers commented Aug 1, 2023

I am thinking about the real PR you are going to open. I anticipate that the main critic you will get will be around the refactoring (move/rename) part. Ideally, the move part would be done in a single commit, the first one. This way we can tell maintainers to review this commit separately from the rest (in GitHub you can filter the commits you are looking at and here it will greatly help.)

I am likely missing a reason and/or previous discussion here, but: why are these large code moves / file renames necessary? It seems to me like it's splitting out the uarray dispatching code, however the docstrings for example should not be moved as far as I can tell. E.g. for the fft function:

  • it's imported in __init__.py from ._basic (not ._basic_np),
  • hence the docstring for the function should remain in _basic.py,
  • the For non-numpy arrays, this ... docstring is more a code comment than a docstring,
  • _np_basic.fft returns a dispatchable, that's actually deserving of code comments (maybe at the top of _basic_np.py?) - it hits _ScipyBackend in _backend.py which actually forwards to the fft._pocketfft extension module which contains the actual implementation
  • so once the docstrings stay, perhaps the _basic_np.py file should be named _basic_uarray.py instead, since all it will then contain is a thin layer of uarray dispatchers.

To verify the docstring behavior:

$ python dev.py ipython
>>> from scipy import fft
>>> fft.fft?

Also a minor comment on code style: try to not do import ... as ... unless you can avoid it. E.g., in from . import _basic_np as npbasic just remove the as npbasic and the code will be easier to read.

@rgommers
Copy link

rgommers commented Aug 1, 2023

Great, this now looks much easier to read and review, and I like that the _basic_uarray.py has been boiled down to its essence.

@rgommers
Copy link

rgommers commented Aug 2, 2023

@lucascolley in every *_uarray.py file, I suggest putting the following comment right below the import statements:

# The functions in this file decorated with `@_dispatch` will only be called
# from the correspondingly named public functions with NumPy arrays or
# array-likes, not with CuPy, PyTorch and other array API standard supporting
# objects. See the docstrings in `_backend.py` for more details and examples.

Let's leave out the longer comment in _backend.py for now, the examples are fairly clear there, so this should already help enough.

@lucascolley
Copy link
Owner Author

Closing now as the full PR has been opened on the main repo: scipy#19005

@lucascolley lucascolley closed this Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fftfreq uses numpy module as example
4 participants