Backwards incompatible ideas for a major release

Matti Picus edited this page Nov 23, 2022 · 26 revisions

This is a collection of ideas for changes that would break backwards compatibility and be inappropriate for anything but a major release. If/when we do make a major release, we can then go through this list and see what can be taken along.

Not all of these ideas strictly require a major release.

It would be nice to add your name to a change, which signals that you will do the work to implement it.

Python APIs

  • Make the default integer type int64, at least on Python 3, no matter what long is on the system (from @seberg). (Making the default integer equal to intp may be the simpler option, since it is at least predictable and easier to reason about.)
  • Overhaul the casting rules to avoid things like uint64 + int64 -> float64. Perhaps use "C-like" casting instead. See https://github.com/numpy/numpy/issues/12525.
  • Stop making the casting rules for arithmetic value-dependent for scalars (https://github.com/numpy/numpy/issues/6240).
  • result_type(int, str) should be object, not str. (seberg: Or shouldn't it simply raise an error, i.e. never implicitly go to object? – In which case, is simple deprecation viable?)
  • Require explicitly writing dtype=object to get an object dtype array.
  • Find a solution to issues created by PyArray_Return: That is, most numpy functions, importantly ufuncs, convert 0-D array results to scalars when returning. This could be a breaking change returning arrays always, or more complex solutions. Possible steps forward that do not require breakage (immediately) are discussed in https://github.com/numpy/numpy/issues/13105.
  • Make the ufunc out argument force a higher precision loop (maybe possible without a major version increase?). https://mail.python.org/pipermail/numpy-discussion/2019-September/080106.html
  • Some APIs (probably only loadtxt/genfromtxt) default to "byte string" behaviour for backwards compatibility. It would be nice to remove that and switch the default.
  • Get rid of long double / float96 / float128 completely. This is a very cumbersome alias on macOS, Windows and Linux-aarch64. And on Linux-x86 it's 80-bit. The time spent on long doubles is not worth it.
  • np.logical_or.reduce(), etc. (or, more specifically and perhaps less controversially, just any/all) should probably return booleans by default. That is, by default, do not try to imitate Python's logical or operator.
  • Clean up the namespaces, underlined exposed functions, and aliases.
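
To make the first two items above concrete, here is a minimal sketch that uses only the Python standard library, so it does not depend on the behaviour of any particular NumPy version. It assumes that ctypes.c_ssize_t has the same width as intp (true on mainstream platforms) and that a Python float is a C double, i.e. the same format as float64. The helper name fits_in_float64 is made up for this illustration.

```python
import ctypes

# Width in bits of the C types on the current platform.
# C long is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux/macOS,
# which is why a default integer tied to long differs between systems.
long_bits = 8 * ctypes.sizeof(ctypes.c_long)
# Assumption: ssize_t has the same width as NumPy's intp.
intp_bits = 8 * ctypes.sizeof(ctypes.c_ssize_t)


def fits_in_float64(n: int) -> bool:
    """Return True if the integer n survives a round trip through a C double."""
    return int(float(n)) == n


print("C long:", long_bits, "bits; ssize_t:", intp_bits, "bits")

# A double has a 53-bit significand, so it cannot hold every 64-bit integer.
print(fits_in_float64(2**53))      # exactly representable
print(fits_in_float64(2**53 + 1))  # rounds to a neighbouring value
print(fits_in_float64(2**64 - 1))  # largest uint64 value, also rounds
```

The last two checks show why promoting uint64 + int64 to float64 is lossy: the result type can represent the magnitude of every input, but not every individual value.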

C APIs

  • Extend the ndarray struct in order to speed up and clean up buffer handling. We did this already.
  • Implement bf_releasebuffer on ndarray. This was never done, because it breaks compatibility with the "s", etc. parsing codes for the PyArg_Parse* API. However, maybe this break is more acceptable now and easier with a major release. Further, scalars have their own code paths now, so the amount of affected code may be smaller.
  • Delete the sigint header and related functions (technically an ABI and API break, but a loud one, and probably nobody would notice).
  • Removing NPY_CHAR (see https://github.com/numpy/numpy/issues/2801 and linked PRs/issues)
  • Dtype cleanup ideas (see https://github.com/numpy/numpy/issues/2899)
  • Make the PyArray_Descr and PyUfunc_Object structs opaque like we did with PyArray_Object, extracting PyArray_Descr_Fields etc - this allows us to make API changes more easily later.
  • Modify NPY_SORTKIND to allow different sorting algorithms (timsort, radixsort). This requires a change in the size of PyArray_ArrFuncs. See https://github.com/numpy/numpy/pull/12586.
  • Increase NPY_MAXARGS to more than 32, see https://github.com/numpy/numpy/issues/4398.
  • Implement radixsort once the sort ABI can be changed, see https://github.com/numpy/numpy/pull/12586.
  • Remove the promise to handle NULL as Py_None for object arrays (we do not use this, and it crashes hard, so we could probably do it without a major release as well). The one exception that is probably necessary is uninitialized or cleared data. To simplify buffer initialization and clearing, it is easier to set the buffer to NULL initially (and to NULL again upon clearing). That means writing to an object buffer/array must use Py_XSETREF. However, reading from an array/buffer would be allowed to assume NULL is not possible. A buffer has to be cleared (to avoid double clearing) using Py_XSETREF(ptr, NULL).

"Recompile the world release" (Break C-ABI)

  • The elsize slot of PyArray_Descr should be npy_intp or ssize_t and not integer.
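
As a rough illustration of what the wider type would buy, the sketch below compares the largest value a C int can hold with the largest value an ssize_t can hold on the current platform. It uses only the standard library and assumes two's-complement integers without padding bits; the helper max_signed is hypothetical and not part of any NumPy API.

```python
import ctypes


def max_signed(ctype) -> int:
    """Largest value of a signed C integer type (assumes two's complement)."""
    bits = 8 * ctypes.sizeof(ctype)
    return 2 ** (bits - 1) - 1


int_limit = max_signed(ctypes.c_int)        # upper bound for an int-sized slot
ssize_limit = max_signed(ctypes.c_ssize_t)  # upper bound for an ssize_t-sized slot

print("largest element size with int:    ", int_limit)
print("largest element size with ssize_t:", ssize_limit)
```

On typical 64-bit platforms the first limit is just under 2 GiB, while the second matches the largest size a single allocation can have.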