ndarray.astype typing ignores the passed dtype #19716

Closed
gwerbin opened this issue Aug 19, 2021 · 7 comments · Fixed by #19745

Comments

@gwerbin

gwerbin commented Aug 19, 2021

Mypy is not aware that changing the dtype of an array changes its annotated type, even if the dtype is passed as a literal np.dtype[np.float64], np.dtype[np.float32], etc.

I imagine that this kind of type inference could be very difficult in general, but it would be nice if at least this special case worked.

Also - the parameters of np.ndarray don't appear to be documented anywhere. I kind of just guessed that it was something like tuple[int], where the length of the tuple represented the shape, but honestly I have no idea if this is right. It would be great if this was stated in the docs somewhere!

Reproducing code example:

import numpy as np

Vec64 = np.ndarray[tuple[int], np.dtype[np.float64]]
Vec32 = np.ndarray[tuple[int], np.dtype[np.float32]]

def convert(data: Vec64) -> Vec32:
    result = data.astype(np.float32)
    reveal_type(result)
    return result

x = np.array([1, 2, 3], dtype=np.float64)
y = convert(x)

Error message:

Mypy output:

% mypy mypy_numpy_test.py
mypy_numpy_test.py:8: note: Revealed type is "numpy.ndarray*[Tuple[builtins.int], numpy.dtype[numpy.floating[numpy.typing._64Bit]]]"
mypy_numpy_test.py:9: error: Incompatible return value type (got "ndarray[Tuple[int], dtype[floating[_64Bit]]]", expected "ndarray[Tuple[int], dtype[floating[_32Bit]]]")
Found 1 error in 1 file (checked 1 source file)

NumPy/Python version information:

In [3]: import sys

In [4]: print(sys.version)
3.9.6 (default, Aug  3 2021, 19:43:34)
[Clang 10.0.1 (clang-1001.0.46.4)]

In [5]: import numpy

In [6]: print(numpy.__version__)
1.21.2

In [7]: !mypy --version
mypy 0.910

mypy.ini:

[run]
strict = true

[mypy]
plugins =
	numpy.typing.mypy_plugin,
	classes.contrib.mypy.classes_plugin,
	returns.contrib.mypy.returns_plugin
@ShreyPatel4

Hey, I would like to solve this issue. Can you guide me?

BvB93 linked a pull request on Aug 24, 2021 that will close this issue
BvB93 changed the title "Mypy plugin doesn't understand ndarray.astype" to "ndarray.astype typing ignores the passed dtype" on Aug 24, 2021
BvB93 added this to the 1.21.3 release milestone on Aug 24, 2021
@BvB93
Member

BvB93 commented Aug 24, 2021

Ah yes, the issue here is that prior to #19140, incorrect TypeVar usage caused the passed dtype to be ignored.
This was already fixed on the main branch, but I suppose we can backport it as well.

> Also - the parameters of np.ndarray don't appear to be documented anywhere. I kind of just guessed that it was something like tuple[int], where the length of the tuple represented the shape, but honestly I have no idea if this is right. It would be great if this was stated in the docs somewhere!

True, there is some information about it in the npt.NDArray docs, but there is definitely room for improvement here;
I would advise you to create a dedicated issue for this, though.

As for the parameters: I'd strongly recommend keeping the shape-type as Any for now, as it is very much a placeholder slot until we can properly get shape typing going once PEP 646 is live. Until then, use anything else at your own risk (or just use the more compact npt.NDArray).
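A minimal sketch of the original repro rewritten along those lines: the shape slot stays Any and only the dtype is constrained. It assumes NumPy 1.22 or later (where np.ndarray and np.dtype can be subscripted at runtime) and a release whose stubs include the astype fix; Vec64, Vec32 and convert are the names from the original report.

```python
from typing import Any

import numpy as np
import numpy.typing as npt

# Longhand form: shape slot left as Any, dtype slot pinned to float64.
Vec64 = np.ndarray[Any, np.dtype[np.float64]]
# Compact form via numpy.typing; NDArray fills in the shape slot itself.
Vec32 = npt.NDArray[np.float32]


def convert(data: Vec64) -> Vec32:
    # With stubs that honour the passed dtype, a type checker should see
    # this astype call as producing a float32 array, so the return checks.
    return data.astype(np.float32)


x = np.array([1, 2, 3], dtype=np.float64)
y = convert(x)
```

At runtime the conversion behaves the same on any NumPy version; only the static types differ between the pre-fix and post-fix stubs.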

@gwerbin
Author

gwerbin commented Aug 24, 2021

> As for the parameters: I'd strongly recommend keeping the shape-type as Any for now, as it is very much a placeholder slot until we can properly get shape typing going once PEP 646 is live. Until that time use anything else at your own risk (or just use the more compact npt.NDArray).

Interesting, thanks.

Ideally I would want to be able to specify something like:

  • Any shape as long as it has length 2 (i.e. any 2-d array)
  • Any dtype with the prefix 'f'

I'm sure this was discussed at great length already, but are these use cases covered in the proposed syntax under PEP 646? I would love to read more about the future roadmap for this, and hopefully participate in some constructive fashion.

I also know that there are problems with statically type-checking exact array dimensions. I've been implementing a NumPy-like interface in Idris 2, for example, and I quickly ran into situations where I not only needed the dependent-type features but also some non-trivial goal rewriting, which seems way beyond the scope of what a Mypy plugin can do. With fixed-size arrays at the type level, even something as mundane as np.column_stack has to be dependently typed!

@BvB93
Member

BvB93 commented Aug 26, 2021

> • Any shape as long as it has length 2 (i.e. any 2-d array)

So PEP 646, which introduces variadic generics, should allow us to implement a shape-typing system. I'd recommend taking a look at its Summary Examples section, which contains a few examples of array-shape typing (covering both dimensionality and exact shapes).

> • Any dtype with the prefix 'f'

Right, there is currently an issue about implementing dtype-typing support (#19252). In short, there are four general categories of dtypes that we'll have to be able to parse:

  • Dtype-likes parameterizable w.r.t. np.generic, e.g. dtype=np.float64. These are by far the easiest to implement, and I hope to get most (all?) of it done for the 1.22 release.
  • Builtin scalar types, e.g. dtype=float. These are a bit trickier, but in principle they should be rather straightforward if we can make them parameterizable w.r.t. np.generic (for example by adding an np.generic-returning method with the help of a mypy plugin).
  • Literal strings, e.g. dtype="f8". These are much trickier, and quite frankly I can't think of any way of adding support here beyond adding a truckload of overloads. This is unmaintainable and in some cases downright impossible (see the flexible and time-related dtypes).
  • Structured dtypes based on more miscellaneous objects, e.g. dtype=[("field1", np.float64), ("field2", np.int64)]. These face the same challenge as literal strings.
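To illustrate why the string case scales badly, here is a sketch of the overload approach for a hypothetical helper as_dtype (not part of NumPy or its stubs). A single generic overload covers every np.generic subclass, while every string spelling needs its own Literal entry, and that list keeps growing with aliases, byte orders, and length-parameterized flexible dtypes such as "U10".

```python
from __future__ import annotations

from typing import Any, Literal, TypeVar, overload

import numpy as np

_SCT = TypeVar("_SCT", bound=np.generic)


# Category 1: one generic overload covers every np.generic subclass.
@overload
def as_dtype(spec: type[_SCT]) -> np.dtype[_SCT]: ...
# Category 3: each string spelling has to be enumerated by hand.
@overload
def as_dtype(spec: Literal["f8", "float64"]) -> np.dtype[np.float64]: ...
@overload
def as_dtype(spec: Literal["f4", "float32"]) -> np.dtype[np.float32]: ...
# Anything not listed falls back to an unparameterized dtype.
@overload
def as_dtype(spec: str) -> np.dtype[Any]: ...


def as_dtype(spec: Any) -> np.dtype[Any]:
    # At runtime NumPy already understands all of these spellings;
    # the overloads above only affect what a type checker infers.
    return np.dtype(spec)
```

The runtime implementation is one line; all the maintenance burden sits in the overload list, which is the problem described above.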

@BvB93
Member

BvB93 commented Aug 26, 2021

Closed by #19745.

BvB93 closed this as completed on Aug 26, 2021
@gwerbin
Author

gwerbin commented Aug 26, 2021

Thanks again @BvB93. So if I wanted to support "any float dtype" I should do something like this?

from typing import TypeVar
import numpy as np
from numpy.typing import NDArray

FloatArray = TypeVar('FloatArray', NDArray[np.floating])

def f(x: FloatArray, y: FloatArray) -> FloatArray:
    return x + y

I got np.floating from https://numpy.org/doc/stable/reference/arrays.dtypes.html#specifying-and-constructing-data-types.

@BvB93
Member

BvB93 commented Aug 30, 2021

- FloatArray = TypeVar('FloatArray', NDArray[np.floating])
+ FloatArray = TypeVar('FloatArray', bound=NDArray[np.floating])

Yup, besides the missing bound keyword this looks about right.
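A minimal sketch with the bound applied. Without bound=, a TypeVar with a single positional constraint is rejected outright (CPython's typing raises TypeError). One caveat, based on mypy's usual treatment of bounded TypeVars: the TypeVar is only preserved when the function returns one of its inputs. For x + y, mypy takes the result type from ndarray.__add__'s overloads, which is a concrete NDArray[...] type rather than FloatArray, so `-> FloatArray` would likely be reported as an incompatible return. first_nonempty and add are hypothetical names for illustration.

```python
from typing import TypeVar

import numpy as np
from numpy.typing import NDArray

# bound= makes FloatArray range over NDArray[np.floating] and its subtypes.
FloatArray = TypeVar("FloatArray", bound=NDArray[np.floating])


def first_nonempty(x: FloatArray, y: FloatArray) -> FloatArray:
    # Returns one of its arguments unchanged, so the caller's exact type
    # (e.g. NDArray[np.float32]) is preserved.
    return x if x.size else y


def add(x: NDArray[np.floating], y: NDArray[np.floating]) -> NDArray[np.floating]:
    # Arithmetic goes through ndarray.__add__, whose stub return type is not
    # the caller's TypeVar, so a non-generic alias is used here.
    return x + y


a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([3.0, 4.0], dtype=np.float32)
```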


3 participants