Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP: rename #46428

Merged
merged 5 commits into from
Apr 3, 2022
Merged

TYP: rename #46428

merged 5 commits into from
Apr 3, 2022

Conversation

twoertwein
Copy link
Member

tested with mypy and pyright:

import pandas as pd

def bool_() -> bool:
    ...

df = pd.DataFrame({"a": []})
ser = pd.Series({"a": 1})

# DataFrame/Series
reveal_type(df.rename(columns={"a": "b"}))
reveal_type(df.rename(columns={"a": "b"}, inplace=False))
reveal_type(ser.rename("a"))
reveal_type(ser.rename("a", inplace=False))

# None
reveal_type(df.rename(columns={"a": "b"}, inplace=True))
reveal_type(ser.rename("a", inplace=True))

# both
reveal_type(df.rename(columns={"a": "b"}, inplace=bool_()))
reveal_type(ser.rename("a", inplace=bool_()))

@pep8speaks
Copy link

pep8speaks commented Mar 19, 2022

Hello @twoertwein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-04-01 12:11:24 UTC

@twoertwein twoertwein added the Typing type annotations, mypy/pyright type checking label Mar 19, 2022
@twoertwein twoertwein marked this pull request as draft March 19, 2022 03:02
@twoertwein
Copy link
Member Author

Marking as draft as #46423 should be merged first. Then this PR can add a few more IgnoreRaise to rename.

@twoertwein twoertwein marked this pull request as ready for review March 22, 2022 00:31
@twoertwein twoertwein requested a review from Dr-Irv March 22, 2022 21:02
Copy link
Contributor

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small question. Otherwise, looks good.

pandas/core/series.py Outdated Show resolved Hide resolved
@twoertwein
Copy link
Member Author

Thank you @Dr-Irv for reviewing my PRs :) If you think I can provide feedback for your PRs here or at MS, feel free to ping me.

I slowly try to get the errors from pyright's reportGeneralTypeIssues to manageable amount to enable it at some point (#44855). Many (easier) to fix errors are caused by missing overloads (mypy assumes Any but pyright inspects all possible return values which then creates error).

@jreback jreback added this to the 1.5 milestone Mar 25, 2022
index: Renamer | None = ...,
columns: Renamer | None = ...,
index: Renamer[HashableTa] | None = ...,
columns: Renamer[HashableTb] | None = ...,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugly: need different TypeVars so that they are not tied to each other

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add the comment here as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specifying mapper and index/columns raise a TypeError.. TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'

As this is by definition a TypeError, we could catch this with a type checker by adding more overloads.

would presumably only need two TypeVars?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is by definition a TypeError, we could catch this with a type checker by adding more overloads.

This would double the number of overloads. I wouldn't mind skipping this (for now).

pandas/core/generic.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ugly, but it works. See suggestions.

pandas/_typing.py Show resolved Hide resolved
pandas/_typing.py Outdated Show resolved Hide resolved
pandas/core/series.py Show resolved Hide resolved
index: Renamer | None = ...,
columns: Renamer | None = ...,
index: Renamer[HashableTa] | None = ...,
columns: Renamer[HashableTb] | None = ...,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add the comment here as well

@simonjayhawkins
Copy link
Member

I think I found a nicer solution that works with the latest pyright+mypy and is sanctions by one of the typing gods:

hmm, this is really the issue here, that Mapping is invariant in its key type, xref python/mypy#12464 (comment)

the response was to use Any as the key type as a workaround. While this is probably fine for stubs since a dict/mapping can only be created at runtime with hashable keys, this workaround does not help with the type checking of the function itself, as we would not know the type of the key inside the function. (Using Any also prevents us turning on some of the strict flags down the line.)

I fear that due to other questions in that thread that we didn't get a resolution on this one.

So while the solution here using the TypeVars seems to satisfy the requirements of both users checking their own code and checking the internals of pandas itself, it still feels a bit kludgy.

@twoertwein in you discussions with mypy/pyright regarding this did you come across other projects having this issue? Is there a precendent for using TypeVars as a workaround for Mapping being invariant in its key type?

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Mar 31, 2022

could we maybe use Any in the overloads (the public API) and Hashable in the signature of function itself (internal code checking)?

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Mar 31, 2022

So while the solution here using the TypeVars seems to satisfy the requirements of both users checking their own code and checking the internals of pandas itself, it still feels a bit kludgy.

I agree. IMHO, if we want to use a type for the keys of Dict or Mapping, it can only be a base type (e.g., str, int) or we use Any. We can't use a Union (e.g., Dict[int|str, str] won't work). I don't think we can use Hashable, because essentially Hashable is a Union (although declared as a Protocol).

@simonjayhawkins
Copy link
Member

or we use Any

since that was "official" response from the mypy side I think we should probably do that for now.

This does not satisfy internal code checking requirements, hence the suggestion above #46428 (comment)

@twoertwein
Copy link
Member Author

twoertwein commented Mar 31, 2022

I think I finally agree with Mapping[Any, Hashable] in case of rename as pandas will only ever call it with Hasables and it is guaranteed to get a Hashable in return. (If we had code that inverts keys and values, then that wouldn't be okay with me.)

@twoertwein in you discussions with mypy/pyright regarding this did you come across other projects having this issue? Is there a precendent for using TypeVars as a workaround for Mapping being invariant in its key type?

No other projects but two of my "issues" at pyright microsoft/pyright#2344 (comment) (pyright: explicit use case to get covariant behavior) and microsoft/pyright#3272 (no objections by pyright and mypy maintainers for implementing covariant keys with typevars - but also no approval)

The only issue I have with Mapping[Any, Hashable] is that the keys of Mapping are not required to be Hashable (and neither are the arguments to a Callable):

from typing import Any, Hashable, Mapping, TypeVar

HashableT = TypeVar("HashableT", bound=Hashable)


def get_mapper() -> Mapping[list[int], int]:
    ...


def any_rename(mapper: Mapping[Any, Hashable]) -> None:
    ...


def typevar_rename(mapper: Mapping[HashableT, Hashable]) -> None:
    ...


# allowed - Mapping does not require the keys to be Hashable (Dict does)
any_rename(get_mapper())

# error (both mypy and pyright)
typevar_rename(get_mapper())

But I fully agree that the TypeVar solution is quite verbose when we have functions where we need multiple TypeVars to express the same type constraint (on the otherhand it will also work with any non-generic type including int | str).

@twoertwein
Copy link
Member Author

So while in the past we tried to be as precise as possible with the type annotations, i'm fine with less precise types in these situations (for now!)

I'll go with Any

Using Any also prevents us turning on some of the strict flags down the line.

I think we are unfortunately quite far from that point. I would prioritize py.typed over having no Anys (typeshed itself also has a few Anys).

@twoertwein
Copy link
Member Author

My takeaways for the future:

  • Avoid TypeVar magic when it is getting too verbose
  • When the first argument of Mapping/Dict/List/Callable is Hashable and the type is used as an input argument, replacing Hashable with Any seems like a good solution
  • For types stricter than Hashable, TypeVar magic is a solution (maybe even the only solution) but it might get verbose

Copy link
Contributor

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled your latest version, and ran the following test:

# flake8: noqa: F841

from typing import List, Union, TYPE_CHECKING


import pandas as pd


def check_series_result(s: pd.Series):
    pass

def test_types_rename() -> None:
    # Scalar
    s1 = pd.Series([1, 2, 3]).rename("A")
    check_series_result(s1)
    # Hashable Sequence
    s2 = pd.Series([1, 2, 3]).rename(("A", "B"))
    check_series_result(s2)

    # Optional
    s3 = pd.Series([1, 2, 3]).rename(None)
    check_series_result(s3)

    # Functions
    def add1(x: int) -> int:
        return x + 1

    s4 = pd.Series([1, 2, 3]).rename(add1)
    check_series_result(s4)

    # Dictionary
    s5 = pd.Series([1, 2, 3]).rename({1: 10})
    check_series_result(s5)
    # inplace
    s6: None = pd.Series([1, 2, 3]).rename("A", inplace=True)

    if TYPE_CHECKING:
        s7 = pd.Series([1, 2, 3]).rename({1: [3, 4, 5]})  # type: ignore

mypy is thinking that all the results are Optional[Series], not Series (or None in the case of inplace), so there is an issue here.

@twoertwein
Copy link
Member Author

I pulled your latest version, and ran the following test

Thank you for testing! When I run mypy and pyright (both on python 3.10), I do not get a single error (except for unused variables from pyright).

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Apr 1, 2022

I pulled your latest version, and ran the following test

Thank you for testing! When I run mypy and pyright (both on python 3.10), I do not get a single error (except for unused variables from pyright).

My mistake. I had a typo when I pulled the PR. Test passes for me now.

@jreback jreback merged commit 79fb2de into pandas-dev:main Apr 3, 2022
@jreback
Copy link
Contributor

jreback commented Apr 3, 2022

thanks @twoertwein

@twoertwein twoertwein deleted the rename branch May 26, 2022 01:59
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants