Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variables annotation #913

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Bibo-Joshi
Copy link
Contributor

Hi.

TIL that pylance by default does not check the types of class/instance variables that are not unnotated, see microsoft/pylance-release#1983. Via this issue I became aware of the guidelines at https://typing.readthedocs.io/en/latest/libraries.html, which IISC were copied as-is from the docs of the pyright type checker.

These guidelines state

All class variables, instance variables, and methods that are "visible" (not overridden) are annotated and refer to known types

From the discussion at microsoft/pylance-release#1983 I understood that pyright hence doesn't consider a the variable name to be annotated in the following example:

class MyClass:
    def __init__(self, name: str):
        self.name = name

This was very surprising to me, especially since I've never had type inference problems for such situations with the tools I use (namely the PyCharm IDE and the mypy type checker). Adding a type annotation of the style self.name: str = name seems really unnecessary and IMO promotes a suboptimal coding style.

Of course if pyright wants to make these requirements, that's fine with me. However, recommending obvious repetitons in a guideline on a homepage maintained by the python organisation is something that I'm not really comfortable with. Moreover, because the guideline was copied as-is from pyright, it strongly emphasizes pyright as typechecker, which I also wouldn't expect from a python source. For comparison: the docs of the typing module which are somewhat longer than these guidelines mention "e.g., via mypy or Pyre" exactly once.

From srittau/peps#94 (comment) I understand that this guide is not set in stone, which is why I'd like to propose the following changes:

  1. Add type inferrence within __init__/__new__ to the recommended ways of declaring the type of class/instance variables
  2. Make the guide neutral in terms of specific type checkers

I'd like to point out that I'm completely on board with the other recommendations of the guideline (tbh I, too, haven't read every sentence …) and of course support having such a guideline in a official python resource in general. Moreover, the aim of this PR is certainly not to pressure pyright into adopting the changes that I propose - after all it's "just" a guideline and any type checker can of course freely choose which features it supports and which it does not.

I hope that could make my intentions for this change clear and am looking forward for your feedback.

@erictraut
Copy link
Collaborator

erictraut commented Oct 27, 2021

I'm the original author of this document (as well as one of the main contributors to pyright).

I'm happy to have a discussion about accommodating more cases where type inference is used, but I would like us to consider the following.

First, type inference rules are not part of the Python typing standards, and they vary from one type checker to another. Authors of the various type checkers will be resistant to standardizing these rules because the number of cases is very large (probably too large to feasibly enumerate in a spec), type inference capabilities will depend on internal architecture of each type checker, and changes to the existing type inference rules will be very painful for users who have already committed to using a particular type checker for their code base.

Second, type information is used for more than just type checking. Pylance (the language server built on pyright) has millions of users, and only a small percentage of them enable type checking today. The vast majority of them make use of type annotations for interactive editing features like completion suggestions, signature help, and semantic coloring. PyCharm also uses type information for these features. I think it's important to consider these use cases because they are important to the vast majority of Python developers. For these use cases, it's important for library type information to be accurate and complete — and for type evaluations to be very fast.

Third, an oft-cited principle in Python is "Explicit is better than implicit". I would think that a library author would especially want this principle to apply to the public interface contract exposed by their library. After all, this is the contract they must be maintained over time to retain compatibility with consumers of that library.

These considerations led me to suggest that in a "py.typed" library, symbols that comprise the public interface contract should have explicit type annotations. My document carves out a few specific cases (constants and enum members that are assigned literal values). In these specific cases, the type inference rules were sufficiently clear so as to eliminate concerns about ambiguity or performance.

I solicited and received input from other Python library authors when working on this specification. This is the first time that a library author has expressed concerns about explicitly typing instance variables. @Bibo-Joshi, I'm not saying that your viewpoint is unfounded or incorrect, just that it might not be reflective of other library authors.

I think it's reasonable to have a discussion about carving out additional cases where types could be inferred (implicit rather than explicit), but we'd need to clarify those specific cases. It's not sufficient, in my opinion, to say that any instance variable assigned a value within an __init__ method can be inferred. For example, what if a type checker encountered the statement self.colors = ("red", "blue"). Pyright and mypy would infer different types for that instance variable if they both used their default type inference rules.

Your proposed wording "...if they are assigned a value within the __init__ or __new__ method whose type is either known or can be inferred" is too ambiguous. There is a good reason for the specificity I provided in my original document. This updated wording would leave all type checker authors scratching our heads trying to interpret what it means, and we would all likely walk away with a different interpretation.

With respect to removing references to "pyright" in the document, I think that's a reasonable change. I would suggest that there's still value in including the section about the "--verifytypes" option in pyright as help to library authors who would like a way to verify that their library is conforming with the rules in the specification. It could be moved to a footnote or a link, and it could be updated if/when other type checkers provide a similar option.

@Bibo-Joshi
Copy link
Contributor Author

Thanks for your comments @erictraut!

For these use cases, it's important for library type information to be accurate and complete — and for type evaluations to be very fast.

I'm having a hard time to see this as argument for stricter type hinting guidelines for authors of libraries. Ofc I do understand that those make it easier for tools like pylance to offer editing features and also that authors have an interest in making their libs easy to use in common editors. However, it feels wrong to entirely shift this responsibility to the authors of libraries. After all, fast auto completion in an editor is not a feature of the actual libraries … I'm lacking proper words here, so let me make a very crude analogy: I you want to buy a bike and know that you will be replacing your brakes with newer models every now and then, then you will choose a bike whose screw mounts are compatible with the most brakes instead of asking the brakes manufacturers to built their brakes such they are compatible with the specific screw mounts of your bike.

Third, an oft-cited principle in Python is "Explicit is better than implicit".

The Zen also says "Beautiful is better than ugly." and

def __init__(self, arg: str):
    self.arg: str = arg

is indeed ugly IMHO. 🙃

I would think that a library author would especially want this principle to apply to the public interface contract exposed by their library. After all, this is the contract they must be maintained over time to retain compatibility with consumers of that library.

That depends on the library, IMO. Many libraries don't have an explicit & detailed policy on what is considered public API and what is not. Personally, I usually don't consider changes/improvements to type annotations to be breaking changes, since they usually don't have any effect at runtime.

For example, what if a type checker encountered the statement self.colors = ("red", "blue"). Pyright and mypy would infer different types for that instance variable if they both used their default type inference rules.

Another surprise for me :D

First, type inference rules are not part of the Python typing standards, and they vary from one type checker to another.

Your proposed wording [...] is too ambiguous.

Okay, I can understand these arguments. Fair enough.

I think it's reasonable to have a discussion about carving out additional cases where types could be inferred (implicit rather than explicit), but we'd need to clarify those specific cases.

Since the starting point for this PR was the above example about annotated arguments getting assigned to attributes, I would be fine with mostly concentrating on this case. Would dropping the or can be inferred suffice for you to make the formulation unambiguous?

-  Class and instance variables may alternatively be unannotated, if they are assigned a value with a known type within the ``__init__`` or ``__new__`` method

The examples could be updated to e.g.

# Class with known type
class MyClass:
		height: float = 2.0
	   
		def __init__(self, name: str, age: int, first_name: str = None):
			# Value is known to be of type Optional[str]
			self.first_name = first_name
			self.age = parser_with_return_type_annotation(age)
			
# Class with partially unknown type
class MyClass:
	# Missing type annotation for class variable
	height = 2.0

	# Missing input parameter annotations
	def __init__(self, name: str, age: int, first_name: str = None):
	    # Type of the value is not guaranteed to be correctly inferred
		self.first_name = 'Unknown' if first_name is None else first_name
		# Missing type annotation for instance variable
		self.age = parser_without_return_type_annotation(age)

In addition it might be good to reformulate the introduction of the Type Completeness section to avoid the impression that the recommendations are an acutal standard (e.g. in contrast to, say, PEP8):

Type Completeness
=================

A “py.typed” library should aim to be “type complete” so that type
checking/inspection can work to their full extend. Here we say that a
library is “type complete” if all of the symbols
that comprise its interface have type annotations that refer to types
that are fully known. Private symbols are exempt.

The following are best practice recommendations for how to define “type complete”:

I would suggest that there's still value in including the section about the "--verifytypes" option […]

Maybe something along these lines?

Verifying Type Completeness
===========================

Some type checkers provide a features that allows library authors to verify type
completeness for a “py.typed” package. E.g. Pyright has a special
`command line flag <https://git.io/JPueJ>`_ for this.

@erictraut
Copy link
Collaborator

The Zen also says "Beautiful is better than ugly."

"Beautiful" is subjective. I personally find self.arg: str = arg to be beautiful because it's self-documenting code. "Explicit" is easy to measure objectively.

Personally, I usually don't consider changes/improvements to type annotations to be breaking changes, since they usually don't have any effect at runtime.

I agree that a change to a type annotation doesn't constitute a breaking change. I'm talking about an actual breaking change, such as changing the way a particular input parameter or instance variable is handled by the code. For example, let's say that your library provides a function that expects to receive a Sequence. One of the code paths within the function uses a len() call to determine the length of the sequence, but in another code path the length isn't needed. A consumer of your library passes an Iterable and gets lucky because they don't hit the code path that uses len(). You now update the code in your library and use len() in all code paths. The updated library now crashes at runtime. You have now created a headache for yourself and the consumer of your library. Being explicit about your public interface allows you to avoid problems like this because the interface is self-documented, and static analysis tools help enforce these contracts. Documenting your interface contract seems like a pretty basic and core responsibility of any library author, so I'm surprised to hear pushback from you on this point.

Would dropping the "or can be inferred" suffice for you to make the formulation unambiguous?

No, dropping these words makes it even more ambiguous. There are two sources of type information: type declarations (in the form of explicit annotations) or type inference. You are effectively still saying (now through implication) that type inference must be used in this case. The phrase "with a known type" is the problem. Know by whom? And by what method? Under what circumstances? What expression forms are allowed within the assignment? What about assignments that are within a conditional statement? What if there are two assignments of the same symbol? Etc. Etc. Etc. This is what I mean by ambiguous.

I'll also point out that the types of instance variables cannot be declared or inferred within __new__ methods, only __init__ methods. And class variable types are declared or inferred from statements within the class body.

@Bibo-Joshi
Copy link
Contributor Author

And class variable types are declared or inferred from statements within the class body.

Stupid mistake of mine, true. Class variables are not part of my argument.

Documenting your interface contract seems like a pretty basic and core responsibility of any library author, so I'm surprised to hear pushback from you on this point.

Here my example for not considering annotation changes as breaking would have been changing the annotation from Iterable to Sequence if the argument was falsely annotated as Iterable before for some reason. While this wouldn't cause any new runtime exceptions for users, it would give new errors on type checking, which I would find acceptable even in minor updates.


Be that as it may, I realize that I don't have detailed enough knowledge of type annotations to represent my point of view in a meaningful way. Even though I can't seem to provide a satisfactory formulation for it, I still think I've made my point clear to at least some extend.

I voiced that I see a need for adjustment in the guidelines, and in doing so I did what I believe is the right thing to do in the OSS community. I hope that I have not caused too much annoyance in my naivety.

I'll leave this thread open so that the maintainers may judge if they want to pick up on this discussion or close it. At least the part about making the guidelines more neutral is still valid and if needed I'm happy to open a new PR for that change alone.

Thanks you for the insightful discussion @erictraut !

@shannonzhu
Copy link
Contributor

I'm a bit late to this discussion, but thank you for opening the PR and starting a discussion! I agree that the aim of these docs (which are just getting started and will be iterated on) is to provide a type-checker agnostic set of documentation that can provide more detail and help Python developers familiarize with common typing practices and features of the type system.

I'd therefore be supportive of pushing the changes suggested, if we can iron out the details of the explicit annotation expectation in the constructor.

fwiw, Pyre will permit omitting an annotation if the attribute name and the argument name are identical, and the argument was given an explicit annotation (so no inference capabilities implied). ie.,

class Foo:
  def __init__(self, x: int, y: str) -> None:
      self.x = x  # treated as though this were explicitly annotated
      self.renamed_y = y  # annotation is expected here

This is generally safe because for explicitly annotated parameters, if something inside the function tries to change the type, you'll get an error.

That said, I'm not sure if this is something we want to codify in the docs for now. I'd be comfortable just asserting that all attributes should be explicitly annotated. That said, will set aside some time later on to go through the existing doc contents in full, so I may be missing details in this high-level response.

@technic
Copy link

technic commented Nov 3, 2021

Pyre will permit omitting an annotation if the attribute name and the argument name are identical, and the argument was given an explicit annotation

This seems like a good assumption. I am not writing libraries much, but in the application code this way of type annotation is commonly used and all tools were happy so far. In this case type checkers do full type inference, but for the libraries I think you want to cut the corners.

I agree with @Bibo-Joshi, that having annotations in two places feels redundant, and requires extra work during refactoring. And I guess we all like python for its short development cycle. However, there is an issue with the implicit approach, that the following code shall not type check

class Bar:
    def __init__(self, x: int):
        self.x = x

    def change(self):
        self.x = None

So, if we go with implicit approach (rather than explicit self.x: int = x) the type checker need to have two modes for analysis. In the "application" mode it shall accept the code above, but in the "library" mode this code should be rejected with the error message saying: explicit annotation Optional[int] is required for self.x.

@erictraut
Copy link
Collaborator

Pyre will permit omitting an annotation if the attribute name and the argument name are identical, and the argument was given an explicit annotation

@shannonzhu, does this apply only to assignments in __init__? What happens in cases where there are multiple assignments to the same instance variable? And what happens if the assignment is within a conditional block?

I agree with @Bibo-Joshi, that having annotations in two places feels redundant, and requires extra work during refactoring

In my experience having extra annotations (even if they appear redundant) makes refactoring much easier because the type checker is able to tell you about any previous assumptions that you've broken in your refactoring.

So, if we go with implicit approach the type checker need to have two modes for analysis

I don't think it implies the need for multiple analysis modes. It simply means that an explicit type annotation would be needed in situations like the one you've highlighted in your example.

class Bar:
    def __init__(self, x: int):
        self.x: int | None = x

@shannonzhu
Copy link
Contributor

shannonzhu commented Nov 5, 2021

@erictraut Right, a few clarifications on what Pyre treats as equivalent to an explicit annotation:

  • We only allow this same-name-as-explicitly-annotated-param shortcut in constructors
  • If the assignment is nested (ie., in a conditional) then we don't make any assumptions and treat it as not explicitly annotated
  • If there are multiple assignments we error if any subsequent assignments to that attribute don't conform to the same type, same behavior as if the first assignment were given an explicit annotation

Perhaps relatedly, we do the same for basic builtin constants that are always inferrable, when they're assigned to globals or attributes:

MY_GLOBAL = 1

will be equivalent to Pyre as if you had written

MY_GLOBAL: int = 1

If you want to change the type (to be able to assign None to this global later on without error), then you have to be explicit:

MY_GLOBAL: Optional[int] = 1

We made both of these convenience optimizations due to widespread user complaints of redundancy, but I don't feel strongly about codifying either of these practices as standard for all type checkers. Would still be in favor of cleaning up this part of the PR and landing the other unrelated adjustments if possible.

@erictraut
Copy link
Collaborator

@shannonzhu, thanks for the additional details.

You said that MY_GLOBAL = 1 would be equivalent to MY_GLOBAL: int = 1 in Pyre. In Pyright, the type of MY_GLOBAL = 1 is inferred as Literal[1] because MY_GLOBAL is assumed to be a constant because it is all-caps.

And you said that if there are multiple assignments to an instance variable, Pyre treats the first assignment as the declared type. Pyright doesn't do this in the case of multiple assignments; it uses all assignments to infer the type.

This underscores my point about the dangers of relying on inference rules when it comes to public interface contracts.

I am now even more convinced that we should require explicit annotations for all symbols that comprise the public interface contract for a "py.typed" library. If a symbol is not annotated, its type should be assumed as "Any" (or "Unknown").

This would be a good topic for us to discuss in one of the upcoming typing meet-ups.

@srittau srittau added the topic: documentation Documentation-related issues and PRs label Nov 6, 2021
@shannonzhu
Copy link
Contributor

I agree we should be explicit in the typing docs about what an explicit type annotation entails (I think there was a typing-sig discussion in the recent past about expected type checker behavior on explicit local annotations, re-assignments to them, explicit Any annotations, etc... I'm having some trouble finding the thread, but can try to dig it up later if it would be helpful).

I am now even more convinced that we should require explicit annotations for all symbols that comprise the public interface

Based on years of user experience feedback and pain points, I don't think Pyre would want to give up the shortcuts I mentioned earlier around explicit annotations for simple globals and attributes. However, I wouldn't view this as an additional type system specification that we need to codify across all type checkers, but rather that Pyre just supports an additional way to "write" an explicit annotation, the behavior of which I agree should be well-defined in the shared docs.

Note: To back to the example we've been using, an explicit global like: MY_GLOBAL: int = 1 would still be inferred as a having type Literal[1] with a "maximum" type of int, so we wouldn't error if something tried to assign 2 to this global unlike if the explicit annotation were MY_GLOBAL: Literal[int] = 1. The only thing Pyre does differently is to allow MY_GLOBAL = 1 as an alternative way to write MY_GLOBAL: int = 1, which I don't feel strongly that other type checkers also have to do (but is something we've found is a huge pain point for some use cases when we don't).

More fundamentally, I do agree that we want all globally accessible values to be explicitly annotated by users in order to get consistent guarantees from the type checker, and the behavior of the type system given a set of types should be clearly defined. I also don't think it's very feasible or desirable to specify in detail how type checkers arrive at inferred, intermediate types when those annotations are unknown -- but I think we're in agreement on this point.

This would be a good topic for us to discuss in one of the upcoming typing meet-ups.

Sounds good!

@Bibo-Joshi would you be open to removing the changes in this PR related to specifying an inference shortcut, which I don't think we're looking to codify? Some of the other changes look good to me.

@Bibo-Joshi
Copy link
Contributor Author

@Bibo-Joshi would you be open to removing the changes in this PR related to specifying an inference shortcut, which I don't think we're looking to codify? Some of the other changes look good to me.

I opened a new PR for that at #934 .
I'm not sure exactly sure, how the authorities work here, so I'd like to leave it to the maintainers of this repository to close this thread if they see fit :)


Apart from that, let me try one last time to respond to some arguments. I'm aware that my responses are unlikely to be technically precise enough and just hope to add some last clarity to my argument.

  • Type inference for constants, example MY_GLOBAL = 1. IMO this is very different from the situation I'm addressing. To translate this into my setting, one could maybe consider something like

    class MyClass:
        def __init__(self, arg: int):
            self.__class__.MY_GLOBAL = arg

    The important point is that the parameter arg has an explicit annotation.

  • You are effectively still saying (now through implication) that type inference must be used in this case. The phrase "with a known type" is the problem. Know by whom? And by what method? Under what circumstances? What expression forms are allowed within the assignment? What about assignments that are within a conditional statement? What if there are two assignments of the same symbol? Etc. Etc. Etc. This is what I mean by ambiguous.

    Regarding "The phrase "with a known type" is the problem.": I was trying to use the terms already established in the document. I.e. the document currently states

    A “known type” is defined as follows: ...

    • Variables:
      • All variables have type annotations that refer to known types

    and I had hoped that using the established terminology would clarify things. Apparently, it did not …
    What I was trying to express was:

    class MyClass:
        def __init__(self, foo: int, bar):
            self.a = foo  # okay, because `foo` has an explicit annotation -> "known type"
            self.b = foo or 7  # not okay because `foo or 7` has no explicit annotation -> not a "known type"
            self.c = bar  # not okay because `bar` has no explicit annotation -> not a "known type"
            self.d = function_with_return_type_annotation()  # okay b/c function as explicit return type annotation -> "known type"
            self.c = function_wo_return_type_annotation()  # okay b/c function as no return type annotation -> not a "known type"

    I'd be okay with excluding attributes that are assigned within nested structures like conditionals or loops

  • And you said that if there are multiple assignments to an instance variable, Pyre treats the first assignment as the declared type. Pyright doesn't do this in the case of multiple assignments; it uses all assignments to infer the type.

    Here I somewhat agree with @technic about different "modes" of checking. Sticking with the (extended) example

    class Bar:
        def __init__(self, x: int):
            self.x = x
            ...
            self.x = 'foo'
    
        def change(self):
            self.x = None

    I'd argue that

    • A checker checking the type integrity of Bar should complain about None/str being assigned to an attribute type int
    • A checker checking the integrity of code that uses Bar should not check the integrity of Bar itself and instead tread Bar.x as type int because that's the type that x has on declaration.

    This behavior should cover both the case where you're working on the lib containing Bar and where you're using that lib in a different project.

@shannonzhu
Copy link
Contributor

Great! Thanks for splitting those changes out. :)

Re: known types - I'm not sure about the details of the "known type" terminology as used in the document, but in the examples you provided, type inference is still necessary to propagate an annotated parameter to a place it's referenced later in the body of the function.

These were the two "known types" you pointed out:

class MyClass:
    def __init__(self, foo: int, bar):
        self.a = foo  # okay, because `foo` has an explicit annotation -> "known type"
        self.d = function_with_return_type_annotation()  # okay b/c function as explicit return type annotation -> "known type"

For both, the type checker still has to resolve the RHS. Were there interleaving re-assignments to 'foo'? Asserts or early returns that condition on its type? A global foo line that makes foo actually refer to something different? For the function call, does the return type depend on the type of the parameters passed in? This doesn't make it qualitatively different from resolving foo or 7 as a RHS value to infer the type of the attribute.

Re: different type checking modes, I strongly disagree and don't think it's viable to change type inference based on whether a module is a project file or a library file. Type errors across the board should be consistent. If we're showing type errors to the library author that their attribute is being inferred as type 'int', so their string assignment is invalid and they need to change it, then we should also be enforcing that users of that library module are treating that attribute as the same type.

@dvarrazzo
Copy link

Hello,

this doesn't seem a case of missing type annotation. It is pretty clear to humans and computers that height is a float.

# Class with partially unknown type
class MyClass:
	# Missing type annotation for class variable
	height = 2.0

Mypy has no problem inferring that. Can we please revert that before too much bureaucracy is introduced?

Thank you very much.

@dvarrazzo
Copy link

FYI I have opened a conversation on #1058

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA signed topic: documentation Documentation-related issues and PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants