Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add typings #577

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

bryanforbes
Copy link
Collaborator

@bryanforbes bryanforbes commented May 18, 2020

This is a work in progress and still produces errors when type checking the codebase. I added type hints to the Python files where it was feasible and updated the code of the methods to be type safe. In two places (pool.py and exceptions/_base.py), adding type hints to the code itself would not work because of the dynamic nature of the code in those modules. There may also be places where I'm making wrong assumptions about the code (especially in the cython code), so a thorough review would be very welcome. Lastly, I did not add typings for asyncpg._testbase since that seems to be an internal testing module.

Some of the remaining problems that mypy finds may be bugs in the code or they may be that mypy is being overly strict:

  • cursor.py: In BaseCursor.__repr__, self._state could technically be None, and cause an exception when self._state.query is used
  • connection.py:
    • In Python 3.7+, asyncio.current_task() can return None, so compat.current_task() has to be typed as returning Optional[Task[Any]]. This means Connection._cancel() may throw an exception when self._cancellations.discard(compat.current_task(self._loop)) is called
    • The code in _extract_stack() has a couple of issues:
      • Passing an iterator to StackSummary.extract() but it expects a generator
      • __path__ does not exist on the asyncpg module
  • connect_utils.py has several errors that relate to the code in _parse_connect_dsn_and_args() assigning new types to variables originally declared as another type. I can try to clean this up to make it more type-safe, or just add # type: ignore.
  • cluster.py:
    • There are a couple of places where a bytes is being passed to a formatting string which mypy recommends using !r with those
    • In Cluster.connect(), self.get_connection_spec() can return None, which would cause conn_info.update() to throw an error. Is this desired?
    • A similar issue can be found in Cluster._test_connection() with self._connection_addr

References #569, #387

@bryanforbes
Copy link
Collaborator Author

bryanforbes commented May 19, 2020

One thing to note is that the handling of Record isn't quite what I'd like it to be yet. Currently, a user could do something like this:

class MyRecord(asyncpg.Record):
    id: int
    name: str
    created_at: datetime.datetime

r: MyRecord = ...
reveal_type(r.id) # int

The user will only use this class for type checking and can use attribute access with type checking, but not lookup notation. A mypy plugin can probably be written to handle the lookup notation (TypedDict has something similar), so this isn't a big issue. The bigger issue is the following will be an error:

records: typing.List[MyRecord] = await conn.fetch('SELECT ... from ...')

It's an error because fetch() is typed to return List[Record], which doesn't match List[MyRecord] (List is invariant, so it has to match exactly). There are two options:

  1. Return a bound generic from every function that returns a Record. This mostly works, but it requires the user to type their variables, even if they're not using a custom class:
record: MyRecord = await conn.fetchrow('SELECT ... FROM ...')
records: typing.List[MyRecord] = await conn.fetch('SELECT ... FROM ...')
cursor: asyncpg.cursor.CursorFactory[MyRecord] = await conn.cursor('SELECT ... FROM ...')
record2: asyncpg.Record = await conn.fetchrow('SELECT ... FROM ...')
records2: typing.List[asyncpg.Record] = await conn.fetch('SELECT ... FROM ...')
cursor2: asyncpg.cursor.CursorFactory[asyncpg.Record] = await conn.cursor('SELECT ... FROM ...')
  1. Add a record_class parameter to methods to any function that returns a Record. This, in my opinion, works the best but adds an unused (code-wise) parameter:
# the variables below have the same types as the last code block,
# but the types are inferred instead
record = await conn.fetch('SELECT ... FROM ...', record_class=MyRecord)
records = await conn.fetch('SELECT ... FROM ...', record_class=MyRecord)
cursor = await conn.cursor('SELECT ... FROM ...', record_class=MyRecord)
record2 = await conn.fetchrow('SELECT ... FROM ...')
records2 = await conn.fetch('SELECT ... FROM ...')
cursor2 = await conn.cursor('SELECT ... FROM ...')

Note that while mypy will see record as a MyRecord, it will be an instance of Record at runtime because record_class is only used to infer the type of the return.

I have the second option coded locally, but I wanted to check if adding that extra record_class parameter for type-checking purposes is OK.

@elprans
Copy link
Member

elprans commented May 19, 2020

There's a third option: lie about the return type and say it's a Sequence[Record] instead. I don't think the results of fetch() are mutated that often (why would you?) so this shouldn't be much of an issue.

@bryanforbes
Copy link
Collaborator Author

Typing it as Sequence[Record] kind of works, but you'd still need a cast to go from Sequence[Record] to Sequence[MyRecord]. Combining Sequence with option 2 is better than List (since Sequence is covariant), but it still requires adding type information for all of your variables storing the results from functions returning Record.

@elprans
Copy link
Member

elprans commented May 19, 2020

but you'd still need a cast

Right. Well, my concern with the record_class parameter is that it could be very confusing to unknowing users. If we named it record_class, then one might rightfully assume that the returned records will be instances of that class, when, in fact, they wouldn't be.

You'd mentioned that a plugin is necessary to handle subscript access, so perhaps said plugin can also deal with the fetch calls (via a function hook). The plugin would only need to check that the lval type is a covariant sequence of Record and adjust the return type of fetch() to it to make mypy happy.

@bryanforbes
Copy link
Collaborator Author

Right. Well, my concern with the record_class parameter is that it could be very confusing to unknowing users.

I understand the concern and share your concern. Would record_type be less confusing?

The plugin would only need to check that the lval type is a covariant sequence of Record and adjust the return type of fetch() to it to make mypy happy.

I'll have to do some digging to see if this is possible. If it is, I agree that it would probably be the best solution.

@bryanforbes bryanforbes marked this pull request as draft May 26, 2020 14:42
@bryanforbes
Copy link
Collaborator Author

bryanforbes commented Jun 4, 2020

I've done quite a bit more work on the mypy plugin and have the following working:

import asyncpg
import datetime
import typing
import typing_extensions

class MyRecord(asyncpg.Record):
    foo: int
    bar: typing.Optional[str]

class MyOtherRecord(MyRecord):
    baz: datetime.datetime

async def main() -> None:
    conn = await asyncpg.connect(...)
    m = typing.cast(MyRecord, await conn.fetchrow('SELECT foo, bar FROM records'))
    o = typing.cast(MyOtherRecord, await conn.fetchrow('SELECT foo, bar, baz FROM other_records'))

    key = 'baz'
    fkey: typing_extensions.Final = 'baz'

    reveal_type(m['foo'])  # int
    reveal_type(m['bar'])  # Optional[str]
    reveal_type(m['baz'])  # error: "MyRecord" has no key 'baz'
    reveal_type(m[0])  # int
    reveal_type(m[1])  # Optional[str]
    reveal_type(m[2])  # error: "MyRecord" has no index 2
    reveal_type(m.get('foo'))  # int
    reveal_type(m.get('bar'))  # Optional[str]
    reveal_type(m.get('baz'))  # error: "MyRecord" has no key 'baz'
    reveal_type(m.get('baz', 1))  # Literal[1]
    reveal_type(m.get(key, 1))  # Union[Any, int]
    reveal_type(m.get(fkey, 1))  # Literal[1]

    reveal_type(o['foo'])  # int
    reveal_type(o['bar'])  # Optional[str]
    reveal_type(o['baz'])  # datetime
    reveal_type(o[0])  # int
    reveal_type(o[1])  # Optional[str]
    reveal_type(o[2])  # datetime
    reveal_type(o.get('foo'))  # int
    reveal_type(o.get('bar'))  # Optional[str]
    reveal_type(o.get('baz'))  # datetime
    reveal_type(o.get('baz', 1))  # datetime
    reveal_type(o.get(key, 1))  # Union[Any, int]
    reveal_type(o.get(fkey, 1))  # datetime

Based on the implementation of asyncpg.Record, I believe I have the typing for __getitem__() and get() correct. I tried to get the typings for Record to be as similar to TypedDict as possible (given the implementation differences). You'll notice that when the key can be determined by the type system, get() with a default argument is deterministic (ex. o.get('baz', 1) and o.get(fkey, 1)), otherwise it returns a Union. One thing I'd like to possibly try is to come up with a metaclass that would act like a typing_extensions.Protocol with runtime checking so instanceof() could be used to determine if a Record matched. At this time, I haven't attempted it.

I also did a lot of poking around to try and get the plugin to set the return type based on the variable it is being assigned to, but I don't see a way that it's possible. This means we're left with two options:

  1. Force users to cast to subclasses of asyncpg.Record (as seen in the examples above). This means types like PreparedStatement would need to be exposed from asyncpg so users could easily cast the result of Connection.prepare() to asyncpg.PreparedStatement[MyRecord]:
stmt = typing.cast(asyncpg.prepared_stmt.PreparedStatement[MyRecord], await conn.prepare('SELECT ... FROM ...'))
reveal_type(stmt)  # asyncpg.prepared_stmt.PreparedStatement[MyRecord]
  1. Add an unused parameter to indicate to the type system which type should be inferred (and if it's left off, the return type would be asyncpg.Record). I suggested record_class above, but I think return_type would be less prone to users thinking the result would be a different class. This approach would mean that the result of calls to functions like prepare() wouldn't need to be cast to a subscript and the type system would infer the subscript:
stmt = await conn.prepare('SELECT ... FROM ...', return_type=MyRecord)
reveal_type(stmt)  # asyncpg.prepared_stmt.PreparedStatement[MyRecord]

There's also a possible third option if the Protocol-like class actually works: require users to use isinstance() to narrow the type:

stmt = await conn.prepare('SELECT ... FROM ...')
reveal_type(stmt)  # asyncpg.prepared_stmt.PreparedStatement[asyncpg.protocol.protocol.Record]

record = await stmt.fetchrow(...)
reveal_type(record)  # Optional[asyncpg.protocol.protocol.Record]
assert isinstance(record, MyRecord)
reveal_type(record)  # MyRecord

The third option completely depends on whether the Protocol-like class is feasible, and would also do runtime checking (which would check if Record.keys() has all of the keys of the subclass). I would imagine the runtime checking would be a performance hit.

@elprans
Copy link
Member

elprans commented Jun 5, 2020

Awesome work on the plugin, @bryanforbes! Thanks!

Add an unused parameter to indicate to the type system which type should be inferred

I'm still a bit uneasy with adding unused parameters just for the typing purpose. That said, if we had record_class= actually make fetch() and friends return instances of that class, that would be a great solution. There's actually #40, which quite a few people requested before.

@bryanforbes
Copy link
Collaborator Author

@elprans Thanks! There are still some places in the code where # type: ignore is being used. I've updated them to use the specific code(s) that can be used to ignore them (so if new type errors arise, mypy doesn't ignore those as well). Some of them can be ignored. For example, anywhere TypedDict is being used to ensure a dictionary has the right shape, mypy will complain when using ** on it. Others indicate a possible issue with the code that I wasn't sure how to fix. I'll list those here (using the latest commit I've just pushed):

  • connection.py line 1264: All of the methods use _protocol without checking if it's None, so I typed it as BaseProtocol. However, when the connection is aborted None is assigned to _protocol so _protocol should technically be typed as Optional[BaseProtocol] and be checked everywhere it's used (with a descriptive error) so the methods of the Connection don't throw a strange error about an internal member if they're used after aborting the connection.
  • connection.py line 1357: compat.current_asyncio_task() can return None (because the standard library can), _cancellations is typed as Set[asyncio.Task[Any]], so discard() rejects passing None to it.
  • cluster.py lines 129, 547, 604: mypy gives the following error: On Python 3 '{}'.format(b'abc') produces "b'abc'"; use !r if this is a desired behavior. This probably should be updated to use !r, but I wasn't sure.
  • cluster.py line 136: get_connection_spec() can return None which would throw an error about an internal variable being None. This should probably be checked to see if it's None.
  • cursor.py line 169: _state can be None, which would cause an exception to be raised here.

With regards to the unused parameter, I completely understand your concern. I also think that making fetch() and friends return the instances of that class would be a good solution. I can look into that, but my experience with cpython is fairly limited (I've used cython a long long time ago, so it's much more familiar). Let me know how you'd like to proceed.

elprans added a commit that referenced this pull request Jul 19, 2020
Add the new `record_class` parameter to the `create_pool()` and
`connect()` functions, as well as to the `cursor()`, `prepare()`,
`fetch()` and `fetchrow()` connection methods.

This not only allows adding custom functionality to the returned
objects, but also assists with typing (see #577 for discussion).

Fixes: #40.
elprans added a commit that referenced this pull request Jul 19, 2020
Add the new `record_class` parameter to the `create_pool()` and
`connect()` functions, as well as to the `cursor()`, `prepare()`,
`fetch()` and `fetchrow()` connection methods.

This not only allows adding custom functionality to the returned
objects, but also assists with typing (see #577 for discussion).

Fixes: #40.
elprans added a commit that referenced this pull request Jul 19, 2020
Add the new `record_class` parameter to the `create_pool()` and
`connect()` functions, as well as to the `cursor()`, `prepare()`,
`fetch()` and `fetchrow()` connection methods.

This not only allows adding custom functionality to the returned
objects, but also assists with typing (see #577 for discussion).

Fixes: #40.
Copy link
Member

@elprans elprans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With regards to the unused parameter, I completely understand your concern. I also think that making fetch() and friends return the instances of that class would be a good solution.

I took this upon myself to implement in #559.

I did a cursory review of the PR and left a few comments. Most importantly, I think we should bite the bullet and use the PEP 526 syntax for type annotations instead of comments. Python 3.5 is rapidly going out of fashion, and I'd love to drop a bunch of hacks we already have to support it.

Thanks again for working on this!

low, high = port_range

port = low
port = low # type: typing.Optional[int]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3.5 would be EOL'd in September and I think we should just drop support for it and use proper annotation syntax everywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR to switch to 3.6 type annotations and removed 3.5 from CI

asyncpg/cluster.py Outdated Show resolved Hide resolved
asyncpg/cluster.py Outdated Show resolved Hide resolved
asyncpg/cluster.py Outdated Show resolved Hide resolved
asyncpg/cluster.py Outdated Show resolved Hide resolved
asyncpg/connect_utils.py Outdated Show resolved Hide resolved
asyncpg/connect_utils.py Outdated Show resolved Hide resolved
# Put the connection into the aborted state.
self._aborted = True
self._protocol.abort()
self._protocol = None
self._protocol = None # type: ignore[assignment]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a cleaner solution would be to assign a sentinel instance of a Protocol-like object, e.g. DeadProtocol that raises an error on any attribute access.

elprans added a commit that referenced this pull request Aug 8, 2020
Add the new `record_class` parameter to the `create_pool()` and
`connect()` functions, as well as to the `cursor()`, `prepare()`,
`fetch()` and `fetchrow()` connection methods.

This not only allows adding custom functionality to the returned
objects, but also assists with typing (see #577 for discussion).

Fixes: #40.
elprans added a commit that referenced this pull request Aug 8, 2020
Add the new `record_class` parameter to the `create_pool()` and
`connect()` functions, as well as to the `cursor()`, `prepare()`,
`fetch()` and `fetchrow()` connection methods.

This not only allows adding custom functionality to the returned
objects, but also assists with typing (see #577 for discussion).

Fixes: #40.
elprans added a commit that referenced this pull request Aug 14, 2020
Add the new `record_class` parameter to the `create_pool()` and
`connect()` functions, as well as to the `cursor()`, `prepare()`,
`fetch()` and `fetchrow()` connection methods.

This not only allows adding custom functionality to the returned
objects, but also assists with typing (see #577 for discussion).

Fixes: #40.
elprans added a commit that referenced this pull request Aug 15, 2020
Add the new `record_class` parameter to the `create_pool()` and
`connect()` functions, as well as to the `cursor()`, `prepare()`,
`fetch()` and `fetchrow()` connection methods.

This not only allows adding custom functionality to the returned
objects, but also assists with typing (see #577 for discussion).

Fixes: #40.
@victoraugustolls
Copy link

@bryanforbes PR #599 was merged! I don't know if it was a blocker for this PR or not (by reading the discussion here I think it was). Thanks for this PR, really, looking forward to it! 😄

@bryanforbes
Copy link
Collaborator Author

@victoraugustolls I'll rebase and update this PR today

@bryanforbes
Copy link
Collaborator Author

@victoraugustolls I finished the rebase, but there's an issue with Python 3.6 and ConnectionMeta + Generic related to python/typing#449. I'll poke around with it this weekend and try to get it working.

@victoraugustolls
Copy link

Thanks for the update @bryanforbes ! I will try and search for something that can help

@bryanforbes
Copy link
Collaborator Author

@victoraugustolls I was able to come up with a solution for the metaclass issue in Python 3.6 (that won't affect performance on 3.7+), but I'd like to work on some tests before taking this PR out of draft. Feel free to review the code, though.

@bryanforbes
Copy link
Collaborator Author

@victoraugustolls I started writing unit tests last night, however they use py.test and pytest-mypy-plugins to allow testing the output for both errors and revealed types. However, although py.test is a dev extra, I noticed you aren't using it to run the test suite. Will it be a problem to use py.test to only run the mypy unit tests?

@elprans
Copy link
Member

elprans commented Oct 16, 2020

@bryanforbes I don't think we need more than running mypy like we do flake8 in test__sourcecode.py for the main regression testsuite. Here's an example

@victoraugustolls
Copy link

Hi @bryanforbes ! I'm not a maintainer, but thanks for asking! I agree with @elprans and running mypy should be enough. If mypy is happy it should be fine!

@bryanforbes
Copy link
Collaborator Author

I spent some time today rebasing and cleaning up this PR. I'd appreciate another review if someone can spare the cycles.

Copy link

@HarrySky HarrySky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments/questions, didn't get through all the changes, yet (that's a lot of work you have done, thank you!)

asyncpg/compat.py Outdated Show resolved Hide resolved
Comment on lines 100 to 116
def _read_password_file(
passfile: pathlib.Path
) -> typing.List[typing.Tuple[str, ...]]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this style not used everywhere? Is line breaks with \ better?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a mix of styles in the codebase. I use the black style (seen here) in my own projects, and I saw it in some of the newer code, so I went with it. If that needs to be changed, I can format the code a different way.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer Black style, but it is up to asyncpg maintainers

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, but I'll wait for the maintainers 😄

asyncpg/connect_utils.py Show resolved Hide resolved
asyncpg/connect_utils.py Outdated Show resolved Hide resolved
asyncpg/connection.py Outdated Show resolved Hide resolved
asyncpg/introspection.py Outdated Show resolved Hide resolved
@bryanforbes
Copy link
Collaborator Author

Left a few comments/questions, didn't get through all the changes, yet (that's a lot of work you have done, thank you!)

Thank you! I think I responded or 👍 all your comments. I'll address what I can.

@bitterteriyaki
Copy link

bitterteriyaki commented May 5, 2022

Bumping this PR. Is there something pending?

@bryanforbes
Copy link
Collaborator Author

@elprans could you approve the workflow so the regression tests can run?

@Andarius
Copy link

Andarius commented Dec 2, 2022

Hey guys, thanks for the amazing work.
Do you need help on this PR to get this merged? @elprans @bryanforbes

@bryanforbes
Copy link
Collaborator Author

I need to rebase on master. I have some free time coming up in the next few weeks, so I should be able to take care of it. I'll let you know if I need some help, @Andarius.

@ethanleifer
Copy link

Is there anything blocking this pr? I would be willing to help out if needed

@takeda
Copy link

takeda commented May 23, 2023

Yeah I think it should be merged. Maybe it is still not perfect, but it likely will work for majority of people and any bugs will come out when people start using.

It's been 3 years this PR has been open and rebasing it constantly just adds unnecessary overhead.

@ethanleifer
Copy link

Is there someone we have to ping to get this merged? I would be willing to help out wherever is needed

@takeda
Copy link

takeda commented May 23, 2023

Is there someone we have to ping to get this merged? I would be willing to help out wherever is needed

I saw @elprans initially commenting here with suggestions, and @1st1 in another issue commenting that type annotations would be welcomed. Hopefully that could bring attention to this PR.

@Voldemat
Copy link

Hi! When you plan to merge this proposal to main?

@r614
Copy link

r614 commented Feb 6, 2024

bumping this thread since its been a while, any ETA on the merge @elprans @1st1 - seems like this PR is basically done?

@eirnym
Copy link

eirnym commented Feb 12, 2024

Hi @bryanforbes, could you please resolve conflicts. I have no permissions to do that

@elprans I see that this work could be continued even after this PR has been merged.

@bryanforbes
Copy link
Collaborator Author

I'll try to get some time to resolve conflicts this week

Add typings to the project and check the project using mypy.
`PostgresMessage` and `PoolConnectionProxy` were broken out into their
own files to make it easier to add typing via stub (pyi) files. Since
they are metaclasses which generate dynamic objects, we can't type them
directly in their python module.
@bryanforbes
Copy link
Collaborator Author

@elprans I've updated the typings to match what I have in asyncpg-stubs. Please give this another review when you have time.

@bryanforbes
Copy link
Collaborator Author

@elprans would it help you review this to break this into individual PRs? There's a lot here.

@elprans
Copy link
Member

elprans commented Mar 4, 2024

@elprans would it help you review this to break this into individual PRs? There's a lot here.

Yes, absolutely.

@bryanforbes
Copy link
Collaborator Author

@elprans would it help you review this to break this into individual PRs? There's a lot here.

Yes, absolutely.

Ok. I'll take care of that this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet