Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/add bgzf typing #4654

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

bzoracler
Copy link
Contributor

@bzoracler bzoracler commented Mar 7, 2024

  • I hereby agree to dual licence this and any previous contributions under both
    the Biopython License Agreement AND the BSD 3-Clause License.

  • I have read the CONTRIBUTING.rst file, have run pre-commit
    locally, and understand that continuous integration checks will be used to
    confirm the Biopython unit tests and style checks pass with these changes.

  • I have added my name to the alphabetical contributors listings in the files
    NEWS.rst and CONTRIB.rst as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)


@peterjc As discussed in #4651 (comment), this is a conversion of Bio.bgzf to --strict static typing. Setting this as a draft PR for initial feedback.

Part of this required making some runtime changes, which are done in the following commits:

I believe these taken together are fully compatible with the usage of the module as given by the specifications, but this needs thorough review.

@bzoracler
Copy link
Contributor Author

Some of these style failures conflict with black's defaults. I'll just disable the codes for this file for now.

else:
crc = struct.pack("<I", crc)
crc = struct.pack("<I", checksum)
if expected_crc != crc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have some crc usage despite changing the varible name above...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deliberate. Static type checkers won't like crc being of two different types:

  • crc is first inferred to be an int from the line crc = zlib.crc32(data)
  • Type checkers will subsequently complain on the line crc = struct.pack(...), because struct.pack(...) returns a bytes object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I've hit that kind of issue myself - using a new variable name does seem to be clearest. I may have not read this change carefully enough.

self._block_raw_length = None
self._buffers: dict[int, tuple[_BufferT, int]] = {}
# Is there a safe default for a non-`None` integer?
self._block_start_offset: int = None # type: ignore[assignment]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional[int] perhaps?

We can't used zero here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional[int] is the same as int | None, and doesn't play nicely with the rest of the class body. Essentially, if you set _block_start_offset: Optional[int], you'll have to do an assertion that it's not None whenever it's used in the context of an integer, in the rest of the class body; otherwise the type-checker will complain.

The assertions are visual noise, however - I'd prefer not to add them. I'll leave it like this for now, because the next line (self._load_block) is guaranteed to set it to an integer that's not None.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have had to introduce occasional assert isintance(...) lines in my own project. As you say, largely visual noise :(

_Self = tp.TypeVar("_Self")
# fmt: off
# Writing and updating modes
_ReadingBinaryMode: TypeAlias = tp.Literal["rb", "br"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, esentially an enum - does this work nicely in a compatible code editor for suggesting values? Until now I'd have just annotated this as str.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some language server protocols might work well with typing.Literal - I use PyCharm and unfortunately it doesn't seem to do so :(

Bio/bgzf.py Show resolved Hide resolved
# fmt: on
def __init__(
self,
filename: _typeshed.StrPath | None = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At what point can we use XXX | None rather than Optional[XXX]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same explanation as #4654 (comment)

def open(filename, mode="rb"):
# fmt: off
@tp.overload
def open(filename: _typeshed.StrPath, mode: _ReadingBinaryMode = "rb") -> BgzfReader[bytes]: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it these lines triggering D103 missing docstring in public functions? That feels like a false positive - indeed PyCQA/pydocstyle#419 agreed.

However, pydocstyle is now deprecated. I will look at droping this from our flake8 configuration and enabling it via ruff as suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants