MAINT: Use typing.IO for file streams #1498

thehale · 2022-12-12T21:29:49Z

The Python standard library provides the `IO` type for file streams. ([source](https://docs.python.org/3/library/typing.html#typing.IO))

This commit replaces the complex Union type of the various IO implementations with the official IO type. This will improve the accuracy of type checking in users' IDEs.
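As a rough illustration of the change described above, the sketch below contrasts a union of concrete stream classes with `typing.IO`. The alias name `LegacyStream`, the members of the union, and the helper `read_header` are hypothetical and chosen only for illustration; they are not taken from the PyPDF2 source.

```python
import io
from typing import IO, Union

# Hypothetical "before" alias: every accepted stream class has to be listed.
LegacyStream = Union[io.BytesIO, io.BufferedReader, io.BufferedWriter, io.FileIO]


def read_header(stream: IO[bytes], size: int = 8) -> bytes:
    """Return the first `size` bytes of a binary stream, then restore its position.

    Annotating with IO[bytes] describes the interface (read/seek/tell)
    instead of enumerating concrete implementations.
    """
    start = stream.tell()
    data = stream.read(size)
    stream.seek(start)
    return data


# An in-memory stream satisfies the IO[bytes] annotation.
header = read_header(io.BytesIO(b"%PDF-1.7\n..."))
```

With an interface-style annotation, callers are not tied to the handful of classes that happened to be listed in the union.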

There are also a few quality-of-life improvements for new code contributors like myself.

Many developers (like myself) prefer to keep virtual environments inside the project directory. These virtual environments are local development constructs and should not be checked into source control. This commit adds two common virtual environment directory names to the `.gitignore` to prevent accidental commits by future developers.

The current contribution instructions in `docs/dev/intro.md` direct new code contributors to install the `dev` requirements. After following that instruction, the minimal test suite fails with the following errors:

```
python -m venv .venv
source .venv/bin/activate
pip install -r requirements/dev.txt
pytest -m "not external" -m "not samples" -m "not slow"
```

```
======================== short test summary info ========================
FAILED tests/test_reader.py::test_get_images[pdflatex-outline.pdf-expected_images0] - ModuleNotFoundError: No module named 'PIL'
FAILED tests/test_reader.py::test_get_images[crazyones.pdf-expected_images1] - ModuleNotFoundError: No module named 'PIL'
FAILED tests/test_reader.py::test_get_images[git.pdf-expected_images2] - ModuleNotFoundError: No module named 'PIL'
FAILED tests/test_reader.py::test_get_images[imagemagick-CCITTFaxDecode.pdf-expected_images5] - ModuleNotFoundError: No module named 'PIL'
FAILED tests/test_reader.py::test_get_images[src6-expected_images6] - ModuleNotFoundError: No module named 'PIL'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/994/994636.pdf-tika-994636.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/952/952133.pdf-tika-952133.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/914/914568.pdf-tika-914568.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/952/952016.pdf-tika-952016.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/965/965118.pdf-tika-952016.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/959/959184.pdf-tika-959184.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/958/958496.pdf-tika-958496.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/972/972174.pdf-tika-972174.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/972/972243.pdf-tika-972243.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/969/969502.pdf-tika-969502.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://arxiv.org/pdf/2201.00214.pdf-arxiv-2201.00214.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction_strict - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction2[https://corpora.tika.apache.org/base/docs/govdocs1/977/977609.pdf-tika-977609.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
== 18 failed, 536 passed, 5 skipped, 53 deselected, 5 xfailed in 146.94s (0:02:26) ==
```

This commit adds `pillow` to `requirements/dev.in` so that the minimal test suite passes on the first try and new code contributors can start implementing improvements with confidence.
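The `ImportError` text in the failures above points to an optional-dependency guard around Pillow. The following is only a sketch of how such a guard can look; the function names `pillow_available` and `require_pillow` are hypothetical and are not PyPDF2's actual implementation. Only the error message is copied from the test output.

```python
import importlib.util


def pillow_available() -> bool:
    """Report whether the `PIL` package (installed by `pillow`) can be imported."""
    return importlib.util.find_spec("PIL") is not None


def require_pillow() -> None:
    """Raise a descriptive ImportError when Pillow is missing (hypothetical helper)."""
    if not pillow_available():
        raise ImportError(
            "pillow is required to do image extraction. "
            "It can be installed via 'pip install PyPDF2[image]'"
        )
```

A contributor could call `pillow_available()` in a fresh virtual environment to confirm whether the `dev` requirements pulled in Pillow before running the image-related tests.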

The Python standard library provides the `IO` type for file streams. (Source: https://docs.python.org/3/library/typing.html#typing.IO) This commit replaces the complex Union type of the `IO` implementations with the official `IO` type. This will improve the accuracy of type checking in users' IDEs.

The CI system flagged some additional conflicts with the `IO` type in the writer classes. This commit changes the writer classes to use the standard `IO` type instead of the union of IO implementations.
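On the writer side, the same idea can be sketched as a function that accepts any binary stream. The name `write_minimal` and the bytes it writes are assumptions made for this illustration and do not reflect the actual PyPDF2 writer classes.

```python
import io
import tempfile
from typing import IO


def write_minimal(stream: IO[bytes], body: bytes) -> int:
    """Write a header line followed by `body`; return the number of bytes written."""
    written = stream.write(b"%PDF-1.7\n")
    written += stream.write(body)
    return written


# The same function works with an in-memory buffer...
buffer = io.BytesIO()
count = write_minimal(buffer, b"hello")

# ...and with a temporary file opened in binary mode.
with tempfile.TemporaryFile() as tmp:
    write_minimal(tmp, b"hello")
    tmp.seek(0)
    from_file = tmp.read()
```

Because the annotation names the stream interface, the function signature does not need to change when a caller passes a different kind of binary file object.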

MartinThoma · 2022-12-15T21:47:28Z

Thank you for the contribution!

This time it might take a little bit longer until the release happens as I want to make the 3.0.0 release :-)

MartinThoma · 2022-12-15T21:47:56Z

If you want, I'll add you to the list of contributors: contributors — PyPDF2 documentation

thehale · 2022-12-15T21:50:03Z

> If you want, I'll add you to the list of contributors: contributors — PyPDF2 documentation

Go for it. :D Thanks!

MartinThoma · 2022-12-15T22:02:57Z

Done 🤗

BREAKING CHANGES:
- Deprecate features with PyPDF2==3.0.0 (#1489)
- Refactor Fit / Zoom parameters (#1437)

New Features (ENH):
- Add Cloning (#1371)
- Allow int for indirect_reference in PdfWriter.get_object (#1490)

Documentation (DOC):
- How to read PDFs from S3 (#1509)
- Make MyST parse all links as simple hyperlinks (#1506)
- Changed 'latest' for 'stable' generated docs (#1495)
- Adjust deprecation procedure (#1487)

Maintenance (MAINT):
- Use typing.IO for file streams (#1498)

[Full Changelog](2.12.1...3.0.0)

thehale added 4 commits December 12, 2022 13:31

STY: Use standard IO type hint for writers

c9e7ec3

The CI system flagged some additional conflicts with the `IO` type in the writer classes. This commit changes the writer classes to use the standard `IO` type instead of the union of IO implementations.

MartinThoma changed the title ~~Use official IO type for file streams~~ MAINT: Use typing.IO for file streams Dec 15, 2022

MartinThoma merged commit b8f787e into py-pdf:main Dec 15, 2022
