MAINT: Use typing.IO for file streams #1498

Merged
merged 4 commits into from Dec 15, 2022

@thehale (Contributor) commented Dec 12, 2022

The Python standard library provides the `IO` type for file streams ([source](https://docs.python.org/3/library/typing.html#typing.IO)).

This commit replaces the complex `Union` type of the various IO implementations with the official `IO` type. This will improve the accuracy of type checking in users' IDEs.
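
To illustrate the idea, here is a minimal sketch of the before/after annotation style. The alias names `OldStreamType` and `StreamType`, the members of the union, and the helper `read_header` are hypothetical and are not taken from the PyPDF2 source; the sketch also assumes binary streams, hence `IO[bytes]`. A narrow union only admits the listed classes, so other binary file-like objects (for example the object returned by `tempfile.TemporaryFile()`) may be rejected by a type checker, while `IO[bytes]` describes them all.

```python
import io
import tempfile
from typing import IO, Union

# Hypothetical "before" alias: a union of concrete stream classes.
OldStreamType = Union[io.BytesIO, io.BufferedReader, io.BufferedWriter, io.FileIO]

# Hypothetical "after" alias: the generic binary stream type from typing.
StreamType = IO[bytes]


def read_header(stream: StreamType, size: int = 8) -> bytes:
    """Return the first `size` bytes of a stream and restore its position."""
    position = stream.tell()
    stream.seek(0)
    header = stream.read(size)
    stream.seek(position)
    return header


# An in-memory stream works as before.
header_from_memory = read_header(io.BytesIO(b"%PDF-1.7\nrest of file"))

# A temporary file is also a binary stream and fits IO[bytes].
with tempfile.TemporaryFile() as tmp:
    tmp.write(b"%PDF-1.4\nrest of file")
    header_from_tempfile = read_header(tmp)
```

At runtime both calls behave identically; the difference is only in what a static type checker or IDE accepts for the `stream` parameter.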


There are also a few quality-of-life improvements for new code contributors like myself.

Many developers (myself included) keep virtual environments inside the
current project directory. These virtual environments are local development
constructs and should not be checked into source control.

This commit adds two common virtual environment directory names to the
.gitignore so that future developers do not commit them by accident.

The current contribution instructions in `docs/dev/intro.md` direct new code
contributors to install the `dev` requirements. After following that
instruction, the minimal test suite fails with the following errors:

```
python -m venv .venv
source .venv/bin/activate
pip install -r requirements/dev.txt
pytest -m "not external" -m "not samples" -m "not slow"

=========================== short test summary info ===========================
FAILED tests/test_reader.py::test_get_images[pdflatex-outline.pdf-expected_images0] - ModuleNotFoundError: No module named 'PIL'
FAILED tests/test_reader.py::test_get_images[crazyones.pdf-expected_images1] - ModuleNotFoundError: No module named 'PIL'
FAILED tests/test_reader.py::test_get_images[git.pdf-expected_images2] - ModuleNotFoundError: No module named 'PIL'
FAILED tests/test_reader.py::test_get_images[imagemagick-CCITTFaxDecode.pdf-expected_images5] - ModuleNotFoundError: No module named 'PIL'
FAILED tests/test_reader.py::test_get_images[src6-expected_images6] - ModuleNotFoundError: No module named 'PIL'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/994/994636.pdf-tika-994636.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/952/952133.pdf-tika-952133.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/914/914568.pdf-tika-914568.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/952/952016.pdf-tika-952016.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/965/965118.pdf-tika-952016.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/959/959184.pdf-tika-959184.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/958/958496.pdf-tika-958496.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/972/972174.pdf-tika-972174.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/972/972243.pdf-tika-972243.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/969/969502.pdf-tika-969502.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction[https://arxiv.org/pdf/2201.00214.pdf-arxiv-2201.00214.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction_strict - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
FAILED tests/test_workflows.py::test_image_extraction2[https://corpora.tika.apache.org/base/docs/govdocs1/977/977609.pdf-tika-977609.pdf] - ImportError: pillow is required to do image extraction. It can be installed via 'pip install PyPDF2[image]'
=== 18 failed, 536 passed, 5 skipped, 53 deselected, 5 xfailed in 146.94s (0:02:26) ===
```

This commit adds `pillow` to `requirements/dev.in` so that the minimal test
suite passes on the first try and new code contributors can start
implementing improvements with confidence.

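
For readers wondering how a missing optional dependency turns into the failures listed in the test summary, the following is a rough sketch of an import guard. The helper names `module_available` and `require_pillow` are hypothetical and are not claimed to match how PyPDF2 implements its check; only the error message text is copied from the test output in this PR.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Report whether a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_pillow() -> None:
    """Raise a helpful error when pillow (imported as PIL) is not installed."""
    if not module_available("PIL"):
        raise ImportError(
            "pillow is required to do image extraction. "
            "It can be installed via 'pip install PyPDF2[image]'"
        )


# Standard-library modules are always found; made-up names are not.
json_found = module_available("json")
missing_found = module_available("surely_not_an_installed_module_xyz")
```

A fresh virtual environment built only from the `dev` requirements would, before this commit, fail such a guard for `PIL`, which matches the 18 failures reported in the summary.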
The Python standard library provides the `IO` type for file streams. (Source:
https://docs.python.org/3/library/typing.html#typing.IO)

This commit replaces the complex Union type of the `IO` implementations with the
official `IO` type. This will improve the accuracy of type checking in users'
IDEs.

The CI system flagged some additional conflicts with the `IO` type in the writer
classes.

This commit changes the writer classes to use the standard `IO` type instead of
the union of IO implementations.
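
As a hedged illustration of what this means on the writing side, the sketch below annotates a hypothetical writer function with `IO[bytes]`. The function `write_minimal_document` is invented for this example and does not produce a valid PDF; it only shows that an in-memory buffer and a file opened with `open(..., "wb")` satisfy the same annotation.

```python
import io
import os
import tempfile
from typing import IO


def write_minimal_document(stream: IO[bytes]) -> int:
    """Hypothetical writer: emit a header and an end marker, return bytes written."""
    written = stream.write(b"%PDF-1.7\n")
    written += stream.write(b"%%EOF\n")
    return written


# Writing to an in-memory buffer.
buffer = io.BytesIO()
bytes_in_memory = write_minimal_document(buffer)

# Writing to a real file on disk through the same function.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.pdf")
    with open(path, "wb") as file_handle:
        bytes_on_disk = write_minimal_document(file_handle)
    with open(path, "rb") as file_handle:
        disk_content = file_handle.read()
```

Both destinations receive the same bytes, and neither call site needs a cast or a type-ignore comment.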
@MartinThoma changed the title from "Use official IO type for file streams" to "MAINT: Use typing.IO for file streams" on Dec 15, 2022
@MartinThoma MartinThoma merged commit b8f787e into py-pdf:main Dec 15, 2022
@MartinThoma (Member)

Thank you for the contribution!

This time it might take a little bit longer until the release happens as I want to make the 3.0.0 release :-)

@MartinThoma (Member)

If you want, I'll add you to the list of contributors: contributors — PyPDF2 documentation

@thehale (Contributor, Author) commented Dec 15, 2022

If you want, I'll add you to the list of contributors: contributors — PyPDF2 documentation

Go for it. :D Thanks!

@MartinThoma (Member)

Done 🤗

MartinThoma added a commit that referenced this pull request Dec 22, 2022
BREAKING CHANGES:
-  Deprecate features with PyPDF2==3.0.0 (#1489)
-  Refactor Fit / Zoom parameters (#1437)

New Features (ENH):
-  Add Cloning  (#1371)
-  Allow int for indirect_reference in PdfWriter.get_object (#1490)

Documentation (DOC):
-  How to read PDFs from S3 (#1509)
-  Make MyST parse all links as simple hyperlinks (#1506)
-  Changed 'latest' for 'stable' generated docs (#1495)
-  Adjust deprecation procedure (#1487)

Maintenance (MAINT):
-  Use typing.IO for file streams (#1498)

[Full Changelog](2.12.1...3.0.0)