Add support to extract gray scale images #1460

Merged
merged 4 commits into from Dec 9, 2022

joeywang4
Contributor

Currently, gray scale images are incorrectly converted to RGB images when they are extracted. This PR fixes the issue by setting the appropriate palette and mode when images are extracted.
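
To make the idea concrete, here is a minimal sketch of what "changing the palette and the mode" can look like. It is not the code from this PR: `pillow_mode_for` and `expand_gray_palette` are hypothetical helpers written only for illustration. The sketch assumes the extracted pixels end up in a Pillow image, where mode `"L"` means 8-bit grayscale, and that a palette is expected as RGB triplets by default.

```python
from typing import Optional

# Hypothetical helpers for illustration only; not PyPDF2's actual implementation.


def pillow_mode_for(color_space: str) -> Optional[str]:
    """Map a PDF device color space name to a Pillow image mode."""
    return {
        "/DeviceGray": "L",  # 8-bit grayscale, one byte per pixel
        "/DeviceRGB": "RGB",
        "/DeviceCMYK": "CMYK",
    }.get(color_space)  # None for color spaces this sketch does not handle


def expand_gray_palette(lookup: bytes) -> bytes:
    """Turn a one-byte-per-entry gray lookup table into RGB triplets."""
    # Each gray level g becomes (g, g, g), so the palette stays gray
    # even when the consumer expects three bytes per entry.
    return b"".join(bytes([level]) * 3 for level in lookup)
```

With helpers like these, a gray image would keep mode `"L"` instead of being forced to `"RGB"`, and an indexed image with a gray base would get a palette whose entries are all neutral grays.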

@codecov

codecov bot commented Nov 30, 2022

Codecov Report

Base: 94.31% // Head: 94.01% // Decreases project coverage by -0.30% ⚠️

Coverage data is based on head (6efa8a6) compared to base (940819f).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1460      +/-   ##
==========================================
- Coverage   94.31%   94.01%   -0.31%     
==========================================
  Files          28       30       +2     
  Lines        5171     5443     +272     
  Branches      980     1038      +58     
==========================================
+ Hits         4877     5117     +240     
- Misses        177      197      +20     
- Partials      117      129      +12     
Impacted Files Coverage Δ
PyPDF2/filters.py 97.31% <100.00%> (+0.01%) ⬆️
PyPDF2/_merger.py 91.11% <0.00%> (-6.46%) ⬇️
PyPDF2/_writer.py 88.73% <0.00%> (-2.80%) ⬇️
PyPDF2/constants.py 100.00% <0.00%> (ø)
PyPDF2/_utils.py 99.48% <0.00%> (ø)
PyPDF2/__init__.py 100.00% <0.00%> (ø)
PyPDF2/generic/_data_structures.py 95.62% <0.00%> (+0.50%) ⬆️
PyPDF2/_reader.py 90.30% <0.00%> (+0.68%) ⬆️


@MartinThoma
Member

Very nice!

Do you happen to have an example pdf where the new code would be applied? Or could you create one and upload it here?

I would add it to https://github.com/py-pdf/PyPDF2 and add a test.

@joeywang4
Contributor Author

Ok, I have uploaded a test file grayscale.pdf and added it to test_get_images.

@MartinThoma
Member

Hi @joeywang4 ,

I've just added grayscale.pdf to the sample-files git submodule. Would you be so kind as to remove the file from your PR and use sample-files/019-grayscale-image/grayscale-image.pdf instead?

I want to keep the PyPDF2 repository from growing bigger and bigger because of example PDF files. That growth could affect people who clone PyPDF2 or install from the repository, which is why I added the file via the submodule.

@MartinThoma MartinThoma self-requested a review December 8, 2022 21:01
Member

@MartinThoma MartinThoma left a comment

The code looks good - thanks for adding the test!

Please just remove the resources/grayscale.pdf and use SAMPLE_ROOT / "019-grayscale-image/grayscale-image.pdf" instead :-)

@joeywang4
Contributor Author

@MartinThoma I have removed the test file and updated the path. Thanks for the suggestion!
I also changed the path of the extracted image file that the test writes. Without that change, the extracted image could not be written to a file correctly when I ran the test locally on my computer.
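
The thread does not say what the path was changed to, so the following is only a sketch of one way a test could write an extracted image safely. It assumes the problem was an output file name built from a source path that contains directory separators. `write_extracted_image`, the `test-out-` prefix, and the image name `Im0.png` are hypothetical and used only for illustration.

```python
import tempfile
from pathlib import Path


def write_extracted_image(
    pdf_path: Path, image_name: str, data: bytes, out_dir: Path
) -> Path:
    """Write image bytes to out_dir under a flat, separator-free file name."""
    # Use only the PDF's stem and the image's base name, so directory parts
    # of either input never end up inside the output file name.
    target = out_dir / f"test-out-{pdf_path.stem}-{Path(image_name).name}"
    target.write_bytes(data)
    return target


# Usage: write into a temporary directory that is cleaned up afterwards.
with tempfile.TemporaryDirectory() as tmp:
    src = Path("sample-files") / "019-grayscale-image" / "grayscale-image.pdf"
    out = write_extracted_image(src, "Im0.png", b"\x89PNG", Path(tmp))
    print(out.name)  # test-out-grayscale-image-Im0.png
```

Writing into a temporary directory also keeps test output out of the working tree.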

@MartinThoma MartinThoma merged commit 22214e8 into py-pdf:main Dec 9, 2022
@MartinThoma
Member

Very good work! 👍

If you want, I'll add you to the contributors list :-)

@joeywang4 joeywang4 deleted the gray-image branch December 9, 2022 21:39
MartinThoma added a commit that referenced this pull request Dec 10, 2022
New Features (ENH):
-  Add support to extract gray scale images (#1460)
-  Add 'threads' property to PdfWriter (#1458)
-  Add 'open_destination' property to PdfWriter (#1431)
-  Make PdfReader.get_object accept integer arguments (#1459)

Bug Fixes (BUG):
-  Scale PDF annotations (#1479)

Robustness (ROB):
-  Padding issue with AES encryption (#1469)
-  Accept empty object as null objects (#1477)

Documentation (DOC):
-  Add module documentation for the PaperSize class (#1447)

Maintenance (MAINT):
-  Use 'page_number' instead of 'pagenum' (#1365)
-  Add List of pages to PageRangeSpec (#1456)

Testing (TST):
-  Cleanup temporary files (#1454)
-  Mark test_tounicode_is_identity as external (#1449)
-  Use Ubuntu 20.04 for running CI test suite (#1452)

[Full Changelog](2.11.2...2.12.0)
2 participants