Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved performance and security for ContentStream_readInlineImage. #740

Merged
merged 4 commits into from Apr 15, 2022
Merged

Improved performance and security for ContentStream_readInlineImage. #740

merged 4 commits into from Apr 15, 2022

Conversation

sekrause
Copy link
Contributor

Recreated pull-request of the accidentally closed PR #331.

It fixes #329 by raising an exception when the stream ends and we haven't the end token for the inline image.

It also fixes #330 by using a more efficient parsing algorithm. For large inline images this change speeds up this method by many orders of magnitude:

  • Instead of only reading single bytes from the stream it reads larger chunks of 8 kB and uses Python's really fast find() method to check for the E the token. Only when the token is found it falls back to the normal algorithm that detects the end of the inline image.
  • Instead of using an immutable data it uses BytesIO to collect the output which support much faster appends.

@MartinThoma MartinThoma added nf-security Non-functional change: Security Tiny Pull requests that make a tiny change - and thus should be easy to merge labels Apr 12, 2022
@MartinThoma
Copy link
Member

@MasterOdin What do you think about the PR? Do you see anything that this could break?

PyPDF2/pdf.py Outdated Show resolved Hide resolved
PyPDF2/pdf.py Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Apr 15, 2022

Codecov Report

Merging #740 (3d5548d) into main (0890b06) will decrease coverage by 0.06%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##             main     #740      +/-   ##
==========================================
- Coverage   69.53%   69.47%   -0.07%     
==========================================
  Files           9        9              
  Lines        3309     3315       +6     
  Branches      782      783       +1     
==========================================
+ Hits         2301     2303       +2     
- Misses        767      769       +2     
- Partials      241      243       +2     
Impacted Files Coverage Δ
PyPDF2/pdf.py 72.17% <50.00%> (-0.15%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0890b06...3d5548d. Read the comment docs.

@MartinThoma
Copy link
Member

I'm currently trying to find a PDF with an inline image so that the code at least runs once

@MartinThoma
Copy link
Member

Ah, just found your comment again:

from reportlab.pdfgen import canvas
c = canvas.Canvas("test.pdf")
c.drawInlineImage("test.png", 100, 100, 100, 100)
c.drawString(200, 100, "Test")
c.showPage()
c.save()

@MartinThoma MartinThoma merged commit d71fb3e into py-pdf:main Apr 15, 2022
@MartinThoma
Copy link
Member

Thank you so much for all the time you invested into this over so 5 years! 🙏

MartinThoma added a commit that referenced this pull request Apr 15, 2022
Security (SEC):

- ContentStream_readInlineImage had potential infinite loop (#740)

Bug fixes (BUG):

- Fix merging encrypted files (#757)
- CCITTFaxDecode decodeParms can be an ArrayObject (#756)

Robustness improvements (ROBUST):

- title sometimes None (#744)

Documentation (DOC):

- Adjust short description of the package

Tests and Test setup (TST):

- Rewrite JS tests from unittest to pytest (#746)
- Increase Test coverage, mainly with filters (#756)
- Add test for inline images (#758)

Developer Experience Improvements (DEV):

- Remove unused Travis-CI configuration (#747)
- Show code coverage (#754, #755)
- Add mutmut (#760)

Miscellaneous:

- STY: Closing file handles, explicit exports, ... (#743)

All changes: 1.27.4...1.27.5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
nf-security Non-functional change: Security Tiny Pull requests that make a tiny change - and thus should be easy to merge
Projects
None yet
3 participants