
ENH: Improve/rewrite PDF permission retrieval #2400

Merged
merged 4 commits into py-pdf:main from stefan6419846:user-access-permissions on Jan 18, 2024

Conversation

stefan6419846
Collaborator

@stefan6419846 stefan6419846 commented Jan 8, 2024

Fixes #2391. Fixes #2399.

pypdf/constants.py (outdated)
MartinThoma
MartinThoma previously approved these changes Jan 8, 2024
Member

@MartinThoma MartinThoma left a comment


Nice work!

@MartinThoma
Member

@pubpub-zz What do you think about this approach? Is it fine with you as well?

@pubpub-zz
Collaborator

From my understanding of §3.5.3, permissions can also be attached to different users through PKCS#7.
It may be good to add a constructor that properly initializes R1/R2/R6/R7 (for #2401). We should not merge this PR while this issue is still active.
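A constructor along these lines could produce a `/P` value with the reserved bits already in place. The sketch below is hypothetical (`build_p_value` and the mask names are not pypdf API) and assumes the reading of ISO 32000 in which reserved bits 1-2 must be 0 while the remaining reserved bits (7-8 and 13-32) must be 1, which is why typical `/P` values are negative.

```python
# Hypothetical sketch: build a /P value with the reserved bits initialized.
# Bit n in the spec's 1-indexed numbering has the value 1 << (n - 1).

# Operation bits 3-6 and 9-12 (print, modify, extract, annotate,
# fill forms, accessibility extraction, assemble, high-quality print).
OPERATION_MASK = 0x00000F3C

# Assumption: reserved bits 7-8 and 13-32 must be 1; bits 1-2 must be 0.
RESERVED_ONES_MASK = 0xFFFFF0C0


def build_p_value(granted: int) -> int:
    """Return the signed 32-bit /P integer for the granted operation bits."""
    if granted < 0 or granted & ~OPERATION_MASK:
        raise ValueError("only operation bits 3-6 and 9-12 may be granted")
    unsigned = RESERVED_ONES_MASK | granted  # bits 1-2 are never set
    # PDF integers are signed; reinterpret the 32-bit pattern accordingly.
    if unsigned & 0x80000000:
        return unsigned - (1 << 32)
    return unsigned
```

With these assumptions, `build_p_value(0)` gives -3904 (no operations granted), and granting only printing (`1 << 2`) and high-quality printing (`1 << 11`) gives -1852.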

@MartinThoma
Member

Thanks for pointing this out. That's something I wasn't aware of. I will read up on the specs before merging it 👍

@stefan6419846
Collaborator Author

stefan6419846 commented Jan 10, 2024

> from my understanding of §3.5.3, permissions can also be attached to different users through PKCS#7.

Do we really have support for certificate-based encryption anywhere? AFAIK we can only use user and owner passwords at the moment.

According to Table 23 of ISO 32000-1:2008, the most important difference is: "If bit 2 is set to 1, all other bits are ignored and all operations are permitted. If bit 2 is set to 0, permission for operations are based on the values of the remaining flags defined in Table 24." Nevertheless, this seems to change only the semantics of R2.
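The quoted rule can be sketched as a small decoder for the `/P` value. This is an illustration under assumptions, not pypdf code: `Perm`, `OPERATIONS` and `effective_permissions` are hypothetical names, the `public_key_handler` switch stands in for however a reader would detect a public-key security handler, and the bit layout follows ISO 32000-1 (bit n, counted from 1 at the low-order end, has the value `1 << (n - 1)`).

```python
from enum import IntFlag


class Perm(IntFlag):
    """Hypothetical flag names; bit n (1-indexed) has the value 1 << (n - 1)."""

    ALL_PUBLIC_KEY = 1 << 1  # bit 2: reserved for the standard handler
    PRINT = 1 << 2  # bit 3
    MODIFY = 1 << 3  # bit 4
    EXTRACT = 1 << 4  # bit 5
    ANNOTATE = 1 << 5  # bit 6
    FILL_FORMS = 1 << 8  # bit 9
    EXTRACT_ACCESSIBILITY = 1 << 9  # bit 10
    ASSEMBLE = 1 << 10  # bit 11
    PRINT_HIGH_QUALITY = 1 << 11  # bit 12


# Every operation flag (bits 3-6 and 9-12), i.e. everything except bit 2.
OPERATIONS = Perm(0xF3C)


def effective_permissions(p_value: int, public_key_handler: bool) -> Perm:
    raw = p_value & 0xFFFFFFFF  # /P is signed; look at it as unsigned
    if public_key_handler and raw & Perm.ALL_PUBLIC_KEY:
        # Quoted rule: bit 2 set -> other bits ignored, all operations permitted.
        return OPERATIONS
    # Otherwise only the individual operation bits decide.
    return Perm(raw & OPERATIONS)
```

With the standard security handler, bit 2 is simply reserved, so only the public-key branch treats it specially; the other flags keep their meaning in both cases.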

> it may be good to add a constructor to initialize properly the R1/R2/R6/R7 (for #2401) we should not merge this PR with this issue active

IMHO this is a separate issue (and more of a bug than an enhancement) and should get a dedicated PR. This PR is primarily about PDF permission retrieval, as the title indicates.

@MartinThoma
Member

MartinThoma commented Jan 11, 2024

I've just checked other software:

> from my understanding of §3.5.3, permissions can also be attached to different users through PKCS#7.
>
> it may be good to add a constructor to initialize properly the R1/R2/R6/R7

For the moment, I think it's best to ignore that complexity. We already have a user interface which is public and I think the user_access_permissions property is better than the existing interface. It's also similar to PDFium and PyMuPDF - both are very good projects which I respect a lot.

I'll leave this open until Saturday (maybe Sunday), but if there are no new arguments or insights, I will merge it.

Collaborator

@pubpub-zz pubpub-zz left a comment


Proposal to prepare for the future

pypdf/_reader.py

codecov bot commented Jan 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (480e840) 94.42% compared to head (e24b305) 94.42%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2400   +/-   ##
=======================================
  Coverage   94.42%   94.42%           
=======================================
  Files          49       49           
  Lines        7961     7998   +37     
  Branches     1608     1616    +8     
=======================================
+ Hits         7517     7552   +35     
- Misses        274      276    +2     
  Partials      170      170           


@MartinThoma MartinThoma merged commit bd571f5 into py-pdf:main Jan 18, 2024
15 checks passed
@stefan6419846 stefan6419846 deleted the user-access-permissions branch January 18, 2024 19:58
MartinThoma added a commit that referenced this pull request Jan 19, 2024
## What's new

pypdf==4.0.0 is a big milestone forward:

* We finally have a layout-mode text extraction.
  This enables users who want to detect / extract tables
  with heuristics to give it a try.
* We deprecated a lot of the old PyPDF2 API that was either
  not following PEP8 naming styles or was not using a
  property. Users coming from PyPDF2 might want to switch
  first to pypdf<4.0.0 to get helpful error messages
  that show the new API in their specific cases.

A big 'Thank you!' to the whole pypdf community for your
work. Thanks to you, pypdf is better than ever.

Kudos to @shartzog who added the layout-mode with his first
contribution!

### Deprecations (DEP)
-  Drop Python 3.6 support (#2369) by @MartinThoma
-  Remove deprecated code (#2367) by @MartinThoma
-  Remove deprecated XMP properties (#2386) by @stefan6419846

### New Features (ENH)
-  Add "layout" mode for text extraction (#2388) by @shartzog
-  Add Jupyter Notebook integration for PdfReader (#2375) by @MartinThoma
-  Improve/rewrite PDF permission retrieval (#2400) by @stefan6419846

### Bug Fixes (BUG)
-  PdfWriter.add_uri was setting the wrong type (#2406) by @pmiller66
-  Add support for GBK2K cmaps (#2385) by @stefan6419846

### Documentation (DOC)
-  Add pmiller66 for #2406 as a contributor by @MartinThoma
-  Add missing expand parameter (#2393) by @Atomnp
-  Resolve build warnings (#2380) by @stefan6419846
-  Fix testing prerequisites (#2381) by @stefan6419846
-  Improve formatting of contributors page (#2383) by @stefan6419846
-  Add Tobeabellwether as a contributor for #2341 by @MartinThoma

### Developer Experience (DEV)
-  Make dependabot aware of our PR prefixes (#2415) by @stefan6419846
-  Fail on Sphinx issues (#2405) by @stefan6419846
-  Move title check to own workflow (#2384) by @MasterOdin
-  Write to temporary files instead of the working directory (#2379) by @stefan6419846
-  Ensure that the PR titles have the correct format (#2378) by @stefan6419846

### Maintenance (MAINT)
-  Complete FileSpecificationDictionaryEntries constants (#2416) by @MartinThoma
-  Return None instead of -1 when page is not attached (#2376) by @MartinThoma
-  Replace warning with logging.error (#2377) by @MartinThoma

### Testing (TST)
-  Add missing pytest.mark.samples annotations (#2412) by @kitterma
-  Correctly close temporary files (#2396) by @stefan6419846
-  Fix side effect #2379 (#2395) by @pubpub-zz
-  Add test for layout extraction mode (#2390) by @MartinThoma

### Code Style (STY)
-  Use the UserAccessPermissions enum (#2398) by @MartinThoma
-  Run black (#2370) by @MartinThoma

[Full Changelog](3.17.4...4.0.0)
Development

Successfully merging this pull request may close these issues.

- MAINT: Introduce a PdfReader property 'user_access_permissions'
- PDF permission retrieval
3 participants