ENH: Add command to export annotations as JSON #37

Open
MartinThoma opened this issue Nov 12, 2023 · 5 comments

@MartinThoma
Member

MartinThoma commented Nov 12, 2023

This was originally asked in py-pdf/pypdf#2291

Proposed syntax:

$ pdfly annotations export
[
{"page_index": 0, "/Subtype": }
]

So essentially the idea is to export the same entries as defined in section 12.5 "Annotations" of the PDF 1.7 specification, but with "page_index" added to each dictionary.

For the moment, we would only support the JSON export. If we want to add more export formats, we can add --format with a default value of JSON.
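As a rough illustration of how the proposed syntax and the optional `--format` flag could fit together, here is a minimal sketch using `argparse` from the standard library. It is not pdfly's actual implementation: the `build_parser` helper, the positional `pdf` argument, and the `output_format` attribute name are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser for `pdfly annotations export`."""
    parser = argparse.ArgumentParser(prog="pdfly")
    commands = parser.add_subparsers(dest="command", required=True)

    annotations = commands.add_parser("annotations")
    actions = annotations.add_subparsers(dest="action", required=True)

    export = actions.add_parser("export", help="export annotations")
    # Assumption: the input file is passed as a positional argument.
    export.add_argument("pdf", help="path to the annotated PDF")
    # Only JSON for now; more formats could be added to `choices` later.
    export.add_argument(
        "--format",
        dest="output_format",
        choices=["json"],
        default="json",
        help="export format (default: json)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.command, args.action, args.pdf, args.output_format)
```

Because `--format` defaults to `json`, existing invocations would keep working if further formats were added later.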

MartinThoma changed the title from "ENH: Add command to import/export annotations as JSON" to "ENH: Add command to export annotations as JSON" on Nov 12, 2023

@mah-emad
mah-emad commented Nov 13, 2023

Hi Martin,
Basically, I have a PDF stored in a database that follows a workflow across multiple specialties; each one puts comments on the PDF and signs it at the end.
What I'm trying to do is export only the comments/signatures and save them to the database instead of saving the whole PDF again.

I tried pikepdf (Python bindings to qpdf): I can export the whole document to JSON, use DeepDiff to store only the difference, and apply the patch later.
Alternatively, I can always compare the two PDFs at the binary level with bsdiff4 and store the difference without converting to JSON first.

@MartinThoma
Member Author

You can export the annotations like this:

from pypdf import PdfReader
import json

reader = PdfReader("annotated_pdf.pdf")

annotations = []
for page_index, page in enumerate(reader.pages):
    for annotation in page.annotations:
        annotations.append({"page_index": page_index, **annotation})

print(json.dumps(annotations, indent=4))  # use indent=None for a one-line export

I'm not sure yet if adding this to pdfly is useful for others, but at least it would solve your issue of exporting the annotations.

Importing the annotations from JSON to a PDF is a different topic.

If you want to have cryptographic signatures, you should save multiple versions of that file, or just the latest file, as it will contain all of the revisions.

@mah-emad

Hi Martin,
Thanks for your reply. When I try the code, it fails with the following:

annotations.append({"page_index": page_index, **annotation})
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'IndirectObject' object is not a mapping

Any thoughts on how to handle importing the annotations back?

@pubpub-zz

Try:

annotations.append({"page_index": page_index, **(annotation.get_object())})

@mah-emad

Now the error is:
TypeError: Object of type IndirectObject is not JSON serializable

I guess the problem is with the internal custom types. That would require mapping all of your types to pure Python types before trying to serialize.
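One way such a mapping could look is sketched below. It is not part of pypdf or pdfly; `to_jsonable` is a hypothetical helper. It assumes that indirect references expose `idnum`, `generation`, and `get_object()` (as pypdf's `IndirectObject` does), and that the other PDF object types subclass `dict`, `list`, `str`, `int`, `float`, or `bytes`, or expose a `value` attribute. Only the top-level reference is resolved; nested references are exported as `{"$ref": [idnum, generation]}` so that back-references such as `/P` (the page) or `/Parent` cannot cause infinite recursion.

```python
import json
from decimal import Decimal


def to_jsonable(obj, resolve=True):
    """Convert a PDF object tree into plain, JSON-serializable Python types."""
    # Indirect reference (assumed attributes: idnum, generation, get_object).
    if hasattr(obj, "idnum") and hasattr(obj, "get_object"):
        if resolve:
            return to_jsonable(obj.get_object(), resolve=False)
        # Nested references are kept as placeholders to avoid cycles.
        return {"$ref": [int(obj.idnum), int(obj.generation)]}
    if obj is None or isinstance(obj, bool):
        return obj
    if isinstance(obj, str):
        return str(obj)
    if isinstance(obj, int):
        return int(obj)
    if isinstance(obj, (float, Decimal)):
        return float(obj)
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj).hex()  # design choice: hex-encode raw bytes
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v, resolve=False) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v, resolve=False) for v in obj]
    # Assumption: a null wrapper can be recognized by its class name.
    if type(obj).__name__ == "NullObject":
        return None
    # Assumption: boolean-like wrappers expose the raw value as `.value`.
    if hasattr(obj, "value"):
        return to_jsonable(obj.value, resolve=False)
    return str(obj)  # last resort: textual representation


if __name__ == "__main__":
    sample = {"/Subtype": "/Text", "/Rect": [0, 0, 10.5, 20]}
    print(json.dumps(to_jsonable(sample)))
```

In the loop from the earlier snippet, the append would then become `annotations.append({"page_index": page_index, **to_jsonable(annotation)})`. Note that this export is lossy: referenced objects such as appearance streams are not included, so it would not be enough on its own to import the annotations back.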
