ENH: Add command to export annotations as JSON #37

MartinThoma · 2023-11-12T11:01:09Z

This was originally asked in py-pdf/pypdf#2291

Proposed syntax:

$ pdfly annotations export
[
{"page_index": 0, "/Subtype": }
]

So essentially the idea is to export the same as defined in 12.5 Annotations (PDF 1.7 specifications), but add the "page_index" to the dictionary.

For the moment, we would only support the JSON export. If we want to add more export formats, we can add --format with a default value of JSON.

mah-emad · 2023-11-13T06:46:42Z

Hi Martin,
Basically, I have a PDF stored in a database that follows a workflow across multiple specialties; each one adds comments to the PDF and signs it at the end.
What I'm trying to do is export only the comments/signatures and save them to the database instead of saving the whole PDF again.

I tried pikepdf (Python bindings for qpdf): I can export the whole document to JSON, use DeepDiff to store only the difference, and apply the patch later.
Alternatively, I can compare the two PDFs at the binary level with bsdiff4 and store the difference without converting to JSON first.

MartinThoma · 2023-11-13T09:19:55Z

You can export the annotations like this:

from pypdf import PdfReader
import json

reader = PdfReader("annotated_pdf.pdf")

annotations = []
for page_index, page in enumerate(reader.pages):
    for annotation in page.annotations:
        annotations.append({"page_index": page_index, **annotation})

print(json.dumps(annotations, indent=4))  # use indent=None for a one-line export

I'm not sure yet if adding this to pdfly is useful for others, but at least it would solve your issue of exporting the annotations.

Importing the annotations from JSON to a PDF is a different topic.

If you want cryptographic signatures, you should save multiple versions of that file, or just the latest file, as it will contain all of the revisions.

mah-emad · 2023-11-13T14:12:21Z

Hi Martin,
Thanks for your reply. When I try the code, it fails with the following:

    annotations.append({"page_index": page_index, **annotation})
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'IndirectObject' object is not a mapping

Any thoughts on how to handle importing the annotations back?

pubpub-zz · 2023-11-13T17:49:23Z

Try:

annotations.append({"page_index": page_index, **(annotation.get_object())})

mah-emad · 2023-11-13T18:19:47Z

Now the error is:
TypeError: Object of type IndirectObject is not JSON serializable

I guess the problem is with the internal custom types. That will require mapping all your types to pure Python types before trying to serialize.
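
A minimal sketch of such a mapping, not a tested pdfly or pypdf feature. The helper name `to_plain` is hypothetical. It assumes that indirect references expose `idnum`, `generation` and `get_object()`, that pypdf's dictionary, array, name, string and number objects subclass the matching Python built-ins, and that its boolean object carries a `.value` attribute. Only the top-level reference is resolved; nested references (for example /P, which points back to the page) are written as "12 0 R" strings so the conversion cannot loop. Stream data is not exported, only the stream's dictionary.

```python
import json


def to_plain(obj, resolve=True):
    """Convert a pypdf-style object into JSON-serializable Python types.

    Hypothetical helper; duck-typed so it does not import pypdf itself.
    """
    # Indirect reference: resolve it once at the top level, otherwise
    # emit a PDF-style "idnum generation R" string to avoid cycles.
    if hasattr(obj, "idnum") and hasattr(obj, "generation"):
        if resolve:
            return to_plain(obj.get_object(), resolve=False)
        return f"{obj.idnum} {obj.generation} R"
    # Null: Python None, or an object whose class is named NullObject (assumption).
    if obj is None or type(obj).__name__ == "NullObject":
        return None
    if isinstance(obj, bool):
        return obj
    # Boolean wrapper objects are assumed to store the flag in `.value`.
    value = getattr(obj, "value", None)
    if isinstance(value, bool):
        return value
    if isinstance(obj, dict):
        return {str(k): to_plain(v, resolve=False) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(v, resolve=False) for v in obj]
    if isinstance(obj, bytes):
        return obj.hex()  # byte strings are exported as hex text
    if isinstance(obj, int):
        return int(obj)
    if isinstance(obj, float):
        return float(obj)
    if isinstance(obj, str):
        return str(obj)
    return str(obj)  # fallback for anything not covered above


class FakeRef:
    """Stand-in for an indirect reference, used only for this demo."""

    def __init__(self, idnum, generation, target):
        self.idnum = idnum
        self.generation = generation
        self._target = target

    def get_object(self):
        return self._target


demo = FakeRef(7, 0, {"/Subtype": "/Text", "/Rect": [1, 2.5, 3, 4],
                      "/P": FakeRef(3, 0, {}), "/Open": True})
print(json.dumps({"page_index": 0, **to_plain(demo)}, indent=4))
```

In the export loop from the earlier comment, the append would then become `annotations.append({"page_index": page_index, **to_plain(annotation)})`. Whether "12 0 R" strings are a useful representation of nested references is a design choice; importing the annotations back would need more than this.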

MartinThoma changed the title ~~ENH: Add command to import/export annotations as JSON~~ ENH: Add command to export annotations as JSON Nov 12, 2023

MartinThoma mentioned this issue Nov 12, 2023

Import/Export annotations to FDF/JSON format py-pdf/pypdf#2291

Closed
