pypdf creates invalid links with add_annotation since PyPDF2 2.9.0 #2443

Open
rsinger417 opened this issue Feb 5, 2024 · 1 comment · May be fixed by #2450
Labels
is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF

Comments

@rsinger417
Contributor

Many years ago I used pypdf to create links for a book of maps for our storm sewer system. I had an index page that linked to all of the other pages, and each page had links to the pages with the maps to the North, South, East and West, and back to the index page. I could delete pages and all of the links remained good; that is, the links would still take me to the correct pages. This book had 550 pages in it, 449 links on the index page and a maximum of 5 links per page. It worked great. When I needed to update pages I would just use Acrobat to replace them, and after many years I forgot how I did it and all about pypdf. Now I had to make new pages: the book increased to 625 pages with the pages being shuffled, and I had to recreate all of the links all over again. I had to make 3590 links. I found pypdf again and managed to do it again, but this time when I delete pages the links to the adjacent pages are broken.
I did some investigation with a smaller test file and found that this bug was introduced in PyPDF2 version 2.9.0 along with the add_annotation method in _writer.py. The target_page_index is passed as an int, whereas in previous versions it was passed as an IndirectObject. I was able to correct the bug by adding two lines of code and changing another line in PyPDF2 version 2.9.0.
I tried the same fix in pypdf version 4.0.0 and it worked great. Problem solved.
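
To illustrate why the type of the target matters, here is a toy model in plain Python. It is not pypdf code: `Page`, `Document` and `resolve` are made-up names, and object identity only stands in for what an IndirectObject does in a real PDF. It is a sketch of the idea, assuming a viewer treats an integer target as a page position.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(eq=False)
class Page:
    """Hypothetical stand-in for a PDF page; its identity plays the role of an indirect reference."""
    label: str


@dataclass
class Document:
    """Hypothetical stand-in for a PDF with an ordered list of pages."""
    pages: List[Page] = field(default_factory=list)

    def resolve(self, target: Union[int, Page]) -> Optional[Page]:
        """Return the page a link target points to, or None if it no longer exists."""
        if isinstance(target, int):
            # Position-based target: whatever page currently sits at that index.
            return self.pages[target] if 0 <= target < len(self.pages) else None
        # Reference-based target: the same page object, wherever it has moved to.
        return target if any(p is target for p in self.pages) else None


doc = Document([Page("index"), Page("map A"), Page("map B"), Page("map C")])

link_by_index = 2            # "go to the third page" (currently map B)
link_by_ref = doc.pages[2]   # "go to this particular page" (map B)

del doc.pages[1]             # delete map A; every later page shifts up by one

by_index = doc.resolve(link_by_index)
by_ref = doc.resolve(link_by_ref)
print(by_index.label)        # map C: the position now holds a different page
print(by_ref.label)          # map B: the reference still finds the intended page
```

The position-based link silently lands on the wrong page after the deletion, while the reference-based link keeps following the page it was created for. That matches the behaviour I saw when deleting pages from the book.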

Environment

Windows 10

Code: add_annotation from pypdf version 4.0.0, with the revised lines marked

def add_annotation(
    self,
    page_number: Union[int, PageObject],
    annotation: Dict[str, Any],
) -> DictionaryObject:
    """
    Add a single annotation to the page.
    The added annotation must be a new annotation.
    It can not be recycled.

    Args:
        page_number: PageObject or page index.
        annotation: Annotation to be added (created with annotation).

    Returns:
        The inserted object
        This can be used for pop-up creation, for example
    """
    page = page_number
    if isinstance(page, int):
        page = self.pages[page]
    elif not isinstance(page, PageObject):
        raise TypeError("page: invalid type")

    to_add = cast(DictionaryObject, _pdf_objectify(annotation))
    to_add[NameObject("/P")] = page.indirect_reference

    if page.annotations is None:
        page[NameObject("/Annots")] = ArrayObject()
    assert page.annotations is not None

    # Internal link annotations need the correct object type for the
    # destination
    if to_add.get("/Subtype") == "/Link" and "/Dest" in to_add:
        tmp = cast(Dict[Any, Any], to_add[NameObject("/Dest")])

        # NEW LINE (from PyPDF2 2.8.1 _writer.py, lines 1532-1533)
        pages_obj = cast(Dict[str, Any], self.get_object(self._pages))
        # NEW LINE: look up the IndirectObject of the target page instead of using the int
        page_dest = pages_obj[PA.KIDS][tmp["target_page_index"]]

        dest = Destination(
            NameObject("/LinkName"),
            page_dest,  # REVISED LINE: pass the IndirectObject instead of the int
            Fit(
                fit_type=tmp["fit"], fit_args=dict(tmp)["fit_args"]
            ),  # I have no clue why this dict-hack is necessary
        )
        to_add[NameObject("/Dest")] = dest.dest_array

    page.annotations.append(self._add_object(to_add))

    if to_add.get("/Subtype") == "/Popup" and NameObject("/Parent") in to_add:
        cast(DictionaryObject, to_add["/Parent"].get_object())[
            NameObject("/Popup")
        ] = to_add.indirect_reference

    return to_add
@stefan6419846
Collaborator

Thanks for the report. Do you want to submit a corresponding PR for it?

@MartinThoma MartinThoma added the is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF label Feb 6, 2024
rsinger417 added a commit to rsinger417/pypdf that referenced this issue Feb 8, 2024
passes an IndirectObject for the target page instead of an integer. passing an integer creates an invalid link.
resolves py-pdf#2443 Issue
@rsinger417 rsinger417 linked a pull request Feb 8, 2024 that will close this issue
3 participants