BUG: Invalid Link #2450

rsinger417 · 2024-02-08T16:26:54Z

Passes an IndirectObject for the target page instead of an integer. Passing an integer creates an invalid link. Resolves py-pdf#2443.

stefan6419846 · 2024-02-08T16:28:32Z

Are you able to add a corresponding test case as well which shows the previous issue and demonstrates that your fix does indeed solve this?

rsinger417 · 2024-02-08T16:41:22Z

testpages.csv
testIndexCenterPnt.csv
Test Book0.pdf
link test .txt

link test.txt is the code. I could not upload a *.py file. The code needs to be changed to find the supporting files, as the locations are hard-coded.

An invalid link is created. It works, but if you delete a page the links are broken. Acrobat will remove them as invalid links when the file is optimized.

stefan6419846 · 2024-02-09T07:42:20Z

Could you add something of this as some automated unit/integration test?

rsinger417 · 2024-02-09T17:23:49Z

I don't know what I'm doing. I never read the book on GitHub. I don't know how to write an automated unit/integration test.
The code I uploaded was how I tested it. The links work fine with the final pdf until you remove pages or optimize it
with Acrobat. The links are invalid because the destination pages in the links are integers. The code broke with PyPDF2
version 2.9.0 (7/31/2022) with the introduction of the method add_annotation in the class PdfWriter. add_link was deprecated
in version 2.9.0. I got the code fix from version 2.8.1 (7/25/2022), from the add_link method in the class PdfWriter, line 1532
and line 1534; this was the latest version that I found where the link worked correctly. I left line 1532 unchanged.
1532 pages_obj = cast(Dict[str, Any], self.get_object(self._pages))
1534 page_dest = pages_obj[PA.KIDS][pagedest] # TODO: switch for external link
I replaced "pagedest" with "tmp["target_page_index"]", which is the integer value of the page, and
"page_dest" with "target_page", which is the IndirectObject of the page. This IndirectObject references the same page even when
other pages are removed, which keeps the link from becoming invalid. I got rid of the TODO comment.
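
To illustrate the argument above, here is a minimal toy model in plain Python. It does not use pypdf; the `Page` and `ToyDocument` classes are hypothetical stand-ins, and object identity plays the role that an IndirectObject plays in a real PDF. It only sketches why an integer destination can go stale after a page is removed while a reference keeps pointing at the same page.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Page:
    """Hypothetical stand-in for a page; its identity acts like an IndirectObject."""
    label: str


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document holding an ordered list of pages."""
    pages: List[Page] = field(default_factory=list)

    def resolve(self, dest: Union[int, Page]) -> Optional[Page]:
        """Return the page a destination points at, or None if it dangles."""
        if isinstance(dest, int):
            # Index-based destination: its meaning depends on the current page order.
            return self.pages[dest] if 0 <= dest < len(self.pages) else None
        # Reference-based destination: valid as long as that page is still present.
        return dest if any(p is dest for p in self.pages) else None


doc = ToyDocument([Page("cover"), Page("toc"), Page("chapter 1")])
by_index = 2                  # integer destination
by_reference = doc.pages[2]   # reference destination, looked up once by index

del doc.pages[1]              # remove the "toc" page

print(doc.resolve(by_index))      # None
print(doc.resolve(by_reference))  # Page(label='chapter 1')
```

In this sketch the integer destination dangles after the deletion, whereas the reference still resolves to "chapter 1".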

MartinThoma · 2024-02-13T21:57:41Z

@rsinger417 The test tests/test_generic.py::test_annotation_builder_link fails. Do you see why? (It could also be a test issue; I haven't looked into it so far)

rsinger417 · 2024-02-13T22:26:44Z

I can't see what the code is, but judging by the error message it is using AnnotationBuilder.link, which has been deprecated. The class Link in _markup_annotations.py should be used. I think AnnotationBuilder is completely gone in version 4.0.0.

stefan6419846 · 2024-02-15T20:28:40Z

I do not see the warning in the CI log, but an actual error which maps to a changed code line:

>           target_page = pages_obj[PA.KIDS][tmp["target_page_index"]]
E           IndexError: list index out of range

It seems like tmp["target_page_index"] returns an invalid page number?

rsinger417 · 2024-02-15T22:47:37Z

what is the value in "target_page_index"? It should be a page number in the pdf.

stefan6419846 · 2024-02-19T12:27:37Z

what is the value in "target_page_index"? It should be a page number in the pdf.

See the CI output which displays this value:

annotation = {'/Type': '/Annot', '/Subtype': '/Link', '/Rect': RectangleObject([100, 100, 300, 200]), '/Border': [50, 10, 4], '/Dest': {'target_page_index': 1, 'fit': '/Fit', 'fit_args': []}, '/P': IndirectObject(4, 0, 140453995250688)}

You should be able to locally debug this as well, as this is a permanent error. The background is that writer only has one page due to

pypdf/tests/test_generic.py

Line 911 in cc306ad

writer.add_page(page)

Thus target_page_index=1 now points to an invalid page:

pages_obj: {'/Type': '/Pages', '/Count': 1, '/Kids': [IndirectObject(4, 0, 140453995250688)]}
pages_obj[PA.KIDS]: [IndirectObject(4, 0, 140453995250688)]
tmp: {'target_page_index': 1, 'fit': '/Fit', 'fit_args': []}

With this analysis, your proposed change is a breaking one and thus most likely requires a deprecation process, although it is debatable whether the current implementation constitutes a bug or desired behavior.
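
The failure can be reproduced without pypdf. The sketch below copies the values from the CI output above and replaces the IndirectObject with a plain string placeholder (an assumption made only so the snippet runs standalone):

```python
# Values taken from the CI output above. The string is a placeholder for the
# real IndirectObject so that the snippet has no pypdf dependency.
kids = ["IndirectObject(4, 0)"]  # pages_obj[PA.KIDS] when the writer has one page
tmp = {"target_page_index": 1, "fit": "/Fit", "fit_args": []}

try:
    target_page = kids[tmp["target_page_index"]]
except IndexError as exc:
    error_message = str(exc)
    print(f"IndexError: {error_message}")  # IndexError: list index out of range
```

With a single entry in the kids list, only index 0 is valid, so index 1 raises the same error as in the CI log.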

rsinger417 · 2024-02-19T18:22:27Z

The test is wrong.
Lines 954-955 should come before line 952. Since there is no page 1 when link_annotation is called, you will get the
"IndexError: list index out of range." You have to add the page first. The first page in the pdf is page 0, the second page is page 1.
If there is no second page in the pdf you will still get an error. You could add a check for whether the index is out of range and return an error
such as "No such page 1 in pdf".

945 # Part 4: Internal Link
946 with pytest.warns(DeprecationWarning):
947     link_annotation = AnnotationBuilder.link(
948         rect=(100, 100, 300, 200),
949         target_page_index=1,
950         border=[50, 10, 4],
951     )
952 writer.add_annotation(0, link_annotation)
953
954 for page in reader.pages[1:]:
955     writer.add_page(page)
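
The range check suggested above could look roughly like the following. This is only a sketch: `resolve_target_page` is a hypothetical helper, not a pypdf function, and the page references are plain strings standing in for IndirectObjects.

```python
from typing import List


def resolve_target_page(kids: List[str], target_page_index: int) -> str:
    """Hypothetical helper: look up a page reference with an explicit range check."""
    if not 0 <= target_page_index < len(kids):
        raise ValueError(f"No such page {target_page_index} in pdf")
    return kids[target_page_index]


kids = ["page-ref-0"]  # only the first page has been added so far

try:
    resolve_target_page(kids, 1)
except ValueError as exc:
    early_error = str(exc)  # the second page does not exist yet

kids.append("page-ref-1")                   # add the second page first ...
late_result = resolve_target_page(kids, 1)  # ... then index 1 resolves

print(early_error)
print(late_result)
```

The same lookup fails before the second page is added and succeeds afterwards, which mirrors the proposed reordering of lines 952 and 954-955.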

stefan6419846 · 2024-02-19T19:31:31Z

The test is wrong

The test has been there before your change and worked, thus I would assume that this is/was some intended functionality. Apparently it was considered a valid use case to generate annotations for invalid pages which might be added later on. Yes, we could argue about how useful this is, but this is just how it has been in the past. The original change where this has been introduced is #1189.

Let's wait for the opinion of the other maintainers regarding this.

rsinger417 · 2024-02-19T20:20:34Z

Unless you add_page first there is no page and thus no IndirectObject for it. If there is no IndirectObject then the link will be invalid. How can you link to an internal page that does not exist? PR #1189 does not have this test in it. The date of the PR #1189 commit is when the links became invalid: PyPDF2 2.9.0, 7/31/2022. The links were good in PyPDF2 2.8.1. The invalid links will work, but they are not valid: they will be broken if pages are removed, and they will be removed if the file is optimized in Acrobat.

stefan6419846 · 2024-02-25T18:11:44Z

@pubpub-zz @MartinThoma Any input on #2450 (comment) and how to continue with the (now failing) test regarding possibly breaking behavior?

pubpub-zz · 2024-02-25T21:23:52Z

destinations within the document are using indirect objects:

but for remote destination numbers are accepted:

(...)

Looking deeper, I think the issue comes from an error in the typing of the Link constructor, which does not handle links to external documents.

stefan6419846 · 2024-02-26T07:12:43Z

@pubpub-zz It seems like you misread my comment. The remaining issue is that until now, pypdf would allow links to pages not (yet) added to the file, while with the solution proposed in this PR, we would restrict this to pages already present in the file. I consider this a possibly breaking change, but wanted to get a second opinion on this before continuing.

pubpub-zz · 2024-02-26T20:40:10Z

@pubpub-zz It seems like you misread my comment. The remaining issue is that until now, pypdf would allow links to pages not (yet) added to the file, while with the solution proposed in this PR, we would restrict this to pages already present in the file. I consider this a possibly breaking change, but wanted to get a second opinion on this before continuing.

Having to add the pages to the PdfWriter before referencing them in links is mandatory: you cannot guess what the IndirectObject will be before the page is added to the PdfWriter.

stefan6419846 · 2024-02-27T09:16:25Z

In the current release, this would indeed work, i.e. referencing an invalid page. After merging this PR, this would change, so it might be a breaking change. However, I am not sure whether we consider the old behavior (allowing invalid references) a bug or not; for a bug we would not really have to treat this as a breaking change.

pubpub-zz · 2024-02-27T12:34:15Z

The legacy code was compatible with both arguments.
With the current code you cannot create links to external documents.

stefan6419846 · 2024-02-28T14:35:35Z

@rsinger417 Could you please check/verify whether you can keep support for external links which do not point to the same document?

rsinger417 · 2024-02-28T23:19:29Z

External links work. The annotation uses "url" instead of the "target_page_index" used for an internal link, such as
mylink=Link(rect=[600,600,700,700], border=[0,0,1,[3,2]], url="#2450")
rather than
mylink=Link(rect=[130,60,230,25], target_page_index=0)

pubpub-zz · 2024-02-29T15:00:56Z

external link works, the annotation uses "url" instead of "target_page_index" for an internal link such as mylink=Link(rect=[600,600,700,700], border=[0,0,1,[3,2]], url="#2450") rather than mylink=Link(rect=[130,60,230,25], target_page_index=0)

In PDF there are different kinds of links:
/GoTo, which are links to pages
/URI, which are links to a URI/URL
but also:
/GoToR, which are links to pages in a remote document
/GoToE, which are links to pages in a document embedded in the document

It is with these last two kinds of links that page indices are used.
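
The distinction drawn in this comment can be summarized in a small lookup table. The snippet below is only an illustrative sketch in plain Python: the action names follow the PDF specification, the classification of the page target follows the comment above, and `page_target_kind`, the placeholder page reference and the file name `other.pdf` are hypothetical.

```python
from typing import Any, Dict

# How each action type identifies its target page, per the comment above.
PAGE_TARGET_KIND: Dict[str, str] = {
    "/GoTo": "reference",  # page in the same document
    "/GoToR": "index",     # page in a remote document
    "/GoToE": "index",     # page in an embedded document
    "/URI": "none",        # URI/URL, no page target
}


def page_target_kind(action: Dict[str, Any]) -> str:
    """Hypothetical helper: classify an action dictionary by its /S entry."""
    return PAGE_TARGET_KIND.get(action.get("/S"), "unknown")


# Simplified action dictionaries; real ones hold PDF objects, not strings.
internal = {"/S": "/GoTo", "/D": ["<page reference>", "/Fit"]}
remote = {"/S": "/GoToR", "/F": "other.pdf", "/D": [0, "/Fit"]}
web = {"/S": "/URI", "/URI": "https://example.com"}

print(page_target_kind(internal))  # reference
print(page_target_kind(remote))    # index
print(page_target_kind(web))       # none
```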

ZupoLlask · 2024-03-22T21:28:43Z

I think this PR may also fix issue #2346. It probably makes sense to check whether that issue is indeed solved as well.

MartinThoma · 2024-04-14T15:20:45Z

I would assume that this is/was some intended functionality. Apparently it was considered a valid use case to generate annotations for invalid pages which might be added later on.

I'd be fine with removing it. The main question is if we need to go through the deprecation process (which would take quite long) or if we can simply say that it was a bug.

I don't see the use-case here, hence I'd say it is a bug - especially when there was this change of behavior in pypdf==2.9.0 (written by rsinger417 before; I didn't check).

Is there anything stopping people from adding the pages first?

Update _writer.py - Invalid Link Fix py-pdf#2448

1596d06

passes an IndirectObject for the target page instead of an integer. passing an integer creates an invalid link. resolves py-pdf#2443 Issue

Update _writer.py

fdd0c6f

rsinger417 mentioned this pull request Feb 8, 2024

Update _markup_annotations.py #2447

Closed

rsinger417 added 4 commits February 8, 2024 15:33

Update _writer.py

4104ecc

Update _writer.py

a795c31

Update _writer.py

c1d3459

Update _writer.py

7fb4d9e

MartinThoma changed the title ~~Update _writer.py - Invalid Link Fix #2448~~ BUG: Invalid Link Feb 13, 2024

MartinThoma added the soon PRs that are almost ready to be merged, issues that get solved pretty soon label Feb 13, 2024

ZupoLlask mentioned this pull request Mar 22, 2024

Microsoft Word table of contents Link annotation error. #2346

Closed
