HTML links to document page broken after merge #471

sanohin · 2018-11-21T14:30:48Z

If a PDF file contains internal links (HTML anchor tags with an element id as the href), they stop working after merging.

<a href="#target">Go to target</a>
....some content
<div id="target">Target here</div>

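from io import BytesIO

from PyPDF2 import PdfFileMerger, PdfFileReader


def merge_report(pdf: bytes) -> bytes:  # function name is illustrative
    # pdf: an HTML page rendered to PDF, containing internal HTML links
    report = PdfFileReader(BytesIO(pdf))
    merger = PdfFileMerger(strict=False)
    merger.append(report)

    # Write the merged document to memory and return its bytes
    result = BytesIO()
    merger.write(result)
    return result.getvalue()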

jackneil · 2019-06-07T18:33:47Z

I second this issue. It occurs even when merging via the pageObject.mergePage method.

advename · 2020-04-20T12:21:06Z

I'll jump on the train - having the same issue here!

MartinThoma · 2022-06-26T12:29:52Z

Can anybody share a PDF which shows this issue? Is it still an issue with the latest PyPDF2 version?

SimplyOm · 2022-07-19T09:09:15Z

@MartinThoma Not sure if you're still looking for an example, but you can find one below:

I generated book.pdf using the sample doc creation of the jupyter-book project. The internal HTML links in the Contents section work fine in book.pdf but do not work in book_out.pdf. book_out.pdf is produced from book.pdf with the code snippet below:

from PyPDF2 import PdfReader, PdfWriter

PDF = "./doc/_build/pdf/book.pdf"
OUT_PDF = "./doc/_build/pdf/book_out.pdf"

reader = PdfReader(PDF)
writer = PdfWriter()

for page in reader.pages:
    writer.addPage(page)

with open(OUT_PDF, "wb") as f:
    writer.write(f)

MartinThoma · 2022-07-19T12:43:33Z

Thank you @SimplyOm 🤗

dc-em · 2022-10-04T04:33:17Z

Hi @MartinThoma, is this solved? I'm facing the same issue.

pubpub-zz · 2022-10-04T05:00:29Z

In progress, should come back soon

pubpub-zz · 2022-10-11T21:41:03Z

I think the issue is found:
a) The links use named destinations, which are not copied by add_page: I've coded the append/merge functions into PdfWriter.
b) Some types were not matching: I've added a function for that (implemented in the merge).
The changes are in PR #1371 (still in progress).
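
To illustrate point a), here is a minimal sketch in plain Python. It is a toy model built from dicts, not the PyPDF2 API, and all names in it (`copy_pages_only`, `copy_with_dests`, `resolve`) are hypothetical. It assumes a link stores only a destination name, while the name-to-page table lives at document level, so copying pages alone leaves the name unresolvable.

```python
# Toy model (plain dicts, not the PyPDF2 API): a link stores only a
# destination *name*; the name -> page table lives at document level.

def copy_pages_only(doc):
    """Copy the pages but not the document-level name table."""
    return {"pages": list(doc["pages"]), "named_dests": {}}


def copy_with_dests(doc):
    """Copy the pages and the name table together."""
    return {"pages": list(doc["pages"]), "named_dests": dict(doc["named_dests"])}


def resolve(doc, link):
    """Return the page index a link jumps to, or None if the name is unknown."""
    return doc["named_dests"].get(link["dest"])


source = {
    "pages": [
        {"links": [{"dest": "target"}]},  # page 0 links to the name "target"
        {"links": []},                    # page 1 holds the target
    ],
    "named_dests": {"target": 1},
}

link = source["pages"][0]["links"][0]
broken = copy_pages_only(source)
working = copy_with_dests(source)

print(resolve(broken, link))   # None: the name was not carried over
print(resolve(working, link))  # 1
```

In this model the link itself is copied intact in both cases; only the lookup table decides whether it still leads anywhere.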

manathan1984 · 2022-10-30T05:13:09Z

On a related issue: when using merge, the internal links of a PDF seem to be broken. I mean links within a research paper, for example to a reference at the end of the PDF or to a section of the paper. Any ideas on how to keep those links active when merging?

The method `.clone(pdf_dest, [force_duplicate])` clones the object and all referenced objects. If an object has already been cloned, the existing clone is returned (unless force_duplicate is set). It is mainly for internal use but can be used on a page. For PageObject/DictionaryObject/[Encoded/Decoded/Content]Stream, an extra parameter ignore_fields provides the list of fields that should not be cloned. When available, the pointer to an object is available in the `indirect_obj` attribute.

New API for add_page/insert_page that:
* returns the cloned page object
* accepts ignore_fields as a parameter

## Others
* file is closed at the end of PdfWriter.write when a filename is provided
* Breaking Change: `add_outline_item` now has a parameter `before` which is not the last parameter

## Update
* The public API of PdfMerger has been added to PdfWriter (ready to make PdfMerger an alias of it)
* Process outline merging properly
* Process named destinations properly

Deals with #1194, #1322, #471, #1337

pubpub-zz · 2023-02-09T05:34:56Z

@manathan1984,
it is now recommended to use PdfWriter and append(), which should fix the issues. Can you try it and update the status of this issue?

DX9807 · 2023-02-09T09:18:38Z

@pubpub-zz

writer = PdfWriter()
for pdf in ["cover_page.pdf", "main_report.pdf", "back_cover.pdf"]:
    writer.append(pdf)

with open("result.pdf", "wb") as f:
    writer.write(f)

I get the error below when using PdfWriter and append():

AttributeError: 'NumberObject' object has no attribute 'indirect_reference'

pubpub-zz · 2023-02-09T12:09:47Z

@DX9807
Can you please provide the PDFs?

DX9807 · 2023-02-10T06:38:37Z

@pubpub-zz
Check the files given below:
back_cover.pdf
central.pdf
cover_page.pdf

While trying to merge the above PDFs using PdfWriter and its append method, I get this error:

AttributeError: 'NumberObject' object has no attribute 'indirect_reference'

But when I use the PdfMerger class and its append method, the PDFs get merged but the internal hyperlinks do not work.

The issue was caused by named destinations that use numbers instead of indirect objects to point to pages. This is normally not expected. Closes #471. Closes #1898.
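
The difference between the two kinds of page pointer can be sketched in plain Python. This is a toy model under stated assumptions, not the actual fix and not the library's API: `PageRef` and `page_index` are hypothetical names, and treating a bare number as a zero-based page index is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRef:
    """Hypothetical stand-in for an indirect reference to a page object."""
    obj_id: int


def page_index(dest_page, page_ids):
    """Map a destination's page entry to a page index.

    page_ids:  object ids of the pages, in document order.
    dest_page: a PageRef (the usual case) or a plain int page number.
    """
    if isinstance(dest_page, PageRef):
        # Usual case: follow the reference to find the page's position.
        return page_ids.index(dest_page.obj_id)
    if isinstance(dest_page, int):
        # A bare number has no reference to follow; assume it is the index.
        if 0 <= dest_page < len(page_ids):
            return dest_page
        raise ValueError(f"page number {dest_page} is out of range")
    raise TypeError(f"unsupported page entry: {dest_page!r}")


page_ids = [10, 12, 14]
print(page_index(PageRef(12), page_ids))  # 1
print(page_index(2, page_ids))            # 2
```

Code that only handles the first branch would try to follow a reference that a plain number does not have, which matches the shape of the AttributeError reported above.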

rocketrefrigerator · 2023-09-12T08:35:43Z

Hello, is it included in 3.16.0?

stefan6419846 · 2023-09-12T08:37:32Z

If you have a look at the last commit referenced here (b1fa953), you will see that this fix is included since version 3.11.1.
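
A quick way to compare a version string against 3.11.1 is sketched below. The helper names `version_tuple` and `includes_fix` are hypothetical, and the sketch assumes purely numeric dotted versions such as "3.16.0".

```python
FIX_VERSION = (3, 11, 1)  # first version said to include the fix


def version_tuple(version):
    """Turn '3.16.0' into (3, 16, 0); assumes numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


def includes_fix(version):
    """Return True if the given version is at least FIX_VERSION."""
    return version_tuple(version) >= FIX_VERSION


print(includes_fix("3.16.0"))  # True
print(includes_fix("3.11.0"))  # False
```

The version string passed in could be, for example, the installed package's `__version__`.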

wangkev mentioned this issue Dec 15, 2020

Adding other PDFs into output Kozea/WeasyPrint#1271

Closed

MartinThoma added the is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF label Apr 7, 2022

MartinThoma added the needs-pdf The issue needs a PDF file to show the problem label Jun 26, 2022

abarker mentioned this issue Jul 15, 2022

links affected abarker/pdfCropMargins#40

Closed

MartinThoma added Has MCVE A minimal, complete and verifiable example helps a lot to debug / understand feature requests and removed needs-pdf The issue needs a PDF file to show the problem labels Jul 19, 2022

pubpub-zz mentioned this issue Oct 11, 2022

ENH: Add Cloning #1371

Merged

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Oct 12, 2022

add test for iis py-pdf#471

1e55376

pubpub-zz self-assigned this Feb 26, 2023

pubpub-zz mentioned this issue May 23, 2023

BUG: Append pdf with named destination using numbers for pages #1858

Merged

MartinThoma closed this as completed in #1858 Jun 25, 2023
