Reusing PdfMerger after write generates PDF with extra pages #1337

bashirmindee · 2022-09-09T14:10:36Z

I was trying to merge the same PDF with itself multiple times. First I want the original PDF, then the PDF duplicated once (2 copies), then three copies, and so forth.

Environment

ubuntu 20.04
Python==3.8.12+
Package Version

pip==21.1.1
PyPDF2==2.10.5
setuptools==56.0.0
typing-extensions==4.3.0

$ python -m platform
Linux-5.11.0-40-generic-x86_64-with-glibc2.29
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.10.5

Code + PDF

This is a minimal, complete example that shows the issue:

##script.py

from PyPDF2 import PdfReader, PdfMerger

merger = PdfMerger()
reader = PdfReader("blank.pdf")

for j in range(9):
    merger.append(reader)
    merger.write(f"generated_pdfs/{len(merger.pages)}.pdf")

Here is the blank.pdf that causes the issue.

Expected behavior

1.pdf: must contain 1 page and contains 1 page ✅
2.pdf: must contain 2 pages but contains 3 pages ❌
3.pdf: must contain 3 pages but contains 6 pages ❌
4.pdf: must contain 4 pages but contains 10 pages ❌
5.pdf: must contain 5 pages but contains 15 pages ❌
6.pdf: must contain 6 pages but contains 21 pages ❌

MartinThoma · 2022-09-24T13:09:23Z

Interesting. For PyPDF2==2.10.9

1.pdf contains 1 page
2.pdf contains 3 pages (+2)
3.pdf contains 6 pages (+3)
4.pdf contains 10 pages (+4)
5.pdf contains 15 pages (+5)
...

I'm not sure why ...
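The counts listed above follow the triangular numbers n(n+1)/2: the n-th file gains n pages instead of 1. The short check below only restates that arithmetic; the helper name `triangular` is made up for illustration, and the sketch does not diagnose the cause.

```python
def triangular(n: int) -> int:
    """Return 1 + 2 + ... + n."""
    return n * (n + 1) // 2


# Page counts reported in this thread for PyPDF2==2.10.9, keyed by file number.
reported = {1: 1, 2: 3, 3: 6, 4: 10, 5: 15}

for n, pages in reported.items():
    # Each reported count equals the n-th triangular number.
    print(n, pages, triangular(n), pages == triangular(n))
```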

pubpub-zz · 2022-09-27T21:39:53Z

@bashirmindee
A change that is part of PR #1371 (commit f9d7d19) should fix it. The other commits should not be required.
The change is small, so you should be able to copy it if you want to try it.

The method `.clone(pdf_dest,[force_duplicate])` clones the object and all referenced objects. If an object has already been cloned, the already cloned object is returned (unless force_duplicate is set).

* mainly for internal use, but can be used on a page
* for PageObject/DictionaryObject/[Encoded/Decoded/Content]Stream, an extra parameter ignore_fields provides the list of fields that should not be cloned

When available, the pointer to an object is available in the `indirect_obj` attribute.

New API for add_page/insert_page that:

* returns the cloned page object
* accepts ignore_fields as a parameter

## Others

* file is closed at the end of PdfWriter.write when a filename is provided
* Breaking Change: `add_outline_item` now has a parameter before which is not the last parameter

## Update

* The public API of PdfMerger has been added to PdfWriter (ready to make PdfMerger an alias of it)
* Process properly Outline merging
* Process properly Named destinations

Deals with #1194, #1322, #471, #1337
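The "already cloned object is returned" rule can be pictured as a lookup table kept by the destination. The toy model below is a sketch of that idea only: `ToyObject`, `ToyWriter`, and the `_translated` dict are hypothetical names invented for illustration, not the library's classes or internals.

```python
class ToyWriter:
    """Hypothetical destination that remembers which sources it has cloned."""

    def __init__(self):
        self.objects = []      # every clone created in this destination
        self._translated = {}  # id(source object) -> its clone

    def reset_translation(self):
        """Forget earlier clones so the next clone() makes fresh copies."""
        self._translated.clear()


class ToyObject:
    """Hypothetical stand-in for a PDF object such as a page."""

    def __init__(self, name):
        self.name = name

    def clone(self, pdf_dest, force_duplicate=False):
        key = id(self)
        # Reuse the earlier clone unless a duplicate is explicitly requested.
        if not force_duplicate and key in pdf_dest._translated:
            return pdf_dest._translated[key]
        copy = ToyObject(self.name)
        pdf_dest.objects.append(copy)
        pdf_dest._translated[key] = copy
        return copy


dest = ToyWriter()
page = ToyObject("page")

first = page.clone(dest)
second = page.clone(dest)                        # same object as `first`
forced = page.clone(dest, force_duplicate=True)  # a new, independent copy
dest.reset_translation()
after_reset = page.clone(dest)                   # new copy again
```

In this model, clearing the table plays the role of a reset between appends: without it, cloning the same source again hands back the earlier copy rather than an independent one.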

pubpub-zz · 2023-01-31T21:11:19Z

@bashirmindee
With the latest version of pypdf:

##script.py

from pypdf import PdfReader, PdfWriter

writer = PdfWriter()
reader = PdfReader("blank.pdf")

for j in range(9):
    writer.append(reader)
    writer.write(f"generated_pdfs/{len(writer.pages)}.pdf")
    writer.reset_translation(reader)  # to append independent pages
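To check the page counts without touching the disk, the same loop can be run against in-memory buffers. This is a minimal sketch, assuming pypdf is installed; the helpers `blank_pdf_bytes` and `page_counts` are hypothetical names, and the generated blank page only stands in for blank.pdf.

```python
import io

try:
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf
except ImportError:  # keep the sketch importable when pypdf is missing
    PdfReader = PdfWriter = None


def blank_pdf_bytes() -> bytes:
    """Build a one-page blank PDF in memory (stand-in for blank.pdf)."""
    writer = PdfWriter()
    writer.add_blank_page(width=200, height=200)
    buffer = io.BytesIO()
    writer.write(buffer)
    return buffer.getvalue()


def page_counts(rounds: int):
    """Append the same reader `rounds` times, writing after every append."""
    if PdfWriter is None:
        return None  # pypdf not installed, nothing to measure
    reader = PdfReader(io.BytesIO(blank_pdf_bytes()))
    writer = PdfWriter()
    counts = []
    for _ in range(rounds):
        writer.append(reader)
        output = io.BytesIO()
        writer.write(output)
        # Re-read the written bytes so the count reflects the output itself.
        counts.append(len(PdfReader(io.BytesIO(output.getvalue())).pages))
        writer.reset_translation(reader)  # to append independent pages
    return counts


counts = page_counts(4)
print(counts)
```

Each entry in `counts` is the number of pages in the output written after that round, so a correct run grows by one page per round.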

pubpub-zz · 2023-02-05T21:45:49Z

Closing this as solved.

MartinThoma added the workflow-merge label (From a user's perspective, merging is the affected feature/workflow) Sep 24, 2022

MartinThoma added the help wanted (We appreciate help everywhere - this one might be an easy start!) and is-bug (From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF) labels Sep 24, 2022

MartinThoma changed the title ~~Reusing PdfFileMerger after write generates PDF with extra pages~~ Reusing PdfMerger after write generates PDF with extra pages Sep 24, 2022

pubpub-zz mentioned this issue Oct 11, 2022

ENH: Add Cloning #1371

Merged

pubpub-zz closed this as completed Feb 5, 2023
