Hi! I am using the emplace function to swap pages but I would like to preserve references. I then perform some string-match on the output pdf and would like to extract the matched text bounding box coordinates.
I use the following for emplacing.
pdf = Pdf.open('../tests/resources/fourpages.pdf')
congress = Pdf.open('../tests/resources/congress.pdf')
pdf.pages.append(congress.pages[0])  # Transfer page to new pdf
pdf.pages[2].emplace(pdf.pages[-1])
del pdf.pages[-1]  # Remove donor page
pdf.pages[2].objgen
pdf.save()
I then use pdfplumber to read the saved PDF and find the matching words, but their bounding box coordinates are way off for the emplaced pages. I have to repair the PDF with Ghostscript to correct this issue.
pike = pdfplumber.open('path')
pike.pages[x].search("value", regex=False, case=False, return_chars=False)  # where x is the emplaced pdf page number
So, since the emplace function is causing this downstream error, should I be retaining any additional elements with the retain argument?
Name.Parent, Name.Contents, Name.CropBox, Name.MediaBox, Name.Resources, Name.Rotate, Name.Type
If I simply copy the pages over one another, this error does not happen. So something within emplace causes this error.
I did the following
and the results look fine to me, although pdfplumber appears to return positions relative to the top-left corner rather than the bottom-left corner that is conventional for PDF. Perhaps your example is different from your actual code.
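To illustrate the coordinate point, here is a minimal sketch of converting a top-left-origin box into bottom-left-origin PDF coordinates. The helper name `to_pdf_coords` is hypothetical, and the sketch assumes the match is a dict with `x0`, `top`, `x1` and `bottom` keys, that the page is not rotated, and that the page box starts at the origin. The sample numbers are made up for illustration, not taken from either PDF.

```python
def to_pdf_coords(match, page_height):
    """Convert a top-left-origin box to bottom-left-origin PDF coordinates.

    Assumes `match` has "x0", "top", "x1", "bottom" keys (hypothetical input
    shape), an unrotated page, and a page box whose origin is (0, 0).
    """
    x0 = match["x0"]
    x1 = match["x1"]
    # Distances measured down from the top become heights above the bottom.
    y0 = page_height - match["bottom"]  # lower edge in PDF space
    y1 = page_height - match["top"]     # upper edge in PDF space
    return (x0, y0, x1, y1)


# Illustrative values only: a box on a US Letter page (792 pt tall).
example = {"x0": 72.0, "top": 100.0, "x1": 144.0, "bottom": 112.0}
converted = to_pdf_coords(example, page_height=792.0)
print(converted)
```

If the emplaced page has a rotation or a box that does not start at the origin, this simple flip will not be enough, and that could be one reason coordinates look off.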
OCRmyPDF uses emplace as the primary means of adding OCR text to PDFs, i.e. if it were broken somehow, OCRmyPDF would be failing in most cases too.
If either PDF has structural markup, it won't be preserved by the emplace function, and migrating it is unfortunately quite complicated. QPDF doesn't do that yet, but the author intends to implement it, so it will have to wait for that.
Any help is appreciated @jbarlow83