Remove everything except images #3425
nsklei asked this question in Looking for help
I am looking for a way to remove all elements from a PDF (text, paths, ...) except images. My current solution extracts all images from the document and inserts them into a new document.
Answered by JorjMcKie on Apr 30, 2024
Sure, there is a canonical way: use redaction annotations!
As of the most recent version, all three categories (images, graphics and text) can be selectively kept or removed.
3 replies
That happens when the PDF was sloppily created, with text or drawings not completely inside page.rect.
For drawings, you can (and should) use the option that erases them even when they only partly overlap the redaction rectangle. I believe that is fitz.PDF_REDACT_LINE_ART_REMOVE_IF_TOUCHED (= 2).
For stubborn text, simply enlarge the redaction rectangle, for example page.rect + (-20, -20, 20, 20), which is a rect 20 points larger in every direction.