Tag image or attachment position in readtext #2392

patrickitts · 2024-01-04T14:20:26Z

Explanation

To be able to reconstruct a document (for example, as an HTML page), it would be necessary to insert a tag such as [tagimage]1[/tagimage] into the extracted text at the position where the image was found.
In this example, 1 is the index of the image in page.images.

Code Example

How would your feature be used? (Remove this if it is not applicable.)

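from pypdf import PdfReader

reader = PdfReader("example.pdf")  # placeholder file name
page = reader.pages[0]
# withTags is the parameter proposed in this request; it does not exist in pypdf
print(page.extract_text(withTags=1))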

Expected result:

some text
[tagimage]0[/tagimage]
other text
[tagimage]1[/tagimage]
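
To illustrate how output in this shape could be consumed, here is a minimal sketch that turns tagged text into HTML. It uses only the standard library and does not call pypdf. The function name `tagged_text_to_html` and the `image_sources` list (image paths or URLs in the same order as page.images) are assumptions made for this example; only the [tagimage]N[/tagimage] format comes from the request above.

```python
import html
import re

# Tag format taken from the request; N is an index into the page's image list.
TAG_PATTERN = re.compile(r"\[tagimage\](\d+)\[/tagimage\]")


def tagged_text_to_html(tagged_text, image_sources):
    """Replace each image tag with an <img> element and wrap text runs in <p>.

    image_sources is a hypothetical list of image paths or URLs,
    ordered like page.images.
    """
    parts = []
    cursor = 0
    for match in TAG_PATTERN.finditer(tagged_text):
        # Text between the previous tag (or the start) and this tag.
        text = tagged_text[cursor:match.start()].strip()
        if text:
            parts.append(f"<p>{html.escape(text)}</p>")
        index = int(match.group(1))
        parts.append(f'<img src="{html.escape(image_sources[index])}">')
        cursor = match.end()
    # Any text that follows the last tag.
    tail = tagged_text[cursor:].strip()
    if tail:
        parts.append(f"<p>{html.escape(tail)}</p>")
    return "\n".join(parts)


sample = "some text\n[tagimage]0[/tagimage]\nother text\n[tagimage]1[/tagimage]\n"
page_html = tagged_text_to_html(sample, ["img0.png", "img1.png"])
print(page_html)
```

A tag whose index is outside `image_sources` raises an IndexError in this sketch; a real converter would need to decide how to handle that case.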

MartinThoma · 2024-01-04T15:58:24Z

What is your use-case for which you would need this?

It sounds as if you want to convert a PDF to HTML. There are tools for that; have you tried them?

patrickitts assigned MartinThoma Jan 4, 2024

MartinThoma removed their assignment Jan 4, 2024

MartinThoma added the is-feature (A feature request) label Jan 4, 2024
