Tag image or attachment position in readtext #2392

Open
patrickitts opened this issue Jan 4, 2024 · 1 comment
Labels
is-feature A feature request

Comments


patrickitts commented Jan 4, 2024

Explanation

To be able to reconstruct a document (for example, as an HTML page), it would be necessary to insert a tag such as [tagimage]1[/tagimage] into the extracted text at the position where the image was found.
In this example, 1 is the index of the image in page.images.

Code Example

How would your feature be used? (Remove this if it is not applicable.)

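from pypdf import PdfReader

reader = PdfReader("example.pdf")  # placeholder file name
page = reader.pages[0]

# withTags is the proposed parameter; it does not exist in pypdf today
print(page.extract_text(withTags=1))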

Result:

some text
[tagimage]0[/tagimage]
other text
[tagimage]1[/tagimage]
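
For illustration, here is a minimal sketch of how output in this format could be turned back into an HTML fragment. It assumes the proposed `[tagimage]N[/tagimage]` markers each sit on their own line, as in the result above, and that N indexes a list of image file names. The function name `tagged_text_to_html` and the file names are hypothetical and not part of pypdf.

```python
import html
import re

# Hypothetical tag format, taken from the example output in this issue.
TAG_RE = re.compile(r"\[tagimage\](\d+)\[/tagimage\]")


def tagged_text_to_html(tagged_text, image_names):
    """Convert tagged text into a simple HTML fragment.

    image_names[i] is assumed to be the file name of the i-th page image.
    Only tags that occupy a whole line are recognized in this sketch.
    """
    parts = []
    for line in tagged_text.splitlines():
        stripped = line.strip()
        match = TAG_RE.fullmatch(stripped)
        if match:
            # Replace the marker with an <img> element for that image index.
            index = int(match.group(1))
            parts.append(f'<img src="{html.escape(image_names[index])}">')
        elif stripped:
            # Any other non-empty line becomes a paragraph.
            parts.append(f"<p>{html.escape(stripped)}</p>")
    return "\n".join(parts)


sample = "some text\n[tagimage]0[/tagimage]\nother text\n[tagimage]1[/tagimage]"
print(tagged_text_to_html(sample, ["img0.png", "img1.png"]))
```

The image files themselves would still have to be written out separately, for instance from the entries in page.images.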

@MartinThoma MartinThoma removed their assignment Jan 4, 2024
@MartinThoma MartinThoma added the is-feature A feature request label Jan 4, 2024
@MartinThoma
Member

What is your use-case for which you would need this?

It sounds as if you want to convert a PDF to HTML. There are tools for that; have you tried them?
