customTextRenderer occasionally inserts HTML content into the <br> tags generated by PDF.js #1173

Closed
4 tasks done
etripier opened this issue Nov 16, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@etripier

Before you start - checklist

  • I followed instructions in documentation written for my React-PDF version
  • I have checked if this bug is not already reported
  • I have checked if an issue is not listed in Known issues
  • If I have a problem with PDF rendering, I checked if my PDF renders properly in PDF.js demo

Description

PDF.js inserts a DOM node containing the associated text content if present:

https://github.com/mozilla/pdf.js/blob/c7d6ab2f7123c5a65155c55aa19d9d9abd8c2ff2/src/display/text_layer.js#L368

It then inserts a <br> element if hasEOL is also true for the same associated text content:

https://github.com/mozilla/pdf.js/blob/c7d6ab2f7123c5a65155c55aa19d9d9abd8c2ff2/src/display/text_layer.js#L371

react-pdf iterates over each text content item and assumes that the DOM node at the same index is the only DOM node for that item. As a result, the index drifts by one every time it hits an item that has both a non-empty str and hasEOL: true, because that item produces two nodes (the text node and the <br>).
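To make the drift concrete, here is a minimal sketch that models the behavior described above. It is a simplified stand-in, not PDF.js's or react-pdf's actual code: `renderedNodeTags` and `naivePairs` are hypothetical helper names, and the model assumes one span per item with a non-empty str plus one br per item with hasEOL: true.

```javascript
// Simplified model (an assumption, not PDF.js source): which DOM nodes
// the text layer ends up with for a list of text content items.
function renderedNodeTags(items) {
  return items.flatMap((item) => {
    const tags = [];
    if (item.str) tags.push('span'); // text present -> text node
    if (item.hasEOL) tags.push('br'); // end of line -> <br>
    return tags;
  });
}

// Naive lookup: assume item i corresponds to rendered node i.
function naivePairs(items) {
  const tags = renderedNodeTags(items);
  return items.map((item, i) => ({ str: item.str, tag: tags[i] }));
}

const demoItems = [
  { str: 'Hello', hasEOL: false },
  { str: 'world', hasEOL: true }, // produces a span AND a br
  { str: 'next', hasEOL: false },
];

const demoTags = renderedNodeTags(demoItems);
console.log(demoTags); // -> ['span', 'span', 'br', 'span']

const naive = naivePairs(demoItems);
console.log(naive[2]); // -> { str: 'next', tag: 'br' }
```

Three items yield four nodes, so every item after 'world' is paired with the wrong node.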

I implemented a disgusting hack that looks like this:

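// Pad the list with a null placeholder after every item that renders both
// a text node and a <br>, so that indexes line up with the rendered DOM nodes.
const textContentItems = textContent.items.flatMap((textContentItem) =>
  textContentItem.hasEOL && textContentItem.str
    ? [textContentItem, null]
    : [textContentItem]
);

textContentItems.forEach(function (item, itemIndex) {
  // Skip the placeholder that stands in for the extra <br>.
  if (!item) {
    return;
  }
  ...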
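Since the loop body is elided above, here is a self-contained sketch of the same padding idea that can be run on its own. The helper names (`padItemsForLineBreaks`, `nodeIndexForItems`) and the sample data are hypothetical, and the sketch only records which node index each item would map to rather than touching the DOM.

```javascript
// Hypothetical helper: insert a null placeholder after every item that
// renders both a text node and a <br>, so index i matches rendered node i.
function padItemsForLineBreaks(items) {
  return items.flatMap((item) =>
    item.hasEOL && item.str ? [item, null] : [item]
  );
}

// Hypothetical helper: report the node index each real item lines up with.
function nodeIndexForItems(items) {
  const result = [];
  padItemsForLineBreaks(items).forEach((item, index) => {
    if (!item) {
      return; // placeholder for the extra <br>, nothing to render
    }
    result.push({ str: item.str, nodeIndex: index });
  });
  return result;
}

const sample = [
  { str: 'Hello', hasEOL: false }, // span at node 0
  { str: 'world', hasEOL: true }, // span at node 1, br at node 2
  { str: '', hasEOL: true }, // br only, at node 3
  { str: 'next', hasEOL: false }, // span at node 4
];

const mapping = nodeIndexForItems(sample);
console.log(mapping.map((entry) => entry.nodeIndex)); // -> [0, 1, 3, 4]
```

Node index 2 is skipped because it belongs to the <br> that follows 'world'.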

Lmk what you think! Love the ease-of-use of this component.

Steps to reproduce

All the PDF examples I have are confidential but I can try to create one if requested.

  1. Load a PDF containing multi-line statements such that PDF.js parses at least one item with a non-empty str and hasEOL: true.
  2. Render the page with customTextRenderer={({ str }) => str}.
  3. Select all text on the page. The selection doesn't match the rendered text.

Expected behavior

Text selection should match.

Actual behavior

Text selection doesn't match.

Additional information

No response

Environment

  • Browser (if applicable): Chrome
  • React-PDF version: 6.1.0
  • React version: 16.8
  • Webpack version (if applicable):
@etripier etripier added the bug Something isn't working label Nov 16, 2022
@EricLiu0614

Hi @etripier - Thank you for raising this; I just ran into the same issue. Do you know of a workaround we can use until @wojtekmaj gets a chance to take a look? ;)

@wojtekmaj
Owner

wojtekmaj commented Nov 17, 2022

I don't get it. The length of textContent.items matches the number of rendered children on the text layer. If an item has hasEOL false, it's going to be a span; if true, a br. I double-checked that and added more unit tests in 7c1c925 to verify it, and I'm still unable to reproduce. Perhaps the sample PDF we have doesn't have this issue?

@EricLiu0614

A similar issue is reported in #1042.

I am experiencing the same issue using this pdf file - https://www.saudiembassy.net/sites/default/files/ContractPublicWorks05.pdf

I am running a simple demo based on this code base, but with react-pdf upgraded to 6.1.0 and customTextRenderer applied as below:

function Row({ index, style }) {
  function onPageRenderSuccess(page) {
    console.log(`Page ${page.pageNumber} rendered`);
  }

  return (
    <div style={style}>
      <Page onRenderSuccess={onPageRenderSuccess} pageIndex={index} customTextRenderer={({ str }) => str}/>
    </div>
  );
}

pdf-1

@etripier
Author

@wojtekmaj The issue is specifically that some tokens containing both text and a line break will render both a <span> and a <br>, meaning that the number of rendered elements no longer matches the input one-to-one. You can see the conditions that lead to this result here.
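For anyone trying to confirm whether a given PDF is affected, a rough diagnostic along these lines could list the offending items. `findTwoNodeItems` is a hypothetical helper name and the sample data is made up; in practice the items would come from the text content that PDF.js returns for a page (for example via `page.getTextContent()`).

```javascript
// Hypothetical diagnostic: find items that have both text and a line break,
// i.e. the ones described above as rendering a <span> plus a <br>.
function findTwoNodeItems(items) {
  return items
    .map((item, index) => ({ index, str: item.str, hasEOL: item.hasEOL }))
    .filter((entry) => Boolean(entry.str) && entry.hasEOL === true);
}

// Made-up sample standing in for textContent.items.
const pageItems = [
  { str: 'Article 1', hasEOL: true },
  { str: '', hasEOL: true },
  { str: 'The contractor', hasEOL: false },
  { str: 'shall comply.', hasEOL: true },
];

const offenders = findTwoNodeItems(pageItems);
console.log(offenders.map((entry) => entry.index)); // -> [0, 3]
```

An empty result would suggest the PDF does not trigger the mismatch, which may explain why some sample files fail to reproduce it.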

@wojtekmaj
Owner

Thanks guys for all the info - this really helped me out! v6.1.1 released.
