Tesseract's WASM file too big to process #4748

Open
wrbl606 opened this issue Mar 12, 2023 · 2 comments · May be fixed by #4749
wrbl606 commented Mar 12, 2023

Describe the problem and steps to reproduce it:

Hello, I'm writing the what-to-click extension. I've bundled the tesseract.js library locally for OCR functionality, which works fine, but it causes the Firefox Add-ons linter to fail.

What happened?

Tesseract uses WebAssembly to speed up image analysis. This comes with a file size overhead so large that I can no longer upload my extension to the Developer Hub:

image

image

The linter's suggestion is valid, and I would very much like to split the file into smaller ones (the enormous file size comes from a blob included in it). However, I don't see a way to do this because of how the file is handled: it is loaded automatically by tesseract, not by the extension code, so import/export directives don't work and the browser object isn't available either. The blob also has to stay in the file because, due to this issue, importScripts is not available.

What did you expect to happen?

I expected to be able to submit the next version of my extension as a Firefox addon.

Anything else we should know?

The simplest solution would be to raise the per-file size limit to 5MB, as the problematic file is 4.8MB, and such an increase shouldn't overload the linter servers. However, if you see any way to reduce the file size, I'm certainly open to it.
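
To make the size argument concrete, here is a minimal sketch of the check involved. It is not the linter's actual code: the `findOversizedFiles` helper, the file names, and the 4MB figure are illustrative assumptions (the current limit isn't stated here). Only the 4.8MB file size and the proposed 5MB limit come from the report above.

```typescript
// Hypothetical helper, not part of the linter or of tesseract.js.
interface FileEntry {
  path: string;
  bytes: number;
}

const MB = 1024 * 1024;

// Return every file larger than limitBytes, largest first.
function findOversizedFiles(files: FileEntry[], limitBytes: number): FileEntry[] {
  return files
    .filter((f) => f.bytes > limitBytes)
    .sort((a, b) => b.bytes - a.bytes);
}

// Placeholder file list; the paths are made up for illustration.
const files: FileEntry[] = [
  { path: "background.js", bytes: 12 * 1024 },
  { path: "vendor/tesseract-core.js", bytes: Math.round(4.8 * MB) },
];

// Assumed lower limit (4MB): the 4.8MB file is flagged.
const flaggedAtAssumedLimit = findOversizedFiles(files, 4 * MB);
// Proposed limit (5MB): nothing is flagged.
const flaggedAtProposedLimit = findOversizedFiles(files, 5 * MB);

for (const f of flaggedAtAssumedLimit) {
  console.log(`${f.path}: ${(f.bytes / MB).toFixed(1)} MB exceeds the limit`);
}
console.log(`flagged at 5MB: ${flaggedAtProposedLimit.length}`);
```

Running a check like this over the extension's build output before uploading would at least surface the problem file locally.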


lmk123 commented Apr 10, 2023

I'm having the same problem.

I have developed an extension that recognizes text in images, and it also uses tesseract.js. It uploads to the Chrome Web Store without problems, but I hit this issue when uploading to Firefox Add-ons.

lmk123 commented Apr 13, 2023

I found a workaround that solves this problem. I hope it helps you.

naptha/tesseract.js#732 (comment)
