Rework error reporting from worker threads so all promises resolve #654

Open
Balearica opened this issue Aug 29, 2022 · 0 comments

Many current issues relate to how Tesseract.js handles errors within workers. The issues below either directly report promises that never resolve, or are harder to troubleshoot because the promise never resolves and no error message reaches the main thread.

  1. Worker loading language traineddata progress 0 #414
  2. Stuck on "loading language traineddata" #439
  3. Promise catch not triggering #471
  4. Tesseract.recognize promise never resolves on mobile #510
  5. Firefox does not get any results. #513
  6. When the "corePath" or the "workerPath" cannot be downloaded, there is no way to catch the error #528
  7. Node.js: Loading corrupted language trained data does not throw an error #602

While we could implement smaller bug fixes for many of these, the larger issue appears to be that errors are reported from the worker threads to the main thread only when they are caught and sent using res.reject. Any error that does not trigger a res.reject call is never reported, so the corresponding promise in the main thread never settles.

https://github.com/naptha/tesseract.js/blob/master/src/createWorker.js#L168-L174

Reworking to use the worker.onerror event would likely be more robust, and make it impossible for error messages to slip through the cracks in the future.
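
To illustrate the idea, here is a minimal sketch of a request wrapper that settles its promise on either a message from the worker or an uncaught worker error. It is not the Tesseract.js implementation: `runJob` is a hypothetical helper, the `status`/`data` message shape is assumed for illustration, and the worker is assumed to expose the browser-style `addEventListener('error', ...)` interface (Node's `worker_threads` uses `.on('error', ...)` instead).

```javascript
// Hypothetical helper (not part of Tesseract.js): sends one request to a
// worker and returns a promise that always settles.
function runJob(worker, payload) {
  return new Promise((resolve, reject) => {
    // Detach both listeners once the job has settled.
    const cleanup = () => {
      worker.removeEventListener('message', onMessage);
      worker.removeEventListener('error', onError);
    };

    // Errors the worker caught itself and reported explicitly.
    // The { status, data } shape is an assumption for this sketch.
    const onMessage = (event) => {
      const { status, data } = event.data;
      if (status === 'resolve') {
        cleanup();
        resolve(data);
      } else if (status === 'reject') {
        cleanup();
        reject(new Error(String(data)));
      }
      // Any other status (e.g. progress updates) is ignored here.
    };

    // Errors the worker did NOT catch: without this listener the
    // promise would stay pending forever.
    const onError = (event) => {
      cleanup();
      reject(new Error(event.message || 'Uncaught error in worker'));
    };

    worker.addEventListener('message', onMessage);
    worker.addEventListener('error', onError);
    worker.postMessage(payload);
  });
}
```

With a wrapper along these lines, a failure such as an undownloadable worker script or an exception thrown outside a try block surfaces as a rejected promise that the caller can handle with `.catch`, rather than as a promise that silently hangs.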

Balearica added a commit that referenced this issue Nov 25, 2022
See #662 for explanation of Tesseract.js Version 4 changes.  List below is auto-generated from commits. 

* Added image preprocessing functions (rotate + save images)

* Updated createWorker to be async

* Reworked createWorker to be async and throw errors per #654

* Reworked createWorker to be async and throw errors per #654

* Edited detect to return null when detection fails rather than throwing error per #526

* Updated types per #606 and #580 (#663) (#664)

* Removed unused files

* Added savePDF option to recognize per #488; cleaned up code for linter

* Updated download-pdf example for node to use new savePDF option

* Added OutputFormats option/interface for setting output

* Allowed for Tesseract parameters to be set through recognition options per #665

* Updated docs

* Edited loadLanguage to no longer overwrite cache with data from cache per #666

* Added interface for setting 'init only' options per #613

* Wrapped caching in try block per #609

* Fixed unit tests

* Updated setImage to resolve memory leak per #678

* Added debug output option per #681

* Fixed bug with saving images per #588

* Updated examples

* Updated readme and Tesseract.js-core version