Implement Canvas Support for Node #649

Open
GICodeWarrior opened this issue Aug 26, 2022 · 4 comments

Comments

@GICodeWarrior

Is your feature request related to a problem? Please describe.
I use the same code in the browser and in Node.js (with node-canvas), and I get an error when I pass a canvas to recognize in Node.js.

Describe the solution you'd like
Please consider implementing support for canvas in node. The node-canvas implementation largely matches the browser API.
https://github.com/Automattic/node-canvas

Describe alternatives you've considered
It would be helpful to return a clearer error message when an unsupported type is provided to recognize.

In the meantime, I can write some code to detect running in node and convert the canvas before recognition.
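
A minimal sketch of that workaround, assuming node-canvas's documented `canvas.toBuffer()` method and assuming tesseract.js accepts a `Buffer` in Node.js. The helper names `isNode` and `prepareImage` are hypothetical and are not part of either library.

```javascript
// Hypothetical helpers: not part of tesseract.js or node-canvas.

// Returns true when running under Node.js rather than in a browser.
function isNode() {
  return typeof process !== 'undefined' &&
    process.versions != null &&
    process.versions.node != null;
}

// Converts a canvas into something that can be handed to recognize().
// In Node.js, an object that looks like a node-canvas Canvas is encoded
// to a PNG Buffer via toBuffer(); anything else is passed through unchanged.
function prepareImage(image) {
  if (image == null) {
    // Fail early with a readable message instead of a WebAssembly RuntimeError.
    throw new TypeError('prepareImage: expected an image, got ' + image);
  }
  const looksLikeNodeCanvas =
    typeof image.toBuffer === 'function' &&
    typeof image.getContext === 'function';
  if (isNode() && looksLikeNodeCanvas) {
    return image.toBuffer('image/png');
  }
  return image;
}
```

With a worker already created, the call would then look something like `await worker.recognize(prepareImage(canvas))`.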

Additional context
The errors I get when trying to use canvas with tesseract.js in node are below:

Error in pixReadMem: size < 12
node:internal/event_target:969
  process.nextTick(() => { throw err; });
                           ^

Error: RuntimeError: null function or function signature mismatch
    at Worker.<anonymous> (<redacted>/node_modules/tesseract.js/src/createWorker.js:173:15)
    at Worker.emit (node:events:513:28)
    at MessagePort.<anonymous> (node:internal/worker:233:53)
    at MessagePort.[nodejs.internal.kHybridDispatch] (node:internal/event_target:694:20)
    at MessagePort.exports.emitMessage (node:internal/per_context/messageport:23:28)
@Balearica
Collaborator

We do not actually have any meaningful canvas integration for browser (i.e. we don't read the canvas data directly). Rather, we simply call the canvas.toBlob method as a convenience and then read it like any other blob/file. If the node implementation of canvas you are using has the same methods, you should simply be able to replace canvas with canvas.toBlob() or canvas.toDataURL() when passing to tesseract.js.

image.toBlob(async (blob) => {
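
For readers following along, the convenience step described above can be sketched as a small Promise wrapper around the standard callback-based `HTMLCanvasElement.toBlob()`. This is an illustration only, not the tesseract.js source; `canvasToBlob` is a hypothetical name.

```javascript
// Hypothetical helper (not tesseract.js source): wraps the callback-based
// HTMLCanvasElement.toBlob() in a Promise so the result can be awaited.
function canvasToBlob(canvas, type = 'image/png') {
  return new Promise((resolve, reject) => {
    canvas.toBlob((blob) => {
      // Per the HTML spec, the callback receives null if encoding fails.
      if (blob) {
        resolve(blob);
      } else {
        reject(new Error('canvas.toBlob produced no blob'));
      }
    }, type);
  });
}
```

In a browser this would be used as `const blob = await canvasToBlob(canvas)` before passing `blob` to recognize. node-canvas documents `toBuffer()` and `toDataURL()`, so it is worth checking which of these methods your canvas implementation actually provides.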

@GICodeWarrior
Author

Thanks for that context.

I guess the ask then is to support an ImageData instance or equivalent.
https://developer.mozilla.org/en-US/docs/Web/API/ImageData

HTMLCanvasElement.toBlob() converts the canvas contents to a PNG. Later, you must decode the PNG back to pixel data for recognition. If you accept ImageData directly, that removes the conversion into PNG and back again.
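
To make the request concrete, here is a rough sketch of reading the raw pixels that an ImageData instance carries, using the standard `getContext('2d')` and `getImageData()` canvas calls. `readPixels` is a hypothetical name, and nothing here is accepted by tesseract.js as of this thread.

```javascript
// Hypothetical helper: reads the raw RGBA pixels out of a canvas.
// ImageData stores 4 bytes per pixel (red, green, blue, alpha).
function readPixels(canvas) {
  const ctx = canvas.getContext('2d');
  const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
  return {
    data: imageData.data,       // Uint8ClampedArray of RGBA bytes
    width: imageData.width,
    height: imageData.height,
    byteLength: imageData.width * imageData.height * 4,
  };
}
```

Passing this buffer directly would skip the PNG encode on the JavaScript side and the PNG decode inside the recognizer.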

@Balearica
Collaborator

Balearica commented Nov 26, 2022

I agree that supporting raw ImageData directly would theoretically be better. Unfortunately, I've run into issues implementing something along these lines in the past, so somebody else would need to contribute this feature.

A note for any future developers: versions 2 and 3 both had a major memory leak due to (presumably incorrectly) passing arrays directly from JavaScript to WebAssembly (#678). I was not able to figure out why in a reasonable amount of time, so in v4 I instead switched to writing images to the (virtual) WebAssembly filesystem and having Tesseract read them from (virtual) disk. Any future version that passes image data directly would need to figure out how to do so without causing memory leaks.
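
For context, the two approaches can be sketched roughly as follows, assuming an Emscripten-built module that exposes `_malloc`, `_free`, `HEAPU8`, and `FS`. The helper names are hypothetical, this is not the tesseract.js implementation, and the thread does not establish whether a missing free was the actual cause of the leak in #678.

```javascript
// Hypothetical helpers; `module` is assumed to be an Emscripten module
// that exports _malloc, _free, HEAPU8 and FS.

// Direct passing: copy bytes into the WebAssembly heap, call fn with the
// pointer, and always free the allocation afterwards.
function withHeapCopy(module, bytes, fn) {
  const ptr = module._malloc(bytes.length);
  try {
    // Read HEAPU8 after malloc, since the view can change if memory grows.
    module.HEAPU8.set(bytes, ptr);
    return fn(ptr, bytes.length);
  } finally {
    module._free(ptr);
  }
}

// Virtual filesystem: write bytes to a file, call fn with the path,
// and remove the file afterwards.
function withVirtualFile(module, path, bytes, fn) {
  module.FS.writeFile(path, bytes);
  try {
    return fn(path);
  } finally {
    module.FS.unlink(path);
  }
}
```

The `try`/`finally` shape is one way to make sure every allocation or temporary file is released even when the callback throws.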

I would also advise benchmarking beforehand to see how much time writing and reading the .png files actually takes. I believe no actual compression/decompression occurs, so I would not be surprised if the inefficiency caused by this encoding/decoding is marginal.
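
A small timing helper along these lines could be used for that measurement. It is a sketch only: `timeAsync` is a hypothetical name, and it assumes a runtime with the global `performance.now()` (browsers and recent Node.js versions).

```javascript
// Hypothetical helper: runs an async function several times and reports
// how long each call took, in milliseconds.
async function timeAsync(fn, runs = 10) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    samples.push(performance.now() - start);
  }
  const total = samples.reduce((a, b) => a + b, 0);
  return {
    runs: samples.length,
    meanMs: total / samples.length,
    maxMs: Math.max(...samples),
  };
}
```

Timing the PNG encode step on its own and then the full recognize call on the same image would show what share of the total the encoding accounts for.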

@nathanbabcock
Contributor

I've started PR #726, which is related to this. It doesn't directly address Node.js support, but it improves performance by avoiding PNG encoding, as @GICodeWarrior suggested.

It still uses canvas.toBlob, so it shouldn't run into the memory leak issue that @Balearica mentioned. And thank you for the background info on that, it helped me explore the potential options.
