Top property in rectangles #485

Open
MiguelBragaGarcia opened this issue Sep 24, 2020 · 5 comments
Labels
Dependency Bug Valid bug where fixing is outside the scope of this repo

Comments

@MiguelBragaGarcia

Workers do not work properly when the top property of a rectangle is set.
I'm trying to split the image vertically (into top and bottom parts) instead of the usual way (left and right) and extract the text from each part. Every worker except the last one fails to convert its part of the image to text.

To Reproduce: run the code below.

const { createWorker, createScheduler } = require('tesseract.js');

const scheduler = createScheduler();
const worker1 = createWorker();
const worker2 = createWorker();

const rectangles = [
  {
    left: 0,
    top: 0,
    width: 1486,
    height: 334,
  },
  {
    left: 0,
    top: 334,
    width: 1486,
    height: 334,
  },
];

(async () => {

  await worker1.load();
  await worker1.loadLanguage('eng');
  await worker1.initialize('eng');

  await worker2.load();
  await worker2.loadLanguage('eng');
  await worker2.initialize('eng');

  scheduler.addWorker(worker1);
  scheduler.addWorker(worker2);


  const results = await Promise.all(rectangles.map((rectangle) => (
    scheduler.addJob('recognize', 'https://tesseract.projectnaptha.com/img/eng_bw.png', { rectangle })
  )));
  console.log(results.map(r => r.data.text));
})();

Expected behavior
An array containing the text from both halves was expected, but only the text from the last half was extracted.
This is not limited to 2 workers: I tested with 4 to try to speed up the process, and 3 of the 4 workers did not work. Only the last one worked properly.
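
For anyone repeating the test with a different number of workers, the rectangles for such a top-to-bottom split can be generated rather than written by hand. The following is only a sketch: `splitIntoRows` is a hypothetical helper, not part of tesseract.js, and the 1486 x 668 image size is an assumption inferred from the two rectangles in the report.

```javascript
// Hypothetical helper (not part of tesseract.js): split an image of the given
// size into `count` full-width horizontal strips, ordered top to bottom.
function splitIntoRows(width, height, count) {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError('count must be a positive integer');
  }
  const stripHeight = Math.floor(height / count);
  const rectangles = [];
  for (let i = 0; i < count; i += 1) {
    const top = i * stripHeight;
    // The last strip absorbs any remainder so the whole image is covered.
    const h = i === count - 1 ? height - top : stripHeight;
    rectangles.push({ left: 0, top, width, height: h });
  }
  return rectangles;
}

// Assumed image size: 1486 x 668, as implied by the rectangles in the report.
const twoRows = splitIntoRows(1486, 668, 2);
const fourRows = splitIntoRows(1486, 668, 4);
```

With a count of 2 this yields the same two rectangles used in the reproduction code, so the result can be passed to the same `scheduler.addJob('recognize', ...)` calls.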

Screenshots
In this image I removed the empty strings ("") and the line breaks (\n) to improve readability.
[Image: bug]

Desktop (please complete the following information):

  • OS: Linux Ubuntu 18.04.5 LTS
@MiguelBragaGarcia MiguelBragaGarcia changed the title Top properties on rectangles Top propertie on rectangles Sep 24, 2020
@MiguelBragaGarcia MiguelBragaGarcia changed the title Top propertie on rectangles Top property in rectangles Sep 24, 2020
@profabioalvespinto

profabioalvespinto commented Oct 6, 2020

Hi,

I'm having some issues with the position of the rectangles, so I don't know if it is related.

What I did was:
1st - Loop through an array of the rectangles that I want to OCR, passing each of them to the function below;
2nd - A function that creates a canvas element, copies the rectangle from the original canvas into it, and calls the worker with the new canvas element;
3rd - The worker function, more or less like the one you have.
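
The second step above could look roughly like the sketch below. It is an illustration under assumptions, not the commenter's actual code: `cropToCanvas`, `defaultCreateCanvas`, and the injectable `createCanvas` parameter are hypothetical names, and the default path assumes a browser where `document` exists.

```javascript
// Hypothetical helper: copy one rectangle of `sourceCanvas` into a new canvas
// of exactly that size. `createCanvas` is injectable so the function can be
// exercised outside a browser as well.
function cropToCanvas(sourceCanvas, rect, createCanvas = defaultCreateCanvas) {
  const { left, top, width, height } = rect;
  const target = createCanvas(width, height);
  const ctx = target.getContext('2d');
  // drawImage(image, sx, sy, sWidth, sHeight, dx, dy, dWidth, dHeight)
  ctx.drawImage(sourceCanvas, left, top, width, height, 0, 0, width, height);
  return target;
}

// Browser-only default: creates a real <canvas> element.
function defaultCreateCanvas(width, height) {
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  return canvas;
}
```

Each cropped canvas would then be handed to the worker as the image, without a rectangle option, so that the cropping no longer depends on Tesseract.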

Cheers

@cxcorp

cxcorp commented Dec 30, 2020

I also had some weirdness with the rectangle option and also went with just slicing the image myself with ctx.getImageData() and passing that slice to Tesseract.
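
In a browser, `ctx.getImageData(left, top, width, height)` returns the RGBA pixels of the requested region. To make clear what such a slice contains, the sketch below performs the same row-by-row copy on a plain RGBA buffer so that it runs without a canvas. `cropRgba` and the `{ width, height, data }` object shape are assumptions made for illustration and are not taken from the comment above or from tesseract.js.

```javascript
// Hypothetical helper: extract a rectangle from an RGBA pixel buffer.
// `image` is assumed to be { width, height, data }, where `data` is a
// Uint8ClampedArray holding 4 bytes (R, G, B, A) per pixel in row-major order.
function cropRgba(image, rect) {
  const { left, top, width, height } = rect;
  if (left < 0 || top < 0
    || left + width > image.width || top + height > image.height) {
    throw new RangeError('rectangle lies outside the image');
  }
  const out = new Uint8ClampedArray(width * height * 4);
  for (let row = 0; row < height; row += 1) {
    // Byte offset of the first pixel of this row inside the source buffer.
    const start = ((top + row) * image.width + left) * 4;
    out.set(image.data.subarray(start, start + width * 4), row * width * 4);
  }
  return { width, height, data: out };
}
```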

@squalvj

squalvj commented Feb 4, 2021

Which version are you using?

@MiguelBragaGarcia
Author

> Which version are you using?

"tesseract.js": "^2.1.3"

@Balearica
Collaborator

Interesting. As this is merely an argument we pass to Tesseract (nothing in this codebase crops the image), it seems likely that this is an issue with Tesseract itself. Looking at the issues over there, there are indeed people reporting that this feature is broken.

tesseract-ocr/tesseract#845

@Balearica Balearica added the Dependency Bug Valid bug where fixing is outside the scope of this repo label Sep 17, 2022
@naptha naptha deleted a comment from Kumar6174 Aug 1, 2023