Reworked createWorker to be async and throw errors per #654
Balearica committed Sep 17, 2022
1 parent b87afe9 commit ca99c35
Showing 20 changed files with 1,374 additions and 1,722 deletions.
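
The call-site change that the diffs in this commit apply can be sketched as follows. This is a minimal illustration, not code from the commit: `recognizeOnce` is a hypothetical helper, and `createWorker` is passed in as a parameter so the sketch runs without the library installed. Because `await` is only valid at the top level of ES modules, CommonJS callers (those using `require`) would need to keep the `await createWorker()` call inside an async function, as done here.

```javascript
// Hypothetical helper (not part of tesseract.js) showing the call order after this commit.
// `createWorker` is injected; in real code it would come from require('tesseract.js').
async function recognizeOnce(createWorker, image, lang = 'eng') {
  // createWorker is now awaited; the separate worker.load() step is gone.
  const worker = await createWorker();
  try {
    await worker.loadLanguage(lang);
    await worker.initialize(lang);
    const { data: { text } } = await worker.recognize(image);
    return text;
  } finally {
    // Release the worker even if one of the steps above fails.
    await worker.terminate();
  }
}
```

In an application this might be called as `recognizeOnce(createWorker, imageUrl)` from inside another async function, with `createWorker` imported from `tesseract.js`.
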
3 changes: 1 addition & 2 deletions README.md
@@ -46,12 +46,11 @@ Or more imperative
```javascript
import { createWorker } from 'tesseract.js';

const worker = createWorker({
const worker = await createWorker({
logger: m => console.log(m)
});

(async () => {
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
32 changes: 5 additions & 27 deletions docs/api.md
@@ -1,7 +1,6 @@
# API

- [createWorker()](#create-worker)
- [Worker.load](#worker-load)
- [Worker.writeText](#worker-writeText)
- [Worker.readText](#worker-readText)
- [Worker.removeFile](#worker-removeFile)
@@ -53,7 +52,7 @@ createWorker is a factory function that creates a tesseract worker, a worker is

```javascript
const { createWorker } = Tesseract;
const worker = createWorker({
const worker = await createWorker({
langPath: '...',
logger: m => console.log(m),
});
@@ -63,7 +62,6 @@ const worker = createWorker({

A Worker handles OCR-related tasks. It takes a few steps to set up a Worker before it is fully functional. The full flow is:

- load
- FS functions // optional
- loadLanguage
- initialize
@@ -82,23 +80,6 @@ Each function is async, so using async/await or Promise is required. When it is

jobId is generated by Tesseract.js, but you can put your own when calling any of the function above.

<a name="worker-load"></a>
### Worker.load(jobId): Promise

Worker.load() loads tesseract.js-core scripts (download from remote if not presented), it makes Web Worker/Child Process ready for next action.

**Arguments:**

- `jobId` Please see details above

**Examples:**

```javascript
(async () => {
await worker.load();
})();
```
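
With the `Worker.load` section removed from the documented flow, any loading failure would surface when `createWorker` itself is awaited. The sketch below shows one way a caller might handle that. It assumes, based only on the commit title ("throw errors"), that the promise returned by `createWorker` rejects when setup fails; `tryCreateWorker` is a hypothetical wrapper, and `createWorker` is injected so the sketch is self-contained.

```javascript
// Hypothetical wrapper (not part of tesseract.js): creates a worker and
// reports a setup failure instead of leaving an unhandled rejection.
async function tryCreateWorker(createWorker, options = {}) {
  try {
    // Assumption: this promise rejects if the worker or core script fails to load.
    const worker = await createWorker(options);
    return { worker, error: null };
  } catch (error) {
    return { worker: null, error };
  }
}
```

A caller could then check `error` before continuing to `loadLanguage` and `initialize`, rather than discovering the failure on a later call.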

<a name="worker-writeText"></a>
### Worker.writeText(path, text, jobId): Promise

@@ -273,8 +254,7 @@ Figures out what words are in `image`, where the words are in `image`, etc.
```javascript
const { createWorker } = Tesseract;
(async () => {
const worker = createWorker();
await worker.load();
const worker = await createWorker();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const { data: { text } } = await worker.recognize(image);
@@ -287,8 +267,7 @@ With rectangle
```javascript
const { createWorker } = Tesseract;
(async () => {
const worker = createWorker();
await worker.load();
const worker = await createWorker();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const { data: { text } } = await worker.recognize(image, {
@@ -313,8 +292,7 @@ Worker.detect() does OSD (Orientation and Script Detection) to the image instead
```javascript
const { createWorker } = Tesseract;
(async () => {
const worker = createWorker();
await worker.load();
const worker = await createWorker();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const { data } = await worker.detect(image);
@@ -361,7 +339,7 @@ Scheduler.addWorker() adds a worker into the worker pool inside scheduler, it is
```javascript
const { createWorker, createScheduler } = Tesseract;
const scheduler = createScheduler();
const worker = createWorker();
const worker = await createWorker();
scheduler.addWorker(worker);
```
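
Since `createWorker` now returns a promise, several workers can be set up concurrently before they are added to a scheduler. The following is a minimal sketch under that assumption, not code from the commit: `createPool` is a hypothetical helper, and `createWorker` and `createScheduler` are injected so the block does not depend on the library being installed.

```javascript
// Hypothetical helper (not part of tesseract.js): fills a scheduler with ready workers.
// `createWorker` and `createScheduler` are injected; in real code both come from tesseract.js.
async function createPool(createWorker, createScheduler, size = 2, lang = 'eng') {
  const scheduler = createScheduler();
  // Start all workers concurrently; each promise resolves to an initialized worker.
  const workers = await Promise.all(
    Array.from({ length: size }, async () => {
      const worker = await createWorker();
      await worker.loadLanguage(lang);
      await worker.initialize(lang);
      return worker;
    }),
  );
  // Only fully initialized workers are handed to the scheduler.
  workers.forEach((worker) => scheduler.addWorker(worker));
  return scheduler;
}
```

Using `Promise.all` here means one failed worker rejects the whole pool setup, which may or may not be the desired behavior for a given application.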

33 changes: 11 additions & 22 deletions docs/examples.md
@@ -7,10 +7,9 @@ You can also check [examples](../examples) folder.
```javascript
const { createWorker } = require('tesseract.js');

const worker = createWorker();
const worker = await createWorker();

(async () => {
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
@@ -24,12 +23,11 @@ const worker = createWorker();
```javascript
const { createWorker } = require('tesseract.js');

const worker = createWorker({
const worker = await createWorker({
logger: m => console.log(m), // Add logger here
});

(async () => {
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
@@ -43,10 +41,9 @@ const worker = createWorker({
```javascript
const { createWorker } = require('tesseract.js');

const worker = createWorker();
const worker = await createWorker();

(async () => {
await worker.load();
await worker.loadLanguage('eng+chi_tra');
await worker.initialize('eng+chi_tra');
const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
@@ -59,10 +56,9 @@ const worker = createWorker();
```javascript
const { createWorker } = require('tesseract.js');

const worker = createWorker();
const worker = await createWorker();

(async () => {
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');
await worker.setParameters({
@@ -81,10 +77,9 @@ Check here for more details of pageseg mode: https://github.com/tesseract-ocr/te
```javascript
const { createWorker, PSM } = require('tesseract.js');

const worker = createWorker();
const worker = await createWorker();

(async () => {
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');
await worker.setParameters({
@@ -110,11 +105,10 @@ Node: [download-pdf.js](../examples/node/download-pdf.js)
```javascript
const { createWorker } = require('tesseract.js');

const worker = createWorker();
const worker = await createWorker();
const rectangle = { left: 0, top: 0, width: 500, height: 250 };

(async () => {
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png', { rectangle });
@@ -128,7 +122,7 @@ const rectangle = { left: 0, top: 0, width: 500, height: 250 };
```javascript
const { createWorker } = require('tesseract.js');

const worker = createWorker();
const worker = await createWorker();
const rectangles = [
{
left: 0,
@@ -145,7 +139,6 @@ const rectangles = [
];

(async () => {
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const values = [];
@@ -164,8 +157,8 @@ const rectangles = [
const { createWorker, createScheduler } = require('tesseract.js');

const scheduler = createScheduler();
const worker1 = createWorker();
const worker2 = createWorker();
const worker1 = await createWorker();
const worker2 = await createWorker();
const rectangles = [
{
left: 0,
@@ -182,8 +175,6 @@ const rectangles = [
];

(async () => {
await worker1.load();
await worker2.load();
await worker1.loadLanguage('eng');
await worker2.loadLanguage('eng');
await worker1.initialize('eng');
@@ -204,12 +195,10 @@ const rectangles = [
const { createWorker, createScheduler } = require('tesseract.js');

const scheduler = createScheduler();
const worker1 = createWorker();
const worker2 = createWorker();
const worker1 = await createWorker();
const worker2 = await createWorker();

(async () => {
await worker1.load();
await worker2.load();
await worker1.loadLanguage('eng');
await worker2.loadLanguage('eng');
await worker1.initialize('eng');
3 changes: 1 addition & 2 deletions docs/faq.md
@@ -19,12 +19,11 @@ Starting from 2.0.0-beta.1, you can get all these information in the final resul

```javascript
import { createWorker } from 'tesseract.js';
const worker = createWorker({
const worker = await createWorker({
logger: m => console.log(m)
});

(async () => {
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');
await worker.setParameters({
2 changes: 1 addition & 1 deletion docs/local-installation.md
@@ -19,7 +19,7 @@ Tesseract.recognize(image, langs, {
Or

```javascript
const worker = createWorker({
const worker = await createWorker({
workerPath: 'https://unpkg.com/tesseract.js@v2.0.0/dist/worker.min.js',
langPath: 'https://tessdata.projectnaptha.com/4.0.0',
corePath: 'https://unpkg.com/tesseract.js-core@v2.0.0/tesseract-core.wasm.js',
18 changes: 8 additions & 10 deletions examples/browser/basic-edge.html
@@ -6,9 +6,9 @@
<body>
<input type="file" id="uploader">
<script>
const recognize = function(evt){
const recognize = async function(evt){
const files = evt.target.files;
const worker = Tesseract.createWorker({
const worker = await Tesseract.createWorker({
/*
* As Edge don't support webassembly,
* here we force to use asm.js version.
@@ -21,14 +21,12 @@
*/
cacheMethod: 'none',
});
Promise.resolve()
.then(() => worker.load())
.then(() => worker.loadLanguage('eng'))
.then(() => worker.initialize('eng'))
.then(() => worker.recognize(files[0]))
.then((ret) => {
console.log(ret.data.text);
});

await worker.loadLanguage('eng');
await worker.initialize('eng');
const ret = await worker.recognize(files[0]);
console.log(ret.data.text);

}
const elm = document.getElementById('uploader');
elm.addEventListener('change', recognize);
3 changes: 1 addition & 2 deletions examples/browser/benchmark.html
@@ -7,9 +7,8 @@

<script>
const { createWorker } = Tesseract;
const worker = createWorker();
const worker = await createWorker();
(async () => {
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');

3 changes: 1 addition & 2 deletions examples/browser/download-pdf.html
@@ -10,14 +10,13 @@
<textarea id="board" readonly rows="8" cols="80">Upload an image file</textarea>
<script>
const { createWorker } = Tesseract;
const worker = createWorker({
const worker = await createWorker({
corePath: '/node_modules/tesseract.js-core/tesseract-core.wasm.js',
logger: m => console.log(m),
});
const uploader = document.getElementById('uploader');
const dlBtn = document.getElementById('download-pdf');
const recognize = async ({ target: { files } }) => {
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const { data: { text } } = await worker.recognize(files[0]);
5 changes: 2 additions & 3 deletions examples/browser/image-processing.html
@@ -37,11 +37,10 @@
<script>
const recognize = async ({ target: { files } }) => {
document.getElementById("imgInput").src = URL.createObjectURL(files[0]);
const worker = Tesseract.createWorker({
corePath: '/tesseract-core-simd.wasm.js',
const worker = await Tesseract.createWorker({
// corePath: '/tesseract-core-simd.wasm.js',
workerPath: "/dist/worker.dev.js"
});
await worker.load();
await worker.loadLanguage('eng');
await worker.initialize('eng');
