Releases: naptha/tesseract.js

v4.1.0

03 Jun 00:57

What's Changed

  • Added ability to run layout analysis without recognition (#656)
  • Added support for OffscreenCanvas in browser version by @nathanbabcock (#766)
  • Fixed bug where recognize was running OCR even when not necessary (#769)
  • Fixed bug where certain valid langPath URLs caused errors in browser version (#558)
  • Removed problematic file-type and resolve-url dependencies (#773, #711)

Full Changelog: v4.0.6...v4.1.0

v4.0.6

16 May 03:23

What's Changed

  • Invalid langData (.traineddata files) are now cleared from cache (#753)
    • Note: setting cacheMethod: 'none' or cacheMethod: 'refresh' to prevent invalid files from being cached should no longer be necessary
  • Added source maps to esm build (#761)
  • Various updates to documentation

Full Changelog: v4.0.5...v4.0.6

v4.0.5

03 May 01:52

What's Changed

  • No changes to code
    • Removed unnecessary files to reduce the size of the npm package

Full Changelog: v4.0.4...v4.0.5

v4.0.4

01 May 00:57

What's Changed

  • Added SIMD detection when corePath is manually specified (#735)
    • Important note for users who set corePath: for significantly faster performance, set corePath to a directory that includes both tesseract-core.wasm.js and tesseract-core-simd.wasm.js
    • See this comment for explanation
  • Improved auto-rotate feature (rotateAuto: true) (#747)
  • Switched default CDN from unpkg to jsdelivr (#743)
  • Updated various dependencies (#729, #736, #737, #739, #741)
  • Reduced size of npm package (#731, #734, #740)
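
The corePath note above can be sketched as follows. This is a minimal illustration, not the library's own code: CORE_DIR is a hypothetical path, isDirectoryCorePath and createWorkerWithLocalCore are helper names invented here, and createWorker is passed in as an argument so the snippet runs with the real function from tesseract.js or with a stub.

```javascript
// Hypothetical directory served by your own app. Per the note above, it
// should contain both tesseract-core.wasm.js and tesseract-core-simd.wasm.js.
const CORE_DIR = '/assets/tesseract-core';

// Rough check (illustrative helper) that a corePath value names a directory
// rather than one specific .js build file.
function isDirectoryCorePath(corePath) {
  return !/\.js$/i.test(corePath);
}

// `createWorker` is injected, e.g. require('tesseract.js').createWorker.
async function createWorkerWithLocalCore(createWorker, corePath = CORE_DIR) {
  if (!isDirectoryCorePath(corePath)) {
    // Assumption: a path to a single file may not benefit from SIMD detection.
    console.warn(`corePath "${corePath}" names a single file, not a directory`);
  }
  return createWorker({ corePath });
}
```

With the real library this would be called as createWorkerWithLocalCore(require('tesseract.js').createWorker).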

Full Changelog: v4.0.3...v4.0.4

v4.0.3

30 Mar 02:41

What's Changed

  • Updated Tesseract to v5.3.0
    • This resolves a bug with recognition of inverted (white-on-black) text (#717)
  • Minor documentation fixes (#612, #614, #682, #673)
  • Better types for addJob by @nathanbabcock in #719

Full Changelog: v4.0.2...v4.0.3

v4.0.2

18 Dec 06:16

What's Changed

  • Fixed bug breaking compatibility with certain devices (#701)

Full Changelog: v4.0.1...v4.0.2

v4.0.1

10 Dec 05:17

What's Changed

  • Running recognize or detect with an invalid image argument now throws an error (#699)
  • Fixed bug with custom langdata paths (#697)

Full Changelog: v4.0.0...v4.0.1

v4.0.0

25 Nov 20:24

Breaking Changes

  1. createWorker is now async
    1. In most code this means worker = Tesseract.createWorker() should be replaced with worker = await Tesseract.createWorker()
    2. Calling with invalid workerPath or corePath now produces error/rejected promise (#654)
  2. worker.load is no longer needed (createWorker now returns worker pre-loaded)
  3. getPDF function replaced by pdf recognize option (#488)
    1. This allows PDFs to be created when using a scheduler
    2. See browser and node examples for usage
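
A minimal migration sketch for the breaking changes above. The helper names recognizeText and recognizeToPdf are invented for illustration, and createWorker is passed in so the code can run against the real tesseract.js export or a stub. Passing pdf: true in the third (output) argument of recognize, and reading the result from data.pdf, is an assumption here; check it against the project's browser and node examples.

```javascript
// v3 style, for comparison:
//   const worker = createWorker();
//   await worker.load();
// v4 style: createWorker is awaited and the worker arrives pre-loaded.
async function recognizeText(createWorker, image) {
  const worker = await createWorker();
  try {
    await worker.loadLanguage('eng');
    await worker.initialize('eng');
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Release the worker even if recognition throws.
    await worker.terminate();
  }
}

// Replacement for the removed getPDF call, on an already initialized worker.
// Assumption: the pdf flag belongs to the third (output) argument.
async function recognizeToPdf(worker, image) {
  const { data } = await worker.recognize(
    image,
    { pdfTitle: 'Example PDF' },
    { pdf: true }
  );
  return data.pdf;
}
```

Because the PDF is now produced by recognize itself, the same options can be sent through a scheduler job instead of calling a separate method on one specific worker.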

Major New Features

  1. Processed images created by Tesseract can be retrieved using imageColor, imageGrey, and imageBinary options (#588)
    1. See image-processing.html example for usage
  2. Image rotation options rotateAuto and rotateRadians have been added, which significantly improve accuracy on certain documents
    1. See issue #648 for an example of how auto-rotation improves accuracy
    2. See image-processing.html example for usage of rotateAuto option
  3. Tesseract parameters (usually set using worker.setParameters) can now be set for single jobs using worker.recognize options (#665)
    1. For example, a single job can be set to recognize only numbers using worker.recognize(image, {tessedit_char_whitelist: "0123456789"})
    2. As these settings are reverted after the job, this allows for using different parameters for specific jobs when working with schedulers
  4. Initialization parameters (e.g. load_system_dawg, load_number_dawg, and load_punc_dawg) can now be set (#613)
    1. The third argument to worker.initialize now accepts either (1) an object with key/value pairs or (2) a string containing contents to write to a config file
    2. For example, both of these lines set load_number_dawg to 0:
      1. worker.initialize('eng', "0", {load_number_dawg: "0"});
      2. worker.initialize('eng', "0", "load_number_dawg 0");
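
Items 3 and 4 above can be sketched together. The function names are invented for illustration, and worker is assumed to be a v4 worker on which loadLanguage('eng') has already run. The calls themselves mirror the ones quoted in the list.

```javascript
// Initialization parameters: the third argument to initialize may be an
// object of key/value pairs or a string with config-file contents.
async function initializeWithoutNumberDawg(worker, asConfigString = false) {
  const config = asConfigString
    ? 'load_number_dawg 0'
    : { load_number_dawg: '0' };
  await worker.initialize('eng', '0', config);
}

// Per-job parameters: the whitelist applies to this call only and is
// reverted afterwards, so later jobs see the worker's normal settings.
async function recognizeDigitsOnly(worker, image) {
  const { data } = await worker.recognize(image, {
    tessedit_char_whitelist: '0123456789',
  });
  return data.text;
}

// Unrestricted job on the same worker, for contrast.
async function recognizeAnyText(worker, image) {
  const { data } = await worker.recognize(image);
  return data.text;
}
```

Keeping the whitelist in the recognize call rather than in worker.setParameters means a pool of workers behind a scheduler can serve digit-only and free-text jobs in any order.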

Other Changes

  1. loadLanguage now resolves without error when language is loaded but writing to cache fails
    1. This allows for running in Firefox incognito mode using default settings (#609)
  2. detect now returns null values when orientation and script detection (OSD) fails, rather than throwing an error (#526)
  3. Fixed a memory leak that caused crashes (#678)
  4. Cache corruption should now be much less common (#666)

Full Changelog: v3.0.3...v4.0.0

v3.0.3

20 Sep 04:41

What's Changed

  • Invalid language data now throws error at initialize step (#602)
  • Recognition progress logging fixed (#655)
  • Minor changes to types, documentation

Full Changelog: v3.0.2...v3.0.3

v3.0.2

20 Aug 04:32

What's Changed

  • Updated to Tesseract.js-core v3.0.1 (uses Tesseract v5.1.0)
  • Added SIMD-enabled build, automatic detection of supported devices
  • Fixed caching of bad langData responses by @andreialecu in #585
  • Added benchmark code and assets per #628 by @Balearica in #629
  • Replaced child_process with worker_threads per #630 by @Balearica in #631
  • Updated to webpack 5 for compatibility with Node.js 18 by @Balearica in #640

Full Changelog: v2.1.5...v3.0.2