Quan Nguyen edited this page Mar 2, 2016 · 6 revisions

Tess4J is being developed and tested on Windows and Linux.

Instructions

32- and 64-bit DLLs for Tesseract, Leptonica, and Ghostscript, language data for English, and sample images are bundled with the program. Language data packs for other languages should be downloaded and placed into the tessdata folder.
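A quick way to confirm that a language pack landed in the right place is to look for its data file before running OCR. The sketch below assumes the usual Tesseract naming convention of `<lang>.traineddata` (for example `eng.traineddata`); the class name `TessdataCheck` and its helper method are hypothetical and not part of Tess4J.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tess4J: checks a tessdata folder for a language file.
final class TessdataCheck {

    // Assumes Tesseract's <lang>.traineddata naming, e.g. eng.traineddata.
    static boolean hasLanguageData(Path tessdataDir, String lang) {
        return Files.isRegularFile(tessdataDir.resolve(lang + ".traineddata"));
    }

    public static void main(String[] args) {
        // Defaults are illustrative: a "tessdata" folder in the working directory and English.
        Path tessdata = Paths.get(args.length > 0 ? args[0] : "tessdata");
        String lang = args.length > 1 ? args[1] : "eng";
        System.out.println(lang + " data present in " + tessdata + ": "
                + hasLanguageData(tessdata, lang));
    }
}
```

Running it with a folder and a language code as arguments, e.g. `tessdata deu`, reports whether that pack is installed.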

Tesseract and Leptonica DLLs were built with VS2012/VS2013 and therefore depend on the Visual C++ Redistributable for VS2012 or Visual C++ Redistributable for VS2013.

The Linux shared object library (libtesseract.so), the equivalent of the DLL, is available in Tesseract 3.04 and can be built from source by following the instructions in the Tesseract Wiki.

Tess4J can be built and unit tested using Apache Ant and JUnit. Unzip the source and execute at the command line:

ant test

Notes: On platforms whose default charset is not UTF-8, the output text may have character encoding issues. With version 1.0, you may need to set the default character encoding for the program that calls Tess4J, either by supplying the JVM with the command-line option -Dfile.encoding=UTF8 or by setting the environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8. This is no longer needed as of version 1.1.
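To see which default charset a JVM is actually running with, and to keep your own handling of the recognized text independent of it, something like the following can help. It is a minimal sketch using only the standard library; the class name `EncodingCheck` and its methods are hypothetical, and nothing here calls Tess4J itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tess4J: inspects and sidesteps the platform default charset.
final class EncodingCheck {

    // True when the JVM's default charset is UTF-8.
    static boolean defaultIsUtf8() {
        return Charset.defaultCharset().equals(StandardCharsets.UTF_8);
    }

    // Encode with an explicit charset instead of relying on the platform default.
    static byte[] toUtf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with an explicit charset instead of relying on the platform default.
    static String fromUtf8Bytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Default is UTF-8: " + defaultIsUtf8());
    }
}
```

Passing an explicit charset whenever OCR text is written to or read from a file avoids depending on the platform default in your own code.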

Support for PDF documents is available through GPL Ghostscript, which should be installed and included in the system path.

Images intended for OCR should have a resolution of at least 200 DPI, typically 300 DPI, and be 1 bpp (bit per pixel) monochrome or 8 bpp grayscale in uncompressed TIFF or PNG format. PNG files are usually smaller than other image formats while keeping high quality, because PNG uses lossless data compression; TIFF has the advantage that a single file can contain multiple images (pages).
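As an illustration of the format recommendation, a color image can be converted to 8 bpp grayscale and saved as PNG with the standard Java imaging classes. This is a sketch under stated assumptions: the class name `OcrImagePrep` is hypothetical, the input and output paths come from the command line, and the code changes neither the pixel dimensions nor the DPI metadata of the image.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Tess4J: prepares an image in an OCR-friendly format.
final class OcrImagePrep {

    // Redraws the source into an 8 bpp grayscale image of the same size.
    static BufferedImage toGrayscale(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: OcrImagePrep <input image> <output.png>");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("Unsupported or unreadable image: " + args[0]);
        }
        // PNG is lossless, so no detail is lost when saving the grayscale result.
        ImageIO.write(toGrayscale(src), "png", new File(args[1]));
    }
}
```

Resolution still matters: an image scanned below 200 DPI will not gain detail from this conversion.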

Log4J

As mentioned in #13, some warnings can show up when using Tess4J. Depending on how your project is set up, you can control the log messages by adding a log4j.properties file to it. An example log4j.properties is available at tess4j/src/test/resources/log4j.properties

References