Missing "languages" attributes on default export #887

Open
MasGaNo opened this issue Feb 11, 2024 · 1 comment

Comments


MasGaNo commented Feb 11, 2024

Tesseract.js version 5.0.4

Describe the bug
The `languages` constant object is missing from the TypeScript type definitions, despite being exported from `index.js`.

To Reproduce
Steps to reproduce the behavior:

  1. Install tesseract.js
  2. Try to import it: `import { languages } from 'tesseract.js';`
  3. See error


Expected behavior
The expected behavior is to have access to `languages` in a TypeScript codebase, which would avoid this kind of issue.
It would also make code more type-safe and allow creating validation rules with Zod/Yup/Joi/... by passing this object directly as the source of truth.
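A minimal sketch of the type-safety and validation use case, assuming `languages` maps upper-case keys to lower-case Tesseract codes (e.g. `ENG: 'eng'`). The abbreviated object and the names `LanguageCode`, `validCodes` and `isLanguageCode` are hypothetical stand-ins, not exports of tesseract.js.

```typescript
// Hypothetical, abbreviated stand-in for the tesseract.js `languages` constant
// (the real object has roughly a hundred entries; only three are shown here).
const languages = {
  ENG: 'eng',
  FRA: 'fra',
  CHI_SIM: 'chi_sim',
} as const;

// Union of the values: 'eng' | 'fra' | 'chi_sim'.
type LanguageCode = (typeof languages)[keyof typeof languages];

// Built once from the object, so the object stays the single source of truth.
const validCodes: ReadonlySet<string> = new Set<string>(Object.values(languages));

// Type guard: narrows an arbitrary string (e.g. form input) to LanguageCode.
function isLanguageCode(value: string): value is LanguageCode {
  return validCodes.has(value);
}

const input: string = 'fra';
if (isLanguageCode(input)) {
  const lang: LanguageCode = input; // assignable only because of the narrowing
  console.log(`valid language: ${lang}`);
}
```

With Zod 3, `z.nativeEnum(languages)` should accept an `as const` object like this directly; the plain type guard shows the same idea without a validation dependency.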

Device Version:

  • OS + Version: Windows 11
  • Browser: Chrome

Additional context
My current workaround to fix this issue is to create a tesseract.d.ts file in my project and add this block:

```typescript
// tesseract.d.ts: re-exports the published typings (which also makes this
// file a module) and augments them with the missing `languages` export.
export * from 'tesseract.js';

declare module "tesseract.js" {
  export const languages: Record<'AFR' | 'AMH' | 'ARA' | 'ASM' | 'AZE' | 'AZE_CYRL' | 'BEL' | 'BEN' | 'BOD' | 'BOS' | 'BUL' | 'CAT' | 'CEB' | 'CES' | 'CHI_SIM' | 'CHI_TRA' | 'CHR' | 'CYM' | 'DAN' | 'DEU' | 'DZO' | 'ELL' | 'ENG' | 'ENM' | 'EPO' | 'EST' | 'EUS' | 'FAS' | 'FIN' | 'FRA' | 'FRK' | 'FRM' | 'GLE' | 'GLG' | 'GRC' | 'GUJ' | 'HAT' | 'HEB' | 'HIN' | 'HRV' | 'HUN' | 'IKU' | 'IND' | 'ISL' | 'ITA' | 'ITA_OLD' | 'JAV' | 'JPN' | 'KAN' | 'KAT' | 'KAT_OLD' | 'KAZ' | 'KHM' | 'KIR' | 'KOR' | 'KUR' | 'LAO' | 'LAT' | 'LAV' | 'LIT' | 'MAL' | 'MAR' | 'MKD' | 'MLT' | 'MSA' | 'MYA' | 'NEP' | 'NLD' | 'NOR' | 'ORI' | 'PAN' | 'POL' | 'POR' | 'PUS' | 'RON' | 'RUS' | 'SAN' | 'SIN' | 'SLK' | 'SLV' | 'SPA' | 'SPA_OLD' | 'SQI' | 'SRP' | 'SRP_LATN' | 'SWA' | 'SWE' | 'SYR' | 'TAM' | 'TEL' | 'TGK' | 'TGL' | 'THA' | 'TIR' | 'TUR' | 'UIG' | 'UKR' | 'URD' | 'UZB' | 'UZB_CYRL' | 'VIE' | 'YID', string>;
}
```

However, it would be better to generate the definition directly from the project, using the JSDoc on the `languages` constants.
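A minimal sketch of that suggestion, assuming the constant were declared once with `as const`: the key union written out by hand in the workaround would then be derived by the compiler instead of maintained separately. The abbreviated object and the names `LanguageKey`, `LanguageMap` and `languageKeys` are hypothetical, not part of the tesseract.js API.

```typescript
// Hypothetical single source of truth, declared once with `as const` so the
// compiler keeps the literal keys and values (abbreviated here).
export const languages = {
  AFR: 'afr',
  ENG: 'eng',
  CHI_SIM: 'chi_sim',
  YID: 'yid',
} as const;

// Derived instead of hand-written: 'AFR' | 'ENG' | 'CHI_SIM' | 'YID'.
export type LanguageKey = keyof typeof languages;

// Same role as the workaround's Record<LanguageKey, string>, with literal values.
export type LanguageMap = typeof languages;

// Adding an entry to `languages` widens LanguageKey automatically.
export const languageKeys = Object.keys(languages) as LanguageKey[];
```

Compiling a file like this with `tsc --declaration` should emit a `.d.ts` in which `languages` carries its literal keys and values (e.g. `readonly ENG: "eng"`), so the published typings cannot drift from the runtime object. For a JavaScript source such as `index.js`, `tsc` can also emit declarations with `allowJs` and `declaration` enabled.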

Thank you.

Balearica added this to the v6.0 milestone Feb 13, 2024
Balearica (Collaborator) commented

I agree this is a good suggestion; it would reduce errors like the one you linked. However, I believe this is a breaking change, so the soonest it could be implemented is Tesseract.js v6.0.

Making this change would break code for (1) TypeScript users specifying a custom language and (2) TypeScript users specifying multiple languages by concatenating them with `+` (e.g. `eng+chi_sim`). I do not believe this prevents us from ever making the change: users with multiple languages can switch to specifying them as arrays (e.g. `['eng', 'chi_sim']`), and users with custom languages (if any exist) can add a `// @ts-ignore` comment. However, it does mean the change would need to wait for the next major release.
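A sketch of what the stricter typing and the migration path might look like, assuming a union of literal language codes. `LanguageCode`, `LangsOption` and `splitConcatenated` are hypothetical names for illustration; they are not the actual tesseract.js signatures.

```typescript
// Abbreviated, hypothetical union; a real stricter typing would derive it
// from the full `languages` constant.
type LanguageCode = 'eng' | 'chi_sim' | 'fra';

// Hypothetical stricter option type: a single code or an array of codes.
// A '+'-joined string such as 'eng+chi_sim' is no longer assignable.
type LangsOption = LanguageCode | LanguageCode[];

// Hypothetical migration helper: turns the old '+'-joined form into an array.
// It only splits the string; it does not check that each code is valid.
function splitConcatenated(langs: string): string[] {
  return langs.split('+').filter((code) => code.length > 0);
}

const single: LangsOption = 'eng';
const multiple: LangsOption = ['eng', 'chi_sim']; // array form type-checks
// const legacy: LangsOption = 'eng+chi_sim';     // would be a compile error

// A custom traineddata name is outside the union; @ts-ignore suppresses
// the resulting error on the next line.
// @ts-ignore
const custom: LangsOption = 'my_custom_lang';

console.log(single, multiple, custom, splitConcatenated('eng+chi_sim'));
```

Restricting the parameter to literal codes is what makes both the `+` form and custom names fail type-checking, which is why the change is breaking.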

I will update the documentation to remove anything referencing the concatenation method for specifying multiple languages.
