
Q: How to spell check languages that do not have word breaks? #2146

Open
lysari opened this issue Apr 27, 2023 · 3 comments

Comments

lysari commented Apr 27, 2023

I was working on a Khmer dictionary. Everything works fine when I check word by word, but when I test it on a paragraph, everything goes wrong. How do I fix that?
Words: ព័ត៌មាន, អំពី, សមាជិក, បក្ស (working)
Paragraph: ព័ត៌មានអំពីសមាជិកបក្ស (not working)

lysari changed the title [Problem] I have problems with unicode character [Issue] I have problems with unicode character Apr 27, 2023
Jason3S changed the title [Issue] I have problems with unicode character Q: How to spell check languages that do not have word breaks? Apr 28, 2023
Jason3S (Collaborator) commented Apr 28, 2023

@lysari,

That is exactly the challenge with languages like Khmer and Thai. See:

It is possible to tell the spell checker to "compound" words, but that only works with short sentences, and it marks the entire sentence as wrong instead of just the misspelled word.
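The all-or-nothing behavior of compounding can be illustrated with a minimal sketch. This is not the spell checker's actual implementation: it assumes a run of unspaced text is accepted only if it can be split entirely into dictionary words (the classic word-break dynamic program), and the names `segment` and `check_run` and the four-word dictionary are hypothetical, taken from the example in the question.

```python
from __future__ import annotations


def segment(text: str, dictionary: frozenset[str]) -> list[str] | None:
    """Split `text` into dictionary words, or return None if impossible.

    Word-break dynamic programming: best[i] holds one segmentation of
    text[:i], built from a shorter segmented prefix plus a known word.
    """
    if not dictionary:
        return [] if text == "" else None
    max_len = max(len(w) for w in dictionary)
    best: list[list[str] | None] = [None] * (len(text) + 1)
    best[0] = []  # the empty prefix is trivially segmented
    for end in range(1, len(text) + 1):
        # Only look back as far as the longest dictionary word.
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            if prefix is not None and text[start:end] in dictionary:
                best[end] = prefix + [text[start:end]]
                break
    return best[len(text)]


def check_run(text: str, dictionary: frozenset[str]) -> list[tuple[int, int]]:
    """Return flagged (start, end) spans for a run with no word breaks.

    Hypothetical compound-style check: if the run cannot be segmented,
    the whole run is flagged, not just the misspelled part.
    """
    return [] if segment(text, dictionary) is not None else [(0, len(text))]


if __name__ == "__main__":
    words = ["ព័ត៌មាន", "អំពី", "សមាជិក", "បក្ស"]
    dictionary = frozenset(words)
    paragraph = "".join(words)  # the unspaced paragraph from the question
    print(segment(paragraph, dictionary))   # the four words, in order
    typo = paragraph[:-1]  # drop the last character to simulate a misspelling
    print(check_run(typo, dictionary))      # one span covering the whole run
```

Reporting only the misspelled word would require a segmenter that can skip over unknown spans and resynchronize on known words, which is the harder feature this issue is asking about.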

If this is a needed feature, consider funding its development.

lysari (Author) commented May 4, 2023

@Jason3S Thank you for the reply. If the feature is something others in the development community might be interested in, consider contributing to an existing open-source project, or working together with the community to develop the feature.

lysari closed this as completed May 4, 2023
lysari reopened this May 4, 2023
heipiao233 commented:

Also, Chinese, Japanese and Korean have no word breaks.
