
Question: cannot search anything by i18n zh.yaml #465

Open
hitzhangjie opened this issue Jul 19, 2022 · 2 comments

Comments

@hitzhangjie

Could you help solve this search problem?
I wrote an ebook here: https://hitzhangjie.pro/go-internals/, which uses the hugo-book theme. I don't know why search isn't working.
I used Chrome to debug the problem, and I can see that the documents are indexed.

Maybe the problem is related to the i18n zh bookSearchConfig. I changed the tokenize function:
from str.replace(/[\x00-\x7F]/g, '').split('');
to str.split(/\W+/).concat(str.replace(/[\x00-\x7F]/g, '').split('')).filter(e => !!e);

And it worked.
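To make the difference concrete, here is a small standalone sketch of the two tokenizers (plain Node.js, nothing theme-specific) run against a sample mixed Chinese/English string I made up for illustration:

```javascript
// The original zh tokenizer: strips every ASCII character (\x00-\x7F),
// then splits the remaining CJK text into single characters. Any
// pure-ASCII term like "AST" is deleted before tokenization, so it
// can never be indexed or matched.
const cjkOnly = (str) =>
  str.replace(/[\x00-\x7F]/g, '').split('');

// The combined tokenizer: splits on non-word characters to keep ASCII
// word tokens, concatenates the per-character CJK tokens, and filters
// out empty strings left over by the split.
const combined = (str) =>
  str.split(/\W+/)
    .concat(str.replace(/[\x00-\x7F]/g, '').split(''))
    .filter((e) => !!e);

const text = '码农 writes AST parsers';
console.log(cjkOnly(text));   // [ '码', '农' ]  — "AST" is lost
console.log(combined(text));  // [ 'writes', 'AST', 'parsers', '码', '农' ]
```

This matches the symptom below: with the CJK-only tokenizer, '码农' is findable but 'AST' is not, because the ASCII tokens never reach the index.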

@alex-shpak
Owner

Hi!
Nice to see the theme being used :) that's a lot of content.
It is possible that the search config tokenization needs an update; I don't really know how to tokenize Chinese properly and relied on Google for help.

However, when I naively put some Chinese 'Lorem ipsum' text on a page and search for fragments of it, it works for me in the zh locale.
Can you send an example of the content you have and what you are trying to search for?

@hitzhangjie
Author

hitzhangjie commented Jul 20, 2022

For example, I want to search for both Chinese and English words, like '码农' or 'AST', which appear in the markdown files.

Let me show an example here to reproduce the problem.

Case 1: the tokenize function is str.replace(/[\x00-\x7F]/g, '').split('');

Searching for '码农', the result looks OK; it returns a document.

Then searching for 'AST', the result is empty, though it should return 2 documents.

Case 2: the tokenize function is str.split(/\W+/).concat(str.replace(/[\x00-\x7F]/g, '').split('')).filter(e => !!e);

Searching for both words again, it works:

searching for '码农' returns results;

searching for 'ast' returns results.


PS: I don't actually know the internals of the tokenize function; I ran into this problem before and found the working tokenizer settings via Google. Hope this helps the 'hugo-book' theme.
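For anyone else hitting this: assuming the theme reads the tokenizer from a bookSearchConfig entry in i18n/zh.yaml (as the issue title suggests), the fix would go roughly as below. I'm writing the key layout from memory, so treat this as a sketch and check it against your theme version's actual i18n file:

```yaml
# i18n/zh.yaml — sketch only; the exact key layout may differ by theme version
bookSearchConfig:
  other: |
    {
      tokenize: function (str) {
        // keep ASCII word tokens AND per-character CJK tokens
        return str.split(/\W+/)
          .concat(str.replace(/[\x00-\x7F]/g, '').split(''))
          .filter(function (e) { return !!e; });
      }
    }
```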

Here is the ebook address in case you want to test it: https://hitzhangjie.pro/go-internals/
