mkdocs search plugin supports zh_CN #2509

Closed
dpy013 opened this issue Jul 18, 2021 · 5 comments · Fixed by #2609

Comments

@dpy013
Contributor

dpy013 commented Jul 18, 2021

Follow-up to #2497.

The following error currently occurs when using mkdocs to generate content:

(-python) E:\Source\python>mkdocs serve
INFO     -  Building documentation...
Traceback (most recent call last):
  File "d:\program files\python3\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\program files\python3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "E:\Source\python\Scripts\mkdocs.exe\__main__.py", line 7, in <module>
  File "e:\source\python\lib\site-packages\click\core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "e:\source\python\lib\site-packages\click\core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "e:\source\python\lib\site-packages\click\core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "e:\source\python\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "e:\source\python\lib\site-packages\click\core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "e:\source\python\lib\site-packages\mkdocs\__main__.py", line 173, in serve_command
    serve.serve(dev_addr=dev_addr, livereload=livereload, **kwargs)
  File "e:\source\python\lib\site-packages\mkdocs\commands\serve.py", line 54, in serve
    config = builder()
  File "e:\source\python\lib\site-packages\mkdocs\commands\serve.py", line 49, in builder
    build(config, live_server=live_server, dirty=dirty)
  File "e:\source\python\lib\site-packages\mkdocs\commands\build.py", line 249, in build
    config = config['plugins'].run_event('config', config)
  File "e:\source\python\lib\site-packages\mkdocs\plugins.py", line 94, in run_event
    result = method(item, **kwargs)
  File "e:\source\python\lib\site-packages\mkdocs\contrib\search\__init__.py", line 56, in on_config
    self.config['lang'] = validate(config['theme']['locale'].language)
  File "e:\source\python\lib\site-packages\mkdocs\contrib\search\__init__.py", line 27, in run_validation
    raise config_options.ValidationError(
mkdocs.config.base.ValidationError: "zh" is not a supported language code.
@inouthack

@xingkong0113 Could this be related to #2472?

@oprypin
Contributor

oprypin commented Jul 18, 2021

It could be related but things shouldn't completely break just because the search plugin doesn't have specific support for that language.

cc @ultrabug
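
A minimal sketch of the "don't break completely" behavior described above: fall back to a default language and log a warning instead of raising. This is an illustration only, not the plugin's actual code or the eventual fix; `resolve_search_lang`, `SUPPORTED_LANGS`, and the language subset are hypothetical names and values.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical subset of language codes, for illustration only.
SUPPORTED_LANGS = {"en", "ja", "fr", "de"}


def resolve_search_lang(locale_language, supported=SUPPORTED_LANGS, default="en"):
    """Return the locale's language if search supports it, else a default.

    Hypothetical helper: warns and falls back rather than raising a
    validation error for an unsupported code such as "zh".
    """
    if locale_language in supported:
        return locale_language
    log.warning(
        'Search does not support language "%s"; falling back to "%s".',
        locale_language,
        default,
    )
    return default
```

With this shape, a theme locale of "zh" would yield "en" for the search index while the rest of the build continues.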

@oprypin
Contributor

oprypin commented Jul 18, 2021

So as not to delay the upcoming release, I think we will have to decide not to support the Chinese language.

@squidfunk
Sponsor Contributor

Adding Chinese language support to the search plugin is currently not possible because of a dependency on nodejieba. nodejieba itself depends on path and node-pre-gyp, and potentially on other libraries that are not available in a browser environment, and, even worse, seems to include native code. Until those dependencies are removed from lunr-languages and its upstream dependencies, and replaced with isomorphic JavaScript, adding Chinese search support is blocked.

I'm happy to be proven wrong.

@Fusyong

Fusyong commented Sep 18, 2021

I changed the default search component and implemented a simple Chinese search by using jieba to split the content into words.

Enable the search plugin in the config file, set the languages to English and Japanese, and set the separator:

plugins:
    - search:
        lang:
            - en
            - ja
        separator: '[\s\-\.]+'
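
To see why the separator alone is not enough for Chinese, here is a small sketch that applies the same pattern in Python. The search itself applies the separator in JavaScript, so this is only an illustration, under the assumption that the pattern behaves the same way for these inputs; `split_terms` is a hypothetical helper name.

```python
import re

# Same pattern as the `separator` option in the config above.
SEPARATOR = re.compile(r'[\s\-\.]+')


def split_terms(text):
    """Split text on the separator, dropping empty strings."""
    return [term for term in SEPARATOR.split(text) if term]


# Whitespace, hyphens, and dots break English text into terms.
english_terms = split_terms("mkdocs-search plugin v1.2")

# Chinese text has no such separators, so it stays one long term
# unless it is segmented into words before indexing.
chinese_terms = split_terms("中文搜索插件")
```

The Chinese string comes back as a single term, which is why the content has to be split into space-separated words beforehand.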

Edit the default search component, mkdocs/contrib/search/search_index.py, adding the commented lines below to split the text and title into words.

import jieba # Chinese word separation module

class SearchIndex:

    # The above remains unchanged

    def _add_entry(self, title, text, loc):
        """
        A simple wrapper to add an entry and ensure the contents
        is UTF8 encoded.
        """
        text = text.replace('\u3000', ' ') # Replace the Chinese full-width space
        text = text.replace('\u00a0', ' ') # Replace the non-breaking space
        text = re.sub(r'[ \t\n\r\f\v]+', ' ', text.strip())

        # Split text into words
        text_seg_list = jieba.cut_for_search(text)  # Search engine mode, with higher recall rate
        text = " ".join(text_seg_list) # join words with space

        # Split title into words
        title_seg_list = jieba.cut(title, cut_all=False) # Precise mode, more readable
        title = " ".join(title_seg_list) # join words with space

        self._entries.append({
            'title': title,
            'text': str(text.encode('utf-8'), encoding='utf-8'),
            'location': loc
        })

    # The following remains unchanged
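
For trying the idea outside of mkdocs, the normalize-then-segment steps can be sketched as a standalone function. This is a rough illustration, not the patch itself: `normalize_whitespace`, `naive_cut`, and `segment` are hypothetical names, and `naive_cut` is a deliberately crude stand-in that splits every CJK character into its own token so the example runs without jieba installed.

```python
import re


def normalize_whitespace(text):
    """Mirror the whitespace clean-up from the patch above."""
    text = text.replace('\u3000', ' ')  # full-width (ideographic) space
    text = text.replace('\u00a0', ' ')  # non-breaking space
    return re.sub(r'[ \t\n\r\f\v]+', ' ', text.strip())


def naive_cut(text):
    """Hypothetical stand-in for a real segmenter such as jieba.

    Emits each CJK character as its own token and keeps runs of other
    non-space characters intact. Real word segmentation is much smarter.
    """
    return re.findall(r'[\u4e00-\u9fff]|[^\u4e00-\u9fff\s]+', text)


def segment(text, cut=naive_cut):
    """Normalize whitespace, segment with `cut`, and join tokens with spaces."""
    return " ".join(cut(normalize_whitespace(text)))
```

If jieba is installed, passing `cut=jieba.cut_for_search` should give word-level tokens instead of single characters, which is closer to what the patch above does for the page text.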

The actual result can be seen on my blog, as follows:

image

You can also copy the search component into your own plugin and modify it there; see the fastsearch plugin for reference.

If you want to change the behavior of the search engine itself, for example to split the user's query into words or to exclude stop words, you can refer to lunr's Japanese component.
