Lindera IPADIC NEologd Builder

IPADIC NEologd dictionary builder for Lindera. This project is a fork of fulmicoton's kuromoji-rs.

Install

% cargo install lindera-ipadic-neologd-builder

Build

The following products are required to build:

Rust >= 1.46.0
mecab >= 0.996 (for building a dictionary)

% cargo build --release

Dictionary version

This repository has only been tested with mecab-ipadic-NEologd data.

NOTE: This builder skips two words, カブシキガイシャ and タカラヅカカゲキダンキセイ, to avoid a dictionary build failure. These words are listed in SKIP_WORDS in src/lib.rs.

Building a dictionary

Build a dictionary with the lindera-ipadic-neologd command:

% curl -L https://github.com/neologd/mecab-ipadic-neologd/archive/master.zip > ./mecab-ipadic-neologd-master.zip
% unzip -o mecab-ipadic-neologd-master.zip
% ./mecab-ipadic-neologd-master/bin/install-mecab-ipadic-neologd --create_user_dic -p $(pwd)/mecab-ipadic-neologd-master/tmp -y
% IPADIC_VERSION=$(find ./mecab-ipadic-neologd-master/build/mecab-ipadic-*-neologd-* -type d | awk -F "-" '{print $6"-"$7}')
% NEOLOGD_VERSION=$(find ./mecab-ipadic-neologd-master/build/mecab-ipadic-*-neologd-* -type d | awk -F "-" '{print $NF}')
% lindera-ipadic-neologd ./mecab-ipadic-neologd-master/build/mecab-ipadic-${IPADIC_VERSION}-neologd-${NEOLOGD_VERSION} lindera-ipadic-${IPADIC_VERSION}-neologd-${NEOLOGD_VERSION}
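
To make the two awk expressions above easier to follow, here is a minimal sketch that applies them to a sample directory name. The sample path is an assumption inferred from the dictionary name used in the tokenizing example in this README, and ipadic_version and neologd_version are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Sketch: how splitting the build path on "-" yields the two version strings.
# The sample path below is an assumption, not output captured from a real build.

# Hypothetical helper: print fields 6 and 7 of the "-"-separated path (IPADIC version).
ipadic_version() {
    printf '%s\n' "$1" | awk -F "-" '{print $6"-"$7}'
}

# Hypothetical helper: print the last "-"-separated field (NEologd release date).
neologd_version() {
    printf '%s\n' "$1" | awk -F "-" '{print $NF}'
}

sample="./mecab-ipadic-neologd-master/build/mecab-ipadic-2.7.0-20070801-neologd-20200130"
echo "IPADIC_VERSION=$(ipadic_version "$sample")"
echo "NEOLOGD_VERSION=$(neologd_version "$sample")"
```

Because the field numbers count every hyphen in the path, the IPADIC_VERSION expression depends on find being given the relative path shown above; a path with a different number of hyphens would shift the fields.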

Dictionary format

Refer to the manual for details on the IPADIC dictionary format and part-of-speech tags.

Index	Name (Japanese)	Name (English)
0	品詞	part-of-speech
1	品詞細分類1	sub POS 1
2	品詞細分類2	sub POS 2
3	品詞細分類3	sub POS 3
4	活用型	conjugation type
5	活用形	conjugation form
6	原形	base form
7	読み	reading
8	発音	pronunciation
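
As an illustration of the indices in the table, the following sketch picks individual fields out of a feature string. The feature string is copied from the tokenizing example in this README; feature_at is a hypothetical helper, and the sketch assumes that no field itself contains a comma.

```shell
#!/bin/sh
# Sketch: look up a feature by its 0-based index from the table above.

# Hypothetical helper: print the field at a 0-based index of a comma-separated string.
# awk fields are 1-based, so the index is shifted by one.
feature_at() {
    printf '%s\n' "$1" | awk -F "," -v i="$2" '{print $(i + 1)}'
}

features="名詞,サ変接続,*,*,*,*,限定,ゲンテイ,ゲンテイ"
echo "part-of-speech: $(feature_at "$features" 0)"
echo "base form:      $(feature_at "$features" 6)"
echo "reading:        $(feature_at "$features" 7)"
```

Fields that do not apply to a word hold *, as at indices 2 to 5 in the sample.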

Tokenizing text using produced dictionary

You can tokenize text using the produced dictionary with the lindera command:

% echo "羽田空港限定トートバッグ" | lindera -d ./lindera-ipadic-2.7.0-20070801-neologd-20200130

羽田空港        名詞,固有名詞,一般,*,*,*,羽田空港,ハネダクウコウ,ハネダクーコー
限定    名詞,サ変接続,*,*,*,*,限定,ゲンテイ,ゲンテイ
トートバッグ    名詞,固有名詞,一般,*,*,*,トートバッグ,トートバッグ,トートバッグ
EOS
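
The output above can be post-processed with standard tools. The sketch below, for example, reduces each token line to its surface form and reading. It runs on sample lines copied from the output above rather than on a live lindera call, surface_and_reading is a hypothetical helper name, and it assumes that surface forms contain no whitespace.

```shell
#!/bin/sh
# Sketch: extract "surface<TAB>reading" from tokenizer output read on stdin.

# Hypothetical helper: lines with fewer than two whitespace-separated columns
# (such as the EOS marker) are skipped; the reading is feature index 7,
# which is element 8 in awk's 1-based array.
surface_and_reading() {
    awk 'NF >= 2 { split($2, f, ","); print $1 "\t" f[8] }'
}

# Sample lines copied from the example output above.
printf '%s\n' \
    "羽田空港 名詞,固有名詞,一般,*,*,*,羽田空港,ハネダクウコウ,ハネダクーコー" \
    "限定 名詞,サ変接続,*,*,*,*,限定,ゲンテイ,ゲンテイ" \
    "EOS" | surface_and_reading
```

In practice the printf call would be replaced by the echo and lindera pipeline shown above.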

For more details about the lindera command, refer to the following page:

Lindera CLI

API reference

The API reference is available on the following page:

lindera-ipadic-neologd-builder
