This repository has been archived by the owner on Mar 15, 2022. It is now read-only.

A Japanese morphological dictionary builder for IPADIC NEologd.



Lindera IPADIC NEologd Builder

License: MIT

IPADIC NEologd dictionary builder for Lindera. This project is forked from fulmicoton's kuromoji-rs.


Install

% cargo install lindera-ipadic-neologd-builder


Build

The following products are required to build:

  • Rust >= 1.46.0
  • mecab >= 0.996 (for building a dictionary)

% cargo build --release

Dictionary version

This builder has only been tested with mecab-ipadic-NEologd data.

NOTE: This builder skips two words, カブシキガイシャ and タカラヅカカゲキダンキセイ, to avoid a dictionary build failure. These words are defined in SKIP_WORDS in src/ .
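As a rough illustration of what skipping means, the sketch below drops source CSV rows whose surface form matches one of the two words. The constant reuses the name SKIP_WORDS from the note, but the helper `keep_row` and the assumption that the surface form is the first CSV column are hypothetical; the builder's actual filtering code may differ.

```rust
/// Words excluded from the build (mirrors the SKIP_WORDS list described above).
const SKIP_WORDS: [&str; 2] = ["カブシキガイシャ", "タカラヅカカゲキダンキセイ"];

/// Hypothetical filter: returns true if a CSV row should be kept.
/// Assumes the surface form is the first comma-separated column.
fn keep_row(row: &str) -> bool {
    let surface = row.split(',').next().unwrap_or("");
    !SKIP_WORDS.contains(&surface)
}

fn main() {
    // Placeholder rows: "0,0,0" are dummy context IDs and cost, and the
    // feature columns are omitted; only the first column matters here.
    let rows = ["羽田空港,0,0,0", "カブシキガイシャ,0,0,0"];
    for row in rows.iter().copied().filter(|r| keep_row(r)) {
        println!("{}", row);
    }
}
```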

Building a dictionary

Build a dictionary with the lindera-ipadic-neologd command:

% curl -L > ./
% unzip -o
% ./mecab-ipadic-neologd-master/bin/install-mecab-ipadic-neologd --create_user_dic -p $(pwd)/mecab-ipadic-neologd-master/tmp -y
% IPADIC_VERSION=$(find ./mecab-ipadic-neologd-master/build/mecab-ipadic-*-neologd-* -type d | awk -F "-" '{print $6"-"$7}')
% NEOLOGD_VERSION=$(find ./mecab-ipadic-neologd-master/build/mecab-ipadic-*-neologd-* -type d | awk -F "-" '{print $NF}')
% lindera-ipadic-neologd ./mecab-ipadic-neologd-master/build/mecab-ipadic-${IPADIC_VERSION}-neologd-${NEOLOGD_VERSION} lindera-ipadic-${IPADIC_VERSION}-neologd-${NEOLOGD_VERSION}
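The two awk one-liners above split the build directory path on "-" and keep fields 6 and 7 (the IPADIC version) and the last field (the NEologd version). A minimal Rust sketch of the same parsing follows; the helper name `versions` is hypothetical, and the sample directory name is the one implied by the output dictionary name used in the tokenizing example of this README.

```rust
/// Splits the build directory path on '-' like `awk -F "-"`.
/// awk fields are 1-based, so $6 and $7 are indices 5 and 6 here.
/// Returns (IPADIC_VERSION, NEOLOGD_VERSION), or None if there are too few fields.
fn versions(path: &str) -> Option<(String, String)> {
    let fields: Vec<&str> = path.split('-').collect();
    if fields.len() < 7 {
        return None;
    }
    let ipadic = format!("{}-{}", fields[5], fields[6]); // $6"-"$7
    let neologd = fields.last()?.to_string(); // $NF
    Some((ipadic, neologd))
}

fn main() {
    let dir = "./mecab-ipadic-neologd-master/build/mecab-ipadic-2.7.0-20070801-neologd-20200130";
    if let Some((ipadic, neologd)) = versions(dir) {
        println!("IPADIC_VERSION={}", ipadic);
        println!("NEOLOGD_VERSION={}", neologd);
        println!("output: lindera-ipadic-{}-neologd-{}", ipadic, neologd);
    }
}
```

Because the field positions are fixed, this only works while the path prefix contributes exactly the hyphens of "./mecab-ipadic-neologd-master/build/mecab-ipadic-"; a different prefix would shift the fields.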

Dictionary format

Refer to the manual for details on the IPADIC dictionary format and part-of-speech tags.

Index | Name (Japanese) | Name (English)
----- | --------------- | ----------------
0     | 品詞            | part-of-speech
1     | 品詞細分類1     | sub POS 1
2     | 品詞細分類2     | sub POS 2
3     | 品詞細分類3     | sub POS 3
4     | 活用型          | conjugation type
5     | 活用形          | conjugation form
6     | 原形            | base form
7     | 読み            | reading
8     | 発音            | pronunciation
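A sketch of reading one feature string into named fields, following the column order of the IPADIC format. The struct and function names are hypothetical and not part of Lindera's API; rows that do not have exactly nine columns are rejected rather than guessed at.

```rust
/// One IPADIC feature record (hypothetical type, for illustration only).
#[allow(dead_code)] // not every field is read in this demo
#[derive(Debug, PartialEq)]
struct IpadicFeatures<'a> {
    pos: &'a str,              // 0 品詞
    pos_sub1: &'a str,         // 1 品詞細分類1
    pos_sub2: &'a str,         // 2 品詞細分類2
    pos_sub3: &'a str,         // 3 品詞細分類3
    conjugation_type: &'a str, // 4 活用型
    conjugation_form: &'a str, // 5 活用形
    base_form: &'a str,        // 6 原形
    reading: &'a str,          // 7 読み
    pronunciation: &'a str,    // 8 発音
}

/// Parses a comma-separated feature string; None unless there are exactly 9 columns.
fn parse_features(s: &str) -> Option<IpadicFeatures<'_>> {
    let f: Vec<&str> = s.split(',').collect();
    if f.len() != 9 {
        return None;
    }
    Some(IpadicFeatures {
        pos: f[0],
        pos_sub1: f[1],
        pos_sub2: f[2],
        pos_sub3: f[3],
        conjugation_type: f[4],
        conjugation_form: f[5],
        base_form: f[6],
        reading: f[7],
        pronunciation: f[8],
    })
}

fn main() {
    let line = "名詞,固有名詞,一般,*,*,*,羽田空港,ハネダクウコウ,ハネダクーコー";
    match parse_features(line) {
        Some(f) => println!("{} ({}) is read as {}", f.base_form, f.pos, f.reading),
        None => println!("unexpected column count"),
    }
}
```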

Tokenizing text using produced dictionary

You can tokenize text with the produced dictionary using the lindera command:

% echo "羽田空港限定トートバッグ" | lindera -d ./lindera-ipadic-2.7.0-20070801-neologd-20200130
羽田空港        名詞,固有名詞,一般,*,*,*,羽田空港,ハネダクウコウ,ハネダクーコー
限定    名詞,サ変接続,*,*,*,*,限定,ゲンテイ,ゲンテイ
トートバッグ    名詞,固有名詞,一般,*,*,*,トートバッグ,トートバッグ,トートバッグ
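To post-process output like this, each line can be split at its first whitespace into the surface form and the comma-separated features. The sketch below does that and joins the reading column (index 7) of every token. It assumes the line layout shown above; lines without a feature part (such as an EOS marker, if your lindera version prints one) are skipped. The helper names are hypothetical.

```rust
/// Splits one output line into (surface, features) at the first whitespace.
/// Returns None for lines with no feature part.
fn split_line(line: &str) -> Option<(&str, Vec<&str>)> {
    let (surface, rest) = line.split_once(|c: char| c.is_whitespace())?;
    let features: Vec<&str> = rest.trim_start().split(',').collect();
    Some((surface, features))
}

/// Concatenates the reading (feature index 7) of every token that has one.
fn reading_of(output: &str) -> String {
    output
        .lines()
        .filter_map(split_line)
        .filter_map(|(_, f)| f.get(7).copied())
        .collect()
}

fn main() {
    let output = "羽田空港\t名詞,固有名詞,一般,*,*,*,羽田空港,ハネダクウコウ,ハネダクーコー\n\
                  限定\t名詞,サ変接続,*,*,*,*,限定,ゲンテイ,ゲンテイ\n\
                  トートバッグ\t名詞,固有名詞,一般,*,*,*,トートバッグ,トートバッグ,トートバッグ";
    println!("{}", reading_of(output));
}
```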

For more details about the lindera command, please refer to the following URL:

API reference

The API reference is available at the following URL: