shinra-attribute-extraction

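Data

This project uses the training and target data for the SHINRA2020-JP task, taken from the preprocessed data distributed for SHINRA2020 (already tokenized with MeCab (IPA dictionary) and BPE, compatible with Tohoku University BERT).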

Model

Tohoku University BERT is used as the pretrained model. On top of BERT, an independent classification layer is added for each attribute.
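
The architecture described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the class name `PerAttributeTagger`, the injected `encoder`, the three-label tag set, and the attribute names in the usage lines are all assumptions. In practice `encoder` would wrap Tohoku University BERT; here it is any module that maps token IDs to hidden states, so the example runs without downloading weights.

```python
import torch
from torch import nn


class PerAttributeTagger(nn.Module):
    """Shared encoder with one independent token-classification head per attribute."""

    def __init__(self, encoder, hidden_size, attributes, num_labels=3):
        super().__init__()
        # Assumption: encoder(input_ids) returns a tensor (batch, seq_len, hidden_size).
        self.encoder = encoder
        # One linear layer per attribute; the heads share no parameters.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden_size, num_labels) for name in attributes}
        )

    def forward(self, input_ids):
        hidden = self.encoder(input_ids)
        # Each head scores every token: (batch, seq_len, num_labels) per attribute.
        return {name: head(hidden) for name, head in self.heads.items()}


# Stand-in encoder for a quick shape check (an embedding table, not BERT).
toy_encoder = nn.Embedding(100, 16)
model = PerAttributeTagger(toy_encoder, hidden_size=16, attributes=["location", "founder"])
logits = model(torch.randint(0, 100, (2, 8)))
```

Keeping the heads in a `ModuleDict` registers their parameters with the optimizer while letting each attribute be trained and decoded on its own.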

Environment

pytorch
transformers>=3.0.1
fugashi
seqeval
mlflow
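
As a quick sanity check before training, something like the following can report which of the packages above are not installed. It is only a sketch: the helper name `missing_modules` is made up for this example, and it does not verify the `transformers>=3.0.1` version constraint.

```python
from importlib import util

# Import names for the packages listed above; PyTorch is imported as "torch".
REQUIRED_MODULES = ["torch", "transformers", "fugashi", "seqeval", "mlflow"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules from the list that cannot be found in this environment."""
    return [name for name in modules if util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```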

Docker

A Docker environment is available here.

Training

sh train.sh

Note: model_path is a directory. The model with the highest accuracy on the validation set and the model from the final epoch are both saved there.
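
The saving behavior in the note could look roughly like the sketch below. The function `save_checkpoints`, the file names `best.model` and `last.model`, and the `save_fn` callback are hypothetical, chosen only for illustration; the real script may name and write its files differently. The list of validation scores stands in for an actual training loop.

```python
import tempfile
from pathlib import Path


def save_checkpoints(model_dir, val_scores, save_fn):
    """Save a 'best' file whenever validation improves and a 'last' file at the end.

    val_scores: one validation score per epoch (stand-in for real training).
    save_fn: callable(path) that writes the current model, e.g. wrapping torch.save.
    """
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    best_score, best_epoch = float("-inf"), None
    for epoch, score in enumerate(val_scores, start=1):
        # ... one epoch of training and validation would run here ...
        if score > best_score:
            best_score, best_epoch = score, epoch
            save_fn(model_dir / "best.model")  # hypothetical file name
    if best_epoch is not None:  # at least one epoch ran
        save_fn(model_dir / "last.model")  # hypothetical file name
    return best_epoch, best_score


# Usage with a fake save function that only records which file was written.
written = []


def fake_save(path):
    path.write_text("weights")
    written.append(path.name)


with tempfile.TemporaryDirectory() as tmp:
    best = save_checkpoints(tmp, [0.5, 0.8, 0.7], fake_save)
```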

Example train.sh

python train.py \
    --input_path /path/to/Target_Category \
    --model_path /path/to/model_directory \
    --lr 1e-5 \
    --bsz 32 \
    --epoch 50 \
    --grad_acc 1 \
    --grad_clip 1.0
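
For orientation, the flags in this example could be declared with argparse as sketched below. The flag names come from the example above; the types, defaults, and help texts are guesses (in particular, reading `--bsz` as batch size, `--grad_acc` as gradient accumulation steps, and `--grad_clip` as a maximum gradient norm), so the real `train.py` may differ.

```python
import argparse


def build_parser():
    """Parser for the flags shown in the train.sh example; types and defaults are guesses."""
    parser = argparse.ArgumentParser(description="Train the attribute extraction model")
    parser.add_argument("--input_path", required=True, help="preprocessed data for one category")
    parser.add_argument("--model_path", required=True, help="directory for saved models")
    parser.add_argument("--lr", type=float, default=1e-5, help="learning rate")
    parser.add_argument("--bsz", type=int, default=32, help="batch size")
    parser.add_argument("--epoch", type=int, default=50, help="number of epochs")
    parser.add_argument("--grad_acc", type=int, default=1, help="gradient accumulation steps")
    parser.add_argument("--grad_clip", type=float, default=1.0, help="maximum gradient norm")
    return parser


args = build_parser().parse_args([
    "--input_path", "/path/to/Target_Category",
    "--model_path", "/path/to/model_directory",
    "--bsz", "8",
    "--grad_acc", "4",
])
# Under the interpretation above, this many examples contribute to each optimizer step.
effective_batch = args.bsz * args.grad_acc
```

If that reading is right, lowering `--bsz` and raising `--grad_acc` by the same factor keeps the effective batch size unchanged while reducing memory use per step.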

Prediction

sh predict.sh

Takes preprocessed data (one category) as input and writes the predictions in the SHINRA2020 output format.
Note: model_path is the path to a model file.

Example predict.sh

python predict.py \
    --input_path /path/to/Target_Category \
    --model_path /path/to/model_file \
    --output_path /path/to/output_file
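
Since seqeval is among the dependencies, the per-attribute predictions are presumably tag sequences. Under that assumption, one step between model output and the final file might be turning tags into token spans, as in this sketch. The B/I/O tag set, the function name `bio_to_spans`, and the attribute names are illustrative only, and the conversion to the SHINRA2020 output format itself is not shown.

```python
def bio_to_spans(tags):
    """Convert a B/I/O tag sequence into (start, end) token spans, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new span begins right after another
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:  # stray I without a preceding B: treat it as a start
                start = i
        else:  # "O" closes any open span
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:  # span running to the end of the sequence
        spans.append((start, len(tags)))
    return spans


# One tag sequence per attribute, decoded independently of the others.
predicted = {
    "location": ["O", "B", "I", "O", "B"],
    "founder": ["O", "O", "O", "O", "O"],
}
spans = {attr: bio_to_spans(tags) for attr, tags in predicted.items()}
```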
