NMTScore

A library of translation-based text similarity measures.

To learn more about how these measures work, have a look at Jannis' blog post. Also, read our paper, "NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures" (Findings of EMNLP).

Installation

Requires Python >= 3.7 and PyTorch
pip install nmtscore
Extra requirements for the Prism model: pip install nmtscore[prism]

Usage

NMTScorer

Instantiate a scorer and start scoring short sentence pairs.

from nmtscore import NMTScorer

scorer = NMTScorer()

scorer.score("This is a sentence.", "This is another sentence.")
# 0.45572562294591235

Different similarity measures

The library implements three different measures:

# Translation cross-likelihood (default)
scorer.score_cross_likelihood(a, b, tgt_lang="en", normalize=True, both_directions=True)

# Direct translation probability
scorer.score_direct(a, b, a_lang="en", b_lang="en", normalize=True, both_directions=True)

# Pivot translation probability
scorer.score_pivot(a, b, a_lang="en", b_lang="en", pivot_lang="en", normalize=True, both_directions=True)

The score method is a shortcut for cross-likelihood.
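Because the language arguments are explicit, the same calls can in principle compare sentences written in different languages. The following is a minimal sketch, assuming the loaded model supports German ("de") and that score_direct accepts different values for a_lang and b_lang. The function cross_lingual_similarity is a hypothetical helper for illustration, not part of the library.

```python
def cross_lingual_similarity(scorer, english: str, german: str) -> float:
    """Score an English sentence against a German one (hypothetical helper).

    `scorer` is expected to be an NMTScorer instance.
    """
    # Direct translation probability, with the language of each input made explicit.
    return scorer.score_direct(
        english, german,
        a_lang="en", b_lang="de",
        normalize=True, both_directions=True,
    )
```

It could be called as cross_lingual_similarity(NMTScorer(), "This is a test.", "Das ist ein Test.").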

Batch processing

The scoring methods also accept lists of strings:

scorer.score(
    ["This is a sentence.", "This is a sentence.", "This is another sentence."],
    ["This is another sentence.", "This sentence is completely unrelated.", "This is another sentence."],
)
# [0.45572545262642583, 0.13128832336168145, 0.99999993180868]

The sentences in the first list are compared element-wise to the sentences in the second list.
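Since the comparison is element-wise, scoring one sentence against many candidates means repeating that sentence so that both lists have the same length. A minimal sketch of this pattern follows; rank_candidates is a hypothetical helper, and it assumes only that scorer.score returns one float per pair, as in the example above.

```python
from typing import List, Tuple


def rank_candidates(scorer, query: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Score `query` against every candidate and sort by descending similarity."""
    if not candidates:
        return []
    # Repeat the query so that item i of the first list is paired with candidate i.
    scores = scorer.score([query] * len(candidates), candidates)
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
```

With a real scorer, rank_candidates(NMTScorer(), "This is a sentence.", ["This is another sentence.", "This sentence is completely unrelated."]) would return the candidates ordered from most to least similar.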

The default batch size is 8. An alternative batch size can be specified as follows (independently for translating and scoring):

scorer.score_direct(
    a, b, a_lang="en", b_lang="en",
    score_kwargs={"batch_size": 16}
)

scorer.score_cross_likelihood(
    a, b,
    translate_kwargs={"batch_size": 16},
    score_kwargs={"batch_size": 16}
)

Different NMT models

This library currently supports three families of NMT models:

small100 by Mohammadshahi et al. (2022)
m2m100_418M and m2m100_1.2B by Fan et al. (2021)
prism by Thompson and Post (2020)

By default, the leanest model (small100) is loaded. The main results in the paper are based on the Prism model, which has some extra dependencies (see "Installation" above).

scorer = NMTScorer("small100", device=None)  # default
scorer = NMTScorer("small100", device="cuda:0")  # Enable faster inference on GPU
scorer = NMTScorer("m2m100_418M", device="cuda:0")
scorer = NMTScorer("m2m100_1.2B", device="cuda:0")
scorer = NMTScorer("prism", device="cuda:0")
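
When the same script runs on machines with and without a GPU, the device argument can be chosen at runtime. A minimal sketch follows; pick_device is a hypothetical helper, and it assumes that device=None (the documented default) is an acceptable fallback when CUDA is unavailable.

```python
from typing import Optional


def pick_device() -> Optional[str]:
    """Return "cuda:0" if PyTorch reports a usable GPU, otherwise None."""
    try:
        import torch
    except ImportError:
        # PyTorch is a requirement of the library; this branch only keeps the helper importable.
        return None
    return "cuda:0" if torch.cuda.is_available() else None
```

The result can then be passed on, e.g. NMTScorer("small100", device=pick_device()).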

Which model should I choose?

The page experiments/results/summary.md compares the models regarding their accuracy and latency.

Generally, we recommend Prism because it tends to have the highest accuracy. Also, Prism's implementation currently translates up to 10x faster on GPU than the other models do, so we highly recommend using Prism for the measures that require translation (score_pivot() and score_cross_likelihood()).
small100 is 3.4x faster for score_direct() and has 94–98% of Prism's accuracy.

Enable caching of NMT output

It can make sense to cache the translations and scores if they are needed repeatedly, e.g. in reference-based evaluation.

scorer.score_direct(
    a, b, a_lang="en", b_lang="en",
    score_kwargs={"use_cache": True}  # default: False
)

scorer.score_cross_likelihood(
    a, b,
    translate_kwargs={"use_cache": True},  # default: False
    score_kwargs={"use_cache": True}  # default: False
)

Activating this option will create an SQLite database in the ~/.cache directory. The directory can be overridden via the NMTSCORE_CACHE environment variable.
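
As an illustration of the reference-based evaluation use case, the sketch below points the cache at a custom directory and averages cached cross-likelihood scores over a test set. Both configure_cache and mean_cached_score are hypothetical helpers. It is an assumption that the environment variable should be set before nmtscore is imported, and the helper creates the directory itself because it is not stated whether the library does so.

```python
import os
from pathlib import Path
from statistics import mean
from typing import List


def configure_cache(directory: str) -> Path:
    """Create `directory` if needed and point NMTSCORE_CACHE at it."""
    path = Path(directory).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    # Assumption: set this before importing nmtscore so the library picks it up.
    os.environ["NMTSCORE_CACHE"] = str(path)
    return path


def mean_cached_score(scorer, hypotheses: List[str], references: List[str]) -> float:
    """Average the element-wise scores, caching translations and scores."""
    scores = scorer.score_cross_likelihood(
        hypotheses, references,
        translate_kwargs={"use_cache": True},
        score_kwargs={"use_cache": True},
    )
    return mean(scores)
```

Re-running mean_cached_score on the same references with new system outputs should then reuse whatever the library has already cached for them.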

Print a version signature (à la SacreBLEU)

scorer.score(a, b, print_signature=True)
# NMTScore-cross|tgt-lang:en|model:alirezamsh/small100|normalized|both-directions|v0.3.0|hf4.26.1
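
To log the signature alongside results, it can be split into its parts. The sketch below is based only on the single example line above: it assumes that fields are separated by "|", that "key:value" fields carry options, and that the remaining fields are plain flags or version tags. parse_signature is a hypothetical helper and takes the signature as a string copied from the printed output.

```python
from typing import Dict, List, Tuple


def parse_signature(signature: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Split a signature into (measure name, key:value options, bare flags)."""
    name, *fields = signature.split("|")
    options: Dict[str, str] = {}
    flags: List[str] = []
    for field in fields:
        if ":" in field:
            # Split on the first colon only, so values may themselves contain colons.
            key, value = field.split(":", 1)
            options[key] = value
        else:
            flags.append(field)
    return name, options, flags
```

For the example signature, this yields the name "NMTScore-cross", the options tgt-lang and model, and the flags normalized, both-directions, v0.3.0 and hf4.26.1.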

Direct usage of NMT models

The NMT models also provide a direct interface for translating and scoring.

from nmtscore.models import load_translation_model

model = load_translation_model("small100")

model.translate("de", ["This is a test."])
# ["Das ist ein Test."]

model.score("de", ["This is a test."], ["Das ist ein Test."])
# [0.8286197781562805]
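
The two calls can be combined to translate a batch and obtain a score for each translation. A minimal sketch, using only the translate and score calls shown above; translate_and_score is a hypothetical helper. It passes no source language, so with the m2m100 models the warning mentioned in the changelog may be raised.

```python
from typing import List, Tuple


def translate_and_score(model, tgt_lang: str, sentences: List[str]) -> List[Tuple[str, float]]:
    """Translate `sentences` into `tgt_lang` and score each translation with the same model."""
    translations = model.translate(tgt_lang, sentences)
    # Score each translation against the sentence it was generated from.
    scores = model.score(tgt_lang, sentences, translations)
    return list(zip(translations, scores))
```

For example, translate_and_score(load_translation_model("small100"), "de", ["This is a test."]) returns a list with one (translation, score) pair.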

Experiments

See experiments/README.md

Citation

@inproceedings{vamvas-sennrich-2022-nmtscore,
    title = "{NMTS}core: A Multilingual Analysis of Translation-based Text Similarity Measures",
    author = "Vamvas, Jannis  and
      Sennrich, Rico",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.15",
    pages = "198--213"
}

License

Code: MIT License
Data: See data subdirectories

Changelog

v0.3.1
- Implement the distilled small100 model by Mohammadshahi et al. (2022) and use this model by default.
- Enable half-precision inference for m2m100 models and small100 by default; see experiments/results/summary.md for benchmark results.
v0.2.0
- Bugfix: Provide source language to m2m100 models (#2). The fix is backwards-compatible but a warning is now raised if m2m100 is used without specifying the input language.
