Lepor : A machine translation evaluation Metric. #3178
base: develop
Conversation
Let's think about new names for 'sentence_lepor' and 'corpus_lepor' which are more in keeping with NLTK practices.
I think nltk/nltk/translate/__init__.py (lines 22 to 31 in 582e6e3) is the relevant place. However, I would propose to use:

from nltk.translate.lepor import sentence_lepor as lepor, corpus_lepor

This matches the other scores more closely, and people can primarily use lepor; beyond that, you can use the full function names. cc: @alvations, who expressed interest in this PR.
Thank you @tomaarsen. I have made the suggested changes and pushed them. Ikram
I think running
Yes, as the changes in
Thank you again for the contribution!
Generally a good first draft for the PR. But there's some refactoring and function re-shuffling work as described in the code review.
Then after the revision, we'll have to figure out some tests, either in the docstring or as unit tests, but first try to incorporate the suggestions/revisions from this first code review.
nltk/translate/lepor.py
Outdated
"""
r_len = len(reference)
o_len = len(hypothesis)
The naming of o_len is a little too arbitrary. The naming of the variable should be a little more readable; try:

ref_len = len(reference)
hyp_len = len(hypothesis)
Yes, this way it is more readable. Done.
nltk/translate/lepor.py
Outdated
elif r_len < o_len:
    return math.exp(1 - (r_len / o_len))
else:
    return math.exp(1 - (o_len / r_len))
It might be good to follow how Eq (2) in the original LEPOR paper lists down the three options, or at least annotate with a comment what the else: branch represents (to align with the paper):

if ref_len == hyp_len:
    return 1
elif ref_len < hyp_len:
    return math.exp(1 - (ref_len / hyp_len))
else:  # i.e. ref_len > hyp_len
    return math.exp(1 - (hyp_len / ref_len))
Sure, it's better to have it as close to the paper as possible.
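Put together, a minimal self-contained sketch of the length penalty branch structure discussed above (mirroring the reviewer's snippet; the function name and tokenized-input assumption follow the PR):

```python
import math
from typing import List


def length_penalty(reference: List[str], hypothesis: List[str]) -> float:
    """Length penalty (LP), following Eq (2) of https://aclanthology.org/C12-2044."""
    ref_len, hyp_len = len(reference), len(hypothesis)
    if ref_len == hyp_len:
        return 1.0
    elif ref_len < hyp_len:
        return math.exp(1 - (ref_len / hyp_len))
    else:  # i.e. ref_len > hyp_len
        return math.exp(1 - (hyp_len / ref_len))
```

For equal-length inputs the penalty is exactly 1.0, and it moves away from 1.0 as the length mismatch grows.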
nltk/translate/lepor.py
Outdated
def length_penalty(reference: List[str], hypothesis: List[str]) -> float:
    """
    Function will calculate length penalty(LP) one of the components in LEPOR, which is defined to embrace
    the penalty for both longer and shorter hypothesis compared with the reference translations.
Small grammar suggestion:
This function calculates the length penalty (LP) for the LEPOR metric, which is defined to embrace the penalty for both longer and shorter hypotheses compared with the reference translations.
See Eq (2) in https://aclanthology.org/C12-2044
Thank you for this.
nltk/translate/lepor.py
Outdated
Function will calculate length penalty(LP) one of the components in LEPOR, which is defined to embrace
the penalty for both longer and shorter hypothesis compared with the reference translations.

:param reference: reference sentence
Style suggestion: Capitalize start of param description.
:param reference: Reference sentence
Sure, I will do that. Actually, I saw it in bleu_score.py, where this wasn't the case.
nltk/translate/lepor.py
Outdated
:param reference: reference sentence
:type reference: str
:param hypothesis: hypothesis sentence
:param hypothesis: Hypothesis sentence
nltk/translate/lepor.py
Outdated
def ngram_positional_penalty(
    ref_words: List[str], hypothesis_words: List[str]
This function should follow the parts that the original LEPOR paper describes, so that it's more readable and accessible to people learning the algorithm. From the paper: "To calculate the value, there are two steps: aligning and calculating"
def alignment(ref_tokens: List[str], hyp_tokens: List[str]):
    """
    This function computes the context-dependent n-gram word alignment that
    takes into account the surrounding context (neighbouring words) of a
    potential word to select the better matching pair between the hypothesis
    and the reference. The alignment is used to compute the n-gram positional
    difference penalty component of the LEPOR score. Generally, the function
    finds the matching tokens between the reference and hypothesis, then finds
    the indices of the longest matching n-grams by checking the left and right
    unigram window of each matching token.

    :param ref_tokens: A list of tokens in the reference sentence.
    :type ref_tokens: List[str]
    :param hyp_tokens: A list of tokens in the hypothesis sentence.
    :type hyp_tokens: List[str]
    """
    alignments = []
    # Store the reference and hypothesis token lengths.
    hyp_len = len(hyp_tokens)
    ref_len = len(ref_tokens)
    for hyp_index, hyp_token in enumerate(hyp_tokens):
        # If there is no match.
        if ref_tokens.count(hyp_token) == 0:
            alignments.append(-1)
        # If there is only one match.
        elif ref_tokens.count(hyp_token) == 1:
            alignments.append(ref_tokens.index(hyp_token))
        # Otherwise, weigh the multiple possibilities.
        else:
            # Keep the indices where the hypothesis token matches the reference.
            ref_indexes = [
                i for i, ref_token in enumerate(ref_tokens) if ref_token == hyp_token
            ]
            # Iterate through the matched tokens, and check whether
            # the token one to the left/right also matches.
            is_matched = []
            for ref_index in ref_indexes:
                # The token one to the left also matches.
                if (
                    0 <= ref_index - 1 < ref_len
                    and 0 <= hyp_index - 1 < hyp_len
                    and ref_tokens[ref_index - 1] == hyp_tokens[hyp_index - 1]
                ):
                    is_matched.append(True)
                # The token one to the right also matches.
                elif (
                    0 <= ref_index + 1 < ref_len
                    and 0 <= hyp_index + 1 < hyp_len
                    and ref_tokens[ref_index + 1] == hyp_tokens[hyp_index + 1]
                ):
                    is_matched.append(True)
                # Neither the left nor the right token matches.
                else:
                    is_matched.append(False)
            # Store the alignments that have matching phrases.
            # If there's only a single matched alignment.
            if is_matched.count(True) == 1:
                alignments.append(ref_indexes[is_matched.index(True)])
            # If there are multiple matched alignments that have matching
            # tokens in the left/right window, keep the candidate whose
            # position is nearest to the hypothesis token.
            elif is_matched.count(True) > 1:
                min_distance = float("inf")
                min_index = ref_indexes[0]
                for match, ref_index in zip(is_matched, ref_indexes):
                    if match:
                        distance = abs(hyp_index - ref_index)
                        if distance < min_distance:
                            min_distance = distance
                            min_index = ref_index
                alignments.append(min_index)
            # If there are no matched alignments, we still keep the nearest
            # of the matching tokens without the left/right window check.
            else:
                min_distance = float("inf")
                min_index = ref_indexes[0]
                for ref_index in ref_indexes:
                    distance = abs(hyp_index - ref_index)
                    if distance < min_distance:
                        min_distance = distance
                        min_index = ref_index
                alignments.append(min_index)
    # The alignments are one-indexed to keep track of the ending slice
    # pointer of the matching n-grams.
    alignments = [a + 1 for a in alignments if a != -1]
    return alignments
Then the calculating part:
def ngram_positional_penalty(
    ref_tokens: List[str], hyp_tokens: List[str]
) -> Tuple[float, float]:
    """
    This function calculates the n-gram position difference penalty (NPosPenal)
    described in the LEPOR paper. The NPosPenal is an exponential of the
    length-normalized n-gram matches between the reference and the hypothesis.

    :param ref_tokens: A list of words in the reference sentence.
    :type ref_tokens: List[str]
    :param hyp_tokens: A list of words in the hypothesis sentence.
    :type hyp_tokens: List[str]
    :return: A tuple containing two elements:
        - NPosPenal: N-gram positional penalty.
        - match_count: Count of matched n-grams.
    :rtype: Tuple[float, float]
    """
    alignments = alignment(ref_tokens, hyp_tokens)
    match_count = len(alignments)
    # Store the n-gram position values (difference values) of aligned words
    # between the hypothesis and reference sentences,
    # i.e. |PD| of Eq (4) in https://aclanthology.org/C12-2044
    pd = []
    for i, a in enumerate(alignments):
        pd.append(abs((i + 1) / len(hyp_tokens) - a / len(ref_tokens)))
    npd = sum(pd) / len(hyp_tokens)
    return math.exp(-npd), match_count
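As a quick sanity check on the calculating step, the |PD| terms of Eq (4) can be worked through by hand for a toy alignment (the values below are illustrative, not from the PR):

```python
import math

# Toy case: a 3-token hypothesis perfectly aligned to a 3-token reference.
hyp_len, ref_len = 3, 3
alignments = [1, 2, 3]  # one-indexed reference position for each hypothesis token

# |PD| terms of Eq (4): length-normalized position of each hypothesis token
# minus the length-normalized position of its aligned reference token.
pd = [abs((i + 1) / hyp_len - a / ref_len) for i, a in enumerate(alignments)]

npd = sum(pd) / hyp_len      # 0.0 for a perfect alignment
npos_penal = math.exp(-npd)  # exp(0) == 1.0, i.e. no positional penalty
```

Any misaligned token makes its |PD| term positive, so npd grows and exp(-npd) drops below 1.0.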
This is a really great suggestion. Thanks!
Thank you for your time in commenting on the code. It looks better now.
nltk/translate/lepor.py
Outdated
>>> sentence_lepor(reference, hypothesis)
0.7824248013113159

:param reference: reference sentence
:param references: Reference sentences
Capitalize first word in param description.
nltk/translate/lepor.py
Outdated
def sentence_lepor(
    reference: str, hypothesis: str, alpha: float = 1.0, beta: float = 1.0
The sentence_lepor should follow the general interface of the other metrics and take in a list of references per hypothesis.
See nltk/nltk/translate/bleu_score.py, line 91 in 582e6e3:

>>> sentence_bleu([reference1, reference2, reference3], hypothesis1, weights) # doctest: +ELLIPSIS
Yes, this is the case with the other metrics. I will provide the same facility as well.
nltk/translate/lepor.py
Outdated
hypothesis = nltk.word_tokenize(hypothesis)

# reference = re.findall(r"[\w']+|[.,!?;]", reference)
# hypothesis = re.findall(r"[\w']+|[.,!?;]", hypothesis)
Since this is in NLTK, I think nltk.word_tokenize() would be expected, but you can try adding simple vs. nltk tokenization as a parameter:

import re

from nltk.tokenize import word_tokenize

def sentence_lepor(
    reference: str, hypothesis: str, alpha: float = 1.0, beta: float = 1.0,
    use_nltk_tokenize=True):
    ...
    if use_nltk_tokenize:
        reference = word_tokenize(reference)
        hypothesis = word_tokenize(hypothesis)
    else:
        reference = re.findall(r"[\w']+|[.,!?;]", reference)
        hypothesis = re.findall(r"[\w']+|[.,!?;]", hypothesis)
Yes, that's better. Perhaps we can also provide an option to pass a callable tokenizer (if not passed, we can use the one already in NLTK).
nltk/translate/lepor.py
Outdated
[0.7824248013113159, 0.5639427891892225]

:param references: reference sentences
Same comment as for sentence_lepor: the expected behavior should take in multiple references per hypothesis.
Thank you very much @alvations for giving me the opportunity to contribute, and I really appreciate your time in reviewing the code. I have made changes in the code as per your comments. Feel free to suggest more things if needed. Only one test is in the docstring for now, but I can add more after finalizing whether we are going to have it in the docstring or as a unit test. BR, Ikram
Force-pushed from a870de1 to 06313e4
Hey @alvations, BR,
Hi @alvations,
Hey @alvations, @tomaarsen, and @stevenbird, I hope you are all doing well. BR,
Hi,
Work on #3176
This PR contains the code for the machine translation evaluation metric LEPOR. I have provided 'sentence_lepor' and 'corpus_lepor' methods that can be used to calculate the scores for a single sentence or for the sentences in a list individually.
Kindly suggest any improvements if required. Thank you for giving me the opportunity to contribute.
BR,
Ikram