
LEPOR: A machine translation evaluation metric #3178

Open
wants to merge 11 commits into base: develop

Conversation

@ulhaqi12

Hi,
Work on #3176
This PR contains the code for a machine translation evaluation metric, LEPOR. I have provided 'sentence_lepor' and 'corpus_lepor' methods that can be used to calculate scores for a list of sentences or for individual sentences.

  • Lepor for corpus
>>> hypothesis = ['a bird is on a stone.', 'scary crow was not bad.']
>>> references = ['a bird behind the stone.', 'scary cow was good.']
>>> corpus_lepor(references, hypothesis)
[0.7824248013113159, 0.5639427891892225]
  • Lepor for sentence
>>> hypothesis = 'a bird is on a stone.'
>>> references = 'a bird behind the stone.'
>>> sentence_lepor(references, hypothesis)
0.7824248013113159
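
For readers new to the metric, here is a rough sketch of how the sentence-level score is put together from its factors, following Eq (1) of the LEPOR paper (Han et al., 2012, https://aclanthology.org/C12-2044). The helper names length_penalty and ngram_positional_penalty match the functions discussed in the review below, but this exact decomposition is illustrative rather than the PR's final code:

def lepor_sketch(ref_tokens, hyp_tokens, alpha=1.0, beta=1.0):
    # Length penalty (LP), Eq (2) of the paper.
    lp = length_penalty(ref_tokens, hyp_tokens)
    # N-gram position difference penalty (NPosPenal) and matched-token count.
    npospenal, match_count = ngram_positional_penalty(ref_tokens, hyp_tokens)
    if match_count == 0:
        return 0.0
    # Weighted harmonic mean of recall and precision over matched tokens.
    precision = match_count / len(hyp_tokens)
    recall = match_count / len(ref_tokens)
    harmonic = (alpha + beta) / (alpha / recall + beta / precision)
    # The sentence-level LEPOR score is the product of the three factors.
    return lp * npospenal * harmonic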

Kindly suggest any improvements if required. Thank you for giving me the opportunity to contribute.

BR,
Ikram

@stevenbird (Member)

Let's think about new names for 'sentence_lepor' and 'corpus_lepor' which are more in keeping with NLTK practices.

@tomaarsen (Member) commented Jul 29, 2023

I think sentence_lepor and corpus_lepor are reasonable names given that many other scoring functions use the same format:

from nltk.translate.bleu_score import sentence_bleu as bleu
from nltk.translate.ribes_score import sentence_ribes as ribes
from nltk.translate.meteor_score import meteor_score as meteor
from nltk.translate.metrics import alignment_error_rate
from nltk.translate.stack_decoder import StackDecoder
from nltk.translate.nist_score import sentence_nist as nist
from nltk.translate.chrf_score import sentence_chrf as chrf
from nltk.translate.gale_church import trace
from nltk.translate.gdfa import grow_diag_final_and
from nltk.translate.gleu_score import sentence_gleu as gleu

However, I would propose to use:

from nltk.translate.lepor import sentence_lepor as lepor, corpus_lepor

This matches the other scores more closely, and people can primarily use lepor(..., ...) just like bleu(..., ...) and meteor(..., ...).
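
For instance, with that alias (argument order follows the sentence_lepor example in the PR description; just an illustration, not a final API):

from nltk.translate.lepor import sentence_lepor as lepor

score = lepor('a bird behind the stone.', 'a bird is on a stone.')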

Beyond that, you can use pre-commit install to set up pre-commit for (only) this repo; that way it'll automatically run some formatting scripts. Otherwise, the tests will fail on this PR.

cc: @alvations, who expressed interest in this PR.

  • Tom Aarsen

@ulhaqi12 (Author)

Thank you @tomaarsen. I have made the suggested changes and pushed them after running pre-commit install.
@alvations, your suggestions would be appreciated. Thank you.

Ikram

@tomaarsen (Member)

I think running pre-commit run --all-files should prevent the pre-commit workflow from complaining any further. By default, pre-commit only runs the (formatting) scripts over files edited in that commit, i.e. only over the __init__.py that you edited, so the changes in lepor.py are not compliant with the formatting rules yet.

@ulhaqi12 (Author)

Yes, since the changes in lepor.py were in a previous commit, pre-commit was not tracking that file. I have pushed again and committed the lepor.py code through pre-commit this time.

@alvations (Contributor) left a comment

Thank you again for the contribution!

Generally a good first draft for the PR, but there's some refactoring and function re-shuffling work, as described in the code review.

Then, after the revision, we'll have to figure out some tests, either doctests or unit tests, but first try to incorporate the suggestions/revisions from this first code review.

"""

r_len = len(reference)
o_len = len(hypothesis)
Contributor

The naming of o_len is a little too arbitrary. The variable name should be a little more readable; try:

ref_len = len(reference)
hyp_len = len(hypothesis)

Author

Yes, this way it is more readable. Done.

    elif r_len < o_len:
        return math.exp(1 - (r_len / o_len))
    else:
        return math.exp(1 - (o_len / r_len))
Contributor

It might be good to follow how Eq (2) in the original LEPOR paper lists the three options, or at least annotate with a comment what the else: ... represents (to align with the paper):

    if ref_len == hyp_len:
        return 1
    elif ref_len < hyp_len:
        return math.exp(1 - (ref_len / hyp_len))
    else:  # i.e. ref_len > hyp_len
        return math.exp(1 - (hyp_len / ref_len))

Author

Sure, it's better to have it as close to the paper as possible.

def length_penalty(reference: List[str], hypothesis: List[str]) -> float:
"""
Function will calculate length penalty(LP) one of the components in LEPOR, which is defined to embrace
the penalty for both longer and shorter hypothesis compared with the reference translations.
Contributor

Small grammar suggestion:

This function calculates the length penalty (LP) for the LEPOR metric, which is defined to embrace the penalty for both longer and shorter hypotheses compared with the reference translations.
Refer to Eq (2) in https://aclanthology.org/C12-2044

Author

Thank you for this.

Function will calculate length penalty(LP) one of the components in LEPOR, which is defined to embrace
the penalty for both longer and shorter hypothesis compared with the reference translations.

:param reference: reference sentence
Contributor

Style suggestion: Capitalize start of param description.

:param reference: Reference sentence

Author

Sure, I will do that. Actually, I saw it in bleu_score.py; this wasn't the case there.


:param reference: reference sentence
:type reference: str
:param hypothesis: hypothesis sentence
Contributor

:param hypothesis: Hypothesis sentence 



def ngram_positional_penalty(
    ref_words: List[str], hypothesis_words: List[str]
@alvations (Contributor) Jul 31, 2023

This function should follow the parts that the original LEPOR paper describes, so that it's more readable and accessible to people learning the algorithm. From the paper, "To calculate the value, there are two steps: aligning and calculating".

def alignment(ref_tokens: List[str], hyp_tokens: List[str]):
    """
    This function computes the context-dependent n-gram word alignment task, which
    takes into account the surrounding context (neighbouring words) of the potential
    word to select better matching pairs between the output and the reference.

    This alignment task is used to compute the ngram positional difference penalty
    component of the LEPOR score. Generally, the function finds the matching tokens
    between the reference and hypothesis, then finds the indices of the longest matching
    n-grams by checking the left and right unigram window of the matching tokens.

    :param ref_tokens: A list of tokens in the reference sentence.
    :type ref_tokens: List[str]
    :param hyp_tokens: A list of tokens in the hypothesis sentence.
    :type hyp_tokens: List[str]
    """
    alignments = []

    # Store the reference and hypothesis tokens length.
    hyp_len = len(hyp_tokens)
    ref_len = len(ref_tokens)

    for hyp_index, hyp_token in enumerate(hyp_tokens):

        # If no match.
        if ref_tokens.count(hyp_token) == 0:
            alignments.append(-1)
        # If only one match.
        elif ref_tokens.count(hyp_token) == 1:
            alignments.append(ref_tokens.index(hyp_token))
        # Otherwise, compute the multiple possibilities. 
        else:
            # Keeps an index of where the hypothesis token matches the reference.
            ref_indexes = [i for i, ref_token in enumerate(ref_tokens) if ref_token == hyp_token]

            # Iterate through the matched tokens, and check if 
            # the one token to the left/right also matches.
            is_matched = [False] * len(ref_indexes)
            for ind, ref_index in enumerate(ref_indexes):
                # The token one to the left also matches.
                if (
                    0 < ref_index - 1 < ref_len
                    and 0 < hyp_index - 1 < hyp_len
                    and ref_tokens[ref_index - 1] == hyp_tokens[hyp_index - 1]
                ):
                    is_matched[ind] = True
                # The token one to the right also matches.
                elif (
                    0 < ref_index + 1 < ref_len
                    and 0 < hyp_index + 1 < hyp_len
                    and ref_tokens[ref_index + 1] == hyp_tokens[hyp_index + 1]
                ):
                    is_matched[ind] = True
                # Neither the left nor the right token matches.
                else:
                    is_matched[ind] = False

            # Stores the alignments that have matching phrases.
            # If there's only a single matched alignment. 
            if is_matched.count(True) == 1:
                alignments.append(ref_indexes[is_matched.index(True)])
            # If there are multiple matched alignments that have matching
            # tokens in the left/right window, we pick the matching token
            # whose position is nearest to the hypothesis token.
            elif is_matched.count(True) > 1:
                min_distance = float("inf")
                min_index = 0
                for match, ref_index in zip(is_matched, ref_indexes):
                    if match:
                        distance = abs(hyp_index - ref_index)
                        if distance < min_distance:
                            min_distance = distance
                            min_index = ref_index
                alignments.append(min_index)
            # If there are no matched alignments in the left/right window,
            # we still pick the nearest of the matching tokens,
            # without explicitly checking the window.
            else:
                min_distance = float("inf")
                min_index = 0
                for ref_index in ref_indexes:
                    distance = abs(hyp_index - ref_index)
                    if distance < min_distance:
                        min_distance = distance
                        min_index = ref_index
                alignments.append(min_index)

    # The alignments are one-indexed to keep track of the ending slice pointer of the matching n-grams.
    alignments = [a + 1 for a in alignments if a != -1]
    return alignments

Contributor

Then the calculating part:

def ngram_positional_penalty(
    ref_tokens: List[str], hyp_tokens: List[str]
) -> (float, float):
    """
    This function calculates the n-gram position difference penalty (NPosPenal) described in the LEPOR paper. 
    The NPosPenal is an exponential of the length normalized n-gram matches between the reference and the hypothesis.

    :param ref_tokens: List of words in the reference sentence.
    :type ref_tokens: List[str]
    :param hyp_tokens: List of words in the hypothesis sentence.
    :type hyp_tokens: List[str]

    :return: A tuple containing two elements:
             - NPosPenal: N-gram positional penalty.
             - match_count: Count of matched n-grams.
    :rtype: tuple
    """

    alignments = alignment(ref_tokens, hyp_tokens)
    match_count = len(alignments)

    # Stores the n-gram position values (difference values) of aligned words
    # between output and reference sentences,
    # aka |PD| of eq (4) in https://aclanthology.org/C12-2044
    pd = []
    for i, a in enumerate(alignments):
        pd.append(abs((i + 1) / len(hyp_tokens) - a / len(ref_tokens)))

    npd = sum(pd) / len(hyp_tokens)
    return math.exp(-npd), match_count
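
For illustration, one way to exercise the two pieces together, a minimal sketch assuming both suggested functions live in the same module (the token lists are simply the sentences from the PR description split on whitespace):

ref_tokens = "a bird behind the stone .".split()
hyp_tokens = "a bird is on a stone .".split()

# Step 1: align hypothesis tokens to reference positions.
print(alignment(ref_tokens, hyp_tokens))

# Step 2: turn the alignment into the NPosPenal factor and the matched-token count.
npospenal, match_count = ngram_positional_penalty(ref_tokens, hyp_tokens)
print(npospenal, match_count)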

Author

This is a really great suggestion. Thanks!

Author

Thank you for your time in commenting on the code. It looks better now.

>>> sentence_lepor(reference, hypothesis)
0.7824248013113159

:param reference: reference sentence
Contributor

:param references: Reference sentences

Contributor

Capitalize first word in param description.



def sentence_lepor(
    reference: str, hypothesis: str, alpha: float = 1.0, beta: float = 1.0
Contributor

The sentence_lepor function should follow the general interface of the other metrics and take in a list of references per hypothesis.

Contributor

See

>>> sentence_bleu([reference1, reference2, reference3], hypothesis1, weights) # doctest: +ELLIPSIS

Author

Yes, this is the case with the other metrics. I will provide the same facility as well.
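
One possible shape for that interface, sketched very roughly (single_reference_lepor is a hypothetical helper, and taking the maximum over references mirrors how meteor_score aggregates rather than anything decided in this PR):

def sentence_lepor(references, hypothesis, alpha=1.0, beta=1.0):
    # Score the hypothesis against each reference and keep the best score.
    return max(
        single_reference_lepor(reference, hypothesis, alpha, beta)
        for reference in references
    )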

hypothesis = nltk.word_tokenize(hypothesis)

# reference = re.findall(r"[\w']+|[.,!?;]", reference)
# hypothesis = re.findall(r"[\w']+|[.,!?;]", hypothesis)
Contributor

Since this is in NLTK, I think nltk.word_tokenize() would be expected, but you can try adding simple vs. NLTK tokenization as a parameter:

import re

from nltk.tokenize import word_tokenize

def sentence_lepor(
    reference: str, hypothesis: str, alpha: float = 1.0, beta: float = 1.0,
    use_nltk_tokenize=True):
    ....
    if use_nltk_tokenize:
        reference = word_tokenize(reference)
        hypothesis = word_tokenize(hypothesis)
    else:
        reference = re.findall(r"[\w']+|[.,!?;]", reference)
        hypothesis = re.findall(r"[\w']+|[.,!?;]", hypothesis)

Author

Yes, that's better. Perhaps we can also provide an option to pass a callable tokenizer (if not passed, we can use the one already in NLTK).
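
A rough sketch of what that callable-tokenizer option could look like, keeping the single-reference signature from the snippet above for simplicity (the parameter name tokenizer and this exact signature are assumptions, not necessarily what the PR ends up with):

from typing import Callable, List, Optional

from nltk.tokenize import word_tokenize

def sentence_lepor(
    reference: str,
    hypothesis: str,
    alpha: float = 1.0,
    beta: float = 1.0,
    tokenizer: Optional[Callable[[str], List[str]]] = None,
):
    # Fall back to NLTK's word_tokenize when no callable is supplied.
    tokenize = tokenizer if tokenizer is not None else word_tokenize
    reference_tokens = tokenize(reference)
    hypothesis_tokens = tokenize(hypothesis)
    ...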

[0.7824248013113159, 0.5639427891892225]


:param references: reference sentences
Contributor

Same comment as for sentence_lepor: the expected behavior should be to take in multiple references per hypothesis.

@ulhaqi12 (Author)

Thank you very much @alvations for giving me the opportunity to contribute; I really appreciate your time in reviewing the code.

I have made changes in the code as per your comments. Feel free to suggest more things if needed. Only one test is in the docstring for now, but I can add more once we finalize whether to have them as doctests or unit tests.

BR,
Ikram

@ulhaqi12 force-pushed the feature/lepor-tranlation-evaluation-metric branch from a870de1 to 06313e4 on July 31, 2023 20:39
@ulhaqi12 (Author) commented Aug 7, 2023

Hey @alvations,
Thank you for your time and guidance. I have already pushed the required changes as per your recent review. Kindly suggest if any improvement is required.
Can you give me a little guidance on how I should go about testing? Should I implement more tests within docstrings?

BR,
Ikram

@ulhaqi12 (Author) commented Sep 4, 2023

Hi @alvations,
A kind reminder.

@ulhaqi12 (Author)

Hey @alvations, @tomaarsen, and @stevenbird,

Hope you all are doing great.
I was wondering if we could look into the remaining work with your suggestions and close this one.
Thank you

BR,
Ikram
