One-shot BLEU-[2, 3, 4] computation #2320

marco-roberti · 2019-06-15T14:39:19Z

Hello everyone,

I need to compute the BLEU score with more than one ngram length (ideally, BLEU2, BLEU3, BLEU4, and BLEU5). In my case this is very slow, as every hypothesis has a few thousand references.

Reading the implementation of the corpus_bleu function, which takes weights: Tuple among its parameters (and thus computes BLEU-[len(weights)]), I found that it already gathers all the information needed to compute BLEU-m for every m such that 2 <= m < len(weights).
Wouldn't it be nice to have a more general function that can compute BLEU with different ngram lengths at the same time? One possible implementation: the weights parameter could also accept a list of weight tuples. The function would compute the precisions needed by the longest tuple once, then use them to compute every requested BLEU score, one per tuple in the list.

This would give a more general implementation and avoid wasting computation on calculating the same values more than once.
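A minimal sketch of the proposal, assuming plain Python without NLTK: `multi_corpus_bleu` and `ngrams` are hypothetical names, not NLTK's API, and smoothing is deliberately left out (a zero precision simply yields 0.0). Clipped n-gram counts and lengths are accumulated once, up to the longest weight tuple, and every score in the list reuses them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-gram tuples of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def multi_corpus_bleu(list_of_references, hypotheses, weights_list):
    """Hypothetical sketch: return one corpus BLEU score per weight tuple.

    Clipped n-gram counts are gathered once, up to the longest weight tuple,
    and then shared by every score. No smoothing: a zero precision gives 0.0.
    """
    max_n = max(len(weights) for weights in weights_list)
    numerators = Counter()
    denominators = Counter()
    hyp_len = ref_len = 0

    for references, hypothesis in zip(list_of_references, hypotheses):
        for n in range(1, max_n + 1):
            hyp_counts = Counter(ngrams(hypothesis, n))
            # Clip each hypothesis n-gram by its max count in any single reference.
            max_ref_counts = Counter()
            for reference in references:
                max_ref_counts |= Counter(ngrams(reference, n))
            numerators[n] += sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
            denominators[n] += max(1, sum(hyp_counts.values()))
        # Closest reference length; ties go to the shorter reference.
        hyp_len += len(hypothesis)
        ref_len += min((abs(len(r) - len(hypothesis)), len(r)) for r in references)[1]

    if hyp_len == 0:
        return [0.0] * len(weights_list)
    # The brevity penalty depends only on lengths, so it is shared as well.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)

    scores = []
    for weights in weights_list:
        precisions = [numerators[n] / denominators[n] for n in range(1, len(weights) + 1)]
        if min(precisions) == 0:
            scores.append(0.0)
        else:
            log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
            scores.append(bp * math.exp(log_sum))
    return scores
```

For example, `multi_corpus_bleu(refs, hyps, [(0.5, 0.5), (1/3, 1/3, 1/3), (0.25,) * 4])` walks the corpus once and returns three scores, where three separate single-weight calls would each re-count the shared lower orders.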

stevenbird · 2019-07-14T11:39:45Z

This would be nice to have... it's a question of who would like to implement it.

agannon · 2019-10-03T23:57:29Z

Hi, I was looking at getting involved with contributing to NLTK and saw this with the 'goodfirstbug' tag. I will take a crack at this problem if that is OK.

alvations · 2019-10-04T02:35:29Z

Feel free to take a look at https://github.com/nltk/nltk/blob/develop/CONTRIBUTING.md and create a pull request; someone will review it before merging.

nltk#2320 corpus_bleu function runs inefficiently when being used with different weightings by recalculating the underlying values each time the function is called instead of reusing them.
* Creates a unit test with the expected behavior of a more general function that can take multiple weightings and return multiple BLEU scores

nltk#2320 corpus_bleu function runs inefficiently when being used with different weightings by recalculating the underlying values each time the function is called instead of reusing them.
* Creates a more generalized function that can calculate BLEU scores for multiple pre-specified weightings
* Adjusts other functions/tests impacted accordingly

BatMrE · 2021-08-26T11:32:42Z

Hi @stevenbird @alvations, if it's currently open, can I get involved with it?

tomaarsen · 2021-08-26T11:44:24Z

@BatMrE Yes, you may. We welcome contributions!
Feel free to look at CONTRIBUTING.md for more information.

BatMrE · 2021-08-29T18:09:37Z

@tomaarsen @alvations
From the issue statement:
"Wouldn't it be nice to have a more general function that can compute BLEU with different ngram lengths at the same time?"

I will ultimately be calling the BLEU function for each of 2, 3, 4, ..., len(weights), so how will this avoid the waste of calculating the same values more than once?

tomaarsen · 2021-08-30T08:42:38Z

@BatMrE
corpus_bleu (and also sentence_bleu) are called with a weights parameter. This is a list or tuple of floats with a certain length, such that the weights generally sum to 1. The length of this list or tuple determines whether BLEU-2, BLEU-3, BLEU-4, etc. is used. For example, with a length of 2, BLEU-2 is used.
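For reference, the standard corpus-level BLEU formula shows where `weights` enters the score. Here $N$ is `len(weights)`, $p_n$ is the modified $n$-gram precision accumulated over the corpus, $c$ is the total hypothesis length and $r$ the total closest-reference length; smoothing is left out of this sketch.

$$
\mathrm{BLEU}\text{-}N = \mathrm{BP}\cdot\exp\left(\sum_{n=1}^{N} w_n \log p_n\right),
\qquad
\mathrm{BP} = \begin{cases} 1 & \text{if } c > r,\\ e^{\,1 - r/c} & \text{if } c \le r. \end{cases}
$$

Only $p_1, \dots, p_N$ depend on the n-gram counts, and $\mathrm{BP}$ does not depend on $N$ at all, so BLEU-$m$ for $m < N$ needs nothing beyond what BLEU-$N$ already computes.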

To compute BLEU-k, the function has to count all k-grams, (k-1)-grams, (k-2)-grams, ..., 2-grams and 1-grams. This can be seen in the following section of the corpus_bleu algorithm:

nltk/nltk/translate/bleu_score.py

Lines 174 to 179 in f989fe6

    # For each order of ngram, calculate the numerator and
    # denominator for the corpus-level modified precision.
    for i, _ in enumerate(weights, start=1):
        p_i = modified_precision(references, hypothesis, i)
        p_numerators[i] += p_i.numerator
        p_denominators[i] += p_i.denominator
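To make `p_i.numerator` and `p_i.denominator` concrete, here is a minimal sketch of the clipped n-gram counting that the loop accumulates. `modified_precision_counts` and `ngram_counts` are hypothetical helpers written for illustration, not NLTK's own `modified_precision`; they only mirror its role of returning an unreduced numerator/denominator pair for one order.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-gram tuples of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision_counts(references, hypothesis, n):
    """Hypothetical helper: (numerator, denominator) of the clipped n-gram precision."""
    hyp_counts = ngram_counts(hypothesis, n)
    # For each n-gram, the largest number of times it occurs in any one reference.
    max_ref_counts = Counter()
    for reference in references:
        max_ref_counts |= ngram_counts(reference, n)
    # Each hypothesis n-gram is credited at most that many times.
    clipped = sum(min(count, max_ref_counts[g]) for g, count in hyp_counts.items())
    return clipped, max(1, sum(hyp_counts.values()))
```

With the reference "the cat is on the mat" and the hypothesis "the" repeated seven times, the unigram counts are (2, 7) and the bigram counts (0, 6). Because the loop runs over orders 1..len(weights), a BLEU-4 call has already filled the entries for orders 1 to 3 that BLEU-3 needs.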

In short, calculating BLEU-4 requires going over all 4-grams, 3-grams, 2-grams and 1-grams. If we then also want to compute BLEU-3, then corpus_bleu (or sentence_bleu) is called again, which recomputes the 3-grams, 2-grams and 1-grams. The idea is that BLEU-3 could also be computed in the original call for BLEU-4.
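As a rough count for the BLEU-2 to BLEU-5 case from the original request, assuming one `modified_precision` pass per n-gram order per hypothesis:

$$
\underbrace{2 + 3 + 4 + 5}_{\text{four separate calls}} = 14
\quad\text{vs.}\quad
\underbrace{5}_{\text{one shared call}}
$$

That is about 2.8 times as many passes when the scores are computed separately, ignoring that passes for different orders do not cost exactly the same.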

There are a few options to implement this. One option is to always compute the lower-order BLEU scores and cache them, so that no recomputation is needed if they are requested. However, this would add a lot of overhead when a user isn't interested in the lower-order BLEU scores.

Another option is to add a boolean parameter to corpus_bleu and sentence_bleu - if this parameter is True, then a dictionary (or tuple, perhaps?) of the BLEU scores is returned.
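A minimal sketch of what such a flag could return, assuming the per-order counts have already been accumulated as in the loop above. `bleu_from_counts` and `all_order_bleus` are hypothetical names, the use of uniform weights (1/k, ..., 1/k) for each order is an assumption (see the question below), and smoothing is omitted.

```python
import math


def bleu_from_counts(p_numerators, p_denominators, hyp_len, ref_len, weights):
    """BLEU for one weight tuple from precomputed per-order counts (no smoothing)."""
    if hyp_len == 0:
        return 0.0
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    precisions = [p_numerators[n] / p_denominators[n] for n in range(1, len(weights) + 1)]
    if min(precisions) == 0:
        return 0.0
    return bp * math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)))


def all_order_bleus(p_numerators, p_denominators, hyp_len, ref_len, max_n, min_n=2):
    """Hypothetical return value for a flag like return_all=True: {k: BLEU-k}.

    Uses uniform weights (1/k, ..., 1/k) for every order, which is an assumption.
    """
    return {
        k: bleu_from_counts(p_numerators, p_denominators, hyp_len, ref_len, (1 / k,) * k)
        for k in range(min_n, max_n + 1)
    }
```

Each extra order costs only a few float operations once the counts exist, which is why returning a dictionary alongside (or instead of) the single score is cheap compared with re-running the n-gram loop.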

Something to think about:
Say BLEU-4 is computed with weights (0.3, 0.3, 0.3, 0.1): which weights should be used to compute BLEU-3? Simply a reweighted version of weights[:3]? Or should the user supply multiple weights?
If so, this should be made clear in the function documentation.
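The "reweighted version of weights[:3]" option could be as small as the sketch below. `truncated_weights` is a hypothetical helper, and renormalizing to sum to 1 is only one possible reading of "reweighted".

```python
def truncated_weights(weights, k):
    """Hypothetical helper: keep the first k weights and rescale them to sum to 1."""
    head = tuple(weights[:k])
    total = sum(head)
    if len(head) < k or total <= 0:
        raise ValueError("need at least k weights with a positive sum")
    return tuple(w / total for w in head)


# (0.3, 0.3, 0.3, 0.1) truncated to 3 orders gives roughly (1/3, 1/3, 1/3).
bleu3_weights = truncated_weights((0.3, 0.3, 0.3, 0.1), 3)
```

The drawback is that the lower-order weights are implied rather than chosen, which is why documenting the rule (or letting the user pass explicit tuples) matters.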

Or should the user be able to optionally supply a list of weights? In that case, multiple BLEU scores would be computed, one for each weight tuple, and it would perhaps still be possible to reuse the n-gram computations across them.

Tom Aarsen

alvations added good first issue SMT labels Jul 16, 2019

agannon mentioned this issue Oct 10, 2019

Feature/multiple ngram bleu #2424

Closed

BatMrE mentioned this issue Aug 31, 2021

Added multi Bleu functionality and tests #2793

Merged

tomaarsen linked a pull request Sep 12, 2021 that will close this issue

Added multi Bleu functionality and tests #2793

Merged

tomaarsen closed this as completed in #2793 Nov 20, 2021
