One-shot BLEU-[2, 3, 4] computation #2320

Closed
marco-roberti opened this issue Jun 15, 2019 · 7 comments · Fixed by #2793

marco-roberti commented Jun 15, 2019

Hello everyone,

I need to compute the BLEU score for more than one n-gram length (ideally BLEU-2, BLEU-3, BLEU-4, and BLEU-5). In my case this takes a very long time, as every hypothesis has several thousand references.

Reading the implementation of the corpus_bleu function, which takes weights: Tuple among its parameters and thus calculates BLEU-[len(weights)], I found that it already gathers all the information needed to compute BLEU-m for every m such that 2 <= m < len(weights).
Wouldn't it be nice to have a more general function that can compute BLEU with different ngram lengths at the same time? A possible implementation would be the weights parameter accepting a list of weight tuples as value, computing precisions that are useful for the longest tuple and using those precisions to get all the desired BLEU scores using the weights list.

This would result in a more general implementation, and it would avoid the wasted computation of calculating the same values more than once.

@stevenbird
Member

This would be nice to have... it's a question of who would like to implement it.

@agannon

agannon commented Oct 3, 2019

Hi, I was looking at getting involved with contributing to NLTK and saw this with the 'goodfirstbug' tag. I will take a crack at this problem if that is OK.

@alvations
Contributor

Feel free to take a look at https://github.com/nltk/nltk/blob/develop/CONTRIBUTING.md and create a pull-request and someone will review it before merging.

agannon added a commit to agannon/nltk that referenced this issue Oct 10, 2019
nltk#2320

corpus_bleu function runs inefficiently when being used with
different weightings by recalculating the underlying values
each time the function is called instead of reusing them.

* Creates a unit test with the expected behavior of a more general
  function that can take multiple weightings and return multiple
  BLEU scores
agannon added a commit to agannon/nltk that referenced this issue Oct 10, 2019
nltk#2320

corpus_bleu function runs inefficiently when being used with
different weightings by recalculating the underlying values
each time the function is called instead of reusing them.

* Creates a more generalized function that can calculate BLEU scores
  for multiple pre-specified weightings
* Adjusts other functions/tests impacted accordingly
agannon added a commit to agannon/nltk that referenced this issue Oct 12, 2019
nltk#2320

corpus_bleu function runs inefficiently when being used with
different weightings by recalculating the underlying values
each time the function is called instead of reusing them.

* Creates a more generalized function that can calculate BLEU scores
  for multiple pre-specified weightings
* Adjusts other functions/tests impacted accordingly
agannon added a commit to agannon/nltk that referenced this issue Oct 13, 2019
nltk#2320

corpus_bleu function runs inefficiently when being used with
different weightings by recalculating the underlying values
each time the function is called instead of reusing them.

* Creates a more generalized function that can calculate BLEU scores
  for multiple pre-specified weightings
* Adjusts other functions/tests impacted accordingly
@BatMrE
Contributor

BatMrE commented Aug 26, 2021

Hi @stevenbird @alvations, if this is still open, can I get involved with it?

@tomaarsen
Member

@BatMrE Yes, you may. We welcome contributions!
Feel free to look at CONTRIBUTING.md for more information.

@BatMrE
Contributor

BatMrE commented Aug 29, 2021

@tomaarsen @alvations
From the issue statement:
Wouldn't it be nice to have a more general function that can compute BLEU with different ngram lengths at the same time?

I will ultimately be calling the BLEU function for each of 2, 3, 4, ..., len(weights), so how will this save the wasted computation of calculating the same values more than once?

@tomaarsen
Member

@BatMrE
corpus_bleu (and also sentence_bleu) are called with a weights parameter. This is a list or tuple of floats with a certain length, such that the weights generally sum to 1. The length of this list or tuple determines whether BLEU-2, BLEU-3, BLEU-4, etc. is used. For example, with a length of 2, BLEU-2 is used.
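To make the link between the weights and the BLEU order concrete, here is a minimal sketch. The helper names uniform_weights and bleu_order are hypothetical and not part of NLTK; they only illustrate that the order is the length of the weights tuple.

```python
def uniform_weights(order):
    """Hypothetical helper: uniform weights for BLEU-<order>.

    For order 4 this gives (0.25, 0.25, 0.25, 0.25), i.e. the usual
    uniform weighting over 1-grams up to 4-grams.
    """
    return tuple(1.0 / order for _ in range(order))


def bleu_order(weights):
    """The BLEU order implied by a weights tuple is simply its length."""
    return len(weights)


bleu2_weights = uniform_weights(2)  # two weights, so BLEU-2
bleu4_weights = uniform_weights(4)  # four weights, so BLEU-4
```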

For each BLEU-k, the computation has to go over all k-grams, all (k-1)-grams, (k-2)-grams, ..., 2-grams and 1-grams. This is visible in the following section of the corpus_bleu algorithm:

# For each order of ngram, calculate the numerator and
# denominator for the corpus-level modified precision.
for i, _ in enumerate(weights, start=1):
    p_i = modified_precision(references, hypothesis, i)
    p_numerators[i] += p_i.numerator
    p_denominators[i] += p_i.denominator

In short, calculating BLEU-4 requires going over all 4-grams, 3-grams, 2-grams and 1-grams. If we then also want to compute BLEU-3, then corpus_bleu (or sentence_bleu) is called again, which recomputes the 3-grams, 2-grams and 1-grams. The idea is that BLEU-3 could also be computed in the original call for BLEU-4.
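As a rough illustration of the saving, the sketch below counts how many modified-precision passes are needed per hypothesis. It assumes one pass per n-gram order, as in the loop above; the function names are hypothetical.

```python
def precision_passes_separate(orders):
    """Passes needed when each BLEU-k is computed by its own call.

    A call for BLEU-k makes k passes (orders 1..k), so separate calls
    for several orders add up.
    """
    return sum(orders)


def precision_passes_shared(orders):
    """Passes needed when all requested orders share one call.

    Only the highest order matters, because the lower orders are a
    subset of what it already computes.
    """
    return max(orders)


orders = [2, 3, 4, 5]
separate = precision_passes_separate(orders)  # 2 + 3 + 4 + 5
shared = precision_passes_shared(orders)      # just the 5 passes of BLEU-5
```

With thousands of references per hypothesis, each pass is expensive, so the difference between the two counts is where the time goes.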

There are a few options to implement this. One option is to always compute the lower-order BLEU scores and cache them, so that no recomputation is needed if they are requested. However, this would add a lot of overhead when a user isn't interested in the lower-order BLEU scores.

Another option is to add a boolean parameter to corpus_bleu and sentence_bleu - if this parameter is True, then a dictionary (or tuple, perhaps?) of the BLEU scores is returned.

Something to think about:
Say BLEU-4 is being computed with weights (0.3, 0.3, 0.3, 0.1). Which weights should be used to compute BLEU-3? Simply a reweighted version of weights[:3]? Or should the user supply multiple weights?
If so, this should be known from the function documentation.
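For reference, this is what the "reweighted weights[:3]" option could look like. It is only one possible interpretation (truncate, then renormalize so the weights sum to 1), and truncate_and_renormalize is a hypothetical name.

```python
def truncate_and_renormalize(weights, order):
    """Keep the first `order` weights and rescale them to sum to 1.

    This is an assumed interpretation of "reweighted weights[:order]",
    not an existing NLTK function.
    """
    if not 1 <= order <= len(weights):
        raise ValueError("order must be between 1 and len(weights)")
    head = weights[:order]
    total = sum(head)
    if total == 0:
        raise ValueError("cannot renormalize weights that sum to zero")
    return tuple(w / total for w in head)


bleu4_weights = (0.3, 0.3, 0.3, 0.1)
# Truncating to three weights gives (0.3, 0.3, 0.3), which sums to 0.9;
# rescaling brings each weight to roughly one third.
bleu3_weights = truncate_and_renormalize(bleu4_weights, 3)
```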

Or should the user be able to optionally supply a list of weights? In the event that this occurs, multiple BLEU scores will be computed, one for each of the weights. Perhaps it would then still be possible to reuse computation of ngrams.
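To show that reuse is possible with a list of weights, here is a self-contained sketch in plain Python. It is not NLTK's implementation: multi_bleu and its helpers are hypothetical names, there is no smoothing, and the brevity penalty uses the closest reference length. The n-gram counts are gathered once, up to the longest weights tuple, and every score is derived from them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_counts(references, hypothesis, n):
    """Numerator and denominator of the modified n-gram precision."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
    return clipped, sum(hyp_counts.values())


def multi_bleu(list_of_references, hypotheses, weight_tuples):
    """Corpus-level BLEU for several weight tuples from one counting pass."""
    max_order = max(len(w) for w in weight_tuples)
    numerators, denominators = Counter(), Counter()
    hyp_len = ref_len = 0
    for references, hypothesis in zip(list_of_references, hypotheses):
        # Count each n-gram order once, up to the longest weights tuple.
        for n in range(1, max_order + 1):
            num, den = clipped_counts(references, hypothesis, n)
            numerators[n] += num
            denominators[n] += den
        hyp_len += len(hypothesis)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min(
            (len(r) for r in references),
            key=lambda length: (abs(length - len(hypothesis)), length),
        )
    if hyp_len == 0:
        return [0.0 for _ in weight_tuples]
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    scores = []
    for weights in weight_tuples:
        orders = range(1, len(weights) + 1)
        if any(numerators[n] == 0 for n in orders):
            scores.append(0.0)  # no smoothing in this sketch
            continue
        log_sum = sum(
            w * math.log(numerators[n] / denominators[n])
            for n, w in zip(orders, weights)
        )
        scores.append(brevity * math.exp(log_sum))
    return scores


refs = [[["the", "cat", "sat", "on", "the", "mat"]]]
hyps = [["the", "cat", "sat", "on", "a", "mat"]]
bleu2, bleu3 = multi_bleu(refs, hyps, [(0.5, 0.5), (1 / 3, 1 / 3, 1 / 3)])
```

Each weights tuple only reads the counts for its own orders, so the user keeps full control over the weights and nothing is counted twice.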

  • Tom Aarsen
