Added multi Bleu functionality and tests #2793

BatMrE · 2021-08-31T10:55:55Z

Feature as per #2320

from bleu_score import corpus_bleu

references = [[['this', 'is', 'a', 'test'], ['this', 'is' 'test']]]
candidates = [['this', 'is', 'a', 'test']]
print('Individual 1-gram: %f' % corpus_bleu(references, candidates, weights=(1, 0, 0, 0)))
print('Individual 2-gram: %f' % corpus_bleu(references, candidates, weights=(0, 1, 0, 0)))

Individual 1-gram: 1.000000
Individual 2-gram: 1.000000

tomaarsen · 2021-09-02T10:28:07Z

I've added a few more tests to give myself some more confidence that your changes work correctly - I hope that's okay.

tomaarsen

Edit: I had some time to push some of these suggestions myself. Note that I only modified comments, and not actual code. In short, the following text is outdated, but perhaps it helps explain my commit c730dc3.

The only change request that I still have is to have another look at the if isinstance(weights, tuple). Preferably, I'd like to keep weights=[0.8, 0.2] as a working parameter. In this PR, that throws an exception.

Also, the docstring should be modified to include a modified return and rtype section.

E.g.

    :return: The corpus-level BLEU score. Returns a list if multiple weights were supplied.
    :rtype: float / list(float)

It would also be nice to include an example in the docstring of how weights can be used to finetune the BLEU score.
For example, placing the following comment in the whitespace here:

nltk/nltk/translate/bleu_score.py

Lines 147 to 149 in f989fe6

    
               0.6223... 
        
               :param list_of_references: a corpus of lists of reference sentences, w.r.t. hypotheses

    Custom weights may be supplied to fine-tune the BLEU score further. 
    A tuple of float weights for unigrams, bigrams, trigrams and so on can be given.

    >>> weights = (0.1, 0.3, 0.5, 0.1)
    >>> corpus_bleu(list_of_references, hypotheses, weights=weights) # doctest: +ELLIPSIS
    0.5818...

    This particular weight gave extra value to trigrams.

    Furthermore, multiple weights can be given, resulting in multiple BLEU scores.

    >>> weights = [
    ...     (0.5, 0.5),
    ...     (0.333, 0.333, 0.334),
    ...     (0.25, 0.25, 0.25, 0.25),
    ...     (0.2, 0.2, 0.2, 0.2, 0.2)
    ... ]
    >>> corpus_bleu(list_of_references, hypotheses, weights=weights) # doctest: +ELLIPSIS
    [0.8242..., 0.7067..., 0.5920..., 0.4719...]

Also, sentence_bleu should also have their docstring updated. This function simply calls corpus_bleu, so it also gets upgraded with this PR.

nltk/translate/bleu_score.py

tomaarsen · 2021-09-02T10:56:10Z

nltk/translate/bleu_score.py

+    if isinstance(weights, tuple):
+        weights = [weights]


This is a little bit dangerous. People might be using list weights right now, and that would stop working. Perhaps a safer solution is to check if weights and isinstance(weights[0], float):?

Currently, the following program will break:

from nltk.translate.bleu_score import corpus_bleu # define ref's and hyp's here, left out for conciseness. weights = [0.2, 0.8] bleu_scores = corpus_bleu( list_of_references=[[ref1a, ref1b, ref1c], [ref2a]], hypotheses=[hyp1, hyp2], weights=weights, )

Throws the following exception:

Traceback (most recent call last): File "[...]\nltk_2793.py", line 110, in <module> out = bleu_scores = corpus_bleu( File "[...]\nltk\translate\bleu_score.py", line 174, in corpus_bleu max_weight_length = max(len(weight) for weight in weights) File "[...]\nltk\translate\bleu_score.py", line 174, in <genexpr> max_weight_length = max(len(weight) for weight in weights) TypeError: object of type 'float' has no len()

This didn't break before. People will expect that something that worked before to still work later.

@BatMrE I think the functionality of using weights like [0.2, 0.8] like in my example above should still be supported. I would rather not merge this PR until that is resolved.

Hey @tomaarsen,

I agree with your point however while testing for the script:

from nltk.translate.bleu_score import corpus_bleu # define ref's and hyp's here, left out for conciseness. weights = [0.2, 0.8] bleu_scores = corpus_bleu( list_of_references=[[ref1a, ref1b, ref1c], [ref2a]], hypotheses=[hyp1, hyp2], weights=weights, )

with code:

if isinstance(weights, tuple): weights = [weights] max_weight_length = max(len(weight) for weight in weights)

I am getting the expected result as 0.7496592017168556
and 460 passed, 32 skipped, 9 xfailed, 1 warning in 222.27s (0:03:42)

also your code:

if weights and isinstance(weights[0], float): max_weight_length = max(len(weight) for weight in weights)

gives the same result 0.7496592017168556
and 12 failed, 448 passed, 32 skipped, 9 xfailed, 1 warning in 210.82s (0:03:30)

I'm not sure how you get the expected result with your current implementation. I still only get

TypeError: object of type 'float' has no len()

This is the script I'm using:

hyp1 = [ "It", "is", "a", "guide", "to", "action", "which", "ensures", "that", "the", "military", "always", "obeys", "the", "commands", "of", "the", "party", ] ref1a = [ "It", "is", "a", "guide", "to", "action", "that", "ensures", "that", "the", "military", "will", "forever", "heed", "Party", "commands", ] ref1b = [ "It", "is", "the", "guiding", "principle", "which", "guarantees", "the", "military", "forces", "always", "being", "under", "the", "command", "of", "the", "Party", ] ref1c = [ "It", "is", "the", "practical", "guide", "for", "the", "army", "always", "to", "heed", "the", "directions", "of", "the", "party", ] hyp2 = [ "he", "read", "the", "book", "because", "he", "was", "interested", "in", "world", "history", ] ref2a = [ "he", "was", "interested", "in", "world", "history", "because", "he", "read", "the", "book", ] weights = [0.2, 0.8] from nltk.translate.bleu_score import corpus_bleu out = bleu_scores = corpus_bleu( list_of_references=[[ref1a, ref1b, ref1c], [ref2a]], hypotheses=[hyp1, hyp2], weights=weights, ) print(out)

Can you explain how you get the correct results with weights [0.2, 0.8] currently?

Also, I meant something along the lines of

if weights and isinstance(weights[0], float): weights = [weights] max_weight_length = max(len(weight) for weight in weights)

That said, this does give problems when the weights are for example integers, like (0, 0, 1).

I have updated as per weights type, tests are passing, let me know for any issue will fix now

nltk/translate/bleu_score.py

…ulti-BLEU

stevenbird · 2021-09-28T08:28:47Z

@BatMrE please take a look at the above feedback, thanks

…ulti-BLEU

stevenbird · 2021-10-10T00:15:48Z

@BatMrE thanks for your further work. Can you report whether you dealt with the requested changes from @tomaarsen please?

BatMrE · 2021-10-11T18:58:58Z

@BatMrE thanks for your further work. Can you report whether you dealt with the requested changes from @tomaarsen please?

Yes, the changes expected by @tomaarsen for different data types for weight is resolved in my implementation, tested with different formats

tomaarsen · 2021-10-11T20:46:13Z

I'll have a look at your changes when I have some more time!

tomaarsen

@BatMrE One more thing, it seems that you've removed some commits that were meant to improve the documentation of this file, such as c730dc3. I'm not sure whether this was accidental, but I believe I should be able to re-push these if it was accidental.

There are some closed conversations on this PR that were solved by this commit.

nltk/translate/bleu_score.py

tomaarsen · 2021-11-20T22:37:57Z

Thank you for these changes @BatMrE!

Added multi Bleu functionality and tests

d96754b

tomaarsen requested changes Sep 2, 2021

View reviewed changes

tomaarsen linked an issue Sep 12, 2021 that may be closed by this pull request

One-shot BLEU-[2, 3, 4] computation #2320

Closed

tomaarsen added enhancement nice idea SMT labels Sep 12, 2021

Merge branch 'develop' of https://github.com/nltk/nltk into feature/m…

a300327

…ulti-BLEU

BatMrE added 2 commits October 7, 2021 00:10

added supports for weights

dda94de

Merge branch 'develop' of https://github.com/nltk/nltk into feature/m…

021a77f

…ulti-BLEU

BatMrE force-pushed the feature/multi-BLEU branch from 1c2050c to 021a77f Compare October 6, 2021 18:49

tomaarsen requested changes Oct 12, 2021

View reviewed changes

nltk/translate/bleu_score.py Outdated Show resolved Hide resolved

BatMrE and others added 2 commits November 13, 2021 21:23

Code changes for weights and doc

adf9e1d

Merge branch 'develop' of https://github.com/nltk/nltk into pr/2793

72e6e6f

tomaarsen approved these changes Nov 20, 2021

View reviewed changes

tomaarsen merged commit 3c2a5a6 into nltk:develop Nov 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added multi Bleu functionality and tests #2793

Added multi Bleu functionality and tests #2793

BatMrE commented Aug 31, 2021 •

edited

tomaarsen commented Sep 2, 2021

tomaarsen left a comment •

edited

tomaarsen Sep 2, 2021

tomaarsen Sep 10, 2021

BatMrE Sep 29, 2021

tomaarsen Sep 29, 2021

BatMrE Oct 6, 2021

stevenbird commented Sep 28, 2021

stevenbird commented Oct 10, 2021

BatMrE commented Oct 11, 2021

tomaarsen commented Oct 11, 2021

tomaarsen left a comment •

edited

tomaarsen commented Nov 20, 2021

	0.6223...

	:param list_of_references: a corpus of lists of reference sentences, w.r.t. hypotheses

Added multi Bleu functionality and tests #2793

Added multi Bleu functionality and tests #2793

Conversation

BatMrE commented Aug 31, 2021 • edited

tomaarsen commented Sep 2, 2021

tomaarsen left a comment • edited

Choose a reason for hiding this comment

tomaarsen Sep 2, 2021

Choose a reason for hiding this comment

tomaarsen Sep 10, 2021

Choose a reason for hiding this comment

BatMrE Sep 29, 2021

Choose a reason for hiding this comment

tomaarsen Sep 29, 2021

Choose a reason for hiding this comment

BatMrE Oct 6, 2021

Choose a reason for hiding this comment

stevenbird commented Sep 28, 2021

stevenbird commented Oct 10, 2021

BatMrE commented Oct 11, 2021

tomaarsen commented Oct 11, 2021

tomaarsen left a comment • edited

Choose a reason for hiding this comment

tomaarsen commented Nov 20, 2021

BatMrE commented Aug 31, 2021 •

edited

tomaarsen left a comment •

edited

tomaarsen left a comment •

edited