New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added multi Bleu functionality and tests #2793
Conversation
I've added a few more tests to give myself some more confidence that your changes work correctly - I hope that's okay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: I had some time to push some of these suggestions myself. Note that I only modified comments, and not actual code. In short, the following text is outdated, but perhaps it helps explain my commit c730dc3.
The only change request that I still have is to have another look at the if isinstance(weights, tuple)
. Preferably, I'd like to keep weights=[0.8, 0.2]
as a working parameter. In this PR, that throws an exception.
Also, the docstring should be modified to include a modified return
and rtype
section.
E.g.
:return: The corpus-level BLEU score. Returns a list if multiple weights were supplied.
:rtype: float / list(float)
It would also be nice to include an example in the docstring of how weights can be used to finetune the BLEU score.
For example, placing the following comment in the whitespace here:
nltk/nltk/translate/bleu_score.py
Lines 147 to 149 in f989fe6
0.6223... | |
:param list_of_references: a corpus of lists of reference sentences, w.r.t. hypotheses |
Custom weights may be supplied to fine-tune the BLEU score further.
A tuple of float weights for unigrams, bigrams, trigrams and so on can be given.
>>> weights = (0.1, 0.3, 0.5, 0.1)
>>> corpus_bleu(list_of_references, hypotheses, weights=weights) # doctest: +ELLIPSIS
0.5818...
This particular weight gave extra value to trigrams.
Furthermore, multiple weights can be given, resulting in multiple BLEU scores.
>>> weights = [
... (0.5, 0.5),
... (0.333, 0.333, 0.334),
... (0.25, 0.25, 0.25, 0.25),
... (0.2, 0.2, 0.2, 0.2, 0.2)
... ]
>>> corpus_bleu(list_of_references, hypotheses, weights=weights) # doctest: +ELLIPSIS
[0.8242..., 0.7067..., 0.5920..., 0.4719...]
Also, sentence_bleu
should also have their docstring updated. This function simply calls corpus_bleu
, so it also gets upgraded with this PR.
nltk/translate/bleu_score.py
Outdated
if isinstance(weights, tuple): | ||
weights = [weights] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little bit dangerous. People might be using list
weights right now, and that would stop working. Perhaps a safer solution is to check if weights and isinstance(weights[0], float):
?
Currently, the following program will break:
from nltk.translate.bleu_score import corpus_bleu
# define ref's and hyp's here, left out for conciseness.
weights = [0.2, 0.8]
bleu_scores = corpus_bleu(
list_of_references=[[ref1a, ref1b, ref1c], [ref2a]],
hypotheses=[hyp1, hyp2],
weights=weights,
)
Throws the following exception:
Traceback (most recent call last):
File "[...]\nltk_2793.py", line 110, in <module>
out = bleu_scores = corpus_bleu(
File "[...]\nltk\translate\bleu_score.py", line 174, in corpus_bleu
max_weight_length = max(len(weight) for weight in weights)
File "[...]\nltk\translate\bleu_score.py", line 174, in <genexpr>
max_weight_length = max(len(weight) for weight in weights)
TypeError: object of type 'float' has no len()
This didn't break before. People will expect that something that worked before to still work later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BatMrE I think the functionality of using weights like [0.2, 0.8]
like in my example above should still be supported. I would rather not merge this PR until that is resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @tomaarsen,
I agree with your point however while testing for the script:
from nltk.translate.bleu_score import corpus_bleu
# define ref's and hyp's here, left out for conciseness.
weights = [0.2, 0.8]
bleu_scores = corpus_bleu(
list_of_references=[[ref1a, ref1b, ref1c], [ref2a]],
hypotheses=[hyp1, hyp2],
weights=weights,
)
with code:
if isinstance(weights, tuple):
weights = [weights]
max_weight_length = max(len(weight) for weight in weights)
I am getting the expected result as 0.7496592017168556
and 460 passed, 32 skipped, 9 xfailed, 1 warning in 222.27s (0:03:42)
also your code:
if weights and isinstance(weights[0], float):
max_weight_length = max(len(weight) for weight in weights)
gives the same result 0.7496592017168556
and 12 failed, 448 passed, 32 skipped, 9 xfailed, 1 warning in 210.82s (0:03:30)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how you get the expected result with your current implementation. I still only get
TypeError: object of type 'float' has no len()
This is the script I'm using:
hyp1 = [
"It",
"is",
"a",
"guide",
"to",
"action",
"which",
"ensures",
"that",
"the",
"military",
"always",
"obeys",
"the",
"commands",
"of",
"the",
"party",
]
ref1a = [
"It",
"is",
"a",
"guide",
"to",
"action",
"that",
"ensures",
"that",
"the",
"military",
"will",
"forever",
"heed",
"Party",
"commands",
]
ref1b = [
"It",
"is",
"the",
"guiding",
"principle",
"which",
"guarantees",
"the",
"military",
"forces",
"always",
"being",
"under",
"the",
"command",
"of",
"the",
"Party",
]
ref1c = [
"It",
"is",
"the",
"practical",
"guide",
"for",
"the",
"army",
"always",
"to",
"heed",
"the",
"directions",
"of",
"the",
"party",
]
hyp2 = [
"he",
"read",
"the",
"book",
"because",
"he",
"was",
"interested",
"in",
"world",
"history",
]
ref2a = [
"he",
"was",
"interested",
"in",
"world",
"history",
"because",
"he",
"read",
"the",
"book",
]
weights = [0.2, 0.8]
from nltk.translate.bleu_score import corpus_bleu
out = bleu_scores = corpus_bleu(
list_of_references=[[ref1a, ref1b, ref1c], [ref2a]],
hypotheses=[hyp1, hyp2],
weights=weights,
)
print(out)
Can you explain how you get the correct results with weights [0.2, 0.8]
currently?
Also, I meant something along the lines of
if weights and isinstance(weights[0], float):
weights = [weights]
max_weight_length = max(len(weight) for weight in weights)
That said, this does give problems when the weights are for example integers, like (0, 0, 1)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated as per weights type, tests are passing, let me know for any issue will fix now
@BatMrE please take a look at the above feedback, thanks |
1c2050c
to
021a77f
Compare
@BatMrE thanks for your further work. Can you report whether you dealt with the requested changes from @tomaarsen please? |
Yes, the changes expected by @tomaarsen for different data types for weight is resolved in my implementation, tested with different formats |
I'll have a look at your changes when I have some more time! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BatMrE One more thing, it seems that you've removed some commits that were meant to improve the documentation of this file, such as c730dc3. I'm not sure whether this was accidental, but I believe I should be able to re-push these if it was accidental.
There are some closed conversations on this PR that were solved by this commit.
Thank you for these changes @BatMrE! |
Feature as per #2320
from bleu_score import corpus_bleu