ZeroDivisionError when using SmoothingFunction's method4 with a single word as hypothesis #2838

Closed
KristiyanVachev opened this issue Oct 3, 2021 · 2 comments · Fixed by #2839

Comments


KristiyanVachev commented Oct 3, 2021

When using the SmoothingFunction's method4 with a single word as the hypothesis, I get the following error. This happens because math.log(1) is 0.
I ended up using just method2 or method1, which don't use the function that throws the error, but I figured it would be useful to report it, as some other poor soul may stumble over it.

from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import SmoothingFunction

refs = [['short', 'skirts'], ['too', 'expensive'], ['girls', 'wearing'], ['being', 'laughed', 'at']]
hyp = ['too']

chencherry = SmoothingFunction()
sentence_bleu(refs, hyp, weights=(1, 0, 0, 0), smoothing_function=chencherry.method4)

ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-130-00301cf17f56> in <module>()
      6 
      7 chencherry = SmoothingFunction()
----> 8 sentence_bleu(refs, hyp, weights=(1, 0, 0, 0), smoothing_function=chencherry.method4)

/usr/local/lib/python3.7/dist-packages/nltk/translate/bleu_score.py in sentence_bleu(references, hypothesis, weights, smoothing_function, auto_reweigh, emulate_multibleu)
     87     return corpus_bleu([references], [hypothesis],
     88                         weights, smoothing_function, auto_reweigh,
---> 89                         emulate_multibleu)
     90 
     91 

/usr/local/lib/python3.7/dist-packages/nltk/translate/bleu_score.py in corpus_bleu(list_of_references, hypotheses, weights, smoothing_function, auto_reweigh, emulate_multibleu)
    197     #       smoothing method allows.
    198     p_n = smoothing_function(p_n, references=references, hypothesis=hypothesis,
--> 199                              hyp_len=hyp_len, emulate_multibleu=emulate_multibleu)
    200     s = (w * math.log(p_i) for i, (w, p_i) in enumerate(zip(weights, p_n)))
    201     s =  bp * math.exp(math.fsum(s))

/usr/local/lib/python3.7/dist-packages/nltk/translate/bleu_score.py in method4(self, p_n, references, hypothesis, hyp_len, *args, **kwargs)
    542         for i, p_i in enumerate(p_n):
    543             if p_i.numerator == 0 and hyp_len != 0:
--> 544                 incvnt = i+1 * self.k / math.log(hyp_len) # Note that this K is different from the K from NIST.
    545                 p_n[i] = 1 / incvnt
    546         return p_n

ZeroDivisionError: float division by zero
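For context, the failing line computes 1 / (i + 1 * self.k / math.log(hyp_len)). With a single-token hypothesis, hyp_len is 1 and math.log(1) is 0.0, so the division fails. A minimal standalone sketch (the helper name is illustrative, not NLTK API):

```python
import math

def method4_increment(i, k, hyp_len):
    # Mirrors the expression from nltk's method4. Note Python's operator
    # precedence: this parses as i + ((1 * k) / math.log(hyp_len)).
    return i + 1 * k / math.log(hyp_len)

print(math.log(1))  # 0.0

try:
    method4_increment(0, 5, 1)  # hyp_len == 1 -> divides by log(1) == 0.0
except ZeroDivisionError as exc:
    print("reproduced:", exc)
```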
h0rv (Contributor) commented Oct 3, 2021

Looking at the problem, the develop branch has already guarded against the division by zero by requiring hyp_len > 1. However, when passing in a hypothesis of length 1 (as in the example given), it throws the following instead:

ValueError                                Traceback (most recent call last)
<ipython-input-14-4bbf3f14b792> in <module>
      6
      7 chencherry = SmoothingFunction()
----> 8 sentence_bleu(refs, hyp, weights=(1, 0, 0, 0), smoothing_function=chencherry.method4)
      9

~/.local/lib/python3.9/site-packages/nltk/translate/bleu_score.py in sentence_bleu(references, hypothesis, weights, smoothing_function, auto_reweigh)
     96     :rtype: float
     97     """
---> 98     return corpus_bleu(
     99         [references], [hypothesis], weights, smoothing_function, auto_reweigh
    100     )

~/.local/lib/python3.9/site-packages/nltk/translate/bleu_score.py in corpus_bleu(list_of_references, hypotheses, weights, smoothing_function, auto_reweigh)
    218     )
    219     s = (w_i * math.log(p_i) for w_i, p_i in zip(weights, p_n))
--> 220     s = bp * math.exp(math.fsum(s))
    221     return s
    222

~/.local/lib/python3.9/site-packages/nltk/translate/bleu_score.py in <genexpr>(.0)
    217         p_n, references=references, hypothesis=hypothesis, hyp_len=hyp_lengths
    218     )
--> 219     s = (w_i * math.log(p_i) for w_i, p_i in zip(weights, p_n))
    220     s = bp * math.exp(math.fsum(s))
    221     return s

ValueError: math domain error
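The remaining failure makes sense: with the hyp_len > 1 guard, a length-1 hypothesis skips the smoothing entirely, leaves p_i at 0, and the later math.log(p_i) raises the domain error. Below is an illustrative sketch of one possible guard, using plain floats rather than nltk's Fraction objects; the epsilon fallback is my own assumption, not necessarily what the actual fix in #2839 does:

```python
import math
import sys

def method4_sketch(p_n, hyp_len, k=5):
    """Illustrative re-smoothing of zero precisions (plain floats, not
    nltk's Fraction objects). Not the actual NLTK implementation."""
    out = list(p_n)
    for i, p_i in enumerate(out):
        if p_i == 0:
            if hyp_len > 1:
                # Same shape as nltk's incvnt, with explicit parentheses.
                out[i] = 1 / ((i + 1) * k / math.log(hyp_len))
            else:
                # hyp_len == 1: log(1) == 0, so fall back to a tiny positive
                # value so a later math.log(p_i) stays finite.
                out[i] = sys.float_info.min
    return out

# A length-1 hypothesis no longer produces a zero precision:
smoothed = method4_sketch([0.0, 0.0, 0.0, 0.0], hyp_len=1)
print(all(p > 0 for p in smoothed))            # True
print(math.isfinite(math.log(smoothed[0])))    # True
```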

h0rv (Contributor) commented Oct 3, 2021

I found the new problem. I will create a PR.
