ZeroDivisionError when using SmoothingFunction's method4 with a single word as hypothesis #2838

Closed
KristiyanVachev opened this issue Oct 3, 2021 · 2 comments · Fixed by #2839

Comments


KristiyanVachev commented Oct 3, 2021

When using the SmoothingFunction's method4 with a single word as the hypothesis, I get the following error. This happens because math.log(1) is 0.
I ended up using just method2 or method1, which don't use the function that throws the error, but I figured it would be useful to report it, as some other poor soul may stumble over it.

from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import SmoothingFunction

refs = [['short', 'skirts'], ['too', 'expensive'], ['girls', 'wearing'], ['being', 'laughed', 'at']]
hyp = ['too']

chencherry = SmoothingFunction()
sentence_bleu(refs, hyp, weights=(1, 0, 0, 0), smoothing_function=chencherry.method4)

ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-130-00301cf17f56> in <module>()
      6 
      7 chencherry = SmoothingFunction()
----> 8 sentence_bleu(refs, hyp, weights=(1, 0, 0, 0), smoothing_function=chencherry.method4)

/usr/local/lib/python3.7/dist-packages/nltk/translate/bleu_score.py in sentence_bleu(references, hypothesis, weights, smoothing_function, auto_reweigh, emulate_multibleu)
     87     return corpus_bleu([references], [hypothesis],
     88                         weights, smoothing_function, auto_reweigh,
---> 89                         emulate_multibleu)
     90 
     91 

/usr/local/lib/python3.7/dist-packages/nltk/translate/bleu_score.py in corpus_bleu(list_of_references, hypotheses, weights, smoothing_function, auto_reweigh, emulate_multibleu)
    197     #       smoothing method allows.
    198     p_n = smoothing_function(p_n, references=references, hypothesis=hypothesis,
--> 199                              hyp_len=hyp_len, emulate_multibleu=emulate_multibleu)
    200     s = (w * math.log(p_i) for i, (w, p_i) in enumerate(zip(weights, p_n)))
    201     s =  bp * math.exp(math.fsum(s))

/usr/local/lib/python3.7/dist-packages/nltk/translate/bleu_score.py in method4(self, p_n, references, hypothesis, hyp_len, *args, **kwargs)
    542         for i, p_i in enumerate(p_n):
    543             if p_i.numerator == 0 and hyp_len != 0:
--> 544                 incvnt = i+1 * self.k / math.log(hyp_len) # Note that this K is different from the K from NIST.
    545                 p_n[i] = 1 / incvnt
    546         return p_n

ZeroDivisionError: float division by zero
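For context, the failing line computes 1 / (i + 1 * self.k / math.log(hyp_len)). With a single-token hypothesis, hyp_len is 1 and math.log(1) is 0.0, so the division fails. A minimal standalone sketch (the helper name is illustrative, not NLTK API):

```python
import math

def method4_increment(i, k, hyp_len):
    # Mirrors the expression from nltk's method4. Note Python's operator
    # precedence: this parses as i + ((1 * k) / math.log(hyp_len)).
    return i + 1 * k / math.log(hyp_len)

print(math.log(1))  # 0.0

try:
    method4_increment(0, 5, 1)  # hyp_len == 1 -> divides by log(1) == 0.0
except ZeroDivisionError as exc:
    print("reproduced:", exc)
```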
h0rv (Contributor) commented Oct 3, 2021

Looking at the problem, the develop branch has already guarded against the division by zero by requiring hyp_len > 1. However, when passing in a hypothesis of length 1 (as in the example given), it throws the following instead:

ValueError                                Traceback (most recent call last)
<ipython-input-14-4bbf3f14b792> in <module>
      6
      7 chencherry = SmoothingFunction()
----> 8 sentence_bleu(refs, hyp, weights=(1, 0, 0, 0), smoothing_function=chencherry.method4)
      9

~/.local/lib/python3.9/site-packages/nltk/translate/bleu_score.py in sentence_bleu(references, hypothesis, weights, smoothing_function, auto_reweigh)
     96     :rtype: float
     97     """
---> 98     return corpus_bleu(
     99         [references], [hypothesis], weights, smoothing_function, auto_reweigh
    100     )

~/.local/lib/python3.9/site-packages/nltk/translate/bleu_score.py in corpus_bleu(list_of_references, hypotheses, weights, smoothing_function, auto_reweigh)
    218     )
    219     s = (w_i * math.log(p_i) for w_i, p_i in zip(weights, p_n))
--> 220     s = bp * math.exp(math.fsum(s))
    221     return s
    222

~/.local/lib/python3.9/site-packages/nltk/translate/bleu_score.py in <genexpr>(.0)
    217         p_n, references=references, hypothesis=hypothesis, hyp_len=hyp_lengths
    218     )
--> 219     s = (w_i * math.log(p_i) for w_i, p_i in zip(weights, p_n))
    220     s = bp * math.exp(math.fsum(s))
    221     return s

ValueError: math domain error
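The remaining failure makes sense: with the hyp_len > 1 guard, a length-1 hypothesis skips the smoothing entirely, leaves p_i at 0, and the later math.log(p_i) raises the domain error. Below is an illustrative sketch of one possible guard, using plain floats rather than nltk's Fraction objects; the epsilon fallback is my own assumption, not necessarily what the actual fix in #2839 does:

```python
import math
import sys

def method4_sketch(p_n, hyp_len, k=5):
    """Illustrative re-smoothing of zero precisions (plain floats, not
    nltk's Fraction objects). Not the actual NLTK implementation."""
    out = list(p_n)
    for i, p_i in enumerate(out):
        if p_i == 0:
            if hyp_len > 1:
                # Same shape as nltk's incvnt, with explicit parentheses.
                out[i] = 1 / ((i + 1) * k / math.log(hyp_len))
            else:
                # hyp_len == 1: log(1) == 0, so fall back to a tiny positive
                # value so a later math.log(p_i) stays finite.
                out[i] = sys.float_info.min
    return out

# A length-1 hypothesis no longer produces a zero precision:
smoothed = method4_sketch([0.0, 0.0, 0.0, 0.0], hyp_len=1)
print(all(p > 0 for p in smoothed))            # True
print(math.isfinite(math.log(smoothed[0])))    # True
```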

h0rv (Contributor) commented Oct 3, 2021

I found the new problem. I will create a PR.
