Fix LC cutoff policy of text tiling #2936

richarddwang · 2022-01-25T03:22:32Z

According to the paper that the NLTK TextTiling algorithm is probably based on (Hearst, Marti A. "TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages." Computational Linguistics 23 (1997): 33-64):

"One version of this function entails drawing a boundary only if the depth score exceeds s_hat - sigma (the liberal measure, LC). This function can be varied to achieve correspondingly varying precision/recall trade-offs. A higher precision but lower recall can be found by setting the limit to be depth scores exceeding s_hat - sigma/2 (the conservative measure, HC) instead of s_hat - sigma."

I believe the LC and HC policies should therefore use thresholds of s_hat - sigma and s_hat - sigma/2, respectively.
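To make the two thresholds concrete, here is a minimal sketch of the cutoff rule quoted above. It is not the NLTK source: the names `depth_cutoff` and `find_boundaries` and the string labels `LC`/`HC` are hypothetical, and it assumes that s_hat is the mean of the depth scores and sigma their population standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical policy labels for this sketch (not NLTK's own constants).
LC, HC = "LC", "HC"


def depth_cutoff(depth_scores, policy=HC):
    """Return the depth-score threshold for the given cutoff policy.

    Assumes s_hat is the mean and sigma the population standard
    deviation of the depth scores.
    """
    s_hat = mean(depth_scores)
    sigma = pstdev(depth_scores)
    if policy == LC:
        # Liberal measure: lower threshold, so more boundaries are drawn.
        return s_hat - sigma
    if policy == HC:
        # Conservative measure: higher threshold, so fewer boundaries.
        return s_hat - sigma / 2
    raise ValueError(f"unknown cutoff policy: {policy!r}")


def find_boundaries(depth_scores, policy=HC):
    """Return the indices whose depth score exceeds the policy's cutoff."""
    cutoff = depth_cutoff(depth_scores, policy)
    return [i for i, score in enumerate(depth_scores) if score > cutoff]


scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(find_boundaries(scores, LC))  # liberal: superset of the HC result
print(find_boundaries(scores, HC))  # conservative: fewer boundaries
```

Because the LC threshold is never higher than the HC one, every boundary found under HC is also found under LC, which is the precision/recall trade-off the quoted passage describes.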

stevenbird · 2022-01-25T06:57:15Z

@richarddwang - thanks for the fix (and answering my question about this conditional).

richarddwang · 2022-02-11T08:02:06Z

@stevenbird - Is this ready to be merged, or does it need further changes?

stevenbird · 2022-03-26T22:04:19Z

Thanks @richarddwang. Sorry for the delay!

fix LC cutoff policy of text tiling

b35b4c6

stevenbird self-assigned this Jan 25, 2022

stevenbird merged commit bfb88ce into nltk:develop Mar 26, 2022

tomaarsen mentioned this pull request Dec 13, 2022

TextTiling cutoff_policy (HC is not set properly) #2497

Closed
