Fix LC cutoff policy of text tiling #2936

Merged
merged 1 commit into from
Mar 26, 2022

richarddwang
Contributor

According to the paper that the NLTK TextTiling implementation appears to be based on (Hearst, Marti A. “TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages.” Computational Linguistics 23 (1997): 33-64):

“One version of this function entails drawing a boundary only if the depth score exceeds s_hat - sigma (the liberal measure, LC). This function can be varied to achieve correspondingly varying precision/recall trade-offs. A higher precision but lower recall can be found by setting the limit to be depth scores exceeding s_hat - sigma/2 (the conservative measure, HC) instead of s_hat - sigma.”

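So I believe the LC and HC policies should use the thresholds s_hat - sigma and s_hat - sigma/2, respectively, where s_hat is the mean of the depth scores and sigma is their standard deviation.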
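To make the two thresholds concrete, here is a minimal sketch of the cutoff rule as quoted from the paper. It is not the NLTK code: the names `LC`, `HC`, `depth_cutoff`, and `boundaries` are hypothetical, and using the population standard deviation is an assumption, since the quoted passage does not say which estimator to use.

```python
from statistics import mean, pstdev

# Hypothetical policy labels for this sketch (not NLTK's own constants).
LC, HC = "LC", "HC"


def depth_cutoff(depth_scores, policy=HC):
    """Return the depth-score threshold for the given cutoff policy."""
    s_hat = mean(depth_scores)     # average depth score
    sigma = pstdev(depth_scores)   # assumption: population standard deviation
    if policy == LC:
        # Liberal measure: lower threshold, so more boundaries are drawn.
        return s_hat - sigma
    # Conservative measure: higher threshold, so fewer boundaries are drawn.
    return s_hat - sigma / 2.0


def boundaries(depth_scores, policy=HC):
    """Return indices of gaps whose depth score exceeds the cutoff."""
    cutoff = depth_cutoff(depth_scores, policy)
    return [i for i, depth in enumerate(depth_scores) if depth > cutoff]


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.6, 0.3, 0.8]  # made-up depth scores for illustration
    print("LC boundaries:", boundaries(scores, LC))
    print("HC boundaries:", boundaries(scores, HC))
```

Because the LC threshold is never above the HC threshold, every boundary selected under HC is also selected under LC, which matches the precision/recall trade-off described in the quote.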

@stevenbird stevenbird self-assigned this Jan 25, 2022
@stevenbird
Member

@richarddwang - thanks for the fix (and answering my question about this conditional).

@richarddwang
Contributor Author

@stevenbird - Is this ready to be merged, or does it need further changes?

@stevenbird stevenbird merged commit bfb88ce into nltk:develop Mar 26, 2022
@stevenbird
Member

Thanks @richarddwang. Sorry for the delay!
