Create Markdown corpus readers #2902

Merged
@stevenbird merged 2 commits into nltk:develop on Jul 19, 2022

@elespike
Contributor

elespike commented Dec 6, 2021

Hello,

I've written these markdown corpus readers to handle markdown documents, and figured I'd ask about including them in NLTK.

There are 4 requirements that would be new:

  • markdown_it
  • mdit_plain
  • mdit_py_plugins
  • yaml

Hopefully you'll find this useful!

@tomaarsen
Member

@elespike
#2903 has fixed a broken test case which snuck into this PR. I've merged develop into this PR to resolve this broken test.

@stevenbird
Member

@elespike – thanks for your contribution. Expanding NLTK functionality brings a maintenance cost, so there needs to be a clear case for adding code. Can you please make the case, e.g. by pointing us to markdown corpora used in NLP work? Thanks.

@elespike
Contributor Author

elespike commented Dec 7, 2021

@stevenbird While I'm not familiar with and haven't found current NLP work using markdown corpora, the inspiration here is to be able to more easily perform NLP on knowledge bases built with markdown, either with:

These readers allow for easy extraction of quotes, code blocks, links, images, and sections, as well as everything that NLTK can do with plain text. =)

edit: forgot to mention https://stackedit.io/, and also Jupyter notebooks!
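
To make concrete what this kind of extraction involves, here is a minimal sketch using only the Python standard library. It is not the API added by this PR and does not use markdown_it. The names `code_blocks`, `links`, `images`, `blockquotes`, and the `SAMPLE` document are hypothetical, and the regular expressions only cover simple cases (fenced code, inline links and images, single-level quotes).

```python
import re
from typing import List, NamedTuple


class CodeBlock(NamedTuple):
    language: str
    content: str


class Link(NamedTuple):
    label: str
    href: str


# Fenced code: an opening fence of 3+ backticks or tildes, an optional
# language tag, the body, then a closing line repeating the same fence.
_FENCE_RE = re.compile(
    r"^(?P<fence>`{3,}|~{3,})[ \t]*(?P<lang>[^\s`]*)[^\n]*\n"
    r"(?P<body>.*?)^(?P=fence)[ \t]*$",
    re.MULTILINE | re.DOTALL,
)
# Inline links and images: [label](href) and ![alt](src).
_LINK_RE = re.compile(r"(?P<bang>!?)\[(?P<label>[^\]]*)\]\((?P<href>[^)\s]+)\)")


def code_blocks(text: str) -> List[CodeBlock]:
    return [CodeBlock(m["lang"], m["body"]) for m in _FENCE_RE.finditer(text)]


def _prose(text: str) -> str:
    # Drop fenced code so its contents are not mistaken for links or quotes.
    return _FENCE_RE.sub("", text)


def links(text: str) -> List[Link]:
    return [Link(m["label"], m["href"])
            for m in _LINK_RE.finditer(_prose(text)) if not m["bang"]]


def images(text: str) -> List[Link]:
    return [Link(m["label"], m["href"])
            for m in _LINK_RE.finditer(_prose(text)) if m["bang"]]


def blockquotes(text: str) -> List[str]:
    quotes, current = [], []
    for line in _prose(text).splitlines():
        stripped = line.lstrip()
        if stripped.startswith(">"):
            current.append(stripped[1:].strip())
        elif current:
            # A non-quote line ends the current quote.
            quotes.append(" ".join(current))
            current = []
    if current:
        quotes.append(" ".join(current))
    return quotes


# Hypothetical sample document; tilde fences are used for the code block.
SAMPLE = """\
# Notes

> A quoted line.

See the [NLTK site](https://www.nltk.org/) and ![logo](logo.png).

~~~python
print("hi")
~~~
"""

if __name__ == "__main__":
    print(code_blocks(SAMPLE))
    print(links(SAMPLE))
    print(images(SAMPLE))
    print(blockquotes(SAMPLE))
```

A real Markdown parser handles nesting, reference-style links, and indented code, which this regex-based sketch deliberately ignores.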

@stevenbird merged commit 1f28903 into nltk:develop on Jul 19, 2022

@stevenbird
Member

@elespike: thanks for your patience... this does look useful, thanks!
