remove duplicate document in dataset #697

Merged
merged 3 commits into from Feb 25, 2020

srayuws
Contributor

@srayuws srayuws commented Feb 21, 2020

The Language Modeling part is duplicated: it appears at the start and again between Entailment and Machine Translation.

Contributor

@zhangguanheng66 zhangguanheng66 left a comment

Currently, we have the WikiText2, WikiText103, and PennTreebank datasets in both the experimental and normal folders. I'm not sure why you think they are duplicates.

@srayuws
Contributor Author

srayuws commented Feb 22, 2020

On the documentation page https://pytorch.org/text/datasets.html, there are two identical Language Modeling sections:

https://pytorch.org/text/datasets.html#language-modeling, and
https://pytorch.org/text/datasets.html#id1

I think these two sections are duplicates. If not, do we need more description of how they differ?
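
For illustration, a repeated heading like this could be caught by scanning the reStructuredText source for section titles that occur more than once. The following is only a sketch and not part of this PR: the function name `duplicate_section_titles` and the sample text are hypothetical, and it assumes the docs use standard underlined RST headings. The auto-generated `#id1` anchor looks consistent with a heading title appearing twice on the page, though that is an inference rather than something confirmed here.

```python
import re
from collections import Counter

# An RST underline: one punctuation character repeated at least twice.
UNDERLINE = re.compile(r"^([=\-~^\"'`#*+.:_])\1+$")


def duplicate_section_titles(rst_text):
    """Return section titles that appear more than once (hypothetical helper)."""
    lines = rst_text.splitlines()
    titles = []
    # Look at each line together with the line that follows it.
    for title, underline in zip(lines, lines[1:]):
        title = title.strip()
        underline = underline.rstrip()
        if (
            title
            and not UNDERLINE.match(title)
            and UNDERLINE.match(underline)
            and len(underline) >= len(title)
        ):
            titles.append(title)
    counts = Counter(titles)
    return [t for t, n in counts.items() if n > 1]


# Hypothetical sample that mimics the layout described in this thread.
sample = """\
Language Modeling
~~~~~~~~~~~~~~~~~

Entailment
~~~~~~~~~~

Language Modeling
~~~~~~~~~~~~~~~~~

Machine Translation
~~~~~~~~~~~~~~~~~~~
"""

print(duplicate_section_titles(sample))
```

A check like this only flags repeated titles; deciding whether the two sections are true duplicates still needs a manual look at their contents.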

@zhangguanheng66
Contributor

Got it. I would rather remove the second one. Thanks.

@srayuws
Contributor Author

srayuws commented Feb 25, 2020

Updated. I also added some blank lines so that each section is separated by two lines, as in other parts.

@zhangguanheng66 zhangguanheng66 merged commit 995e5cc into pytorch:master Feb 25, 2020

2 participants