remove duplicate document in dataset #697

srayuws · 2020-02-21T07:08:48Z

The Language Modeling section is duplicated: it appears once at the start and again between Entailment and Machine Translation.

zhangguanheng66

Currently, we have the WikiText2, WikiText103, and PennTreebank datasets in both the experimental and normal folders. Not sure why you think they are duplicates.

srayuws · 2020-02-22T06:27:24Z

On the documentation page https://pytorch.org/text/datasets.html, there are two identical Language Modeling sections:

https://pytorch.org/text/datasets.html#language-modeling, and
https://pytorch.org/text/datasets.html#id1

I think these two sections are duplicates. Or do we need some more description of how they differ?
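
A duplicate like this can be caught before the docs are built. The `#id1` anchor suggests the docs generator auto-numbered a repeated heading. Below is a minimal sketch, assuming the docs source is reStructuredText with underlined section titles; the function name `duplicate_section_titles` and the sample text are hypothetical and not taken from the repository.

```python
import re
from collections import Counter

# A line made of one repeated punctuation character (3 or more), used here
# as a simplified stand-in for an reST section underline.
UNDERLINE = re.compile(r"^([=\-~^#*+])\1{2,}\s*$")


def duplicate_section_titles(rst_text):
    """Return section titles that occur more than once in the given reST text."""
    lines = rst_text.splitlines()
    titles = []
    # Walk over (line, next line) pairs; a title is a non-empty text line
    # followed by an underline at least as long as the title.
    for title, underline in zip(lines, lines[1:]):
        if not title.strip() or UNDERLINE.match(title):
            continue
        if UNDERLINE.match(underline) and len(underline.rstrip()) >= len(title.rstrip()):
            titles.append(title.strip())
    counts = Counter(titles)
    return [t for t, n in counts.items() if n > 1]


# Hypothetical sample mirroring the layout described in this thread.
sample = """\
Language Modeling
^^^^^^^^^^^^^^^^^

Entailment
^^^^^^^^^^

Language Modeling
^^^^^^^^^^^^^^^^^

Machine Translation
^^^^^^^^^^^^^^^^^^^
"""

duplicates = duplicate_section_titles(sample)
print(duplicates)
```

The check only looks at title text, so two sections that legitimately share a name at different nesting levels would also be reported.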

zhangguanheng66 · 2020-02-24T15:48:15Z

Got it. I would rather remove the second one. Thanks.

srayuws · 2020-02-25T03:42:23Z

Updated. Also added some blank lines so that each section is separated by two lines, as in the other parts.

remove duplicate document in dataset

96b007c

Language Modeling part is duplicated at the start and between Entailment and Machine Translation

zhangguanheng66 reviewed Feb 21, 2020

zhangguanheng66 self-requested a review February 24, 2020 15:47

move language modeling to the first section

1deb6f7

zhangguanheng66 approved these changes Feb 25, 2020

Merge branch 'master' into master

ecaed14

zhangguanheng66 merged commit 995e5cc into pytorch:master Feb 25, 2020
