Closes #2874
Hello!
Pull request overview
Added CITATION.cff to nltk.
CITATION.cff file
The schema guide for CITATION.cff files can be found here. The CITATION.cff I've written consists of two parts: one describing the NLTK software and one describing the NLTK book. Most notably, the book part is listed under preferred-citation. When the "Cite this repository" button is clicked, this is the part GitHub uses to generate the actual citation.
The "Cite this repository" button
Here's a quick screenshot of it:
And after clicking on it:
It can be seen on the feature/citation branch of my fork: https://github.com/tomaarsen/nltk/tree/feature/citation
The NLTK software part
I'm open to discussion about this: citations are very important, and I would like us to reach a common agreement on the CITATION.cff. The next snippet covers the "Software" section of the citation, which describes NLTK as software but is not used in the generated BibTeX or APA citations.
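For orientation, the software-related top-level fields discussed below could be laid out roughly like this. It is a minimal sketch assuming CFF version 1.2.0, not a verbatim copy of the file on the branch. The message wording is hypothetical, and repository-code and url are illustrative additions.

```yaml
# Sketch of the top-level (software) part of CITATION.cff, assuming CFF 1.2.0.
cff-version: 1.2.0
# Hypothetical wording: points users at the book under preferred-citation.
message: "If you use NLTK, please cite the book listed under preferred-citation."
title: "Natural Language Toolkit"
authors:
  # Entity author, mirroring the author field in setup.py.
  - name: "NLTK Team"
license: Apache-2.0
# Illustrative extra metadata (assumption, not taken from the PR text).
repository-code: "https://github.com/nltk/nltk"
url: "https://www.nltk.org"
```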
title: I've opted for "Natural Language Toolkit"; "nltk" or "NLTK" are alternatives.
message: This is one of the default options, which tells users to cite the book instead.
authors: Here I've used the author information from the setup.py file rather than listing specific users. That said, we may want to list individual users followed by "NLTK Team" or "NLTK Contributors", as suggested in the third code block here. For example:
license: I've listed Apache-2.0, as the code is licensed under it. That said, other licenses apply to e.g. the documentation, so we could also supply a list of licenses here.
The remainder of this section speaks for itself.
The NLTK book part
This section specifies the part of the CITATION.cff which is actually used in the citation generation. The snippet is as follows:
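The preferred-citation block described by the fields below might look roughly like this. It is a minimal sketch assuming CFF version 1.2.0, with the authors in the Google Scholar order (Bird, Klein, Loper). The ORCID iDs mentioned below are left out rather than guessed.

```yaml
# Sketch of the preferred-citation (book) part of CITATION.cff, assuming CFF 1.2.0.
preferred-citation:
  type: book
  title: "Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit"
  authors:
    # Same order as the book and the Google Scholar citation.
    - family-names: Bird
      given-names: Steven
    - family-names: Klein
      given-names: Ewan
    - family-names: Loper
      given-names: Edward
  # Year only; no month or date, matching the Google Scholar BibTeX.
  year: 2009
  publisher:
    # No leading space, unlike the Google Scholar variant.
    name: "O'Reilly Media, Inc."
```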
title: The full title of the book.
type: Set to book, as opposed to software.
authors: Listed in the same order as on the book and in the citations provided by Google Scholar. I've provided the ORCID iDs of Ewan Klein and Steven Bird; these can be removed if preferred. More information could be added about each author, but I kept it fairly minimal. One more thing to note: the author order used in this PR, by Google Scholar, and in the book itself differs from the order on the current README. This is definitely something to look into!
year: Speaks for itself. I decided not to include the month or date, as they don't seem to be included in the BibTeX that Google Scholar provides.
publisher: "O'Reilly Media, Inc." differs slightly from the publisher Google Scholar provides, which is " O'Reilly Media, Inc." (with an extra leading space). This is also something to consider, as it affects the generated APA and BibTeX citations.
Generated citations
APA
Generated by Google Scholar:
Generated by GitHub from https://github.com/tomaarsen/nltk/tree/feature/citation according to this CITATION.cff:
These are nearly identical: the only difference is that the Google Scholar citation has an extra leading space in the publisher name, along with extra quotes.
BibTeX
Generated by Google Scholar:
Generated by GitHub from https://github.com/tomaarsen/nltk/tree/feature/citation according to this CITATION.cff:
Again, these are nearly identical, with the same difference in the publisher. Beyond that, the order of the fields differs, which should not matter. The citation key used to reference each entry also differs, but that's not a big deal either.
For context, this is the citation currently suggested in the README:
(Note the different author order and the shortened title.)
Note
This entire PR assumes that we still want citations to reference the book rather than the software.