Allowed properties overloading for CoreNLPParser tag, Authors.md #2789

BatMrE · 2021-08-29T12:27:40Z

This is a fix for #2112.
It adds support for overloading the properties in code, as per the standards.

before fix : [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111\xa01111\xa01111', 'O')]
after fix : [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111', 'DATE'), ('1111', 'DATE'), ('1111', 'DATE')]

tomaarsen · 2021-08-30T08:06:49Z

For those interested in the reasoning behind the changes in this PR, feel free to have a look at #2112 itself, where all the changes are already described and laid out.

dimazest · 2021-08-31T00:20:54Z

nltk/parse/corenlp.py

@@ -339,9 +339,11 @@ def tag_sents(self, sentences):
        """
        # Converting list(list(str)) -> list(str)
        sentences = (" ".join(words) for words in sentences)
-        return [sentences[0] for sentences in self.raw_tag_sents(sentences)]
+        if properties is None:


Please add a description of the properties argument to the docstring. Also mention that "tokenize.whitespace" is set to "true" by default, but that it won't be set if properties are specified.
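A minimal sketch of what the requested docstring and default handling might look like. The helper name `resolve_properties` is hypothetical and not part of NLTK; only the `"tokenize.whitespace": "true"` default comes from this PR. `"ssplit.eolonly"` is used purely as an example of a caller-supplied CoreNLP property.

```python
def resolve_properties(properties=None):
    """Return the CoreNLP properties to send with a tagging request.

    :param properties: optional dict of CoreNLP properties. If given, it is
        used as-is, so "tokenize.whitespace" is NOT set unless the caller
        includes it explicitly.
    :return: the caller's dict, or {"tokenize.whitespace": "true"} when
        ``properties`` is None.
    """
    if properties is None:
        # Default: the input is already tokenized, so split on whitespace only.
        properties = {"tokenize.whitespace": "true"}
    return properties


default_props = resolve_properties()
custom_props = resolve_properties({"ssplit.eolonly": "true"})
```

Under this sketch, a caller who passes their own properties and still wants whitespace tokenization has to include `"tokenize.whitespace"` themselves, which is the behaviour the docstring should spell out.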

dimazest · 2021-08-31T00:22:44Z

before fix : [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111\xa01111\xa01111', 'O')]
after fix : [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111', 'DATE'), ('1111', 'DATE'), ('1111', 'DATE')]

would it be possible to make a test out of this example?

BatMrE · 2021-09-02T12:41:34Z

before fix : [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111\xa01111\xa01111', 'O')]
after fix : [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111', 'DATE'), ('1111', 'DATE'), ('1111', 'DATE')]

would it be possible to make a test out of this example?

Sure. I'm a bit busy, but I'll add it over the weekend.

BatMrE · 2021-09-05T17:52:58Z

before fix : [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111\xa01111\xa01111', 'O')]
after fix : [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111', 'DATE'), ('1111', 'DATE'), ('1111', 'DATE')]

would it be possible to make a test out of this example?

@tomaarsen @dimazest
Any idea how to create the 'api_return_value' dictionary for a sample, like the one in:
https://github.com/nltk/nltk/blob/develop/nltk/test/unit/test_corenlp.py#L261

dimazest · 2021-09-07T19:10:53Z

You should be able to call the CoreNLP REST API manually and intercept the returned result.
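One way to do this, sketched with the standard library only: build the POST request by hand, send it while a CoreNLP server is running, and save the decoded JSON as the sample. The helper name `build_corenlp_request`, the annotator list, and the `http://localhost:9000` address are assumptions for illustration; adjust them to your setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_corenlp_request(text, properties, url="http://localhost:9000"):
    """Build (but do not send) a POST request for a CoreNLP server."""
    # The properties travel JSON-encoded in the query string; the text is the body.
    query = urlencode({"properties": json.dumps(properties)})
    return Request(f"{url}/?{query}", data=text.encode("utf-8"), method="POST")


request = build_corenlp_request(
    "my phone number is 1111 1111 1111",
    {
        "annotators": "tokenize,ssplit,ner",
        "outputFormat": "json",
        "tokenize.whitespace": "true",
    },
)

# With a server running, the response can be captured like this:
#   from urllib.request import urlopen
#   api_return_value = json.load(urlopen(request))
```

The captured dictionary can then be pasted into the unit test as the mocked return value.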

tomaarsen · 2021-09-07T20:09:14Z

Keep in mind that the CoreNLP tests are being skipped, because our GitHub Actions CI doesn't currently support downloading third party tools such as the Stanford models. Preferably we would still have tests for this PR though, as we likely want to support third party tools eventually.

dimazest · 2021-09-07T20:11:05Z

The actual call to CoreNLP should be mocked; then the tests won't need CoreNLP running.
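A rough sketch of that approach using `unittest.mock`. `TaggerSketch` is a hypothetical stand-in, not the NLTK class, and the response dictionary is trimmed to the fields the sketch reads (a real CoreNLP response carries many more fields per token). The expected tags are taken from the "after fix" output in the PR description.

```python
from unittest.mock import MagicMock


class TaggerSketch:
    """Hypothetical stand-in for a CoreNLP-backed tagger."""

    def api_call(self, text, properties=None):
        raise RuntimeError("would contact a CoreNLP server")

    def tag(self, words, properties=None):
        if properties is None:
            properties = {"tokenize.whitespace": "true"}
        response = self.api_call(" ".join(words), properties=properties)
        return [
            (token["word"], token["ner"])
            for sentence in response["sentences"]
            for token in sentence["tokens"]
        ]


expected = [
    ("my", "O"), ("phone", "O"), ("number", "O"), ("is", "O"),
    ("1111", "DATE"), ("1111", "DATE"), ("1111", "DATE"),
]

# Trimmed sample response, shaped like CoreNLP's JSON output.
api_return_value = {
    "sentences": [
        {"index": 0, "tokens": [{"word": w, "ner": t} for w, t in expected]}
    ]
}

tagger = TaggerSketch()
tagger.api_call = MagicMock(return_value=api_return_value)  # no server needed
result = tagger.tag("my phone number is 1111 1111 1111".split())
```

Because `api_call` is replaced by a mock, the test can also assert which properties were passed, which is the behaviour this PR changes.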

tomaarsen · 2021-10-12T09:10:44Z

Since #2820 there are automated tests for CoreNLP. We would like to see some tests to back up these changes, if possible.

tomaarsen · 2021-10-26T12:23:06Z

nltk/parse/corenlp.py

@@ -339,9 +339,11 @@ def tag_sents(self, sentences):
        """
        # Converting list(list(str)) -> list(str)
        sentences = (" ".join(words) for words in sentences)
-        return [sentences[0] for sentences in self.raw_tag_sents(sentences)]
+        if properties is None:
+            properties = {"tokenize.whitespace": "true"}


I've merged develop into this PR. Because the CoreNLP tests now run on the CI, we can see that some tests fail due to this change.

BatMrE · 2021-11-28

@tomaarsen Apart from the automated tests for CoreNLP, is there any other test that needs to be added?

tomaarsen · 2021-11-28

The PR should have some tests showing that the new behaviour works as intended, so you only need to add tests for the functionality that you added, that is, just for these functions.

stevenbird · 2022-07-05T12:30:05Z

@BatMrE: just wondering if there's interest in wrapping this up. I think it's just a question of adding a test or two.

Allowed properties overloading for CoreNLPParser tag, Authors.md

bbbb697

dimazest reviewed Aug 31, 2021

tomaarsen added enhancement parsing labels Sep 12, 2021

Merge branch 'develop' into hotfix/CoreNLPParser-tag

e8735d6

tomaarsen requested changes Oct 26, 2021


tomaarsen Oct 26, 2021

BatMrE Nov 28, 2021

tomaarsen Nov 28, 2021
