Add json "strict" parameter to CoreNLP #2993

Merged 1 commit on May 13, 2022

Conversation

james-huang (Contributor) commented on May 4, 2022

This allows the (optional) processing of text control characters without raising errors.

Attempting to process text with control characters like the vertical tab \x0b/\v causes the following error:

from nltk.parse.corenlp import CoreNLPParser
CORENLP_PARSER = CoreNLPParser(url='http://localhost:9000/')

CORENLP_PARSER.api_call(
    'Hello\x0bWorld!',
    properties={
        'annotators': 'ssplit',
        'tokenize.language': 'en',
    }
)

# JSONDecodeError
# Invalid control character at: line x column y (char z)

A simple fix is to allow the response to be decoded even when it does not follow the strict JSON spec.
This is done by passing strict=False to the JSON decoder [1][2][3], which then accepts raw control characters inside strings.
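
The effect of the flag can be reproduced without a CoreNLP server, using only the standard library json module. The snippet below is a minimal sketch, not the code in this commit: the payload string is a made-up stand-in for a response body that contains a raw vertical tab, and decode is a hypothetical helper.

```python
import json

# Hypothetical stand-in for a response body that contains a raw
# (unescaped) vertical tab inside a JSON string value.
payload = '{"word": "Hello\x0bWorld!"}'


def decode(text, strict=True):
    """Decode JSON text; strict=False allows raw control characters in strings."""
    return json.loads(text, strict=strict)


# Strict decoding (the default) rejects the control character.
try:
    decode(payload)
    strict_error = None
except json.JSONDecodeError as err:
    strict_error = err.msg

# Non-strict decoding accepts it and keeps the character in the result.
lenient = decode(payload, strict=False)

print(strict_error)
print(repr(lenient["word"]))
```

With strict left at its default the decoder raises JSONDecodeError, while strict=False returns the string with the vertical tab intact.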

CoreNLP deals in strings and doesn't really care about the JSON spec.
The commit keeps strict decoding as the default, so it maintains backwards compatibility.
Maybe there is even an argument to be made for making the default non-strict?
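
One way to picture the backwards-compatible default is a client that takes the flag in its constructor and forwards it to the decoder. This is only an illustrative sketch: the class JsonApiClient, its strict_json parameter, and parse_body are hypothetical names, not NLTK's actual API.

```python
import json


class JsonApiClient:
    """Hypothetical client wrapper; names are illustrative, not NLTK's API."""

    def __init__(self, strict_json=True):
        # Defaulting to True keeps the previous behaviour for existing callers.
        self.strict_json = strict_json

    def parse_body(self, body):
        # Forward the flag to the standard-library JSON decoder.
        return json.loads(body, strict=self.strict_json)


# Made-up response body containing a raw vertical tab.
body = '{"text": "Hello\x0bWorld!"}'

# Existing callers that pass nothing still get strict decoding.
default_client = JsonApiClient()
try:
    default_client.parse_body(body)
    default_raises = False
except json.JSONDecodeError:
    default_raises = True

# Callers that opt in can decode text with control characters.
opt_in_client = JsonApiClient(strict_json=False)
parsed = opt_in_client.parse_body(body)
```

Flipping the default to non-strict would be a one-word change in the constructor signature, but it would also silently change behaviour for callers that rely on the error.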

[1] https://docs.python-requests.org/en/latest/api/#requests.Response.json
[2] https://docs.python.org/3/library/json.html#json.loads
[3] https://docs.python.org/3/library/json.html#json.JSONDecoder

stevenbird merged commit 11d36fb into nltk:develop on May 13, 2022
stevenbird (Member) commented:

Thanks @james-huang

tomaarsen added a commit to tomaarsen/nltk that referenced this pull request Sep 1, 2022