Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate feasibility of UTF8StreamJsonParser without canonicalization #994

Open
carterkozak opened this issue Apr 21, 2023 · 1 comment

Comments

@carterkozak
Copy link
Contributor

This is a more specific follow-up from one of the components of FasterXML/jackson-benchmarks#6

I'd like to determine whether this comment still holds true, initial benchmarking showed a fairly substantial improvement for small inputs, and the UTF8StreamJsonParser has optimizations outside of keys that I'd expect to tip the scales in its favor.

if (enc == JsonEncoding.UTF8) {
/* and without canonicalization, byte-based approach is not performant; just use std UTF-8 reader
* (which is ok for larger input; not so hot for smaller; but this is not a common case)
*/
if (JsonFactory.Feature.CANONICALIZE_FIELD_NAMES.enabledIn(factoryFeatures)) {
ByteQuadsCanonicalizer can = rootByteSymbols.makeChild(factoryFeatures);
return new UTF8StreamJsonParser(_context, parserFeatures, _in, codec, can,
_inputBuffer, _inputPtr, _inputEnd, bytesProcessed, _bufferRecyclable);
}
}
return new ReaderBasedJsonParser(_context, parserFeatures, constructReader(), codec,
rootCharSymbols.makeChild(factoryFeatures));

carterkozak added a commit to carterkozak/jackson-core that referenced this issue Apr 21, 2023
…icalization

Previously, the ReaderBasedJsonParser was used instead, which
is less performant when reading from an InputStream (and handling
charset decoding in addition to json parsing).

This commit updates the JsonFactory factory methods to respect
the canonicalization configuration, where previously a canonicalizing
implementaiton was always used.

I have added guards around both `_symbols.addName` and
`_symbols.findName` based on the existing implementation from
`SmileParser`. For correctness, only the guards around `addName`
are required, but we avoid unnecessary hashing by guarding
both.
@cowtowncoder
Copy link
Member

+1 for this; hopefully for 2.16 timeframe (I saw PR).

carterkozak added a commit to carterkozak/jackson-core that referenced this issue Apr 24, 2023
…icalization

Previously, the ReaderBasedJsonParser was used instead, which
is less performant when reading from an InputStream (and handling
charset decoding in addition to json parsing).

This commit updates the JsonFactory factory methods to respect
the canonicalization configuration, where previously a canonicalizing
implementaiton was always used.

I have added guards around both `_symbols.addName` and
`_symbols.findName` based on the existing implementation from
`SmileParser`. For correctness, only the guards around `addName`
are required, but we avoid unnecessary hashing by guarding
both.
carterkozak added a commit to carterkozak/jackson-core that referenced this issue Apr 24, 2023
…icalization

Previously, the ReaderBasedJsonParser was used instead, which
is less performant when reading from an InputStream (and handling
charset decoding in addition to json parsing).

This commit updates the JsonFactory factory methods to respect
the canonicalization configuration, where previously a canonicalizing
implementaiton was always used.

I have added guards around both `_symbols.addName` and
`_symbols.findName` based on the existing implementation from
`SmileParser`. For correctness, only the guards around `addName`
are required, but we avoid unnecessary hashing by guarding
both.
carterkozak added a commit to carterkozak/jackson-core that referenced this issue May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants