Can UTF8StreamJsonParser#finishString be made faster with VarHandles? #929

Open
jpountz opened this issue Feb 27, 2023 · 2 comments

@jpountz

jpountz commented Feb 27, 2023

I was looking at UTF8StreamJsonParser#finishString, which seems to consist mostly of scanning for a trailing quote, with an optimized code path for ASCII strings. This ASCII code path could be further optimized by using VarHandles to compare multiple bytes at once.
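
For readers unfamiliar with the API: `MethodHandles.byteArrayViewVarHandle` lets a `byte[]` be read as if it were a `long[]` at any byte offset, so a single call loads 8 consecutive bytes. A minimal sketch, assuming Java 9+; `LongView` and `read8` are illustrative names, not Jackson code:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class LongView {
    // Views a byte[] as little-endian longs; plain get() also works at misaligned offsets.
    private static final VarHandle VH_LE_LONG =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Loads bytes [offset, offset + 8) as one long; the byte at offset lands in the lowest 8 bits. */
    static long read8(byte[] buf, int offset) {
        // VarHandle.get is signature-polymorphic: the (long) cast fixes the call-site return type.
        return (long) VH_LE_LONG.get(buf, offset);
    }

    public static void main(String[] args) {
        byte[] buf = "xabcdefgh".getBytes(StandardCharsets.US_ASCII);
        long word = read8(buf, 1); // offset 1 is not 8-byte aligned
        for (int i = 0; i < Long.BYTES; i++) {
            // Extract byte i of the word, lowest byte first.
            System.out.print((char) ((word >>> (8 * i)) & 0xFF));
        }
        System.out.println();
    }
}
```

Running `main` prints `abcdefgh`: with little-endian order, the first byte in memory becomes the least significant byte of the long.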

The current code looks like this:

        while (ptr < max) {
            int c = (int) inputBuffer[ptr] & 0xFF;
            if (codes[c] != 0) {
                if (c == INT_QUOTE) {
                    _inputPtr = ptr+1;
                    _textBuffer.setCurrentLength(outPtr);
                    return;
                }
                break;
            }
            ++ptr;
            outBuf[outPtr++] = (char) c;
        }

and could become something like:

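import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

private static final VarHandle VH_LE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

[...]

        while (ptr <= max - Long.BYTES) {
            // VarHandle.get is signature-polymorphic; the (long) cast makes it return a primitive long
            long next8Bytes = (long) VH_LE_LONG.get(inputBuffer, ptr);
            if ((next8Bytes & 0x8080808080808080L) != 0) {
                // At least one of the bytes has the high bit set, so this is not a pure ASCII string
                break;
            }
            // Implement hasValue (and the hypothetical helper hasLess) based on https://graphics.stanford.edu/~seander/bithacks.html#ValueInWord
            // Besides quotes, backslashes and control characters (< 0x20) also have codes[c] != 0
            // and must take the slow path, otherwise escape sequences would be copied verbatim
            if (hasValue(next8Bytes, INT_QUOTE) || hasValue(next8Bytes, INT_BACKSLASH)
                    || hasLess(next8Bytes, 0x20)) {
                // one of the next 8 bytes needs special handling; let the loop below find it
                break;
            }
            // Maybe this loop can become unnecessary via https://github.com/FasterXML/jackson-core/issues/910
            for (int i = 0; i < Long.BYTES; ++i) {
              outBuf[outPtr + i] = (char) inputBuffer[ptr + i];
            }
            ptr += Long.BYTES;
            outPtr += Long.BYTES;
        }
        while (ptr < max) {
            int c = (int) inputBuffer[ptr] & 0xFF;
            if (codes[c] != 0) {
                if (c == INT_QUOTE) {
                    _inputPtr = ptr+1;
                    _textBuffer.setCurrentLength(outPtr);
                    return;
                }
                break;
            }
            ++ptr;
            outBuf[outPtr++] = (char) c;
        }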
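
The proposal leaves `hasValue` to be implemented from the bithacks page. A minimal, self-contained sketch of those word-at-a-time (SWAR) helpers and the combined fast path, assuming Java 9+ and assuming the bytes that need the slow path are `"`, `\`, control characters below 0x20, and anything at or above 0x80 (roughly what Jackson's `codes` table flags for UTF-8 input). `SwarAsciiScan`, `hasZeroByte`, `hasLess` and `copyAsciiRun` are hypothetical names, not Jackson API, and `out` must hold at least `end - start` chars:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class SwarAsciiScan {
    private static final VarHandle VH_LE_LONG =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private static final long ONES = 0x0101010101010101L;  // 0x01 in every byte
    private static final long HIGHS = 0x8080808080808080L; // 0x80 in every byte

    /** True if any byte of v is 0x00 (bithacks "haszero"). */
    static boolean hasZeroByte(long v) {
        return ((v - ONES) & ~v & HIGHS) != 0;
    }

    /** True if any byte of v equals b, for 0 <= b <= 255 (bithacks "hasvalue"). */
    static boolean hasValue(long v, int b) {
        // XOR turns every byte equal to b into 0x00.
        return hasZeroByte(v ^ (ONES * b));
    }

    /** True if any byte of v is < n; only valid if every byte of v is < 0x80 and 0 <= n <= 128 (bithacks "hasless"). */
    static boolean hasLess(long v, int n) {
        return ((v - ONES * n) & ~v & HIGHS) != 0;
    }

    /** Copies plain ASCII from in[start, end) into out; returns the index of the first byte needing the slow path, or end. */
    static int copyAsciiRun(byte[] in, int start, int end, char[] out) {
        int ptr = start;
        int outPtr = 0;
        // Fast path: test 8 bytes per iteration.
        while (ptr <= end - Long.BYTES) {
            long word = (long) VH_LE_LONG.get(in, ptr);
            if ((word & HIGHS) != 0          // some byte is non-ASCII (checked first: hasLess needs ASCII)
                    || hasValue(word, '"')
                    || hasValue(word, '\\')
                    || hasLess(word, 0x20)) {
                break; // the scalar loop below locates the exact byte
            }
            for (int i = 0; i < Long.BYTES; ++i) {
                out[outPtr + i] = (char) in[ptr + i];
            }
            ptr += Long.BYTES;
            outPtr += Long.BYTES;
        }
        // Scalar tail: the last < 8 bytes, or the word that stopped the fast path.
        while (ptr < end) {
            int c = in[ptr] & 0xFF;
            if (c >= 0x80 || c < 0x20 || c == '"' || c == '\\') {
                break;
            }
            out[outPtr++] = (char) c;
            ++ptr;
        }
        return ptr;
    }

    public static void main(String[] args) {
        byte[] in = "plain ASCII text\\n".getBytes(StandardCharsets.US_ASCII);
        char[] out = new char[in.length];
        int stop = copyAsciiRun(in, 0, in.length, out);
        System.out.println(stop + ": " + new String(out, 0, stop));
    }
}
```

`main` prints `16: plain ASCII text`, stopping at the backslash. Bailing out of the fast path as soon as a word contains any special byte keeps the bit tricks simple, since the scalar loop then finds the exact position; whether this actually beats the byte-by-byte loop is what a jmh benchmark would need to show.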

Since VarHandles were introduced in Java 9, this would require either releasing a multi-release (MR) JAR or raising the minimum required Java version.

I haven't had a chance to measure whether it makes a significant difference, but wanted to log the idea in case it catches someone's attention.

@pjfanning
Member

  • looks like an interesting idea
  • jackson-core is an MR jar in the 2.15 branch (2.14 and earlier are not MR jars); this changed only recently
  • Jackson is a volunteer project, so it would be best if someone volunteered to drive this
  • anyone using jmh to benchmark this should be careful to enable the MR setting in the local jar; https://github.com/pjfanning/jackson-number-parse-bench does this

@cowtowncoder
Member

Sounds interesting, although I would not be too excited about multi-release jars (I know there's a bit of that already for FastDoubleParser, but that's just merged in and not maintained as part of jackson-core).

When (... if) we finally get back to Jackson 3.0 work, though, the baseline is probably going to be raised past Java 8 (at least to 11, but I suspect by then 17 would be a better choice), and this would become a moot issue.

But as per @pjfanning's comments, what would really be interesting is jmh benchmarking.
