Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize HpackStaticTable by using a perfect hash function #12713

Merged
merged 8 commits into from Oct 3, 2022

Conversation

amirhadadi
Copy link
Contributor

@amirhadadi amirhadadi commented Aug 18, 2022

Motivation:

HpackStaticTable performance can be improved by using a perfect hash function.

Modifications:

Use 2 tables, one for mapping header name -> index and one for mapping header name + value -> index.
Choose the tables size and the hash function in such a way that each hash bucket contains a single entry.

Results:

Benchmark (optimized) Mode Cnt Score Error Units
HpackStaticTableBenchmark.lookupHttp false avgt 10 15.998 ±1.646 ns/op
HpackStaticTableBenchmark.lookupHttp true avgt 10 10.457 ±0.274 ns/op
HpackStaticTableBenchmark.lookupHttps false avgt 10 20.942 ±1.365 ns/op
HpackStaticTableBenchmark.lookupHttps true avgt 10 10.618 ±0.138 ns/op
HpackStaticTableBenchmark.lookupNameOnlyMatchEndTable false avgt 10 13.710 ±0.273 ns/op
HpackStaticTableBenchmark.lookupNameOnlyMatchEndTable true avgt 10 3.156 ±0.052 ns/op
HpackStaticTableBenchmark.lookupNoNameMatch false avgt 10 3.528 ±0.047 ns/op
HpackStaticTableBenchmark.lookupNoNameMatch true avgt 10 3.145 ±0.031 ns/op

Caveats:

This implementation couples HpackStaticTable implementation to the implementation of AsciiString.hashCode, relying on the values it returns for the static table headers to yield a perfect hash function. If AsciiString.hashCode implementation changes, HpackStaticTable implementation will also need to change. Moreover, if AsciiString.hashCode can return different values on different jvm instances (for reasons other than endianness), then it invalidates the approach taken here (or at least makes its implementation much more complex).

Motivation:

HpackStaticTable performance can be improved by using a perfect hash function.

Modifications:

Use 2 tables, one for mapping header name -> index and one for mapping header name + value -> index.
Choose the tables and the hash function in such a way that each entry maps to a single hash bucket.

Results:

Benchmark                                      (optimized)  Mode  Cnt   Score   Error  Units
HpackStaticTableBenchmark.lookupHttp                 false  avgt   10  15.998 ± 1.646  ns/op
HpackStaticTableBenchmark.lookupHttp                  true  avgt   10  10.457 ± 0.274  ns/op
HpackStaticTableBenchmark.lookupHttps                false  avgt   10  20.942 ± 1.365  ns/op
HpackStaticTableBenchmark.lookupHttps                 true  avgt   10  10.618 ± 0.138  ns/op
HpackStaticTableBenchmark.lookupNameOnlyMatch        false  avgt   10  13.710 ± 0.273  ns/op
HpackStaticTableBenchmark.lookupNameOnlyMatch         true  avgt   10   3.156 ± 0.052  ns/op
HpackStaticTableBenchmark.lookupNoNameMatch          false  avgt   10   3.528 ± 0.047  ns/op
HpackStaticTableBenchmark.lookupNoNameMatch           true  avgt   10   3.145 ± 0.031  ns/op

Caveats:

This implementation couples HpackStaticTable implementation to the implementation of AsciiString.hashCode, relying on the values it returns for the static table headers to yield a perfect hash function. If AsciiString.hashCode implementation changes, HpackStaticTable implementation will also need to change. Moreover, if AsciiString.hashCode can return different values on different platforms (maybe due to endianness) or in general in different jvm instances, then it invalidates the approach taken here (or at least makes its implementation much more complex).
Analyzing HpackEncoder code, we cannot reach the relevant HpackStaticTable methods with null name or value.
@chrisvest
Copy link
Contributor

I don't think AsciiString.hashCode has any dependence on endianness. The computed value should be the same on all platforms. If you find that it isn't, then that's a bug.

@amirhadadi
Copy link
Contributor Author

@chrisvest if you execute PlatformDependent.hashCodeAsciiSafe(new byte[]{1,2}, 0, 2) and switch the endianness in PlatformDependent by modifying PlatformDependent.BIG_ENDIAN_NATIVE_ORDER you get:

BIG_ENDIAN_NATIVE_ORDER = true => PlatformDependent.hashCodeAsciiSafe(new byte[]{1,2}, 0, 2) == -1008832313

BIG_ENDIAN_NATIVE_ORDER = false => PlatformDependent.hashCodeAsciiSafe(new byte[]{1,2}, 0, 2) == -1008832058

The reason is that the unsafe implementation PlatformDependent0.hashCodeAscii uses UNSAFE.getShort/Int/Long which depend on endianness, and the safe implementation should give the same results.

I'm sure @Scottmitch can explain further, as he implemented that.

BTW, are there any CI builds running on big endian machines?

@chrisvest
Copy link
Contributor

@amirhadadi Did you run that on a BE system? You can't just change PlatformDependent.BIG_ENDIAN_NATIVE_ORDER to true if the system isn't actually BE.

But no, I don't think we have any CI jobs that run BE.

@amirhadadi
Copy link
Contributor Author

@chrisvest originally I didn't test it on a BE system, but after you asked I managed to use the approach outlined here to test it on an emulated s390x architecture.
I verified:

  1. That PlatformDependent.BIG_ENDIAN_NATIVE_ORDER == true.
  2. That PlatformDependent0.hashCodeAscii(new byte[]{1,2}, 0, 2) == -1008832313.
  3. That HpackStaticTableTest is passing.

Regarding

You can't just change PlatformDependent.BIG_ENDIAN_NATIVE_ORDER to true if the system isn't actually BE.

You are 100% right that just flipping this flag will not emulate a big endian system, but I wanted to check something very specific. I only wanted to check what values PlatformDependent.hashCodeAsciiSafe would return for a big endian platform. This boils down to code such as:

private static short getShortSafe(byte[] bytes, int offset) {
        if (BIG_ENDIAN_NATIVE_ORDER) {
            return (short) (bytes[offset] << 8 | (bytes[offset + 1] & 0xff));
        }
        return (short) (bytes[offset] & 0xff | (bytes[offset + 1] << 8));
}

So it at least appears to seem correct to just flip BIG_ENDIAN_NATIVE_ORDER and call PlatformDependent.hashCodeAsciiSafe.

@normanmaurer
Copy link
Member

@chrisvest so you are happy with that ?

@normanmaurer
Copy link
Member

@ejona86 can you check as well ?

Copy link
Member

@normanmaurer normanmaurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
@chrisvest
Copy link
Contributor

@normanmaurer yeah, I'm happy with the reply. I'll make a note to fix the hashcode so BE computes the same as LE.

@normanmaurer
Copy link
Member

I would love to have @ejona86 to at least double check as well

@chrisvest
Copy link
Contributor

Let's merge this

@chrisvest chrisvest merged commit 2ed95c9 into netty:4.1 Oct 3, 2022
@chrisvest chrisvest added this to the 4.1.83.Final milestone Oct 3, 2022
chrisvest pushed a commit that referenced this pull request Oct 3, 2022
Motivation:

HpackStaticTable performance can be improved by using a perfect hash function.

Modifications:

Use 2 tables, one for mapping header name -> index and one for mapping header name + value -> index.
Choose the tables and the hash function in such a way that each entry maps to a single hash bucket.

Results:

Benchmark                                      (optimized)  Mode  Cnt   Score   Error  Units
HpackStaticTableBenchmark.lookupHttp                 false  avgt   10  15.998 ± 1.646  ns/op
HpackStaticTableBenchmark.lookupHttp                  true  avgt   10  10.457 ± 0.274  ns/op
HpackStaticTableBenchmark.lookupHttps                false  avgt   10  20.942 ± 1.365  ns/op
HpackStaticTableBenchmark.lookupHttps                 true  avgt   10  10.618 ± 0.138  ns/op
HpackStaticTableBenchmark.lookupNameOnlyMatch        false  avgt   10  13.710 ± 0.273  ns/op
HpackStaticTableBenchmark.lookupNameOnlyMatch         true  avgt   10   3.156 ± 0.052  ns/op
HpackStaticTableBenchmark.lookupNoNameMatch          false  avgt   10   3.528 ± 0.047  ns/op
HpackStaticTableBenchmark.lookupNoNameMatch           true  avgt   10   3.145 ± 0.031  ns/op

Caveats:

This implementation couples HpackStaticTable implementation to the implementation of AsciiString.hashCode, relying on the values it returns for the static table headers to yield a perfect hash function. If AsciiString.hashCode implementation changes, HpackStaticTable implementation will also need to change. Moreover, if AsciiString.hashCode can return different values on different platforms (maybe due to endianness) or in general in different jvm instances, then it invalidates the approach taken here (or at least makes its implementation much more complex).

Co-authored-by: ahadadi <ahadadi@outbrain.com>
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
@chrisvest chrisvest modified the milestones: 4.1.83.Final, 5.0.0.Alpha6 Oct 3, 2022
@normanmaurer
Copy link
Member

@amirhadadi thanks for all the hard work on this one !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants