prefer use of java.util.zip.CRC32C #269

Merged
merged 1 commit into from Jan 20, 2021

bokken (Contributor) commented Dec 1, 2020

Java 9 introduced a JRE implementation of CRC32C: https://docs.oracle.com/javase/9/docs/api/java/util/zip/CRC32C.html
The HotSpot JVM also includes intrinsics that apply native optimizations to it. According to this issue [1], on a recent JDK (released within the last 2 years) the non-intrinsified implementation is about 2x faster than the Hadoop pure-Java implementation, which is what PureJavaCrc32C is based on. The intrinsified implementation is another 2-3x faster than that.

[1] - https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8191328

xerial (Owner) commented Dec 2, 2020

Interesting. Is this only available on Java 9 or later? My concern is that some Spark users are still on JDK 8 (e.g., Databricks). That said, SnappyFramedFormat is not used by Parquet or by Spark itself, so this optimization should be safe as long as we can still produce JDK 8 compatible jar files.

bokken (Contributor, Author) commented Dec 2, 2020

I still use some JDK 8 environments as well. The change uses java.util.zip.CRC32C if it is present and falls back to the existing PureJavaCrc32C if it is not.
So anyone running JDK 9 or newer gets the benefit, and those on JDK 8 continue to work as before.
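
For illustration, a minimal sketch of this kind of runtime fallback. It is not the code from this PR: the class and method names (`Crc32cSelector`, `newCrc32c`, `BitwiseCrc32c`) are hypothetical, and the simple bitwise fallback only stands in for PureJavaCrc32C so that the example is self-contained. The JRE class is looked up reflectively so that the code still compiles and loads on JDK 8.

```java
import java.util.zip.Checksum;

final class Crc32cSelector {

    /**
     * Hypothetical stand-in for the pure-Java fallback: a slow but simple
     * bitwise CRC-32C (Castagnoli) that only needs the JDK 8 Checksum methods.
     */
    static final class BitwiseCrc32c implements Checksum {
        // Castagnoli polynomial in reflected (LSB-first) form.
        private static final int POLY = 0x82F63B78;
        private int crc = 0xFFFFFFFF;

        @Override
        public void update(int b) {
            crc ^= (b & 0xFF);
            for (int i = 0; i < 8; i++) {
                // Shift right; XOR in the polynomial when the dropped bit was 1.
                crc = (crc >>> 1) ^ (POLY & -(crc & 1));
            }
        }

        @Override
        public void update(byte[] buf, int off, int len) {
            for (int i = off; i < off + len; i++) {
                update(buf[i]);
            }
        }

        @Override
        public long getValue() {
            return (~crc) & 0xFFFFFFFFL;
        }

        @Override
        public void reset() {
            crc = 0xFFFFFFFF;
        }
    }

    /**
     * Returns java.util.zip.CRC32C when the running JRE provides it (Java 9+),
     * otherwise the pure-Java fallback.
     */
    static Checksum newCrc32c() {
        try {
            Class<?> jreCrc32c = Class.forName("java.util.zip.CRC32C");
            return (Checksum) jreCrc32c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class not present (JDK 8) or not loadable: keep the old behavior.
            return new BitwiseCrc32c();
        }
    }
}
```

Callers would obtain a `Checksum` from `Crc32cSelector.newCrc32c()` and call `update(bytes, 0, bytes.length)` followed by `getValue()`. Both implementations should return the standard CRC-32C check value 0xE3069283 for the ASCII bytes of "123456789". A real implementation would likely cache the lookup result (for example, the constructor) rather than repeat the reflective lookup on every call.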

bokken (Contributor, Author) commented Dec 2, 2020

The comparison/diff looks normal if you ignore whitespace. I am guessing something was off with the line endings.

xerial self-assigned this Dec 7, 2020
xerial merged commit 822513d into xerial:master Jan 20, 2021