Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-984: Save the software version that wrote each ORC file. #904

Merged
merged 6 commits into from
Sep 12, 2021

Conversation

omalley
Copy link
Contributor

@omalley omalley commented Sep 9, 2021

What changes were proposed in this pull request?

I add a string to the file footer that records the version that wrote the file. We already had recorded the implementation that wrote the file as Footer.writer. I added a method that combines these two fields in the reader to produce a user facing string that describes the software version.

I also add a field to the meta data tool to show the version that wrote the file.

Because of that change and the fact that the files change size based on whether the ORC version is a snapshot or not, I had to extend the tests for TestFileDump to allow some slop for the size and to ignore the file version.

How was this patch tested?

It passes the unit tests after I updated the tools tests.

@omalley
Copy link
Contributor Author

omalley commented Sep 9, 2021

The software version in the API and tool shows up as "ORC Java 1.8.0-SNAPSHOT". When run from IntelliJ the unit tests can't find the software version, so it comes in "ORC Java unknown".

@omalley
Copy link
Contributor Author

omalley commented Sep 9, 2021

I haven't updated the C++ reader, writer, and tool yet.

@dongjoon-hyun
Copy link
Member

BTW, according to GitHub Action jobs, all Java11+ seems to complain for some reasons.

@guiyanakuang
Copy link
Member

guiyanakuang commented Sep 9, 2021

BTW, according to GitHub Action jobs, all Java11+ seems to complain for some reasons.

This should be a bug in findbugs. Java 11 compiles try-with-resources to generate bytecode differently than java8. Findbugs analyze bytecode matche RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE.
spotbugs/spotbugs#259

@omalley
Copy link
Contributor Author

omalley commented Sep 10, 2021

Ok, the changes for C++ were relatively straightforward. I did update the C++ with the new writer versions, such as the important ORC-14.

I updated the orc_split_elim_new.orc so that it was written by the 1.8.0-SNAPSHOT. I also added orc_split_elim_cpp.orc that is the same file written by c++. The Java and C++ tools show the files as having being written by the corresponding versions.

@omalley
Copy link
Contributor Author

omalley commented Sep 10, 2021

BTW, to generate the new c++ ORC file, I wrote a little utility that takes an ORC file and rewrites it using c++. It is in https://github.com/omalley/orc/tree/cpp-convert .

@dongjoon-hyun
Copy link
Member

So, everything is ready, right, @omalley ?

c++/src/Common.cc Outdated Show resolved Hide resolved
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @omalley .
Merged to main/1.7/1.6/1.5 according to the PR content.

@dongjoon-hyun dongjoon-hyun merged commit cf720d7 into apache:main Sep 12, 2021
dongjoon-hyun pushed a commit that referenced this pull request Sep 12, 2021
I add a string to the file footer that records the version that wrote the file. We already had recorded the implementation that wrote the file as Footer.writer. I added a method that combines these two fields in the reader to produce a user facing string that describes the software version.

I also add a field to the meta data tool to show the version that wrote the file.

Because of that change and the fact that the files change size based on whether the ORC version is a snapshot or not, I had to extend the tests for TestFileDump to allow some slop for the size and to ignore the file version.

It passes the unit tests after I updated the tools tests.

(cherry picked from commit cf720d7)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Sep 12, 2021
I add a string to the file footer that records the version that wrote the file. We already had recorded the implementation that wrote the file as Footer.writer. I added a method that combines these two fields in the reader to produce a user facing string that describes the software version.

I also add a field to the meta data tool to show the version that wrote the file.

Because of that change and the fact that the files change size based on whether the ORC version is a snapshot or not, I had to extend the tests for TestFileDump to allow some slop for the size and to ignore the file version.

It passes the unit tests after I updated the tools tests.

(cherry picked from commit cf720d7)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 7e3d557)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Sep 12, 2021
I add a string to the file footer that records the version that wrote the file. We already had recorded the implementation that wrote the file as Footer.writer. I added a method that combines these two fields in the reader to produce a user facing string that describes the software version.

I also add a field to the meta data tool to show the version that wrote the file.

Because of that change and the fact that the files change size based on whether the ORC version is a snapshot or not, I had to extend the tests for TestFileDump to allow some slop for the size and to ignore the file version.

It passes the unit tests after I updated the tools tests.

(cherry picked from commit cf720d7)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 7e3d557)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 2aa9b06)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants