Skip to content

Commit

Permalink
[FEDE-4869] Upgrade Arrow version to 4.0.0 (#16)
Browse files Browse the repository at this point in the history
* Make BaseValueVector#MAX_ALLOCATION_SIZE configurable

This closes apache#65

Some of the tests are based on the assumption that the JVM can allocate at least
2GB of memory, which is not a common occurence (JVM usually defaults at 512MB).
Current Travis CI VM only have 3GB of memory total, which would have make challenging
to run some of the tests on them

Add a system property to change BaseValueVector.MAX_ALLOCATION_SIZE to allow to use
a much smaller value during tests.

* prefix arrow's version with siren

* use our version of netty

* updated readme about siren's changes

* fixed dependency issue with our own artifactory

* use our version of netty

* shade the arrow memory jar

* improved doc

* fix readme

* document the changes done to the arrow fork

* ARROW-5856: [Python] [Packaging] Fix use of C++ / Cython API from wheels

Author: Antoine Pitrou <antoine@python.org>

Closes apache#4884 from pitrou/ARROW-5856-cython-so-version and squashes the following commits:

a411d7a <Antoine Pitrou> Avoid C++ ABI issues with DictionaryMemo
eaede5b <Antoine Pitrou> Revert ARROW-5082 (" Stop exporting copies of shared libraries in wheel")
4594f78 <Antoine Pitrou> ARROW-5856:  Try to fix Cython API from wheels

* use our version of netty

* set drill's default value

* use our version of netty

* bumped to 0.8.0

* update to 0.14.1

* comment unneeded modules

* update release procedure with unneeded modules commented out

* bump version to siren-0.14.1-1 and update readme

* do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

* fix slice bounds

* improved readme

* Upgraded netty dependendcy to siren-4.1.27-3

* [FEDE-3917] netty direct memory counter deprecation with bump to siren-0.14.1-6-SNAPSHOT (#10)

* Release siren-0.14.1-5

* Bump to version siren-0.14.1-6-SNAPSHOT

* Use Siren Netty fork version siren-4.1.27-4 to release siren-0.14.1-6

* Bump version to release siren-0.14.1-5 using Netty Siren siren-4.1.27-4

* Set the version to siren-0.14.1-6-SNAPSHOT

Co-authored-by: Martin Anseaume <martin.anseaume@siren.io>

* Fix rebase

* Fix pom

* Fix rebase - Bump version to siren-4.0.0-1-SNAPSHOT

* Clean-up

* Comment out memory-netty module and remove use siren netty

* Update siren netty version

* Fix rebase

* Comment netty

* Fix TestValueVector

* Remove unused imports

* Remove unused import

* Clean up

* Clean up - fix checkstyle

* Clean up - fix checkstyle 2

* WIP

* WIP - add siren netty plugin

* WIP - ignore failing test

* Fix rebase as per peer review

* Fix rebase python

* Fix ZeroVector unused imports

* Fix checkstyle

* Fix rebase

* Fix rebase

* Fix checkstyle imports violation

* Fix checkstyle and update netty version

* Use Arrow's ArrowBuf instead of our own

* Fix imports

* Fix checkstyle

* Tentative fix: workaround for the shading issue but creat new problems

* fix missing import

* Fix import

* Fix imports

* Revert ignored unit tests

* Revert automatic format changes

* Revert format to original format

* Update vector pom as per peer review

* Revert ignored TestArrowBufHasher.testHasherNegative

* Clean up

* Update as per peer review

* Undo unnecessary change in tasks.yml

* Undo unnecessary changes

* Revert changes

* Revert changes

* Revert changes as per peer review

* Update as per peer review

* Add back class path dependency exclusion

* Remove whitespace

* Update memory access with default value

* Remove class path exclusion

* Update readme

* Exclude memory-core from the shaded netty in memory-netty package

* Add information for checking that Siren version of Netty is used

* [FEDE-5144] Fix the static initialization of MemoryUtil (#17)

* Remove unecessary setAccessible call

* Catch all errors thrown by the setAccessible call

New version of java can throw InaccessibleObjectException which Arrow
didn't handle

* Update netty version to the stable version siren-4.1.48-1

Co-authored-by: Laurent Goujon <laurent@dremio.com>
Co-authored-by: Stéphane Campinas <stephane.campinas@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Martin Anseaume <martin.anseaume@siren.io>
Co-authored-by: Johnny Hujol <itudoben@users.noreply.github.com>
Co-authored-by: ggdupont <ger.dupont@gmail.com>
  • Loading branch information
7 people committed Nov 26, 2021
1 parent f959141 commit 7947cf9
Show file tree
Hide file tree
Showing 23 changed files with 213 additions and 54 deletions.
98 changes: 98 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,73 @@
under the License.
-->

# Siren fork of Arrow

- The properties `drill.enable_unsafe_memory_access` and
`arrow.enable_unsafe_memory_access` are prefixed with `siren` and their
default value is set to `true`. The first property is deprecated.

- In order to avoid conflict with a version of `netty` used in Elasticsearch, we
relocate the netty custom package and dependency in `memory` into a package
named `siren`. The relocation is achieved thanks to the maven shade plugin.

- The Siren's fork of `netty` is used in `vector`. This means that `netty`
imports in that module need to be prefixed with `siren`.

## Check that Siren version of Netty is used
- In order to check that Siren version of Netty is being used,
run the unit test `CheckAccessibleTest` in
`https://github.com/sirensolutions/siren-platform/blob/master/core/src/test/java/io/siren/federate/core/common/CheckAccessibleTest.java`.
- Note: the unit test `CheckAccessibleTest` is currently ignored, please set it again to ignore after running the test.
The unit test is ignored because the settings in `CheckAccessibleTest` is not taken into account when the whole unit test suite is run, therefore it fails.
This could be because when the class is loaded, the default settings is used (which is a static block) and the new settings in the `CheckAccessibleTest` is
then not applied when the test suit is run.

## Build

To build the `memory`, `format` and `vector` modules:

```sh
$ cd java
$ mvn clean package
```

Because of the default value change of `unsafe_memory_access` property, some
tests in `vector` fail.

```sh
mvn -pl memory,memory/memory-core,memory/memory-netty,memory/memory-unsafe,format,vector install -Dsiren.arrow.enable_unsafe_memory_access=false -Dsiren.drill.enable_unsafe_memory_access=false
```

## Make a new release

- Tests should pass.

- Make a new version:

```sh
mvn versions:set -DnewVersion=siren-0.14.1-2
```

- tag the commit for the release

```sh
git tag --sign siren-0.14.1-2
````

- Deploy to Siren's artifactory
```sh
$ mvn deploy -DskipTests=true -P artifactory -Dartifactory_username=<USERNAME> -Dartifactory_password=<PASSWORD>
```
## Update to a new version of Apache Arrow
- add `git@github.com:apache/arrow.git` as the `upstream` remote.
- execute `git fetch --all --tags`
- create a temporary branch from `siren-changes`
- rebase against the new tag.
# Apache Arrow
[![Build Status](https://ci.appveyor.com/api/projects/status/github/apache/arrow/branch/master?svg=true)](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/branch/master)
Expand Down Expand Up @@ -98,6 +165,37 @@ integrations in other projects, we'd be happy to have you involved:
- [Learn the format][2]
- Contribute code to one of the reference implementations

### How to Contribute

We prefer to receive contributions in the form of GitHub pull requests. Please
send pull requests against the [github.com/apache/arrow][4] repository.

If you are looking for some ideas on what to contribute, check out the [JIRA
issues][3] for the Apache Arrow project. Comment on the issue and/or contact
[dev@arrow.apache.org](http://mail-archives.apache.org/mod_mbox/arrow-dev/)
with your questions and ideas.

If you’d like to report a bug but don’t have time to fix it, you can still post
it on JIRA, or email the mailing list
[dev@arrow.apache.org](http://mail-archives.apache.org/mod_mbox/arrow-dev/)

To contribute a patch:

1. Break your work into small, single-purpose patches if possible. It’s much
harder to merge in a large change with a lot of disjoint features.
2. Create a JIRA for your patch on the [Arrow Project
JIRA](https://issues.apache.org/jira/browse/ARROW).
3. Submit the patch as a GitHub pull request against the master branch. For a
tutorial, see the GitHub guides on forking a repo and sending a pull
request. Prefix your pull request name with the JIRA name (ex:
https://github.com/apache/arrow/pull/240).
4. Make sure that your code passes the unit tests. You can find instructions
how to run the unit tests for each Arrow component in its respective README
file.
5. Add new unit tests for your code.

Thank you in advance for your contributions!

[1]: mailto:dev-subscribe@arrow.apache.org
[2]: https://github.com/apache/arrow/tree/master/format
[3]: https://issues.apache.org/jira/browse/ARROW
Expand Down
2 changes: 1 addition & 1 deletion java/adapter/orc/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@
<parent>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-java-root</artifactId>
<version>4.0.0</version>
<version>siren-4.0.0-1-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>

Expand Down
2 changes: 1 addition & 1 deletion java/format/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
<parent>
<artifactId>arrow-java-root</artifactId>
<groupId>org.apache.arrow</groupId>
<version>4.0.0</version>
<version>siren-4.0.0-1-SNAPSHOT</version>
</parent>

<artifactId>arrow-format</artifactId>
Expand Down
2 changes: 1 addition & 1 deletion java/gandiva/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
<parent>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-java-root</artifactId>
<version>4.0.0</version>
<version>siren-4.0.0-1-SNAPSHOT</version>
</parent>

<groupId>org.apache.arrow.gandiva</groupId>
Expand Down
4 changes: 3 additions & 1 deletion java/memory/memory-core/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
<parent>
<artifactId>arrow-memory</artifactId>
<groupId>org.apache.arrow</groupId>
<version>4.0.0</version>
<version>siren-4.0.0-1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

Expand All @@ -30,10 +30,12 @@
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.immutables</groupId>
<artifactId>value</artifactId>
<version>2.8.2</version>
</dependency>
</dependencies>

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -34,14 +34,15 @@ public class BoundsChecking {
static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BoundsChecking.class);

static {
String envProperty = System.getenv("ARROW_ENABLE_UNSAFE_MEMORY_ACCESS");
String oldProperty = System.getProperty("drill.enable_unsafe_memory_access");
String envProperty = System.getenv().getOrDefault("SIREN_ARROW_ENABLE_UNSAFE_MEMORY_ACCESS", "true");
String oldProperty = System.getProperty("siren.drill.enable_unsafe_memory_access", "true");
if (oldProperty != null) {
logger.warn("\"drill.enable_unsafe_memory_access\" has been renamed to \"arrow.enable_unsafe_memory_access\"");
logger.warn("\"arrow.enable_unsafe_memory_access\" can be set to: " +
" true (to not check) or false (to check, default)");
logger.warn("\"siren.drill.enable_unsafe_memory_access\" has been renamed to " +
"\"siren.arrow.enable_unsafe_memory_access\"");
logger.warn("\"siren.arrow.enable_unsafe_memory_access\" can be set to: " +
" true (to not check, default) or false (to check)");
}
String newProperty = System.getProperty("arrow.enable_unsafe_memory_access");
String newProperty = System.getProperty("siren.arrow.enable_unsafe_memory_access", "true");

// The priority of determining the unsafe flag:
// 1. The system properties take precedence over the environmental variable.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,6 @@ public Object run() {

// get the offset of the address field in a java.nio.Buffer object
Field addressField = java.nio.Buffer.class.getDeclaredField("address");
addressField.setAccessible(true);
BYTE_BUFFER_ADDRESS_OFFSET = UNSAFE.objectFieldOffset(addressField);

Constructor<?> directBufferConstructor;
Expand All @@ -99,10 +98,7 @@ public Object run() {
constructor.setAccessible(true);
logger.debug("Constructor for direct buffer found and made accessible");
return constructor;
} catch (NoSuchMethodException e) {
logger.debug("Cannot get constructor for direct buffer allocation", e);
return e;
} catch (SecurityException e) {
} catch (Exception e) {
logger.debug("Cannot get constructor for direct buffer allocation", e);
return e;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -55,15 +55,15 @@ private boolean getFlagValue(ClassLoader classLoader) throws Exception {
}

/**
* Ensure the flag for bounds checking is enabled by default.
* Siren: Ensure the flag for bounds checking is disabled by default.
* This will protect users from JVM crashes.
*/
@Test
public void testDefaultValue() throws Exception {
ClassLoader classLoader = copyClassLoader();
if (classLoader != null) {
boolean boundsCheckingEnabled = getFlagValue(classLoader);
Assert.assertTrue(boundsCheckingEnabled);
Assert.assertFalse(boundsCheckingEnabled);
}
}

Expand Down
46 changes: 45 additions & 1 deletion java/memory/memory-netty/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
<parent>
<artifactId>arrow-memory</artifactId>
<groupId>org.apache.arrow</groupId>
<version>4.0.0</version>
<version>siren-4.0.0-1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

Expand All @@ -35,9 +35,11 @@
<groupId>io.netty</groupId>
<artifactId>netty-common</artifactId>
</dependency>
<!--provided by elasticsearch-->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.immutables</groupId>
Expand All @@ -62,6 +64,48 @@
<user.timezone>UTC</user.timezone>
</systemPropertyVariables>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.1.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
<dependencyReducedPomLocation>${project.build.directory}/dependency-reduced-pom.xml</dependencyReducedPomLocation>
<relocations>
<relocation>
<pattern>io.netty</pattern>
<shadedPattern>siren.io.netty</shadedPattern>
</relocation>
</relocations>
<artifactSet>
<excludes>
<exclude>org.slf4j</exclude>
<exclude>com.google.code.findbugs</exclude>
<exclude>com.google.guava</exclude>
<exclude>org.apache.arrow:arrow-memory-core:*:*</exclude>
</excludes>
</artifactSet>
<!--to prevent the Invalid signature file error -->
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
Expand Down
2 changes: 1 addition & 1 deletion java/memory/memory-unsafe/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
<parent>
<artifactId>arrow-memory</artifactId>
<groupId>org.apache.arrow</groupId>
<version>4.0.0</version>
<version>siren-4.0.0-1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

Expand Down
2 changes: 1 addition & 1 deletion java/memory/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
<parent>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-java-root</artifactId>
<version>4.0.0</version>
<version>siren-4.0.0-1-SNAPSHOT</version>
</parent>
<artifactId>arrow-memory</artifactId>
<name>Arrow Memory</name>
Expand Down
38 changes: 27 additions & 11 deletions java/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@

<groupId>org.apache.arrow</groupId>
<artifactId>arrow-java-root</artifactId>
<version>4.0.0</version>
<version>siren-4.0.0-1-SNAPSHOT</version>
<packaging>pom</packaging>

<name>Apache Arrow Java Root POM</name>
Expand All @@ -33,7 +33,7 @@
<dep.junit.jupiter.version>5.4.0</dep.junit.jupiter.version>
<dep.slf4j.version>1.7.25</dep.slf4j.version>
<dep.guava.version>20.0</dep.guava.version>
<dep.netty.version>4.1.48.Final</dep.netty.version>
<dep.netty.version>siren-4.1.48-1</dep.netty.version>
<dep.jackson.version>2.11.4</dep.jackson.version>
<dep.hadoop.version>2.7.1</dep.hadoop.version>
<dep.fbs.version>1.12.0</dep.fbs.version>
Expand Down Expand Up @@ -657,15 +657,12 @@
<module>format</module>
<module>memory</module>
<module>vector</module>
<module>tools</module>
<module>adapter/jdbc</module>
<module>plasma</module>
<module>flight/flight-core</module>
<module>flight/flight-grpc</module>
<module>performance</module>
<module>algorithm</module>
<module>adapter/avro</module>
<module>compression</module>
<!--<module>tools</module>-->
<!--<module>adapter/jdbc</module>-->
<!--<module>plasma</module>-->
<!--<module>flight</module>-->
<!--<module>performance</module>-->
<!--<module>algorithm</module>-->
</modules>

<profiles>
Expand Down Expand Up @@ -758,6 +755,25 @@
</plugins>
</build>
</profile>


<!-- To deploy to Siren artifactory -->
<profile>
<id>artifactory</id>

<distributionManagement>
<repository>
<id>artifactory-releases</id>
<name>artifactory-releases</name>
<url>${artifactory.url}/libs-release-local</url>
</repository>
<snapshotRepository>
<id>artifactory-snapshots</id>
<name>artifactory-snapshots</name>
<url>${artifactory.url}/libs-snapshot-local</url>
</snapshotRepository>
</distributionManagement>
</profile>

</profiles>

Expand Down

0 comments on commit 7947cf9

Please sign in to comment.