Skip to content

Commit

Permalink
[FEDE-5150] Upgrade arrow version from 4.0.0 to 7.0.0 (#26)
Browse files Browse the repository at this point in the history
* [FEDE-4869] Upgrade Arrow version to 4.0.0 (#16)

* Make BaseValueVector#MAX_ALLOCATION_SIZE configurable

This closes apache#65

Some of the tests are based on the assumption that the JVM can allocate at least
2GB of memory, which is not a common occurence (JVM usually defaults at 512MB).
Current Travis CI VM only have 3GB of memory total, which would have make challenging
to run some of the tests on them

Add a system property to change BaseValueVector.MAX_ALLOCATION_SIZE to allow to use
a much smaller value during tests.

* prefix arrow's version with siren

* use our version of netty

* updated readme about siren's changes

* fixed dependency issue with our own artifactory

* use our version of netty

* shade the arrow memory jar

* improved doc

* fix readme

* document the changes done to the arrow fork

* ARROW-5856: [Python] [Packaging] Fix use of C++ / Cython API from wheels

Author: Antoine Pitrou <antoine@python.org>

Closes apache#4884 from pitrou/ARROW-5856-cython-so-version and squashes the following commits:

a411d7a <Antoine Pitrou> Avoid C++ ABI issues with DictionaryMemo
eaede5b <Antoine Pitrou> Revert ARROW-5082 (" Stop exporting copies of shared libraries in wheel")
4594f78 <Antoine Pitrou> ARROW-5856:  Try to fix Cython API from wheels

* use our version of netty

* set drill's default value

* use our version of netty

* bumped to 0.8.0

* update to 0.14.1

* comment unneeded modules

* update release procedure with unneeded modules commented out

* bump version to siren-0.14.1-1 and update readme

* do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

* fix slice bounds

* improved readme

* Upgraded netty dependendcy to siren-4.1.27-3

* [FEDE-3917] netty direct memory counter deprecation with bump to siren-0.14.1-6-SNAPSHOT (#10)

* Release siren-0.14.1-5

* Bump to version siren-0.14.1-6-SNAPSHOT

* Use Siren Netty fork version siren-4.1.27-4 to release siren-0.14.1-6

* Bump version to release siren-0.14.1-5 using Netty Siren siren-4.1.27-4

* Set the version to siren-0.14.1-6-SNAPSHOT

Co-authored-by: Martin Anseaume <martin.anseaume@siren.io>

* Fix rebase

* Fix pom

* Fix rebase - Bump version to siren-4.0.0-1-SNAPSHOT

* Clean-up

* Comment out memory-netty module and remove use siren netty

* Update siren netty version

* Fix rebase

* Comment netty

* Fix TestValueVector

* Remove unused imports

* Remove unused import

* Clean up

* Clean up - fix checkstyle

* Clean up - fix checkstyle 2

* WIP

* WIP - add siren netty plugin

* WIP - ignore failing test

* Fix rebase as per peer review

* Fix rebase python

* Fix ZeroVector unused imports

* Fix checkstyle

* Fix rebase

* Fix rebase

* Fix checkstyle imports violation

* Fix checkstyle and update netty version

* Use Arrow's ArrowBuf instead of our own

* Fix imports

* Fix checkstyle

* Tentative fix: workaround for the shading issue but creat new problems

* fix missing import

* Fix import

* Fix imports

* Revert ignored unit tests

* Revert automatic format changes

* Revert format to original format

* Update vector pom as per peer review

* Revert ignored TestArrowBufHasher.testHasherNegative

* Clean up

* Update as per peer review

* Undo unnecessary change in tasks.yml

* Undo unnecessary changes

* Revert changes

* Revert changes

* Revert changes as per peer review

* Update as per peer review

* Add back class path dependency exclusion

* Remove whitespace

* Update memory access with default value

* Remove class path exclusion

* Update readme

* Exclude memory-core from the shaded netty in memory-netty package

* Add information for checking that Siren version of Netty is used

* [FEDE-5144] Fix the static initialization of MemoryUtil (#17)

* Remove unecessary setAccessible call

* Catch all errors thrown by the setAccessible call

New version of java can throw InaccessibleObjectException which Arrow
didn't handle

* Update netty version to the stable version siren-4.1.48-1

Co-authored-by: Laurent Goujon <laurent@dremio.com>
Co-authored-by: Stéphane Campinas <stephane.campinas@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Martin Anseaume <martin.anseaume@siren.io>
Co-authored-by: Johnny Hujol <itudoben@users.noreply.github.com>
Co-authored-by: ggdupont <ger.dupont@gmail.com>

* Bumped version number to siren-4.0.0-1

* Fix rebase

* Remove scope from arrow-memory-netty to make netty available for entire project else our shading will fail.

Signed-off-by: George Apaaboah <george.apaaboah@gmail.com>

* Fix readme - remove duplicate section on how to contribute

* Update readme as per review

* Use stable netty version siren-4.1.68-1 after the release of netty

Signed-off-by: George Apaaboah <george.apaaboah@gmail.com>

Co-authored-by: Laurent Goujon <laurent@dremio.com>
Co-authored-by: Stéphane Campinas <stephane.campinas@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Martin Anseaume <martin.anseaume@siren.io>
Co-authored-by: Johnny Hujol <itudoben@users.noreply.github.com>
Co-authored-by: ggdupont <ger.dupont@gmail.com>
  • Loading branch information
7 people committed Aug 29, 2022
1 parent ea6875f commit 722702c
Show file tree
Hide file tree
Showing 23 changed files with 179 additions and 41 deletions.
67 changes: 67 additions & 0 deletions README.md
Expand Up @@ -17,6 +17,73 @@
under the License.
-->

# Siren fork of Arrow

- The properties `drill.enable_unsafe_memory_access` and
`arrow.enable_unsafe_memory_access` are prefixed with `siren` and their
default value is set to `true`. The first property is deprecated.

- In order to avoid conflict with a version of `netty` used in Elasticsearch, we
relocate the netty custom package and dependency in `memory` into a package
named `siren`. The relocation is achieved thanks to the maven shade plugin.

- The Siren's fork of `netty` is used in `vector`. This means that `netty`
imports in that module need to be prefixed with `siren`.

## Check that Siren version of Netty is used
- In order to check that Siren version of Netty is being used,
run the unit test `CheckAccessibleTest` in
`https://github.com/sirensolutions/siren-platform/blob/master/core/src/test/java/io/siren/federate/core/common/CheckAccessibleTest.java`.
- Note: the unit test `CheckAccessibleTest` is currently ignored, please set it again to ignore after running the test.
The unit test is ignored because the settings in `CheckAccessibleTest` is not taken into account when the whole unit test suite is run, therefore it fails.
This could be because when the class is loaded, the default settings is used (which is a static block) and the new settings in the `CheckAccessibleTest` is
then not applied when the test suit is run.

## Build

To build the `memory`, `format` and `vector` modules:

```sh
$ cd java
$ mvn clean package
```

Because of the default value change of `unsafe_memory_access` property, some
tests in `vector` fail.

```sh
mvn -pl memory,memory/memory-core,memory/memory-netty,memory/memory-unsafe,format,vector install -Dsiren.arrow.enable_unsafe_memory_access=false -Dsiren.drill.enable_unsafe_memory_access=false
```

## Make a new release of Siren's Apache Arrow

- Tests should pass.

- Make a new version:

```sh
mvn versions:set -DnewVersion=siren-0.14.1-2
```

- tag the commit for the release

```sh
git tag --sign siren-0.14.1-2
````

- Deploy to Siren's artifactory
```sh
$ mvn deploy -DskipTests=true -P artifactory -Dartifactory_username=<USERNAME> -Dartifactory_password=<PASSWORD>
```
## Update to a new version of Siren's Apache Arrow

- add `git@github.com:apache/arrow.git` as the `upstream` remote.
- execute `git fetch --all --tags`
- create a temporary branch from `siren-changes`
- rebase against the new tag.

# Apache Arrow

[![Fuzzing Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/arrow.svg)](https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:arrow)
Expand Down
2 changes: 1 addition & 1 deletion java/adapter/orc/pom.xml
Expand Up @@ -104,7 +104,7 @@
<parent>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-java-root</artifactId>
<version>9.0.0</version>
<version>siren-9.0.0-1-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>

Expand Down
2 changes: 1 addition & 1 deletion java/format/pom.xml
Expand Up @@ -15,7 +15,7 @@
<parent>
<artifactId>arrow-java-root</artifactId>
<groupId>org.apache.arrow</groupId>
<version>9.0.0</version>
<version>siren-9.0.0-1-SNAPSHOT</version>
</parent>

<artifactId>arrow-format</artifactId>
Expand Down
2 changes: 1 addition & 1 deletion java/gandiva/pom.xml
Expand Up @@ -14,7 +14,7 @@
<parent>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-java-root</artifactId>
<version>9.0.0</version>
<version>siren-9.0.0-1-SNAPSHOT</version>
</parent>

<groupId>org.apache.arrow.gandiva</groupId>
Expand Down
2 changes: 1 addition & 1 deletion java/memory/memory-core/pom.xml
Expand Up @@ -13,7 +13,7 @@
<parent>
<artifactId>arrow-memory</artifactId>
<groupId>org.apache.arrow</groupId>
<version>9.0.0</version>
<version>siren-9.0.0-1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

Expand Down
Expand Up @@ -34,14 +34,15 @@ public class BoundsChecking {
static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BoundsChecking.class);

static {
String envProperty = System.getenv("ARROW_ENABLE_UNSAFE_MEMORY_ACCESS");
String oldProperty = System.getProperty("drill.enable_unsafe_memory_access");
String envProperty = System.getenv().getOrDefault("SIREN_ARROW_ENABLE_UNSAFE_MEMORY_ACCESS", "true");
String oldProperty = System.getProperty("siren.drill.enable_unsafe_memory_access", "true");
if (oldProperty != null) {
logger.warn("\"drill.enable_unsafe_memory_access\" has been renamed to \"arrow.enable_unsafe_memory_access\"");
logger.warn("\"arrow.enable_unsafe_memory_access\" can be set to: " +
" true (to not check) or false (to check, default)");
logger.warn("\"siren.drill.enable_unsafe_memory_access\" has been renamed to " +
"\"siren.arrow.enable_unsafe_memory_access\"");
logger.warn("\"siren.arrow.enable_unsafe_memory_access\" can be set to: " +
" true (to not check, default) or false (to check)");
}
String newProperty = System.getProperty("arrow.enable_unsafe_memory_access");
String newProperty = System.getProperty("siren.arrow.enable_unsafe_memory_access", "true");

// The priority of determining the unsafe flag:
// 1. The system properties take precedence over the environmental variable.
Expand Down
Expand Up @@ -81,7 +81,6 @@ public Object run() {

// get the offset of the address field in a java.nio.Buffer object
Field addressField = java.nio.Buffer.class.getDeclaredField("address");
addressField.setAccessible(true);
BYTE_BUFFER_ADDRESS_OFFSET = UNSAFE.objectFieldOffset(addressField);

Constructor<?> directBufferConstructor;
Expand All @@ -99,10 +98,7 @@ public Object run() {
constructor.setAccessible(true);
logger.debug("Constructor for direct buffer found and made accessible");
return constructor;
} catch (NoSuchMethodException e) {
logger.debug("Cannot get constructor for direct buffer allocation", e);
return e;
} catch (SecurityException e) {
} catch (Exception e) {
logger.debug("Cannot get constructor for direct buffer allocation", e);
return e;
}
Expand Down
Expand Up @@ -55,15 +55,15 @@ private boolean getFlagValue(ClassLoader classLoader) throws Exception {
}

/**
* Ensure the flag for bounds checking is enabled by default.
* Siren: Ensure the flag for bounds checking is disabled by default.
* This will protect users from JVM crashes.
*/
@Test
public void testDefaultValue() throws Exception {
ClassLoader classLoader = copyClassLoader();
if (classLoader != null) {
boolean boundsCheckingEnabled = getFlagValue(classLoader);
Assert.assertTrue(boundsCheckingEnabled);
Assert.assertFalse(boundsCheckingEnabled);
}
}

Expand Down
51 changes: 50 additions & 1 deletion java/memory/memory-netty/pom.xml
Expand Up @@ -13,7 +13,7 @@
<parent>
<artifactId>arrow-memory</artifactId>
<groupId>org.apache.arrow</groupId>
<version>9.0.0</version>
<version>siren-9.0.0-1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

Expand All @@ -35,16 +35,65 @@
<groupId>io.netty</groupId>
<artifactId>netty-common</artifactId>
</dependency>
<!--provided by elasticsearch-->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.immutables</groupId>
<artifactId>value</artifactId>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.1.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
<dependencyReducedPomLocation>${project.build.directory}/dependency-reduced-pom.xml</dependencyReducedPomLocation>
<relocations>
<relocation>
<pattern>io.netty</pattern>
<shadedPattern>siren.io.netty</shadedPattern>
</relocation>
</relocations>
<artifactSet>
<excludes>
<exclude>org.slf4j</exclude>
<exclude>com.google.code.findbugs</exclude>
<exclude>com.google.guava</exclude>
<exclude>org.apache.arrow:arrow-memory-core:*:*</exclude>
</excludes>
</artifactSet>
<!--to prevent the Invalid signature file error -->
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

<profiles>
<profile>
<!-- This profile turns on integration testing. It activates the failsafe plugin and will run any tests
Expand Down
2 changes: 1 addition & 1 deletion java/memory/memory-unsafe/pom.xml
Expand Up @@ -13,7 +13,7 @@
<parent>
<artifactId>arrow-memory</artifactId>
<groupId>org.apache.arrow</groupId>
<version>9.0.0</version>
<version>siren-9.0.0-1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

Expand Down
2 changes: 1 addition & 1 deletion java/memory/pom.xml
Expand Up @@ -14,7 +14,7 @@
<parent>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-java-root</artifactId>
<version>9.0.0</version>
<version>siren-9.0.0-1-SNAPSHOT</version>
</parent>
<artifactId>arrow-memory</artifactId>
<name>Arrow Memory</name>
Expand Down
36 changes: 27 additions & 9 deletions java/pom.xml
Expand Up @@ -20,7 +20,7 @@

<groupId>org.apache.arrow</groupId>
<artifactId>arrow-java-root</artifactId>
<version>9.0.0</version>
<version>siren-9.0.0-1-SNAPSHOT</version>
<packaging>pom</packaging>

<name>Apache Arrow Java Root POM</name>
Expand Down Expand Up @@ -694,14 +694,14 @@
<module>format</module>
<module>memory</module>
<module>vector</module>
<module>tools</module>
<module>adapter/jdbc</module>
<module>plasma</module>
<module>flight</module>
<module>performance</module>
<module>algorithm</module>
<module>adapter/avro</module>
<module>compression</module>
<!--<module>tools</module>-->
<!--<module>adapter/jdbc</module>-->
<!--<module>plasma</module>-->
<!--<module>flight</module>-->
<!--<module>performance</module>-->
<!--<module>algorithm</module>-->
<!--<module>adapter/avro</module>-->
<!--<module>compression</module>-->
</modules>

<profiles>
Expand Down Expand Up @@ -887,6 +887,24 @@
</reporting>
</profile>

<!-- To deploy to Siren artifactory -->
<profile>
<id>artifactory</id>

<distributionManagement>
<repository>
<id>artifactory-releases</id>
<name>artifactory-releases</name>
<url>${artifactory.url}/libs-release-local</url>
</repository>
<snapshotRepository>
<id>artifactory-snapshots</id>
<name>artifactory-snapshots</name>
<url>${artifactory.url}/libs-snapshot-local</url>
</snapshotRepository>
</distributionManagement>
</profile>

</profiles>

</project>
2 changes: 1 addition & 1 deletion java/vector/pom.xml
Expand Up @@ -14,7 +14,7 @@
<parent>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-java-root</artifactId>
<version>9.0.0</version>
<version>siren-9.0.0-1-SNAPSHOT</version>
</parent>
<artifactId>arrow-vector</artifactId>
<name>Arrow Vectors</name>
Expand Down
2 changes: 2 additions & 0 deletions java/vector/src/main/codegen/includes/vv_imports.ftl
Expand Up @@ -20,6 +20,8 @@ import static org.apache.arrow.util.Preconditions.checkState;

import com.google.flatbuffers.FlatBufferBuilder;

import siren.io.netty.buffer.*;

import org.apache.arrow.memory.*;
import org.apache.arrow.util.Preconditions;
import org.apache.arrow.vector.types.Types;
Expand Down
Expand Up @@ -37,7 +37,7 @@
import org.apache.arrow.vector.util.OversizedAllocationException;
import org.apache.arrow.vector.util.TransferPair;

import io.netty.util.internal.PlatformDependent;
import siren.io.netty.util.internal.PlatformDependent;

/**
* BaseFixedWidthVector provides an abstract interface for
Expand Down
Expand Up @@ -119,6 +119,11 @@ public void setInitialCapacity(int valueCount) {
lastValueCapacity = valueCount;
}

/**
* Get the current value capacity for the vector.
*
* @return number of elements that vector can hold.
*/
@Override
protected int getValueBufferValueCapacity() {
return capAtMaxInt(valueBuffer.capacity() * 8);
Expand Down
Expand Up @@ -17,18 +17,18 @@

package org.apache.arrow.vector;

import static io.netty.util.internal.PlatformDependent.getByte;
import static io.netty.util.internal.PlatformDependent.getInt;
import static io.netty.util.internal.PlatformDependent.getLong;
import static org.apache.arrow.memory.util.LargeMemoryUtil.checkedCastToInt;
import static siren.io.netty.util.internal.PlatformDependent.getByte;
import static siren.io.netty.util.internal.PlatformDependent.getInt;
import static siren.io.netty.util.internal.PlatformDependent.getLong;

import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BoundsChecking;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.ipc.message.ArrowFieldNode;
import org.apache.arrow.vector.util.DataSizeRoundingUtil;

import io.netty.util.internal.PlatformDependent;
import siren.io.netty.util.internal.PlatformDependent;

/**
* Helper class for performing generic operations on a bit vector buffer.
Expand Down
Expand Up @@ -35,7 +35,7 @@
import org.apache.arrow.vector.util.DecimalUtility;
import org.apache.arrow.vector.util.TransferPair;

import io.netty.util.internal.PlatformDependent;
import siren.io.netty.util.internal.PlatformDependent;

/**
* Decimal256Vector implements a fixed width vector (32 bytes) of
Expand Down
Expand Up @@ -35,7 +35,7 @@
import org.apache.arrow.vector.util.DecimalUtility;
import org.apache.arrow.vector.util.TransferPair;

import io.netty.util.internal.PlatformDependent;
import siren.io.netty.util.internal.PlatformDependent;

/**
* DecimalVector implements a fixed width vector (16 bytes) of
Expand Down

0 comments on commit 722702c

Please sign in to comment.