Support building and testing Alluxio on ARM64 platform #12704

Closed
liusheng opened this issue Jan 7, 2021 · 31 comments
Labels
area-build Build (maven, tarball) and tests type-feature This issue is a feature request

Comments

@liusheng
Contributor

liusheng commented Jan 7, 2021

Is your feature request related to a problem? Please describe.
More and more software now supports running on the ARM64 platform. It would be good if Alluxio also supported running and testing on ARM64.

Describe the solution you'd like
Make sure Alluxio can be built successfully and passes all test cases on an ARM64 server, then set up an ARM CI job for the Alluxio project so that future development does not break ARM64 support.

Describe alternatives you've considered
N/A

Urgency
Medium

Additional context
N/A

@liusheng liusheng added the type-feature This issue is a feature request label Jan 7, 2021
@liusheng liusheng changed the title Support building and running Alluxio on ARM64 platform Support building and testing Alluxio on ARM64 platform Jan 7, 2021
@gpang gpang added the area-build Build (maven, tarball) and tests label Jan 7, 2021
@calvinjia
Contributor

@liusheng Thanks for making this request. Have you already identified platform dependent issues when using ARM64?

@liusheng
Contributor Author

liusheng commented Jan 8, 2021

@calvinjia
Thanks for paying attention to this issue. I haven't identified any platform-dependent issues in Alluxio itself, but when I tried to build Alluxio on an ARM64 server I hit some dependency errors, such as:

> node-sass@4.13.0 postinstall /opt/alluxio/webui/node_modules/node-sass
> node scripts/build.js

Building: /opt/alluxio/webui/node/node /opt/alluxio/webui/node_modules/node-gyp/bin/node-gyp.js rebuild --verbose --libsass_ext= --libsass_cflags= --libsass_ldflags= --libsass_library=

lerna ERR! npm install --production  --no-save --no-package-lock --no-shrinkwrap stderr:
Cannot download "https://github.com/sass/node-sass/releases/download/v4.13.0/linux-arm64-64_binding.node": 

HTTP error 404 Not Found
...

[INFO] Alluxio UI 2.5.0-SNAPSHOT .......................... FAILURE [01:03 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:06 min (Wall Clock)
[INFO] Finished at: 2021-01-08T02:52:34Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.salesforce.servicelibs:proto-backwards-compatibility:1.0.5:backwards-compatibility-check (default) on project alluxio-core-transport: OS not supported. Unable to find a protolock binary for the classifier linux-aarch_64 -> [Help 1]
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (Install npm dependencies for packages) on project alluxio-webui: Command execution failed.: Process exited with an error: 1 (Exit value: 1) -> [Help 1]

@calvinjia
Contributor

Thanks for the details. These would need to be resolved for developing Alluxio on ARM64, which I think is a more distant goal. Do you know if the pre-compiled Alluxio system runs fine on ARM64?

@jpohanka

@calvinjia We have been running our Alluxio+Presto clusters on Graviton2 instances in production for several weeks. There were setbacks in several places, but we managed to get it operational. We observed significant performance degradation in both the Alluxio and Presto clusters when using the official Docker images. We found that switching to the Amazon Corretto JVM eliminates this performance degradation.

@calvinjia
Contributor

@jpohanka Thanks for reporting this!

@liusheng
Contributor Author

liusheng commented Jan 25, 2021

Hi @calvinjia,
Sorry for the late reply. For building Alluxio on ARM64, I have found two explicit problems:

  • The protolock dependency is missing for the ARM64 platform. It is loaded by proto-backwards-compat-maven-plugin, and I have submitted a PR to add ARM64 support in the proto-backwards-compat-maven-plugin repo.
  • The node-sass dependency is missing for the ARM64 platform. It is used by the Alluxio webui module. Apple ARM users have already reported this, please see Issue and Issue.

When I repackage the proto-backwards-compat-maven-plugin with a protolock binary for ARM64 added and skip the webui module with -pl '!webui', the whole build completes successfully:

[INFO] Alluxio Parent 2.5.0-SNAPSHOT ...................... SUCCESS [ 10.205 s]
[INFO] Alluxio Core 2.5.0-SNAPSHOT ........................ SUCCESS [  1.140 s]   
[INFO] Alluxio Core - Transport 2.5.0-SNAPSHOT ............ SUCCESS [01:04 min]
[INFO] Alluxio Core - Base module 2.5.0-SNAPSHOT .......... SUCCESS [ 19.018 s]  
[INFO] Alluxio Core - Common Utilities 2.5.0-SNAPSHOT ..... SUCCESS [ 43.488 s]  
[INFO] Alluxio Core - Client 2.5.0-SNAPSHOT ............... SUCCESS [  0.559 s]
[INFO] Alluxio Core - Client - File System 2.5.0-SNAPSHOT . SUCCESS [ 24.173 s]
[INFO] Alluxio Examples 2.5.0-SNAPSHOT .................... SUCCESS [  9.591 s]                     
[INFO] Alluxio Table 2.5.0-SNAPSHOT ....................... SUCCESS [  0.593 s]               
[INFO] Alluxio Table - Base Module 2.5.0-SNAPSHOT ......... SUCCESS [  5.826 s]        
[INFO] Alluxio Table - Client 2.5.0-SNAPSHOT .............. SUCCESS [  7.645 s]
[INFO] Alluxio Job Service 2.5.0-SNAPSHOT ................. SUCCESS [  0.549 s] 
[INFO] Alluxio Job Service - Common Utilities 2.5.0-SNAPSHOT SUCCESS [ 11.289 s]
[INFO] Alluxio Job Service - Client 2.5.0-SNAPSHOT ........ SUCCESS [  8.316 s]
[INFO] Alluxio Shell 2.5.0-SNAPSHOT ....................... SUCCESS [ 23.132 s]
[INFO] Alluxio Core - Client - HDFS 2.5.0-SNAPSHOT ........ SUCCESS [ 11.816 s] 
[INFO] Alluxio Stress 2.5.0-SNAPSHOT ...................... SUCCESS [  0.536 s]
[INFO] Alluxio Stress - Common 2.5.0-SNAPSHOT ............. SUCCESS [ 10.757 s]   
[INFO] Alluxio Stress - Shell 2.5.0-SNAPSHOT .............. SUCCESS [ 12.323 s]         
[INFO] Alluxio Table - Shell 2.5.0-SNAPSHOT ............... SUCCESS [  8.138 s] 
[INFO] Alluxio Assembly 2.5.0-SNAPSHOT .................... SUCCESS [  0.495 s]
[INFO] Alluxio Assembly - Client 2.5.0-SNAPSHOT ........... SUCCESS [ 10.756 s]
[INFO] Alluxio Core - Server 2.5.0-SNAPSHOT ............... SUCCESS [  0.686 s]
[INFO] Alluxio Core - Server - Common Utilities 2.5.0-SNAPSHOT SUCCESS [ 23.049 s]
[INFO] Alluxio Under File System 2.5.0-SNAPSHOT ........... SUCCESS [  0.591 s]
[INFO] Alluxio Under File System - Local FS 2.5.0-SNAPSHOT  SUCCESS [  7.598 s]  
[INFO] Alluxio Core - Server - Master 2.5.0-SNAPSHOT ...... SUCCESS [ 38.310 s]
[INFO] Alluxio Core - Server - Proxy 2.5.0-SNAPSHOT ....... SUCCESS [ 11.682 s]
[INFO] Alluxio Core - Server - Worker 2.5.0-SNAPSHOT ...... SUCCESS [ 23.518 s]
[INFO] Alluxio Job Service - Server 2.5.0-SNAPSHOT ........ SUCCESS [ 19.444 s]
[INFO] Alluxio Log Server 2.5.0-SNAPSHOT .................. SUCCESS [  6.993 s]
[INFO] Alluxio Table - Server 2.5.0-SNAPSHOT .............. SUCCESS [  0.542 s]
[INFO] Alluxio Table - Server - Common 2.5.0-SNAPSHOT ..... SUCCESS [  9.710 s]
[INFO] Alluxio Table - Server - Master 2.5.0-SNAPSHOT ..... SUCCESS [ 12.551 s]
[INFO] Alluxio Assembly - Server 2.5.0-SNAPSHOT ........... SUCCESS [ 17.517 s]
[INFO] Alluxio Integration 2.5.0-SNAPSHOT ................. SUCCESS [  0.596 s]
[INFO] Alluxio Integration - FUSE 2.5.0-SNAPSHOT .......... SUCCESS [ 17.983 s]
[INFO] Alluxio Shaded Libraries 2.5.0-SNAPSHOT ............ SUCCESS [  0.503 s]
[INFO] Alluxio Shaded Libraries - Hadoop 3.3.0 ............ SUCCESS [  6.857 s]
[INFO] Alluxio Integration - Tools 2.5.0-SNAPSHOT ......... SUCCESS [  0.539 s]
[INFO] Alluxio Integration - Tools - HMS 2.5.0-SNAPSHOT ... SUCCESS [ 22.523 s]
[INFO] Alluxio Integration - Validation Tools 2.5.0-SNAPSHOT SUCCESS [ 12.528 s]
[INFO] Alluxio MiniCluster 2.5.0-SNAPSHOT ................. SUCCESS [ 11.480 s]
[INFO] Alluxio Shaded Libraries - Client 2.5.0-SNAPSHOT ... SUCCESS [02:11 min]
[INFO] Alluxio Shaded Libraries - Ozone 2.5.0-SNAPSHOT .... SUCCESS [  7.141 s]
[INFO] Alluxio Shaded Libraries - Cosn 2.5.0-SNAPSHOT ..... SUCCESS [  0.556 s]
[INFO] Alluxio Table - Server - UnderDB 2.5.0-SNAPSHOT .... SUCCESS [  0.756 s]
[INFO] Alluxio Table - Server - UnderDB - Hive 2.5.0-SNAPSHOT SUCCESS [ 21.216 s]
[INFO] Alluxio Table - Server - UnderDB - Glue 2.5.0-SNAPSHOT SUCCESS [ 15.294 s]
[INFO] Alluxio Tests 2.5.0-SNAPSHOT ....................... SUCCESS [ 44.293 s]
[INFO] Alluxio Under File System - HDFS 2.5.0-SNAPSHOT .... SUCCESS [ 15.605 s]
[INFO] Alluxio Under File System - Microsoft Azure DataLake Gen 2 2.5.0-SNAPSHOT SUCCESS [ 13.567 s]
[INFO] Alluxio Under File System - Microsoft Azure DataLake 2.5.0-SNAPSHOT SUCCESS [ 15.570 s]
[INFO] Alluxio Under File System - Tencent Cloud COS 2.5.0-SNAPSHOT SUCCESS [  8.755 s]
[INFO] Alluxio Under File System - GCS 2.5.0-SNAPSHOT ..... SUCCESS [  7.773 s]
[INFO] Alluxio Under File System - Qiniu Kodo 2.5.0-SNAPSHOT SUCCESS [  7.833 s]
[INFO] Alluxio Under File System - Aliyun OSS 2.5.0-SNAPSHOT SUCCESS [  7.917 s]
[INFO] Alluxio Under File System - S3 2.5.0-SNAPSHOT ...... SUCCESS [ 10.011 s]
[INFO] Alluxio Under File System - Swift 2.5.0-SNAPSHOT ... SUCCESS [  8.980 s]
[INFO] Alluxio Under File System - Microsoft Azure Blob Storage 2.5.0-SNAPSHOT SUCCESS [ 15.862 s]
[INFO] Alluxio Under File System - Web 2.5.0-SNAPSHOT ..... SUCCESS [  7.085 s]
[INFO] Alluxio Under File System - Apache Ozone 2.5.0-SNAPSHOT SUCCESS [ 17.477 s]
[INFO] Alluxio Under File System - Tencent Cloud COSN 2.5.0-SNAPSHOT SUCCESS [ 14.319 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  15:01 min
[INFO] Finished at: 2021-01-25T10:13:40+03:00
[INFO] -----------------------------------------------------------------------

Of the two issues above, the first only affects the Protobuf backwards-compatibility check and the second only affects the webui module. So the pre-compiled Alluxio package can probably run directly on an ARM64 server. I will try to verify this.

Hi @jpohanka,
Thanks for the info. We are doing similar work, and we will try to get more big-data software running on ARM64 and performing better there. You are welcome to discuss any problems you have run into :)
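
The webui workaround described above can be sketched as a small shell helper. This is only an illustration and not part of the Alluxio build: the function name `maven_module_args` is hypothetical, the `mvn clean install` goals in the usage comment are an assumption, and the sketch assumes the protolock problem is handled separately (for example by the repackaged plugin).

```shell
# Hypothetical helper: print extra Maven arguments for a given machine
# architecture (as reported by `uname -m`).
# On ARM64 it excludes the webui module, whose node-sass dependency has no
# prebuilt linux-arm64 binary. On other architectures it prints nothing.
maven_module_args() {
  case "$1" in
    aarch64|arm64) printf '%s\n' "-pl !webui" ;;
    *) : ;;
  esac
}

# Example invocation (commented out so that sourcing this file runs nothing).
# The substitution is left unquoted on purpose so that it splits into
# the two words `-pl` and `!webui`:
#   mvn clean install $(maven_module_args "$(uname -m)")
```

Keeping the exclusion in one place means an x86_64 build still includes the web UI, while an ARM64 build skips only the module that is known to fail.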

@rmichela

Since the protolock maven plugin is merely a linter, you can disable it using a platform specific maven build profile.

@rmichela

Protolock maven plugin now includes arm64 support https://github.com/salesforce/proto-backwards-compat-maven-plugin/releases/tag/v1.0.6

@liusheng
Contributor Author

Hi @rmichela ,
Thank you for helping with that :)

@martin-g

I am trying to build Alluxio on Linux ARM64 but AlluxioFuseFileSystemTest fails with:

[ERROR] alluxio.fuse.AlluxioFuseFileSystemTest.flush  Time elapsed: 0.026 s  <<< ERROR!
java.lang.IllegalArgumentException: unsupported integer type: ADDRESS
	at jnr.ffi.provider.AbstractMemoryIO.getInt(AbstractMemoryIO.java:151)
	at jnr.ffi.Struct$IntegerAlias.get(Struct.java:1007)
	at alluxio.fuse.AlluxioFuseFileSystem.flushInternal(AlluxioFuseFileSystem.java:356)
	at alluxio.fuse.AlluxioFuseFileSystem.lambda$flush$3(AlluxioFuseFileSystem.java:352)
	at alluxio.fuse.AlluxioFuseUtils.call(AlluxioFuseUtils.java:278)
	at alluxio.fuse.AlluxioFuseFileSystem.flush(AlluxioFuseFileSystem.java:352)
	at alluxio.fuse.AlluxioFuseFileSystemTest.flush(AlluxioFuseFileSystemTest.java:209)
...

It passes fine on x86_64.

Any hints on what the reason might be and how to debug it?

@martin-g

I think I've found the problem with jnr-ffi - jnr/jnr-ffi#283 (comment)

@martin-g

jnr/jnr-ffi#284 fixes the problem with jnr-ffi and AlluxioFuseFileSystemTest passes fine now!

The next problem is with alluxio.hub.agent.process.AgentProcessMonitorTest:

[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 134
[ERROR] Crashed tests:
[ERROR] alluxio.hub.agent.process.AgentProcessMonitorTest
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/ubuntu/git/alluxio/hub/server && /usr/lib/jvm/java-8-openjdk-arm64/jre/bin/java -Djava.net.preferIPv4Stack=true -jar /home/ubuntu/git/alluxio/hub/server/target/surefire/surefirebooter2232650743840537627.jar /home/ubuntu/git/alluxio/hub/server/target/surefire 2021-11-19T13-11-16_018-jvmRun2 surefire5132209916224152351tmp surefire_161141190907714089217tmp

I will try to debug it!

@martin-g

The problem seems to be in Netty native. I've updated Netty to 4.1.70 and netty-tcnative-boringssl to 2.0.46, but it still breaks.
@normanmaurer Can you help here? Do you see what might be the problem in the attached hs_err_pid file? Thank you!

mvn-dependency-tree.txt
hs_err_pid148138.log

martin-g added a commit to martin-g/alluxio that referenced this issue Nov 22, 2021
martin-g added a commit to martin-g/alluxio that referenced this issue Nov 23, 2021
This fixes AlluxioFuseFileSystemTest on Linux ARM64
Also see jnr/jnr-ffi#284

While here also update jnr-fuse to 0.5.7
@martin-g

@ Alluxio team: We have found the problem with Netty - netty/netty-tcnative#681 (comment). The reason is that the Ratis dependency brings libnetty-tcnative-**.so as well.
If netty/netty#11856 is merged then Alluxio will start failing because of this problem.
Apache Ratis project has been notified about this problem: https://issues.apache.org/jira/browse/RATIS-1443

@yuzhu
Contributor

yuzhu commented Nov 24, 2021

@martin-g Thanks for this finding. Should we hold off on any Netty upgrades until this issue is resolved?

@martin-g

@yuzhu Do you know whether ratis-thirdparty-misc is needed by Alluxio, or does it just come in as a transitive dependency of org.apache.ratis:ratis-server?
I ask because I excluded it in hub/server/pom.xml and the build passed for several more modules:

[INFO] Alluxio Hub Server 2.8.0-SNAPSHOT .................. SUCCESS [ 41.206 s]
[INFO] Alluxio Assembly - Server 2.8.0-SNAPSHOT ........... SUCCESS [  1.572 s]
[INFO] Alluxio Shaded Libraries 2.8.0-SNAPSHOT ............ SUCCESS [  0.030 s]
[INFO] Alluxio Shaded Libraries - Hadoop 3.3.0 ............ SUCCESS [  0.068 s]
[INFO] Alluxio Integration - Tools 2.8.0-SNAPSHOT ......... SUCCESS [  0.093 s]
[INFO] Alluxio Integration - Tools - HMS 2.8.0-SNAPSHOT ... SUCCESS [ 24.428 s]
[INFO] Alluxio Integration - Validation Tools 2.8.0-SNAPSHOT SUCCESS [ 12.856 s]
[INFO] Alluxio MiniCluster 2.8.0-SNAPSHOT ................. SUCCESS [ 26.619 s]
[INFO] Alluxio Shaded Libraries - Client 2.8.0-SNAPSHOT ... SUCCESS [  0.101 s]
[INFO] Alluxio Shaded Libraries - Ozone 2.8.0-SNAPSHOT .... SUCCESS [  7.329 s]
[INFO] Alluxio Table - Server - UnderDB 2.8.0-SNAPSHOT .... SUCCESS [  0.090 s]
[INFO] Alluxio Table - Server - UnderDB - Hive 2.8.0-SNAPSHOT SUCCESS [ 17.608 s]
[INFO] Alluxio Table - Server - UnderDB - Glue 2.8.0-SNAPSHOT SUCCESS [  7.871 s]
[INFO] Alluxio Tests 2.8.0-SNAPSHOT ....................... FAILURE [49:38 min]

Alluxio Tests failures are related to jnr-fuse:

<testcase name="touchAndLs" classname="alluxio.client.fuse.JNRFuseIntegrationTest" time="8.588">
    <error message="touch: cannot touch &apos;/tmp/alluxio-tests/JNRFuseIntegrationTest-touchAndLs-63fb9625-31f8-47b2-b079-8aa66a95413a/touchTestFile&apos;: Input/output error&#10;" type="ExitCodeException exitCode=1"><![CDATA[ExitCodeException exitCode=1: touch: cannot touch '/tmp/alluxio-tests/JNRFuseIntegrationTest-touchAndLs-63fb9625-31f8-47b2-b079-8aa66a95413a/touchTestFile': Input/output error

        at alluxio.shell.ShellCommand.run(ShellCommand.java:71)
        at alluxio.util.ShellUtils.execCommand(ShellUtils.java:237)
        at alluxio.client.fuse.AbstractFuseIntegrationTest.touchAndLs(AbstractFuseIntegrationTest.java:305)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
       ...

I will investigate more tomorrow!

@martin-g

ratis-thirdparty-misc is being used!

[ERROR] /home/ubuntu/git/alluxio/core/server/common/src/main/java/alluxio/master/journal/raft/RaftJournalSystem.java:[68,55] package org.apache.ratis.thirdparty.com.google.protobuf does not exist
[ERROR] /home/ubuntu/git/alluxio/core/server/common/src/main/java/alluxio/master/journal/raft/SnapshotReplicationManager.java:[55,55] package org.apache.ratis.thirdparty.com.google.protobuf does not exist

So yes, you should not upgrade Netty to 4.1.71 or newer until the Ratis team fixes https://issues.apache.org/jira/browse/RATIS-1443

martin-g added a commit to martin-g/alluxio that referenced this issue Nov 25, 2021
…erver

ratis-thirdparty-misc brings old versions of netty-tcnative-boringssl-static that breaks the **ProcessMonitorTest's
@martin-g

The problem with JNRFuseIntegrationTest is:

2021-11-25 12:51:17,265 [Thread-78] DEBUG fuse.AlluxioFuseFileSystem (AlluxioFuseUtils.java:call) - Enter: getattr(path=/)
2021-11-25 12:51:17,266 [Thread-78] ERROR fuse.AlluxioFuseFileSystem (AlluxioFuseFileSystem.java:getattrInternal) - Failed to get info of /
java.lang.NullPointerException
        at alluxio.client.file.FileSystemContext.acquireClosableClientResource(FileSystemContext.java:505)
        at alluxio.client.file.FileSystemContext.acquireMasterClientResource(FileSystemContext.java:460)
        at alluxio.client.file.BaseFileSystem.rpc(BaseFileSystem.java:570)
        at alluxio.client.file.BaseFileSystem.getStatus(BaseFileSystem.java:274)
        at alluxio.client.file.FileSystem.getStatus(FileSystem.java:341)
        at alluxio.fuse.AlluxioFuseFileSystem.getattrInternal(AlluxioFuseFileSystem.java:391)
        at alluxio.fuse.AlluxioFuseFileSystem.lambda$getattr$4(AlluxioFuseFileSystem.java:385)
        at alluxio.fuse.AlluxioFuseUtils.call(AlluxioFuseUtils.java:278)
        at alluxio.fuse.AlluxioFuseFileSystem.getattr(AlluxioFuseFileSystem.java:384)
        at ru.serce.jnrfuse.AbstractFuseFS.lambda$init$1(AbstractFuseFS.java:99)
        at jnr.ffi.provider.jffi.NativeClosureProxy$$impl$$0.invoke(Unknown Source)

martin-g added a commit to martin-g/alluxio that referenced this issue Nov 29, 2021
@martin-g

I will need help with the problem of AlluxioFuseFileSystem on Linux ARM64.

I'm attaching the logs of mvn test -Dtest=JNRFuseIntegrationTest#cat executed on AMD64 and ARM64.
I have the following change in .../tests/src/test/resources/log4j.properties :

diff --git tests/src/test/resources/log4j.properties tests/src/test/resources/log4j.properties
index 06e6a8ad22..8456301807 100644
--- tests/src/test/resources/log4j.properties
+++ tests/src/test/resources/log4j.properties
@@ -51,3 +51,8 @@ log4j.appender.PROXY_LOGGER.MaxFileSize=10MB
 log4j.appender.PROXY_LOGGER.MaxBackupIndex=100
 log4j.appender.PROXY_LOGGER.layout=org.apache.log4j.PatternLayout
 log4j.appender.PROXY_LOGGER.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c{1} - %m%n
+
+log4j.logger.alluxio.fuse.AlluxioFuseFileSystem=debug
+log4j.logger.alluxio.client.file.FileSystemContext=debug
+log4j.logger.io.netty.handler.ssl=debug
+log4j.logger.io.netty.util.internal.NativeLibraryLoader=debug

On AMD64 I see:

[Thread-41] DEBUG fuse.AlluxioFuseFileSystem (AlluxioFuseUtils.java:call) - Enter: getattr(path=/catTestFile)
[Thread-41] DEBUG file.FileSystemContext (FileSystemContext.java:acquireClosableClientResource) - ------ POOL 1: alluxio.client.file.FileSystemMasterClientPool@1c32386d
[Thread-41] DEBUG fuse.AlluxioFuseFileSystem (AlluxioFuseUtils.java:call) - Exit (0): getattr(path=/catTestFile) in 38 ms
[Thread-42] DEBUG fuse.AlluxioFuseFileSystem (AlluxioFuseUtils.java:call) - Enter: open(path=/catTestFile)

But on ARM64 the Enter: open(path=/catTestFile) is never called.
@SerCeMan Any help or hints would be very welcome!

tests-x64.log
tests-arm64.log

@martin-g

martin-g commented Dec 1, 2021

It turns out that jnr-fuse does not support Linux ARM64 at all: SerCeMan/jnr-fuse#14

@JunLuo
Contributor

JunLuo commented Mar 30, 2022

Hi there, I'm trying to build Alluxio on ARM64 (Kunpeng CPU), but it fails with:
Failed to execute goal org.codehaus.mojo:buildnumber-maven-plugin:1.4:create-metadata (default) on project alluxio-core-common: Execution default of goal org.codehaus.mojo:buildnumber-maven-plugin:1.4:create-metadata failed.: NullPointerException -> [Help 1]

and the stack trace is:

[ERROR] Failed to execute goal org.codehaus.mojo:buildnumber-maven-plugin:1.4:create-metadata (default) on project alluxio-core-common: Execution default of goal org.codehaus.mojo:buildnumber-maven-plugin:1.4:create-metadata failed.: NullPointerException -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.codehaus.mojo:buildnumber-maven-plugin:1.4:create-metadata (default) on project alluxio-core-common: Execution default of goal org.codehaus.mojo:buildnumber-maven-plugin:1.4:create-metadata failed.
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:306)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:211)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:165)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:157)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:121)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:127)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:294)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:960)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
Caused by: org.apache.maven.plugin.PluginExecutionException: Execution default of goal org.codehaus.mojo:buildnumber-maven-plugin:1.4:create-metadata failed.
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:148)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:301)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:211)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:165)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:157)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:121)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:127)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:294)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:960)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
Caused by: java.lang.NullPointerException
    at java.util.Hashtable.put (Hashtable.java:460)
    at org.codehaus.mojo.build.CreateMetadataMojo.execute (CreateMetadataMojo.java:188)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:301)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:211)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:165)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:157)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:121)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:127)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:294)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:960)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)

Environment:
OS: Kylin V10 (Linux-based)
JDK: 1.8
Maven: 3.8
Alluxio source version: 2.7.3

I've added a Maven profile for the OS like this:

    <profile>
        <activation>
            <os>
                <family>Linux</family>
                <arch>aarch64</arch>
            </os>
        </activation>
    </profile>

but it does not work.

Any suggestions would be appreciated.

@martin-g

Hi @JunLuo!
IMO the OS and CPU arch are not relevant here.
Your build fails due to buildnumber-maven-plugin, i.e. this plugin fails to collect the information it needs.
Do you have a .git/ folder in your local repo?

@JunLuo
Contributor

JunLuo commented Mar 30, 2022

Hi @martin-g
Thanks for your reply.
The source was downloaded from tag 2.7.3 and I did not change anything.
I found a .github/ folder.
I removed that folder and built again, but it still failed with create-metadata failed.: NullPointerException

@martin-g

You can work around it by adding -Dmaven.buildNumber.skip=true to the Maven command.
The plugin needs .git/ in the root of the project to get the Git hash and use it as a build number.

Another property that could help is -Dmaven.buildNumber.revisionOnScmFailure=2.7.3. See https://www.mojohaus.org/buildnumber-maven-plugin/create-mojo.html#revisionOnScmFailure
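
A minimal sketch of how a build script might apply the first workaround automatically. The helper name `buildnumber_args` is made up for illustration, and the `mvn clean install` goals in the usage comment are an assumption. Only the `-Dmaven.buildNumber.skip=true` property comes from the comment above.

```shell
# Hypothetical helper: print the flag that skips buildnumber-maven-plugin
# when the given source directory has no .git entry (for example a source
# archive downloaded from a release tag). Prints nothing for a Git checkout.
buildnumber_args() {
  src_dir="${1:-.}"
  # Use -e rather than -d: in Git worktrees and submodules .git is a file.
  if [ ! -e "$src_dir/.git" ]; then
    printf '%s\n' "-Dmaven.buildNumber.skip=true"
  fi
}

# Example invocation (commented out so that sourcing this file runs nothing):
#   mvn clean install $(buildnumber_args .)
```

With this check the build number is still recorded for normal Git checkouts, and the plugin is skipped only when there is no repository metadata to read.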

@JunLuo
Contributor

JunLuo commented Mar 31, 2022

It works, and I built it successfully on a Kunpeng CPU. Thanks a lot!

@odidev

odidev commented Jul 4, 2022

@cheyang, continuing our discussion here, I tried building the Alluxio Docker images for Linux/ARM64 using the build-image.sh script.

I edited the Go installation in tarball.sh for ARM64. Also, the Maven version 3.6.2-jdk-8 used in the build-image.sh script does not support Linux/ARM64, so I updated Maven to 3.8.6-openjdk-8 and then ran the command below:

$ bash build-image.sh -b branch-2.3-fuse

The build fails on Linux/ARM64 with the logs below:

[INFO] Alluxio Under File System - Apache Ozone 2.3.1-SNAPSHOT SKIPPED 
[INFO] Alluxio UI 2.3.1-SNAPSHOT .......................... FAILURE [04:54 min] 
[INFO] Alluxio Integration - YARN 2.3.1-SNAPSHOT .......... SKIPPED 
[INFO] ------------------------------------------------------------------------ 
[INFO] BUILD FAILURE 
[INFO] ------------------------------------------------------------------------ 
[INFO] Total time:  06:18 min (Wall Clock) 
[INFO] Finished at: 2022-06-30T12:47:11Z 
[INFO] ------------------------------------------------------------------------ 
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (Install npm dependencies for packages) on project alluxio-webui: Command execution failed.: Process exited with an error: 1 (Exit value: 1) -> [Help 1] 

However, after updating maven to 3.8.6-openjdk-8 for Linux/AMD64, build-image.sh also fails with the logs below:

[INFO] Alluxio Under File System - Web 2.3.1-SNAPSHOT ..... SUCCESS [ 15.589 s] 
[INFO] Alluxio Under File System - Apache Ozone 2.3.1-SNAPSHOT FAILURE [ 24.699 s] 
[INFO] Alluxio UI 2.3.1-SNAPSHOT .......................... SUCCESS [06:06 min] 
[INFO] Alluxio Integration - YARN 2.3.1-SNAPSHOT .......... SUCCESS [ 22.833 s] 
[INFO] ------------------------------------------------------------------------ 
[INFO] BUILD FAILURE 
[INFO] ------------------------------------------------------------------------ 
[INFO] Total time:  10:24 min (Wall Clock) 
[INFO] Finished at: 2022-06-30T12:35:32Z 
[INFO] ------------------------------------------------------------------------ 
[ERROR] Failed to execute goal on project alluxio-core-server-common: Could not resolve dependencies for project org.alluxio:alluxio-core-server-common:jar:2.3.1-SNAPSHOT: Failed to collect dependencies at io.atomix.copycat.alluxio:copycat-server:jar:1.2.15: Failed to read artifact descriptor for io.atomix.copycat.alluxio:copycat-server:jar:1.2.15: Could not transfer artifact io.atomix.copycat.alluxio:copycat-server:pom:1.2.15 from/to spring-releases (https://repo.spring.io/libs-release): authentication failed for https://repo.spring.io/libs-release/io/atomix/copycat/alluxio/copycat-server/1.2.15/copycat-server-1.2.15.pom, status: 401 Unauthorized -> [Help 1] 

[ERROR] Failed to execute goal on project alluxio-underfs-ozone: Could not resolve dependencies for project org.alluxio:alluxio-underfs-ozone:jar:2.3.1-SNAPSHOT: Failed to collect dependencies at org.apache.hadoop:hadoop-ozone-client:jar:0.5.0-beta -> org.apache.hadoop:hadoop-ozone-common:jar:0.5.0-beta -> org.apache.hadoop:hadoop-hdfs-client:jar:2.7.3: Failed to read artifact descriptor for org.apache.hadoop:hadoop-hdfs-client:jar:2.7.3: Could not transfer artifact org.apache.hadoop:hadoop-hdfs-client:pom:2.7.3 from/to spring-releases (https://repo.spring.io/libs-release): authentication failed for https://repo.spring.io/libs-release/org/apache/hadoop/hadoop-hdfs-client/2.7.3/hadoop-hdfs-client-2.7.3.pom, status: 401 Unauthorized -> [Help 1] 

It seems that some of Alluxio's dependencies also need to be updated to match the newer Maven version and to support Linux/ARM64.

@naushadh

I can confirm targeting ARM platform works, and I successfully built and pushed the image here: https://hub.docker.com/r/naushadh/alluxio/tags

$ git clone https://github.com/Alluxio/alluxio.git
$ cd integrations/docker
$ docker build -t alluxio/alluxio:2.8.1 --platform arm64 .
$ docker tag alluxio/alluxio:2.8.1 naushadh/alluxio:2.8.1
$ docker push naushadh/alluxio:2.8.1

Would be nice if this can be done by the CI process so there is official Docker images that work in ARM.

@duduhandelman

Thank you very much @naushadh.
I would like to build the 2.9.0 Docker image for ARM. Is there anything special I should consider before that?

Thanks
David

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The PR/Issue does not have recent activities and will be closed automatically label Jan 26, 2023
@naushadh

@duduhandelman nothing special, you should be able to follow the steps I'd shared above.

@github-actions github-actions bot removed the stale The PR/Issue does not have recent activities and will be closed automatically label Jan 27, 2023
@jja725 jja725 closed this as completed Jan 30, 2023
@michael1589

> I can confirm targeting ARM platform works, and I successfully built and pushed the image here: https://hub.docker.com/r/naushadh/alluxio/tags
>
> $ git clone https://github.com/Alluxio/alluxio.git
> $ cd integrations/docker
> $ docker build -t alluxio/alluxio:2.8.1 --platform arm64 .
> $ docker tag alluxio/alluxio:2.8.1 naushadh/alluxio:2.8.1
> $ docker push naushadh/alluxio:2.8.1
>
> Would be nice if this can be done by the CI process so there is official Docker images that work in ARM.

@naushadh I tried naushadh/alluxio:2.8.1 on my aarch64 machine and got this:

Fri, Mar 24 2023 10:19:57 am | standard_init_linux.go:228: exec user process caused: exec format error

It seems this image cannot run on aarch64.
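
An "exec format error" usually means the binaries in the image were built for a different CPU architecture than the host. One way to check, sketched below, is to compare the architecture recorded in the image metadata with the host's. The function names are illustrative only. The external commands used are `docker image inspect --format '{{.Architecture}}'` and `uname -m`.

```shell
# Map `uname -m` output to the architecture names Docker uses.
normalize_arch() {
  case "$1" in
    aarch64|arm64) echo "arm64" ;;
    x86_64|amd64) echo "amd64" ;;
    *) echo "$1" ;;
  esac
}

# Succeed (exit 0) if the local image's architecture matches the host's.
# Returns 1 on a mismatch and 2 if the image cannot be inspected.
image_matches_host() {
  image_arch=$(docker image inspect --format '{{.Architecture}}' "$1") || return 2
  host_arch=$(normalize_arch "$(uname -m)")
  [ "$image_arch" = "$host_arch" ]
}

# Example (commented out; requires Docker and a locally pulled image):
#   image_matches_host naushadh/alluxio:2.8.1 || echo "architecture mismatch"
```

If the two values differ, the image was probably built for another platform and would need to be rebuilt for linux/arm64.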
