Build native for multiple platforms #40

Open
willcohen opened this issue Jun 24, 2021 · 16 comments
Comments

@willcohen
Collaborator

I started to work through this a little this morning.

@desruisseaux it looks like you went down this path before, per commit 3a2a611: "Abandon the attempt to provide native files for different platforms in the same JAR file." Was the reason due to size?

Observations so far:

  • Dockcross seems to generally work, though the current build instructions are a little fragile
  • I can get dynamic linking to work on Mac as currently written, but trying to convert to static linking causes errors. I'll try to see if I can get static to work on Linux, though this means that I need to get this whole thing working on dockcross first. The CMake errors I'm getting are what prompted my question above.
  • For static linking we'll likely need to build PROJ from source ourselves, since most build environments have differing versions available
  • Graal doesn't allow for saving an embedded native library to a temp file and then referencing it, meaning that dynamically linking to a system library may still be an important use case (beyond file size) even if we get a wide enough version of the libraries compiled. We'd need to think through how to enable both options.
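
To make the "both options" point concrete, here is a minimal sketch of a loader that first tries a library bundled in the JAR (extracted to a temp file, which is the step reported above as not working under Graal) and then falls back to the system library path. The class name, the resource path and the library name are all hypothetical; nothing here is taken from the PROJ-JNI code base.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical loader: bundled library first, system library path second. */
final class NativeLoader {
    /** Where the library was eventually loaded from. */
    enum Source { BUNDLED, SYSTEM, NONE }

    private NativeLoader() {}

    /**
     * @param resource   classpath location of the bundled library (assumed layout)
     * @param systemName name handed to System.loadLibrary as the fallback
     */
    static Source load(String resource, String systemName) {
        try (InputStream in = NativeLoader.class.getResourceAsStream(resource)) {
            if (in != null) {
                // Copy the embedded binary to a temp file so the OS loader can open it.
                Path tmp = Files.createTempFile("proj-binding", suffixOf(resource));
                tmp.toFile().deleteOnExit();
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                System.load(tmp.toAbsolutePath().toString());
                return Source.BUNDLED;
            }
        } catch (IOException | UnsatisfiedLinkError | SecurityException e) {
            // Extraction or loading failed; try the system library path below.
        }
        try {
            System.loadLibrary(systemName);
            return Source.SYSTEM;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return Source.NONE;
        }
    }

    /** Returns the file extension including the dot, or "" if there is none. */
    static String suffixOf(String resource) {
        int dot = resource.lastIndexOf('.');
        return dot >= 0 ? resource.substring(dot) : "";
    }
}
```

A caller might use `NativeLoader.load("/native/linux-x86_64/libproj-binding.so", "proj-binding")` and decide what to do when the result is `NONE`. Returning the source rather than throwing keeps that decision with the caller.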
@willcohen
Collaborator Author

Additional question -- to simplify the compilation/linking of all this glue code, are you all open to considering jumping to use Panama's native interoperability? I think since JDK16 it's gotten enough memory access functions that it works reasonably well, and JDK 17 LTS is coming out rather soon. I can try poking around and seeing if it works at all (via jextract etc). I suspect it'd dramatically simplify the code base if it works since I think we'd be able to excise all the internal C++ code, but it'd mean that the minimum JDK would probably be 16/17 and not 11.

@desruisseaux
Collaborator

It was for 3 reasons, with file size indeed one of them:

  • File size (1 MB per platform for JNI bindings alone, not counting PROJ itself).
  • Because I didn't have the build environment for providing native files for the 3 platforms.
  • Because the compiled JNI bindings were looking for a very specific version of PROJ (the one at the time of building) and broke at the first PROJ update. But this is probably because of my lack of knowledge about how to configure CMake for targeting a range of versions.

We can mitigate the file size issue by providing the Java code in a single JAR file and the JNI bindings in separate files for each platform. This is the strategy applied by JavaFX for example, using a custom Maven plugin for automatically selecting the right JAR file.

@desruisseaux
Collaborator

desruisseaux commented Jun 24, 2021

Yes I thought about Panama too. But it is still in incubation even in JDK 17. Maybe more important, in my understanding the current Panama version works with C only, not yet with C++, Fortran, etc. Compatibility with C++ is on their "to do" list, but for a future version. PROJ-JNI code relies extensively on the PROJ C++ API and uses a lot of features not available through the PROJ C API. It also has some tricky code managing the interaction between C++ "smart pointers" and the Java garbage collector, and I do not know whether that functionality would be easy to reproduce in Panama.

@willcohen
Collaborator Author

willcohen commented Jan 11, 2022

As a quick update here: I've been messing around with this some more, using dtype-next. I still don't have anything clean enough to show in a repo just yet, but this looks like it'll be able to help with the native libraries problem. Pre-JDK 17 it uses JNA, and for JDK 17+ it uses Panama; it can switch between them pretty seamlessly, AND it is able to output a Java API rather than force everyone into Clojure. (edit: it also supports Graal, though I haven't quite gotten that working yet here!)

My one request would be that PROJ-JNI hold off on deploying anything to Maven Central in the meantime while I mess around with this a little more -- if this works the way I think it does, this might end up being the most maintenance-free way to get cross-platform native bindings to the JVM as an option (it can also draw from the system library path, of course, so I think there could be -slim variants that don't package anything specifically, to avoid the size issue) with low overhead.

@willcohen
Collaborator Author

I still need to clean up the proof-of-concept into a workable repo, but it definitely works! REPL output here:

proj.api> (def p1 (proj-coord-array {:n 1}))
;; => #'proj.api/p1
proj.api> p1
;; => [{:x 0.0, :y 0.0, :z 0.0, :t 0.0}]
proj.api> (native/proj_trans_array (proj-create-crs-to-crs {:source-crs "EPSG:4326" :target-crs "EPSG:3586"}) 1 1 p1)
;; => 0
proj.api> p1
;; => [{:x 3.0250865971411645E7, :y -610981.481754199, :z 0.0, :t 0.0}]

It first initializes an array containing one proj_coord (well, an array with one struct with four doubles created externally) and calls it p1, creates a crs using proj_create_crs_to_crs, and then passes both to proj_trans_array. Everything is running in the JVM except for the original proj lib. It works on Panama for JDK17+, and can fall back to JNA in all other cases.

The main issue is I haven't currently figured out how to deal with pass-by-value functions (hence the need for proj_trans_array, which can take the pointer, versus proj_trans), so this may not immediately be able to mirror the entire API. However, this means that a jar can contain X number of precompiled proj libraries for other platforms and it should work without needing an additional compilation step!
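
For readers following along in Java, the "array with one struct with four doubles" can be pictured as a contiguous block of native-order doubles, 32 bytes per coordinate. The sketch below only models that memory layout with a direct `ByteBuffer`; the class and method names are hypothetical, and the actual call into `proj_trans_array` (via JNA, Panama or anything else) is deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

/** Hypothetical helper: n coordinates stored as consecutive (x, y, z, t) doubles. */
final class CoordArray {
    static final int DOUBLES_PER_COORD = 4;
    static final int BYTES_PER_COORD = DOUBLES_PER_COORD * Double.BYTES; // 32 bytes

    private final ByteBuffer bytes;
    private final DoubleBuffer doubles;

    CoordArray(int n) {
        // Direct, native-order memory; newly allocated buffers are zero-filled.
        bytes = ByteBuffer.allocateDirect(n * BYTES_PER_COORD).order(ByteOrder.nativeOrder());
        doubles = bytes.asDoubleBuffer();
    }

    /** Writes coordinate i in place. */
    void set(int i, double x, double y, double z, double t) {
        int base = i * DOUBLES_PER_COORD;
        doubles.put(base, x).put(base + 1, y).put(base + 2, z).put(base + 3, t);
    }

    /** Returns a copy of coordinate i as {x, y, z, t}. */
    double[] get(int i) {
        int base = i * DOUBLES_PER_COORD;
        double[] out = new double[DOUBLES_PER_COORD];
        for (int k = 0; k < DOUBLES_PER_COORD; k++) {
            out[k] = doubles.get(base + k);
        }
        return out;
    }

    /** Number of coordinates the buffer can hold. */
    int count() {
        return doubles.capacity() / DOUBLES_PER_COORD;
    }

    /** The underlying memory, which a native binding could receive as a pointer. */
    ByteBuffer buffer() {
        return bytes;
    }
}
```

Because the function receives a pointer to this block and overwrites it in place, the caller reads the transformed values back from the same buffer, which matches the `p1` before/after output in the REPL session.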

@desruisseaux
Collaborator

Is it mirroring the PROJ C API? The C API is specific to PROJ, while the C++ API is very close to the model of ISO 19111 international standard.

@willcohen
Collaborator Author

C, though I see no reason the bulk of the C portion of the ISO functionality couldn’t be implemented fairly quickly (@kbevers I suspect I’m entering a minefield here!)

Re smart pointers, the good part about dtype-next is that it has a clean method (pardon the Clojure) for attaching the disposal functions to the pointer objects at the time of creation, so when the JVM decides it’s done with the pointer on the JVM side it’ll call back to PROJ to dispose, or if it’s otherwise allocated memory it can free it too.

There’s definitely a few issues left to work through, but it does seem like there’s some real upsides for sure! It may be that for the most C++-API-like approach, PROJ-JNI is still the best bet, but for the most portable JVM solution, this is another path.
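
In plain Java, the idea of attaching a disposal function to a pointer at creation time can be approximated with `java.lang.ref.Cleaner` (JDK 9+). This is only a sketch of the general pattern, not how dtype-next or PROJ-JNI implement it; `NativeHandle` and the `LongConsumer` disposer are hypothetical names, and the disposer stands in for whatever native "destroy" call applies.

```java
import java.lang.ref.Cleaner;
import java.util.function.LongConsumer;

/** Hypothetical wrapper tying a native address to a disposal callback. */
final class NativeHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup state; it must not hold a reference back to the NativeHandle. */
    private static final class Release implements Runnable {
        private final long address;
        private final LongConsumer disposer;

        Release(long address, LongConsumer disposer) {
            this.address = address;
            this.disposer = disposer;
        }

        @Override
        public void run() {
            disposer.accept(address); // e.g. a call back into the native library
        }
    }

    private final long address;
    private final Cleaner.Cleanable cleanable;

    NativeHandle(long address, LongConsumer disposer) {
        this.address = address;
        // Runs Release once the handle becomes unreachable, unless close() ran first.
        this.cleanable = CLEANER.register(this, new Release(address, disposer));
    }

    long address() {
        return address;
    }

    /** Explicit disposal; the cleaning action is invoked at most once. */
    @Override
    public void close() {
        cleanable.clean();
    }
}
```

The disposer passed in must not capture the handle itself, otherwise the handle never becomes unreachable and the cleanup never runs on its own. Explicit `close()` (for example in try-with-resources) remains the predictable path; the Cleaner is a safety net.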

@desruisseaux
Collaborator

It is not a minefield, it just depends on the goal. The C API provides relatively opaque objects for CRS definitions and coordinate operations. It is possible to get some information like ellipsoid axis lengths, but not as detailed and unambiguous as what the C++ API allows. On the other side, the C++ API maps ISO 19111 almost fully.

It may be a trade-off between leveraging the power of ISO 19111 and avoiding the need to compile JNI locally. If we cannot have both at the same time, we may consider whether we want the two approaches to coexist, and how.

@willcohen
Collaborator Author

That actually makes a lot of sense in distinguishing the use cases. In a situation detail-oriented enough to want the unambiguity of full compatibility with the spec, maybe it’s not that much extra work having one library installed locally and doing a little legwork to get this compiled for those needs.

@willcohen
Collaborator Author

willcohen commented Mar 24, 2022

@kbevers @desruisseaux

Just a quick update here as I keep working through this. The one major remaining task I set for myself before trying to post a first version of a working repo based on #40 (comment) was to figure out fallbacks for users on platforms where the jar doesn't include a natively-compiled PROJ library.

My current thought process is the following, and I'd love some feedback on it:

  • A distributable standard jar could include PROJ (at this point, 9.0.0+) for some set number of common architectures, maybe windows, linux, mac x64 + aarch64 that represent some majority of use cases.
  • There could be eventual other variants for shaded/unshaded that could build for other architectures individually based on less common use cases, and building that could be automated via CI so this doesn't become a nightmare to maintain all the cross-compilation
  • My main concern is ensuring that there's a reasonable fallback set of functionality for platforms using the primary jar that don't have a precompiled binary. To that end, I've got a rickety-but-almost-working script to build PROJ 9 into WASM via emscripten. Still dealing with initializing all the various pieces in the right order, but it seems like the build is getting close to working, with the output wrapper .js looking like this:
...
/** @type {function(...*):?} */
var _proj_coord = Module["_proj_coord"] = createExportWrapper("proj_coord");

/** @type {function(...*):?} */
var _proj_xy_dist = Module["_proj_xy_dist"] = createExportWrapper("proj_xy_dist");

/** @type {function(...*):?} */
var _proj_trans_array = Module["_proj_trans_array"] = createExportWrapper("proj_trans_array");

/** @type {function(...*):?} */
var _proj_create_crs_to_crs = Module["_proj_create_crs_to_crs"] = createExportWrapper("proj_create_crs_to_crs");

/** @type {function(...*):?} */
var _proj_context_get_database_path = Module["_proj_context_get_database_path"] = createExportWrapper("proj_context_get_database_path");
  • This would then mean that if there's no binary version of PROJ, then there's conceivably still a way that those users could run the WASM version on the JVM.

Perhaps more importantly, adding a WASM build into the mix actually means that the Clojure wrapper would then be able to serve double duty.
The .clj version of the wrapper would either call natively to the correct compiled version of proj, or fall back to calling the WASM version for other-platform users of the JVM. I'd create a Java API to obscure all the clojure stuff so that any JVM user should be able to reference the various C interface functions accordingly via the various Panama/JNA/etc paths.

Since clojure also targets javascript, this would then open the door to having an analogous clojurescript .cljs interface to just the WASM version. I would also then want to target a Javascript API that similarly references all the C functions of PROJ, accessed via the WASM version, and I guess that could become an npm library too.

Assuming this works the way I think it will, this'd mean that there'd be a way to have both JVM and JS ecosystems have access to the upstream version of PROJ as-native-as-possible, without needing to deal with C or the native compilation.

The substantial downside, though, as noted above is that this stays C only in terms of interoperability, so all of the C++ API wouldn't be present for either of these two options.
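
The fallback order described in this comment (a precompiled native PROJ first, a WASM build otherwise) could be sketched in Java roughly as follows. `BackendSelector` and `Backend` are hypothetical names introduced for illustration; the real wrapper is described as Clojure-based and may be organised quite differently.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical selector: tries candidate backends in order of preference. */
final class BackendSelector {
    /** Hypothetical abstraction over "some way of calling PROJ". */
    interface Backend {
        String name();
    }

    private BackendSelector() {}

    /**
     * Returns the first backend that can be created. A candidate signals that it
     * is unavailable on this platform by throwing or by returning null.
     */
    static Optional<Backend> firstAvailable(List<Supplier<Backend>> candidates) {
        for (Supplier<Backend> candidate : candidates) {
            try {
                Backend backend = candidate.get();
                if (backend != null) {
                    return Optional.of(backend);
                }
            } catch (RuntimeException | LinkageError e) {
                // For example UnsatisfiedLinkError when no native binary matches;
                // move on to the next candidate.
            }
        }
        return Optional.empty();
    }
}
```

Candidates would be listed from most to least native, for example bundled binary, system library, then WASM, so that the slower fallback is only used when nothing else loads.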

@desruisseaux
Collaborator

I think it would be helpful to have the two parts of the work as two separate branches:

  • One branch about including the binary in the JAR file.
  • A separate branch, possibly based on the one above, adding the WASM work.

For the first branch, we can reduce the size of the JAR files by splitting them as below:

  • One JAR file which contains only the Java code. Basically the currently existing JAR file.
  • A set of separate JAR files for the binaries. Each JAR file is for exactly one platform.

The pom.xml file for the JAR file containing the Linux binaries would look like this (simplified):

<groupId>org.osgeo</groupId>
<artifactId>proj-bin</artifactId>
<version>1.0-SNAPSHOT</version>
<plugins>
  <plugin>
    <artifactId>maven-jar-plugin</artifactId>
    <configuration>
      <classifier>linux</classifier>
    </configuration>
  </plugin>
</plugins>

For other platforms, we would use the same pom.xml with only a different value in the <classifier> element. Then in the main pom.xml (the one for the pure Java code), we could have something like:

<dependency>
  <groupId>org.osgeo</groupId>
  <artifactId>proj-bin</artifactId>
  <version>${project.version}</version>
  <classifier>${platform}</classifier>
</dependency>

<profiles>
  <profile>
    <id>linux</id>
    <activation>
      <os>
        <family>unix</family>
      </os>
    </activation>
    <properties>
      <platform>linux</platform>
    </properties>
  </profile>
  <!-- Same for Windows, MacOS, etc. -->
</profiles>

An example is provided in Nexus Tips and Tricks section 5.5.3 (Platform Classifiers). JavaFX uses a similar technique, but with a custom Maven plugin instead of Maven classifiers.

With this approach, users would download only the JAR file for their platform and we would not have to restrain the number of supported platforms because of file size concerns. (I wonder however who can manage to build a JAR for each platform…)
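
If the same choice also had to be made at run time (for example to decide which bundled binary to load), the mapping from JVM system properties to a classifier might look like the sketch below. The classifier strings (`linux-x86_64`, `macos-aarch64`, and so on) are assumptions made for illustration; the example above only uses `linux`. As far as I know, Maven's `unix` OS family can also match macOS, so real profiles would probably need activation by name or architecture as well.

```java
import java.util.Locale;

/** Hypothetical mapping from os.name / os.arch values to a classifier string. */
final class PlatformClassifier {
    private PlatformClassifier() {}

    static String of(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        String family;
        // Test for macOS before Windows: "darwin" contains the substring "win".
        if (os.contains("mac") || os.contains("darwin")) {
            family = "macos";
        } else if (os.contains("win")) {
            family = "windows";
        } else if (os.contains("linux")) {
            family = "linux";
        } else {
            family = "unknown";
        }

        // Normalise the common spellings of the two architectures named in this thread.
        String cpu;
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            cpu = "x86_64";
        } else if (arch.equals("aarch64") || arch.equals("arm64")) {
            cpu = "aarch64";
        } else {
            cpu = arch;
        }
        return family + "-" + cpu;
    }

    /** Classifier for the JVM this code is running on. */
    static String current() {
        return of(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

Taking the two property values as parameters, rather than reading them inside `of`, keeps the mapping easy to check for platforms other than the one running the build.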

@desruisseaux
Collaborator

One more thing: the pure Java code is under MIT license, but the proj-bin JAR file containing PROJ binary, if it includes the EPSG database, will have to be under MIT + EPSG terms of use license.

@willcohen
Collaborator Author

Makes sense. I'll see if I can figure out the Maven method to help point to the right jar. In terms of building, I've been able to get Windows + Mac working via a script, and I can get a bunch of Linux architectures to compile with dockcross. The eventual plan, I think, would be to have GitHub Actions (which does support running on a Mac builder and a Windows one) use the Mac instance to build Mac natively + Linux via dockcross for as many architectures as possible, and build Windows natively as well.

@desruisseaux
Collaborator

I added a wiki page for a Maven project layout proposal:

https://github.com/OSGeo/PROJ-JNI/wiki/LayoutProposal

Please feel free to edit.

@willcohen
Collaborator Author

willcohen commented Jun 17, 2022

Hi all. As a quick followup, it took me much longer than I expected to get sqlite and libtiff to link correctly, but I've successfully built a working PROJ 9.0.1 with WebAssembly using emscripten. This means that for platforms that don't have access to a built binary, it'll be possible to fall back to a JS build of PROJ, and once I get the wasm fully working in pure js, then it should work to use graaljs -- which to my knowledge is pure java -- meaning that native (well, transpiled) proj should work for anyone on the JVM.

Here I allocate an array of one coordinate and transform it via webassembly:

image

In the next few weeks I'll try to get this working prototype posted. There's still a little more cleanup to do!

@hobu

Edit: it works as a prototype on graaljs too, if a little slowly, since I haven't yet figured out how to import the sqlite db into the emscripten filesystem with a graaljs context rather than embedding it in the js itself:

(eval context
    "var o1 = _malloc(32);
     var p1 = Module.HEAPF64.subarray(o1/8, o1/8 + 4);
     var t1 = ccall('proj_create_crs_to_crs','number',['number','string','string','number'],[_proj_context_create(),'EPSG:3586','EPSG:4326',0]);
     ccall('proj_trans_array','number',['number','number','number','number'], [t1, 1, 1, o1]);
     p1")
;; => #object[org.graalvm.polyglot.Value 0x497d8c6f "Float64Array(4)[34.24438675300125, -73.6513909034731, 1.0609979113e-314, 1.600083993565264e-303]"]

@desruisseaux
Collaborator

Hello @willcohen. Given that WebAssembly is language-neutral and not particularly related to Java, should this work be in a new project, something like "PROJ-WASM"?
