Add native support for C elements #66

Draft: wants to merge 215 commits into master


Berstanio
Contributor

This PR is not fully ready yet, but I am opening it anyway to make room for discussion, whether about the concept, the design, or anything else!

Motivation

jnigen makes it easy to write JNI code in order to integrate native libraries. This PR tries to take this a step further by providing automatic Java jnigen code generation from a header file, while also adding support for pointer types, structs, unions and callbacks.
Development focused on:

  • Correctness/Portability
  • Thinness
  • Avoiding reflection
  • Inlinability, static linking and avoidance of symbol exposure

Interface design

Methods

Methods are bound by simply generating jnigen wrapper code. The C function int testFunc(int test); would be generated as:

    public static int testFunc(int test) {
        return testFunc_internal(test);
    }

Callbacks

Callbacks are implemented as functional interfaces. They look something like:

    public interface methodWithCallbackFloatArg extends Closure {
        void methodWithCallbackFloatArg_call(float arg0);
    }

To pass them to the native side, a ClosureObject needs to be constructed around them. Calling ClosureObject.fromClosure(your_callback); creates one.
A method expecting a closure looks like this:

    public static void call_methodWithCallback(ClosureObject<methodWithCallback> fnPtr)

IMPORTANT: Closures have a manual memory life cycle and need to be freed manually. More on that later.
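The manual life cycle can be sketched roughly as follows. This is a minimal, self-contained model and not the PR's code: Handle stands in for ClosureObject, FloatCallback for a generated closure interface, and callWithCallback for a bound native method; all three are hypothetical. The point is only the shape of the usage: create the wrapper, hand it to native code, and free it in a finally block once native code no longer needs it.

```java
class ClosureLifecycleSketch {
    // Hypothetical stand-in for a generated closure interface.
    interface FloatCallback {
        void call(float arg0);
    }

    // Hypothetical stand-in for ClosureObject: it is valid until free() is called.
    static final class Handle<T> {
        private final T closure;
        private boolean freed;

        private Handle(T closure) {
            this.closure = closure;
        }

        static <T> Handle<T> fromClosure(T closure) {
            return new Handle<>(closure);
        }

        T getClosure() {
            if (freed) throw new IllegalStateException("closure already freed");
            return closure;
        }

        void free() {
            if (freed) throw new IllegalStateException("double free");
            freed = true;
        }

        boolean isFreed() {
            return freed;
        }
    }

    // Hypothetical stand-in for a native method that invokes the callback synchronously.
    static void callWithCallback(Handle<FloatCallback> fnPtr) {
        fnPtr.getClosure().call(1.5f);
    }

    public static void main(String[] args) {
        Handle<FloatCallback> handle = Handle.fromClosure(v -> System.out.println("got " + v));
        try {
            callWithCallback(handle);
        } finally {
            // Nothing frees the closure automatically, so release it explicitly.
            handle.free();
        }
    }
}
```

If native code stores the function pointer and calls it later, the wrapper has to stay alive until that is no longer possible, so the free call would move to wherever the callback is unregistered.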

Enum

C enums are implemented as Java enums which have an ID (id != ordinal). Note that C enums can have duplicate IDs, while the Java enums can't. To solve this, enum constants with the same ID are merged into a single constant, with their names joined by an "_".
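As an illustration, assume a hypothetical C enum `enum Color { RED = 0, CRIMSON = 0, BLUE = 5 };`. A mapping along the lines described above could look like the sketch below. The names Color, RED_CRIMSON, getId and byId are made up for this example and are not necessarily what the generator emits.

```java
import java.util.HashMap;
import java.util.Map;

class EnumSketch {
    // Hypothetical mapping of: enum Color { RED = 0, CRIMSON = 0, BLUE = 5 };
    enum Color {
        RED_CRIMSON(0), // RED and CRIMSON share ID 0, so they become one constant
        BLUE(5);        // ID 5, while ordinal() is 1: the ID is not the ordinal

        private static final Map<Integer, Color> BY_ID = new HashMap<>();

        static {
            for (Color c : values()) {
                BY_ID.put(c.id, c);
            }
        }

        private final int id;

        Color(int id) {
            this.id = id;
        }

        int getId() {
            return id;
        }

        // Looks up the constant for a value coming from native code.
        static Color byId(int id) {
            Color c = BY_ID.get(id);
            if (c == null) throw new IllegalArgumentException("Unknown enum ID: " + id);
            return c;
        }
    }

    public static void main(String[] args) {
        System.out.println(Color.byId(0));
        System.out.println(Color.BLUE.getId() + " vs ordinal " + Color.BLUE.ordinal());
    }
}
```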

Struct/Union (The term "StackElement" will refer to both of them)

StackElements are implemented as classes. They are only passed by value. They have getter and setter methods that are named after the field. They can have either a manual or an automatic lifecycle; more on that later. Unions cannot be passed in closures.

Pointer

There are a lot of pointer types. What they all have in common is that they can have manual or automatic memory management and that they all point to something. Their address can be retrieved with getPointer().
The basic ones are FloatPointer, DoublePointer and VoidPointer. They are pretty self-explanatory.
CSizedIntPointer is a pointer to a C integer. A CSizedIntPointer has a backing CType that defines the type it points to, e.g. new CSizedIntPointer("int"); or new CSizedIntPointer("char");. This is needed to correctly calculate the size of the pointed-to type on a specific machine. The CType needs to be set to whatever the pointer is supposed to be used for: if I want an int*, I need to do new CSizedIntPointer("int");. jnigen will do bounds and type checks to ensure correctness.

Every Enum/StackElement has an inner class that is its pointer type.
A StackElement can be converted to a pointer in two ways:

  1. StackElement#asPointer reinterprets the StackElement as a pointer. Every change made through the pointer will be reflected in the StackElement.
  2. Creating a Pointer and calling set. This will copy the StackElement.

The same goes for converting back: asStackElement reinterprets the address, and get copies the StackElement.
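The difference between reinterpreting and copying can be illustrated with plain java.nio buffers. This sketch does not use the PR's classes: a direct ByteBuffer stands in for the native memory of a struct with a single int field, reinterpret models the asPointer/asStackElement direction, and copy models set/get. Both helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ViewVersusCopySketch {
    // Reinterpretation: a second view over the same memory.
    static ByteBuffer reinterpret(ByteBuffer memory) {
        // duplicate() shares the content but resets the byte order, so restore it.
        return memory.duplicate().order(memory.order());
    }

    // Copy: fresh memory that holds the same bytes.
    static ByteBuffer copy(ByteBuffer memory) {
        ByteBuffer fresh = ByteBuffer.allocateDirect(memory.capacity()).order(memory.order());
        ByteBuffer source = memory.duplicate();
        source.clear();      // read from position 0 up to capacity
        fresh.put(source);
        fresh.clear();       // reset position so the copy can be read from the start
        return fresh;
    }

    public static void main(String[] args) {
        ByteBuffer struct = ByteBuffer.allocateDirect(4).order(ByteOrder.nativeOrder());
        struct.putInt(0, 1);

        ByteBuffer view = reinterpret(struct);
        ByteBuffer copied = copy(struct);

        view.putInt(0, 42);

        System.out.println(struct.getInt(0)); // the write through the view is visible here
        System.out.println(copied.getInt(0)); // the copy still holds the old value
    }
}
```

One consequence of the reinterpreting variant is that the view is only valid as long as the original memory is; the copy has its own lifetime.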

Lastly, we have PointerPointer<T extends Pointing>. This class is used for every pointer that goes deeper than one layer, like void**. When creating a PointerPointer, you need to pass a supplier that defines how to create the dereferenced pointer object.
For a float** this would look like: new PointerPointer(FloatPointer::new). An int** can be created by: new PointerPointer<>(CSizedIntPointer.pointerPointer("int"));
The API gets a bit cumbersome for a depth of 3+, but this is very rare.
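Why a supplier is needed at all can be shown with a small model. Because Java generics are erased at runtime, a generic pointer-to-pointer class cannot construct its element type on its own; it has to be told how to wrap a raw address. In the sketch below PtrPtr and FloatPtr are hypothetical stand-ins, a long[] plays the role of native memory, and the factory is assumed to take the address as a long. The real PointerPointer signature may differ.

```java
import java.util.function.LongFunction;

class PointerPointerSketch {
    // Hypothetical stand-in for a typed pointer such as FloatPointer.
    static final class FloatPtr {
        final long address;

        FloatPtr(long address) {
            this.address = address;
        }
    }

    // Hypothetical stand-in for PointerPointer<T>.
    static final class PtrPtr<T> {
        private final long[] slots;             // models native memory holding addresses
        private final LongFunction<T> factory;  // how to wrap a dereferenced address

        PtrPtr(int count, LongFunction<T> factory) {
            this.slots = new long[count];
            this.factory = factory;
        }

        void setAddress(int index, long address) {
            slots[index] = address;
        }

        // Dereferences one level and wraps the stored address in the element type.
        T get(int index) {
            return factory.apply(slots[index]);
        }
    }

    public static void main(String[] args) {
        PtrPtr<FloatPtr> floatPtrPtr = new PtrPtr<>(1, FloatPtr::new);
        floatPtrPtr.setAddress(0, 0x1000L);
        System.out.println(Long.toHexString(floatPtrPtr.get(0).address));
    }
}
```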

GC

Every C element, except Enums, is backed by dynamically allocated memory.
Closures always need manual memory management: if you don't need them anymore, you need to call ClosureObject#free.
All others follow this rule: if they are created by Java code and you don't explicitly switch them to manual management, they will be freed by the GC. If they come from native code (even if they originally came from Java code), then they are under manual management and it is your responsibility to free them, if needed.

Exceptions

C++ exceptions are handled and implemented as a CXXException. The code should be compiled with -fexceptions to work properly.

Implementation details

About functions

The native part is implemented like this:

    static private native int testFunc_internal(int test);/*
    	HANDLE_JAVA_EXCEPTION_START()
    	CHECK_AND_THROW_C_TYPE(env, int, test, 0, return 0);
    	return (jint)testFunc((int)test);
    	HANDLE_JAVA_EXCEPTION_END()
    	return 0;
    */

So let's dissect this.
First of all, this design allows for static linking.
But one major caveat is that a "jint" is not guaranteed to have the same size as an "int", so the cast may silently truncate the value. To address this, the generator is supposed to always pick the Java type that is guaranteed to hold the C type in all cases (the generator is therefore supposed to run on a 64bit machine). On the C side, we introduce a runtime check:
CHECK_AND_THROW_C_TYPE(env, int, test, 0, return 0);
This is a mostly compile-time macro that checks whether a number fits within the bounds of a C type. If the check fails, it will throw a Java exception.
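The real check is a C macro, but the range logic behind it can be sketched in Java. A typical case is the C type long, which is 8 bytes on 64bit Linux but 4 bytes on 64bit Windows, so a value carried in a Java long may or may not fit depending on the platform. The class and method names below are hypothetical, and the size is passed in explicitly because Java has no sizeof.

```java
class CTypeRangeSketch {
    // Returns true if the value fits into a C integer of the given size and signedness.
    static boolean fits(long value, int sizeBytes, boolean signed) {
        if (sizeBytes < 1 || sizeBytes > 8) {
            throw new IllegalArgumentException("Unsupported size: " + sizeBytes);
        }
        if (sizeBytes == 8) {
            // A Java long carries all 64 bits, so there is nothing to reject here.
            return true;
        }
        int bits = sizeBytes * 8;
        if (signed) {
            long min = -(1L << (bits - 1));
            long max = (1L << (bits - 1)) - 1;
            return value >= min && value <= max;
        }
        return value >= 0 && value <= (1L << bits) - 1;
    }

    // Mirrors the "check and throw" behaviour with a plain Java exception.
    static void checkOrThrow(long value, int sizeBytes, boolean signed, String cTypeName) {
        if (!fits(value, sizeBytes, signed)) {
            throw new IllegalArgumentException(value + " does not fit into C type " + cTypeName);
        }
    }

    public static void main(String[] args) {
        long value = 1L << 31; // 2147483648, one above the largest 4-byte signed value
        System.out.println(fits(value, 8, true)); // platform where long is 8 bytes
        System.out.println(fits(value, 4, true)); // platform where long is 4 bytes
    }
}
```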
Then we have HANDLE_JAVA_EXCEPTION_START/END(). This is just a C++ try/catch that handles three cases:

  1. If a closure throws a Java exception, it will be converted to a JavaExceptionMarker C++ exception. If we then encounter a JavaExceptionMarker, we set the backing Java exception and swallow the C++ exception. This allows throwing Java exceptions through C code.
  2. If we encounter a std::exception, we create a CXXException based on exception::what().
  3. If we encounter anything else that was thrown, we just report it as an unknown error on the Java side.

If a Java exception is converted to a JavaExceptionMarker, the stack trace needs to be read to provide exception::what(). This can be expensive, and it can be disabled with CHandler.setDisableCXXExceptionMessage(true);

About FFITypes.java

Every binding process will generate a FFITypes class. The purpose of this class is to map CTypes to libFFI types. The generator will emit a compile-time macro for every C type it encounters. This way, the exact size details for the current platform can be retrieved at runtime. FFITypes also handles struct FFI types.

About Closures

Closures are implemented with libFFI closures. They define a signature, that is retrieved from the FFITypes. Below is an example:

    public interface methodWithCallbackCharArg extends Closure {

        CTypeInfo[] __ffi_cache = new CTypeInfo[] { FFITypes.getCTypeInfo(-2), FFITypes.getCTypeInfo(7) };

        void methodWithCallbackCharArg_call(char arg0);

        default CTypeInfo[] functionSignature() {
            return __ffi_cache;
        }

        default void invoke(JavaTypeWrapper[] parameters, JavaTypeWrapper returnType) {
            methodWithCallbackCharArg_call((char) parameters[0].asLong());
        }
    }

A function call from C -> Java works as follows:
The ffi closure will do the argument packing and call callbackHandler in the CHandler class. There we go over all arguments in the args array. If we encounter a number, we copy it. If we encounter a struct, we allocate a new pointer, put the struct into it, and put the pointer into the new args array.
Now we allocate a DirectByteBuffer that wraps the pointer and pass it to Java with CHandler#dispatchCallback, where it will be unpacked into Java values.
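The Java half of this hand-off can be sketched as follows. The buffer layout here is an assumption made for the example (a single float argument at offset 0, in native byte order); the layout actually used by CHandler is not described in this PR text. FloatCallback and dispatch are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class DispatchSketch {
    // Hypothetical stand-in for a generated closure interface with one float argument.
    interface FloatCallback {
        void call(float arg0);
    }

    // Unpacks the argument from the buffer and forwards it to the Java callback.
    static void dispatch(ByteBuffer args, FloatCallback callback) {
        // Native code writes in the platform's byte order, so read it the same way.
        float arg0 = args.order(ByteOrder.nativeOrder()).getFloat(0);
        callback.call(arg0);
    }

    public static void main(String[] args) {
        // Models what the native side would have packed before calling into Java.
        ByteBuffer packed = ByteBuffer.allocateDirect(Float.BYTES).order(ByteOrder.nativeOrder());
        packed.putFloat(0, 1.5f);

        dispatch(packed, value -> System.out.println("callback received " + value));
    }
}
```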

About CHandler#reExportSymbolsGlobally

On Unix, System#load calls dlopen to open a shared library. On Linux, this defaults to RTLD_LOCAL, on macOS to RTLD_GLOBAL. To make symbol resolution easier when opening dependent shared libraries, we explicitly re-export the symbols on Linux with RTLD_GLOBAL.

That is all for the moment; if I have forgotten something, I will append it. If there are any questions left, please ask!

Outstanding tasks

  1. Add CI builds for all targets
  2. Test on 32bit machines and other platforms
  3. Test it on actual libraries to bind
  4. Variadic functions are unsupported
  5. Performance tests

@Berstanio Berstanio marked this pull request as draft April 16, 2024 16:22
@Moonlils

So much work! I can see this PR becoming very handy🥰
