libloading support #1541

Closed
Wenzel opened this issue Mar 25, 2019 · 22 comments

@Wenzel

Wenzel commented Mar 25, 2019

Hi,

I would like to rewrite a virtual machine introspection library in Rust:
https://github.com/libvmi/libvmi

This library builds a unified API across hypervisors' VMI APIs.

I considered bindgen for the task, which would be perfect for generating a kvm-sys or xen-sys crate, but I have a requirement: it must be possible to load these libraries dynamically at runtime, by locating a lib.so file and loading it with libloading, like a plugin.
https://docs.rs/libloading/0.5.0/libloading/

I haven't found anything in the docs regarding dynamic loading.

Is it a use case that you have already thought about?
How difficult would it be to implement?

Thanks !

@emilio
Contributor

emilio commented Mar 25, 2019

So, basically what you want, I assume, is something like bindgen, but generating something like:

struct Lib {
    foo: Option<extern "C" fn(i32) -> i32>,
    bar: Option<extern "C" fn(*mut c_void) -> ()>,
}

And such, instead of:

extern "C" {
    fn foo(arg: i32) -> i32;
    fn bar(arg: *mut c_void) -> ();
}

(and co, which is basically what bindgen does)

Is that right? If so, I don't know what the best way to approach this would be.

Tweaking bindgen to generate something like this wouldn't be hard, but it's not clear it'd be flexible enough for the general case...

It should be possible to use syn and such to parse bindgen's output and generate an arbitrary library as well... Depends on the requirements really, and on how stable the API is.

If the API is stable (which it should be if it's a dylib), then writing a Rust program that processes bindgen's output and outputs the struct you want, with some helpers and such, should be pretty easy. We could even put it in the repo if it's useful for more people.

I'd be happy to help if you want to give that a shot and get stuck or what not.
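
To picture the first shape, here is a minimal sketch of such a function-pointer table being filled at runtime. It uses only the standard library: the `load` constructor, its `resolve` closure, and `demo_resolver` are hypothetical stand-ins for a dlsym-style lookup (the job libloading would do), and `foo_impl` exists only so the sketch runs without a real shared library.

```rust
use std::os::raw::c_void;

// Stand-in for a function that would normally live in the shared library.
extern "C" fn foo_impl(x: i32) -> i32 {
    x + 1
}

/// One optional function pointer per symbol, as in the struct above.
pub struct Lib {
    pub foo: Option<extern "C" fn(i32) -> i32>,
    pub bar: Option<extern "C" fn(*mut c_void)>,
}

impl Lib {
    /// `resolve` is hypothetical: it maps a symbol name to a raw address,
    /// the way a dlsym-style lookup would. Missing symbols become `None`.
    pub fn load(resolve: impl Fn(&str) -> Option<*const c_void>) -> Lib {
        Lib {
            // SAFETY: the caller must guarantee that each address really
            // points to a function with the signature declared on the field.
            foo: resolve("foo").map(|p| unsafe {
                std::mem::transmute::<*const c_void, extern "C" fn(i32) -> i32>(p)
            }),
            bar: resolve("bar").map(|p| unsafe {
                std::mem::transmute::<*const c_void, extern "C" fn(*mut c_void)>(p)
            }),
        }
    }
}

/// Hypothetical resolver that only knows about `foo`.
pub fn demo_resolver(name: &str) -> Option<*const c_void> {
    match name {
        "foo" => {
            let f: extern "C" fn(i32) -> i32 = foo_impl;
            Some(f as *const c_void)
        }
        _ => None,
    }
}
```

With `Lib::load(demo_resolver)`, `foo` is populated and callable while `bar` stays `None`, which is the property the `Option` fields are meant to express.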

@emilio
Contributor

emilio commented Mar 25, 2019

I could also be convinced to add a special mode to bindgen that generates what you want, but I think adding it in a separate program would be easier.

@hug-dev

hug-dev commented Dec 6, 2019

Hi!
This functionality would also be very helpful for us and, I think, for every crate that dynamically loads a C library with a very big API.
As far as I know, people today seem to manually write the API when using libloading (example with PKCS 11).

Having bindgen (or another program) do it would be fantastic!
I would be happy to look into how to do it as a separate program: do you have any leads on where one would start 😃?

@emilio
Contributor

emilio commented Dec 6, 2019

I think this generally needs a concrete proposal of what would be needed and what kind of inputs would bindgen get. I don't know the use case well enough. I'm not sure if my comments above reflect the use case given I got no reply from the reporter.

In particular, how is bindgen supposed to know which functions are part of the dylib vs. part of some other imported header?

@astraw

astraw commented Dec 7, 2019

I am also interested in this.

At least on Linux, I think the output of "nm -D" could be used by bindgen to know which functions are in the .so file.

@Wenzel
Author

Wenzel commented Dec 7, 2019

@emilio I have to apologize here: I read your reply on public transport in March 2019, but couldn't reply at the time.
Then I just forgot.

So, basically what you want, I assume, is something like bindgen, but generating something like:

struct Lib {
    foo: Option<extern "C" fn(i32) -> i32>,
    bar: Option<extern "C" fn(*mut c_void) -> ()>,
}

Indeed, that's what I'm looking for.
For example, bindgen could have an option to generate dynamic bindings:

    let bindings = bindgen::Builder::default()
        // The input header we would like to generate
        // bindings for.
        .header(XEN_HEADERS_WRAPPER)
        // Run rustfmt on the bindings
        .rustfmt_bindings(true)
        // allow dynamic linking
        .allow_dynamic_link(true)
        // Finish the builder and generate the bindings.
        .generate()

And this would generate a struct Lib containing function pointers, which would be filled in when you run Lib::init().

I could also be convinced to add a special mode to bindgen that generates what you want, but I think adding it in a separate program would be easier.

I think it could be integrated into bindgen, since it's just one more option available to users.

But I'm still interested, more now than before !

Thanks @emilio

@Wenzel
Author

Wenzel commented Dec 7, 2019

In particular, how is bindgen supposed to know which functions are part of the dylib vs. part of some other imported header?

When you load a library at runtime, your program has to know in advance which symbols it wants to load from it.
Neither dlopen nor libloading provides a way to enumerate the available symbols.

Also, you have to explicitly define the function parameters.
Example with xenctrl in libvmi:
https://github.com/libvmi/libvmi/blob/master/libvmi/driver/xen/libxc_wrapper.h#L33

typedef struct {
    void *handle;

    /* Xen 4.1+ */
    xc_interface* (*xc_interface_open)
    (xentoollog_logger *logger, xentoollog_logger *dombuild_logger, unsigned open_flags);

    int (*xc_interface_close)
    (xc_interface *xch);

    int (*xc_version)
    (xc_interface *xch, int cmd, void *arg);

    void* (*xc_map_foreign_range)
    (xc_interface *xch, uint32_t domid, int size, int prot, unsigned long mfn );

So what we could have, is a build.rs where the user specifies which symbols he is interested in, and bindgen can generate the function signature automatically, as well as a struct Lib, where the init() method is calling libloading and filling the function pointers.

what do you think @hug-dev, @emilio ?

@emilio
Contributor

emilio commented Dec 8, 2019

At least on Linux, I think the output of "nm -D" could be used by bindgen to know which functions are in the .so file.

Right, but this grows quite a dependency on bindgen, plus part of the point of using a dylib is that there can be functions that are not present in different versions of the library. Bindgen itself uses this quite a lot. For example, if you have libclang 9, you'll be able to use some functions that are not present in libclang 5.

Just magically running nm or any other tool like that would mean that generating bindings for a library would depend on the library the user had installed at the point of generating the bindings which is not great.

Listing all the symbols looks fair enough. I think that should be reasonably straight-forward to implement.
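
To illustrate the version concern with an explicit symbol list, here is a rough sketch that treats some symbols as required and others as optional. Everything in it is an assumption made for illustration: `Wanted`, `check_symbols`, and the `probe` closure (a stand-in for asking the installed library whether it exports a name) are hypothetical and not part of bindgen or libloading.

```rust
/// A symbol the bindings want, and whether loading should fail without it.
pub struct Wanted {
    pub name: &'static str,
    pub required: bool,
}

/// `probe` is a hypothetical stand-in for a dlsym-style lookup that reports
/// whether the installed library exports `name`.
///
/// Returns the missing optional symbols on success, or the missing required
/// symbols as the error.
pub fn check_symbols(
    wanted: &[Wanted],
    probe: impl Fn(&str) -> bool,
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut missing_optional = Vec::new();
    let mut missing_required = Vec::new();
    for symbol in wanted {
        if probe(symbol.name) {
            continue;
        }
        if symbol.required {
            missing_required.push(symbol.name);
        } else {
            missing_optional.push(symbol.name);
        }
    }
    if missing_required.is_empty() {
        Ok(missing_optional)
    } else {
        Err(missing_required)
    }
}
```

An older library that lacks only optional symbols still loads, and the caller learns which newer functions are unavailable; the bindings themselves do not depend on whichever version was installed when they were generated.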

Also, btw, this wouldn't be that bad to implement as an input step to bindgen if you're allowed to use some C++. bindgen generates the right thing for this:

extern "C" void my_function();

struct my_type_t {
  int b;
};

extern "C" int my_other_function(struct my_type_t*);

// This could be generated from build.rs pretty easily and used as an input to
// bindgen.
struct Library {
  decltype(my_function)* my_function;
  decltype(my_other_function)* my_other_function;
};

@hug-dev

hug-dev commented Dec 9, 2019

In particular, how is bindgen supposed to know which functions are part of the dylib vs. part of some other imported header?

Can it only care about the functions defined in the main header file as opposed to the ones that get imported from it?
I would expect a dynamic library to implement all of the APIs defined in its header file? Maybe I am wrong because I do not see the whole picture, or there are other use cases.
That would make things easier if the developer does not have to write by hand all of the function symbols they want to load.

Concerning integration with libloading, as the get method returns a Symbol<T> where T is the function type maybe it would be convenient for the fields of the Library structure to also be of type Symbol<T>?

So what we could have, is a build.rs where the user specifies which symbols he is interested in, and bindgen can generate the function signature automatically, as well as a struct Lib, where the init() method is calling libloading and filling the function pointer

I agree! A pub fn init<P: AsRef<OsStr>>(filename: P) -> Result<bindgen::Library> would indeed be excellent if it could:

  • call libloading::Library::new on the filename given to create a new libloading Library instance
  • instantiate a new bindgen::Library instance by assigning each field function_name to lib.get(b"function_name\0").unwrap()
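
The two steps above can be sketched as follows, under some loud assumptions: the `SymbolSource` trait and `FakeLib` are hypothetical stand-ins for an opened dynamic library so that the example needs only the standard library, the single field is literally called `function_name`, and the sketch returns an error for a missing symbol where the bullet above would call `unwrap()`.

```rust
use std::ffi::OsStr;

/// Hypothetical stand-in for an opened dynamic library.
pub trait SymbolSource: Sized {
    fn open(filename: &OsStr) -> Result<Self, String>;
    /// Looks up a NUL-terminated symbol name.
    fn get(&self, name: &[u8]) -> Option<extern "C" fn(i32) -> i32>;
}

/// Generated-looking wrapper: keeps the source alive next to its symbols.
pub struct Library<S> {
    _source: S,
    pub function_name: extern "C" fn(i32) -> i32,
}

impl<S: SymbolSource> Library<S> {
    pub fn init<P: AsRef<OsStr>>(filename: P) -> Result<Self, String> {
        // Step 1: open the library from the given filename.
        let source = S::open(filename.as_ref())?;
        // Step 2: resolve each field by its NUL-terminated name.
        let function_name = source
            .get(b"function_name\0")
            .ok_or_else(|| "missing symbol: function_name".to_string())?;
        Ok(Library { _source: source, function_name })
    }
}

// Fake implementation so the sketch can run without a real .so file.
extern "C" fn double_it(x: i32) -> i32 {
    x * 2
}

pub struct FakeLib;

impl SymbolSource for FakeLib {
    fn open(filename: &OsStr) -> Result<Self, String> {
        if filename.is_empty() {
            Err("empty path".to_string())
        } else {
            Ok(FakeLib)
        }
    }

    fn get(&self, name: &[u8]) -> Option<extern "C" fn(i32) -> i32> {
        if name == b"function_name\0" {
            Some(double_it)
        } else {
            None
        }
    }
}
```

Returning a `Result` instead of unwrapping lets the caller decide what to do when a symbol is absent; whether the generated code should panic or report is a design choice this sketch does not settle.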

@Wenzel
Author

Wenzel commented Mar 9, 2020

Hey @emilio, could you give us an update on the status of this issue?

Are we still brainstorming on the interface, should we write some specifications, or are you perhaps allocating this project to this year's GSoC?

Thanks.

@hug-dev

hug-dev commented Jun 17, 2020

If this issue is still up for grabs, @joechrisellis will be looking at it around mid-July 😃

@joechrisellis
Contributor

Hi guys, I have a proposal for what we might want to do in this case. I also have a prototype implementation that I can submit for review if you think this is the right approach. Let me know if you have any thoughts, or if anybody has a better suggestion. :)

I agree with Emilio's comment:

In particular, how is bindgen supposed to know which functions are part of the dylib vs. part of some other imported header?

This is not trivial to determine given the current architecture. For example, if you're including two libraries like so:

let bindings = // ...
    .header("wrapper.h")
    .header("some_other_wrapper.h")
    // ...

There is currently no way to determine whether a given symbol is from wrapper.h or some_other_wrapper.h. Ultimately this means that an implementation like:

let bindings = // ...
    .header("wrapper.h")
    .loading_header("some_other_wrapper.h")
    // ...

Is non-trivial, because by the time we reach codegen the information about which function comes from which library is lost. We have no idea whether to put a given function into a library struct, or just treat it as we would normally.

With that in mind, I am proposing two builder options loading and loading_library_name (or similar) that should be used in this manner:

let bindings = // ...
    .header("wrapper.h")
    .loading(true)
    .loading_library_name("MyLibrary") // optional, will default to the header filename
    // ...

For a wrapper.h containing:

int foo(int x);
void bar(double x);

I would expect this to generate bindings resembling:

/* automatically generated by rust-bindgen 0.54.1 */

pub struct MyLibrary<'a> {
    foo: libloading::Symbol<
        'a,
        unsafe extern "C" fn(x: ::std::os::raw::c_int) -> ::std::os::raw::c_int,
    >,
    bar: libloading::Symbol<'a, unsafe extern "C" fn(x: f64)>,
}
impl<'a> MyLibrary<'a> {
    pub fn new(lib: &libloading::Library) -> MyLibrary {
        unsafe {
            MyLibrary {
                foo: lib.get("foo".as_bytes()).unwrap(),
                bar: lib.get("bar".as_bytes()).unwrap(),
            }
        }
    }
}

With these bindings, you call the functions in the library like this:

#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]
include!(concat!(env!("OUT_DIR"), "/bindings.rs"));

extern crate libloading;

pub fn main() {
    let lib = libloading::Library::new("/path/to/lib.so").unwrap();
    let library_wrapper = MyLibrary::new(&lib);
    unsafe { (library_wrapper.bar)(123.0) };
}

By default, all of the symbols in the header file would be included. You could select specific symbols using the whitelist functionality already in bindgen.

With this, whether or not we're generating libloading bindings is a global option for a single invocation of bindgen, which avoids requiring any ownership knowledge about which symbol comes from which header, and whether it is dynamically loaded or not. If the user wants multiple dynamically-loaded libraries, they'd do:

let lib1_bindings = // ...
    .header("dynamic_header1.h")
    .loading(true)
    .loading_library_name("SomeLib1")
    // ...

let lib2_bindings = // ...
    .header("dynamic_header2.h")
    .loading(true)
    .loading_library_name("SomeLib2")
    // ...

let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
lib1_bindings
    .write_to_file(out_path.join("lib1_bindings.rs"))
    .expect("Couldn't write bindings!");
lib2_bindings
    .write_to_file(out_path.join("lib2_bindings.rs"))
    .expect("Couldn't write bindings!");

And then include them as needed in their code, like this:

#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]
extern crate libloading;
include!(concat!(env!("OUT_DIR"), "/lib1_bindings.rs"));
include!(concat!(env!("OUT_DIR"), "/lib2_bindings.rs"));

There is a pitfall -- if you want to use two dynamically loaded libraries and they have a common include like #include <stdint.h>, you'd get name collisions when you include! them both in Rust. This can be circumvented with:

mod lib1 {
    include!(concat!(env!("OUT_DIR"), "/lib1_bindings.rs"));
}
mod lib2 {
    include!(concat!(env!("OUT_DIR"), "/lib2_bindings.rs"));
}
pub use lib1::SomeLib1;
pub use lib2::SomeLib2;
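
A small self-contained sketch of why the module wrapping helps. Each module stands in for one `include!`d bindings file; the `size_type` alias and the `buffer_len` field are made up for illustration, and the struct names follow the `loading_library_name` values used earlier.

```rust
// Both "bindings files" define `size_type`, as two headers sharing a common
// include might. Inside separate modules the duplicate names do not clash.
#[allow(non_camel_case_types)]
pub mod lib1 {
    pub type size_type = u64;
    pub struct SomeLib1 {
        pub buffer_len: size_type,
    }
}

#[allow(non_camel_case_types)]
pub mod lib2 {
    pub type size_type = u64;
    pub struct SomeLib2 {
        pub buffer_len: size_type,
    }
}

// Re-export only the wrapper structs; the shared helper types stay
// namespaced under their modules.
pub use lib1::SomeLib1;
pub use lib2::SomeLib2;

pub fn total_len(a: &SomeLib1, b: &SomeLib2) -> lib1::size_type {
    a.buffer_len + b.buffer_len
}
```

Placing both definitions of `size_type` at the crate root instead would be rejected by the compiler as a duplicate definition, which is the collision described above.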

As I said, I have a small prototype implementation of this -- if you guys think this is the right path forwards, I'd be happy to submit it for review! :)

@hug-dev

hug-dev commented Jul 21, 2020

The proposal looks very good to me, and would be helpful for one of our potential use cases: generating Rust bindings to dynamically load a C library.

Some header files also include other files which contain other, unrelated functions. At the end of the pre-processing step, the header file might contain a lot of functions that are not needed from the Rust code but are in the "MyLibrary" structure. Dynamically loading those symbols will create errors if they are not implemented by the library. For example, the library that we are looking at includes stdint.h, and in the generated bindings I can see bindings for the C standard functions: free, malloc, calloc, etc.
With the current proposal, it would be possible to whitelist only the functions that we want, but it might be tedious if the API contains a lot of methods.
I am perfectly fine with that as a first step, but would it be possible to differentiate somehow during the preprocessing step between the functions that are in the current header file and the ones that get included? And only generate bindings for the former? That would be a great addition, maybe also useful for the generic use case of bindgen.
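
One way to keep the whitelisting manageable, sketched here under the assumption that the library's own functions share a naming prefix (as the `xc_` functions in the xenctrl example earlier in this thread do). bindgen's existing whitelist options accept regular expressions, so a single pattern could express the same selection; the helper below is hypothetical and only demonstrates the filtering with the standard library.

```rust
/// Splits declared function names into those to load dynamically (matching
/// the library's prefix) and those to leave out, such as libc functions that
/// were pulled in through an include.
pub fn partition_by_prefix<'a>(
    declared: &[&'a str],
    prefix: &str,
) -> (Vec<&'a str>, Vec<&'a str>) {
    declared
        .iter()
        .copied()
        .partition(|name| name.starts_with(prefix))
}
```

For the names `xc_interface_open`, `malloc`, `xc_version`, and `free` with the prefix `xc_`, the first list holds the two `xc_` functions and the second holds `malloc` and `free`. This does not answer the question of telling the main header apart from its includes; it only sidesteps it when a prefix exists.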

Happy to go ahead with this anyway!

@Wenzel @emilio Any opinion on this proposal?

@dergroncki

dergroncki commented Jul 26, 2020

@joechrisellis If possible, I would like to try out your prototype (on Windows).

@emilio
Contributor

emilio commented Jul 27, 2020

Sorry for the lag replying here, but if people are fine with all functions from a set of bindings ending up in the same LibLoading thing then the proposal above seems sensible. If interested folks could give it a spin and see if it works for them it'd be awesome.

Thanks!

@joechrisellis
Contributor

Hi guys, I've made a draft PR here that you might like to check out. There's more information inside the PR. 😄

@dergroncki

dergroncki commented Jul 27, 2020

It is not obvious to me how I can test the PR (how to get bindgen which includes the PR). Any hint would be great.

@kulp
Member

kulp commented Jul 28, 2020

@dergroncki, maybe the screenshot below will help.
[Screenshot not preserved: "Screen Shot 2020-07-27 at 17 07 43"]

@joechrisellis
Contributor

@dergroncki -- to test it, you can:

  1. Download and install the GitHub CLI.
  2. Clone the bindgen repo: gh repo clone rust-lang/rust-bindgen
  3. Checkout the pull request: cd rust-bindgen && gh pr checkout 1846
  4. In your project which uses bindgen, change the path in your Cargo.toml to point toward the repo you just cloned.
[build-dependencies]
bindgen = { path = "/path/to/rust-bindgen" }

When building your project it should source bindgen locally. You should then be able to use the dynamic loading features as described above. 🙂

@Michael-F-Bryan
Contributor

As an alternative to @joechrisellis's proposed solution, my solution was to make a wrapper that uses syn to parse the output from bindgen, replacing all extern functions with a type that'll load the function pointers at runtime.

It's still in the proof-of-concept stage, but I've set up a suite of integration tests and my trivial hello world example (fn add(c_int, c_int) -> c_int) passes.

@hug-dev

hug-dev commented Aug 17, 2020

If the bindgen maintainers agree to having this feature in-tree, I would say this is better from an ease-of-use (no need for another crate) and maintenance (the libloading feature is maintained with the rest of the bindgen code) point of view.
However, there might be things to share between both implementations:

  • the fact that your load_from_path method directly takes an AsRef<::std::ffi::OsStr> might be easier to use
  • storing the libloading::Library pointer inside the struct
  • storing extern "C" function pointers in the struct as opposed to Symbol from bindgen. That and the point before might remove a dependency on libloading for consumers?
  • tests

@joechrisellis
Contributor

I agree with @hug-dev here -- keeping this feature in-tree is probably a better idea for maintenance and avoiding bitrot, but @Michael-F-Bryan's solution has a lot of nice-to-haves!

the fact that your load_from_path method directly takes an AsRef<::std::ffi::OsStr> might be easier to use

Definitely -- I would be happy to change the implementation in the PR at the moment to reflect this, pending @Michael-F-Bryan's permission!

storing the libloading::Library pointer inside the struct

This is also useful -- I'll change the PR to do this, too.
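
A rough sketch of what storing the library inside the struct buys, using only the standard library. `Handle` is a hypothetical stand-in for an open library handle that records when it is closed, `bar_impl` stands in for a loaded symbol (it returns a value here only so the sketch can be checked), and none of this is the code from the PR.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for an open library handle. It flips a shared flag
/// when dropped, mimicking the library being closed.
pub struct Handle {
    closed: Rc<Cell<bool>>,
}

impl Handle {
    pub fn new(closed: Rc<Cell<bool>>) -> Handle {
        Handle { closed }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        self.closed.set(true);
    }
}

// Stand-in for a function that would be resolved from the library.
extern "C" fn bar_impl(x: f64) -> f64 {
    x * 2.0
}

/// Owns the handle, so the struct needs no lifetime parameter: the library
/// stays open for as long as the wrapper itself is alive.
pub struct MyLibrary {
    pub bar: extern "C" fn(f64) -> f64,
    _handle: Handle,
}

impl MyLibrary {
    pub fn new(handle: Handle) -> MyLibrary {
        MyLibrary { bar: bar_impl, _handle: handle }
    }
}
```

The trade-off is that a plain function pointer is `Copy`, so a caller can copy it out of the struct and keep it past the point where the library is closed; a `Symbol` tied to a borrowed library rules that out at compile time.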

LoganBarnett pushed a commit to LoganBarnett/rust-bindgen that referenced this issue Dec 2, 2023
Closes rust-lang#1541.
Closes rust-lang#1846.

Co-authored-by: Michael-F-Bryan <michaelfbryan@gmail.com>