Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Access to the API from C? #2265

Closed
bagder opened this issue Aug 11, 2020 · 32 comments
Closed

Access to the API from C? #2265

bagder opened this issue Aug 11, 2020 · 32 comments
Assignees
Labels
A-ffi Area: ffi (C API) C-feature Category: feature. This is adding a new feature. E-hard Effort: hard. Likely requires a deeper understanding of how hyper's internals work.

Comments

@bagder
Copy link
Contributor

bagder commented Aug 11, 2020

If a C program would like to use hyper for h1/h2, is there a C API offered?

@seanmonstar
Copy link
Member

In short, there isn't yet an API, largely because no one has shown significant interest that one be provided.

I'm interested to hear more of the use case that you're thinking of.

@bagder
Copy link
Contributor Author

bagder commented Aug 11, 2020

I have networking code written in plain old C that is used, well let's say "widely" and it isn't an understatement - let's call it curl. I'm right now researching the feasibility of switching over to using hyper for the the HTTP/1 and HTTP/2 handling parts. For this to work, I would need a C based API that's sensible and stable going forward. A plausible first step to ponder on: optionally (as a build-time option) replace nghttp2 with hyper for HTTP/2.

I was primarily curious if I had missed an existing such API but I now realize I hadn't. I also understand that you possibly don't want to do this for me and my single (still "theoretical") use case, but maybe I'm not alone.

@seanmonstar
Copy link
Member

Yea, I knew you from curl, just didn't want to assume what you might want to do :)

I've thought about exposing a C API a bunch of times, but hadn't had anyone really ask, so I could only guess at interest. I'm happy to help figure this out, though! We could probably spec out some hyper_c crate that provides a stable C API. Would you be looking to plug in to your own event loop? hyper currently accepts custom read and write objects, and they can provide their own way to notify hyper when IO is ready again. Of course, if not needing to use your own event loop, we could probably expose less internals, but 🤷 .

Can I also ask how "interested" you are? Are you just exploring what things look like? Do you have a deadline you'd hope to have this "theoretical" stuff in curl? Do you think you or others would be able to contribute to building hyper C bindings?

Knowing these things would help me put together a proposal. 😀

@bagder
Copy link
Contributor Author

bagder commented Aug 12, 2020

I would definitely need my own event loop (libcurl itself is event loop agnostic and allows apps to use what they want).

On the level of my interest: it's a little of a chicken and egg problem. If the API is "suitable enough" I'm very interested, as in I'm probably getting funding to work on this project. It's not a done deal though and there's no deadline at all at this point. Right now, without even an API to look at and think about, the project is not feasible at all (as I don't consider myself qualified enough to work effectively on the hyper side of the puzzle)

I of course also have challenges of my own on "my side" of things that may or may not put obstacles in the way independently of your project and C API. I'm not trying to suggest that this just is all just waiting for a C API as the only bump in the road.

I don't even think I yet have a fair grasp of what exactly I would want the API to look like even if I would dream up my ideal API. Early days still. Dipping toes.

Do you think you or others would be able to contribute to building hyper C bindings?

I will certainly join in and do my part - if nothing else for entirely selfish reasons so that the bindings end up as good as possible for my intended use case. After all, I have some experience in making and offering C APIs. I haven't actually asked around anywhere so I don't know what level of interest we can hope for from others.

@seanmonstar
Copy link
Member

OK, so I'm working on a draft for a header file (hyper.h or whatever). Does that seem useful in your evaluation?

So far the basic stuff is pretty straight forward, requests, responses, headers, the client. The more complex thing to represent I feel is a Future, which in Rust, client.send(req) returns a Future<Output = Result<Response, Error>>. Futures in Rust have a single poll method, and a user calls that to do work, along with a Waker argument. poll either returns the Output, or Pending to say it would need to be polled again at a later time. When it is Pending, it will have stored the Waker somewhere to signal when more work could happen.

In hyper, it's typical that IO is the reason the Future couldn't progress. Internally, it will have called poll_read(waker, buf), and it's assumed that the IO type will register with something (such as epoll), and when ready again, it'd call waker.wake(). That would signal the top future to try again.

When looking in curl, there seems to be some similar concepts in the multi API. It looks like you collect together these transfers into a CURLM, the user waits on epoll (or select or whatever), and then calls curl_multi_perform to make more progress. From a cursory reading, the simpler version will just re-poll all the transfers, and curl_multi_socket_action can be used when you have a larger number of sockets and don't want to try all transfer on ever poll, just the ones that are ready.

The last wrinkle is that in Rust Futures can have different output types, and in hyper, there are potentially a couple of extra "background" futures (so Future<Output = ()>) that the user would need to also poll. I think we can easily provide a hyper_task_pool that you push these futures into that knows which ones to poll. The wrinkle is that some resolve to () (C's void), and some to a Response. So, the hyper_task would need some way to check what return type it has, and pull that out. I suppose that could be a simple internal union...

@laudiacay
Copy link

Hi, someone I know directed me to this issue- I've done some rust work in the past and am interested in contributing if you think I could be useful.

Just wondering- were y'all considering bindgen at all?

@bagder
Copy link
Contributor Author

bagder commented Aug 14, 2020

libcurl provides a "multi" API that performs any amount of parallel transfers within the same thread. It exists in two flavors, one that assumes the use of select() and one that lets the user plug-in and use their own event system - the latter for better scaling and performance when moving up in the thousands of simultaneous connections. We have users with libevent, libev, libuv, plain epoll and virtually every other known event-based system.

So for curl it is important that support libraries are single-threaded (at least appearing so) and non-blocking.

A typical non-blocking behavior for us, and perhaps one that you could mimic with your Futures, is to offer information in an API about how to poll it for completeness (like are we waiting for READ and/or WRITE on a socket) and/or a timeout when it should get polled even if no event has been detected. The the curl event loop just polls the socket waiting for a trigger event or a timeout, and only if one of those happens it calls the library that then performs whatever it needs needs to do - and can then update the next "waiting conditions".

@seanmonstar seanmonstar added B-rfc Blocked: More comments would be useful in determine next steps. E-hard Effort: hard. Likely requires a deeper understanding of how hyper's internals work. labels Aug 19, 2020
@seanmonstar
Copy link
Member

I'm not done yet, but I've made some progress and pushed it into this branch here: https://github.com/hyperium/hyper/tree/hyper-capi

Some relevant pieces:

It doesn't work yet, but does show how the API mostly looks. There's a hyper_io type, which shows how the callbacks work. There's a hyper_executor and hyper_task, for polling async tasks/futures. There's only a few pieces of the header file missing, and then of course the actual code for the headers will be needed.

@bagder
Copy link
Contributor Author

bagder commented Aug 21, 2020

I like the spirit and the initial direction but it also makes it more clear to me that this is pretty challenging. I'm not sure what the best way is to provide feedback or at what point, so forgive me for just blurting out a few random points/questions here:

negative error values for (unsigned) "size_t" really?

That will cause code analyzers/compilers to get unhappy.

hyper_request_set_* - copies the given input string presumably

Just suggesting it could be clarified in the description

How to set/change request headers

Is there a default set? How would a client pass on request headers to use in the request?

how do we get (individual) response headers?

The example just shows a write callback but that's for the body, right? How would a client get response headers?

for a request body, it reads with the callback until

it returns 0?

how do you envision server push to work this this API?

how would trailers work?

if hyper recv/send on the socket, how does it signal connection close?

Ie if the connection is closed, perhaps prematurely, how is that passed to the application here?

how does the client know the transfer is done?

how is HTTPS (TLS really) glued into this?

hyper_request_set_uri

I fear "regular" URL interop issues here if hyper parses a URI that potentially is also parsed separately by the client application - I would prefer setting individual parts to mitigate that risk.

On connections and streams

In a curl world, curl would (most probably) ideally:

  1. parse URL
  2. resolve host name
  3. create the TCP connections
  4. happy eyeball
  5. handshake the TLS
  6. verify the certificate
  7. then engage hyper.

@seanmonstar
Copy link
Member

Thanks for the feedback! I spend a lot less time writing C, so I may not always think of the best way to express concepts in the language.

negative error values for (unsigned) "size_t"

Yea, I originally started those callbacks with int return values, and changed them to size_t and forgot to update the defines. Though, that leaves me with a question for curl: I notice the write callbacks also return size_t, and there's #define CURL_WRITEFUNC_PAUSE 0x10000001 as a sentinel return value. Does this never cause problems with apps just happening to write that many bytes?

request and response headers

These APIs are missing from the header file while I investigated the best way to pass and return strings in C. The plan in my head is to provide hyper_headers *, with setter, getter (by name), and iterator functions.

The example just shows a write callback but that's for the body, right?

The write callback is not just for the body, but the IO write function for all writing. It will also be called when writing headers, or connection frames in HTTP/2. This is because hyper doesn't assume the IO transport is a file descriptor, it could be anything. The same applies to the read callback. It should just read from a socket (or whatever other source)

The body will be another pollable thing, similar to a hyper_task, that yields some sort of buffer type, which I also need to add.

how do you envision server push to work this this API?

I haven't thought of that yet. It could possibly be something that is optionally polled out of the response future. Is it a needed feature in the first draft?

how would trailers work?

Trailers can also be polled out of the "body" type (in Rust), so it'd just be modeling that similar to polling for bytes.

if hyper recv/send on the socket, how does it signal connection close?

The read/write callbacks do the actual socket IO, so if there was an error, they can return HYPER_IO_ERROR, and set extra details on the userdata perhaps. If the read or write returns 0, so a clean close, but hyper wasn't expecting that yet, it will return an error to which is currently polling (the response future, or the body polls), which will be translated into a HYPERE_SOMETHING code. We could add in a way to get more details about the error, or it can just set on the userdata.

how does the client know the transfer is done?

When the user has the response, and has polled the body until it signals it's finished, then it's done.

how is HTTPS (TLS really) glued into this?

That would be handled by the read and write callbacks on hyper_io. The data that hyper wants to write to the transport is passed, and in those functions it can be encrypted before writing to the socket.

hyper_request_set_uri

I'm curious to hear more. hyper's Uri type in Rust does have the ability to individually set the scheme, authority, and path (modeled after HTTP/2/3), but wasn't sure if that needed to be exposed. We could add that though.

In a curl world, curl would (most probably) ideally:

parse URL
resolve host name
create the TCP connections
happy eyeball
handshake the TLS
verify the certificate
then engage hyper.

I agree completely. I've so far been modeling hyper's hyper::client::conn, which just models HTTP over a single connection. There is also hyper::Client, which has connection management, but I figured that wouldn't be needed at first.

@bagder
Copy link
Contributor Author

bagder commented Aug 22, 2020

a question for curl: I notice the write callbacks also return size_t, and there's #define CURL_WRITEFUNC_PAUSE 0x10000001 as a sentinel return value. Does this never cause problems with apps just happening to write that many bytes?

It's not a problem there because libcurl can never deliver that many bytes in a single callback invocation.

@bagder
Copy link
Contributor Author

bagder commented Aug 22, 2020

hyper_request_set_uri - I'm curious to hear more

URLs and URIs are seriously underspecified today to the degree that you never really know how a parser will deal with the "URL" you give it. See URL-interop for details. Therefor, there's a security problem waiting to happen when we mix multiple parser implementations. (Famously discussed in Orange Tsai's 2017 presentation: Exploiting Url Parsers)

@seanmonstar
Copy link
Member

To aid with reviewing of the design, I've turned the branch into a pull request (#2278). I hope to make it more usable over the next couple weeks.

@DemiMarie
Copy link

Some of my own comments:

  • libcurl handles OOM by returning an error, while Rust libraries typically abort in this case.
  • Rust libraries often have larger code size than their C counterparts.
  • I believe that libcurl is often used in embedded environments, where OOM is quite common, and where code size is often a significant limitation.

I would love to see Rust used in libcurl, but I also don’t want it to be a regression for libcurl’s users. If hyper could recover from OOM and have code size comparable to the C code it replaces, I would consider that a major advance for both hyper and the Rust ecosystem in general.

@bagder
Copy link
Contributor Author

bagder commented Sep 16, 2020

Rust libraries typically abort in this case.

I had no idea. That's a horrible way to deal with problems and not the decision to make for system libraries.

Is that what hyper does too? If so, it'll be a concern.

I would love to see Rust used in libcurl, but I also don’t want it to be a regression for libcurl’s users.

I completely and wholeheartedly agree, and in fact you can consider me (as lead developer in the curl project) signed up on the mission to see us move on both these items. Put simply: Hyper cannot be a transparent and good choice for libcurl until it can be used in a way that doesn't cause any notable regressions or issues. The job set out here is thus not done until we're there.

Regarding code size and curl being used in different environments etc: that is indeed true and we should not presume that all users want or can go with Rust solutions - not even when/if we can make them used completely transparently. curl has been used on over 70 operating systems on 20 different CPU architectures. It is quite simply much more portable and versatile than Rust is. The idea from my end is to make curl to optionally use Hyper as a backend (selected at build-time).

@XAMPPRocky
Copy link

XAMPPRocky commented Sep 16, 2020

I had no idea. That's a horrible way to deal with problems and not the decision to make for system libraries.

Is that what hyper does too? If so, it'll be a concern.

It's the default behaviour, but you can override it, this would just need to be exposed somehow in the hyper C API. The API is not currently stable, so you have to use a nightly Rust compiler.

@bagder
Copy link
Contributor Author

bagder commented Sep 16, 2020

this would just need to be exposed somehow in the hyper C API

Sure, if you think that's a valuable feature to expose. I would be fine with just making the C API always return error on errors...

@XAMPPRocky
Copy link

XAMPPRocky commented Sep 16, 2020

Sorry, maybe exposed was the wrong word. It totally could be just an error when you’re using the API. I more meant that there would need to be a function to set the override hook, see the below example.

#![feature(alloc_error_hook)]

#[no_mangle]
pub unsafe extern "C" fn unset_alloc_abort() {
    std::alloc::set_alloc_error_hook(|_| panic!(“alloc error”));
}

#[repr(C)] struct Foo;

#[no_mangle]
pub unsafe extern "C" fn new_foo() -> *mut Foo {
    catch_unwind(|| {
        // Allocations
        Box::into_raw(Box::new(Foo))
    }).unwrap_or(std::ptr::null_mut())
}

@a1phyr
Copy link

a1phyr commented Sep 16, 2020

Catching panics is not only nice for OOM error handling, but it is necessary anyway as panicking across FFI boundaries is UB.

@mitsuhiko
Copy link
Contributor

The other more insane version could be to longjump out of the error hook.

@bagder
Copy link
Contributor Author

bagder commented Sep 16, 2020

you still want the library to not leak memory or any other resources, even in the OOM situation, so simply "jumping" out of it is rarely the best idea - said without having a clue about the Hyper internals.

@KamilaBorowska
Copy link

KamilaBorowska commented Sep 16, 2020

Currently Rust standard library doesn't attempt to handle OOM (for instance, Vec::push may fail in a way that cannot be handled). RFC 2116 does provide a way to handle allocation failures on standard library collections (by either unwinding or try_reserve methods that can yield Result), but so far it's nightly-only.

@DemiMarie
Copy link

Personally, I don’t believe we should rely on unwinding to handle OOM errors. The devices which have the greatest need to recover from OOM are also the ones where the code size overhead of unwinding is the greatest concern. I greatly prefer the Zig approach, where any function that takes an allocator is passed one explicitly, and where running out of memory is just another error. Maybe I will write an RFC for that someday.

@bagder
Copy link
Contributor Author

bagder commented Sep 17, 2020

The devices which have the greatest need to recover from OOM are also the ones where the code size overhead of unwinding is the greatest concern

I strongly disagree. A system library, any system library, has the obligation to clean up after itself. In all and any circumstances. If a library function call fails, it needs to return an error and it it needs to never leak memory whatever error happened. Applications using these libraries should be able to depend on that, and if the application thinks a memory error, or any other serious error, being returned from said library is a show-stopper then the application can decide to abort or exit. It is not the choice to make for the library. It never is.

This gets even more emphasized if a library (like Hyper) is used by another library (like libcurl), where this second library thinks of this behavior and "rock solidness" to be something to be proud of and is even pushed as a "selling point" to users, and if the sub-library then cannot deliver on this promise then libcurl cannot either and then: 😞

@jplatte
Copy link
Contributor

jplatte commented Sep 17, 2020

@Badger the comment you replied to did not advocate for the library handling OOMs itself (by exiting, aborting or whatever), just for catching OOMs without unwinding. This would require use of data types with built-in fallible allocation, which is not yet the case for Rust's std types (but that is being worked on IIRC).

@bagder
Copy link
Contributor Author

bagder commented Sep 17, 2020

@jplatte Thanks for clarifying that for me! (reading this again, I see it might come off as an ironic comment but it isn't, as said before I'm a Rust cluebie and I appreciate getting told about facts I seem to have gotten wrong. 👍 from me).

@DemiMarie
Copy link

@bagder No offence taken. Your comment is one of the best arguments I have seen as to why OOM should not be a fatal error, even though much code and even some system libraries (glib, macOS’s Cocoa, librsvg, cmark, gumbo) considers it to be such. Ironically, I believe that Rust, due to its memory safety and automatic resource cleanup, is in a better position than C to recover from OOM errors, once libstd is updated to provide fallible allocation.

@seanmonstar
Copy link
Member

std::alloc::set_alloc_error_hook

I've just tried some experiments, and it indeed works just fine to do some panic!(HyperOom) and then be able to catch that in hyper-capi, just returning a null pointer. The only issue to me there is with it being a nightly-only feature. The tracking issue hasn't seen much traffic, maybe stabilizing it can done soon?

I would be fine with just making the C API always return error on errors...

I think it needs to be part of hyper's C API, since it requires setting a global (Rust's alloc error hook), and it'd be problematic if different libraries all try to set that automatically.

@DemiMarie
Copy link

I think we really need a Rust RFC that makes the default hook panic.

@leo60228
Copy link

Unwinding in OOM is undefined behavior, as alloc::alloc::handle_alloc_error is marked as #[rustc_allocator_nounwind].

@RalfJung
Copy link
Contributor

RalfJung commented Oct 9, 2020

Unwinding in OOM is undefined behavior, as alloc::alloc::handle_alloc_error is marked as #[rustc_allocator_nounwind].

That attribute is going away together with the "panic" default hook: rust-lang/rust#76448.

@seanmonstar seanmonstar self-assigned this Nov 6, 2020
@seanmonstar seanmonstar added A-ffi Area: ffi (C API) C-feature Category: feature. This is adding a new feature. and removed B-rfc Blocked: More comments would be useful in determine next steps. labels Jan 8, 2021
@seanmonstar
Copy link
Member

With the merge of #2278, I've opened some new issues that are more fine grained (in the A-ffi label), and so I think this can be closed as "done".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ffi Area: ffi (C API) C-feature Category: feature. This is adding a new feature. E-hard Effort: hard. Likely requires a deeper understanding of how hyper's internals work.
Projects
None yet
Development

No branches or pull requests