Access to the API from C? #2265

bagder · 2020-08-11T15:16:22Z

If a C program would like to use hyper for h1/h2, is there a C API offered?

seanmonstar · 2020-08-11T15:36:10Z

In short, there isn't yet an API, largely because no one has shown significant interest that one be provided.

I'm interested to hear more of the use case that you're thinking of.

bagder · 2020-08-11T22:20:26Z

I have networking code written in plain old C that is used, well let's say "widely" and it isn't an understatement - let's call it curl. I'm right now researching the feasibility of switching over to using hyper for the the HTTP/1 and HTTP/2 handling parts. For this to work, I would need a C based API that's sensible and stable going forward. A plausible first step to ponder on: optionally (as a build-time option) replace nghttp2 with hyper for HTTP/2.

I was primarily curious if I had missed an existing such API but I now realize I hadn't. I also understand that you possibly don't want to do this for me and my single (still "theoretical") use case, but maybe I'm not alone.

seanmonstar · 2020-08-12T00:17:53Z

Yea, I knew you from curl, just didn't want to assume what you might want to do :)

I've thought about exposing a C API a bunch of times, but hadn't had anyone really ask, so I could only guess at interest. I'm happy to help figure this out, though! We could probably spec out some hyper_c crate that provides a stable C API. Would you be looking to plug in to your own event loop? hyper currently accepts custom read and write objects, and they can provide their own way to notify hyper when IO is ready again. Of course, if not needing to use your own event loop, we could probably expose less internals, but 🤷 .

Can I also ask how "interested" you are? Are you just exploring what things look like? Do you have a deadline you'd hope to have this "theoretical" stuff in curl? Do you think you or others would be able to contribute to building hyper C bindings?

Knowing these things would help me put together a proposal. 😀

bagder · 2020-08-12T09:21:52Z

I would definitely need my own event loop (libcurl itself is event loop agnostic and allows apps to use what they want).

On the level of my interest: it's a little of a chicken and egg problem. If the API is "suitable enough" I'm very interested, as in I'm probably getting funding to work on this project. It's not a done deal though and there's no deadline at all at this point. Right now, without even an API to look at and think about, the project is not feasible at all (as I don't consider myself qualified enough to work effectively on the hyper side of the puzzle)

I of course also have challenges of my own on "my side" of things that may or may not put obstacles in the way independently of your project and C API. I'm not trying to suggest that this just is all just waiting for a C API as the only bump in the road.

I don't even think I yet have a fair grasp of what exactly I would want the API to look like even if I would dream up my ideal API. Early days still. Dipping toes.

Do you think you or others would be able to contribute to building hyper C bindings?

I will certainly join in and do my part - if nothing else for entirely selfish reasons so that the bindings end up as good as possible for my intended use case. After all, I have some experience in making and offering C APIs. I haven't actually asked around anywhere so I don't know what level of interest we can hope for from others.

seanmonstar · 2020-08-13T20:21:57Z

OK, so I'm working on a draft for a header file (hyper.h or whatever). Does that seem useful in your evaluation?

So far the basic stuff is pretty straight forward, requests, responses, headers, the client. The more complex thing to represent I feel is a Future, which in Rust, client.send(req) returns a Future<Output = Result<Response, Error>>. Futures in Rust have a single poll method, and a user calls that to do work, along with a Waker argument. poll either returns the Output, or Pending to say it would need to be polled again at a later time. When it is Pending, it will have stored the Waker somewhere to signal when more work could happen.

In hyper, it's typical that IO is the reason the Future couldn't progress. Internally, it will have called poll_read(waker, buf), and it's assumed that the IO type will register with something (such as epoll), and when ready again, it'd call waker.wake(). That would signal the top future to try again.

When looking in curl, there seems to be some similar concepts in the multi API. It looks like you collect together these transfers into a CURLM, the user waits on epoll (or select or whatever), and then calls curl_multi_perform to make more progress. From a cursory reading, the simpler version will just re-poll all the transfers, and curl_multi_socket_action can be used when you have a larger number of sockets and don't want to try all transfer on ever poll, just the ones that are ready.

The last wrinkle is that in Rust Futures can have different output types, and in hyper, there are potentially a couple of extra "background" futures (so Future<Output = ()>) that the user would need to also poll. I think we can easily provide a hyper_task_pool that you push these futures into that knows which ones to poll. The wrinkle is that some resolve to () (C's void), and some to a Response. So, the hyper_task would need some way to check what return type it has, and pull that out. I suppose that could be a simple internal union...

laudiacay · 2020-08-14T01:46:39Z

Hi, someone I know directed me to this issue- I've done some rust work in the past and am interested in contributing if you think I could be useful.

Just wondering- were y'all considering bindgen at all?

bagder · 2020-08-14T08:46:56Z

libcurl provides a "multi" API that performs any amount of parallel transfers within the same thread. It exists in two flavors, one that assumes the use of select() and one that lets the user plug-in and use their own event system - the latter for better scaling and performance when moving up in the thousands of simultaneous connections. We have users with libevent, libev, libuv, plain epoll and virtually every other known event-based system.

So for curl it is important that support libraries are single-threaded (at least appearing so) and non-blocking.

A typical non-blocking behavior for us, and perhaps one that you could mimic with your Futures, is to offer information in an API about how to poll it for completeness (like are we waiting for READ and/or WRITE on a socket) and/or a timeout when it should get polled even if no event has been detected. The the curl event loop just polls the socket waiting for a trigger event or a timeout, and only if one of those happens it calls the library that then performs whatever it needs needs to do - and can then update the next "waiting conditions".

seanmonstar · 2020-08-19T01:35:32Z

I'm not done yet, but I've made some progress and pushed it into this branch here: https://github.com/hyperium/hyper/tree/hyper-capi

Some relevant pieces:

The header file: https://github.com/hyperium/hyper/blob/hyper-capi/hyper-capi/include/hyper.h
A client example: https://github.com/hyperium/hyper/blob/hyper-capi/hyper-capi/examples/client.c

It doesn't work yet, but does show how the API mostly looks. There's a hyper_io type, which shows how the callbacks work. There's a hyper_executor and hyper_task, for polling async tasks/futures. There's only a few pieces of the header file missing, and then of course the actual code for the headers will be needed.

bagder · 2020-08-21T13:28:58Z

I like the spirit and the initial direction but it also makes it more clear to me that this is pretty challenging. I'm not sure what the best way is to provide feedback or at what point, so forgive me for just blurting out a few random points/questions here:

negative error values for (unsigned) "size_t" really?

That will cause code analyzers/compilers to get unhappy.

hyper_request_set_* - copies the given input string presumably

Just suggesting it could be clarified in the description

How to set/change request headers

Is there a default set? How would a client pass on request headers to use in the request?

how do we get (individual) response headers?

The example just shows a write callback but that's for the body, right? How would a client get response headers?

for a request body, it reads with the callback until

it returns 0?

how do you envision server push to work this this API?

how would trailers work?

if hyper recv/send on the socket, how does it signal connection close?

Ie if the connection is closed, perhaps prematurely, how is that passed to the application here?

how does the client know the transfer is done?

how is HTTPS (TLS really) glued into this?

hyper_request_set_uri

I fear "regular" URL interop issues here if hyper parses a URI that potentially is also parsed separately by the client application - I would prefer setting individual parts to mitigate that risk.

On connections and streams

In a curl world, curl would (most probably) ideally:

parse URL
resolve host name
create the TCP connections
happy eyeball
handshake the TLS
verify the certificate
then engage hyper.

seanmonstar · 2020-08-21T18:25:49Z

Thanks for the feedback! I spend a lot less time writing C, so I may not always think of the best way to express concepts in the language.

negative error values for (unsigned) "size_t"

Yea, I originally started those callbacks with int return values, and changed them to size_t and forgot to update the defines. Though, that leaves me with a question for curl: I notice the write callbacks also return size_t, and there's #define CURL_WRITEFUNC_PAUSE 0x10000001 as a sentinel return value. Does this never cause problems with apps just happening to write that many bytes?

request and response headers

These APIs are missing from the header file while I investigated the best way to pass and return strings in C. The plan in my head is to provide hyper_headers *, with setter, getter (by name), and iterator functions.

The example just shows a write callback but that's for the body, right?

The write callback is not just for the body, but the IO write function for all writing. It will also be called when writing headers, or connection frames in HTTP/2. This is because hyper doesn't assume the IO transport is a file descriptor, it could be anything. The same applies to the read callback. It should just read from a socket (or whatever other source)

The body will be another pollable thing, similar to a hyper_task, that yields some sort of buffer type, which I also need to add.

how do you envision server push to work this this API?

I haven't thought of that yet. It could possibly be something that is optionally polled out of the response future. Is it a needed feature in the first draft?

how would trailers work?

Trailers can also be polled out of the "body" type (in Rust), so it'd just be modeling that similar to polling for bytes.

if hyper recv/send on the socket, how does it signal connection close?

The read/write callbacks do the actual socket IO, so if there was an error, they can return HYPER_IO_ERROR, and set extra details on the userdata perhaps. If the read or write returns 0, so a clean close, but hyper wasn't expecting that yet, it will return an error to which is currently polling (the response future, or the body polls), which will be translated into a HYPERE_SOMETHING code. We could add in a way to get more details about the error, or it can just set on the userdata.

how does the client know the transfer is done?

When the user has the response, and has polled the body until it signals it's finished, then it's done.

how is HTTPS (TLS really) glued into this?

That would be handled by the read and write callbacks on hyper_io. The data that hyper wants to write to the transport is passed, and in those functions it can be encrypted before writing to the socket.

hyper_request_set_uri

I'm curious to hear more. hyper's Uri type in Rust does have the ability to individually set the scheme, authority, and path (modeled after HTTP/2/3), but wasn't sure if that needed to be exposed. We could add that though.

In a curl world, curl would (most probably) ideally:

parse URL
resolve host name
create the TCP connections
happy eyeball
handshake the TLS
verify the certificate
then engage hyper.

I agree completely. I've so far been modeling hyper's hyper::client::conn, which just models HTTP over a single connection. There is also hyper::Client, which has connection management, but I figured that wouldn't be needed at first.

bagder · 2020-08-22T11:12:13Z

a question for curl: I notice the write callbacks also return size_t, and there's #define CURL_WRITEFUNC_PAUSE 0x10000001 as a sentinel return value. Does this never cause problems with apps just happening to write that many bytes?

It's not a problem there because libcurl can never deliver that many bytes in a single callback invocation.

bagder · 2020-08-22T11:20:36Z

hyper_request_set_uri - I'm curious to hear more

URLs and URIs are seriously underspecified today to the degree that you never really know how a parser will deal with the "URL" you give it. See URL-interop for details. Therefor, there's a security problem waiting to happen when we mix multiple parser implementations. (Famously discussed in Orange Tsai's 2017 presentation: Exploiting Url Parsers)

seanmonstar · 2020-09-09T20:06:37Z

To aid with reviewing of the design, I've turned the branch into a pull request (#2278). I hope to make it more usable over the next couple weeks.

DemiMarie · 2020-09-15T22:52:36Z

Some of my own comments:

libcurl handles OOM by returning an error, while Rust libraries typically abort in this case.
Rust libraries often have larger code size than their C counterparts.
I believe that libcurl is often used in embedded environments, where OOM is quite common, and where code size is often a significant limitation.

I would love to see Rust used in libcurl, but I also don’t want it to be a regression for libcurl’s users. If hyper could recover from OOM and have code size comparable to the C code it replaces, I would consider that a major advance for both hyper and the Rust ecosystem in general.

bagder · 2020-09-16T06:07:42Z

Rust libraries typically abort in this case.

I had no idea. That's a horrible way to deal with problems and not the decision to make for system libraries.

Is that what hyper does too? If so, it'll be a concern.

I would love to see Rust used in libcurl, but I also don’t want it to be a regression for libcurl’s users.

I completely and wholeheartedly agree, and in fact you can consider me (as lead developer in the curl project) signed up on the mission to see us move on both these items. Put simply: Hyper cannot be a transparent and good choice for libcurl until it can be used in a way that doesn't cause any notable regressions or issues. The job set out here is thus not done until we're there.

Regarding code size and curl being used in different environments etc: that is indeed true and we should not presume that all users want or can go with Rust solutions - not even when/if we can make them used completely transparently. curl has been used on over 70 operating systems on 20 different CPU architectures. It is quite simply much more portable and versatile than Rust is. The idea from my end is to make curl to optionally use Hyper as a backend (selected at build-time).

XAMPPRocky · 2020-09-16T08:24:10Z

I had no idea. That's a horrible way to deal with problems and not the decision to make for system libraries.

Is that what hyper does too? If so, it'll be a concern.

It's the default behaviour, but you can override it, this would just need to be exposed somehow in the hyper C API. The API is not currently stable, so you have to use a nightly Rust compiler.

std::alloc::set_alloc_error_hook (Tracking Issue)

bagder · 2020-09-16T08:31:03Z

this would just need to be exposed somehow in the hyper C API

Sure, if you think that's a valuable feature to expose. I would be fine with just making the C API always return error on errors...

XAMPPRocky · 2020-09-16T08:55:45Z

Sorry, maybe exposed was the wrong word. It totally could be just an error when you’re using the API. I more meant that there would need to be a function to set the override hook, see the below example.

#![feature(alloc_error_hook)]

#[no_mangle]
pub unsafe extern "C" fn unset_alloc_abort() {
    std::alloc::set_alloc_error_hook(|_| panic!(“alloc error”));
}

#[repr(C)] struct Foo;

#[no_mangle]
pub unsafe extern "C" fn new_foo() -> *mut Foo {
    catch_unwind(|| {
        // Allocations
        Box::into_raw(Box::new(Foo))
    }).unwrap_or(std::ptr::null_mut())
}

a1phyr · 2020-09-16T09:33:13Z

Catching panics is not only nice for OOM error handling, but it is necessary anyway as panicking across FFI boundaries is UB.

mitsuhiko · 2020-09-16T10:29:53Z

The other more insane version could be to longjump out of the error hook.

bagder · 2020-09-16T11:25:06Z

you still want the library to not leak memory or any other resources, even in the OOM situation, so simply "jumping" out of it is rarely the best idea - said without having a clue about the Hyper internals.

KamilaBorowska · 2020-09-16T11:36:05Z

Currently Rust standard library doesn't attempt to handle OOM (for instance, Vec::push may fail in a way that cannot be handled). RFC 2116 does provide a way to handle allocation failures on standard library collections (by either unwinding or try_reserve methods that can yield Result), but so far it's nightly-only.

DemiMarie · 2020-09-16T15:08:17Z

Personally, I don’t believe we should rely on unwinding to handle OOM errors. The devices which have the greatest need to recover from OOM are also the ones where the code size overhead of unwinding is the greatest concern. I greatly prefer the Zig approach, where any function that takes an allocator is passed one explicitly, and where running out of memory is just another error. Maybe I will write an RFC for that someday.

bagder · 2020-09-17T07:36:52Z

The devices which have the greatest need to recover from OOM are also the ones where the code size overhead of unwinding is the greatest concern

I strongly disagree. A system library, any system library, has the obligation to clean up after itself. In all and any circumstances. If a library function call fails, it needs to return an error and it it needs to never leak memory whatever error happened. Applications using these libraries should be able to depend on that, and if the application thinks a memory error, or any other serious error, being returned from said library is a show-stopper then the application can decide to abort or exit. It is not the choice to make for the library. It never is.

This gets even more emphasized if a library (like Hyper) is used by another library (like libcurl), where this second library thinks of this behavior and "rock solidness" to be something to be proud of and is even pushed as a "selling point" to users, and if the sub-library then cannot deliver on this promise then libcurl cannot either and then: 😞

jplatte · 2020-09-17T08:39:32Z

@Badger the comment you replied to did not advocate for the library handling OOMs itself (by exiting, aborting or whatever), just for catching OOMs without unwinding. This would require use of data types with built-in fallible allocation, which is not yet the case for Rust's std types (but that is being worked on IIRC).

bagder · 2020-09-17T08:44:33Z

@jplatte Thanks for clarifying that for me! (reading this again, I see it might come off as an ironic comment but it isn't, as said before I'm a Rust cluebie and I appreciate getting told about facts I seem to have gotten wrong. 👍 from me).

DemiMarie · 2020-09-17T15:56:09Z

@bagder No offence taken. Your comment is one of the best arguments I have seen as to why OOM should not be a fatal error, even though much code and even some system libraries (glib, macOS’s Cocoa, librsvg, cmark, gumbo) considers it to be such. Ironically, I believe that Rust, due to its memory safety and automatic resource cleanup, is in a better position than C to recover from OOM errors, once libstd is updated to provide fallible allocation.

seanmonstar · 2020-09-17T21:26:11Z

std::alloc::set_alloc_error_hook

I've just tried some experiments, and it indeed works just fine to do some panic!(HyperOom) and then be able to catch that in hyper-capi, just returning a null pointer. The only issue to me there is with it being a nightly-only feature. The tracking issue hasn't seen much traffic, maybe stabilizing it can done soon?

I would be fine with just making the C API always return error on errors...

I think it needs to be part of hyper's C API, since it requires setting a global (Rust's alloc error hook), and it'd be problematic if different libraries all try to set that automatically.

DemiMarie · 2020-09-17T21:47:40Z

I think we really need a Rust RFC that makes the default hook panic.

leo60228 · 2020-09-22T18:55:45Z

Unwinding in OOM is undefined behavior, as alloc::alloc::handle_alloc_error is marked as #[rustc_allocator_nounwind].

RalfJung · 2020-10-09T09:02:42Z

Unwinding in OOM is undefined behavior, as alloc::alloc::handle_alloc_error is marked as #[rustc_allocator_nounwind].

That attribute is going away together with the "panic" default hook: rust-lang/rust#76448.

seanmonstar · 2021-01-12T23:36:46Z

With the merge of #2278, I've opened some new issues that are more fine grained (in the A-ffi label), and so I think this can be closed as "done".

seanmonstar added B-rfc Blocked: More comments would be useful in determine next steps. E-hard Effort: hard. Likely requires a deeper understanding of how hyper's internals work. labels Aug 19, 2020

seanmonstar mentioned this issue Sep 9, 2020

Initial C API for hyper #2278

Merged

mitsuhiko mentioned this issue Sep 16, 2020

Tracking issue for the OOM hook (alloc_error_hook) rust-lang/rust#51245

Open

5 tasks

alippai mentioned this issue Sep 16, 2020

Support Hyper (the rust one) encode/uvicorn#779

Closed

seanmonstar mentioned this issue Oct 13, 2020

Location of hyper's C API #2299

Closed

seanmonstar self-assigned this Nov 6, 2020

seanmonstar added A-ffi Area: ffi (C API) C-feature Category: feature. This is adding a new feature. and removed B-rfc Blocked: More comments would be useful in determine next steps. labels Jan 8, 2021

seanmonstar mentioned this issue Jan 12, 2021

Provide way to catch OOM in C API #2396

Open

seanmonstar closed this as completed Jan 12, 2021

ell1e mentioned this issue Feb 5, 2021

Suggestion: add a pointer to libolm on https://matrix.org/sdks/ (detailed suggestion inside) matrix-org/matrix.org#931

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Access to the API from C? #2265

Access to the API from C? #2265

bagder commented Aug 11, 2020

seanmonstar commented Aug 11, 2020

bagder commented Aug 11, 2020

seanmonstar commented Aug 12, 2020

bagder commented Aug 12, 2020

seanmonstar commented Aug 13, 2020

laudiacay commented Aug 14, 2020

bagder commented Aug 14, 2020

seanmonstar commented Aug 19, 2020

bagder commented Aug 21, 2020

seanmonstar commented Aug 21, 2020

bagder commented Aug 22, 2020

bagder commented Aug 22, 2020

seanmonstar commented Sep 9, 2020

DemiMarie commented Sep 15, 2020

bagder commented Sep 16, 2020 •

edited

XAMPPRocky commented Sep 16, 2020 •

edited

bagder commented Sep 16, 2020

XAMPPRocky commented Sep 16, 2020 •

edited

a1phyr commented Sep 16, 2020

mitsuhiko commented Sep 16, 2020

bagder commented Sep 16, 2020

KamilaBorowska commented Sep 16, 2020 •

edited

DemiMarie commented Sep 16, 2020

bagder commented Sep 17, 2020

jplatte commented Sep 17, 2020 •

edited

bagder commented Sep 17, 2020 •

edited

DemiMarie commented Sep 17, 2020

seanmonstar commented Sep 17, 2020

DemiMarie commented Sep 17, 2020

leo60228 commented Sep 22, 2020

RalfJung commented Oct 9, 2020

seanmonstar commented Jan 12, 2021

Access to the API from C? #2265

Access to the API from C? #2265

Comments

bagder commented Aug 11, 2020

seanmonstar commented Aug 11, 2020

bagder commented Aug 11, 2020

seanmonstar commented Aug 12, 2020

bagder commented Aug 12, 2020

seanmonstar commented Aug 13, 2020

laudiacay commented Aug 14, 2020

bagder commented Aug 14, 2020

seanmonstar commented Aug 19, 2020

bagder commented Aug 21, 2020

negative error values for (unsigned) "size_t" really?

hyper_request_set_* - copies the given input string presumably

How to set/change request headers

how do we get (individual) response headers?

for a request body, it reads with the callback until

how do you envision server push to work this this API?

how would trailers work?

if hyper recv/send on the socket, how does it signal connection close?

how does the client know the transfer is done?

how is HTTPS (TLS really) glued into this?

hyper_request_set_uri

On connections and streams

seanmonstar commented Aug 21, 2020

bagder commented Aug 22, 2020

bagder commented Aug 22, 2020

seanmonstar commented Sep 9, 2020

DemiMarie commented Sep 15, 2020

bagder commented Sep 16, 2020 • edited

XAMPPRocky commented Sep 16, 2020 • edited

bagder commented Sep 16, 2020

XAMPPRocky commented Sep 16, 2020 • edited

a1phyr commented Sep 16, 2020

mitsuhiko commented Sep 16, 2020

bagder commented Sep 16, 2020

KamilaBorowska commented Sep 16, 2020 • edited

DemiMarie commented Sep 16, 2020

bagder commented Sep 17, 2020

jplatte commented Sep 17, 2020 • edited

bagder commented Sep 17, 2020 • edited

DemiMarie commented Sep 17, 2020

seanmonstar commented Sep 17, 2020

DemiMarie commented Sep 17, 2020

leo60228 commented Sep 22, 2020

RalfJung commented Oct 9, 2020

seanmonstar commented Jan 12, 2021

bagder commented Sep 16, 2020 •

edited

XAMPPRocky commented Sep 16, 2020 •

edited

XAMPPRocky commented Sep 16, 2020 •

edited

KamilaBorowska commented Sep 16, 2020 •

edited

jplatte commented Sep 17, 2020 •

edited

bagder commented Sep 17, 2020 •

edited