
Client Stream of a streaming server can't be Boxed #123

Open
illicitonion opened this issue Dec 1, 2017 · 9 comments · May be fixed by #393
@illicitonion
Contributor

I've put a repro at https://github.com/illicitonion/grpc-rs-repro-0 which uses the proto from https://github.com/googleapis/googleapis/blob/master/google/bytestream/bytestream.proto

The call to call_sync, which doesn't move the stream out of the function, reliably works.

The call to call_future, which Boxes the stream, either gives me:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "RpcFailure(RpcStatus { status: Unknown, details: Some(\"Failed to create subchannel\") })"', src/libcore/result.rs:860:4
stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
   1: std::panicking::default_hook::{{closure}}
   2: std::panicking::default_hook
   3: std::panicking::rust_panic_with_hook
   4: std::panicking::begin_panic_new
   5: std::panicking::begin_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::result::unwrap_failed
   9: <core::result::Result<T, E>>::unwrap
  10: grpcrepro::main
  11: __rust_maybe_catch_panic
  12: std::rt::lang_start
  13: main

or

E1201 22:37:55.116403000 140735833875264 call.c:1731]                  assertion failed: grpc_cq_begin_op(call->cq, notify_tag)
Abort trap: 6

I believe both should be safe, and it's possible that the Stream being produced by the client isn't properly owning all of the state it needs.
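For reference, a rough sketch of the pattern that fails (ByteStreamClient, ReadRequest, ReadResponse and the address stand in for the grpcio-generated bytestream.proto bindings used by the repro; they are assumptions, not copied code): everything the call depends on is local to the function, so it is dropped as soon as the boxed stream is returned.

use std::sync::Arc;

use futures::Stream;
use grpcio::{ChannelBuilder, EnvBuilder, Error};

fn call_future() -> Box<dyn Stream<Item = ReadResponse, Error = Error> + Send> {
    let env = Arc::new(EnvBuilder::new().build());
    let channel = ChannelBuilder::new(env).connect("localhost:50051");
    let client = ByteStreamClient::new(channel);
    let receiver = client.read(&ReadRequest::default()).expect("start call");
    // `client`, `channel` and `env` are all local and get dropped here, while
    // the boxed stream is still expected to yield responses.
    Box::new(receiver)
}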

@BusyJay
Member

BusyJay commented Dec 2, 2017

That's because the channel gets dropped early. But it should return an error instead of a core dump.

@illicitonion
Contributor Author

If the Call output by Channel::create_call relies on the underlying Channel not being dropped, shouldn't Call either take ownership of the underlying Channel (or an Arc pointing to it), or keep a reference to it (forcing you to manage the lifetime of Channel properly)?
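(Roughly the shape of that suggestion, as an illustration only; the names and fields below are hypothetical and the real grpcio Call/Channel internals differ.)

use std::sync::Arc;

struct ChannelInner { /* raw grpc channel, completion-queue handle, ... */ }

struct Channel {
    inner: Arc<ChannelInner>,
}

struct Call {
    // Keeping an Arc to the channel's state means a call can never outlive
    // the channel it was created from.
    _channel: Arc<ChannelInner>,
    // raw call handle, deadline, metadata, ...
}

impl Channel {
    fn create_call(&self) -> Call {
        Call {
            _channel: Arc::clone(&self.inner),
        }
    }
}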

@BusyJay
Member

BusyJay commented Dec 5, 2017

Thanks for the suggestion. But the channel may become invalid even if a reference is kept; a reference only ensures it isn't dropped by the wrapper. There are actually two kinds of calls here: client calls and server calls. For a client call we can get the channel easily, but for a server call there is no API to obtain the underlying channel yet. So I think it's better to just return an error stating what's happening.

Besides, when the env is dropped, all the completion queues are shut down too, so keeping a reference to the channel is still not enough.

We may need to explain the whole mechanism in our documentation. /cc #114

And I believe the coredump should be fixed by #125.

@stuhood

stuhood commented Dec 5, 2017

> Besides, when the env is dropped, all the completion queues are shut down too, so keeping a reference to the channel is still not enough.

Then it sounds like a clone of the env (or at least an Arc reference to it) should be held in the Stream implementation?

While it's not a memory safety issue, it's definitely a deviation from the implicit contract of Streams, and represents something that should probably be fixed rather than documented.
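(For illustration, a caller can express that ownership manually today by moving the client into the stream, so the channel and env it references live at least as long as the boxed stream. ByteStreamClient, ReadRequest and ReadResponse below are assumed generated bindings, not grpcio API; this is a sketch, not a recommended fix.)

use futures::Stream;
use grpcio::Error;

fn boxed_read(client: ByteStreamClient) -> Box<dyn Stream<Item = ReadResponse, Error = Error> + Send> {
    let receiver = client.read(&ReadRequest::default()).expect("start call");
    // The `map` closure owns `client`, so the channel (and its handle on the
    // env) is not dropped until the boxed stream itself is.
    Box::new(receiver.map(move |resp| {
        let _keep_alive = &client;
        resp
    }))
}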

@Ten0
Contributor

Ten0 commented Oct 28, 2019

> But the channel may become invalid even if a reference is kept; a reference only ensures it isn't dropped by the wrapper.
> For a client call we can get the channel easily, but for a server call there is no API to obtain the underlying channel yet.

This can, on the server side, happen in a wide variety of contexts.
However, on the server side it is unlikely that the server/environment/whatever gets dropped in the middle of a call, so covering the client side would cover the large majority of cases where this is a nuisance.

Currently, a simple call such as:

let answers_client = answers_grpc::AnswersClient::open_with_defaults();
let answers_stream = answers_client
        .as_publish_opt(
            &AnswersRequest::default(),
            call_config(),
        )
        .expect("Could not get answers");

succeeds, while this:

let answers_stream = answers_grpc::AnswersClient::open_with_defaults()
        .as_publish_opt(
            &AnswersRequest::default(),
            call_config(),
        )
        .expect("Could not get answers");

doesn't, and since no lifetimes are involved, this is a terrible trap.

I also feel like Call should hold the Arc on the ChannelInner when built through Channel::create_call().

Another alternative that avoids the issue with the Call being different on the server side and on the client side would be to hold the Arc in the ClientXXXReceivers.
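For illustration, that alternative might look roughly like this (the names and fields are hypothetical; grpcio's real receiver types are different):

// Stand-in types only; not grpcio's actual definitions.
struct Call;
struct Channel;

struct ClientSStreamReceiver<T> {
    call: Call,
    // Holding the channel ties its lifetime to the receiver, so dropping the
    // client alone no longer tears down the subchannel mid-call.
    _channel: Channel,
    _resp: std::marker::PhantomData<T>,
}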

@Ten0 Ten0 linked a pull request Oct 28, 2019 that will close this issue
@Ten0
Contributor

Ten0 commented Oct 29, 2019

This PR implements the latter by holding the Channel.

@BusyJay
Member

BusyJay commented Oct 31, 2019

Holding the Channel inside requests can lead to resource leaks. For example, a long-running combinator of futures can prevent the Receiver or Sender from being dropped, which keeps the channel from being released.
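(A sketch of that concern, assuming the receiver owned its Channel; the combinator and names are only illustrative.)

use futures::{Future, Stream};
use grpcio::Error;

// If a long-lived future owns the receiver, and the receiver owned its Channel,
// the Channel could not be released until that future is dropped, even though
// the user may believe they are done with the client.
fn drive<S>(receiver: S) -> impl Future<Item = (), Error = ()>
where
    S: Stream<Error = Error>,
{
    receiver.for_each(|_resp| Ok(())).map_err(|_| ())
}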

The provided example is not a trap if the contract is fully understood. Not every lifetime dependency should be constrained by lifetimes. For example, when using a thread pool to spawn a task, no one thinks the task can still succeed after dropping the pool. It's considered a valid case for the future to resolve as a failure.

let f = tokio_threadpool::ThreadPool::new().spawn_handle(lazy(|| Ok::<_, ()>(42)));
// The temporary pool is dropped right away, so the handle resolves as an error.
f.wait().expect("could not finish");

@Ten0
Contributor

Ten0 commented Oct 31, 2019

To be honest, I'm not sure I would necessarily spot the problem if I saw that tokio example somewhere. ^^'
And that is even though the need to call .expect(...) on wait() exists for the single reason that a future may end up in a state where it will never be executed, which undeniably helps in suspecting that behaviour.

In our case, it's pretty clear that the call can fail if there's a networking issue, if it is cancelled by the server, etc., which already justifies the presence of grpcio::Error values on the stream; so the presence of this error type does not force you to think "hey, you need to make sure yourself that you don't drop the channel" the way wait() returning a Result does in tokio.

Also, in the case of tokio, I could imagine a scenario where one would think "I'm done with this whole set of futures, let's close its thread pool". I can't think of a similar scenario where you're in the middle of a bunch of gRPC calls and would want to kill the channel, and even less so one where having something like channel.close() wouldn't solve it.

Of course "if the contract is fully understood", well, you know. The point is that we've been several people not understanding this contract at first sight, so maybe we could make it easier to understand - be it by changing it or not.

Finally, I don't understand the resource-leak case you are describing in the first paragraph.
If the combinator of futures isn't meant to ever resolve, why is it spawned / not dropped in the first place?
Can you give a more detailed example?

@BusyJay
Member

BusyJay commented Nov 1, 2019

Users can close the channel for a quick shutdown or whatever else is suitable. The documentation needs to explain the mechanism clearly, as I suggested.
