Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Reduce how much code gets monomorphized #1032

Merged
merged 15 commits into from Aug 23, 2022

Conversation

Marwes
Copy link
Contributor

@Marwes Marwes commented Jul 8, 2022

Motivation

tonic has a lot of generic code which causes a lot of monormorphization and therefore bloated compile times and binaries.

Solution

By extracting code which does not need to be generic over a type parameter into non generic (or less generic functions) we can reduce how much code gets instantiated and passed to LLVM.

This gives a ~23% reduction in the output of LLVM IR (according to cargo llvm-lines -p examples --bin streaming-client) and reduces the compilation time of cargo build --release --bin streaming-client by ~1.5s (21s to 19.5s). (Measured by just switching branches between this one and master to force recompilations)

Before

  Lines         Copies       Function name
  -----         ------       -------------
  Lines         Copies       Function name
  -----         ------       -------------
  45542 (100%)  1250 (100%)  (TOTAL)
   3588 (7.9%)     3 (0.2%)  tonic::client::grpc::Grpc<T>::streaming::{{closure}}
   3320 (7.3%)     3 (0.2%)  tonic::codec::encode::encode::{{closure}}
   1282 (2.8%)     1 (0.1%)  <tonic::codec::decode::Streaming<T> as futures_core::stream::Stream>::poll_next
   1035 (2.3%)    19 (1.5%)  core::result::Result<T,E>::map_err
    997 (2.2%)    21 (1.7%)  <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
    924 (2.0%)    21 (1.7%)  core::pin::Pin<&mut T>::map_unchecked_mut
    872 (1.9%)     2 (0.2%)  tonic::transport::channel::Channel::connect::{{closure}}
    760 (1.7%)    95 (7.6%)  core::pin::Pin<P>::new_unchecked
    720 (1.6%)     2 (0.2%)  streaming_client::pb::echo_client::EchoClient<T>::bidirectional_streaming_echo::{{closure}}
    710 (1.6%)     1 (0.1%)  tonic::codec::decode::Streaming<T>::decode_chunk
    621 (1.4%)     8 (0.6%)  std::thread::local::LocalKey<T>::try_with
    618 (1.4%)     3 (0.2%)  <tonic::codec::encode::EncodeBody<S> as http_body::Body>::poll_trailers
    605 (1.3%)     1 (0.1%)  prost::encoding::decode_varint_slice
    566 (1.2%)    13 (1.0%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    555 (1.2%)     3 (0.2%)  <tonic::codec::encode::EncodeBody<S> as http_body::Body>::poll_data
    455 (1.0%)     7 (0.6%)  tonic::request::Request<T>::map
    430 (0.9%)     1 (0.1%)  streaming_client::main::{{closure}}
    420 (0.9%)     9 (0.7%)  tonic::codec::encode::encode::{{closure}}::{{closure}}
    417 (0.9%)     1 (0.1%)  tonic::transport::channel::endpoint::Endpoint::connect::{{closure}}
    414 (0.9%)     9 (0.7%)  tonic::client::grpc::Grpc<T>::streaming::{{closure}}::{{closure}}
    373 (0.8%)     1 (0.1%)  core::ptr::drop_in_place<tonic::codec::encode::encode<tonic::codec::prost::ProstEncoder<streaming_client::pb::EchoRequest>,futures_util::stream::stream::map::Map<futures_util::stream::once::Once<futures_util::future::ready::Ready<streaming_client::pb::EchoRequest>>,core::result::Result<streaming_client::pb::EchoRequest,tonic::status::Status>::Ok>>::{{closure}}>
    373 (0.8%)     1 (0.1%)  core::ptr::drop_in_place<tonic::codec::encode::encode<tonic::codec::prost::ProstEncoder<streaming_client::pb::EchoRequest>,futures_util::stream::stream::map::Map<tokio_stream::stream_ext::throttle::Throttle<tokio_stream::stream_ext::map::Map<tokio_stream::iter::Iter<core::ops::range::Range<usize>>,streaming_client::echo_requests_iter::{{closure}}>>,core::result::Result<streaming_client::pb::EchoRequest,tonic::status::Status>::Ok>>::{{closure}}>
    367 (0.8%)     1 (0.1%)  streaming_client::pb::echo_client::EchoClient<T>::server_streaming_echo::{{closure}}

After

 Lines         Copies       Function name
  -----         ------       -------------
  35425 (100%)  1074 (100%)  (TOTAL)
   1587 (4.5%)     3 (0.3%)  tonic::client::grpc::Grpc<T>::streaming::{{closure}}
   1035 (2.9%)    19 (1.8%)  core::result::Result<T,E>::map_err
   1024 (2.9%)    21 (2.0%)  <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
    924 (2.6%)    21 (2.0%)  core::pin::Pin<&mut T>::map_unchecked_mut
    912 (2.6%)     6 (0.6%)  tonic::codec::encode::encode::{{closure}}::{{closure}}
    872 (2.5%)     2 (0.2%)  tonic::transport::channel::Channel::connect::{{closure}}
    720 (2.0%)     2 (0.2%)  streaming_client::pb::echo_client::EchoClient<T>::bidirectional_streaming_echo::{{closure}}
    688 (1.9%)    86 (8.0%)  core::pin::Pin<P>::new_unchecked
    605 (1.7%)     1 (0.1%)  prost::encoding::decode_varint_slice
    591 (1.7%)     3 (0.3%)  <futures_util::stream::try_stream::and_then::AndThen<St,Fut,F> as futures_core::stream::Stream>::poll_next
    586 (1.7%)    13 (1.2%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    582 (1.6%)     3 (0.3%)  <tonic::codec::encode::EncodeBody<S> as http_body::Body>::poll_data
    455 (1.3%)     7 (0.7%)  tonic::request::Request<T>::map
    430 (1.2%)     1 (0.1%)  streaming_client::main::{{closure}}
    420 (1.2%)     9 (0.8%)  tonic::codec::encode::encode::{{closure}}::{{closure}}::{{closure}}
    417 (1.2%)     9 (0.8%)  tonic::client::grpc::Grpc<T>::streaming::{{closure}}::{{closure}}
    417 (1.2%)     1 (0.1%)  tonic::transport::channel::endpoint::Endpoint::connect::{{closure}}
    391 (1.1%)     5 (0.5%)  std::thread::local::LocalKey<T>::try_with
    367 (1.0%)     1 (0.1%)  streaming_client::pb::echo_client::EchoClient<T>::server_streaming_echo::{{closure}}
    347 (1.0%)     1 (0.1%)  tokio::runtime::basic_scheduler::CoreGuard::block_on::{{closure}}
    324 (0.9%)     2 (0.2%)  tonic::transport::service::connection::Connection::connect::{{closure}}
    320 (0.9%)     2 (0.2%)  tokio::park::thread::CachedParkThread::block_on
    314 (0.9%)     4 (0.4%)  tokio::coop::with_budget::{{closure}}
    312 (0.9%)     3 (0.3%)  tonic::codec::encode::encode
    310 (0.9%)     1 (0.1%)  streaming_client::streaming_echo::{{closure}}
    306 (0.9%)     1 (0.1%)  streaming_client::bidirectional_streaming_echo_throttle::{{closure}}
    306 (0.9%)     1 (0.1%)  streaming_client::pb::echo_client::EchoClient<tonic::transport::channel::Channel>::connect::{{closure}}

Closes #849 (supersedes it)

Copy link
Member

@LucioFranco LucioFranco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! It would be interesting to know why these changes reduce the llvm ir bloat.

@@ -58,66 +58,62 @@ where
T: Encoder<Error = Status>,
U: Stream<Item = Result<T::Item, Status>>,
{
async_stream::stream! {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand correctly async-stream causes llvm ir bloat? Is this something we can fix in that crate? I personally prefer to maintain this code vs a combinator but I accept that this may be an artifact of using this type of code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to bloat it a bit more than the combinator. I figured in this case the use of async_stream was a bit unecessary and the combinator was clear enough. I tweaked it to use map instead which removes the closure. Once IntoFuture gets stable in 1.64 and_then could be used without the closure as well (I think).

@Marwes
Copy link
Contributor Author

Marwes commented Aug 10, 2022

Thanks for taking this on! It would be interesting to know why these changes reduce the llvm ir bloat.

Every combination of types that these functions gets called with gets a unique instantiation of binary code (monomorphized). So if we can shrink these generic functions, usually be moving code into less/non-generic functions rust/llvm can re-use those new functions for every generic function.

@Marwes
Copy link
Contributor Author

Marwes commented Aug 10, 2022

Did a couple of more tweaks to re-use code when the output type is the same.

@Marwes
Copy link
Contributor Author

Marwes commented Aug 10, 2022

With the new changes (and using a different/newer compiler) we go from 47011 to 33329 (-29%)

@LucioFranco
Copy link
Member

Nice! Seems like CI is failing?

Seems to be some weirdness around type inference and `Into` causing the downcasts to fail
Comment on lines +61 to +67
let mut buf = BytesMut::with_capacity(BUFFER_SIZE);

let compression_encoding = if compression_override == SingleMessageCompressionOverride::Disable
{
None
} else {
compression_encoding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this code used to be executed lazily. Not sure how much it matters in practice, but I wonder if we want to move this stuff down into the closure?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now this is fine, we can follow up if need be.

@LucioFranco
Copy link
Member

Thanks!

@LucioFranco LucioFranco merged commit 523a550 into hyperium:master Aug 23, 2022
@Marwes Marwes deleted the monomorph branch August 24, 2022 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants