size_of_val in a generator can make the generator bigger #62321

Open
tmandry opened this issue Jul 3, 2019 · 16 comments
Labels
A-coroutines (Area: Coroutines)
C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
I-heavy (Issue: Problems and improvements with respect to binary size of generated code.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

Comments

@tmandry
Member

tmandry commented Jul 3, 2019

Consider the following code:

async fn foo() {
  let mut x = get_future();
  dbg!(std::mem::size_of_val(&x));
  x.await
}

Today, having the dbg! line roughly doubles the size of the future returned by foo. More precisely, it causes us to allocate storage for x twice (once for x, and once for the pinned variable that x is moved into inside the await).

This unfortunate state of affairs is caused by the fact that we cannot "peer into" size_of_val and see that the address of x never escapes the function. Without knowing this, we can't optimize away the storage of x once it has been moved.
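For concreteness, here is a minimal sketch (not from the issue; inner, without_probe, and with_probe are made-up names) that makes the effect observable. inner stands in for get_future() and keeps a 64-byte buffer alive across an await point so its size is easy to spot; the exact numbers depend on the compiler version:

use std::mem::size_of_val;

// Stand-in for get_future(): the returned future must hold the 64-byte
// buffer across the await point, so it is at least 64 bytes.
async fn inner() -> [u8; 64] {
    let buf = [0u8; 64];
    std::future::ready(()).await;
    buf
}

async fn without_probe() {
    let x = inner();
    x.await;
}

async fn with_probe() {
    let x = inner();
    // Borrowing x here is what currently prevents its storage from being
    // reused for the pinned copy created by .await.
    dbg!(size_of_val(&x));
    x.await;
}

fn main() {
    // Measuring from the caller avoids taking a reference inside the async fn.
    println!("without probe: {}", size_of_val(&without_probe()));
    println!("with probe:    {}", size_of_val(&with_probe()));
}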

This was first discussed here: #59123 (comment). One promising solution (suggested by @RalfJung) is that we inline all occurrences of size_of_val (and possibly some other intrinsics) in a MIR pass, before we get to the generator transform.

@csmoe csmoe added the A-coroutines Area: Coroutines label Jul 3, 2019
@Centril
Contributor

Centril commented Jul 3, 2019

It would be good to elaborate on why size_of_val deserves an optimization specifically; you said something about not wanting to have our own "uncertainty principle"?

@Centril Centril added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-heavy Issue: Problems and improvements with respect to binary size of generated code. labels Jul 3, 2019
@tmandry
Member Author

tmandry commented Jul 3, 2019

It would be good to elaborate on why size_of_val deserves an optimization specifically; you said something about not wanting to have our own "uncertainty principle"?

My thinking is that people will often want to use size_of_val to debug the size of their futures. For example, they might use it to look at the size of the sub-futures they await on, as in the example above.

In particular, since you can't name the type returned by an async fn today, you have to use size_of_val and borrow the future.

It's definitely going to confuse people that adding size_of_val to their async fn makes the future returned by the async fn bigger. It makes everything about optimizing for size seem brittle and confusing. I think this might be okay for an MVP (and you can still use size_of_val as long as you only look at one "nesting level" at a time), but it definitely isn't desirable from a "maturity" standpoint.

@Matthias247
Contributor

I'm still not sure why size_of_val is the problem? It seems like anything that references a future before it's awaited would cause the issue?

More precisely, it causes us to allocate storage for x twice (once for x, and once for the pinned variable that x is moved into inside the await).
This unfortunate state of affairs is caused by the fact that we cannot "peer into" size_of_val and see that the address of x never escapes the function. Without knowing this, we can't optimize away the storage of x once it has been moved.

Shouldn't this somehow be provable by the borrow checker?

I'm obviously not familiar with the compiler details and therefore don't understand the exact approach, but I'm wondering whether

  • x could be treated as a real stack object as long as certain conditions have not been met (e.g. it isn't .awaited or pinned yet), as opposed to also living in the generator storage.
  • x could be directly allocated in the generator storage once, so that it doesn't need another copy. Maybe this doesn't work because some kind of return value optimization is missing?

@withoutboats
Contributor

Why can't x be pinned in the location it is already at when size_of_val is called? To my naive perspective, it seems like the problem is with await doing an actual memcpy move, instead of merely conceptually consuming the value?

@tmandry
Member Author

tmandry commented Jul 8, 2019

I'm still not sure why size_of_val is the problem? It seems like anything that references a future before it's awaited would cause the issue?

@Matthias247 #59087 tracks the problem of borrows disabling optimizations in general. In order to fix this general problem, we probably need

  1. Smarter dataflow analysis that tells us when no more borrows or pointers of any kind exist (e.g. similar to what the borrow checker is doing, but stricter so we don't make unsafe code UB).
  2. To inline functions in MIR, so we can apply our analysis (1) to cases like foo(&x).

For size_of_val(&x), it turns out that we only need (2) and not (1), because once we inline size_of_val the borrow vanishes (we replace the entire expression with a static size).

However, this comes with its own challenge (3): we need to compute the size of our type before MIR inlining, or otherwise stick a "placeholder" value in MIR that gets replaced later. We'll have to tackle this challenge regardless of whether we solve the "general" problem, so this issue exists to track this particular case.

EDIT: Now I see that the example in #59087 was originally using size_of_val, but I think it's good to have one issue that's tracking the broader case.
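To illustrate why inlining makes the borrow vanish: for a Sized type, size_of_val(&x) is just the size of x's type, so the whole call can be replaced by a constant that never mentions x. A small illustrative example of the equivalence (not tied to the generator transform itself):

use std::mem::{size_of, size_of_val};

fn main() {
    let x: [u64; 4] = [0; 4];
    // For any Sized type T, size_of_val(&x) equals size_of::<T>(), so once
    // the call is inlined the &x borrow can disappear from the MIR entirely.
    assert_eq!(size_of_val(&x), size_of::<[u64; 4]>());
}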

@tmandry
Member Author

tmandry commented Jul 8, 2019

Why can't x be pinned in the location it is already at when size_of_val is called? To my naive perspective, it seems like the problem is with await doing an actual memcpy move, instead of merely conceptually consuming the value?

@withoutboats Yes, that would work, and it might be easier than the solution I'm describing. We can't do this with a desugaring today, though, which AFAIK is how we implement .await. @cramertj tried to do something like this with drop_in_place, but it requires the original binding to be mutable: #59123 (comment)

@cramertj
Member

cramertj commented Jul 8, 2019

@cramertj tried to do something like this with drop_in_place, but it requires the original binding to be mutable: #59123 (comment)

This is one of the reasons I've been pushing for let mut to be a deny-by-default lint rather than a hard error.

@tmandry
Member Author

tmandry commented Jul 8, 2019

However, this comes with its own challenge (3): we need to compute the size of our type before MIR inlining, or otherwise stick a "placeholder" value in MIR that gets replaced later. We'll have to tackle this challenge regardless of whether we solve the "general" problem, so this issue exists to track this particular case.

Actually, could we "inline" size_of_val(&x) to size_of::<T>(), even though T is not nameable in the surface syntax? That would be a pretty clean solution that sidesteps issues with monomorphization. cc @eddyb

@tmandry
Member Author

tmandry commented Jul 8, 2019

An alternative to inlining is to eliminate the move in await, sidestepping this issue altogether. See #59087 (comment).

@eddyb
Member

eddyb commented Jul 9, 2019

Isn't there a SizeOf in MIR already? If it's not a nullary operator, you can add it as one.

And, yes, you can totally name the type. During and after typeck, that type parameter of size_of_val has a type applied to it in the caller (it starts out as an inference variable, and is resolved to some type during typeck).

(The reason you want a SizeOf in the MIR is that intrinsics are not lang items, and it's easier to support the operation directly.)

@jkarneges
Contributor

you can't name the type returned by an async fn

Are there plans to make this possible someday? More generally, I think this would be a matter of allowing opaque types to be named?

This would also be useful for storing non-boxed Futures in a container like Vec. I managed to get this to work using type inference, but it felt a little dirty.
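For reference, the type-inference trick might look roughly like this (a hypothetical reconstruction, not the original code):

async fn foo(n: u32) -> u32 {
    n * 2
}

fn main() {
    // Every call to the same (non-generic) async fn yields the same opaque
    // type, so inference can pick it as the Vec's element type without boxing.
    let mut futures = vec![foo(1), foo(2)];
    futures.push(foo(3));
    println!("element size: {}", std::mem::size_of_val(&futures[0]));
}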

@BatmanAoD
Member

Is it guaranteed that async fn will return an impl Future? If so, you will at least be able to make a type alias at some point: #63063

That said, I don't see how that could help with getting the size of an async fn's return type.

@jkarneges
Contributor

jkarneges commented Sep 5, 2019

I think it could help? Here's getting the size using size_of rather than size_of_val, by constructing the opaque type and throwing it away, without taking a reference:

async fn foo() {
}

fn size_of<T>(_t: T) -> usize {
    std::mem::size_of::<T>()
}

fn main() {
    println!("{}", size_of(foo()));
}

So if it were possible to write the type name, then a constructed value shouldn't be needed?

@BatmanAoD
Member

That's not even using a type alias, though! I didn't know you could just call async fn's directly from sync fns, but your code actually works as-is in nightly: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=0929f776acca7565497bab5acfe0fe13

@eddyb
Member

eddyb commented Sep 5, 2019

@jkarneges You can just return a pair of both T and usize, though, and that should let you both log the size and keep using the future.
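A sketch of that suggestion (with_size is a hypothetical helper, not an existing API):

use std::mem::size_of;

// Report the value's size and hand the value back, so a future can still be
// awaited (or stored) after its size has been logged.
fn with_size<T>(val: T) -> (T, usize) {
    let size = size_of::<T>();
    (val, size)
}

async fn foo() {}

fn main() {
    let (fut, size) = with_size(foo());
    println!("foo() future is {} bytes", size);
    drop(fut); // or hand it off to an executor and .await it
}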

@jkarneges
Contributor

Hmm, I suppose what I'm asking for is not an alias for all kinds of opaque types, but an alias for a single opaque type. For example, if I have two async fns foo() and bar(), each returning a distinct impl Future type, it would be useful to be able to name the return type of each one individually. If the return type of foo() could be named FooFuture, that would make it possible to write mem::size_of::<FooFuture>() or Vec<FooFuture> without having to construct a value or rely on type inference.
