Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bootsnap cache incorrect or not being used? #979

Closed
schneems opened this issue Apr 8, 2020 · 25 comments
Closed

Bootsnap cache incorrect or not being used? #979

schneems opened this issue Apr 8, 2020 · 25 comments

Comments

@schneems
Copy link
Contributor

schneems commented Apr 8, 2020

Linked internal tickets

  • 952389

Issue

The bootsnap cache is present after a deploy:

$ heroku run bash
~ $ du -hs tmp/cache
64M tmp/cache
~ $ ls tmp/cache
bootsnap-compile-cache  bootsnap-load-path-cache

However if a rails process runs, the cache size doubles:

~ $ rails runner puts 'done' >> /dev/null
~ $ du -hs tmp/cache
125M  tmp/cache

And running the same command twice makes it faster, this makes me think that bootstrap isn't being used:

$ heroku run bash
Running bash on ⬢ issuetriage... up, run.2775 (Standard-1X)
~ $ time rails runner puts "done"
# ...
real  0m7.197s
user  0m5.760s
sys 0m1.048s
~ $ time rails runner puts "done"
# ...
real  0m2.953s
user  0m2.384s
sys 0m0.492s

We need to investigate why. My first guess would be that the difference between build and runtime file structure prevents the original cache from being used. But that's just a guess.

@casperisfine
Copy link

My first guess would be that the difference between build and runtime file structure prevents the original cache from being used.

I've always been a bit fuzzy on buildpacks, but IIRC rake assets:precompile is ran with the source in $BUILD_DIR (/build?) and then later moved to $APP_DIR (/app?)?

If so then I believe your guess is correct. Bootsnap cache keys are a hash of the source file realpath. So if you move your app, all cache keys change.

So yeah you cache pretty much exactly doubling in size is indicative of that.

Unfortunately I don't think there's a way to really handle that. In Shopify/bootsnap#336 I suggest we could try to store parts of the cache inside the gems own directory, so this could help, but for the app's code cache, the move would still invalidate everything.

Something that you might be interested in integrating though is the bootsnap precompile command I introduced a few month back: Shopify/bootsnap#326

it could allow you to generate a clean bootsnap cache during the build, so that later the app would boot faster once you distribute it.

@schneems
Copy link
Contributor Author

I've always been a bit fuzzy on buildpacks, but IIRC rake assets:precompile is ran with the source in $BUILD_DIR (/build?) and then later moved to $APP_DIR (/app?)?

Short story is that Heroku builds apps in one dire <sha>_build and then runs them in another dir /app.

The full story is that to be able to deploy an app, we have to have a client on a machine that can receive build information (such as source files) and kick off a build. As an implementation detail, this client is built using Heroku which means the output directory of the client is running at the path /app. This has the knock-on effect of that when an app is built on Heroku, it cannot be built in /app (even though this is where it will run) because the build client is already at /app. So instead the build client creates a tmp dir named something like <sha>_build and then builds the app in there.

Once the build is done, then at runtime, the app will live in /app.

If so then I believe your guess is correct. Bootsnap cache keys are a hash of the source file realpath. So if you move your app, all cache keys change.

I do see that this is where the cache keys are generated in bootsnap https://github.com/Shopify/bootsnap/blob/eeed78c5e07f842b61de86870e50e65939f09828/ext/bootsnap/bootsnap.c#L338-L362 and that they're based on absolute paths.

Unfortunately I don't think there's a way to really handle that. In Shopify/bootsnap#336 I suggest we could try to store parts of the cache inside the gems own directory, so this could help, but for the app's code cache, the move would still invalidate everything.

Sprockets had the same problem. I worked to fix it there https://www.schneems.com/blogs/2016-02-18-speeding-up-sprockets. Essentially we split the absolute path into a base path and a relative path. The sass library stores a "base dir" and then from that dir, a relative path inside of the cache file.

Something that you might be interested in integrating though is the bootsnap precompile command I introduced a few month back: Shopify/bootsnap#326

That's super interesting. I wouldn't mind adding an explicit precompile step. I took a look at bootsnap precompile and it looks like it's also using the same absolute paths for cache keys, so that's going to be a non-starter due to the same issue.

Unfortunately I don't think there's a way to really handle that. In Shopify/bootsnap#336 I suggest we could try to store parts of the cache inside the gems own directory, so this could help, but for the app's code cache, the move would still invalidate everything.

I think this proposal would help a lot. The only area I think it might fail is if a dev is doing something like bundle open to modify a source file and they don't invalidate/delete the bootsnap cache.

From my experience with Heroku customer support tickets The vast majority of boot time is spent in requiring dependencies. After that app code is small-ish. Parsing and generating routes. Maybe being able to handle relative paths would allow app code caching.

Do you think there would ever be a world in which bootsnap would support using relative paths for cache keys?

@casperisfine
Copy link

so that's going to be a non-starter due to the same issue.

Right sorry, I was suggesting to call it once moved to /app, but I suppose it's not possible.

The only area I think it might fail is if a dev is doing something like bundle open to modify a source file and they don't invalidate/delete the bootsnap cache.

We can still revalidate the cache with mtime, so that should be too much of an issue.

The vast majority of boot time is spent in requiring dependencies.

Yeah, we have a bunch of outliers here, but what you say likely hold true for the vast majority of apps out there.

Do you think there would ever be a world in which bootsnap would support using relative paths for cache keys?

It's tricky for a few reasons. And I was about to say no, but I had an idea last night:

I think we could do it for gems, e.g. if we were to transform the key such as key.sub(Bundler.bundle_path, '$BUNDLE_PATH'), the gems cache would now be independent from BUNDLE_PATH, meaning you could move it around without invalidating the cache.

For app code however I don't think it's possible (at least for now). We could do the same trick with Dir.pwd, but then if you chdir you could corrupt the cache. That's why Ruby store realpath in $LOADED_FEATURES, and Bootsnap can't really go around that.

To be able to use relative paths for all caches, we'd need to be able to split the path like you did for sprockets. It's worth trying but I'm afraid it would be too slow. A Sprockets cache hit can saves seconds if not minutes, but Ruby's ISeq caching while helpful doesn't save enough time to allow for slow cache key generation.

I'll see if I can find some time to experiment with the $BUNDLE_PATH part today or tomorrow.

@schneems
Copy link
Contributor Author

I'll see if I can find some time to experiment with the $BUNDLE_PATH part today or tomorrow.

Awesome! Thanks

Right sorry, I was suggesting to call it once moved to /app, but I suppose it's not possible.

It's on the list of things we would like to have. Right now we're on v2 of the Buildpack spec. The next version v3 is an open spec from the Cloud Native Computing Foundation (CNCF) and known as Cloud Native Buildpacks (CNB). Under that spec, the build and "launch" as it's called use the same dir structure. The catch is that while we've only just begun re-writing buildpacks, there's no forecast for when our infra will be able to support that spec on Heroku. It's a big unknown.

To be able to use relative paths for all caches, we'd need to be able to split the path like you did for sprockets. It's worth trying but I'm afraid it would be too slow. A Sprockets cache hit can saves seconds if not minutes, but Ruby's ISeq caching while helpful doesn't save enough time to allow for slow cache key generation.

Totally understand. I was also wondering if we could push the AOT generation upstream to something like RubyGems to avoid the cold-cache case, but I'm guessing that the bytecode isn't stable between versions and that's not straightforward.

@casperisfine
Copy link

I was also wondering if we could push the AOT generation upstream to something like RubyGems to avoid the cold-cache case

I tried in MRI without success: https://bugs.ruby-lang.org/issues/16847

Maybe rubygems/bundler would be more open to it. 👋 @deivid-rodriguez was ISeq caching as part of rubygems/bundler ever considered?

casperisfine pushed a commit to Shopify/bootsnap that referenced this issue Jan 28, 2021
Ref: heroku/heroku-buildpack-ruby#979

Heroku buildpacks build the application in one location
and then move it elsewhere. This cause all the paths to change,
hence all bootsnap cache keys to be invalidated.

By replacing $BUNDLE_PATH by a constant string in the cache keys,
we allow the bundler directory to be moved without flushing the cache.

Ideally we'd use a similar substitution for the "app root", but
I need to put more thoughts into it, as I'm not too sure how
best to infer it.
casperisfine pushed a commit to Shopify/bootsnap that referenced this issue Jan 28, 2021
Ref: heroku/heroku-buildpack-ruby#979

Heroku buildpacks build the application in one location
and then move it elsewhere. This cause all the paths to change,
hence all bootsnap cache keys to be invalidated.

By replacing $BUNDLE_PATH by a constant string in the cache keys,
we allow the bundler directory to be moved without flushing the cache.

Ideally we'd use a similar substitution for the "app root", but
I need to put more thoughts into it, as I'm not too sure how
best to infer it.
@deivid-rodriguez
Copy link
Contributor

Hei @casperisfine!

No, I think it has never been considered as far as I recall. I think it should not be considered. In my opinion, it fits ruby-core much better so the way to get this inside the language would be to try to persuade Matz harder (not sure how though). I don't think it fits nicely inside bundler & rubygems area, and I would not be happy with the maintenance overhead. The way I see it, introducing this inside bundler & rubygems would be "a hack" to take advantage of rubygems being used by default to override Matz's decision, which doesn't seem nice.

@casperisfine
Copy link

In my opinion, it fits ruby-core much better

On some points like invalidation I agree, however on some others since rubygems/bundler is already quite opiniated and already change Ruby's require behavior quite a lot, IMHO it fits. It even allows to do things such as precompile the gem during install etc.

would be to try to persuade Matz harder (not sure how though)

I've tried this for a while, failed, so now I'm a bit tired TBH. But yeah people are welcome to try again.

@edmorley
Copy link
Member

edmorley commented Mar 8, 2021

I've just added support for a beta opt-in labs build-in-app-dir, that makes the Heroku build system build in /app (the same location that the app will be run at runtime) rather than the previous location of /tmp/build_<hash>.

This can be enabled via:

heroku labs:enable build-in-app-dir -a <APP_NAME>

I'd be very interested to hear how much of a difference this makes to app boot times at runtime if anyone tries it out :-)

@sergiopantoja
Copy link

@edmorley My boot time was cut in half! No issues so far. Thank you!

Before:

$ heroku run bash
~ $ time rails runner puts "done"
real	0m13.937s
user	0m8.328s
sys	0m0.988s

~ $ time rails runner puts "done"
real	0m6.512s
user	0m4.772s
sys	0m0.460s

~ $ time rails runner puts "done"
real	0m6.482s
user	0m4.860s
sys	0m0.432s

~ $ du -hs tmp/cache
94M	tmp/cache

After enabling the labs feature and redeploying:

$ heroku run bash
~ $ time rails runner puts "done"
real	0m7.150s
user	0m5.028s
sys	0m0.592s

~ $ time rails runner puts "done"
real	0m6.110s
user	0m4.672s
sys	0m0.388s

~ $ time rails runner puts "done"
real	0m6.171s
user	0m4.612s
sys	0m0.448s

~ $ du -hs tmp/cache
49M	tmp/cache

@geoffharcourt
Copy link

geoffharcourt commented Mar 9, 2021

We have what appears to be a much more sprawling application than @sergiopantoja does and we saw a similar proportion of change:

before:

real    0m59.144s (later attempts were 34-38s)
user    0m32.280s
sys     0m3.364s

after:

real    0m36.636s (also 22s, and 19s) 
user    0m28.588s
sys     0m3.064s

@edmorley
Copy link
Member

@sergiopantoja @geoffharcourt Thank you for providing those timings - glad to hear it helped! It will be a bit longer before we can make this new behaviour the default, but that flag is fine to use in the meantime :-)

@dbackeus
Copy link

In our Rails app we reduced boot time from ~16 seconds to ~8 seconds after enabling build-in-app-dir.

Are there any caveats to be aware of?

Any progress to report regarding making this the default?

@brian-kephart
Copy link

I just tried build-in-app-dir. Sadly, with no other changes, my build is hanging on rake assets:precompile. I filed an issue with support so it can be looked at.

@spartchou
Copy link

After enable build-in-app-dir for Production, the boot up time went from 9.968s to 6.367s, which is (9.968 - 6.367) / 9.968 = 36.125..% faster

Before

# First time
~ $ bundle exec rake environment
real	0m9.968s
user	0m6.032s
sys	0m0.768s

# Second time
~ $ bundle exec rake environment
real	0m6.142s
user	0m2.640s
sys	0m0.368s

# Third time
~ $ bundle exec rake environment
real	0m6.045s
user	0m2.536s
sys	0m0.396s

After

~ $ time bundle exec rake environment
real	0m6.367s
user	0m2.748s
sys	0m0.432s

~ $ time bundle exec rake environment
real	0m5.997s
user	0m2.500s
sys	0m0.364s

~ $ time bundle exec rake environment
real	0m6.679s
user	0m2.704s
sys	0m0.328s

@yasulab
Copy link

yasulab commented Mar 25, 2022

I have also tried in my Rails app! Hope this helpful for others, and thanks for the great feature! 😆✨ (coderdojo-japan/coderdojo.jp#1423)

Before: App Boot Time

╭─○ yasulab ‹2.7.3› ~/coderdojo.jp
╰─○ heroku run bash

# 1st try
$ time NEW_RELIC_AGENT_ENABLED=false rails runner puts "Done"
real	0m10.353s
user	0m6.244s
sys	0m0.904s

# 2nd try
$ time NEW_RELIC_AGENT_ENABLED=false rails runner puts "Done"
real	0m5.248s
user	0m2.272s
sys	0m0.360s

# 3rd try
$ time NEW_RELIC_AGENT_ENABLED=false rails runner puts "Done"
real	0m4.031s
user	0m2.204s
sys	0m0.296s

After: App Boot Time

╭─○ yasulab ‹2.7.3› ~/coderdojo.jp
╰─○ heroku run bash

# 1st try
$ time NEW_RELIC_AGENT_ENABLED=false rails runner puts "Done"
real	0m2.416s
user	0m1.224s
sys	0m0.244s

# 2nd try
$ time NEW_RELIC_AGENT_ENABLED=false rails runner puts "Done"
real	0m2.618s
user	0m1.364s
sys	0m0.180s

# 3rd try
$ time NEW_RELIC_AGENT_ENABLED=false rails runner puts "Done"
real	0m2.397s
user	0m1.292s
sys	0m0.224s

@dbackeus
Copy link

dbackeus commented Mar 25, 2022

@spartchou do note that when loading the Rails environment via rake, Rails will automatically disable eager loading (as mentioned in the default generated production.rb). So with rake environment you're only loading a subset of the full application.

rails runner will use eager_load = true but as demonstrated by yasulab above it can be a good idea to disable NewRelic via ENV var for those using it since the NewRelic agent initialization is likely to add a couple of seconds of unrelated boot time.

@kreintjes
Copy link

I just tried the build-in-app-dir feature as well (executed heroku labs:enable build-in-app-dir and then redeployed the app). However unfortunately, for us it doesn't seem to have a large effect on the app boot time (which is also quite inconsistent):

Before

time rails runner puts "Done":

real  0m15.446s
user  0m10.764s
sys 0m1.512s

real  0m10.048s
user  0m6.780s
sys 0m0.516

real  0m8.035s
user  0m5.296s
sys 0m0.424s

After

time rails runner puts "Done":

real  0m12.407s
user  0m7.624s
sys 0m0.484s

real  0m11.645s
user  0m7.280s
sys 0m0.456

real  0m14.172s
user  0m7.824s
sys 0m0.456s

Might we be doing something wrong? Or is our app simply not large enough to really benefit from this yet?

We don't use NEW_RELIC, but we do use some other monitoring tools such as Sentry and Scout. Could that be influencing the timings?

@dbackeus
Copy link

@kreintjes

Bootsnap will only affect the performance of requiring files. If you have wildly varying boot times that would suggest that something else in your initialization process is taking time. Yes, could be related to Sentry and Scout, but you'd have to dig in to your gems / initializers to figure out the details. The bumbler gem sets out to help profile Rails boot time but I've never used it so can't vouch for how helpful it would be.

@brian-kephart
Copy link

I just tried build-in-app-dir. Sadly, with no other changes, my build is hanging on rake assets:precompile. I filed an issue with support so it can be looked at.

If anyone else has this problem, I solved it by adding bin/yarn to my .slugignore. Thanks to @schneems for pointing me toward the binstubs.

@schneems
Copy link
Contributor Author

Theres' a recent article about this as well that I want to link https://dev.to/dbackeus/cut-your-rails-boot-times-on-heroku-in-half-with-a-single-command-514d

@vietqhoang
Copy link

vietqhoang commented Jun 23, 2022

Just want to chime in that we are also seeing great results on our production app after enabling build-in-app-dir.

Before:

$ heroku run bash

$ time rails runner puts "done"
real	0m6.828s
user	0m5.844s
sys	0m0.904s

$ time rails runner puts "done"
real	0m3.072s
user	0m2.688s
sys	0m0.320s

$ time rails runner puts "done"
real	0m2.919s
user	0m2.500s
sys	0m0.372s

$ du -hs tmp/cache
32M	tmp/cache

After:

$ heroku run bash

$ time rails runner puts "done"
real	0m3.066s
user	0m2.664s
sys	0m0.336s

$ time rails runner puts "done"
real	0m3.132s
user	0m2.640s
sys	0m0.432s

$ time rails runner puts "done"
real	0m2.887s
user	0m2.512s
sys	0m0.304s

One problem we ran into and was able to address after identifying the issue is our app's use of a third-party buildpack which cleans the slug after build (https://elements.heroku.com/buildpacks/devforce/heroku-buildpack-cleanup).

We had tmp/cache listed in .slugcleanup, which removed the folder before slug compression. Thus, the app was still experiencing "cold" boots. Removing the tmp/cache from the clean up config addressed the cold boot issue due to the tmp/cache folder now persisting in the compiled slug.


Edit: One other thing to mention. The changes we made did increase the slug size by 10M (8% for our particular app). An increase is an expected result. Just wanted to share this detail.

@schneems schneems closed this as completed Oct 6, 2023
@robotfelix
Copy link

@edmorley Sorry to resurrect this old thread, but is there any way to have build-in-app-dir automatically enabled for review apps too? Having review apps build with a different build process to production takes away some of the production-parity that is part of their appeal!

@edmorley
Copy link
Member

edmorley commented Dec 7, 2023

Unfortunately labs can't currently be enabled via app.json, so there's no easy way to enable this for review apps. For the next generation of buildpacks (Cloud Native Buildpacks) the specification has the build time and run time paths set to the same paths, so they won't suffer from the relocation issue. I'd personally love to roll out build-in-app-dir and make it the default, but given CNBs are coming in the future (with their own sets of breaking changes), it would mean breaking some existing workflows twice compared to waiting for CNBs.

@granth
Copy link

granth commented Dec 7, 2023

You can use a postdeploy script in your app.json, and then call the Heroku Platform API from the script to enable a labs feature. That's what we do for runtime-dyno-metadata. It's awkward but functional.

  "scripts": {
    "postdeploy": "bin/heroku-review-app-setup",
    "pr-predestroy": "bin/heroku-review-app-teardown"
  }

@edmorley
Copy link
Member

edmorley commented Dec 7, 2023

Yes, though unfortunately since postdeploy runs after the initial build, enabling build-in-app-dir at that point will be too late (unless the script also triggers a rebuild, but that will require a bunch of manual wiring up, and also mean even slower provision times for review apps). At that point for many people I would imagine the slight loss of review-app to prod parity would be preferable to the workaround, particularly if they have a staging environment that can catch any rare build-in-app-dir related issues not seen in the review app. (Ultimately a review app is never going to be identical to production; datastore differences, load differences, env var differences etc mean that it's only ever going to be mostly-similar at best.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests