
Document testing to reduce lambda cold start time #693

Merged
DavidLawes merged 1 commit into main from dlawes/document-lambda-cold-start-time-test on Jul 28, 2022

Conversation

DavidLawes
Contributor

What does this change?

Document the testing we've done to try and reduce the lambda cold start time (no code changes)

DavidLawes merged commit 9278481 into main on Jul 28, 2022
DavidLawes deleted the dlawes/document-lambda-cold-start-time-test branch on July 28, 2022 14:30
@mchv
Member

mchv commented Jul 29, 2022

A few other things you could try, from simple to more difficult:

  • Use arm64 as the runtime (only one config change), see documentation
  • As you mentioned, change the tiered compilation to 1 (a single JVM parameter), see documentation, but you first need to reduce dependencies (see below)
  • Reduce dependencies
    • do not depend on common
    • do not depend on joda
    • review whether you can drop some of the JSON libraries, as it seems several different ones are used
  • Update some dependencies
    • update Scala to 2.13.8, which is unlikely to have an immediate impact but still contains some fixes
    • update to the latest cats 2.x (2.8.0), and review whether you can update cats-effect to 3.x, which I think is a transitive dependency and provides a faster implementation, but that can be a big effort (a rough build.sbt sketch follows this list)
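To make the dependency and version items above concrete, here is a minimal build.sbt sketch of the kind of change being suggested. The module coordinates other than Scala and cats are illustrative assumptions, not taken from this repo, and the tiered-compilation tweak itself is applied on the function (typically via the JAVA_TOOL_OPTIONS environment variable, e.g. -XX:+TieredCompilation -XX:TieredStopAtLevel=1) rather than in the build.

```scala
// build.sbt -- minimal sketch; module names other than scala/cats are illustrative
ThisBuild / scalaVersion := "2.13.8"            // bug-fix update; unlikely to move cold start much on its own

libraryDependencies ++= Seq(
  "org.typelevel" %% "cats-core" % "2.8.0"      // latest cats 2.x at the time of writing
  // dropped (hypothetical coordinates, shown only to illustrate the idea):
  // "joda-time"   %  "joda-time" % "2.10.x",   // prefer java.time over joda
  // "com.example" %% "common"    % "1.x"       // avoid depending on the shared "common" module
)

// Also worth converging on a single JSON library rather than shipping several.
```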

Please be careful with the idea of rewriting in another language. The JVM is really good at managing concurrent IO with excellent performance while expressing logic at a higher level. I don't think rewriting in another language and achieving good performance (aside from a fast cold start) is trivial, but of course it remains possible.

@DavidLawes
Contributor Author

Thanks for these suggestions @mchv :) I'll take another look at the tiered compilation setting and at how to reduce the dependencies. I'd also tested using arm64 as the runtime but didn't notice an impact; I left this out of my test results (sorry!)

Great insight about the JVM vs other languages. At the moment I don't think we have enough data/evidence to suggest that migrating the sender lambdas to another language would solve our problem. Our next steps include understanding more about how the sender lambdas are operating (including a more detailed breakdown of how time is being spent/lost) and trying to increase the concurrency of our lambdas (e.g. by configuring provisioned concurrency).

If you have any other suggestions or thoughts, would love to hear them!

@mchv
Member

mchv commented Aug 1, 2022

@DavidLawes You are welcome. Two other small suggestions:

@johnduffell
Member

I did try sbt-proguard and something else similar once, but it only works if you can accurately define the "roots" of your code. Where things rely on reflection to find classes (usually loggers, and I think the AWS client (v1?) did too), you have to define those as well.
In the end I decided it's a nice idea, but in reality it would be flaky and, worse still, it could fail at runtime due to a missing class, so it would require careful testing on each deploy.
I think, but I'm not sure, that "unnecessary code that's executed" has a bigger impact than "code that's in the jar but not used".
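To illustrate the reflection problem concretely: with sbt-proguard, every class that is only reached via reflection needs an explicit keep rule, roughly like the sketch below. The setting keys and class names are assumptions, not taken from this repo, and missing a rule only shows up as a missing-class error at runtime, which is exactly the flakiness described above.

```scala
// build.sbt -- sketch of the keep rules reflection forces you to maintain
// when shrinking the jar with sbt-proguard; keys and classes are illustrative.
Proguard / proguardOptions ++= Seq(
  "-keep class com.example.notifications.SenderLambda { *; }", // the handler: the declared "root"
  "-keep class org.slf4j.** { *; }",                           // loggers are typically loaded reflectively
  "-keep class com.amazonaws.** { *; }",                       // AWS SDK v1 also uses reflection internally
  "-dontwarn scala.**"                                         // silence warnings from shrunk Scala internals
)
```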

Mario G did a load of work on making lambdas fast: he built a simple one with minimal libraries (upickle/ujson, the built-in Java HTTP client, the built-in logger) and it was super fast. I can dig out some info if it's useful. He (and Adam F separately, and possibly also Regis) also experimented with native compilation (GraalVM) and had some success, but again ran into some issues around reflection in the AWS library. Again, I can try to dig more info out if it's useful.

If it is the cold start time that is the issue, I would consider whether it would be more economical to either run the lambda in a dry-run mode on a schedule to keep it warm, or pay for a burstable EC2 instance to run all the time, which would only be tens of dollars a month (plus the cost to maintain it) and might be more efficient in terms of dev time.
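A sketch of what the keep-warm path could look like on the handler side, assuming a scheduled rule (EventBridge/CloudWatch Events) invokes the function with a small marker payload; the event shape and names below are assumptions:

```scala
// Minimal keep-warm sketch: a scheduled ping initialises the JVM and returns
// immediately, so real invocations are more likely to hit a warm container.
import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}

class SenderLambda extends RequestHandler[java.util.Map[String, AnyRef], String] {
  override def handleRequest(event: java.util.Map[String, AnyRef], context: Context): String = {
    if (event != null && java.lang.Boolean.TRUE.equals(event.get("keepWarm"))) {
      "warm" // dry run: skip the real work
    } else {
      // ... normal notification-sending path would go here ...
      "done"
    }
  }
}
```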

@mchv
Member

mchv commented Aug 1, 2022

Thanks @johnduffell, this is super helpful!

@DavidLawes
Contributor Author

Thanks @johnduffell - we've been chatting about your ideas :) We think the dry run to keep the lambdas warm is a great idea, and we're definitely going to test it out!

@johnduffell
Member

Glad it's useful - let me know if you want any further thoughts or a review of anything based on my experience!

@mchv
Member

mchv commented Aug 2, 2022

Another quick thing to try regarding performance generally (this will not directly help the cold start problem) is the AWS CodeGuru profiler. You need to set it up for the lambda, and then the code integration is trivial.

@DavidLawes
Contributor Author


Ah, this is super interesting, thank you for sharing! I will add this to our list of initiatives for improving lambda performance.

We did some investigation into the timings of our lambdas and we also raised a support case with AWS. We believe we may have a problem with concurrent executions of our lambdas (this is backed up by CloudWatch metrics showing how long SQS messages wait on the queue before being processed).

Our next set of tests include:

  • validating this theory by testing whether performance improves when we define provisioned concurrency for our lambdas (a parameter sketch follows this list)
  • reducing the number of lambdas we're attempting to run in parallel, e.g. by increasing the batch size
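For reference, provisioned concurrency is normally set in infrastructure-as-code; the sketch below just illustrates the parameters involved via the AWS SDK for Java v2, with a made-up function name, alias, and count:

```scala
// Illustrative only: keeps N execution environments initialised so invocations
// avoid cold starts. Normally configured in CloudFormation/CDK rather than code.
import software.amazon.awssdk.services.lambda.LambdaClient
import software.amazon.awssdk.services.lambda.model.PutProvisionedConcurrencyConfigRequest

object ProvisionConcurrency extends App {
  val lambda = LambdaClient.create()
  lambda.putProvisionedConcurrencyConfig(
    PutProvisionedConcurrencyConfigRequest.builder()
      .functionName("sender-lambda")        // hypothetical function name
      .qualifier("live")                    // an alias or version; provisioned concurrency requires one
      .provisionedConcurrentExecutions(5)   // number of pre-warmed environments
      .build()
  )
}
```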

As an aside, I believe our lambda code could be improved. We've seen some lambdas starting to time out after 1m30s when trying to send notifications to Firebase. At the moment we make one API request per device, but Firebase now supports batch requests, which could improve our performance/efficiency.
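For the Firebase point, a rough sketch of what a batched send could look like with the Admin SDK's multicast API; exact method availability depends on the firebase-admin version in use, so treat it as illustrative:

```scala
// Illustrative batched send: one request for a batch of device tokens instead
// of one request per device. Verify against the firebase-admin version in use.
import com.google.firebase.messaging.{FirebaseMessaging, MulticastMessage}
import scala.jdk.CollectionConverters._

object BatchSend {
  def send(tokens: List[String], payload: Map[String, String]): Unit = {
    val builder = MulticastMessage.builder().addAllTokens(tokens.asJava)
    payload.foreach { case (k, v) => builder.putData(k, v) }

    // One HTTP round trip for the whole batch instead of one per device.
    val response = FirebaseMessaging.getInstance().sendMulticast(builder.build())
    println(s"success: ${response.getSuccessCount}, failure: ${response.getFailureCount}")
  }
}
```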

Thought I'd note our current thoughts/plans in case they spark any other ideas :)
