
Document testing to reduce lambda cold start time #693

Merged
DavidLawes merged 1 commit into main from dlawes/document-lambda-cold-start-time-test on Jul 28, 2022

Conversation

DavidLawes
Contributor

What does this change?

Document the testing we've done to try and reduce the lambda cold start time (no code changes)

DavidLawes merged commit 9278481 into main on Jul 28, 2022
DavidLawes deleted the dlawes/document-lambda-cold-start-time-test branch on July 28, 2022 14:30
@mchv
Member

mchv commented Jul 29, 2022

A few other things you could try, from simple to more difficult:

  • Use arm64 as the runtime (only one config change), see documentation
  • As you mentioned, change the tiered compilation to 1 (a single JVM parameter), see documentation, but you first need to reduce dependencies (see below)
  • Reduce dependencies
    • do not depend on common
    • do not depend on joda
    • review whether you can drop some of the JSON libraries, as it seems several different ones are used
  • Update some dependencies
    • update Scala to 2.13.8, which is unlikely to have an immediate impact but still contains some fixes
    • update to the latest cats 2.x (2.8.0), and review whether you can update cats-effect to 3.x, which I think is a transitive dependency and provides a faster implementation, but that can be a big effort (a rough build.sbt sketch follows this list)
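To make the dependency and version items above concrete, here is a minimal build.sbt sketch of the kind of change being suggested. The module coordinates other than Scala and cats are illustrative assumptions, not taken from this repo, and the tiered-compilation tweak itself is applied on the function (typically via the JAVA_TOOL_OPTIONS environment variable, e.g. -XX:+TieredCompilation -XX:TieredStopAtLevel=1) rather than in the build.

```scala
// build.sbt -- minimal sketch; module names other than scala/cats are illustrative
ThisBuild / scalaVersion := "2.13.8"            // bug-fix update; unlikely to move cold start much on its own

libraryDependencies ++= Seq(
  "org.typelevel" %% "cats-core" % "2.8.0"      // latest cats 2.x at the time of writing
  // dropped (hypothetical coordinates, shown only to illustrate the idea):
  // "joda-time"   %  "joda-time" % "2.10.x",   // prefer java.time over joda
  // "com.example" %% "common"    % "1.x"       // avoid depending on the shared "common" module
)

// Also worth converging on a single JSON library rather than shipping several.
```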

Please be careful with the idea of rewriting in another language. The JVM is really good at managing concurrent IO with excellent performance while expressing logic at a higher level. I don't think rewriting in another language and achieving good performance (aside from a fast cold start) is trivial, but of course it remains possible.

@DavidLawes
Contributor Author

Thanks for these suggestions @mchv :) I'll take another look at the tiered compilation setting and at how to reduce the dependencies. I'd also tested using arm64 as the runtime but didn't notice an impact; I left this out of my test results (sorry!)

Great insight about the JVM vs other languages. At the moment I don't think we have enough data/evidence to suggest that migrating the sender lambdas to another language would solve our problem. Our next steps include understanding more about how the sender lambdas are operating (including a more detailed breakdown of how time is being spent/lost) and trying to increase the concurrency of our lambdas (e.g. by configuring provisioned concurrency).

If you have any other suggestions or thoughts, would love to hear them!

@mchv
Member

mchv commented Aug 1, 2022

@DavidLawes You are welcome. Two other small suggestions:

@johnduffell
Member

I did try sbt-proguard and something else similar once, but it only works if you can accurately define the "roots" of your code. Where things rely on reflection to find classes (usually loggers, and I think the AWS client (v1?) did too), you have to define those as well.
In the end I decided it's a nice idea, but in reality it would be flaky and, worse still, it could fail at runtime due to a missing class, so it would require careful testing on each deploy.
I think, but I'm not sure, that "unnecessary code that's executed" has a bigger impact than "code that's in the jar but not used".
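To illustrate the reflection problem concretely: with sbt-proguard, every class that is only reached via reflection needs an explicit keep rule, roughly like the sketch below. The setting keys and class names are assumptions, not taken from this repo, and missing a rule only shows up as a missing-class error at runtime, which is exactly the flakiness described above.

```scala
// build.sbt -- sketch of the keep rules reflection forces you to maintain
// when shrinking the jar with sbt-proguard; keys and classes are illustrative.
Proguard / proguardOptions ++= Seq(
  "-keep class com.example.notifications.SenderLambda { *; }", // the handler: the declared "root"
  "-keep class org.slf4j.** { *; }",                           // loggers are typically loaded reflectively
  "-keep class com.amazonaws.** { *; }",                       // AWS SDK v1 also uses reflection internally
  "-dontwarn scala.**"                                         // silence warnings from shrunk Scala internals
)
```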

Mario G did a load of work on making lambdas fast: he built a simple one with minimal libraries (upickle/ujson, the built-in Java HTTP client, the built-in logger) and it was super fast. I can dig out some info if it's useful. He (and Adam F separately, and possibly also Regis) also experimented with native compilation (GraalVM) and had some success, but again ran into some issues around reflection in the AWS library. Again, I can try to dig more info out if it's useful.

If it is the cold start time that is the issue, I would consider whether it would be more economical to either run the lambda in a dry-run mode on a schedule to keep it warm, or pay for a burstable EC2 instance to run all the time, which would only be tens of dollars a month (plus the cost to maintain it) and might be more efficient in terms of dev time.
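A sketch of what the keep-warm path could look like on the handler side, assuming a scheduled rule (EventBridge/CloudWatch Events) invokes the function with a small marker payload; the event shape and names below are assumptions:

```scala
// Minimal keep-warm sketch: a scheduled ping initialises the JVM and returns
// immediately, so real invocations are more likely to hit a warm container.
import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}

class SenderLambda extends RequestHandler[java.util.Map[String, AnyRef], String] {
  override def handleRequest(event: java.util.Map[String, AnyRef], context: Context): String = {
    if (event != null && java.lang.Boolean.TRUE.equals(event.get("keepWarm"))) {
      "warm" // dry run: skip the real work
    } else {
      // ... normal notification-sending path would go here ...
      "done"
    }
  }
}
```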

@mchv
Member

mchv commented Aug 1, 2022

Thanks @johnduffell, this is super helpful!

@DavidLawes
Contributor Author

Thanks @johnduffell - we've been chatting about your ideas :) We think the dry run to keep the lambdas warm is a great idea, and we're definitely going to test it out!

@johnduffell
Member

Glad it's useful - let me know if you want any further thoughts or a review of anything based on my experience!

@mchv
Member

mchv commented Aug 2, 2022

Another quick thing to try regarding performance generally (this will not directly help the cold start problem) is the AWS CodeGuru profiler. You need to set it up for the lambda, and then the code integration is trivial.

@DavidLawes
Contributor Author


Ah, this is super interesting, thank you for sharing! I will add this to our list of initiatives for improving lambda performance.

We did some investigation into the timings of our lambdas and we also raised a support case with AWS. We believe we may have a problem with concurrent executions of our lambdas (this is backed up by CloudWatch metrics showing how long SQS messages wait on the queue before being processed).

Our next set of tests include:

  • validating this theory by testing whether performance improves when we define provisioned concurrency for our lambdas (a parameter sketch follows this list)
  • reducing the number of lambdas we're attempting to run in parallel, e.g. by increasing the batch size
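For reference, provisioned concurrency is normally set in infrastructure-as-code; the sketch below just illustrates the parameters involved via the AWS SDK for Java v2, with a made-up function name, alias, and count:

```scala
// Illustrative only: keeps N execution environments initialised so invocations
// avoid cold starts. Normally configured in CloudFormation/CDK rather than code.
import software.amazon.awssdk.services.lambda.LambdaClient
import software.amazon.awssdk.services.lambda.model.PutProvisionedConcurrencyConfigRequest

object ProvisionConcurrency extends App {
  val lambda = LambdaClient.create()
  lambda.putProvisionedConcurrencyConfig(
    PutProvisionedConcurrencyConfigRequest.builder()
      .functionName("sender-lambda")        // hypothetical function name
      .qualifier("live")                    // an alias or version; provisioned concurrency requires one
      .provisionedConcurrentExecutions(5)   // number of pre-warmed environments
      .build()
  )
}
```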

As an aside, I believe our lambda code could be improved. We've seen some lambdas starting to time out after 1m30s when trying to send notifications to Firebase. At the moment we make one API request per device, but Firebase now supports batch requests, which could improve our performance/efficiency.
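For the Firebase point, a rough sketch of what a batched send could look like with the Admin SDK's multicast API; exact method availability depends on the firebase-admin version in use, so treat it as illustrative:

```scala
// Illustrative batched send: one request for a batch of device tokens instead
// of one request per device. Verify against the firebase-admin version in use.
import com.google.firebase.messaging.{FirebaseMessaging, MulticastMessage}
import scala.jdk.CollectionConverters._

object BatchSend {
  def send(tokens: List[String], payload: Map[String, String]): Unit = {
    val builder = MulticastMessage.builder().addAllTokens(tokens.asJava)
    payload.foreach { case (k, v) => builder.putData(k, v) }

    // One HTTP round trip for the whole batch instead of one per device.
    val response = FirebaseMessaging.getInstance().sendMulticast(builder.build())
    println(s"success: ${response.getSuccessCount}, failure: ${response.getFailureCount}")
  }
}
```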

Thought I'd note our current thoughts/plans in case they spark any other ideas :)
