fix: memory issue when push large bentos #4207
base: main
What are the bugs in the current implementation?
Context is in https://bentoml-team.slack.com/archives/C02QLC8RB5W/p1695088745009929. In summary, it currently uses too much memory when doing a
Looks like unit tests are failing, maybe because of the requests change...? Should we just use a
@sauyon you need to change the patch in the tests.
LGTM. @Haivilo it is worth testing this again once the deployment CLI/SDK is more mature, and adding some test cases.
What does this PR address?
Adds support for limiting the maximum memory usage when pushing models.
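For readers unfamiliar with the general idea, here is a minimal sketch of one common way to bound memory during a large upload: read the payload in fixed-size chunks and hand each chunk to the sender before reading the next. This is not necessarily how this PR does it. The names `iter_chunks`, `upload_in_chunks`, and `send_chunk`, and the 8 MiB default, are hypothetical and chosen only for illustration.

```python
import io
from typing import BinaryIO, Callable, Iterator

# Hypothetical default chunk size; an assumption, not a value from this PR.
DEFAULT_CHUNK_SIZE = 8 * 1024 * 1024


def iter_chunks(stream: BinaryIO, chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks of at most `chunk_size` bytes from `stream`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        yield chunk


def upload_in_chunks(
    stream: BinaryIO,
    send_chunk: Callable[[bytes], None],
    chunk_size: int = DEFAULT_CHUNK_SIZE,
) -> int:
    """Send `stream` chunk by chunk; return the total number of bytes sent.

    `send_chunk` is a hypothetical callback standing in for whatever
    transport is used (for example, one part of a multipart upload).
    Only one chunk is held by this function at any time.
    """
    total = 0
    for chunk in iter_chunks(stream, chunk_size):
        send_chunk(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    sizes = []
    sent = upload_in_chunks(io.BytesIO(b"x" * 10), lambda c: sizes.append(len(c)), chunk_size=4)
    print(sent, sizes)
```

With this pattern, peak memory for the payload is roughly one chunk rather than the full archive, at the cost of more round trips for smaller chunk sizes.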
Test case 1: pushing bento google--flan-t5-large-service, model size 2.92 GiB
Test case 2: pushing bento google--flan-t5-large-service, model size 12.55 GiB
Fixes #(issue)