optimize/operator: the server side actively detects whether grpc connection is available #3890
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3890 +/- ##
==========================================
- Coverage 62.40% 62.37% -0.04%
==========================================
Files 101 101
Lines 9510 9515 +5
==========================================
Hits 5935 5935
- Misses 3109 3114 +5
Partials 466 466
Continue to review full report at Codecov.
/cc @artursouza @yaron2
@1046102779
If this goes in, it will be part of 1.6 and released in December.
@yaron2 @artursouza An interested company is willing to use Dapr; can you help incorporate this PR into the next version?
		Component: b,
	})
	if err != nil {
		log.Warnf("error updating sidecar with component %s (%s): %s", c.GetName(), c.Spec.Type, err)
I think debug level is enough to ensure useful logs are not overwhelmed.
OK, I'll change it right away.
At present, the libraries in the Dapr ecosystem use varying log levels. Are there any specifications for this? I could take a unified look.
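For context, a minimal sketch of the severity change being suggested, assuming the logger in use exposes a Debugf method alongside Warnf:

	// Suggested severity change: debug instead of warn, so frequent
	// update failures during pod churn do not drown out useful logs.
	log.Debugf("error updating sidecar with component %s (%s): %s", c.GetName(), c.Spec.Type, err)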
Can we fix this in 1.5.1? Our adoption process is stuck and 1.6 in Jan 2022 is just too far away.
@1046102779 Hi, can you merge this fix into release-1.5, per #3939?
Can you explain exactly what you are seeing and when, and why this is a blocker for you?
In our company, we have to test the availability and stability of open-source infrastructure like Dapr before adopting it. As I replied in #3752, I tested dapr-1.4.2 by deleting a fixed number of app pods and restarting them every 30 minutes over 48 hours, to mimic a production environment where pods are created and restarted continuously. The memory used by the sidecar-injectors stayed steady, but the memory used by the operators grew the whole time until it hit the allocated limit, then dropped drastically and started growing again, with this pattern repeating over and over. Since the total count of running app pods was fixed and pods were only being recreated during the test, I suspected a resource leak, searched the issues, and found #3752 by @1046102779.

Thanks to @1046102779, #3752 was merged and released in 1.5.0, but I found it does not fully solve the problem, so @1046102779 opened this PR. The root cause of the bug is that when app pods restart, the old connections become unavailable and new connections are created; however, the operator does not actively close the unavailable channels and reclaim the unused memory until the memory limit is reached. #3752 mitigates the problem by closing unused channels when components are created or updated. This PR solves it fully by actively closing unused channels on the fly.

We have confirmed there is a resource leak, but we are not clear about the consequences of the operator's drastic memory reclamation after hitting its limit (the operators seem to keep working after memory is reclaimed, but we don't think this is normal). I can't adopt 1.4.2 or 1.5.0 with known issues only to wait for a final fix and then re-test the newly released version to see whether the leak is really resolved. To be honest, we are Java engineers and not very familiar with Go's memory management; in our experience, a memory leak can have serious consequences, so we are humbly asking the community for a reply. If it is common practice in Go to leave reclamation to the GC even with this kind of drastic memory churn, we will re-examine our test results.
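For readers following along, here is a minimal Go sketch of the idea described above: tearing down a sidecar's update channel as soon as its stream's context is cancelled, instead of waiting for the next component update. The names (server, ComponentEvent, ComponentUpdate, updateStream) are illustrative assumptions, not the actual dapr operator code.

	package operator

	import (
		"context"
		"sync"
	)

	// ComponentEvent is an illustrative stand-in for the operator's update payload.
	type ComponentEvent struct{ Name string }

	// updateStream abstracts the gRPC server stream: only what the sketch needs.
	type updateStream interface {
		Context() context.Context
		Send(*ComponentEvent) error
	}

	type server struct {
		mu       sync.Mutex
		channels map[string]chan *ComponentEvent
	}

	// ComponentUpdate registers a per-pod channel and removes it as soon as
	// the stream's context is done, so a dead connection cannot leak it.
	func (s *server) ComponentUpdate(podName string, srv updateStream) error {
		ch := make(chan *ComponentEvent, 1)
		s.mu.Lock()
		s.channels[podName] = ch
		s.mu.Unlock()

		// Clean up whenever this handler returns, for any reason.
		defer func() {
			s.mu.Lock()
			delete(s.channels, podName)
			s.mu.Unlock()
		}()

		for {
			select {
			case <-srv.Context().Done(): // connection closed or pod gone
				return nil
			case ev := <-ch:
				if err := srv.Send(ev); err != nil {
					return err
				}
			}
		}
	}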
Thanks @darren-wang for explaining. Based on this being a regression, I recommend and support cherry-picking this into the upcoming hotfix release.
optimize/operator: the server side actively detects whether grpc connection is available (#3890)
Cherry-picked into release-1.5 via #3963.
Co-authored-by: Long Dai <long.dai@intel.com>
Co-authored-by: Artur Souza <artursouza.ms@outlook.com>
Co-authored-by: Dapr Bot <56698301+dapr-bot@users.noreply.github.com>
Signed-off-by: x-shadow-man <1494445739@qq.com>
When a large number of pods are destroyed and recreated frequently, the server side needs to actively detect whether each gRPC connection is still available, to prevent the operator's channel resources from leaking.
#3752
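One common grpc-go way for a server to actively detect dead client connections is keepalive enforcement; below is a minimal sketch under that assumption (the exact mechanism used by this PR may differ), using the standard google.golang.org/grpc and keepalive packages:

	package operator

	import (
		"time"

		"google.golang.org/grpc"
		"google.golang.org/grpc/keepalive"
	)

	// newServer builds a gRPC server that pings idle clients and closes
	// connections that stop responding, so dead sidecar connections are
	// detected by the server instead of lingering until a memory limit.
	func newServer() *grpc.Server {
		return grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
			Time:    10 * time.Second, // ping a client after 10s of inactivity
			Timeout: 5 * time.Second,  // drop the connection if no ack within 5s
		}))
	}

With keepalive in place, a connection abandoned by a restarted pod fails its ping and is closed by the server, which in turn lets per-connection resources (such as the update channels discussed above) be released promptly.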