Support variable-sized pod CIDRs for different-sized nodes (was: Allow --node-cidr-mask-size to change: make NodeIPAM controller support variable sized CIDRs) #90922
/sig network |
It would be nice if NodeIPAM could handle cases where either cluster-cidr or node-cidr-mask-size changes, but generally I don't think changing them is supported. |
/triage unresolved (🤖 I am a bot run by vllry. 👩🔬) |
This has consequences that I don't think Kubernetes can handle right now, because it implies that you would have to delete all the nodes whose CIDRs overlap with the new mask. |
The problem is that we currently use a bitmap to record whether each CIDR is used or not. If we grow the per-node CIDR (e.g. change --node-cidr-mask-size so blocks go from /24 to /23), more than one node may occupy the same CIDR, so we need to know how many nodes use each CIDR; only when that count is zero can the CIDR be handed out. So I tried using a map to record how many nodes use each CIDR. See #90926 and please help me review the code, thank you. |
ahhh ok, then you don't have to modify the bitmap, you just need to modify the Occupy() method to check whether any of the bits between begin and end are already set:

```go
for i := begin; i <= end; i++ {
	if s.used.Bit(i) == 1 {
		return fmt.Errorf("some of the CIDRs are already allocated")
	}
}
```

kubernetes/pkg/controller/nodeipam/ipam/cidrset/cidr_set.go, lines 224 to 239 at 7094849
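To make that concrete, here is a standalone sketch of the check-before-set idea over a math/big bitmap, the same structure CidrSet uses internally (illustrative only; occupyIfFree is a hypothetical name, not the upstream code):

```go
package main

import (
	"fmt"
	"math/big"
)

// occupyIfFree sets bits [begin, end] in used, but fails up front if any
// of them is already set (the check proposed above for Occupy()).
func occupyIfFree(used *big.Int, begin, end int) error {
	for i := begin; i <= end; i++ {
		if used.Bit(i) == 1 {
			return fmt.Errorf("bit %d (one sub-CIDR) is already allocated", i)
		}
	}
	for i := begin; i <= end; i++ {
		used.SetBit(used, i, 1)
	}
	return nil
}

func main() {
	var used big.Int
	fmt.Println(occupyIfFree(&used, 0, 1)) // <nil>: bits 0-1 were free
	fmt.Println(occupyIfFree(&used, 1, 2)) // error: bit 1 is already set
}
```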
The comment on the method explicitly says … I still have doubts that node-mask resizing can be supported; I think that the … |
The problem is what happens when a node is deleted: Release() just clears the bits.

```go
// Release releases the given CIDR range.
func (s *CidrSet) Release(cidr *net.IPNet) error {
	begin, end, err := s.getBeginingAndEndIndices(cidr)
	if err != nil {
		return err
	}
	s.Lock()
	defer s.Unlock()
	for i := begin; i <= end; i++ {
		s.used.SetBit(&s.used, i, 0)
	}
	return nil
}
```

So I tried using a map to record how many nodes use each CIDR; when a node is deleted, Release() decrements the count instead of clearing a bit:

```go
// Release releases the given CIDR range.
func (s *CidrSet) Release(cidr *net.IPNet) error {
	begin, end, err := s.getBeginingAndEndIndices(cidr)
	if err != nil {
		return err
	}
	s.Lock()
	defer s.Unlock()
	for i := begin; i <= end; i++ {
		// s.used is now a map counting how many nodes hold each index.
		if s.used[i] > 0 {
			s.used[i]--
		}
	}
	return nil
}
```
 |
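For illustration, a minimal self-contained sketch of that counting idea (hypothetical names; #90926 is the actual proposal): each sub-CIDR index carries a reference count instead of a single bit, so a range only becomes free when the last node referencing it goes away:

```go
package main

import "fmt"

// refCountedSet is a toy stand-in for CidrSet's bitmap that counts how
// many nodes occupy each sub-CIDR index instead of storing one bit.
type refCountedSet struct {
	used map[int]int // index -> number of nodes holding it
}

// occupy records one more node covering indices [begin, end].
func (s *refCountedSet) occupy(begin, end int) {
	for i := begin; i <= end; i++ {
		s.used[i]++
	}
}

// release drops one node's claim; an index is free again only when its
// count reaches zero.
func (s *refCountedSet) release(begin, end int) {
	for i := begin; i <= end; i++ {
		if s.used[i] > 0 {
			s.used[i]--
		}
	}
}

// free reports whether no node references any index in [begin, end].
func (s *refCountedSet) free(begin, end int) bool {
	for i := begin; i <= end; i++ {
		if s.used[i] > 0 {
			return false
		}
	}
	return true
}

func main() {
	s := &refCountedSet{used: map[int]int{}}
	s.occupy(0, 0)            // node A's old /24 maps to this index
	s.occupy(0, 0)            // node B's old /24 maps to the same index
	s.release(0, 0)           // node A is deleted
	fmt.Println(s.free(0, 0)) // false: node B still holds the range
}
```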
If Occupy() returns an error, NodeIPAMController will not start; we don't want that. |
The short answer is that this field was not designed to be changed. The longer answer is that it COULD be (re)designed to be changeable, but as you point out that would require changing the bitmap-based allocator to something more sophisticated, AND it would have to be done in a way that lets us upgrade existing clusters with existing allocations. I won't say that we would not take such a change, but it would need to be KEP'ed and very carefully tested with regard to in-place upgrades. |
I am going to re-title this and declare it a feature request with relatively low priority. If someone wants to step up to implement it, we can talk. |
I think expanding a CIDR (i.e. …) |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale |
@aojea OK. Let me follow up. So I need to move on to the KEP first and then the PR. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
/unassign |
let's assign to the KEP owner then :) |
/assign @rahulkjoshi |
@aojea: GitHub didn't allow me to assign the following users: rahulkjoshi. Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. |
I don't think the KEP covers what is being demonstrated here, which really seems like a bug (one which I hit today trying to change kube-controller-manager's default node podCIDR prefix from a /24 to a /23). As far as I can tell, this issue is not looking for a way to re-IP existing nodes. It is showing that if you change the kube-controller-manager argument from /24 to /23, then when a new node registers it can be errantly given a CIDR that overlaps with existing nodes which already have the previous CIDR size. This is very dangerous, for obvious reasons. It happens without any attempt to re-IP anything: when a node with a /24 is deleted, the allocator marks the whole subnet (now a /23) as available and allocates it regardless of the neighboring /24. |
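To see why that overlap is dangerous, note that every /23 block spans exactly two of the previously allocated /24 blocks. A standalone illustration (the 10.244.0.0/16 cluster CIDR and these addresses are just example values):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// A /23 the allocator might hand to a new node after the flag change...
	_, newBlock, _ := net.ParseCIDR("10.244.2.0/23")
	// ...covers two /24s that may already belong to existing nodes.
	for _, old := range []string{"10.244.2.0/24", "10.244.3.0/24"} {
		ip, _, _ := net.ParseCIDR(old)
		fmt.Printf("%s inside %s: %v\n", old, newBlock, newBlock.Contains(ip))
	}
}
```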
ah, right, sorry, I overlooked all the details. The KEP allows using different sizes, but it keeps the limitation of not being able to change the size of an assigned CIDR. It is a known limitation, maybe it should be better documented, but this field was close to being deprecated too #57130 🤷 |
I don't think we want to fix the existing allocator, but instead get the newer proposal working, and it SHOULD (if I understand) cover this use case. E.g. you would add a new config with the same base and a different mask, and somehow that should be preferred (need to figure that out?). |
Yes, Tim is right. You would delete the old config: the finalizer preserves the config until the last Node using that range goes away, but it becomes un-allocatable. Then you add your new config and restart all the nodes to use the new CIDR size. |
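A rough sketch of that lifecycle with hypothetical types (the real mechanism is a finalizer on the config object; none of these names come from the actual API):

```go
package main

import "fmt"

// cidrConfig models one allocator config. Deleting it only marks it as
// terminating: it stops handing out new ranges but keeps tracking the
// ones in use until the last node referencing it goes away.
type cidrConfig struct {
	base        string // e.g. "10.244.0.0/16"
	maskSize    int    // per-node mask, e.g. 24
	terminating bool   // deletion requested, finalizer still present
	nodes       int    // nodes still holding a range from this config
}

func (c *cidrConfig) allocatable() bool { return !c.terminating }

// releaseNode is called when a node using this config is deleted; it
// reports whether the finalizer can now be removed.
func (c *cidrConfig) releaseNode() bool {
	if c.nodes > 0 {
		c.nodes--
	}
	return c.terminating && c.nodes == 0
}

func main() {
	old := &cidrConfig{base: "10.244.0.0/16", maskSize: 24, nodes: 2}
	old.terminating = true         // old config deleted
	fmt.Println(old.allocatable()) // false: no new ranges from it
	fmt.Println(old.releaseNode()) // false: one node remains
	fmt.Println(old.releaseNode()) // true: finalizer can be removed
}
```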
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale The PR mentioned above should handle the use case where the flag value gets changed. The code is targeted to be merged in 1.24. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closing this issue. |
I wrote some test code.