
support variable sized pod-CIDRs for different-sized node (was: Allow --node-cidr-mask-size to change: make NodeIPAM controller support variable sized CIDRs) #90922

Closed
liuxu623 opened this issue May 9, 2020 · 39 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@liuxu623
Contributor

liuxu623 commented May 9, 2020

  • cluster CIDR is 192.168.0.0/16
  • set --node-cidr-mask-size 24
  • add two nodes; their podCIDRs will be 192.168.0.0/24 and 192.168.1.0/24
  • set --node-cidr-mask-size 23 and restart controller-manager
  • delete the node with podCIDR 192.168.1.0/24
  • add a node; its podCIDR will be 192.168.0.0/23
  • 192.168.0.0/24 and 192.168.0.0/23 overlap

I wrote some test code that reproduces this:

package cidrset

import (
	"fmt"
	"net"
	"testing"
)

func TestAllocate(t *testing.T) {
	_, clusterCIDR, _ := net.ParseCIDR("192.168.0.0/16")
	cs, _ := NewCIDRSet(clusterCIDR, 23) // allocator restarted with the new /23 mask
	_, podCIDR, _ := net.ParseCIDR("192.168.0.0/24")
	cs.Occupy(podCIDR) // existing node with podCIDR 192.168.0.0/24
	_, podCIDR, _ = net.ParseCIDR("192.168.1.0/24")
	cs.Occupy(podCIDR)  // existing node with podCIDR 192.168.1.0/24
	cs.Release(podCIDR) // delete the 192.168.1.0/24 node
	p, _ := cs.AllocateNext()
	fmt.Println(p.String()) // prints 192.168.0.0/23, which overlaps the remaining 192.168.0.0/24
}
@liuxu623 liuxu623 added the kind/bug Categorizes issue or PR as related to a bug. label May 9, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 9, 2020
@liuxu623
Contributor Author

liuxu623 commented May 9, 2020

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 9, 2020
@andrewsykim
Member

It would be nice if node IPAM could handle cases where either cluster-cidr or node-cidr-mask-size changes, but generally I don't think changing them is supported.

@athenabot

/triage unresolved

Comment /remove-triage unresolved when the issue is assessed and confirmed.

🤖 I am a bot run by vllry. 👩‍🔬

@k8s-ci-robot k8s-ci-robot added the triage/unresolved Indicates an issue that can not or will not be resolved. label May 9, 2020
@aojea
Member

aojea commented May 12, 2020

  • cluster CIDR is 192.168.0.0/16
  • set --node-cidr-mask-size 24
  • add two nodes; their podCIDRs will be 192.168.0.0/24 and 192.168.1.0/24
  • set --node-cidr-mask-size 23 and restart controller-manager
  • delete the node with podCIDR 192.168.1.0/24
  • add a node; its podCIDR will be 192.168.0.0/23
  • 192.168.0.0/24 and 192.168.0.0/23 overlap

This has consequences that I don't think Kubernetes can handle right now, because it implies that you have to delete all the nodes that overlap with the new mask.
In your example, unless you delete one of the nodes or reassign the masks on all nodes, you can't increase the mask; otherwise the new mask on one node will overlap with the masks of the other nodes.

@liuxu623
Contributor Author

liuxu623 commented May 12, 2020

  • cluster CIDR is 192.168.0.0/16
  • set --node-cidr-mask-size 24
  • add two nodes; their podCIDRs will be 192.168.0.0/24 and 192.168.1.0/24
  • set --node-cidr-mask-size 23 and restart controller-manager
  • delete the node with podCIDR 192.168.1.0/24
  • add a node; its podCIDR will be 192.168.0.0/23
  • 192.168.0.0/24 and 192.168.0.0/23 overlap

This has consequences that I don't think Kubernetes can handle right now, because it implies that you have to delete all the nodes that overlap with the new mask.
In your example, unless you delete one of the nodes or reassign the masks on all nodes, you can't increase the mask; otherwise the new mask on one node will overlap with the masks of the other nodes.

The problem is that we currently use a bitmap to record whether a CIDR is used or not. If we increase --node-cidr-mask-size, more than one node may be using the same (larger) CIDR, so we need to know how many nodes use it; only when that count is zero can we hand the CIDR out again.

So I tried using a map to record how many nodes use each CIDR; see #90926 and please help me review the code, thank you.

@aojea
Member

aojea commented May 12, 2020

The problem is that we currently use a bitmap to record whether a CIDR is used or not. If we increase --node-cidr-mask-size, more than one node may be using the same (larger) CIDR, so we need to know how many nodes use it; only when that count is zero can we hand the CIDR out again.

Ahhh ok, then you don't have to modify the bitmap; you just need to modify the Occupy() method to check whether any of the bits between the begin and end returned by begin, end, err := s.getBeginingAndEndIndices(cidr) are already set:

	for i := begin; i <= end; i++ {
		if s.used.Bit(i) == 1 {
			return fmt.Errorf("some of the CIDRs are already allocated")
		}
	}

// Occupy marks the given CIDR range as used. Occupy does not check if the CIDR
// range was previously used.
func (s *CidrSet) Occupy(cidr *net.IPNet) (err error) {
	begin, end, err := s.getBeginingAndEndIndices(cidr)
	if err != nil {
		return err
	}
	s.Lock()
	defer s.Unlock()
	for i := begin; i <= end; i++ {
		s.used.SetBit(&s.used, i, 1)
	}
	return nil
}

The comment on the method explicitly says "Occupy does not check if the CIDR range was previously used", but I don't know the context. Maybe @bowei knows better.

I still have doubts that node-mask resizing can be supported; I think the AllocateNext() method will not work after that 🤔, or may cause conflicting assignments.
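
For illustration only, here is a minimal, self-contained sketch of that check-before-set idea using the same math/big bitmap; occupyChecked and its standalone signature are hypothetical, not the real cidrset API:

package main

import (
	"fmt"
	"math/big"
)

// occupyChecked is a hypothetical standalone version of the check described
// above: refuse to mark bits [begin, end] if any of them is already set.
func occupyChecked(used *big.Int, begin, end int) error {
	for i := begin; i <= end; i++ {
		if used.Bit(i) == 1 {
			return fmt.Errorf("bit %d is already allocated", i)
		}
	}
	for i := begin; i <= end; i++ {
		used.SetBit(used, i, 1)
	}
	return nil
}

func main() {
	var used big.Int
	fmt.Println(occupyChecked(&used, 0, 0)) // <nil>: the first block is free
	fmt.Println(occupyChecked(&used, 0, 0)) // error: that block is already taken
}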

@liuxu623
Contributor Author

liuxu623 commented May 13, 2020

I still have doubts that node-mask resizing can be supported; I think the AllocateNext() method will not work after that 🤔, or may cause conflicting assignments.

The problem is that the nodes with 192.168.0.0/24 and 192.168.1.0/24 both fall inside 192.168.0.0/23; only when both 192.168.0.0/24 and 192.168.1.0/24 have been deleted should we release 192.168.0.0/23. But the bitmap we use now can only record whether a CIDR is used or not, not how many nodes use it, so deleting either node (192.168.0.0/24 or 192.168.1.0/24) releases 192.168.0.0/23.

// Release releases the given CIDR range.
func (s *CidrSet) Release(cidr *net.IPNet) error {
	begin, end, err := s.getBeginingAndEndIndices(cidr)
	if err != nil {
		return err
	}
	s.Lock()
	defer s.Unlock()
	for i := begin; i <= end; i++ {
		s.used.SetBit(&s.used, i, 0)
	}
	return nil
}

So I tried using a map to record how many nodes use each CIDR: when a node is deleted we do s.used[i]--, and AllocateNext() will not hand out the CIDR while s.used[i] > 0, so AllocateNext() keeps working normally.

// Release releases the given CIDR range.
// Sketch of the proposed change: s.used becomes a per-index counter rather
// than a bitmap, so a block is only freed when its count drops to zero.
func (s *CidrSet) Release(cidr *net.IPNet) error {
	begin, end, err := s.getBeginingAndEndIndices(cidr)
	if err != nil {
		return err
	}
	s.Lock()
	defer s.Unlock()
	for i := begin; i <= end; i++ {
		if s.used[i] > 0 {
			s.used[i]--
		}
	}
	return nil
}
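
To make the reference-counting idea concrete, here is a minimal, self-contained sketch (hypothetical names, not the actual #90926 implementation): each block keeps a count of how many node podCIDRs fall inside it, and allocation skips blocks whose count is still non-zero.

package main

import "fmt"

// refCountSet is a hypothetical sketch: instead of a single used bit per
// block, keep a count of how many node podCIDRs fall inside each block.
type refCountSet struct {
	used map[int]int // block index -> number of nodes occupying it
	max  int         // total number of blocks in the cluster CIDR
}

func newRefCountSet(max int) *refCountSet {
	return &refCountSet{used: map[int]int{}, max: max}
}

func (s *refCountSet) occupy(i int) { s.used[i]++ }

func (s *refCountSet) release(i int) {
	if s.used[i] > 0 {
		s.used[i]--
	}
}

// allocateNext hands out the first block that no node is using anymore.
func (s *refCountSet) allocateNext() (int, error) {
	for i := 0; i < s.max; i++ {
		if s.used[i] == 0 {
			s.used[i]++
			return i, nil
		}
	}
	return 0, fmt.Errorf("CIDR range is full")
}

func main() {
	s := newRefCountSet(128) // 128 /23 blocks in a /16 cluster CIDR
	s.occupy(0)              // node with 192.168.0.0/24 lands in block 0 (192.168.0.0/23)
	s.occupy(0)              // node with 192.168.1.0/24 also lands in block 0
	s.release(0)             // delete one of the two nodes
	i, _ := s.allocateNext()
	fmt.Println(i) // 1: block 0 still has one user, so no overlapping assignment
}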

@liuxu623
Contributor Author

Ahhh ok, then you don't have to modify the bitmap; you just need to modify the Occupy() method to check whether any of the bits between the begin and end returned by begin, end, err := s.getBeginingAndEndIndices(cidr) are already set.

If Occupy() returns an error, NodeIPAMController will not start; we don't want that.

@aojea
Member

aojea commented May 13, 2020

Thanks for the explanation @liuxu623, it makes more sense to me now.
However, this seems like an important change to the behavior of the IPAM, and usually a KEP is required.
Let's ask Tim what the best way to proceed would be.

/assign @thockin

@thockin
Member

thockin commented May 14, 2020

The short answer is that this field was not designed to be changed.

The longer answer is that it COULD be (re)designed to be changeable, but as you point out that would require changing the bitmap-based allocator to something more sophisticated. AND it would have to be done in a way that lets us upgrade existing clusters with existing allocations.

I won't say that we would not take such a change, but it would need to be KEP'ed and very carefully tested with regards to in-place upgrades.

@thockin
Member

thockin commented May 14, 2020

I am going to re-title this and declare it a feature request with relatively low prio. If someone wants to step up to implement, we can talk.

@thockin thockin changed the title NodeIPAM allocate overlap PodCIDR when increase --node-cidr-mask-size Allow --node-cidr-mask-size to change: make NodeIPAM controller support variable sized CIDRs May 14, 2020
@thockin thockin added kind/feature Categorizes issue or PR as related to a new feature. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed kind/bug Categorizes issue or PR as related to a bug. triage/unresolved Indicates an issue that can not or will not be resolved. labels May 14, 2020
@liuxu623
Contributor Author

liuxu623 commented May 26, 2020

@fl-max

fl-max commented Jun 24, 2020

I think expanding a CIDR block (i.e. /24 to /23) is much more tedious than shrinking one (i.e. /24 to /25). Assuming this is true, maybe it would be better to split this into two features: "Expand CIDR mask size" and "Shrink CIDR mask size". I believe the latter would be much simpler to implement.
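
For what it's worth, a quick test in the same style as the reproducer above sketches why the shrink direction looks simpler: with the current bitmap, an existing /24 simply occupies both of its /25 halves, so new /25 allocations cannot overlap it (this assumes, like the earlier test, that it runs inside the cidrset package):

func TestShrinkMask(t *testing.T) {
	_, clusterCIDR, _ := net.ParseCIDR("192.168.0.0/16")
	cs, _ := NewCIDRSet(clusterCIDR, 25) // allocator restarted with the new /25 mask
	_, podCIDR, _ := net.ParseCIDR("192.168.0.0/24")
	cs.Occupy(podCIDR) // existing node keeps its /24; this marks both /25 halves
	p, _ := cs.AllocateNext()
	fmt.Println(p.String()) // 192.168.1.0/25: does not overlap the existing /24
}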

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@pacoxu
Member

pacoxu commented Mar 2, 2021

@aojea OK. Let me follow up.
/assign

So I need to move on to the KEP first and then the PR.
I think I need to use the new KEP template and continue the PR kubernetes/enhancements#1811 here.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2021
@aojea
Member

aojea commented May 31, 2021

/remove-lifecycle stale
There is a KEP in progress kubernetes/enhancements#2593

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2021
@thockin thockin changed the title Allow --node-cidr-mask-size to change: make NodeIPAM controller support variable sized CIDRs support variable sized pod-CIDRs for different-sized node (was: Allow --node-cidr-mask-size to change: make NodeIPAM controller support variable sized CIDRs) Jun 4, 2021
@pacoxu
Member

pacoxu commented Jun 7, 2021

/unassign
since another KEP is in progress.

@aojea
Member

aojea commented Jun 7, 2021

let's assign to the KEP owner then :)
/assign @rahulkjoshi

@aojea
Member

aojea commented Jun 7, 2021

/assign @rahulkjoshi

@k8s-ci-robot
Contributor

@aojea: GitHub didn't allow me to assign the following users: rahulkjoshi.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @rahulkjoshi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bnu0

bnu0 commented Aug 6, 2021

I don't think the KEP covers what is being demonstrated here, which really seems like a bug (one I hit today while trying to change kube-controller-manager's default node podCIDR prefix from a /24 to a /23).

As far as I can tell, this issue is not looking for a way to re-IP existing nodes. It is showing that if you change the kube-controller-manager argument from /24 to /23, then when a new node registers it can be errantly given a CIDR that overlaps with existing nodes that already have the previous CIDR size. This is very dangerous, for obvious reasons.

This happens without any attempt to re-IP anything: when a node with a /24 is deleted, the allocator marks the whole containing subnet (now a /23) as available and allocates it regardless of the neighboring /24.

@aojea
Member

aojea commented Aug 8, 2021

I don't think the KEP covers what is being demonstrated here, which really seems like a bug (one I hit today while trying to change kube-controller-manager's default node podCIDR prefix from a /24 to a /23).

ah, right, sorry, I overlooked all the details.
The KEP allows using different sizes, but it keeps the limitation of not being able to change the size of an already-assigned CIDR.
It is a known limitation, and maybe it should be better documented, but this field was close to being deprecated too #57130 🤷

@thockin
Member

thockin commented Aug 21, 2021 via email

@rahulkjoshi
Contributor

Yes, Tim is right. You would delete the old config -- the finalizer preserves the config until the last Node using that range goes away, but it becomes un-allocatable. Then you add your new config and restart all the nodes to use the new CIDR size.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 20, 2021
@rahulkjoshi
Contributor

/remove-lifecycle stale

The PR mentioned above should handle the use case where the flag value gets changed. The code is targeted to be merged in 1.24.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 21, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 19, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 22, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
