Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Allocate method to reuse lease if present #792

Merged
merged 1 commit into from
Jan 10, 2023

Conversation

EmilyShepherd
Copy link
Contributor

@EmilyShepherd EmilyShepherd commented Dec 5, 2022

Previously, the Allocate method of the daemon always created a new Lease object. However, as both the CNI ADD and CHECK commands call Allocate, and CHECK can be called multiple times, this resulted in multiple Lease objects being created per pod.

Each of these leases was long lived with its own maintain() loop - however the daemon only kept track of the most recent one, meaning any old lease objects remained running forever (and held open their NetNS files). After a long enough period, this resulted in the system crashing out with "too many files" or a similar error limits-related error.

This commit updates the behaviour of Allocate() to first check if a Lease already exists for the given clientID. If none is found, one is created as before. If a Lease is found, a new Check() mechanism is called, which simply wakes up the maintain() loop to cause it to check the status of the lease.

This may fix #329.

Signed-off-by: Emily Shepherd emily@redcoat.dev

@EmilyShepherd

This comment was marked as resolved.

EmilyShepherd added a commit to EmilyShepherd/kiOS that referenced this pull request Dec 10, 2022
This leak was causing failures and crashes on the system. This has also
been submitted as a [patch upstream][upstream] however as that PR is
still pending, we will add it here as it is required to ensure the
system remains functional.

[upstream]: containernetworking/plugins#792
@EmilyShepherd
Copy link
Contributor Author

Bump @dcbw 🙏

@mccv1r0
Copy link
Member

mccv1r0 commented Jan 4, 2023

Looks good. I tested the PR via cnitool using bridge plugin and isc-dhcpd-server

2023/01/04 14:42:21 cnitool-1d56652e125f8a333e30/bridge-dhcp0-chain/ens0: lease acquired, expiration is 2023-01-04 14:45:41.855778598 -0500 EST m=+1489.727401399
2023/01/04 14:42:26 cnitool-1d56652e125f8a333e30/bridge-dhcp0-chain/ens0: Checking lease
2023/01/04 14:42:29 cnitool-1d56652e125f8a333e30/bridge-dhcp0-chain/ens0: Checking lease
2023/01/04 14:44:01 cnitool-1d56652e125f8a333e30/bridge-dhcp0-chain/ens0: renewing lease
2023/01/04 14:44:01 cnitool-1d56652e125f8a333e30/bridge-dhcp0-chain/ens0: lease renewed, expiration is 2023-01-04 14:47:21.883450554 -0500 EST m=+1589.755073355
2023/01/04 14:45:41 cnitool-1d56652e125f8a333e30/bridge-dhcp0-chain/ens0: renewing lease
2023/01/04 14:45:41 cnitool-1d56652e125f8a333e30/bridge-dhcp0-chain/ens0: lease renewed, expiration is 2023-01-04 14:49:01.990540685 -0500 EST m=+1689.862163459
2023/01/04 14:45:42 cnitool-1d56652e125f8a333e30/bridge-dhcp0-chain/ens0: Checking lease
2023/01/04 14:45:54 cnitool-1d56652e125f8a333e30/bridge-dhcp0-chain/ens0: Checking lease

@@ -67,6 +67,7 @@ type DHCPLease struct {
broadcast bool
stopping uint32
stop chan struct{}
check chan bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's convention to use the type chan struct{} to indicate that a channel carries no information.

@squeed
Copy link
Member

squeed commented Jan 9, 2023

Looks great! One minor nit, then we can get this merged.

squeed referenced this pull request Jan 9, 2023
Currently, hostname is set in the original DHCPREQUEST but not the
renewal. With some DHCP server implementations (such as FreeBSD dhcpd),
this leads to the hostname being cleared in the lease table.

This behavior is inconsistent with other DHCP clients such as dhclient
which set the hostname on the renewal request as well. To fix, use the
same options for acquire and renew.

This is compatible with RFC 2131 (see table 5).

Signed-off-by: Akhil Velagapudi <4@4khil.com>
Previously, the Allocate method of the daemon always created a new Lease
object. However, as both the CNI ADD and CHECK commands call Allocate,
and CHECK can be called multiple times, this resulted in multiple Lease
objects being created per pod.

Each of these leases was long lived with its own maintain() loop -
however the daemon only kept track of the most recent one, meaning any
old lease objects remained running forever (and held open their NetNS
files). After a long enough period, this resulted in the system crashing
out with "too many files" or a similar error limits-related error.

This commit updates the behaviour of Allocate() to first check if a
Lease already exists for the given clientID. If none is found, one is
created as before. If a Lease is found, a new Check() mechanism is
called, which simply wakes up the maintain() loop to cause it to check
the status of the lease.

This may fix containernetworking#329.

Signed-off-by: Emily Shepherd <emily@redcoat.dev>
@squeed
Copy link
Member

squeed commented Jan 10, 2023

Thanks!

@squeed squeed merged commit bf9c258 into containernetworking:main Jan 10, 2023
@EmilyShepherd EmilyShepherd deleted the check-dhcp branch January 10, 2023 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DHCP plugin - "error calling DHCP.Allocate: no more tries" error
3 participants