Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroup support for physical_processor_count #1035

Closed
wants to merge 2 commits into from

Conversation

usiegl00
Copy link
Contributor

As rails depends on physical_processor_count to spin up the right number of puma workers, rails will fall over in a containerized environment where cgroups are used to limit cpu time. This pr is an attempt to accurately reflect the available cpu cores when the ruby process is running inside a cgroup.

Copy link
Collaborator

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.
Did you test it works as expected?

Comment on lines 39 to 40
if Dir.exist?("/sys/fs/cgroup/cpu,cpuacct") && (cfs_quota_us = IO.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us").to_i) > 0
(cfs_quota_us / IO.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us").to_i.to_f).ceil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some documentation about these cgroups files?
It'd be nice to add a link as a code comment here.

Comment on lines 39 to 40
if Dir.exist?("/sys/fs/cgroup/cpu,cpuacct") && (cfs_quota_us = IO.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us").to_i) > 0
(cfs_quota_us / IO.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us").to_i.to_f).ceil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.to_i.to_f can probably be just .to_f

elsif ln.start_with?("core")
cid = phy + ":" + ln[/\d+/]
cores[cid] = true if not cores[cid]
if Dir.exist?("/sys/fs/cgroup/cpu,cpuacct") && (cfs_quota_us = IO.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us").to_i) > 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use File.read instead of IO.read? It seems clearer (same on the next line).

@usiegl00
Copy link
Contributor Author

It works as expected inside a container limited to 3 cores running in a host machine that has over 10:

irb(main):003:0> Concurrent.physical_processor_count
=> 3

cid = phy + ":" + ln[/\d+/]
cores[cid] = true if not cores[cid]
# https://kernel.googlesource.com/pub/scm/linux/kernel/git/glommer/memcg/+/cpu_stat/Documentation/cgroups/cpu.txt
if Dir.exist?("/sys/fs/cgroup/cpu,cpuacct") && (cfs_quota_us = File.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us").to_i) > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we shouldn't change the behavior of physical_processor_count. We should instead have a cpu_quota method that returns a float, and then eventually a usable_processor_count which returns cpu_quota * processor_count.

This way it's up to the caller to decide if they want to floor/round/ceil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think what you implemented here is cgroups V1? Not sure how much it's used these days, in cgroups V2 the info is in /sys/fs/cgroup/cpu.max.

casperisfine pushed a commit to casperisfine/concurrent-ruby that referenced this pull request Jan 29, 2024
Closes: ruby-concurrency#1035

A running gag since the introduction of containerization is software
that starts one process per logical or physical core while running
inside a container with a restricted CPU quota and totally blowing
up memory usage in containerized environments.

The proper question to ask is how many CPU cores are usable, not how
many the machine has. To do that we have to read the cgroup info
from `/sys`. There is two way of doing it depending on the version
of cgroups used.

Co-Authored-By: usiegl00 <50933431+usiegl00@users.noreply.github.com>
@casperisfine
Copy link

I took the liberty to push this a bit further in #1038 with cgroups v2 support etc.

casperisfine pushed a commit to casperisfine/concurrent-ruby that referenced this pull request Jan 29, 2024
Closes: ruby-concurrency#1035

A running gag since the introduction of containerization is software
that starts one process per logical or physical core while running
inside a container with a restricted CPU quota and totally blowing
up memory usage in containerized environments.

The proper question to ask is how many CPU cores are usable, not how
many the machine has. To do that we have to read the cgroup info
from `/sys`. There is two way of doing it depending on the version
of cgroups used.

Co-Authored-By: usiegl00 <50933431+usiegl00@users.noreply.github.com>
casperisfine pushed a commit to casperisfine/concurrent-ruby that referenced this pull request Jan 29, 2024
Closes: ruby-concurrency#1035

A running gag since the introduction of containerization is software
that starts one process per logical or physical core while running
inside a container with a restricted CPU quota and totally blowing
up memory usage in containerized environments.

The proper question to ask is how many CPU cores are usable, not how
many the machine has. To do that we have to read the cgroup info
from `/sys`. There is two way of doing it depending on the version
of cgroups used.

Co-Authored-By: usiegl00 <50933431+usiegl00@users.noreply.github.com>
casperisfine pushed a commit to casperisfine/concurrent-ruby that referenced this pull request Jan 29, 2024
Closes: ruby-concurrency#1035

A running gag since the introduction of containerization is software
that starts one process per logical or physical core while running
inside a container with a restricted CPU quota and totally blowing
up memory usage in containerized environments.

The proper question to ask is how many CPU cores are usable, not how
many the machine has. To do that we have to read the cgroup info
from `/sys`. There is two way of doing it depending on the version
of cgroups used.

Co-Authored-By: usiegl00 <50933431+usiegl00@users.noreply.github.com>
@eregon
Copy link
Collaborator

eregon commented Jan 29, 2024

Thank you for the PR, let's continue this on #1038

@eregon eregon closed this Jan 29, 2024
casperisfine pushed a commit to casperisfine/concurrent-ruby that referenced this pull request Feb 1, 2024
Closes: ruby-concurrency#1035

A running gag since the introduction of containerization is software
that starts one process per logical or physical core while running
inside a container with a restricted CPU quota and totally blowing
up memory usage in containerized environments.

The proper question to ask is how many CPU cores are usable, not how
many the machine has. To do that we have to read the cgroup info
from `/sys`. There is two way of doing it depending on the version
of cgroups used.

Co-Authored-By: usiegl00 <50933431+usiegl00@users.noreply.github.com>
casperisfine pushed a commit to casperisfine/concurrent-ruby that referenced this pull request Feb 1, 2024
Closes: ruby-concurrency#1035

A running gag since the introduction of containerization is software
that starts one process per logical or physical core while running
inside a container with a restricted CPU quota and totally blowing
up memory usage in containerized environments.

The proper question to ask is how many CPU cores are usable, not how
many the machine has. To do that we have to read the cgroup info
from `/sys`. There is two way of doing it depending on the version
of cgroups used.

Co-Authored-By: usiegl00 <50933431+usiegl00@users.noreply.github.com>
eregon pushed a commit that referenced this pull request Feb 1, 2024
Closes: #1035

A running gag since the introduction of containerization is software
that starts one process per logical or physical core while running
inside a container with a restricted CPU quota and totally blowing
up memory usage in containerized environments.

The proper question to ask is how many CPU cores are usable, not how
many the machine has. To do that we have to read the cgroup info
from `/sys`. There is two way of doing it depending on the version
of cgroups used.

Co-Authored-By: usiegl00 <50933431+usiegl00@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants