cgroup support for physical_processor_count #1035

usiegl00 · 2024-01-21T02:02:32Z

As Rails depends on physical_processor_count to spin up the right number of Puma workers, Rails will fall over in a containerized environment where cgroups are used to limit CPU time. This PR is an attempt to accurately reflect the available CPU cores when the Ruby process is running inside a cgroup.

eregon

This looks good.
Did you test that it works as expected?

eregon · 2024-01-22T09:39:50Z

lib/concurrent-ruby/concurrent/utility/processor_counter.rb

+                if Dir.exist?("/sys/fs/cgroup/cpu,cpuacct") && (cfs_quota_us = IO.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us").to_i) > 0
+                  (cfs_quota_us / IO.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us").to_i.to_f).ceil


Is there some documentation about these cgroups files?
It'd be nice to add a link as a code comment here.
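
For readers following the arithmetic in the diff above, here is a minimal sketch of what the two files are combined into. The helper name `cgroup_v1_cores` is hypothetical and not part of concurrent-ruby; it takes the raw file contents as strings so it can be tried without a cgroup present. It assumes, as the `> 0` check in the diff suggests, that a non-positive quota (for example `-1`) means no limit is set.

```ruby
# Hypothetical helper for illustration only, not concurrent-ruby's API.
# quota_text:  contents of cpu.cfs_quota_us  (microseconds of CPU time per period)
# period_text: contents of cpu.cfs_period_us (length of one period in microseconds)
def cgroup_v1_cores(quota_text, period_text)
  quota = quota_text.to_i
  # A non-positive quota is treated as "no limit": let the caller fall back.
  return nil unless quota > 0

  period = period_text.to_f
  # Guard against an empty or zero period to avoid dividing by zero.
  return nil unless period > 0

  # Round up so a fractional quota such as 1.5 cores still counts as 2.
  (quota / period).ceil
end

p cgroup_v1_cores("300000\n", "100000\n")
p cgroup_v1_cores("150000\n", "100000\n")
p cgroup_v1_cores("-1\n", "100000\n")
```

Rounding up mirrors the `.ceil` in the diff; whether that is the right rounding for every caller is a separate question.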

eregon · 2024-01-22T09:40:27Z

lib/concurrent-ruby/concurrent/utility/processor_counter.rb

+                if Dir.exist?("/sys/fs/cgroup/cpu,cpuacct") && (cfs_quota_us = IO.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us").to_i) > 0
+                  (cfs_quota_us / IO.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us").to_i.to_f).ceil


.to_i.to_f can probably be just .to_f
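
A quick check of that suggestion, as a sketch: for the plain integer text these files contain, `String#to_f` alone gives the same value as `.to_i.to_f`, trailing newline included. The sample string below is an assumed file content, not taken from a real system.

```ruby
# Assumed sample content of cpu.cfs_period_us, including the trailing newline.
raw = "100000\n"

via_to_i = raw.to_i.to_f # integer parse first, then convert to Float
direct   = raw.to_f      # parse straight to Float

p via_to_i == direct
```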

eregon · 2024-01-22T09:41:05Z

lib/concurrent-ruby/concurrent/utility/processor_counter.rb

-                  elsif ln.start_with?("core")
-                    cid        = phy + ":" + ln[/\d+/]
-                    cores[cid] = true if not cores[cid]
+                if Dir.exist?("/sys/fs/cgroup/cpu,cpuacct") && (cfs_quota_us = IO.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us").to_i) > 0


Could you use File.read instead of IO.read? It seems clearer (same on the next line).

usiegl00 · 2024-01-23T00:06:54Z

It works as expected inside a container limited to 3 cores, running on a host machine that has over 10:

irb(main):003:0> Concurrent.physical_processor_count
=> 3

byroot · 2024-01-29T14:56:10Z

lib/concurrent-ruby/concurrent/utility/processor_counter.rb

-                    cid        = phy + ":" + ln[/\d+/]
-                    cores[cid] = true if not cores[cid]
+                # https://kernel.googlesource.com/pub/scm/linux/kernel/git/glommer/memcg/+/cpu_stat/Documentation/cgroups/cpu.txt
+                if Dir.exist?("/sys/fs/cgroup/cpu,cpuacct") && (cfs_quota_us = File.read("/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us").to_i) > 0


IMO we shouldn't change the behavior of physical_processor_count. We should instead have a cpu_quota method that returns a float, and then eventually a usable_processor_count which returns cpu_quota * processor_count.

This way it's up to the caller to decide if they want to floor/round/ceil.

Also I think what you implemented here is cgroups V1? Not sure how much it's used these days; in cgroups V2 the info is in /sys/fs/cgroup/cpu.max.
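
To make the suggestion above concrete, here is a rough sketch of a float-returning quota reader that tries the cgroups V2 file first and falls back to the V1 files from the diff. All method names (`parse_cpu_max`, `parse_cfs`, `cpu_quota`) and the `root` parameter are hypothetical and chosen for illustration; this is not the code that ended up in the library. It assumes `cpu.max` holds two fields, quota and period, with the literal `max` in place of the quota when there is no limit, and it expresses the result in CPU cores (1.5 means one and a half cores).

```ruby
# Hypothetical helpers for illustration; none of these names are concurrent-ruby's API.

# cgroups V2: cpu.max is assumed to contain "<quota> <period>",
# or "max <period>" when no limit is set.
def parse_cpu_max(text)
  quota, period = text.split
  return nil if quota.nil? || quota == "max"

  period = period.to_f
  return nil unless period > 0

  quota.to_f / period
end

# cgroups V1: separate quota and period files; a non-positive quota means no limit.
def parse_cfs(quota_text, period_text)
  quota = quota_text.to_i
  return nil unless quota > 0

  period = period_text.to_f
  return nil unless period > 0

  quota / period
end

# Returns the quota as a Float number of cores, or nil when no limit can be found.
# Rounding (floor/round/ceil) is left to the caller.
def cpu_quota(root = "/sys/fs/cgroup")
  v2_file = File.join(root, "cpu.max")
  v1_dir  = File.join(root, "cpu,cpuacct")

  if File.exist?(v2_file)
    parse_cpu_max(File.read(v2_file))
  elsif Dir.exist?(v1_dir)
    parse_cfs(File.read(File.join(v1_dir, "cpu.cfs_quota_us")),
              File.read(File.join(v1_dir, "cpu.cfs_period_us")))
  end
rescue SystemCallError
  # Missing or unreadable files are treated the same as "no quota".
  nil
end

p parse_cpu_max("150000 100000\n")
p parse_cpu_max("max 100000\n")
p cpu_quota
```

Keeping the parsing separate from the file access makes the two formats easy to test without a container; a caller could then combine the float with a processor count and pick its own rounding.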

Closes: ruby-concurrency#1035

A running gag since the introduction of containerization is software that starts one process per logical or physical core while running inside a container with a restricted CPU quota, totally blowing up memory usage in containerized environments.

The proper question to ask is how many CPU cores are usable, not how many the machine has. To do that we have to read the cgroup info from `/sys`. There are two ways of doing it, depending on the version of cgroups used.

Co-Authored-By: usiegl00 <50933431+usiegl00@users.noreply.github.com>

casperisfine · 2024-01-29T15:55:58Z

I took the liberty to push this a bit further in #1038 with cgroups v2 support etc.

eregon · 2024-01-29T19:05:35Z

Thank you for the PR, let's continue this on #1038

cgroup support for physical_processor_count

4597eca

eregon reviewed Jan 22, 2024

add reference link for cgroup support for physical_processor_count

5296b55

byroot reviewed Jan 29, 2024

casperisfine mentioned this pull request Jan 29, 2024

Add Concurrent.usable_processor_count that is cgroups aware #1038

Merged

eregon closed this Jan 29, 2024
