Number of CPUs does not take into account #4341

asfaltboy · 2023-10-08T19:52:23Z

My company deploys Prisma on kubernetes (managed k8s, aka GKE, in Google cloud), and have observed an issue with the default conenction_limit being set by Prisma to be higher than expected, and also that this seems dependent on our underlying kubernetes node type.

We've noticed that the current method of using num_cpus::get_physical() when computing the default connection limit does not consider cgroups. I was wondering whether it's not better to use the num_cpus::get() method to detect the actual CPU cores available in the execution environment, it seems to take care of parsing available cgroup v1 and v2 .

As in the docs:

This will also check cgroups, frequently used in containers to constrain CPU usage.

We did notice that Prisma mention only "physical cores" in the docs, and I wonder if this is intentional.

For what it's worth, I imagine that using (already opened) connections from a DB client is probably more of an I/O heavy operation, and may not introduce significant CPU load on the physical host, at least in most common cases (perhaps some benchmark profiling could prove me wrong). That said, having the option to automatically default to a smaller pool size when a container constrained, can be more in-line with the expected behavior.

Refs:

prisma-engines/quaint/src/pooled.rs

Line 191 in 5106138

let connection_limit = num_cpus::get_physical() * 2 + 1;
https://docs.rs/num_cpus/latest/num_cpus/fn.get.html
Add cgroups support for Linux targets by seanmonstar · Pull Request #96 · seanmonstar/num_cpus · GitHub
https://kubernetes.io/docs/concepts/architecture/cgroups/
https://man7.org/linux/man-pages/man7/cgroups.7.html
https://docs.kernel.org/admin-guide/cgroup-v1/cgroups.html
https://docs.kernel.org/admin-guide/cgroup-v2.html

The text was updated successfully, but these errors were encountered:

janpio · 2023-10-08T21:25:10Z

have observed an issue with the default conenction_limit being set by Prisma to be higher than expected

Is the "higher than expected" limit causing any problems? Or is it just "higher than could have anticipated when understanding what kind of cores these are"?

asfaltboy · 2023-10-09T08:03:49Z

Good point, I should have expanded on the impact, let me try:

In our k8s setup is that we run a bunch of small (/micro-) services in "pods" with limited resources on a smaller number of nodes (VMs). So for example an 8 core (8 logical, 4 physical btw) node can at any moment run 32 pods, each is allowed to use a small portion of CPU and memory as needed for the workload.

We start off by defaulting services to a relatively limited CPU / Memory resources (say 1 core at most), and we'd expect them to default to e.g 3 connections (however, they default to 9 = 4*2+1). We also start off with modest DB resources, such that the DB configuration sets max_connections to a small number. This misalignment causes us to need to adjust the pool/db configuration for many of the default workloads, as the pool size is too large for the default connection instance.

We can of course adjust the pool config in our default service, but I guess that's the point: the automatic value is meant to represent a sane default, which in our case will not be based on the workload but rather on the hosting node in the containerized environment.

I hope that makes sense, and I've not yammered your ears off :)

janpio · 2023-10-09T08:29:06Z

Ok, so you are basically saying that using logical instead of physical CPUs would be a more convenient default that will probably be more correct in more cases?

asfaltboy · 2023-10-12T18:29:35Z

Thanks for summarizing it so well :) yes, that's exactly right

This is my understanding, at least. I'm by no means an expert, so I'll be happy to learn of other reasons this should not use logical cores.

janpio · 2023-10-13T19:08:20Z

That was most probably an oversight.

But to be honest, the number of cores - both logical or physical - does actually say very little about how many connections an application should be able to open. It is based on a very old misunderstanding of an even older rule about how many connections should be open on bare metal servers. But until we figure that out, it is probably better to at least use the logical cores.

Are you up to doing a PR?

aqrln · 2023-10-18T17:16:32Z

FWIW you don't need an external crate for this in modern Rust, std::thread::available_parallelism is part of standard library now.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Number of CPUs does not take into account #4341

Number of CPUs does not take into account #4341

asfaltboy commented Oct 8, 2023 •

edited

janpio commented Oct 8, 2023

asfaltboy commented Oct 9, 2023

janpio commented Oct 9, 2023

asfaltboy commented Oct 12, 2023

janpio commented Oct 13, 2023

aqrln commented Oct 18, 2023

Number of CPUs does not take into account #4341

Number of CPUs does not take into account #4341

Comments

asfaltboy commented Oct 8, 2023 • edited

janpio commented Oct 8, 2023

asfaltboy commented Oct 9, 2023

janpio commented Oct 9, 2023

asfaltboy commented Oct 12, 2023

janpio commented Oct 13, 2023

aqrln commented Oct 18, 2023

asfaltboy commented Oct 8, 2023 •

edited