RAID0 partition metrics don't match those of the individual NVMe disks that comprise it #2276

Open
vladzcloudius opened this issue Apr 30, 2024 · 4 comments
Labels: bug (Something isn't working right)

vladzcloudius (Contributor) commented:

Installation details
Panel Name: Disk Writes/Reads
Dashboard Name: OS Metrics
Scylla-Monitoring Version: 4.7.1
Scylla-Version: 2024.1.3-0.20240401.64115ae91a55
Kernel version on all nodes: 5.15.0-1058-gcp

Description
The throughput (bytes or ops) of the RAID0 volume (md0 in the screenshots below) is supposed to equal the sum of the corresponding values of the physical disks that comprise it.
However, it is far from that. In some cases, as in the screenshots below, the reported md0 value is even lower.
In the example below, md0 is a RAID0 volume assembled from 4 NVMe disks: nvme0n1, nvme0n2, nvme0n3 and nvme0n4.
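For reference, the expected relationship can be checked directly against Prometheus with a small sketch like the one below. It assumes the stock node_exporter metric node_disk_read_bytes_total and a Prometheus server reachable at http://localhost:9090; both the endpoint and the device regex are assumptions, and the dashboard's actual queries may differ.

# Sketch: compare md0 read throughput against the sum of its NVMe members
# via the Prometheus HTTP API. PROM_URL and the device regex are assumptions
# taken from this example; adjust them to the actual setup.
import requests

PROM_URL = "http://localhost:9090/api/v1/query"   # assumed Prometheus endpoint
MD0_QUERY = 'rate(node_disk_read_bytes_total{device="md0"}[1m])'
MEMBERS_QUERY = 'sum(rate(node_disk_read_bytes_total{device=~"nvme0n[1-4]"}[1m]))'
# Note: if Prometheus scrapes several nodes, both queries should also be
# restricted to a single instance label.

def instant(query):
    """Run an instant query and return the first sample's value (0.0 if empty)."""
    resp = requests.get(PROM_URL, params={"query": query}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

md0 = instant(MD0_QUERY)
members = instant(MEMBERS_QUERY)
print(f"md0 read throughput:        {md0 / 1e6:7.1f} MB/s")
print(f"sum over nvme0n1..nvme0n4:  {members / 1e6:7.1f} MB/s")
# For a RAID0 volume the two numbers are expected to roughly match.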

Here is a screenshot showing md0 and only nvme0n1 from all nodes (the picture is the same for all other disks):
[screenshot]

Here you can see the values for all disks on a single node, clearly showing the problem:

[screenshot]

I ran iostat on one of the nodes to check whether this might be a kernel issue, but it is not: iostat shows values that add up as expected:

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md0            323.00  99448.00     0.00   0.00    2.12   307.89    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.68   9.60
nvme0n1         86.00  24836.00     0.00   0.00    2.50   288.79    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.21   6.80
nvme0n2         75.00  23640.00     0.00   0.00    2.33   315.20    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.17   4.80
nvme0n3         80.00  24832.00     0.00   0.00    2.40   310.40    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.19   6.00
nvme0n4         82.00  26140.00     0.00   0.00    2.35   318.78    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.19   6.80
sda              0.00      0.00     0.00   0.00    0.00     0.00  128.00    736.00     3.00   2.29    0.57     5.75    0.00      0.00     0.00   0.00    0.00     0.00    0.07   1.60


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.03    0.00    1.60    0.00    0.00   96.37

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md0            430.00 149624.00     0.00   0.00    2.17   347.96    1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.93   8.80
nvme0n1         92.00  33124.00     0.00   0.00    2.70   360.04    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.25   7.60
nvme0n2         96.00  33804.00     0.00   0.00    2.41   352.12    1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.23   8.40
nvme0n3         92.00  33256.00     0.00   0.00    2.41   361.48    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.22   6.80
nvme0n4        100.00  33056.00     0.00   0.00    2.19   330.56    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.22   6.80


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.37    0.00    1.31    0.00    0.00   96.32

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md0            290.00  98304.00     0.00   0.00    3.52   338.98    4.00     56.00     0.00   0.00    0.00    14.00    0.00      0.00     0.00   0.00    0.00     0.00    1.02   6.80
nvme0n1         88.00  29924.00     0.00   0.00    3.08   340.05    1.00     32.00     0.00   0.00    0.00    32.00    0.00      0.00     0.00   0.00    0.00     0.00    0.27   5.60
nvme0n2         85.00  27560.00     0.00   0.00    2.79   324.24    2.00     16.00     0.00   0.00    0.50     8.00    0.00      0.00     0.00   0.00    0.00     0.00    0.24   6.00
nvme0n3         75.00  28412.00     0.00   0.00    2.77   378.83    1.00      8.00     0.00   0.00    0.00     8.00    0.00      0.00     0.00   0.00    0.00     0.00    0.21   5.60
nvme0n4         92.00  28792.00     0.00   0.00    2.63   312.96    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.24   5.20

We saw similar behavior on multiple clusters.
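A sketch of the same check done directly against the kernel counters in /proc/diskstats (the device names are taken from this example and would need adjusting elsewhere; the sector counters are in 512-byte units):

# Sketch: sample /proc/diskstats twice and verify that md0's read volume
# roughly equals the sum over its member disks.
import time

RAID = "md0"
MEMBERS = ["nvme0n1", "nvme0n2", "nvme0n3", "nvme0n4"]  # device names from this report
SAMPLE_SECONDS = 10

def sectors_read():
    """Return {device: sectors_read} parsed from /proc/diskstats (3rd field after the device name)."""
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            stats[fields[2]] = int(fields[5])
    return stats

before = sectors_read()
time.sleep(SAMPLE_SECONDS)
after = sectors_read()

def mb(dev):
    return (after[dev] - before[dev]) * 512 / 1e6

print(f"{RAID}: {mb(RAID):.1f} MB read in {SAMPLE_SECONDS}s")
print(f"sum of members: {sum(mb(d) for d in MEMBERS):.1f} MB read in {SAMPLE_SECONDS}s")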

vladzcloudius added the bug label on Apr 30, 2024
vladzcloudius (Contributor, Author) commented:

cc @tarzanek @vreniers @mkeeneyj

amnonh (Collaborator) commented May 1, 2024:

@vladzcloudius if I get it right, this is a node_exporter issue, right?

vladzcloudius (Contributor, Author) commented:

> @vladzcloudius if I get it right, this is a node_exporter issue, right?

Could be.
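One way to narrow that down would be to compare node_exporter's raw counters against /proc/diskstats on the same host; a minimal sketch, assuming node_exporter is reachable on its default port 9100:

# Sketch: dump node_exporter's read-bytes counters for md0 and the NVMe
# members so they can be compared against /proc/diskstats on the same host.
# The endpoint is the exporter's default and is an assumption here.
import re
import requests

METRICS_URL = "http://localhost:9100/metrics"   # default node_exporter endpoint
PATTERN = re.compile(r'node_disk_read_bytes_total\{device="(md0|nvme0n[1-4])"\}\s+(\S+)')

text = requests.get(METRICS_URL, timeout=10).text
for device, value in PATTERN.findall(text):
    print(f"{device}: {float(value):.0f} bytes read since boot")
# These counters should track the "sectors read" column of /proc/diskstats
# (multiplied by 512) for the same devices.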

amnonh (Collaborator) commented May 8, 2024:

@vladzcloudius could it be: prometheus/node_exporter#2310
