cpuinfo: fixes and enhancements #297

kolyshkin · 2020-05-18T22:35:47Z

This is a set of commits aiming to simplify, fix and improve the cpuinfo parsers.

Please review commit by commit. Below is a copy-paste of individual commit descriptions.

cpuinfo_test: fix test data

Real /proc/cpuinfo files do not contain empty lines at the start
or at the end of file.

parseCPUInfoARM: fix for kernel 3.7+

Since Linux kernel 3.8-rc1 (commit b4b8f770eb10a "ARM: kernel:
update cpuinfo to print all online CPUs features", Sep 10 2012)
the kernel does not print "Processor" as the first line of
/proc/cpuinfo. Instead, it adds a line named "model name"
with similar contents.

Fix the code for this case, and add more tests.

Fixes #294
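The sketch below shows one way to accept the three ARM layouts in play. It is a minimal Go sketch, not the actual parseCPUInfoARM. Pre-3.8 kernels print a single "Processor" line followed by per-CPU "processor" blocks, or no "processor" lines at all without CONFIG_SMP (see the review discussion below). 3.8+ kernels start each block with "processor" and add a per-CPU "model name". The armCPU type and parseARM function are hypothetical, and only a few keys are handled.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// armCPU is a hypothetical, trimmed-down record; a real parser keeps more fields.
type armCPU struct {
	Processor int    // value of the "processor" line (0 when absent)
	ModelName string // "model name" (3.8+) or the shared "Processor" line (pre-3.8)
	BogoMips  string // kept as text in this sketch
}

// parseARM is a sketch that accepts three ARM layouts:
//   - pre-3.8, CONFIG_SMP: one "Processor" line, then "processor" blocks
//   - pre-3.8, no CONFIG_SMP: one "Processor" line, no "processor" lines
//   - 3.8+: every block starts with "processor" and has its own "model name"
func parseARM(input string) ([]armCPU, error) {
	var cpus []armCPU
	var header string // pre-3.8 "Processor" value, shared by all CPUs
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		field := strings.SplitN(scanner.Text(), ":", 2)
		if len(field) != 2 {
			continue // blank separator line
		}
		key, value := strings.TrimSpace(field[0]), strings.TrimSpace(field[1])
		switch key {
		case "Processor": // pre-3.8 only, printed before any "processor" line
			header = value
		case "processor": // start of the next CPU block
			n, _ := strconv.Atoi(value) // malformed numbers become 0 in this sketch
			cpus = append(cpus, armCPU{Processor: n, ModelName: header})
		case "model name": // 3.8+: per-CPU model name
			if len(cpus) > 0 {
				cpus[len(cpus)-1].ModelName = value
			}
		case "BogoMIPS":
			if len(cpus) == 0 { // no "processor" line seen: non-SMP kernel
				cpus = append(cpus, armCPU{ModelName: header})
			}
			cpus[len(cpus)-1].BogoMips = value
		}
	}
	return cpus, scanner.Err()
}
```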

parseCPUInfoARM: only parse Features once

The `Features` line is the same for all CPUs, and yet it is assigned to
and parsed many times.

Optimize the code so it is only parsed once and assigned in place.
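A minimal sketch of the parse-once idea, assuming the "Features" value has already been extracted from the input. The cpu type and the assignFeatures helper are hypothetical, not the procfs code.

```go
package main

import "strings"

// cpu is a hypothetical per-CPU record for this sketch.
type cpu struct {
	Processor string
	Features  []string
}

// assignFeatures splits the shared "Features" value once and assigns the
// result to every CPU, instead of re-parsing the same line per CPU.
func assignFeatures(cpus []cpu, featuresValue string) {
	features := strings.Fields(featuresValue) // parsed exactly once
	for i := range cpus {
		cpus[i].Features = features // same slice (and backing array) for all CPUs
	}
}
```

Every CPU ends up sharing one backing array, which is fine as long as callers treat Features as read-only; copying the slice per CPU would trade that aliasing for extra allocations.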

parseCPUInfo: optimize line scan

Avoid checking each line for ":" twice -- since we're splitting the line anyway,
we can reuse the result.

This also prevents a potential panic from referencing the non-existent field[1]
when the input contains a ":" with no space after it.
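The single-split approach can be sketched as follows; splitField is a hypothetical helper. Assuming the old code checked for ":" and then split on ": ", a line such as "flags:fpu" would yield only one field. Splitting on ":" and checking the length of the result means field[1] is never indexed when it does not exist.

```go
package main

import "strings"

// splitField splits a /proc/cpuinfo line into a trimmed key and value.
// The length check on the SplitN result replaces a separate
// strings.Contains(line, ":") test and guards the field[1] access.
func splitField(line string) (key, value string, ok bool) {
	field := strings.SplitN(line, ":", 2)
	if len(field) != 2 {
		return "", "", false // no colon: blank separator or garbage line
	}
	return strings.TrimSpace(field[0]), strings.TrimSpace(field[1]), true
}
```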

cpuinfo: read data in place

Instead of reading the data to the buffer, then making a reader and a
scanner out of that buffer, let's analyze the data as we read it line by
line.

parseCPUInfo*: add error checking

If the input can't be read, scanner.Scan() returns false and the code is
supposed to check for the error using scanner.Err(). Do that.

In other words, instead of returning half-read cpuinfo, return an error.

cpuinfo: rm GetFirstNonEmptyLine, simplify scan

There is no point in skipping empty lines at the beginning of the file, since
there are none (the test data had some, but an earlier commit fixed that).

Logic is simpler now.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

bluecmd · 2020-05-19T05:28:53Z

cpuinfo.go

 		case "processor":
-			cpuinfo = append(cpuinfo, commonCPUInfo) // start of the next processor
+			cpuinfo = append(cpuinfo, CPUInfo{VendorID: modelName}) // start of the next processor


See this comment on a similar current PR: https://github.com/prometheus/procfs/pull/296/files#discussion_r427036662

Thanks, fixed!

The link is not as helpful as I hoped, but the conversation was about the fact that on pre-v3.8 kernels "processor" is only present when CONFIG_SMP is enabled, so you likely need to figure out how to support kernels built without that option.

Yeah, I think I have figured it out; please see the updated commit 928d8ea, to which I have added a test case with data from the comment above.

@bluecmd can you please review it then? I am aware you're not a maintainer but you wrote some of this code.

I didn't write any of this code, I just filed bugs against it :-)

kolyshkin · 2020-05-19T23:19:26Z

@discordianfish @pgier PTAL

discordianfish · 2020-05-20T10:31:28Z

cpuinfo.go

 }

-func parseCPUInfoX86(info []byte) ([]CPUInfo, error) {
-	scanner := bufio.NewScanner(bytes.NewReader(info))
+func parseCPUInfoX86(info io.Reader) ([]CPUInfo, error) {


I think we decided to prefer reading everything into a buffer first. While the file is unlikely to change mid-read, I think that's what we decided to do in general. But I'm not sure; @SuperQ wdyt?

We have done that previously, although I think this pattern is fine too. The OS is probably buffering the file content anyway. And I think io.Reader is better as a parameter for most parseXXX functions because it's a little more generic than a string or byte slice.

If we do want to buffer the entire file first, then it would probably be better to create a reader from the byte slice and then pass the reader to the parse function instead of passing the byte slice directly. The byte slice parameter here might just be left over from me thinking I would need random access instead of line by line access to parse certain file formats.
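If the whole file is read first anyway, the adapter described above is a one-liner with bytes.NewReader. In this sketch countLines and parseBuffered are hypothetical stand-ins for a reader-based parseXXX function and its buffered caller:

```go
package main

import (
	"bufio"
	"bytes"
	"io"
)

// countLines stands in for any parseXXX function that accepts an io.Reader.
func countLines(r io.Reader) (int, error) {
	n := 0
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		n++
	}
	return n, scanner.Err()
}

// parseBuffered keeps the "read everything first" policy while still
// calling the reader-based parser, by wrapping the slice in a bytes.Reader.
func parseBuffered(data []byte) (int, error) {
	return countLines(bytes.NewReader(data))
}
```

The byte slice could come from os.ReadFile; the parser itself stays agnostic about where its input comes from.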

OK, sounds reasonable; then let's keep it a reader.

pgier

Overall LGTM, thanks for the contribution! I think this will have a couple minor conflicts with #296, so this might need one more update after that one is merged.

pgier · 2020-05-20T19:15:14Z

cpuinfo.go

@@ -183,22 +172,16 @@ func parseCPUInfoX86(info []byte) ([]CPUInfo, error) {
 			cpuinfo[i].PowerManagement = field[1]
 		}
 	}
-	return cpuinfo, nil
+	return cpuinfo, scanner.Err()


If scanner.Err() is not nil here, I'm wondering if we should return a nil instead of cpuinfo. At least that's the pattern we've usually followed for other parts of this library.

I think this is fine.

if scanner.Err() is not nil here, I'm wondering if we should return a nil instead of cpuinfo. At least that's the pattern we've usually followed for other parts of this library.

It seems like an unnecessary complication. A user should always check for the error first. If the error is non-nil, there are no guarantees wrt what the result would be.

Ok makes sense, I'm fine with this as-is.

discordianfish · 2020-05-21T09:39:07Z

Yeah needs rebasing but also LGTM in general.

kolyshkin · 2020-05-21T16:22:54Z

I think this will have a couple minor conflicts with #296, so this might need one more update after that one is merged.

Alas, this PR also fixes the same issue as #296 so it did not have to be merged first.

Anyway, let me rework this on top of it.

SuperQ · 2020-08-24T15:35:30Z

Any interest in continuing this PR?

cpuinfo_test: fix test data

0174b24

kolyshkin force-pushed the cpuinfo branch from cda6728 to 7263f91 May 18, 2020 22:40

bluecmd suggested changes May 19, 2020

kolyshkin added 6 commits May 19, 2020 11:18

parseCPUInfoARM: only parse Features once

611c664

cpuinfo: read data in place

a175916

parseCPUInfo*: add error checking

a6effd3

cpuinfo: rm GetFirstNonEmptyLine, simplify scan

e1d29d8

kolyshkin force-pushed the cpuinfo branch from 7263f91 to e1d29d8 May 19, 2020 18:21

discordianfish reviewed May 20, 2020

pgier reviewed May 20, 2020

kolyshkin marked this pull request as draft June 9, 2020 15:57
