Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use regexp #3460

Merged
merged 2 commits into from Feb 6, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
27 changes: 15 additions & 12 deletions libcontainer/cgroups/systemd/common.go
Expand Up @@ -6,7 +6,6 @@ import (
"fmt"
"math"
"os"
"regexp"
"strconv"
"strings"
"sync"
Expand Down Expand Up @@ -231,18 +230,22 @@ func systemdVersion(cm *dbusConnManager) int {
return version
}

func systemdVersionAtoi(verStr string) (int, error) {
// verStr should be of the form:
// "v245.4-1.fc32", "245", "v245-1.fc32", "245-1.fc32" (without quotes).
// The result for all of the above should be 245.
// Thus, we unconditionally remove the "v" prefix
// and then match on the first integer we can grab.
re := regexp.MustCompile(`v?([0-9]+)`)
matches := re.FindStringSubmatch(verStr)
if len(matches) < 2 {
return 0, fmt.Errorf("can't parse version %s: incorrect number of matches %v", verStr, matches)
// systemdVersionAtoi extracts a numeric systemd version from the argument.
// The argument should be of the form: "v245.4-1.fc32", "245", "v245-1.fc32",
// "245-1.fc32" (with or without quotes). The result for all of the above
// should be 245.
func systemdVersionAtoi(str string) (int, error) {
// Unconditionally remove the leading prefix ("v).
str = strings.TrimLeft(str, `"v`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH! Actually, this is not correct; this should be strings.TrimPrefix. strings.TrimLeft takes a "cut set" of characters, so currently "vvvvvv""v"v"v""v245.4-1.fc32" would be accepted;

https://go.dev/play/p/-Z4D9U4-oYM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct since it allows us to Trim either v or "v (i.e. with or without the quote).

Sure, it will also cut v" or vvvv"""vvvv but we do not expect such input here (the input expectations are documented).

Within the realm of expected input,

strings.TrimLeft(str, `"v`)

work the same as

strings.TrimPrefix(str, `"`)
strings.TrimPrefix(str, `v`)

but is somewhat more compact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thaJeztah PTAL ^^

// Match on the first integer we can grab.
for i := 0; i < len(str); i++ {
if str[i] < '0' || str[i] > '9' {
// First non-digit: cut the tail.
str = str[:i]
break
}
}
ver, err := strconv.Atoi(matches[1])
ver, err := strconv.Atoi(str)
if err != nil {
return -1, fmt.Errorf("can't parse version: %w", err)
}
Expand Down
7 changes: 5 additions & 2 deletions libcontainer/cgroups/systemd/systemd_test.go
Expand Up @@ -35,8 +35,11 @@ func TestSystemdVersion(t *testing.T) {
{`"v245.4-1.fc32"`, 245, false},
{`"241-1"`, 241, false},
{`"v241-1"`, 241, false},
{"NaN", 0, true},
{"", 0, true},
{`333.45"`, 333, false},
{`v321-0`, 321, false},
{"NaN", -1, true},
{"", -1, true},
{"v", -1, true},
}
for _, sdTest := range systemdVersionTests {
ver, err := systemdVersionAtoi(sdTest.verStr)
Expand Down
44 changes: 40 additions & 4 deletions libcontainer/factory_linux.go
Expand Up @@ -5,7 +5,6 @@ import (
"errors"
"fmt"
"os"
"regexp"
"runtime/debug"
"strconv"

Expand All @@ -27,8 +26,6 @@ const (
execFifoFilename = "exec.fifo"
)

var idRegex = regexp.MustCompile(`^[\w+-\.]+$`)

// Create creates a new container with the given id inside a given state
// directory (root), and returns a Container object.
//
Expand Down Expand Up @@ -260,8 +257,47 @@ func loadState(root string) (*State, error) {
return state, nil
}

// validateID checks if the supplied container ID is valid, returning
// the ErrInvalidID in case it is not.
//
// The format of valid ID was never formally defined, instead the code
// was modified to allow or disallow specific characters.
//
// Currently, a valid ID is a non-empty string consisting only of
// the following characters:
// - uppercase (A-Z) and lowercase (a-z) Latin letters;
// - digits (0-9);
// - underscore (_);
// - plus sign (+);
// - minus sign (-);
// - period (.).
//
// In addition, IDs that can't be used to represent a file name
// (such as . or ..) are rejected.

func validateID(id string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps would be good to (despite it not being exported) add a GoDoc to describe accepted formats, and outline some of it.

Looking at this PR made me go down a bit of a trip in history; let me post below, but I should probably post elsewhere


I noticed the underscore being added, which was not explicitly included in the
regexp, so I wondered if that was accidentally included in the past (as it's
easy to forget it's part of \w). I also was curious if this ID was meant to
match other ID formats (e.g., used in Moby), some of which are required to start
with an alpha-numeric character, and ._-+ only allowed as separator.

So, I went on a short trip down history;

So, from the above;

Some what unfortunately, the runtime-spec does NOT define a format (or restrictions) in any way, which means that currently runc is more restrictive than the OCI runtime spec (overall this is fine). I found multiple attempts to have this included;

To summarize my thoughts;

  • It's probably good to document this function, and to outline the accepted characters (possibly outline (some of) the motivations for the chosen set of characters.
  • I (personally) would be in favor of formalizing accepted characters, and would've been in favor of adopting the POSIX Portable Filename Character Set, but the + addition was a bit unfortunate for that.
  • Perhaps it's worth giving it another attempt to update the OCI spec (at least with a recommendation if that's acceptable)
proposed PATCH for formalizing the format from 2015
From 2ec34c62b019431c0edebead50b56ebafe70a67c Mon Sep 17 00:00:00 2001
From: "W. Trevor King" <wking@tremily.us>
Date: Mon, 10 Aug 2015 12:07:20 -0700
Subject: [PATCH] runtime: Document container ID charset and uniqueness domain

Allow the runtime to use it's own scheme, but let the caller use UUIDs
if they want.  Jonathan asked for clarification as part of #87, but
didn't suggest a particular approach [1].  When we discussed it in the
2015-08-26 meeting [2], the consensus was to just allow everything.
With container IDs like 'a/b/c' leading to state entries like
'/var/oci/containers/a/b/c/state.json'.  But that could get ugly with
container IDs that contain '../' etc.  And perhaps there are some
filesystems out there that cannot represent non-ASCII characters
(actually, I'm not even sure what charset our JSON is in ;).  I'd
rather pick this minimal charset which can handle UUIDs, and make life
easy for runtime implementers and safe for bundle consumers at a
slight cost of flexibility for bundle-authors.

There was some confusion on the list about what "ASCII letters" meant
[3], so I've explicitly listed the allowed character ranges.  Here's a
Python 3 script that shows the associated Unicode logic:

  import unicodedata

  # http://www.unicode.org/reports/tr44/tr44-4.html#GC_Values_Table
  category = {
    'Ll': 'lowercase letter',
    'Lu': 'uppercase letter',
    'Nd': 'decimal number',
    'Pd': 'dash punctuation',
  }

  for i in range(1<<7):
      char = chr(i)
      abbr = unicodedata.category(char)
      if abbr[0] in ['L', 'N'] or abbr == 'Pd':
          cat = category[abbr]
          print('{:02x} {} {}'.format(i, char, cat))

[1]: https://github.com/opencontainers/specs/pull/87#discussion_r35828046
[2]: https://github.com/opencontainers/specs/wiki/Meeting-Minutes-2015-08-26
[3]: https://groups.google.com/a/opencontainers.org/d/msg/dev/P9gZBYhiqDE/-ptpOcQ5FwAJ
     Message-Id: <7ec9cff6-c1a6-4beb-82de-16eb412bf2f8@opencontainers.org>

Reported-by: Jonathan Boulle <jonathanboulle@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
---
 runtime.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/runtime.md b/runtime.md
index be2045b58..c9da6af9a 100644
--- a/runtime.md
+++ b/runtime.md
@@ -11,6 +11,9 @@ By providing a default location that container state is stored external applicat

 * **version** (string) Version of the OCI specification used when creating the container.
 * **id** (string) ID is the container's ID.
+  Only ASCII letters, numbers, and hyphens are valid (a–z, A–Z, 0–9, and ‘-’).
+  This value must be unique for a given host, but need not be universally unique.
+  Runtimes must allow the caller to set this ID, so that callers may choose, for example, to use [UUIDs][uuid] for universal uniqueness.
 * **pid** (int) Pid is the ID of the main process within the container.
 * **root** (string) Root is the path to the container's bundle directory.

@@ -94,3 +97,5 @@ If a hook returns a non-zero exit code, then an error is logged and the remainin

 `path` is required for a hook.
 `args` and `env` are optional.
+
+[uuid]: https://tools.ietf.org/html/rfc4122

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most interestingly, as pointed out by @mrunalp below, this commit also implicitly adds comma (,) to the list of allowed characters, since +-\. is now a range from + (ASCII 43) to . (ASCII 45) and that includes ASCII 44 (,).

Now we need to decide whether to keep supporting , or not. I vote to drop , and deal with the consequences.

if !idRegex.MatchString(id) || string(os.PathSeparator)+id != utils.CleanPath(string(os.PathSeparator)+id) {
if len(id) < 1 {
return ErrInvalidID
}

// Allowed characters: 0-9 A-Z a-z _ + - .
for i := 0; i < len(id); i++ {
c := id[i]
switch {
case c >= 'a' && c <= 'z':
case c >= 'A' && c <= 'Z':
case c >= '0' && c <= '9':
case c == '_':
case c == '+':
case c == '-':
case c == '.':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current regex also includes the comma :/
https://go.dev/play/p/n9ZiwZM8KMd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we missed \ before - making it a range from + to . instead here - https://github.com/opencontainers/runc/pull/675/files

It should have been ^[\w+\-\.]+$

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow.

I guess that the best way is to document that we're no longer allowing , in container name.

The alternative is to add , to the above switch.

default:
return ErrInvalidID
}

}

if string(os.PathSeparator)+id != utils.CleanPath(string(os.PathSeparator)+id) {
return ErrInvalidID
}

Expand Down