Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use regexp #3460

Merged
merged 2 commits into from Feb 6, 2023
Merged

Do not use regexp #3460

merged 2 commits into from Feb 6, 2023

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Apr 15, 2022

Don't get me wrong, I love regular expressions. But the regexp package is somewhat big and slow, and we can do just fine without.

Remove two last uses of regexp in runc code (modulo tests).

This saves about 200K (2.3%) from the resulting binary, without losing any of the functionality.

Currently a draft pending

@kolyshkin
Copy link
Contributor Author

Fedora CI failed with

[04] run rootless tests ... (idmap+cgroup)
tests/rootless.sh: line 44: /etc/subuid.tmp: Read-only file system
make: *** [Makefile:103: localrootlessintegration] Error 1
make: Leaving directory '/vagrant'

which is pretty weird. Perhaps it's the problem with the underlying fs and so the kernel remounted it read-only.

Restarted.

@AkihiroSuda
Copy link
Member

slow

Yes, but not the bottleneck of runc AFAICS.
I don’t see necessity to remove regexp.

@kolyshkin kolyshkin closed this Apr 21, 2022
@kolyshkin
Copy link
Contributor Author

kolyshkin commented May 3, 2022

A few of runc dependencies use regexp as well.

Once (and if) all of these are merged, I am going to bring this out of draft, as it will save us 200K (130K stripped) from the runc binary size (which is about 5%, comparing the stripped binaries).

@kolyshkin kolyshkin reopened this May 3, 2022
@kolyshkin kolyshkin marked this pull request as draft May 3, 2022 01:32
@kolyshkin
Copy link
Contributor Author

Currently saves 531K from the stripped binary (which is 5.6% of binary size). Still very far from crun footprint, yet it seems like a worthwhile saving.

@kolyshkin kolyshkin force-pushed the no-regexp branch 2 times, most recently from 66c008c to 1d053af Compare May 24, 2022 19:30
@kolyshkin
Copy link
Contributor Author

kolyshkin commented May 26, 2022

cilium/ebpf bump moved to #3491, will rebase once merged.

@kolyshkin
Copy link
Contributor Author

kolyshkin commented May 26, 2022

urfave/cli bump moved to #3484, will rebase once merged.

@kolyshkin
Copy link
Contributor Author

kolyshkin commented Jul 14, 2022

Rebased, made a fresh comparison:

[kir@kir-rhat runc]$ go version
go version go1.18.1 linux/amd64

[kir@kir-rhat runc]$ git describe HEAD
v1.1.0-239-g93ad6a85

[kir@kir-rhat runc]$ ls -l runc.main runc.no-regexp
-rwxrwxr-x. 1 kir kir 12999304 Jul 14 13:31 runc.main
-rwxrwxr-x. 1 kir kir 12708080 Jul 14 13:31 runc.no-regexp

[kir@kir-rhat runc]$ strip runc.main runc.no-regexp

[kir@kir-rhat runc]$ ls -l runc.main runc.no-regexp
-rwxrwxr-x. 1 kir kir 9310880 Jul 14 13:33 runc.main
-rwxrwxr-x. 1 kir kir 9100352 Jul 14 13:33 runc.no-regexp

[kir@kir-rhat runc]$ size runc.main runc.no-regexp
   text	   data	    bss	    dec	    hex	filename
5645422	3657744	 229832	9532998	 917646	runc.main
5503086	3587520	 229416	9320022	 8e3656	runc.no-regexp

So, the saving is about 200K for a stripped binary, which is 2.3% of its size.

go.mod Outdated
@@ -8,7 +8,6 @@ require (
github.com/containerd/console v1.0.3
github.com/coreos/go-systemd/v22 v22.3.2
github.com/cyphar/filepath-securejoin v0.2.3
github.com/docker/go-units v0.4.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed v0.5.0 so you could update to that instead if you want to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, replace the docker/units removal with the bump commit; PTAL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, I have re-applied the commit that replaces docker/units with the in-house implementation of RAMInBytes, and the difference between these two is negligible (meaning, I guess, that the linker is smart enough to remove the functions that are not used). Here are the raw numbers:

[kir@kir-rhat runc]$ size runc-no-regex runc-no-units
   text	   data	    bss	    dec	    hex	filename
5522150	3598080	 229384	9349614	 8ea9ee	runc-no-regex
5520182	3597232	 229352	9346766	 8e9ece	runc-no-units
[kir@kir-rhat runc]$ ls -l  runc-no-regex runc-no-units
-rwxr-xr-x. 1 kir kir 9129056 Aug 31 18:26 runc-no-regex
-rwxr-xr-x. 1 kir kir 9128992 Aug 31 18:32 runc-no-units

@kolyshkin kolyshkin force-pushed the no-regexp branch 2 times, most recently from 89a7058 to 402bfa8 Compare September 2, 2022 23:41
@kolyshkin kolyshkin added the kind/refactor refactoring label Sep 2, 2022
@kolyshkin kolyshkin added this to the 1.2.0 milestone Sep 2, 2022
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM, but left some comments to improve some of the comments (the "GoDoc" suggestions aren't new, so would also be fine as a follow-up)

@@ -261,7 +258,28 @@ func loadState(root string) (*State, error) {
}

func validateID(id string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps would be good to (despite it not being exported) add a GoDoc to describe accepted formats, and outline some of it.

Looking at this PR made me go down a bit of a trip in history; let me post below, but I should probably post elsewhere


I noticed the underscore being added, which was not explicitly included in the
regexp, so I wondered if that was accidentally included in the past (as it's
easy to forget it's part of \w). I also was curious if this ID was meant to
match other ID formats (e.g., used in Moby), some of which are required to start
with an alpha-numeric character, and ._-+ only allowed as separator.

So, I went on a short trip down history;

So, from the above;

Some what unfortunately, the runtime-spec does NOT define a format (or restrictions) in any way, which means that currently runc is more restrictive than the OCI runtime spec (overall this is fine). I found multiple attempts to have this included;

To summarize my thoughts;

  • It's probably good to document this function, and to outline the accepted characters (possibly outline (some of) the motivations for the chosen set of characters.
  • I (personally) would be in favor of formalizing accepted characters, and would've been in favor of adopting the POSIX Portable Filename Character Set, but the + addition was a bit unfortunate for that.
  • Perhaps it's worth giving it another attempt to update the OCI spec (at least with a recommendation if that's acceptable)
proposed PATCH for formalizing the format from 2015
From 2ec34c62b019431c0edebead50b56ebafe70a67c Mon Sep 17 00:00:00 2001
From: "W. Trevor King" <wking@tremily.us>
Date: Mon, 10 Aug 2015 12:07:20 -0700
Subject: [PATCH] runtime: Document container ID charset and uniqueness domain

Allow the runtime to use it's own scheme, but let the caller use UUIDs
if they want.  Jonathan asked for clarification as part of #87, but
didn't suggest a particular approach [1].  When we discussed it in the
2015-08-26 meeting [2], the consensus was to just allow everything.
With container IDs like 'a/b/c' leading to state entries like
'/var/oci/containers/a/b/c/state.json'.  But that could get ugly with
container IDs that contain '../' etc.  And perhaps there are some
filesystems out there that cannot represent non-ASCII characters
(actually, I'm not even sure what charset our JSON is in ;).  I'd
rather pick this minimal charset which can handle UUIDs, and make life
easy for runtime implementers and safe for bundle consumers at a
slight cost of flexibility for bundle-authors.

There was some confusion on the list about what "ASCII letters" meant
[3], so I've explicitly listed the allowed character ranges.  Here's a
Python 3 script that shows the associated Unicode logic:

  import unicodedata

  # http://www.unicode.org/reports/tr44/tr44-4.html#GC_Values_Table
  category = {
    'Ll': 'lowercase letter',
    'Lu': 'uppercase letter',
    'Nd': 'decimal number',
    'Pd': 'dash punctuation',
  }

  for i in range(1<<7):
      char = chr(i)
      abbr = unicodedata.category(char)
      if abbr[0] in ['L', 'N'] or abbr == 'Pd':
          cat = category[abbr]
          print('{:02x} {} {}'.format(i, char, cat))

[1]: https://github.com/opencontainers/specs/pull/87#discussion_r35828046
[2]: https://github.com/opencontainers/specs/wiki/Meeting-Minutes-2015-08-26
[3]: https://groups.google.com/a/opencontainers.org/d/msg/dev/P9gZBYhiqDE/-ptpOcQ5FwAJ
     Message-Id: <7ec9cff6-c1a6-4beb-82de-16eb412bf2f8@opencontainers.org>

Reported-by: Jonathan Boulle <jonathanboulle@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
---
 runtime.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/runtime.md b/runtime.md
index be2045b58..c9da6af9a 100644
--- a/runtime.md
+++ b/runtime.md
@@ -11,6 +11,9 @@ By providing a default location that container state is stored external applicat

 * **version** (string) Version of the OCI specification used when creating the container.
 * **id** (string) ID is the container's ID.
+  Only ASCII letters, numbers, and hyphens are valid (a–z, A–Z, 0–9, and ‘-’).
+  This value must be unique for a given host, but need not be universally unique.
+  Runtimes must allow the caller to set this ID, so that callers may choose, for example, to use [UUIDs][uuid] for universal uniqueness.
 * **pid** (int) Pid is the ID of the main process within the container.
 * **root** (string) Root is the path to the container's bundle directory.

@@ -94,3 +97,5 @@ If a hook returns a non-zero exit code, then an error is logged and the remainin

 `path` is required for a hook.
 `args` and `env` are optional.
+
+[uuid]: https://tools.ietf.org/html/rfc4122

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most interestingly, as pointed out by @mrunalp below, this commit also implicitly adds comma (,) to the list of allowed characters, since +-\. is now a range from + (ASCII 43) to . (ASCII 45) and that includes ASCII 44 (,).

Now we need to decide whether to keep supporting , or not. I vote to drop , and deal with the consequences.

libcontainer/cgroups/systemd/common.go Outdated Show resolved Hide resolved
libcontainer/cgroups/systemd/common.go Outdated Show resolved Hide resolved
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment about TrimLeft vs TrimPrefix

matches := re.FindStringSubmatch(verStr)
if len(matches) < 2 {
return 0, fmt.Errorf("can't parse version %s: incorrect number of matches %v", verStr, matches)
str = strings.TrimLeft(str, `"v`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH! Actually, this is not correct; this should be strings.TrimPrefix. strings.TrimLeft takes a "cut set" of characters, so currently "vvvvvv""v"v"v""v245.4-1.fc32" would be accepted;

https://go.dev/play/p/-Z4D9U4-oYM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct since it allows us to Trim either v or "v (i.e. with or without the quote).

Sure, it will also cut v" or vvvv"""vvvv but we do not expect such input here (the input expectations are documented).

Within the realm of expected input,

strings.TrimLeft(str, `"v`)

work the same as

strings.TrimPrefix(str, `"`)
strings.TrimPrefix(str, `v`)

but is somewhat more compact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thaJeztah PTAL ^^

@kolyshkin
Copy link
Contributor Author

@thaJeztah addressed your comments, PTAL

case c == '_':
case c == '+':
case c == '-':
case c == '.':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current regex also includes the comma :/
https://go.dev/play/p/n9ZiwZM8KMd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we missed \ before - making it a range from + to . instead here - https://github.com/opencontainers/runc/pull/675/files

It should have been ^[\w+\-\.]+$

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow.

I guess that the best way is to document that we're no longer allowing , in container name.

The alternative is to add , to the above switch.

Replace a regex with a simple for loop. Document the function.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Rewrite systemdVersionAtoi to not use regexp, and fix two issues:

1. It was returning 0 (rather than -1) for some errors.

2. The comment was saying that the input string is without quotes,
   while in fact it is.

Note the new function, similar to the old one, works on input either
with or without quotes. Amend the test to add test cases without quotes.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

@thaJeztah @mrunalp I have documented the function, PTAL

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

w.r.t.;

I guess that the best way is to document that we're no longer allowing , in container name.

The alternative is to add , to the above switch.

I think it should be ok to not allow ,, as (from the discussion) it looks like that was a bug, but perhaps good to make sure it's called out in the changelog / release-notes

@thaJeztah thaJeztah merged commit df47453 into opencontainers:main Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/refactor refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants