[WIP]filesystem benchmark #101

sequix · 2020-05-19T14:46:38Z

I am trying to write a filesystem test suite for this project. Basically, it uses fio to generate fake I/O operations, and stargz-snapshotter then sends range requests to a local registry through eth0 with limited bandwidth. Meanwhile, metrics in /proc are scraped and processed afterward using Prometheus and gnuplot to generate a plot of the stargz-snapshotter process, like this:

fio also records bandwidth, IOPS, and latency. These are plotted with gnuplot too:

The two pictures above were made from a fio test within a stargz image, which started 4 threads reading the same file, up to 512MiB.

ktock

Great! Thanks for this.

Can we measure it against Docker Hub? However, the main concern is that we will end up making many HTTP requests to the registry...
And maybe we can include comparison with other filesystems.

cc: @AkihiroSuda

script/fs-bench/image/run.sh

cmd/containerd-stargz-grpc/main.go

script/fs-bench/fio/config/randread-4.conf

script/fs-bench/fs-bench/src/hello.py

script/fs-bench/image/Dockerfile

sequix · 2020-05-22T01:50:03Z

> Great! Thanks for this.
>
> Can we measure it against Docker Hub? However, the main concern is that we will end up making many HTTP requests to the registry...
> And maybe we can include comparison with other filesystems.
>
> cc: @AkihiroSuda

Yes, Docker Hub is a much more general case. I'll switch the benchmark to Docker Hub.

sequix · 2020-05-25T08:27:23Z

Based on this test, I found something interesting. My test environment:

Kernel: 3.10.0-1062.18.1.el7.x86_64
Cores: 2
Mem: 8GiB
Hard disk bandwidth: 20MiB/s
Network bandwidth: 10MiB/s
Container system: debian 10 (buster)
Host system: centos 7
OCI image: docker.io/sequix/fio:legacy_256m_4t
stargz image: docker.io/sequix/fio:stargz_256m_4t
estargz image: docker.io/sequix/fio:estargz_256m_4t

I use fio to generate fake random read requests (pread(), to be precise). fio launches 4 threads, and each one repeatedly preads a 4K block until it has consumed 256MiB (1024MiB for all 4 threads).

For contrast, let's start with the OCI image:

It took 50s to finish the test: 1024MiB / 50s = 20.48 MiB/s, which sounds reasonable.

Now, the stargz image:

850s to finish, 1024MiB / 850s = 1233 KiB/s.

Well, stargz has to send requests to Docker Hub and decompress gzip, so maybe estargz will do better, since its memory cache is prepared before the actual preads. But:

It took even longer, 1024 MiB / 900s = 1165 KiB/s.

Also, stargz-snapshotter used up a whole core to handle pread requests in both the stargz and estargz scenarios (I only paste the estargz process metrics here, because the stargz ones are very similar).

As you can see above, the memory cache is ready at around 120s, but the test still took a long time to finish. Maybe my test images were built incorrectly. Or is the cache to blame?
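The throughput figures above are simply total bytes read divided by elapsed time. As a minimal sketch of that arithmetic (the function name `throughputKiBps` is hypothetical and not part of the benchmark; the elapsed times are the ones reported in this comment):

```go
package main

import "fmt"

// throughputKiBps returns the average throughput in KiB/s when
// totalMiB mebibytes are read in elapsedSec seconds.
func throughputKiBps(totalMiB, elapsedSec float64) float64 {
	return totalMiB * 1024 / elapsedSec
}

func main() {
	// Elapsed times as reported for each image type; 1024MiB is the
	// total read by all 4 fio threads.
	runs := []struct {
		name string
		sec  float64
	}{
		{"OCI", 50},
		{"stargz", 850},
		{"estargz", 900},
	}
	for _, r := range runs {
		kib := throughputKiBps(1024, r.sec)
		fmt.Printf("%-8s %9.1f KiB/s (%.2f MiB/s)\n", r.name, kib, kib/1024)
	}
}
```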

ktock

Interesting. We need further investigation to find the bottleneck, but we definitely must improve the performance. I'll take a deeper look at it this week.

script/fs-bench/work/Dockerfile

script/fs-bench/work/reset.sh

script/fs-bench/test.sh

ktock · 2020-05-26T05:23:05Z

script/fs-bench/work/run.sh

+IMAGE_LEGACY="${IMAGE_LEGACY:-docker.io/sequix/fio:legacy_256m_4t}"
+IMAGE_STARGZ="${IMAGE_STARGZ:-docker.io/sequix/fio:stargz_256m_4t}"
+IMAGE_ESTARGZ="${IMAGE_ESTARGZ:-docker.io/sequix/fio:estargz_256m_4t}"


We should use the docker.io/stargz organization here. I'll push these images to docker.io/stargz later.

script/fs-bench/fio/Dockerfile

script/fs-bench/tools/go.mod

script/fs-bench/work/Dockerfile

sequix · 2020-05-28T05:36:05Z

Rebased and signed off.

ktock · 2020-06-01T09:23:33Z

@sequix After a deeper investigation last week, it turned out that the bad read performance (#101 (comment)) on the filesystem didn't come from your benchmark method but from some bugs in the filesystem.
I fixed them in #105. Can you measure it again after that PR gets merged?

Thanks a lot for your testing!

Also, can you add Apache 2.0 license headers to the following files? They are needed to pass the CI tests. Please refer to other existing files.

- script/fs-bench/fio/Dockerfile
- script/fs-bench/work/tools/plot/fio.sh
- script/fs-bench/work/tools/process/main.go
- script/fs-bench/work/tools/scrape/main.go

I uploaded the benchmarking images to https://hub.docker.com/r/stargz/fio

script/fs-bench/work/tools/scrape/main.go

script/fs-bench/work/tools/process/main.go

sequix · 2020-06-02T01:19:59Z

How can I check the golint error log? GitHub Actions did not provide much information to help me pass CI.

ktock · 2020-06-02T04:41:31Z

> How can I check the golint error log? GitHub Actions did not provide much information to help me pass CI.

Golint output is supposed to be logged to GitHub Actions. For header checks, however, we currently log just a list of the files that don't have valid headers, so we might need more verbose or friendlier logging for this (though for now the list is enough, as long as we know it means "these files have no valid headers").
We are using github.com/kunalkushwaha/ltag, so https://github.com/kunalkushwaha/ltag/tree/master/template should help you find the valid header templates.

sequix · 2020-06-02T05:15:47Z

Seems the CRIValidation failed in #105 too...

script/fs-bench/work/tools/scrape/main.go

ktock · 2020-06-03T11:08:59Z

The recent test flakiness seems to be caused by recent updates to one of the images (nginx) used in the CRI validation test. I'm working on fixing this (please see also kubernetes-sigs/cri-tools#618), and sorry for blocking this PR.

ktock · 2020-06-04T08:25:35Z

Fixed the CI flakiness (#106) and finished the read performance improvement (#105). Can you rebase?

sequix · 2020-06-04T11:25:05Z

rebased

To make a contrast, this benchmark tests all three types of image (OCI, stargz, and estargz) in the following steps:

1. Use fio to generate fake pread() requests in parallel (4 threads).
2. Scrape metrics in /proc/<pid of stargz-snapshotter>.
3. Calculate scraped metrics with PromQL.
4. Draw fio bandwidth-latency and process-metrics images with gnuplot.

Signed-off-by: Chuang Zhang <scquix@gmail.com>
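The scraping step reads per-process counters from /proc. As a rough illustration only, and not the code in script/fs-bench/work/tools/scrape/main.go, the sketch below extracts the user and system CPU ticks (fields 14 and 15 of /proc/<pid>/stat, as documented in proc(5)). The function name `parseCPUTicks` is hypothetical, and which metrics the real scraper collects is not stated in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCPUTicks extracts utime and stime (in clock ticks) from the
// contents of a /proc/<pid>/stat file.
func parseCPUTicks(stat string) (utime, stime uint64, err error) {
	// The comm field is wrapped in parentheses and may itself contain
	// spaces, so only split what follows the last ')'.
	end := strings.LastIndex(stat, ")")
	if end < 0 {
		return 0, 0, fmt.Errorf("malformed stat line: no ')' found")
	}
	fields := strings.Fields(stat[end+1:])
	// fields[0] is field 3 (state), so fields 14 and 15 (utime, stime)
	// are at indexes 11 and 12.
	if len(fields) < 13 {
		return 0, 0, fmt.Errorf("malformed stat line: only %d fields after comm", len(fields))
	}
	if utime, err = strconv.ParseUint(fields[11], 10, 64); err != nil {
		return 0, 0, err
	}
	if stime, err = strconv.ParseUint(fields[12], 10, 64); err != nil {
		return 0, 0, err
	}
	return utime, stime, nil
}

func main() {
	// Reads this process's own stat file for demonstration (Linux only);
	// a scraper would use the PID of stargz-snapshotter instead.
	data, err := os.ReadFile("/proc/self/stat")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(1)
	}
	utime, stime, err := parseCPUTicks(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse failed:", err)
		os.Exit(1)
	}
	fmt.Printf("utime=%d ticks, stime=%d ticks\n", utime, stime)
}
```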

ktock

Thanks for the fix to the scraping code. I smoke tested it and found some points to fix.
I'll take a deeper look at the code related to processing data this week. I'm also not sure whether we need the rate() function for our use case. process/main.go will be much simpler if we don't use PromQL.
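To illustrate what dropping PromQL could look like, here is a minimal sketch of computing a per-second rate directly from consecutive samples. It assumes each sample is a (millisecond timestamp, cumulative counter value) pair; the `sample` type and `perSecondRates` function are hypothetical and not taken from process/main.go. Unlike PromQL's rate(), this does not handle counter resets or extrapolate to window boundaries.

```go
package main

import "fmt"

// sample is one scraped data point: a millisecond timestamp and the
// cumulative counter value observed at that time.
type sample struct {
	tsMillis int64
	value    float64
}

// perSecondRates returns the per-second increase between each pair of
// consecutive samples. Pairs with a non-positive time delta are skipped.
func perSecondRates(samples []sample) []float64 {
	rates := make([]float64, 0, len(samples))
	for i := 1; i < len(samples); i++ {
		dt := float64(samples[i].tsMillis-samples[i-1].tsMillis) / 1000
		if dt <= 0 {
			continue
		}
		rates = append(rates, (samples[i].value-samples[i-1].value)/dt)
	}
	return rates
}

func main() {
	// Synthetic values for demonstration only, not measured data.
	samples := []sample{{0, 0}, {1000, 100}, {3000, 500}}
	fmt.Println(perSecondRates(samples))
}
```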

ktock · 2020-06-25T12:37:21Z

script/fs-bench/work/Dockerfile

+RUN git clone https://github.com/opencontainers/runc \
+          $GOPATH/src/github.com/opencontainers/runc && \
+    cd $GOPATH/src/github.com/opencontainers/runc && \
+    git checkout d736ef14f0288d6993a1845745d6756cfc9ddd5a && \
+    GO111MODULE=off make -j2 BUILDTAGS='seccomp apparmor' && \
+    GO111MODULE=off make install && \
+    git clone https://github.com/containerd/containerd \
+          $GOPATH/src/github.com/containerd/containerd && \
+    cd $GOPATH/src/github.com/containerd/containerd && \
+    git checkout 990076b731ec9446437972b41176a6b0f3b7bcbf && \
+    GO111MODULE=off make -j2 && GO111MODULE=off make install


Recently we introduced a common base image shared among the test code, for easier version management of containerd and runc in this project. Can we use that base image here? We can use the snapshotter-base build target, which includes runc, containerd, and containerd-stargz-grpc (built from the repo), so we won't need to build the snapshotter binary inside the testing container at runtime.
Please also see the script in the integration test.

ktock · 2020-06-25T12:38:12Z

script/fs-bench/work/tools/plot/fio.sh

+    printf "\n"
+  else
+    printf "unset key\n"
+    printf 'plot "%s" w l lw 2\n' "$LOGS_BW"


$LOGS_BW => $LOGS?

ktock · 2020-06-25T12:39:11Z

script/fs-bench/work/run.sh

+# Set these environment variables if you want to use a customized fio image,
+# whose entrypoint must start a fio test and output all its logs
+# (stdio (in file `stdio`), bw_log, iops_log, and lat_log) to /output.
+# 256m_4t stands for 4 threads, each reading 256MiB (1024MiB in total).
+IMAGE_LEGACY="${IMAGE_LEGACY:-docker.io/stargz/fio:legacy_256m_4t}"
+IMAGE_STARGZ="${IMAGE_STARGZ:-docker.io/stargz/fio:stargz_256m_4t}"
+IMAGE_ESTARGZ="${IMAGE_ESTARGZ:-docker.io/stargz/fio:estargz_256m_4t}"


Could you change the naming convention of the fio images to fio:256m-4t-{org,sgz,esgz} (256m-4t stands for 4 threads, each reading 256MiB)?
Please see also: https://hub.docker.com/r/stargz/fio/tags

ktock · 2020-06-25T12:43:11Z

script/fs-bench/work/tools/scrape/main.go

+		return
+	}
+
+	ts := " " + strconv.FormatInt(now.Unix()*1000+int64(now.Nanosecond())/1e6, 10) + " "


Why don't we do now.UnixNano() / int64(time.Millisecond)?
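A small sketch comparing the two expressions, for illustration. The helper names and the fixed `sample` time are hypothetical. The two forms agree for timestamps at or after the Unix epoch, which covers anything `time.Now()` returns in practice; they can differ for pre-epoch times because integer division truncates toward zero.

```go
package main

import (
	"fmt"
	"time"
)

// sample is an arbitrary fixed time used for demonstration.
var sample = time.Date(2020, time.June, 25, 12, 43, 11, 123456789, time.UTC)

// millisManual mirrors the expression in the diff above.
func millisManual(t time.Time) int64 {
	return t.Unix()*1000 + int64(t.Nanosecond())/1e6
}

// millisUnixNano is the shorter form suggested in this comment.
func millisUnixNano(t time.Time) int64 {
	return t.UnixNano() / int64(time.Millisecond)
}

func main() {
	fmt.Println(millisManual(sample), millisUnixNano(sample))

	now := time.Now()
	fmt.Println(millisManual(now) == millisUnixNano(now))
}
```

On Go 1.17 or later, `t.UnixMilli()` returns the same value without any manual arithmetic.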

ktock · 2020-06-25T12:44:53Z

script/fs-bench/work/reset.sh

+kill_all "containerd"
+kill_all "containerd-stargz-grpc"
+kill_all "scrape"
+if [ "$NOCLEANUP" == "-nocleanup" ]; then


Unbound variable when no option is passed. It should be ${NOCLEANUP:-}.

ktock · 2020-06-25T12:46:19Z

script/fs-bench/work/reset.sh

+else
+    cleanup
+fi
+if [ "${NOSNAPSHOTTER}" == "-nosnapshotter" ] ; then


Same as the above. Should be ${NOSNAPSHOTTER:-}.

ktock reviewed May 21, 2020

ktock reviewed May 26, 2020

AkihiroSuda reviewed May 26, 2020

script/fs-bench/work/Dockerfile (resolved)

sequix mentioned this pull request May 28, 2020: posix filesystem call test suite based on pjdfstest #104 (Draft)

sequix force-pushed the fs-bench branch from a1a180f to 6274576 on May 28, 2020 05:35

ktock mentioned this pull request Jun 1, 2020: Improve file read performance #105 (Merged)

sequix force-pushed the fs-bench branch from 6274576 to cafa599 on June 1, 2020 12:18

ktock reviewed Jun 1, 2020

script/fs-bench/work/tools/scrape/main.go (outdated, resolved)

script/fs-bench/work/tools/process/main.go (outdated, resolved)

sequix force-pushed the fs-bench branch from cafa599 to 2698416 on June 2, 2020 01:12

sequix force-pushed the fs-bench branch 4 times, most recently from cba73c9 to b775cd7 on June 2, 2020 03:32

ktock reviewed Jun 2, 2020

script/fs-bench/work/tools/scrape/main.go (outdated, resolved)

sequix force-pushed the fs-bench branch from b775cd7 to 81adbb3 on June 3, 2020 00:25

sequix force-pushed the fs-bench branch from 8fa67a6 to 18688fb on June 4, 2020 11:12

sequix force-pushed the fs-bench branch from 18688fb to 777a434 on June 23, 2020 01:21

ktock reviewed Jun 25, 2020

AkihiroSuda approved these changes Aug 26, 2021

AkihiroSuda marked this pull request as draft August 26, 2021 01:40

LittleEvilBoi approved these changes Jan 2, 2022
