Performance regression for image pull with concurrent container creation #274

Open
1 task
shuochen0311 opened this issue Oct 9, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@shuochen0311

What happened in your environment?

We run multiple containers on the same node with overlaybd as the container snapshotter, so all rootfs contents are pulled lazily. In production, we found a large gap between P95 and P50 latency (20s vs 10s). After checking the logs, we noticed an interesting coincidence.

For image pulls with unexpectedly high latency:

Oct 02 06:04:28 [Event] Start to pull image for container executor: image harbor-xxxxx
Oct 02 06:04:37  [Event] Finish pulling image for container executor: image harbor-xxxx

Container creation events are happening inside containerd at the same time:

Oct 02 06:04:29 ip-10-1-162-245 containerd[387]: time="2023-10-02T06:04:29.671653617Z" level=info msg="CreateContainer within sandbox \"e0d9308c3259dc01251575ad5c27d2efdbdaf00b7c267f06a7ab15ed6d827e23\""
Oct 02 06:04:29 ip-10-1-162-245 containerd[387]: time="2023-10-02T06:04:29.672341423Z" level=info msg="StartContainer for \"3a6b0dce5e9168993ccd0c3213929af87e4765304774773188e1830631e2ff39\""
Oct 02 06:04:29 ip-10-1-162-245 containerd[387]: time="2023-10-02T06:04:29.672417656Z" level=info msg="container start request for xxxx"
Oct 02 06:04:29 ip-10-1-162-245 containerd[387]: time="2023-10-02T06:04:29.837229175Z" level=info msg="StartContainer for \"3a6b0dce5e9168993ccd0c3213929af87e4765304774773188e1830631e2ff39\" returns successfully"

We suspect that the container creation events (which include constructing the container rootfs) interfere with image pulling and increase lazy-pull latency.
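One way to check this overlap across many pulls, rather than by eye, is to compare timestamps programmatically. The following is only a minimal sketch: the log lines are abbreviated copies of the ones quoted above, the helper names `parse_prefix` and `events_in_window` are hypothetical, and the year is an assumption taken from the containerd timestamps because the syslog-style prefix omits it.

```python
from datetime import datetime

# Assumption: the syslog-style prefix has no year, so we borrow 2023
# from the containerd `time="2023-10-02T..."` field.
YEAR = "2023"

def parse_prefix(line: str) -> datetime:
    """Parse the leading 15-character 'Oct 02 06:04:28' timestamp."""
    return datetime.strptime(f"{YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")

def events_in_window(start_line: str, end_line: str, other_lines: list[str]) -> list[str]:
    """Return the lines whose timestamp falls inside [start, end]."""
    start, end = parse_prefix(start_line), parse_prefix(end_line)
    return [line for line in other_lines if start <= parse_prefix(line) <= end]

# Abbreviated versions of the log lines from this issue.
pull_start = "Oct 02 06:04:28 [Event] Start to pull image for container executor"
pull_end = "Oct 02 06:04:37  [Event] Finish pulling image for container executor"
containerd_lines = [
    'Oct 02 06:04:29 ip-10-1-162-245 containerd[387]: msg="CreateContainer within sandbox"',
    'Oct 02 06:04:29 ip-10-1-162-245 containerd[387]: msg="StartContainer"',
]

overlapping = events_in_window(pull_start, pull_end, containerd_lines)
pull_seconds = (parse_prefix(pull_end) - parse_prefix(pull_start)).total_seconds()
print(f"pull took {pull_seconds:.0f}s, {len(overlapping)} containerd events overlapped")
```

Running this over all slow and fast pulls would show whether overlap with container creation actually correlates with the slow ones, or whether the coincidence above is just one sample.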

We would appreciate any insight from upstream into the likely cause of this performance regression.

What did you expect to happen?

No response

How can we reproduce it?

Use overlaybd as the snapshotter, and overlap container creation with an image download.

What is the version of your Overlaybd?

0.6.17

What is your OS environment?

Ubuntu 20.04

Are you willing to submit PRs to fix it?

  • Yes, I am willing to fix it.
@shuochen0311 shuochen0311 added the bug Something isn't working label Oct 9, 2023
@liulanzheng
Member

@shuochen0311
What was the workload in the container created at 06:04:29? Did it load a large amount of data that affected image pulling?
Were there any other logs between 06:04:29 and 06:04:37?

@shuochen0311
Author

@liulanzheng thanks for responding. Let me see what else I can find in the logs from that period.

A question on my side: if container creation/start requires pulling a lot of data, will it affect the performance of rpull (metadata pulling), which is on the critical path before the container starts?

@lihuiba

lihuiba commented Oct 10, 2023

if the container creation/start requires a lot of data

It depends on the application itself. If it is busybox, it requires little data.

@shuochen0311
Author

@lihuiba how do I know whether my container is downloading a lot of data? Also, I think the real question is: is it expected that data downloading affects rpull performance?

@lihuiba

lihuiba commented Oct 11, 2023

@shuochen0311 iostat can show you how much data has been read from a block device. It can also show you the real-time I/O speed.
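For anyone who wants the same numbers from a script, the read counters iostat reports come from `/proc/diskstats` on Linux, where the sixth field of each line is the count of 512-byte sectors read. The sketch below is not part of overlaybd: the device name `sda`, the helper names, and the sample counter values are made up for illustration, and you would substitute whichever block device backs the container rootfs on your node.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts in 512-byte sectors

def bytes_read(diskstats_text: str, device: str) -> int:
    """Return total bytes read for `device` from /proc/diskstats content."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Fields: major, minor, name, reads completed, reads merged, sectors read, ...
        if len(fields) >= 6 and fields[2] == device:
            return int(fields[5]) * SECTOR_BYTES
    raise KeyError(f"device {device!r} not found")

def read_rate(before: str, after: str, device: str, interval_s: float) -> float:
    """Average read throughput in bytes/second between two snapshots."""
    return (bytes_read(after, device) - bytes_read(before, device)) / interval_s

# Illustrative snapshots taken one second apart (values are invented).
sample_before = "   8       0 sda 1000 10 204800 500 0 0 0 0 0 500 500\n"
sample_after = "   8       0 sda 1500 10 409600 900 0 0 0 0 0 900 900\n"

rate = read_rate(sample_before, sample_after, "sda", 1.0)
print(f"{rate / (1024 * 1024):.1f} MiB/s read")

# On a live node, take the snapshots from the real file instead, e.g.:
#   before = open("/proc/diskstats").read(); time.sleep(1); after = open("/proc/diskstats").read()
```

Sampling this around the 06:04:29 container start would show whether that container triggered a burst of reads while the other image was still being pulled.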

3 participants