add a cache lock at SHA granularity #2612

Open
wants to merge 5 commits into
base: main

@senayuki senayuki commented Jul 3, 2023

Fixes #2589

Description

Add a file lock that the warmer acquires before caching, to prevent concurrent downloads of the same image.
Resolve the EOF error caused by repeated writes to the same cache file, as discovered in #2589.
Add a mechanism to clear expired locks. This part may need to be discussed.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes unit tests
  • Adds integration tests if needed.

See the contribution guide for more details.

Reviewer Notes

  • The code flow looks good.
  • Unit tests and/or integration tests added.

Release Notes

- The warmer now takes a file lock to prevent concurrent downloads of the same image.

@aaron-prindle
Collaborator

Thanks for the PR here @senayuki! From the CI/CD tests it seems this is currently failing our hack/boilerplate.sh test:

RUN /home/runner/work/kaniko/kaniko/scripts/../hack/boilerplate.sh
Boilerplate missing in:
././pkg/cache/lock.go
././pkg/cache/lock_test.go
FAILED /home/runner/work/kaniko/kaniko/scripts/../hack/boilerplate.sh

To fix this you will need to add the below boilerplate to the newly created files:

  • pkg/cache/lock.go
  • pkg/cache/lock_test.go

Boilerplate to add at the beginning of those two files:

/*
Copyright 2023 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

@senayuki
Author

Sorry about that. I have added license headers.

return time.Now().After(expTime), nil
}

func ClearDeadlock(lockPath string) (cleared bool) {
Collaborator

nit: the named return value cleared is declared here but never used, IIUC. This should likely just be:

func ClearDeadlock(lockPath string) bool {

Author

Fixed.

}

func ClearDeadlock(lockPath string) (cleared bool) {
expired, _ := isDeadlock(lockPath) // Ingore error and try again.
Collaborator

nit: Ingore -> Ignore

Author

Fixed.

@@ -152,13 +153,27 @@ func (w *Warmer) Warm(image string, opts *config.WarmerOptions) (v1.Hash, error)
return v1.Hash{}, errors.Wrapf(err, "Failed to retrieve digest: %s", image)
}

TRY_LOCAL:
Collaborator

Using goto statements is generally discouraged in Go, as they can make code harder to follow. In this situation, it would be more idiomatic to use a for loop to retry the lock acquisition. Can you change this to use a retry loop instead? Additionally, it likely makes sense to cap the amount of time the process can wait to acquire the file lock, so that it cannot spin forever.

Author

I removed the goto and am now using a for loop.

I think it is fine not to set a maximum retry limit on the for loop here.
I designed the lock with renewal and a timeout, so there should be no deadlock. If the lock holder dies and can no longer renew the lock, other processes gain the right to clear it. The lock is also held for the entire download, which may take a very long time, so it is difficult to pick a suitable default timeout.

@iTaybb

iTaybb commented Oct 12, 2023

Hey, is there any update on this? We are also running into this issue.

Successfully merging this pull request may close these issues.

Missing lock in cache feature
3 participants