New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add bounded concurrency for tag lookup and untag #4329
Conversation
9e15cc0
to
f21b9a0
Compare
f21b9a0
to
19e5292
Compare
19e5292
to
7adf358
Compare
@corhere do you mind batting an eye on this? It looks a ok to me, but I need to give it another look. |
if ok { | ||
limit, ok := l.(int) | ||
if !ok { | ||
panic("tag lookup concurrency limit config key must have a integer value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the semantics of an explicit concurrency limit of 0
? -1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the initialization of the tagStore, a judgment is made, and if it is greater than 0, it will be set; otherwise, the concurrency limit is set to the default value.
limit := DefaultConcurrencyLimit
if repo.tagLookupConcurrencyLimit > 0 {
limit = repo.tagLookupConcurrencyLimit
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should document that in the docs FWIW
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should document that in the docs FWIW
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining all non-positive integers to mean GOMAXPROCS
does not leave any values to signify the other possible special case: unbounded concurrency / no limit. To be clear, I don't think an option for unbounded concurrency should be added as part of this PR. I simply want to reserve the negative numbers for future use by having it be an error to set concurrencylimit
to a negative value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @corhere, I've added a check that directly throws an error if it's set to a negative value. However, I'm not quite clear on how you suggest handling unbounded / no limit concurrency. Should we maintain the current logic? Or are you saying that even if the user sets it to unbounded, we should treat it with a certain upper limit, such as GOMAXPROCS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that we do not handle unbounded concurrency. At least not as part of this PR. What I am suggesting is the following:
concurrencylimit value |
Result |
---|---|
(unset) | Tag lookups are limited to GOMAXPROCS concurrency (the default) |
0 |
Tag lookups are limited to GOMAXPROCS concurrency (the default) |
Positive integer, e.g. 3 |
Tag lookups are limited to the value, e.g. 3 |
Negative integer, e.g. -1 |
Configuration error |
Making negative numbers an error today guarantees that no valid registry configuration can have a concurrencylimit
option set to a negative number. That gives us the freedom to redefine negative values to mean something valid in the future as a non-breaking change. One potential future definition of negative concurrencylimit
values is to configure unbounded concurrency. But we don't need that today.
Note that setting concurrencylimit: 0
should be allowed. It is very useful to have an "explicit default" value for distribution configuration as it allows the user to be explicit about their intent to use GOMAXPROCS
as the concurrency limit in their configuration, or to override e.g. concurrencylimit: 4
in the YAML config by setting the environment variable REGISTRY_TAG_CONCURRENCYLIMIT=0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your nice explanation, I have updated it, PTAL.
7adf358
to
b7f94f6
Compare
b7f94f6
to
a0424f0
Compare
a0424f0
to
600402b
Compare
4b3671c
to
7b6eae8
Compare
if ok { | ||
limit, ok := l.(int) | ||
if !ok { | ||
panic("tag lookup concurrency limit config key must have a integer value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining all non-positive integers to mean GOMAXPROCS
does not leave any values to signify the other possible special case: unbounded concurrency / no limit. To be clear, I don't think an option for unbounded concurrency should be added as part of this PR. I simply want to reserve the negative numbers for future use by having it be an error to set concurrencylimit
to a negative value.
fd7c7a6
to
dfd3f03
Compare
Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
Harbor is using the distribution for it's (harbor-registry) registry component. The harbor GC will call into the registry to delete the manifest, which in turn then does a lookup for all tags that reference the deleted manifest. To find the tag references, the registry will iterate every tag in the repository and read it's link file to check if it matches the deleted manifest (i.e. to see if uses the same sha256 digest). So, the more tags in repository, the worse the performance will be (as there will be more s3 API calls occurring for the tag directory lookups and tag file reads). Therefore, we can use concurrent lookup and untag to optimize performance as described in goharbor/harbor#12948. P.S. This optimization was originally contributed by @Antiarchitect, now I would like to take it over. Thanks @Antiarchitect's efforts with PR distribution#3890. Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
dfd3f03
to
a2afe23
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for a nice contribution @microyahoo
@microyahoo Congrats! You made it! |
Congrats @microyahoo! very excited to see this getting merged and thanks to @Antiarchitect for the initial work. |
add concurrency limits for tag lookup and untag
Harbor is using the distribution for it's (harbor-registry) registry component.
The harbor GC will call into the registry to delete the manifest, which in turn then does a lookup for all tags that reference the deleted manifest.
To find the tag references, the registry will iterate every tag in the repository and read it's link file to check if it matches the deleted manifest (i.e. to see if uses the same sha256 digest). So, the more tags in repository, the worse the performance will be (as there will be more s3 API calls occurring for the tag directory lookups and tag file reads).
Therefore, we can use concurrent lookup and untag to optimize performance as described in goharbor/harbor#12948.
P.S. This optimization was originally contributed by @Antiarchitect, now I would like to take it over.
Thanks @Antiarchitect's efforts with PR #3890.
Signed-off-by: Liang Zheng zhengliang0901@gmail.com