hg: Add a cache for mercurial repositories. #372
base: develop
@cdevienne not sure how much effort it will require, but if possible can you please add tests for this?
@kumaritanushree I will write some tests. Do you need more details on the CacheID logic I implemented?
Added some questions and left some comments.
Do you mind fixing the rebase so that we do not have all the other changes in this PR?
Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
The idea is to save the whole untouched clone (with no checkout) in the cache. If already present, the pull is done directly in the cache, and is faster (except on very small repos) because only new changesets are transferred. If the ref is a changeset id (not a tag, branch, topic or bookmark), and the changeset is already known in the cached clone, no pull is done, which avoids any network exchange. Then we copy the cached entry and do the checkout. Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
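To make the "skip the pull" rule concrete, here is a minimal sketch in Go. It is not the code from this PR: `looksLikeChangesetID`, `needsPull` and the `known` callback are hypothetical names, and treating only full 40-character hexadecimal node IDs as changeset ids is a simplifying assumption.

```go
package main

import "fmt"

// looksLikeChangesetID reports whether ref is a full 40-character
// lowercase hexadecimal node ID. Assumption: short hash prefixes are
// treated like any other ref and therefore always trigger a pull.
func looksLikeChangesetID(ref string) bool {
	if len(ref) != 40 {
		return false
	}
	for _, c := range ref {
		isHex := (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')
		if !isHex {
			return false
		}
	}
	return true
}

// needsPull decides whether the cached clone must be refreshed.
// known is a hypothetical callback that asks the cached clone whether
// it already contains the given changeset.
func needsPull(ref string, known func(string) bool) bool {
	if !looksLikeChangesetID(ref) {
		// Tags, branches, topics and bookmarks can move: always pull.
		return true
	}
	return !known(ref)
}

func main() {
	id := "0123456789abcdef0123456789abcdef01234567"
	inCache := func(string) bool { return true }
	fmt.Println("pull for known changeset id:", needsPull(id, inCache))
	fmt.Println("pull for branch name:", needsPull("default", inCache))
}
```

Passing the lookup as a callback keeps the decision testable without a real repository.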
The repo URL must be in the cache id. The ref is purposely not included in it because we want to reuse the cached repository when the ref moves. And finally, we use a sha256 hash to mask any authentication data because we don't want them to be readable in the cache folder name. Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
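As an illustration of this cache id scheme, a minimal sketch in Go could look like the following. The function name `cacheID`, the `hg-` prefix and the example URL are assumptions made for the example, not the identifiers used in this PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheID is a hypothetical helper. It derives the cache folder name
// from the repository URL only: the ref is left out on purpose so the
// same cached clone is reused when the ref moves, and the URL is hashed
// so that any credentials embedded in it are not readable on disk.
func cacheID(repoURL string) string {
	sum := sha256.Sum256([]byte(repoURL))
	return "hg-" + hex.EncodeToString(sum[:])
}

func main() {
	// The URL below is a made-up example with embedded credentials.
	fmt.Println(cacheID("https://user:secret@hg.example.com/repo"))
}
```

The same URL always maps to the same folder name, while two different URLs (including the same repository with different credentials) map to different entries.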
Use only the repository URL as a cacheID Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
Apologies for the delay with the review. As I haven't used mercurial for more than a decade, it's hard to think through the hg-related logic. However, I still have a couple of notes.

@cdevienne I'm leaving out some unimportant things, but this one I would like to address before this gets merged: can we not export the new functions if they are not used in other packages? Otherwise, they'll become part of the public API and it'll be harder to change them in the future.

@joaopapereira Regarding the immutability and cache size questions. I think that With references mutable by design, we can't decide with high confidence whether it is safe to cache or not. Users, on the other hand, know more about the artifacts they're dealing with and can explicitly allow caching of some of them with mutable references. It is totally fine as long as it is opt-in. The cache size restriction in the current implementation (if I read it correctly) is not ideal, as it silently refuses to save new data, which is probably important, into the cache instead of evicting less important data (e.g. the oldest or least used entries). If I were to use I would merge this PR (after unexporting the new functions) and plan improvements to the caching layer, namely to implement cleanup logic that removes less important entries. I'd also allow users to explicitly request unlimited cache size, instead of the implicit
The 'hg' type is strictly an internal tool of the 'hg' fetcher. The only meaningful public API of this package is the 'Sync' type. Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
I un-exported the "Hg" type completely, as it has no reasonable use outside this package.
@joaopapereira should I be able to approve workflow runs?
@Zebradil Are you not able to approve? Let me check the permissions in that case. For now I am approving.
@praveenrewar No, I didn't have the button.
@cdevienne thank you for the adjustments. Now we can see that the
@Zebradil Please check now. I am still figuring out some of the permissions stuff, so please excuse me if it still doesn't work 😅
I can't test if it works now, as there are no pipelines to approve :)
My bad, I forgot to run golangci-lint. I cleaned up the code.
@cdevienne sorry, I didn't notice earlier that with the new functions some existing structs (
@praveenrewar I don't see the button to approve workflow runs.
It is a breaking change indeed, but I would be surprised if it broke anything. I don't see a point in keeping these public if removing them doesn't break any code. If it does, I will happily revert the change.
To clarify my point: I think the internals of the fetchers are not part of the contract vendir has with its users, so they should not have been public in the first place.
I doubt that this code is used outside of vendir. However, it is located under
Yes, but how do we know if it doesn't break someone else's code? @joaopapereira could you give a suggestion here? Removing symbols from the public API is a breaking change according to the semantic versioning spec and we should implement it carefully with adequate communication. But vendir's major version is still
As a generic rule of thumb in Carvel we tend to:
I like the idea of exposing only the sync, but we should standardize what this sync interface should contain before we do that. That being said, I would recommend we keep the exported structs exported. In the future, if we want to invest time in an interface that supports all the fetching types, we can create it and make private everything we want to be private. In terms of API consumption, for the majority of the tools we expect people to use the Cobra commands interface (not the best experience) until we are in a position where we have a supported API for each tool, like we currently have in imgpkg.
Only folks with write access can see that button, and write access comes with the Approver Role.
hg: Add a cache for mercurial repositories.
The idea is to save the whole untouched clone (with no checkout) in the cache.
If already present, the fetch (actually a pull for hg) is done directly in the cache, and is
faster (except on very small repos) because only new changesets are
transferred.
If the ref is a changeset id (not a tag, branch, topic or bookmark), and
the changeset is already known in the cached clone, no pull is done,
which avoids any network exchange.
Then we copy the cached entry and do the checkout.
The same feature for git will be proposed very soon, so any feedback on this one is welcome if it helps doing things better for git!
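The overall flow described above can be sketched as a plan of steps. This is only an illustration under stated assumptions: the `step` type, `planSync` and all paths are hypothetical, and the sketch only builds the command lines (`hg clone --noupdate`, `hg -R <dir> pull`, `hg -R <dir> update --rev <ref>`) rather than running them.

```go
package main

import "fmt"

// step is one action of the sync plan (hypothetical type).
type step struct {
	desc string
	argv []string // command line to run; nil for the plain directory copy
}

// planSync lists the actions for one fetch. pullNeeded would come from a
// check such as "is the ref a changeset id already present in the cache".
func planSync(cacheExists, pullNeeded bool, repoURL, cacheDir, dstDir, ref string) []step {
	var plan []step
	switch {
	case !cacheExists:
		// First fetch: keep an untouched clone, with no checkout, in the cache.
		plan = append(plan, step{
			desc: "clone into cache without checkout",
			argv: []string{"hg", "clone", "--noupdate", repoURL, cacheDir},
		})
	case pullNeeded:
		// Cache hit: only new changesets are transferred.
		plan = append(plan, step{
			desc: "pull new changesets into cache",
			argv: []string{"hg", "-R", cacheDir, "pull"},
		})
	}
	plan = append(plan,
		step{desc: "copy " + cacheDir + " to " + dstDir},
		step{
			desc: "check out the requested ref in the copy",
			argv: []string{"hg", "-R", dstDir, "update", "--rev", ref},
		},
	)
	return plan
}

func main() {
	plan := planSync(true, false, "https://hg.example.com/repo", "/cache/entry", "/work/repo", "default")
	for _, s := range plan {
		fmt.Println(s.desc, s.argv)
	}
}
```

When the cache entry exists and no pull is needed, the plan contains no network step at all: only the copy and the checkout remain.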