Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hg: Add a cache for mercurial repositories. #372

Open
wants to merge 11 commits into
base: develop
Choose a base branch
from

Conversation

cdevienne
Copy link

@cdevienne cdevienne commented Apr 2, 2024

hg: Add a cache for mercurial repositories.

The idea is to save the whole untouched clone (with no checkout) in the cache.

If already present, the fetch (a pull actually for hg) is done directly in the cache, and is
faster (except on very small repos) because only new changesets are
transfered.

If the ref is a changeset id (not a tag, branch, topic or bookmark), and
the changeset is already known in the cached clone, no pull is done
which avoid any network exchange.

Then we copy the cached entry and do the checkout.

The same feature for git will be proposed very soon, so any feedback on this one is welcome if it helps doing things better for git!

@cdevienne cdevienne force-pushed the hg-cache branch 2 times, most recently from ad46667 to 1f7ac92 Compare April 2, 2024 19:18
@cdevienne cdevienne changed the title Hg cache hg: Add a cache for mercurial repositories. Apr 2, 2024
pkg/vendir/fetch/hg/hg.go Outdated Show resolved Hide resolved
pkg/vendir/fetch/hg/hg.go Outdated Show resolved Hide resolved
@kumaritanushree
Copy link
Contributor

@cdevienne not sure how much effort will require but if possible can you please add test for this?

@cdevienne
Copy link
Author

@kumaritanushree I will write some tests. Do you need more precisions on the CacheID logic I implemented?

@cdevienne cdevienne force-pushed the hg-cache branch 2 times, most recently from 2a8ea07 to ff75179 Compare May 7, 2024 11:41
@cdevienne cdevienne marked this pull request as ready for review May 7, 2024 11:44
Copy link
Member

@joaopapereira joaopapereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some questions and left some comments.

test/e2e/hg_test.go Outdated Show resolved Hide resolved
test/e2e/hg_test.go Outdated Show resolved Hide resolved
pkg/vendir/fetch/hg/sync.go Outdated Show resolved Hide resolved
pkg/vendir/fetch/hg/hg.go Outdated Show resolved Hide resolved
pkg/vendir/fetch/hg/hg.go Outdated Show resolved Hide resolved
@cdevienne cdevienne force-pushed the hg-cache branch 2 times, most recently from df01894 to b738d70 Compare May 10, 2024 09:47
Copy link
Member

@joaopapereira joaopapereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind fixing the rebase so that we do not have all the other changes in this PR?

Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
The idea is to save the whole untouched clone (with no checkout) in the cache.

If already present, the pull is done directly in the cache, and is
faster (except on very small repos) because only new changeset are
transfered.

If the ref is a changeset id (not a tag, branch, topic or bookmark), and
the changeset is already known in the cached clone, no pull is done
which avoid any network exchange.

Then we copy the cached entry and do the checkout.

Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
The repo URL must be in the cache id.
The ref is purposely not included in it because we want to reuse the cached repository
when the ref moves.
And finally, we use a sha256 hash to mask any authentication data because we don't
want them to be readable in the cache folder name.

Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
Use only the repository URL as a cacheID

Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
@Zebradil
Copy link
Member

Zebradil commented May 20, 2024

Apologies for the delay with the review.

As I haven't been using mercurial for more than a decade, it's hard to think through the hg-related logic. However, I still have a couple of notes.

@cdevienne I omit some not important things, but this one I would like to address before this gets merged: can we not export the new functions if they are not used in other packages? Otherwise, they'll become a part of the public API and it'll be harder changing them in the future.

@joaopapereira Regarding the immutability and cache size questions. I think that vendir can give a little more control over this to a user.

With references mutable by design, we can't decide with a high confidence whether it is safe to cache or not. Users, on the other hand, have more knowledge about artifacts they're dealing with and can explicitly allow to cache some of them with mutable references. It is totally fine as far as it is an opt-in.

The cache size restriction in the current implementation (if I read it correctly) is not ideal, as it is silently refuses to save new data, which is probably important, into the cache instead of ejecting less important data (e.g. oldest entries or least used). If I were to use vendir cache, I'd just set its size to some large number hoping that the cache size will remain smaller than that.


I would merge this PR (after unexporting the new functions) and plan improvements to the caching layer, namely to implement a clean up logic that removes less important entries. I'd also allow users to explicitly request unlimited cache size, instead of the implicit 1Mb limit, but we can discuss these details in a separate issue, I guess.

The 'hg' type is strictly an internal tool of the 'hg' fetcher.
The only meaningful public API of this package is the 'Sync' type.

Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
@cdevienne
Copy link
Author

cdevienne commented May 21, 2024

I un-exported the "Hg" type completely, as it has no reasonable use outside this package.

@Zebradil
Copy link
Member

@joaopapereira should I be able to approve workflow runs?

@praveenrewar
Copy link
Member

@Zebradil Are you not able to approve? Let me check the permissions in that case. For now I am approving.

@Zebradil
Copy link
Member

@praveenrewar No, I didn't have the button.

@Zebradil
Copy link
Member

@cdevienne thank you for the adjustments. Now we can see that the localClone function is unused. Could you remove it? Or was it supposed to be used at some point?

@praveenrewar
Copy link
Member

No, I didn't have the button.

@Zebradil Please check now. I am still figuring out some of the permissions stuff, so please excuse me if it still doesn't work 😅

@Zebradil
Copy link
Member

I can't test if it works now, as there are no pipelines to approve :)
Let's wait until we have another PR.

Signed-off-by: Christophe de Vienne <christophe.devienne@orus.io>
@cdevienne
Copy link
Author

@cdevienne thank you for the adjustments. Now we can see that the localClone function is unused. Could you remove it? Or was it supposed to be used at some point?

my bad I forgot to run golangci-lint. I cleaned up the code.

@Zebradil
Copy link
Member

@cdevienne sorry, I didn't notice earlier that with the new functions some existing structs (Hg, HgInfo, ...) and functions (NewHg, ...) became unexported too. This could be a breaking change for someone who might use them in their code. Could you revert this change to the said structs and functions?

@Zebradil
Copy link
Member

@praveenrewar I don't see the button to approve workflow runs.

image

@cdevienne
Copy link
Author

@cdevienne sorry, I didn't notice earlier that with the new functions some existing structs (Hg, HgInfo, ...) and functions (NewHg, ...) became unexported too. This could be a breaking change for someone who might use them in their code. Could you revert this change to the said structs and functions?

It is a breaking change indeed, but I would be surprised it breaks anything. I don't see a point in keeping these public if it doesn't break any code. If it does I would happily revert the change.

@cdevienne
Copy link
Author

Precising my point: I think the internal of the fetchers are not a part of the contract vendir has with its users. So it should not have been public in the first place.

@Zebradil
Copy link
Member

I think the internal of the fetchers are not a part of the contract vendir has with its users.

I doubt that this code is used outside of vendir. However, it is located under pkg which, if I understand correctly, makes it "public". If it were under internal I wouldn't bother much.

I don't see a point in keeping these public if it doesn't break any code.

Yes, but how do we know if it doesn't break someone else's code?

@joaopapereira could you give a suggestion here?

Removing symbols from public API is a breaking change according to the semantic versioning spec and we should implement it carefully with adequate communication. But vendir's major version is still 0 which gives us some freedom to break the public API. I personally would commit this (breaking) change and see who reacts on this and what are their use cases. Also, ytt, for example, discourages using it as a module: https://github.com/carvel-dev/ytt/blob/develop/examples/integrating-with-ytt/apis.md#as-a-go-module. I think for vendir these considerations make sense too.

@joaopapereira
Copy link
Member

As a generic rule of thumb in Carvel we tend to:

  • Minimize the breaking changes, our tools are still all in the major 0 version, which gives us some leeway to do breaking changes. Nevertheless, we should aim to minimize it
  • We try to keep consistency whenever possible

I like the idea of exposing only the sync, but we should standardize what this sync interface should have before we do that. This being said, I would recommend we keep exported Structs exported, and in the future, if we want to invest time in getting an interface going that would support all the fetching types, we can create the interface and make all things private that we want to make private.

In terms of API consumption for the majority of the tools, we expect people to use the Cobra commands interface (not the best experience) until we are in a position where we have a supported API for each tool like we currently have in imgpkg.

@joaopapereira
Copy link
Member

@praveenrewar I don't see the button to approve workflow runs.

image

Only folks with write access can see that button, and write access comes with the Approver Role.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet

5 participants