Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge: Parameterize merge functions + git merge-file on marker_size (with a stubbed lookup) #101

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

rwe
Copy link
Contributor

@rwe rwe commented Sep 30, 2021

NOTE: For convenience, this branch is pre-resolved with https://github.com/rwe/git-revise/tree/rerere-skip-unsized-marker which is PR'd as #95 and would have a minor conflict once merged. The standalone change is available on https://github.com/rwe/git-revise/tree/conflict-marker.

This parameterizes the relevant pieces of merge.py on conflict_marker. The parameterization currently is dummied out to resolve to the default marker size, which is 7. However this refactoring makes that dependency explicit and therefore possible to extend to use git check-attr. See discussion here: #95 (comment), which describes a naïve and likely-not-performant example implementation of "real" resolution.

@@ -255,6 +267,9 @@ def merge_files(
"--quiet",
# Send results to stdout instead of overwriting "current file".
"--stdout",
# Ensure markers are the expected length to ensure consistent rerere results.
"--marker-size",
str(marker_size),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, I thought there was no way to override attributes.
I'd probably put this in a separate commit, apart from the refactoring.

Copy link
Contributor Author

@rwe rwe Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's part of the "parameterize marker size" commit. I think that seems like the right place to me?

Note that this does not override attributes: they weren't and aren't used at all here. This is merging arbitrary temp files and git merge-file doesn't refer to the repo at all other than to check user preference for diff3 style, which isn't relevant here. (The rerere normalization strips those out, and otherwise we'd want to preserve their preference if the user wants to manually resolve).

The note about using git check-attr to determine conflict-marker-size is just so that the rerere conflict IDs here would be cross-compatible with the ones created and used by vanilla git rebase/git merge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry I thought this commit already had a behavior change

@@ -284,7 +302,11 @@ def replay_recorded_resolution(
):
return (b"", None, None)

(normalized_preimage, conflict_id) = normalize_conflicted_file(preimage)
(normalized_preimage, conflict_id) = normalize_conflicted_file(
marker_size=marker_size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we are supposed to use kwarg-style calls for non-kwargs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow what you mean by non-kwargs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

marker_size is a positional argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm still puzzling over what you mean here. Saying "it's a positional argument" on a line where it's passed by name doesn't make sense. normalize_conflicted_file does not have opaque *args: there are no positional-only parameters. There's nothing wrong with giving the names explicitly.

Copy link
Contributor

@krobelus krobelus Oct 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I mixed that up.
I was just wondering if there is a convention of when to pass the name along an argument.
I think I usually pass the name only for arguments that have default values (and kwargs).
However, we use the name when calling Repository.bool_config for example, this is consistent with your version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to prefer named arguments in general, unless the order is unambiguous (like something like call_with(fn, arg1, arg2)) or irrelevant (e.g. max(a, b)). They're particularly helpful during refactoring, since they're both resilient to re-ordering of parameters, their use makes the introduction of default parameters a little easier to check for, and having the names there can make a merge resolution much more obvious. Though, these are also ultimately justifications of habit :)

@@ -28,6 +28,12 @@
DEFAULT_CONFLICT_MARKER_SIZE = 7


def get_conflict_marker_size(__repo: Repository, __file: Path) -> int:
# TODO: Determine on a per-file basis by its `conflict-marker-size` attribute.
# See ll_merge_marker_size in git/ll-merge.c
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your comment about performance concerns: here's a possible way to avoid them:

  1. In merge_files(), run "git merge-file", without passing a marker size
  2. Only if the above "git merge-file" failed, ask "git check-attr" for the conflict marker size, and re-run "git merge-file"

In the common case (no conflict) we're as fast as ever.

I often use git-revise for rewrites that cause the same conflicts - that's
where rerere kicks in. However, there are typically very few of those
conflicts, so I don't think that matters. Also that's not the typical use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing an optimistic merge is a good idea to consider. I'd worry a little about being tempted (or tempting others) to look for or save cached resolutions for both marker sizes, which could lead to inconsistency. Probably avoidable, though! Or not a big deal.

I do wonder what the actual performance impact of batch-querying git check-attrs for each un-encountered entry per tree, e.g. using a persistent process like git cat-file is using. I suspect that the performance difference between that and re-merging is pretty small. (I was mainly leery of my hamfisted query-each-entry-individually 😄).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably avoidable, though!

Yeah, merge_files() would do two runs of "git merge-file" and if the first failed, callers would only ever see the results of the second.

Using a persistent process for git check-attrs sounds great, and there are probably other reasons to use that: we can detect attributes like file encoding and whether a file is binary.

@@ -28,6 +28,12 @@
DEFAULT_CONFLICT_MARKER_SIZE = 7


def get_conflict_marker_size(__repo: Repository, __file: Path) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should maybe be a method of Repository (or at least the generic check-attr wrapper should be)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or at least the generic check-attr wrapper should be
Yep, I agree that a general check-attr wrapper thing would benefit from being lifted up, if implemented. It would make sense to allow that to be a persisted process and have those path → attributes cached.

As far as handling the "conflict-marker-size" attribute in particular, it might benefit from staying standalone. There's some extra validation vs other attributes (turning it into an int, and defaulting to 7 if non-positive), and it's very specific to this area of code dealing with merge-conflict text.

@rwe
Copy link
Contributor Author

rwe commented Jan 10, 2022

@mystor Just wanted to bump this. This branch has no conflicts with base and has been ready for a while. I've pushed up a no-op rebase in order to keep things fresh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants