Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diff may omit difference of map in struct in slice #329

Open
mrwonko opened this issue Apr 3, 2023 · 7 comments
Open

Diff may omit difference of map in struct in slice #329

mrwonko opened this issue Apr 3, 2023 · 7 comments
Labels
reporter Improvements in the difference reporter

Comments

@mrwonko
Copy link

mrwonko commented Apr 3, 2023

I've run into a case where the relevant difference is omitted from the cmp.Diff output. I have differences within a large (≥18 entries) map that is nested inside a slice of structs, with some other fields preceding it:

func TestMapDiff(t *testing.T) {
	const (
		// Diff only outputs the first 8 map entries, so if they're identical, the diff is useless
		numSame = 8
		// only 10 differing entries or more trigger the issue
		numDiff = 10
	)
	type Wrapper struct {
		// we need a couple of unrelated fields above the map to trigger the issue,
		// slices or other maps seem to work well (strings don't)
		UnrelatedField1 []string
		UnrelatedField2 []string
		UnrelatedField3 []string
		UnrelatedField4 []string
		UnrelatedField5 []string
		UnrelatedField6 []string
		// this is where the difference is
		DifferingMap    map[string]any
	}
	want := map[string]any{}
	got := map[string]any{}
	for i := 0; i < numSame+numDiff; i++ {
		key := fmt.Sprintf("key-%02d", i)
		if i < numSame {
			want[key] = "same"
			got[key] = "same"
		} else {
			// note that the missing entries must come after the present ones, alphabetically
			want[key] = "missing"
		}
	}
	if diff := cmp.Diff(
		// we must use a slice to trigger the issue, comparing plain Wrappers doesn't trigger it
		[]*Wrapper{
			{
				// To reproduce the issue, these must not be nil
				UnrelatedField1: []string{},
				UnrelatedField2: []string{},
				UnrelatedField3: []string{},
				UnrelatedField4: []string{},
				UnrelatedField5: []string{},
				UnrelatedField6: []string{},
				DifferingMap:    want,
			},
		},
		[]*Wrapper{
			{
				UnrelatedField1: []string{},
				UnrelatedField2: []string{},
				UnrelatedField3: []string{},
				UnrelatedField4: []string{},
				UnrelatedField5: []string{},
				UnrelatedField6: []string{},
				DifferingMap:    got,
			},
		}); diff != "" {
		t.Errorf("mismatch (-want +got):\n%s", diff)
	}
}

This leads to the following output:

          []*main.Wrapper{
        - 	&{
        - 		UnrelatedField1: []string{},
        - 		UnrelatedField2: []string{},
        - 		UnrelatedField3: []string{},
        - 		UnrelatedField4: []string{},
        - 		UnrelatedField5: []string{},
        - 		UnrelatedField6: []string{},
        - 		DifferingMap: map[string]any{
        - 			"key-00": string("same"),
        - 			"key-01": string("same"),
        - 			"key-02": string("same"),
        - 			"key-03": string("same"),
        - 			"key-04": string("same"),
        - 			"key-05": string("same"),
        - 			"key-06": string("same"),
        - 			"key-07": string("same"),
        - 			...
        - 		},
        - 	},
        + 	&{
        + 		UnrelatedField1: []string{},
        + 		UnrelatedField2: []string{},
        + 		UnrelatedField3: []string{},
        + 		UnrelatedField4: []string{},
        + 		UnrelatedField5: []string{},
        + 		UnrelatedField6: []string{},
        + 		DifferingMap: map[string]any{
        + 			"key-00": string("same"),
        + 			"key-01": string("same"),
        + 			"key-02": string("same"),
        + 			"key-03": string("same"),
        + 			"key-04": string("same"),
        + 			"key-05": string("same"),
        + 			"key-06": string("same"),
        + 			"key-07": string("same"),
        + 		},
        + 	},
          }

Note how the whole Wrapper is marked as differing, instead of just the map inside, and the interesting differing part of the map is omitted, likely as a direct result.

Reducing the number of differences inside the nested map to 9 yields this much more useful output:

          []*main.Wrapper{
          	&{
          		... // 4 identical fields
          		UnrelatedField5: {},
          		UnrelatedField6: {},
          		DifferingMap: map[string]any{
          			... // 6 identical entries
          			"key-06": string("same"),
          			"key-07": string("same"),
        - 			"key-08": string("missing"),
        - 			"key-09": string("missing"),
        - 			"key-10": string("missing"),
        - 			"key-11": string("missing"),
        - 			"key-12": string("missing"),
        - 			"key-13": string("missing"),
        - 			"key-14": string("missing"),
        - 			"key-15": string("missing"),
        - 			"key-16": string("missing"),
          		},
          	},
          }
@jgalliers
Copy link

I didn't realise until just now but this issue is actually another example of the issue that I opened today - #343

The underlying issue here is not actually map differences in a slice - it's the magnitude of the change. You correctly identify here that the quantity of changed fields directly affects the result, and that's because the diff has tipped over the "half-way" mark (more than half of ANY fields on the object being diffed has changed) and this converts the diff from a "field-wise" diff to a "new object" diff.

I'm not sure why this is the case and hopefully we get an answer on #343

@dsnet
Copy link
Collaborator

dsnet commented Oct 12, 2023

Much of the reporter is using a complex set of heuristic to determine how to best format the output. Reports like this and #343 are helpful to adjust the heuristic to provide a better human experience. It will unfortunately never be perfect as much of this subjective.

@dsnet
Copy link
Collaborator

dsnet commented Oct 12, 2023

I traced down heuristic for both this issue and #343 to this code snippet:
https://github.com/google/go-cmp/blob/master/cmp/internal/diff/diff.go#L112-L120

Changing it to something like:

func (r Result) Similar() bool {
	// Use NumSame+1 to offset NumSame so that binary comparisons are similar.
	return r.NumSame+1 >= r.NumDiff/2
}

fixes the issue, but we'll have to think carefully about possible unexpected transitive effects.

@dsnet
Copy link
Collaborator

dsnet commented Oct 12, 2023

Here's an example of a diff that got worse:

got:
    [][]uint8{
-       {0xde, 0xad, 0xbe, 0xef},
        {0xff, 0x6f, 0x6f},
+       "foo",
        "barbaz",
        bytes.Join({
-           "bl",
            "a",
-           "hdieblah",
+           "dded",
        }, ""),
+       "here",
+       {0x68, 0x72, 0x6d, 0x70, 0x68, 0xff},
    }

want:
    [][]uint8{
-       {0xde, 0xad, 0xbe, 0xef},
        {0xff, 0x6f, 0x6f},
+       "foo",
        "barbaz",
+       "added",
+       "here",
-       "blahdieblah",
+       {0x68, 0x72, 0x6d, 0x70, 0x68, 0xff},
    }

@jgalliers
Copy link

jgalliers commented Oct 13, 2023

I traced down heuristic for both this issue and #343 to this code snippet: https://github.com/google/go-cmp/blob/master/cmp/internal/diff/diff.go#L112-L120

Changing it to something like:

func (r Result) Similar() bool {
	// Use NumSame+1 to offset NumSame so that binary comparisons are similar.
	return r.NumSame+1 >= r.NumDiff/2
}

fixes the issue, but we'll have to think carefully about possible unexpected transitive effects.

Could we pass some kind of option that allows us to manage the evaluation? We can already transform, order, etc via these (very helpful!) functions, could we do something with that?

I'm not an expert in the deep internals here but would it potentially be possible to do something with the editscript generation and adding some kind of new EditType? Perhaps through a new cmp.Option() such as SetSameObjectHeuristic(int) in which the Similar() function you linked to either the passed value?

Or a more flexible option like SetSameObjectHeuristic(interface{}, int) which allows a type/value combination to specify a value to assess the "sameness".

I actually don't have a problem at all with the heuristic (it seems like a smart solution to a difficult problem!) - but being able to indicate you want a full property-wise diff would be very valuable in my opinion.

@mrwonko
Copy link
Author

mrwonko commented Oct 13, 2023

I wouldn't mind showing the entire object as changed, as long as the entire object is printed, and not just the first couple of fields (which may be identical). Make it a cmpopt, if you'd rather make it opt-in.

@dsnet
Copy link
Collaborator

dsnet commented Oct 13, 2023

If you look at the implementation, it's a lot of hardcoded constants and heuristics. It is infeasible to expose all of those details through options. It's better to look at case examples of where it operates poorly and tune the heuristics to do better in those cases without undoing how it works on other cases. It takes some finessing, but the reporter gradually do better for more and more situations. Until we've exhausted all the reasonable heuristic tuning, I don't think we should expose any options to alter reporter outputer.

@dsnet dsnet added the reporter Improvements in the difference reporter label Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reporter Improvements in the difference reporter
Projects
None yet
Development

No branches or pull requests

3 participants