Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] Reordering of comparison in @EqualsAndHashCode for better performance #1543

Closed
stsypanov opened this issue Dec 19, 2017 · 17 comments

Comments

@stsypanov
Copy link

Consider simple class:

@EqualsAndHashCode
class Smth {
  Long id;
  List<Long> props;
  String name;
  int count;
}

It generates this code:

public boolean equals(Object o) {
  if (o == this) return true;
  if (!(o instanceof Smth)) return false;
  final Smth other = (Smth) o;
  if (!other.canEqual((Object) this)) return false;
  final Object this$id = this.id;
  final Object other$id = other.id;
  if (this$id == null ? other$id != null : !this$id.equals(other$id)) return false;
  final Object this$props = this.props;
  final Object other$props = other.props;
  if (this$props == null ? other$props != null : !this$props.equals(other$props)) return false;
  final Object this$name = this.name;
  final Object other$name = other.name;
  if (this$name == null ? other$name != null : !this$name.equals(other$name)) return false;
  if (this.count != other.count) return false;
  return true;
}

protected boolean canEqual(Object other) {
  return other instanceof Smth;
}

Then the same code generated by IDEA (for Java 7+):

public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Smth smth = (Smth) o;
    return count == smth.count &&
            Objects.equals(id, smth.id) &&
            Objects.equals(props, smth.props) &&
            Objects.equals(name, smth.name);
}

IDEA-generated code reorders comparison (primitives and wrappers first) yielding better performance for the case when count or id fields are mutually different and we can immediately return false skipping more complicated comparison of strings and collections.

It'd be nice if Lombok could do the same.

@Maaartinus
Copy link
Contributor

I'm not sure about the usefulness of such an optimization. Obviously, testing the string before the list is better if only the former differs. It's probably also better in the average real use case, but what if the string is megabytes long and the list differ already in their size?

I always order my fields accordingly; not as an optimization, but placing the simple things first makes sense to me. There's a proposal for generating compareTo, where the fields can be ranked via a special annotation.

@stsypanov
Copy link
Author

My proposal is only about primitives and wrappers like Integer (their comparison is in fact comparison of underlying primitives) as you are never sure about other fields of type Object.

@Maaartinus
Copy link
Contributor

@stsypanov You're right, somehow I "saw" name being placed before props. I may be misreading it again, but the primitive count was placed before the wrapper id and the wrapper wasn't moved before anything as it was already at the top.

Anyway, I guess, the primitive comparison is so cheap that moving primitives up sounds like a good idea. I'd place wrappers next because of the two memory indirections (which may be much more costly than the comparison itself). Everything else then in the last group.

Disclaimer: I'm just a random commenter here, so whatever I say, doesn't matter. ;)

@stsypanov
Copy link
Author

@Maaartinus your comment is useful as it helped me to clarify what is expected from requested change.

@victorwss
Copy link
Contributor

Strings are ubiquitous and are reasonably fast to compare if their hashCode()'s do not collide nor happens to be zero. So, I propose that we should use four groups: primitives first, wrappers second, strings third and everything else last.

@rspilker
Copy link
Collaborator

I think primitives and primitive wrappers first is a good idea.

Regarding Strings, I'm a bit on the fence here. There could be many other objects that are even faster, and with upcoming value types, if their fields are primitives, they are probably faster. And if we give Strings special treatment, there is no way for a user to optimize their code. Currently, they can do that by reordering their fields.

@Maaartinus
Copy link
Contributor

@victorwss String are only fast to compare if they hash codes differ. When the hash codes are equal (either by collision or because of the strings being equal), then you're out of luck and need to compare their arrays.

Another problem: In order to use the hash code, you have to call String#hashCode, which possibly has to compute the hash code and that for both strings. This may take much longer than the comparison. Thereafter, they're cached (unless zero), but this may or may not be useful. You could use reflection instead, but that's possibly slow, surely hacky and complicated by compact strings since JDK 9.

Java itself uses neither String.hash nor String.hashCode() in its equals and we're in a worse position as the field is private. So I don't think, we cab do anything fast here.


@rspilker I guess, we could use a benchmark. I'd bet, primitives are the fastest with a large margin as they require no indirection and method call.

Primitive wrappers could IMHO profit most from testing

 Integer a = this.myInt, b = that.myInt;
 if ((a != b) && (a == null || b == null || a.intValue() != b.intValue())) return false;

and avoiding the call.

All objects might profit from testing

 if (a != b && (a == null || b == null || !a.equals(b))) return false;

instead of

 if (a != null ? !a.equals(b) : b != null) return false;

The added tests are redundant, but avoid the indirection and the call in some common cases. The added cost in case the call is necessary is probably negligible (it'd get eliminated completely with inlining, but I guess, this doesn't happen).

Currently, they can do that by reordering their fields.

Sure, imagine a value class starting with an UUID.

@samukce
Copy link
Contributor

samukce commented Jun 4, 2020

Hello Folks, an up in this thread... I do have the same needs that are to be able to order the equals intentionally. Even we know that primitive is cheaper and would be first, for my business scenario I do have two String fields, basically: { String name; String value; } . If the equals are built checking fist the name and I have the value one that is the one that really changes frequently its content, the performance is hugely impacted. Sure that I'm talking about thousands of objects been compared, that is the scenario that I have and those more ms have an impact on the business flow that I need to achieve.

I captured this test scenario in a simple example just to compare with Lombok and creating one equals manually with the "right" ordination for my scenario.

https://gist.github.com/samukce/38d1f58a43e670a3c70d361daa80ba1b

Benchmark                           Mode  Cnt    Score   Error  Units
HashCodeSimulation.compareIntelliJ  avgt   15  251.713 ± 5.123  ms/op
HashCodeSimulation.compareLombok    avgt   15  270.360 ± 6.523  ms/op

if the folks are ok I can submit a PR with some idea, How is it sounds? @Maaartinus

@randakar
Copy link

randakar commented Jun 4, 2020 via email

@samukce
Copy link
Contributor

samukce commented Jun 4, 2020

Yeap, I thought this idea but this is really implicit, right? if any other person gives some maintenance in this class is almost impossible to figure out it @randakar .

I thought that we would have the same approach that we do for @ToString where we can inform the rank number for the field.

@randakar
Copy link

randakar commented Jun 4, 2020 via email

@Maaartinus
Copy link
Contributor

@samukce I'm afraid, your benchmark is broken: You've excluded name from CategoryLombok, but it's not constant. Whatever the implications on performance might be, your aren't measuring the same things.

Additionally, the class is not final, so canEqual gets generated and used.

There are other differences like getClass() != o.getClass() vs. instanceof which itself may be relevant. In case of final classes, Lombok could switch to the former, when the performance difference matters, but this would require an own benchmark.


if the folks are ok I can submit a PR with some idea, How is it sounds? @Maaartinus

I'm not a project owner, so I can't decide anything. Anyway, before you start, have a look at #512 discussing the syntax for an annotation joining @EAHC and Comparable.

I guess, you'd like something like @EAHC.Rank. I'd start by benchmarking the difference caused by reordering same-type fields.


I could actually imagine that splitt tests for a field can be better, e.g., like for strings s1 and s2

boolean same_s1 = s1 == that.s1;
if (!same_s1 && (s1 == null || that.s1 == null || s1.length() != that.s1.length())) return false;

boolean same_s2 = s2 == that.s2;
if (!same_s2 && (s2 == null || that.s2 == null || s2.length() != that.s2.length())) return false;

if (!same_s1 && !s1.equals(that.s1)) return false;

if (!same_s2 && !s2.equals(that.s2)) return false;

However, this is most probably nothing for Lombok (too complicated, unsure gain).


Yeap, I thought this idea but this is really implicit, right? if any other person gives some maintenance in this class is almost impossible to figure out it

Sure, and probably nobody would like the order "(value, name)", especially as it's used also in @ToString. If there are just too fields, I'd surely go for a manual solution.


If I were to hack on Lombok (unfortunately no time), I'd start with

  • @EAHC.Rank on fields and
  • @EQHC.Strategy on class with values NONE (status quo) and PRIMITIVES_FIRST and a corresponding configuration key

@stsypanov
Copy link
Author

stsypanov commented Jun 4, 2020

FYI, IDEA does similar reordering as a result of https://youtrack.jetbrains.com/issue/IDEA-170178. The issue is still open, but in fact when you take example snippet and generate equals() with IDEA the primitive will be compared first.

P. S. however this is not relevant for wrappers, they are not reordered

@samukce
Copy link
Contributor

samukce commented Jun 4, 2020

@Maaartinus that was a second test that I did but, to compare with that field vs without that field.

So, yeah, I agree with the approaches that you proposed. The rank looks the most simple and the flexible option, given then for each case the order can impact differently depending on the business code flow. For instance, I can have one object with 5x int fields and one String, but for instance, I can have this String field 90% changing in the time and the ints fields almost fixed value, so for this case, even primitive firsts what will make difference, in this case, will be the String field first. I will open a PR for the rank idea, and we can see the acceptance. I would like to add this in my project and remove the manual equals method that I needed to add in many different classes.

@samukce
Copy link
Contributor

samukce commented Jun 5, 2020

@stsypanov it can help you. ;) PR: #2485

@rspilker
Copy link
Collaborator

I've closed this issue, because we now have two improvements:

  1. @Include(rank=n) can be used to manually re-order the fields in the equals generation.
  2. Primitives get a default rank of 1000, references 0.

So everyone get the added performance boost of the primitives, and customization is possible.

Regarding primitive wrappers, it's not completely off the table yet. Although we can never be sure that Long refers to java.lang.Long (it could be a class a different file in the same package), it is also not likely that that would be a problem. And primitive wrappers are likely candidates, especially now value classes are getting more attention, and your JVM probably already contains optimized code to handle equality for those.

@rspilker
Copy link
Collaborator

Update: primitives default to 1000, "primitive wrappers" to 800, references to 0.

The caveat is that the primitive wrapper detection could be off, but then:

  • The chances that someone does this is very low
  • The impact of a false detection is also low
  • The rank field can still be used to fix this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants