Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/cacheHashCode #2513

Merged
merged 18 commits into from
Sep 24, 2020
Merged

Feature/cacheHashCode #2513

merged 18 commits into from
Sep 24, 2020

Conversation

andrebrait
Copy link
Contributor

@andrebrait andrebrait commented Jul 9, 2020

Opening this for review.

As mentioned above, I'll squash the commits and change the commit message to close the corresponding issues on GitHub before the final merge.

Pending:

  1. Documentation (I would like input on what you think should be there, etc.)
  2. Adding a configuration key to enable warning the user if this option is used (probably useful)
  3. Finding out why javac compiler warnings are not working

This solves #784

This is the same implementation used by java.lang.String, but the field was made transient to avoid accidental serialization.

@JarvisCraft
Copy link

It may be worth being able to calculate hashCode of truely-immutable objects (such as @Values) once they are created so that no mutating operations happen on them thus allowing better JIT optimizations for read-operations (allowing it to omit RW-barriers insertion), this may be done as another parameter of the annotation.
However, this would require insertion of logic to constructors which may be more complicated.

@randakar
Copy link

randakar commented Jul 12, 2020 via email

@randakar
Copy link

randakar commented Jul 12, 2020 via email

@andrebrait
Copy link
Contributor Author

Like any change to your code, you have to be mindful to whether this makes sense or whether or not your tests will break with it.

Repeating myself a bit, this is not different than the user manually caching hashCode themselves. It will behave the same ans break under than same circumstances, for example.

This is not on by default. It takes the user adding the annotation and setting the parameter explicitly to true, so I don't expect this to do anything bad that wouldn't happen is the user wrote their own hashCode and cached it themselves. This just makes it more convenient so they can still use Lombok and cache hashCode at the same time.

What concerns me more is what this does to unit tests.
For example, some of my projects make use of EqualsVerifier. And then you
run into this: https://jqno.nl/equalsverifier/manual/caching-hashcodes/

On Sun, Jul 12, 2020 at 4:48 PM Floris Kraak randakar@gmail.com wrote:

Not really, it's just doing a call to hashCode() at object construction
time.

On Sun, Jul 12, 2020 at 3:35 PM PROgrm_JARvis notifications@github.com
wrote:

It may be worth being able to calculate hashCode of truely-immutable
objects (such as @values) once they are created so that no mutating
operations happen on them thus allowing better JIT optimizations for
read-operations (allowing it to omit RW-barriers insertion), this may be
done as another parameter of the annotation.
However, this would require insertion of logic to constructors which may
be more complicated.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#2513 (comment),
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AABIERINLSL4SCNSHY6YOJ3R3G32BANCNFSM4OWBL5GQ
.

--
"Don't only practice your art, but force your way into it's secrets, for
it and knowledge can raise men to the divine."
-- Ludwig von Beethoven

--
"Don't only practice your art, but force your way into it's secrets, for it
and knowledge can raise men to the divine."
-- Ludwig von Beethoven

@andrebrait
Copy link
Contributor Author

The reasons why I don't like calculating on construction are:

  1. Wasting cycles even if the object is not used for hashing
  2. It makes every lazy property not lazy anymore (and Lombok supports those and making them mutually exclusive is likely not easy (though implementing warnings is ok))

@JarvisCraft
Copy link

JarvisCraft commented Jul 12, 2020

The reasons why I don't like calculating on construction are:

  1. Wasting cycles even if the object is not used for hashing
  2. It makes every lazy property not lazy anymore (and Lombok supports those and making them mutually exclusive is likely not easy (though implementing warnings is ok))
  1. That's why it's suggested for it to be optional.
  2. You simply can use enum instead of multiple booleans like:
enum CacheStrategy {
    NEVER, CONSTRUCTION, ALWAYS
}

@randakar
Copy link

randakar commented Jul 12, 2020 via email

@andrebrait
Copy link
Contributor Author

@JarvisCraft ok, I think it looks more and more that the enum is going to be preferable to just a boolean.

I just need to figure out how to handle that on the constructor. I figured out the caching (but not precalculation) for javac, but the Eclipse bits are still a mystery to me in some points.

I'll look into how to handle that on the constructor.

The reasons why I don't like calculating on construction are:

  1. Wasting cycles even if the object is not used for hashing
  2. It makes every lazy property not lazy anymore (and Lombok supports those and making them mutually exclusive is likely not easy (though implementing warnings is ok))
  1. That's why it's suggested for it to be optional.
  2. You simply can use enum instead of multiple booleans like:
enum CacheStrategy {
    NEVER, CONSTRUCTION, ALWAYS
}

@JarvisCraft
Copy link

@JarvisCraft ok, I think it looks more and more that the enum is going to be preferable to just a boolean.

I just need to figure out how to handle that on the constructor. I figured out the caching (but not precalculation) for javac, but the Eclipse bits are still a mystery to me in some points.

I'll look into how to handle that on the constructor.

The reasons why I don't like calculating on construction are:

  1. Wasting cycles even if the object is not used for hashing
  2. It makes every lazy property not lazy anymore (and Lombok supports those and making them mutually exclusive is likely not easy (though implementing warnings is ok))
  1. That's why it's suggested for it to be optional.
  2. You simply can use enum instead of multiple booleans like:
enum CacheStrategy {
    NEVER, CONSTRUCTION, ALWAYS
}

Nice to hear it! ❤️
Don't consider the constructor-one a super-needed feature. It is fair to say that in most cases it won't be less needed than cache-one. I just thought that it would be worth suggesting it while you are working on the PR so that if it is possible you would be able to implement it all at once.
So it's really great to know that you are able to find time for further research on it :3

@andrebrait
Copy link
Contributor Author

@JarvisCraft Actually, I might not be able to do it right now, but I'll use an enum so it's easy to add the option for precalculation in a future PR.

Initially I'll just finish implementing the caching solution and write documentation etc.

I don't have that much spare time, unfortunately :/

@JarvisCraft
Copy link

@JarvisCraft Actually, I might not be able to do it right now, but I'll use an enum so it's easy to add the option for precalculation in a future PR.

Initially I'll just finish implementing the caching solution and write documentation etc.

I don't have that much spare time, unfortunately :/

No problems at all, just nice to hear that a more flexible option will be used :)
Good luck anyway ❤️

@randakar
Copy link

randakar commented Jul 13, 2020 via email

@andrebrait
Copy link
Contributor Author

Implementation of both javac and eclipse code paths is finished.

I'll make this open for review. I'll squash the commits and change the commit message to close the corresponding issues on GitHub before the final merge.

Pending:

  1. Documentation
  2. Adding a configuration key to enable warning the user if this option is used (?)

@randakar
Copy link

randakar commented Jul 13, 2020 via email

@andrebrait
Copy link
Contributor Author

@randakar

In this case, it's not a very big change and it's full of "rookie mistakes" commits because it's the first time I contribute to this and it's been ages since I last used Eclipse at all. Check some of the latest ones out and you'll see what I'm talking about.

In a case like this, squashing allows you to have a clean and meaningful history. There's no point in keeping history as polluted as it is right now.

This whole thing is basically one big change, too. It's not like you revert part of it and have things work anyway.

@andrebrait
Copy link
Contributor Author

I have no idea why, but the warnings when compiling with javac don't appear when I try to compile my maven projects.

@andrebrait andrebrait marked this pull request as ready for review July 13, 2020 17:43
@andrebrait
Copy link
Contributor Author

I updated the original post to reflect the changes made today. Opening this for review now.

@andrebrait
Copy link
Contributor Author

@Rawi01 let me know if there's anything else that needs to be changed before this is considered mergeable ;)

@Rawi01
Copy link
Collaborator

Rawi01 commented Jul 16, 2020

@andrebrait I like it but I am not a project maintainer. Usually they are online on thursday evenings and take care of open issues and PR, you might get some feedback tonight.

@andrebrait
Copy link
Contributor Author

@Rawi01 awesome. Thanks for the input!

@Rawi01
Copy link
Collaborator

Rawi01 commented Jul 21, 2020

@andrebrait Just noticed that you should also remove the CacheStrategy import statement for javac.
https://github.com/rzwitserloot/lombok/blob/ce92389da76b17b2c8d9f8757d67b4ea3edc8cb5/src/core/lombok/javac/handlers/HandleGetter.java#L137

@andrebrait
Copy link
Contributor Author

@Rawi01 Done in the latest commit. Thanks!

@rspilker
Copy link
Collaborator

First of all, it looks pretty good.

I do have some remarks, questions, suggestions:

  • I like that we use 0 for the 'not calculated' case, because the field is transient, so we get the reset of the field for free.
  • We don't need to set the field to 0 if we generate it; it already is the default value for int fields.
  • I see a lot of setGeneratedBy calls. Possibly, using new SetGeneratedByVisitor(node) can be used on a higher level element once.
  • Possibly, if the hashcode would be 0, it makes sense to replace it by Integer.MIN_VALUE (0x80000000) or 0x40000000. These values are chosen because they are non-zero, so the hashcode will be cached, but it also is a high power of two, effectively meaning that their modulo 2^n is zero. This means that in most hash data-structures they will end up in the same bucket as they would have had we returned zero.
  • I do understand the check to see if the type is final. But I know some people don't make classes final to be able to mock them. They use package-private constructors, effectively making sure nobody can subclass them outside the package. Also, using package-private constructors can allow for a controlled universe of subclasses. In those scenario's, being able to use the cached hashcode in a non-final class makes sense.

I'm not saying we need to change all remarks above, but I would like to discuss them.

@andrebrait
Copy link
Contributor Author

Awesome :-) let me address a few of these for now.

  • We don't need to set the field to 0 if we generate it; it already is the default value for int fields.

I added it to make it explicit, mostly.

  • I see a lot of setGeneratedBy calls. Possibly, using new SetGeneratedByVisitor(node) can be used on a higher level element once.

I have barely any idea what these actually do, so whatever you think I should do ;-)

  • Possibly, if the hashcode would be 0, it makes sense to replace it by Integer.MIN_VALUE (0x80000000) or 0x40000000. These values are chosen because they are non-zero, so the hashcode will be cached, but it also is a high power of two, effectively meaning that their modulo 2^n is zero. This means that in most hash data-structures they will end up in the same bucket as they would have had we returned zero.

Nice idea. I assume the chance is basically the same for hashCode resulting in zero as it is for it resulting in Integer.MIN_VALUE, except that 0 is the hashCode for null and so the chance of it just being zero is slightly higher, so leading to a potentially slightly more inefficient cache.

  • I do understand the check to see if the type is final. But I know some people don't make classes final to be able to mock them. They use package-private constructors, effectively making sure nobody can subclass them outside the package. Also, using package-private constructors can allow for a controlled universe of subclasses. In those scenario's, being able to use the cached hashcode in a non-final class makes sense.

It's either that or a (maybe configurable?) big warning, I think. They may do that, of course, but I think a warning might be in order here.

@randakar
Copy link

randakar commented Jul 24, 2020 via email

Allow caching hash code for non-final classes (but will warn)
Use Integer.MIN_VALUE to differentiate uncached and 0 cached
@andrebrait
Copy link
Contributor Author

@rspilker @Rawi01 I performed some of the requested changes in the latest commit. I haven't addressed the number of setGeneratedBy mainly because I honestly don't understand it, which ones I should replace and where the replacement would go.

I believe the way I generated the code should be clean enough. Tests have been updated and they're all passing, as expected.

When we have some idea of if/when this can be part of Lombok, I'll write the documentation for the web page.

@andrebrait
Copy link
Contributor Author

@rzwitserloot Tagging you since I see you've been active lately and having 20+ PRs can be overwhelming. Looking for feedback here/seeing this merged if it's good (documentation pending, but I want to know if the code itself is good before committing to writing documentation, version in which this would possibly be included, etc.)

@Rawi01
Copy link
Collaborator

Rawi01 commented Aug 21, 2020

@andrebrait we need to flag every generated code as generated by lombok to exclude it from refactorings, formating and highlighting, otherwise eclipse complains a lot. We also have to modify its source code position to match the position of the generating annotation. You can see that if you select a generated method in the outline view.

The setGeneratedBy method does both of these things. This obviously is annoying and error prone, thats why there also is the recursiveSetGeneratedBy method that does that for a generated node and all its children. That said it should be sufficient if you call the recursive method only and ignore everything else.

At some point it might be useful to refactor the whole source code to remove all the unnecessary calls but thats a lot of work and it might not work in some cases as there are some exceptions and missing methods in the SetGeneratedByVisitor (thats the part used in the recusive method). It would also be useful if there would be some kind of test case that can verifies the source code positions even if it eclipse usually complains if there is something wrong.

@rspilker
Copy link
Collaborator

rspilker commented Aug 27, 2020

When thinking about the warning for non-final classes, I think that that's a mistake.

In general, subclasses are not a problem If they contain their own state, they will call super.hashCode(). That's totally fine, even if that one is cached.

So we should not warn for non-final classes.

If anything, we should warn if the class itself contains modifiable state. But unfortunately, that's very hard to detect. Some cases are trivial. If you have a public setters that does not throw an exception, the class is mutable.

But since we don't have access to resolution, even fields that are final could contain references to mutable objects.

The only time we can reliably generate a warning is if there are public setters for the fields in the hashCode.

But at this time, I would be totally happy if we would not generate any warnings at all.

@andrebrait
Copy link
Contributor Author

@rspilker Done removing the warning for final classes.

@rspilker rspilker merged commit ee6c0ca into projectlombok:master Sep 24, 2020
@rspilker
Copy link
Collaborator

Unfortunately I missed a bug in the code, where a 0 hashcode would return 0 the first time, and Integer.MIN_VALUE all other invocations.

@andrebrait
Copy link
Contributor Author

Ok, that sucks.

Would you like me to fix it or will you do it yourself? @rspilker

@andrebrait
Copy link
Contributor Author

I also missed the chance to change the commit message to properly close #784 upon merge. I was waiting for it to be ready to merge to squash and clean it up.

Could you close the issue once the bug has been solved?

@andrebrait
Copy link
Contributor Author

I found it as well. If you don't fix it today, expect a PR by tomorrow with the fix.

@rspilker
Copy link
Collaborator

I already fixed it, and also fix a position bug in Eclipse

@andrebrait
Copy link
Contributor Author

I believe this is missing from the changelog for 1.18.16, @rzwitserloot

@rzwitserloot
Copy link
Collaborator

@andrebrait looks like it, yeah. whoopsie. Addressed: #2671

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants