You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For a fully immutable class, the hashCode may be called a lot (consider re-hashing, for example.) Since the class is fully immutable (let's say it contains Strings and Long objects, for example), the hashCode could be pre-computed at Construction, then the hashCode can simply return that value.
This would be an "optimization" that would need to be judiciously applied to a subset of @ Value objects. Perhaps an option to the @ AllArgsConstructor would make sense.
Kinda necro-ing this issue, but I have been working on a project in which 100% of data classes are exclusively immutable, and this would come in handy. Maybe this was forgotten and never talked about.
I think the implementation could be something like what the String class does: add a private int _hashCodeCache = 0; variable on the class so instances will have that when initialized, and make hashCode check if that is 0 or not. If not, return the cached value. If 0, calculate hash and store it in the object.
In #52 (comment)@EqualsAndHashCode(cacheHashCode = true) was mentioned as possible way to implement this one. Automatically detecting that none of the fields could be set is quite hard, especially if we think about super classes. It somehow seems reasonable to cache by default if @Value is used but IIRC this annotation does not block non final fields so we might end up with some strange behaviour if someone sets one of the field after the hashCode was cached.
@Rawi01 I kinda started exploring how to do this in the codebase yesterday just for fun (I got the javac part done, not much for eclipse yet).
I just added a private transient int hashCodeCache = 0; field to the class if none existed (and warned about it if it did, also turning cacheHashCode into false) and modified the generated hashCode() method to check for it if cacheHashCode is true.
Perhaps we could just add in the documentation that this is intended for fully immutable objects (i.e.: all-final fields, all fields immutable (something we can't check anyway)) whose hashCode() will always return the same value and thus the user should be careful with it.
I don't know how this would be viewed by current lombok standards. I mean, it's clearly useful and it has a use-case, and we could warn the user if the class is either non-final or it has non-final fields, but there's only so much we can do besides adding documentation and defaulting to false.
We can check the class for non-final fields and for the lack of the final modifier in the class, but I guess the ultimate responsibility would end up falling on the user.
Automatically detecting that none of the fields could be set is quite hard, especially if we think about super classes. It somehow seems reasonable to cache by default if @Value is used but IIRC this annotation does not block non final fields so we might end up with some strange behaviour if someone sets one of the field after the hashCode was cached.
There are too many ways how this could go wrong (e.g., mutable members), but that's the user's responsibility. Lombok must not do the caching unless explicitly requested, so I'm afraid, @Value alone can't imply caching because of compatibility reasons.
We can check the class for non-final fields and for the lack of the final modifier in the class, but I guess the ultimate responsibility would end up falling on the user.
I guess, I wouldn't bother as we can check everything. But there could be a mode adding assert $cachedHashCode == $computeHashCode(). For people developing and `running tests with assertions while running in production without them (which is IMHO the only sane choice), it could help to discover possible problems.
Some details I recall from old threads:
There were two variants discussed: Caching and precomputation, where the latter has some minor advantages:
It works even when the hashCode happens to be zero.
It needs no zero test so it's a tiny bit faster
It doesn't break immutability in any sense.
As caching and precomputation are mutually exclusive, an enum HashCodeMode {NONE, CACHED, PRECOMPUTED} would be nice, however, it's pain to write.
For precomputed hashes, we would then have to check if we have lazily initialized fields that are included in the hashCode computation, I think, because precomputation would mean they won't be lazy anymore.
Although I think the users might have their use cases, computationally it's not too expensive to do an int comparison nor a boolean comparison (specially since the branch predictor is going to catch up with the result quite fast given it will always evaluate to true, except for the first time), and it's more flexible than precomputing anyway (precomputing forces a heavy computation even if the object is not going to be used for hashing in that circumstance).
Also, we can add a boolean to check if hashCode() was computed instead of doing a zero equality check, though that adds yet another variable and the chances of hashCode() computing 0 are quite slim, aren't they? (but we should account for it if possible)
@andrebrait Sure, precomputed is a trade-off for cases where the initialization cost doesn't matter but the access does. I guess, there's no sane way how to make it work together with lazy getters and forbidding this combo would be IMHO good enough.
IIUYC, you're arguing about "precomputed" being not that useful when we have "cached", and I agree.
Also, we can add a boolean to check if hashCode() was computed instead of doing a zero equality check, though that adds yet another variable and the chances of hashCode() computing 0 are quite slim, aren't they? (but we should account for it if possible)
The boolean is rather ugly and can cost up to 8 bytes. I've read about a DOS attack exploiting string.hashCode() == 0, but that's probably just theoretical. I guess, the zero check solution is good enough.
So basically, forget what I wrote, except for the assert thing, which I still consider useful.
@Maaartinus I thought boolean and int were the same size in Java (4 bytes), except when used in arrays. I agree it's not "clean" but it is the solution for a hashCode equal to zero resulting in it not being cached.
Also, I didn't quite get the comment you made about assert.
@andrebrait IIRC on Android, it's like you said; but in HotSpot, it's 1 and 4 bytes, respectively, finally padded up to 4 or 8 bytes (depending on architecture and whatever). So adding a single byte may cost 8 bytes or be free. I'm unsure what solutions is better; I wanted just note the cost.
Concerning assert, that's simple. Everybody wants the caching, but it may go wrong when the fields change and it'd nice to have a possibility to to detect the problem somehow. So given a corresponding entry in lombok.config, you'd generate something like
public int hashCode() {
if ($hashCode == 0) {
$hashCode = computeHashCode();
} else {
assert $hashCode == computeHashCode(); // generated only when configured so
}
return $hashCode;
}
@Maaartinus yeah, @Value is just an indicator that the user might want this but we can not be sure about it. If someone uses @Value for immutable objects he have to add @EqualsAndHashCode(cacheHashCode = true) to all classes but that seems to be something that can be solved with meta annotations 😄
I think that $hashCodeCache == 0 should be sufficient as it is quite unlikely to happen and the worst thing is that there is a small performance impact for a few objects. If not, another alternative is a Integer field + a null check but that might be slower due to the unboxing.
Since this really is a boolean check, why not use boolean?
boolean should not be bigger than int, and it's what is closest to the
intended meaning of the code. Making it an int sounds backwards.
There is one other concern though - race conditions. When you cache a
computed hashcode for an object is it possible to end up with a corrupted
state somehow if two threads do it in parallel?
For truly immutable objects I doubt this is a problem in practice - at
worst, the value gets computed multiple times and we waste a few cycles,
right? Are we sure though that there isn't potential for some unforeseen
behaviour when multiple threads are involved?
Aside from that, this does beg the question. Most cache implementations
have a bunch of tunables that are there to improve performance and reduce
memory usage if an object isn't actively being used. They are tunables
because the load conditions and usage patterns of cached objects matter a
lot for things like cache timeouts. Without them, we risk ending up with an
implementation that burns too many cycles when disabled and burns too much
memory when enabled.
However, doing anything of the sort for this sounds like we'd end up with
both massive complexity and a potential configuration overload. Not
something I'd want on a 'simple' annotation like EqualsAndHashcode. Maybe
lombok properties fit the bill, but still.. the complexity of that would
quickly increase the scope beyond lombok's domain and into that of a more
specialised library that exposes the complexity in the code rather than in
some vague set of configuration properties.
Hrm.
On Thu, Jul 9, 2020 at 4:47 PM Rawi01 ***@***.***> wrote:
@Maaartinus <https://github.com/Maaartinus> yeah, @value is just an
indicator that the user might want this but we can not be sure about it. If
someone uses @value for immutable objects he have to add @EqualsAndHashCode(cacheHashCode
= true) to all classes but that seems to be something that can be solved
with meta annotations 😄
I think that $hashCodeCache == 0 should be sufficient as it is quite
unlikely to happen and the worst thing is that there is a small performance
impact for a few objects. If not, another alternative is a Integer field
+ a null check but that might be slower due to the unboxing.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#784 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABIERPYSDVXLC65L5MJTDTR2XJ6HANCNFSM4OUXTBKA>
.
--
"Don't only practice your art, but force your way into it's secrets, for it
and knowledge can raise men to the divine."
-- Ludwig von Beethoven
Activity
lombokissues commentedon Jul 14, 2015
👤 anthony@whitford.com 🕗 Nov 04, 2014 at 03:29 UTC
For a fully immutable class, the hashCode may be called a lot (consider re-hashing, for example.) Since the class is fully immutable (let's say it contains Strings and Long objects, for example), the hashCode could be pre-computed at Construction, then the hashCode can simply return that value.
This would be an "optimization" that would need to be judiciously applied to a subset of @ Value objects. Perhaps an option to the @ AllArgsConstructor would make sense.
Perhaps a toString result could also be cached.
lombokissues commentedon Jul 14, 2015
👤 anthony@whitford.com 🕗 Nov 10, 2014 at 19:34 UTC
Somebody reminded me that Effective Java, by Joshua Bloch, mentions this optimization. Item 71?
http://stackoverflow.com/questions/18473071/how-caching-hashcode-works-in-java-as-suggested-by-joshua-bloch-in-effective-jav
lombokissues commentedon Jul 14, 2015
End of migration
andrebrait commentedon Jul 8, 2020
Kinda necro-ing this issue, but I have been working on a project in which 100% of data classes are exclusively immutable, and this would come in handy. Maybe this was forgotten and never talked about.
I think the implementation could be something like what the
String
class does: add aprivate int _hashCodeCache = 0;
variable on the class so instances will have that when initialized, and makehashCode
check if that is 0 or not. If not, return the cached value. If 0, calculate hash and store it in the object.Rawi01 commentedon Jul 9, 2020
In #52 (comment)
@EqualsAndHashCode(cacheHashCode = true)
was mentioned as possible way to implement this one. Automatically detecting that none of the fields could be set is quite hard, especially if we think about super classes. It somehow seems reasonable to cache by default if@Value
is used but IIRC this annotation does not block non final fields so we might end up with some strange behaviour if someone sets one of the field after thehashCode
was cached.andrebrait commentedon Jul 9, 2020
@Rawi01 I kinda started exploring how to do this in the codebase yesterday just for fun (I got the
javac
part done, not much for eclipse yet).I just added a
private transient int hashCodeCache = 0;
field to the class if none existed (and warned about it if it did, also turningcacheHashCode
intofalse
) and modified the generatedhashCode()
method to check for it ifcacheHashCode
istrue
.Perhaps we could just add in the documentation that this is intended for fully immutable objects (i.e.: all-final fields, all fields immutable (something we can't check anyway)) whose
hashCode()
will always return the same value and thus the user should be careful with it.I don't know how this would be viewed by current lombok standards. I mean, it's clearly useful and it has a use-case, and we could warn the user if the class is either non-final or it has non-final fields, but there's only so much we can do besides adding documentation and defaulting to
false
.We can check the class for non-final fields and for the lack of the
final
modifier in the class, but I guess the ultimate responsibility would end up falling on the user.andrebrait commentedon Jul 9, 2020
Check out here what I did for
javac
(obviously WIP).Maaartinus commentedon Jul 9, 2020
@Rawi01
There are too many ways how this could go wrong (e.g., mutable members), but that's the user's responsibility. Lombok must not do the caching unless explicitly requested, so I'm afraid,
@Value
alone can't imply caching because of compatibility reasons.@andrebrait
I guess, I wouldn't bother as we can check everything. But there could be a mode adding
assert $cachedHashCode == $computeHashCode()
. For people developing and `running tests with assertions while running in production without them (which is IMHO the only sane choice), it could help to discover possible problems.Some details I recall from old threads:
There were two variants discussed: Caching and precomputation, where the latter has some minor advantages:
As caching and precomputation are mutually exclusive, an
enum HashCodeMode {NONE, CACHED, PRECOMPUTED}
would be nice, however, it's pain to write.andrebrait commentedon Jul 9, 2020
Precomputed will make lazy getters/lazy initialization inviable, though
andrebrait commentedon Jul 9, 2020
@Maaartinus
For precomputed hashes, we would then have to check if we have lazily initialized fields that are included in the
hashCode
computation, I think, because precomputation would mean they won't be lazy anymore.Although I think the users might have their use cases, computationally it's not too expensive to do an int comparison nor a boolean comparison (specially since the branch predictor is going to catch up with the result quite fast given it will always evaluate to
true
, except for the first time), and it's more flexible than precomputing anyway (precomputing forces a heavy computation even if the object is not going to be used for hashing in that circumstance).Also, we can add a boolean to check if
hashCode()
was computed instead of doing a zero equality check, though that adds yet another variable and the chances ofhashCode()
computing 0 are quite slim, aren't they? (but we should account for it if possible)Maaartinus commentedon Jul 9, 2020
@andrebrait Sure, precomputed is a trade-off for cases where the initialization cost doesn't matter but the access does. I guess, there's no sane way how to make it work together with lazy getters and forbidding this combo would be IMHO good enough.
IIUYC, you're arguing about "precomputed" being not that useful when we have "cached", and I agree.
The boolean is rather ugly and can cost up to 8 bytes. I've read about a DOS attack exploiting
string.hashCode() == 0
, but that's probably just theoretical. I guess, the zero check solution is good enough.So basically, forget what I wrote, except for the
assert
thing, which I still consider useful.andrebrait commentedon Jul 9, 2020
@Maaartinus I thought boolean and int were the same size in Java (4 bytes), except when used in arrays. I agree it's not "clean" but it is the solution for a hashCode equal to zero resulting in it not being cached.
Also, I didn't quite get the comment you made about
assert
.Maaartinus commentedon Jul 9, 2020
@andrebrait IIRC on Android, it's like you said; but in HotSpot, it's 1 and 4 bytes, respectively, finally padded up to 4 or 8 bytes (depending on architecture and whatever). So adding a single byte may cost 8 bytes or be free. I'm unsure what solutions is better; I wanted just note the cost.
Concerning assert, that's simple. Everybody wants the caching, but it may go wrong when the fields change and it'd nice to have a possibility to to detect the problem somehow. So given a corresponding entry in
lombok.config
, you'd generate something likeand a wrongly cached hashCode will blow my tests.
Rawi01 commentedon Jul 9, 2020
@Maaartinus yeah,
@Value
is just an indicator that the user might want this but we can not be sure about it. If someone uses@Value
for immutable objects he have to add@EqualsAndHashCode(cacheHashCode = true)
to all classes but that seems to be something that can be solved with meta annotations 😄I think that
$hashCodeCache == 0
should be sufficient as it is quite unlikely to happen and the worst thing is that there is a small performance impact for a few objects. If not, another alternative is aInteger
field + anull
check but that might be slower due to the unboxing.randakar commentedon Jul 9, 2020
15 remaining items