New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make location handling more memory efficient #31
Conversation
The build failure seems to be unrelated to my changes:
That's a bug in javadoc that was fixed only in Java 12. |
All Maven core integration tests pass against this change. |
Intersting, but do we know for sure that the hash calculation of a String is constant no matter the Java version, OS or vendor? |
Yes, it's part of the Java language spec since Java 1.2, see the documentation of String.hashCode. This is how switch-over-strings is implemented by javac as well. Even if that was not the case, we would simply fall back to the old behavior of using the map. So this would even be safe on Java 1.1 :) |
Looks like such big thousands-of-modules Maven builds are happening these days: thanks for helping on performance improvement. |
I did a hashmap first, but it would still perform much worse than this, because you would pay the memory cost of creating a hashmap for every single pom model object. That's more than a gigabyte of hashmaps in my customer's case. |
@oehme , just wondering:
|
@vlsi Using switch over the hashCode wouldn't make the code more readable, because you still need the if-statement for the equals check. However, we could just use switch over the key itself, if Java 7 was allowed (maybe @hboutemy can comment if that's an option) Using another Map implementation won't help here, as we don't have any problems with lookup performance or large maps. We simply have many little maps. Creating the Maps themselves is already too much. |
I fully agree. Just wonder if there are other maps left. |
Is code readability the top priority here? |
Why else would I use switch over if? It compiles down to the same bytecode, so it makes no difference in performance. That's why I thought you were suggesting it for readability. |
Technically speaking, switch compiles to a different bytecode. Here you go, sir: public InputLocation getLocation2( Object key, String cmpKey, InputLocation loc) {
return key.equals(cmpKey) ? loc : getOtherLocation(key);
}
public InputLocation getLocation2( Object key )
{
int hash = key.hashCode();
switch(hash) {
case 0: return getLocation2(key, "", location);
case 293428218: return getLocation2(key, "groupId", groupIdLocation);
case 240640653: return getLocation2(key, "artifactId", artifactIdLocation);
case 351608024: return getLocation2(key, "version", versionLocation);
case 3575610: return getLocation2(key, "type", typeLocation);
case -281470431: return getLocation2(key, "classifier", classifierLocation);
case 109264468: return getLocation2(key, "scope", scopeLocation);
case 642751220: return getLocation2(key, "systemPath", systemPathLocation);
case 745536613: return getLocation2(key, "exclusions", exclusionsLocation);
case -79017120: return getLocation2(key, "optional", optionalLocation);
}
return getOtherLocation( key );
} public InputLocation getLocation3( Object key )
{
int hash = key.hashCode();
switch(hash) {
case 0:
return key.equals("") ? location : getOtherLocation(key);
case 293428218:
return key.equals("groupId") ? groupIdLocation : getOtherLocation(key);
case 240640653:
return key.equals("artifactId") ? artifactIdLocation : getOtherLocation(key);
case 351608024:
return key.equals("version") ? versionLocation : getOtherLocation(key);
case 3575610:
return key.equals("type") ? typeLocation : getOtherLocation(key);
case -281470431:
return key.equals("classifier") ? classifierLocation : getOtherLocation(key);
case 109264468:
return key.equals("scope") ? scopeLocation : getOtherLocation(key);
case 642751220:
return key.equals("systemPath") ? systemPathLocation : getOtherLocation(key);
case 745536613:
return key.equals("exclusions") ? exclusionsLocation : getOtherLocation(key);
case -79017120:
return key.equals("optional") ? optionalLocation : getOtherLocation(key);
}
return getOtherLocation( key );
} public InputLocation getLocation4( Object key )
{
int hash = key.hashCode();
switch(hash) {
case 0:
if (key.equals("")) { return location; } else { break; }
case 293428218:
if (key.equals("groupId")) { return groupIdLocation; } else { break; }
case 240640653:
if (key.equals("artifactId")) { return artifactIdLocation; } else { break; }
case 351608024:
if (key.equals("version")) { return versionLocation; } else { break; }
case 3575610:
if (key.equals("type")) { return typeLocation; } else { break; }
case -281470431:
if (key.equals("classifier")) { return classifierLocation; } else { break; }
case 109264468:
if (key.equals("scope")) { return scopeLocation; } else { break; }
case 642751220:
if (key.equals("systemPath")) { return systemPathLocation; } else { break; }
case 745536613:
if (key.equals("exclusions")) { return exclusionsLocation; } else { break; }
case -79017120:
if (key.equals("optional")) { return optionalLocation; } else { break; }
}
return getOtherLocation( key );
} OpenJDK 11 uses something like the following for all the
Which means getLocation2 and getLocation4 have comparable bytecode size (if compared with your |
Maven Core now use Java7 so you can definitely use Java7 (well we are in 2019 now :) ) |
In that case I'll just use the Java 7 switch-over-String |
The vast majority of getLocation/setLocation calls in Maven are done with the key being equal to the corresponding field's name. Instead of using a LinkedHashMap (the worst data structure for memory and speed), we now remember those in dedicated fields for fast access. This greatly reduces the size of the Maven model for large builds, as location tracking is responsible for the majority of memory usage.
I've changed this to use |
Is there anything I can do to move this one along? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
maybe @hboutemy or @rfscholte?
Can we assume that we have enough tests that cover this piece of code already? If so, it looks good to me too. |
I ran it through the Maven core integration tests and those passed, if that helps :) |
merged it and deployed SNAPSHOT |
PR is here: apache/maven#252 |
The vast majority of getLocation/setLocation calls in Maven
are done with the key being equal to the corresponding field's name.
Instead of using a LinkedHashMap (the worst data structure for memory
and speed), we now remember those in dedicated fields for fast access.
This greatly reduces the size of the Maven model for large builds,
as location tracking is responsible for the majority of memory usage.
For example, in the 2000 submodule build I'm currently profiling, this change
reduces the memory usage of Maven from ~12GB to ~8GB. Still too much, but
more manageable :)
For example, the code for Maven's
Dependency
class now looks like this: