Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make location handling more memory efficient #31

Merged
merged 1 commit into from Jun 2, 2019
Merged

Make location handling more memory efficient #31

merged 1 commit into from Jun 2, 2019

Conversation

oehme
Copy link
Contributor

@oehme oehme commented Apr 10, 2019

The vast majority of getLocation/setLocation calls in Maven
are done with the key being equal to the corresponding field's name.
Instead of using a LinkedHashMap (the worst data structure for memory
and speed), we now remember those in dedicated fields for fast access.

This greatly reduces the size of the Maven model for large builds,
as location tracking is responsible for the majority of memory usage.

For example, in the 2000 submodule build I'm currently profiling, this change
reduces the memory usage of Maven from ~12GB to ~8GB. Still too much, but
more manageable :)

For example, the code for Maven's Dependency class now looks like this:

public InputLocation getLocation( Object key )
    {
        int hash = key.hashCode();
        if ( hash == 0 && key.equals ( "" ) )
        {
            return this.location;
        }
        else if ( hash == 293428218 && key.equals ( "groupId" ) )
        {
            return groupIdLocation;
        }
        else if ( hash == 240640653 && key.equals ( "artifactId" ) )
        {
            return artifactIdLocation;
        }
        else if ( hash == 351608024 && key.equals ( "version" ) )
        {
            return versionLocation;
        }
        else if ( hash == 3575610 && key.equals ( "type" ) )
        {
            return typeLocation;
        }
        else if ( hash == -281470431 && key.equals ( "classifier" ) )
        {
            return classifierLocation;
        }
        else if ( hash == 109264468 && key.equals ( "scope" ) )
        {
            return scopeLocation;
        }
        else if ( hash == 642751220 && key.equals ( "systemPath" ) )
        {
            return systemPathLocation;
        }
        else if ( hash == 745536613 && key.equals ( "exclusions" ) )
        {
            return exclusionsLocation;
        }
        else if ( hash == -79017120 && key.equals ( "optional" ) )
        {
            return optionalLocation;
        }
        else
        {
            return getOtherLocation( key );
        }
    }

@oehme
Copy link
Contributor Author

oehme commented Apr 10, 2019

The build failure seems to be unrelated to my changes:

Exit code: 1 - javadoc: error - The code being documented uses modules but the packages defined in https://docs.oracle.com/javase/7/docs/api/ are in the unnamed module.

That's a bug in javadoc that was fixed only in Java 12.

@oehme
Copy link
Contributor Author

oehme commented Apr 10, 2019

All Maven core integration tests pass against this change.

@rfscholte
Copy link
Member

Intersting, but do we know for sure that the hash calculation of a String is constant no matter the Java version, OS or vendor?

@oehme
Copy link
Contributor Author

oehme commented Apr 10, 2019

Yes, it's part of the Java language spec since Java 1.2, see the documentation of String.hashCode. This is how switch-over-strings is implemented by javac as well.

Even if that was not the case, we would simply fall back to the old behavior of using the map. So this would even be safe on Java 1.1 :)

@hboutemy
Copy link
Member

Looks like such big thousands-of-modules Maven builds are happening these days: thanks for helping on performance improvement.
I didn't see that the location Map was implemented as LinkedHashMap: does anybody know why such an implementation? What makes a simple HashMap not adapted?
I'd prefer to avoid the additional code generated with this smart PR if I can...

@oehme
Copy link
Contributor Author

oehme commented Apr 21, 2019

I did a hashmap first, but it would still perform much worse than this, because you would pay the memory cost of creating a hashmap for every single pom model object. That's more than a gigabyte of hashmaps in my customer's case.

@vlsi
Copy link

vlsi commented Apr 24, 2019

@oehme , just wondering:

  1. Have you considered switch(key.hashCode()) { case 0: ... case 293428218: ...} approach?
  2. Koloboke and Koloboke Compile seem to be a nice drop-in replacement for Map with much better performance and/or memory consumption. Of course this PR doesn't seem to need Koloboke, however it might be still relevant for the code around.

@oehme
Copy link
Contributor Author

oehme commented Apr 24, 2019

@vlsi Using switch over the hashCode wouldn't make the code more readable, because you still need the if-statement for the equals check. However, we could just use switch over the key itself, if Java 7 was allowed (maybe @hboutemy can comment if that's an option)

Using another Map implementation won't help here, as we don't have any problems with lookup performance or large maps. We simply have many little maps. Creating the Maps themselves is already too much.

@vlsi
Copy link

vlsi commented Apr 24, 2019

Using another Map implementation won't help here

I fully agree. Just wonder if there are other maps left.

@vlsi
Copy link

vlsi commented Apr 24, 2019

Using switch over the hashCode wouldn't make the code more readable

Is code readability the top priority here?
I thought memory and performance overheads are more important than readability here.

@oehme
Copy link
Contributor Author

oehme commented Apr 24, 2019

Why else would I use switch over if? It compiles down to the same bytecode, so it makes no difference in performance. That's why I thought you were suggesting it for readability.

@vlsi
Copy link

vlsi commented Apr 24, 2019

Technically speaking, switch compiles to a different bytecode.
See https://stackoverflow.com/a/45023935/1261287

Here you go, sir:

public InputLocation getLocation2( Object key, String cmpKey, InputLocation loc) {
    return key.equals(cmpKey) ? loc : getOtherLocation(key);
}

public InputLocation getLocation2( Object key )
{
    int hash = key.hashCode();
    switch(hash) {
        case 0: return getLocation2(key, "", location);
        case 293428218: return getLocation2(key, "groupId", groupIdLocation);
        case 240640653: return getLocation2(key, "artifactId", artifactIdLocation);
        case 351608024: return getLocation2(key, "version", versionLocation);
        case 3575610: return getLocation2(key, "type", typeLocation);
        case -281470431: return getLocation2(key, "classifier", classifierLocation);
        case 109264468: return getLocation2(key, "scope", scopeLocation);
        case 642751220: return getLocation2(key, "systemPath", systemPathLocation);
        case 745536613: return getLocation2(key, "exclusions", exclusionsLocation);
        case -79017120: return getLocation2(key, "optional", optionalLocation);
    }
    return getOtherLocation( key );
}
public InputLocation getLocation3( Object key )
{
    int hash = key.hashCode();
    switch(hash) {
        case 0:
            return key.equals("") ? location : getOtherLocation(key);
        case 293428218:
            return key.equals("groupId") ? groupIdLocation : getOtherLocation(key);
        case 240640653:
            return key.equals("artifactId") ? artifactIdLocation : getOtherLocation(key);
        case 351608024:
            return key.equals("version") ? versionLocation : getOtherLocation(key);
        case 3575610:
            return key.equals("type") ? typeLocation : getOtherLocation(key);
        case -281470431:
            return key.equals("classifier") ? classifierLocation : getOtherLocation(key);
        case 109264468:
            return key.equals("scope") ? scopeLocation : getOtherLocation(key);
        case 642751220:
            return key.equals("systemPath") ? systemPathLocation : getOtherLocation(key);
        case 745536613:
            return key.equals("exclusions") ? exclusionsLocation : getOtherLocation(key);
        case -79017120:
            return key.equals("optional") ? optionalLocation : getOtherLocation(key);
    }
    return getOtherLocation( key );
}
public InputLocation getLocation4( Object key )
{
    int hash = key.hashCode();
    switch(hash) {
        case 0:
            if (key.equals("")) { return location; } else { break; }
        case 293428218:
            if (key.equals("groupId")) { return groupIdLocation; } else { break; }
        case 240640653:
            if (key.equals("artifactId")) { return artifactIdLocation; } else { break; }
        case 351608024:
            if (key.equals("version")) { return versionLocation; } else { break; }
        case 3575610:
            if (key.equals("type")) { return typeLocation; } else { break; }
        case -281470431:
            if (key.equals("classifier")) { return classifierLocation; } else { break; }
        case 109264468:
            if (key.equals("scope")) { return scopeLocation; } else { break; }
        case 642751220:
            if (key.equals("systemPath")) { return systemPathLocation; } else { break; }
        case 745536613:
            if (key.equals("exclusions")) { return exclusionsLocation; } else { break; }
        case -79017120:
            if (key.equals("optional")) { return optionalLocation; } else { break; }
    }
    return getOtherLocation( key );
}

OpenJDK 11 uses something like the following for all the switch variants:

  public com.Test$InputLocation getLocation2(java.lang.Object);
    descriptor: (Ljava/lang/Object;)Lcom/Test$InputLocation;
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=4, locals=3, args_size=2
         0: aload_1
         1: invokevirtual #2                  // Method java/lang/Object.hashCode:()I
         4: istore_2
         5: iload_2
         6: lookupswitch  { // 10
              -281470431: 156
               -79017120: 204
                       0: 96
                 3575610: 144
               109264468: 168
               240640653: 120
               293428218: 108
               351608024: 132
               642751220: 180
               745536613: 192
                 default: 216
            }

getLocation: 208: areturn
getLocation2: 221: areturn
getLocation3: 321: areturn
getLocation4: 241: areturn

Which means getLocation2 and getLocation4 have comparable bytecode size (if compared with your if-else-if getLocation).
And switch might be better optimized by JIT compiler (which is basically a question to test)

@olamy
Copy link
Member

olamy commented Apr 24, 2019

Maven Core now use Java7 so you can definitely use Java7 (well we are in 2019 now :) )

@oehme
Copy link
Contributor Author

oehme commented Apr 25, 2019

In that case I'll just use the Java 7 switch-over-String

The vast majority of getLocation/setLocation calls in Maven
are done with the key being equal to the corresponding field's name.
Instead of using a LinkedHashMap (the worst data structure for memory
and speed), we now remember those in dedicated fields for fast access.

This greatly reduces the size of the Maven model for large builds,
as location tracking is responsible for the majority of memory usage.
@oehme
Copy link
Contributor Author

oehme commented Apr 29, 2019

I've changed this to use switch (key) instead.

@oehme
Copy link
Contributor Author

oehme commented Jun 1, 2019

Is there anything I can do to move this one along?

Copy link
Member

@olamy olamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
maybe @hboutemy or @rfscholte?

@rfscholte
Copy link
Member

Can we assume that we have enough tests that cover this piece of code already? If so, it looks good to me too.

@oehme
Copy link
Contributor Author

oehme commented Jun 2, 2019

I ran it through the Maven core integration tests and those passed, if that helps :)

@olamy
Copy link
Member

olamy commented Jun 2, 2019

merged it and deployed SNAPSHOT
@oehme can you make a pr for maven-core so we can test this on ASF CI?
Thanks!

@olamy olamy merged commit 26e875c into codehaus-plexus:master Jun 2, 2019
@oehme
Copy link
Contributor Author

oehme commented Jun 3, 2019

PR is here: apache/maven#252

@hboutemy hboutemy added this to the 1.11. milestone Jul 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants