Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNG-6697] New fast model interpolator not using reflection #261

Merged
merged 4 commits into from Jul 24, 2019

Conversation

gnodet
Copy link
Contributor

@gnodet gnodet commented Jun 29, 2019

No description provided.

@gnodet
Copy link
Contributor Author

gnodet commented Jun 30, 2019

Fwiw, one test is broken with this commit, so I need to find a fix before the PR can be merged.

@gnodet
Copy link
Contributor Author

gnodet commented Jul 2, 2019

The test has been fixed by the latest commit.

@Tibor17
Copy link
Contributor

Tibor17 commented Jul 2, 2019

FIELDS_CACHE does not seem to have any usages, so it can be deleted.
It's better to use ConcurrentHashMap instead of HashMap and then possibly use computeIfAbsent instead of whole if-else. Not sure is the access to that Map is concurrent or not but it's safety in the future and the implementation of the Map uses the locks only when necessary. Additionally it gains the performance because the bucket is divided in to parts depending on the number of CPU.

@Tibor17
Copy link
Contributor

Tibor17 commented Jul 2, 2019

We can put a TODO comment for computeIfAbsent because this call is J8 related or wait for Mvn 3.7.0 where I guess it would be compiled with J8.

@gnodet
Copy link
Contributor Author

gnodet commented Jul 2, 2019

Fwiw, the changes to StringSearchModelInterpolator are not required, those are left-overs from a previous optimization attempt, before I actually write the new StringVisitorModelInterpolator. Maybe I should just revert it back to the way it was.

I believe in non-concurrent access, HashMap performs better than ConcurrentHashMap. For the new StringVisitorModelInterpolator, one additional performance improvement could be the usage of a fork/join pool. I'll give it a try to see if the visitor can gain anything running in a fork/join pool.

@gnodet
Copy link
Contributor Author

gnodet commented Jul 2, 2019

I haven't been able to gain anything using a fork/join pool...

@Tibor17
Copy link
Contributor

Tibor17 commented Jul 2, 2019

@gnodet
The HashMap cannot perform better since it is not threadsafe.

@Tibor17
Copy link
Contributor

Tibor17 commented Jul 2, 2019

@gnodet
Are you sure that the method interpolate( String value ) is not used in concurrent access?
I am mostly missing a Javadoc documentation in Maven about these aspects. Then it is hard to analyse whole Maven and even if so then I still would not be sure about threadsafety in Maven classes.

@gnodet
Copy link
Contributor Author

gnodet commented Jul 2, 2019

@gnodet
The HashMap cannot perform better since it is not threadsafe.

There is no concurrent access at all for the InnerInterpolator object because it's created for the duration of the full model interpolation which happens in a single thread and is then discarded. The InnerInterpolator does not outlive the interpolateObject method.

And fwiw, in a non concurrent access, the HashMap is supposed to perform better than ConcurrentHashMap because it does not need to care about locking in any way.

new InterpolateObjectAction( obj, valueSources, postProcessors, this, problems );
InnerInterpolator innerInterpolator = createInterpolator( valueSources, postProcessors, problems );

PrivilegedAction<Object> action;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use declaration and assignment at one line please.

@Tibor17
Copy link
Contributor

Tibor17 commented Jul 2, 2019

If you want to improve the performance this this.fields = fields.toArray( new CacheField[0] ); should used real size and not 0; otherwise it goes to reflection.

btw, the ConcurrentHashMap uses pseudolocks, not the Posix locking, so it is not as bad as you may think.

@Tibor17
Copy link
Contributor

Tibor17 commented Jul 2, 2019

private static void evaluateArray( Object target, InterpolateObjectAction ctx ) has unpleasant Java Reflection. Removing the reflection may speedup Maven too. We can cast target to Object[] and then iterate over the array in ordinal way.
Not sure how much reflection can be optimized this way. I did not make the analysis of whole class.

@gnodet
Copy link
Contributor Author

gnodet commented Jul 3, 2019

@Tibor17 The whole point of this PR is to actually provide a completely new implementation of the ModelInterpolator, so that the StringSearchModelInterpolator would not be used anymore and replace it with the new StringVisitorModelInterpolator which does not use reflection at all.

Fwiw, your assumption about the performance of the toArray seems wrong, see h2database/h2database#311 or https://medium.com/@hanru.yeh/collection-toarray-new-t-0-will-be-faster-than-collection-toarray-new-t-size-5b129fabfa88 for example.
The performance of the ConcurrentHashMap are definitely very good, especially under concurrent access compared to synchronizing on the HashMap, but for a single thread access, HashMap is still a winner.

Also, I haven't tried to blindy optimize things, but I did use a profile to find hot spots that were using a lot of cpu and find ways to optimize it. For the array case (which isn't a problem anymore since we're not using reflection anymore with the new interpolator). I can provide the profiling data, but I don't think there is any array in the model, so the evaluateArray is definitely not a major problem, would it take quite a long time compared to a better implementation.

I think, if there's a need to optimize things more, I think the next point to consider is to optimize the Xpp3Dom because the initialization of those nodes amounts for 10% of the time used in the DefaultMaven.buildGraph method in my usecase. Given the information given by the profiler, I can't see many other good optimization spots without making major changes such as: avoid cloning the model by using immutable structures for plugins, dependencies and xpp3dom, find a way to iterating through the whole hierarchy multiple times when building the reactor projects in DefaultModelBuilder.build, etc...

@Tibor17
Copy link
Contributor

Tibor17 commented Jul 3, 2019

I made an analysis the cache is used in the method stack of protected method, so it is stack confinement threadsafety. So it's ok.

yes, the domain objects in Maven should be best Immutable which avoids unnecessary cloning and avoids problems with memory visibility between Threads, and then Maven should have services with domain objects (not collections). The state of immutable objects would be valid at the time when they were read from the services.
yes, iterating through the whole hierarchy is pending issue. There should be singleton (persistence layer) in memory keeping the domain objects with latest memory snapshot. Then no recursive iteration would happen once the domain object was created.
These two facts point to a bad status of the architecture in Maven.

@gnodet
Copy link
Contributor Author

gnodet commented Jul 3, 2019

I've raised codehaus-plexus/plexus-utils#67 to improve the 2 costly operations on the Xpp3Dom class.

@michael-o
Copy link
Member

Someone needs to push a new Plexus Utils release...

@eolivelli
Copy link
Contributor

If you want to improve the performance this this.fields = fields.toArray( new CacheField[0] ); should used real size and not 0; otherwise it goes to reflection.

btw, the ConcurrentHashMap uses pseudolocks, not the Posix locking, so it is not as bad as you may think.

@Tibor17
this is surprisingly not true, indeed Collection.toArray( new Xxxx[0] ) is faster than Collection.toArray( new Xxxx[collection.size()] ).

Because with new Xxxx[0] the JVM will skip the allocation and the zeroing of the array.
I know it is tricky but it is true.
if the collection contains 1000 elements the new Xxxx(1000) will need to allocate a 1000-elements array and set nulls with new Xxxx[0] the Jvm won't do it.
Internally the collection will allocate the same array, but due to "escape analysis" it is most likely that the new array won't be zeroed.

@rfscholte
Copy link
Contributor

Someone needs to push a new Plexus Utils release...

Plexus Utils 3.2.1 should be available soon

@gnodet
Copy link
Contributor Author

gnodet commented Jul 23, 2019

I've raised https://issues.apache.org/jira/browse/MNG-6718 and #271 to upgrade plexus-utils to 3.2.1, but those are really independant improvements so this one can be merged without waiting for #271.

{
final Map<String, String> cache = new HashMap<>();
final StringSearchInterpolator interpolator = new StringSearchInterpolator();
interpolator.setCacheAnswers( true );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an extra speed up could be achieved if this PR could be merged: codehaus-plexus/plexus-interpolation#17

@olamy olamy merged commit 690841e into apache:master Jul 24, 2019
@Tibor17
Copy link
Contributor

Tibor17 commented Jul 30, 2019

@gnodet
We have observed the following error. Do you have an idea what way the reflective call to the field java.io.File.fs has got to this cache? Why there is fs. I guess it is only private member of java.io.File.

WARNING: An illegal reflective access operation has occurred, WARNING: Illegal reflective access by org.apache.maven.model.interpolation.StringSearchModelInterpolator$InterpolateObjectAction$CacheField (file:/home/jenkins/jenkins-slave/workspace/maven-box_maven_master/test/core-it-suite/target/apache-maven/lib/maven-model-builder-3.6.2-SNAPSHOT.jar) to field java.io.File.fs, WARNING: Please consider reporting this to the maintainers of org.apache.maven.model.interpolation.StringSearchModelInterpolator$InterpolateObjectAction$CacheField, WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations, WARNING: All illegal access operations will be denied in a future release

@Tibor17
Copy link
Contributor

Tibor17 commented Jul 30, 2019

@Tibor17
this is surprisingly not true, indeed Collection.toArray( new Xxxx[0] ) is faster than Collection.toArray( new Xxxx[collection.size()] ).

In Oracle JDK, ArrayList.toArray([]) with new Xxxx[arrayList.size()] should be the same speed as with new Xxxx[0].
Accoring the JavaDoc new Xxxx[arrayList.size()] the returned array would be yours including the Type of array.
Oracle allocates new array without reflection because given size is smaller than the size of the collection and the type is Object[]. This is related to the Oracle and IMO another vendor may get worse performance if he is using Reflection Array.newInstance().
The reflection is also used in Oracle if and only if the internal array of the collection is not Object[].

I guess that new MyClass[0] may get even slower because Generic type is usually not identical to Object.class in the most Java code and JVM has to perform checks of the type on every element, see (T[]) new Object[newLength] which happens in case of new Xxxx[0].

public static <T,U> T[] copyOf(U[] original, int newLength, Class<? extends T[]> newType) {
	T[] copy = ((Object)newType == (Object)Object[].class)
		? (T[]) new Object[newLength]
		: (T[]) Array.newInstance(newType.getComponentType(), newLength);
	System.arraycopy(original, 0, copy, 0,
					 Math.min(original.length, newLength));
	return copy;
}
	
public <T> T[] toArray(T[] a) {
	if (a.length < size)
		// Make a new array of a's runtime type, but my contents:
		return (T[]) Arrays.copyOf(elementData, size, a.getClass());
	System.arraycopy(elementData, 0, a, 0, size);
	if (a.length > size)
		a[size] = null;
	return a;
}

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 4, 2019

@gnodet
@michael-o
@eolivelli
@rfscholte
@olamy
We still have the same problem with Reflection.
According to the stacktrace the error comes from this change.
Pls see my previous request #261 (comment).
Is somebody working on it?
Thx!

Error Message

expected:<[]> but was:<[WARNING: An illegal reflective access operation has occurred, WARNING: Illegal reflective access by org.apache.maven.model.interpolation.StringSearchModelInterpolator$InterpolateObjectAction$CacheField (file:/home/jenkins/jenkins-slave/workspace/ven-box_maven_removed-java-tools/test/core-it-suite/target/apache-maven/lib/maven-model-builder-3.6.2-SNAPSHOT.jar) to field java.io.File.fs, WARNING: Please consider reporting this to the maintainers of org.apache.maven.model.interpolation.StringSearchModelInterpolator$InterpolateObjectAction$CacheField, WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations, WARNING: All illegal access operations will be denied in a future release]>

Stacktrace

junit.framework.AssertionFailedError: expected:<[]> but was:<[WARNING: An illegal reflective access operation has occurred, WARNING: Illegal reflective access by org.apache.maven.model.interpolation.StringSearchModelInterpolator$InterpolateObjectAction$CacheField (file:/home/jenkins/jenkins-slave/workspace/ven-box_maven_removed-java-tools/test/core-it-suite/target/apache-maven/lib/maven-model-builder-3.6.2-SNAPSHOT.jar) to field java.io.File.fs, WARNING: Please consider reporting this to the maintainers of org.apache.maven.model.interpolation.StringSearchModelInterpolator$InterpolateObjectAction$CacheField, WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations, WARNING: All illegal access operations will be denied in a future release]>
at org.apache.maven.it.MavenITmng4387QuietLoggingTest.testit(MavenITmng4387QuietLoggingTest.java:71)

Standard Output

mng4387QuietLogging(it).....................................FAILURE (1.6 s)

@michael-o
Copy link
Member

This likely means that

            CacheField( Field field )
            {
                this.field = field;
                field.setAccessible( true );
            }

has to go?

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 4, 2019

@michael-o
More than adding setAccessible, I would ask the question "Why we are caching and introspecting java.io.File?"
Where's this come from? Who is caller and why doing these things. Why this issue was not before?

@michael-o
Copy link
Member

We need to check the log, I thiink there was also some use of reflection before this change.

BTW, this is a common problem, not just in Maven. Compiling with Maven 3.5.4 a webapp:

Building WAR files...
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.sun.xml.bind.v2.runtime.reflect.opt.Injector (file:/net/home/osipovmi/.m2/repository/org/glassfish/jaxb/jaxb-runtime/2.3.0/jaxb-runtime-2.3.0.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int)
WARNING: Please consider reporting this to the maintainers of com.sun.xml.bind.v2.runtime.reflect.opt.Injector
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

I think we should spawn another issue for this, continue with 3.6.2, but address this in the next release.

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 5, 2019

@michael-o
Pls post a new links and PRs for related issues.
I wanted to say, using given questions, that this caching algorithm should have sanity check for cacheable and eligible objects to cache, simply the pseodo code in my mind:

if ( isValid( obj ) )
{
   doCache( obj );
}

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 5, 2019

Perhaps also the sanity check to introspect nested objects as well.

@michael-o
Copy link
Member

@Tibor17 This isn't my code, @gnodet can you pick this up?

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 5, 2019

@michael-o
I was backtracking the problem, and I have some findings.
I know it is not your code ;-), but you are my ASF partner and @gnodet is dev partner too, so I want to discuss it with you and see your opions.

The field fs of java.io.File was cached in ObjectField -> CacheField 1.
It is called in 2 without ignoring java, javax and jackarta packages.

I think the whole problem is that the fields are eagerly introspected on Java objects in 1 which should be avoided by keeping the CacheField empty. Thus we should change the else to else if ( isQualifiedForInterpolation ) in 3.

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 5, 2019

There are more issues like introspecting static fields.
Improvements: two instance variables can be turned to local variables, etc.

@michael-o
Copy link
Member

But the field isQualifiedForInterpolation applies to the entire class, not just the field of this type. Are you certain about the else if?

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 5, 2019

Of course and that's the point. Because reading the field of Java File does not make sense. It's not our code. What makes sense to me regarding interpolation of java.io.File is to get cannonic path of that file in string form.

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 5, 2019

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 5, 2019

I made the change locally and the unit test StringSearchModelInterpolatorTest passed succesfully.
It's only unit test, so the IT would be more required.

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 5, 2019

@michael-o
We have successfult buid [1] with this fix [2].
Can you have a look?
Should I post a new Jira ticket?
[1]: https://builds.apache.org/job/maven-box/job/maven/job/removed-java-tools-3/2/
[2]: caee9f0

@Tibor17
Copy link
Contributor

Tibor17 commented Aug 7, 2019

@michael-o
Ok, so I will issue new Jira and PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants