Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolve race condition problem #106

Merged
merged 2 commits into from Oct 22, 2020
Merged

Conversation

rolandhe
Copy link
Contributor

I found a race condition problem which cause to throw NoSuchPropertyException when there are to many threads since version 3.1.2. Although it was resolved by trying to call method of "getReadMethod", it will loss some performance, I think it is the best way to resolve race condition.

Comment on lines 2777 to 2778
// if (cacheGetMethod.containsKey(targetClass, propertyName))
// return null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these line instead of commenting them out. Also remove the comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +2881 to +2882
// if (cacheSetMethod.containsKey(targetClass, propertyName))
// return null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these line instead of commenting them out. Also remove the comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines -3955 to 3960
if (method == NULL_REPLACEMENT)
return null;
return method;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks the logic, you will return NULL_REPLACEMENT instead of null

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this? As far I understand this will return NULL_REPLACEMENT which was set in put

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry,I can't catch what you said.
Yes, It will return NULL_REPLACEMENT if you put.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far I understand, the previous code will return a raw null if method from cache equals to NULL_REPLACEMENT (which is initialised to ClassPropertyMethodCache#get). You've changed the code to return method directly without checking if this is NULL_REPLACEMENT, so this means you will return NULL_REPLACEMENT instead of null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now checked in the callers (getGetMethod and getSetMethod).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @harawata for clarification :)

Comment on lines 3977 to 3984
// boolean containsKey(Class clazz, String propertyName)
// {
// ConcurrentHashMap<String,Method> methodsByPropertyName = this.cache.get(clazz);
// if (methodsByPropertyName == null)
// return false;
//
// return methodsByPropertyName.containsKey(propertyName);
// }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the code instead of commenting it out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@lukaszlenart
Copy link
Collaborator

@harawata @JCgH4164838Gh792C124B5 @yasserzamani maybe you would like to also check this change.

@harawata
Copy link
Contributor

Thank you, @rolandhe !
And thank you for the heads-up, @lukaszlenart !

I just took a quick look and didn't find any logical issue. 👍

@@ -3963,19 +3966,22 @@ void put(Class clazz, String propertyName, Method method)
if (methodsByPropertyName == null)
{
methodsByPropertyName = new ConcurrentHashMap<String, Method>();
this.cache.put(clazz, methodsByPropertyName);
ConcurrentHashMap<String,Method> old = this.cache.putIfAbsent(clazz, methodsByPropertyName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think above if already has checked that it's absent. Maybe you would replace the whole if statement with?

methodsByPropertyName = this.cache.putIfAbsent(clazz, new ConcurrentHashMap<String, Method>());

Copy link
Contributor Author

@rolandhe rolandhe Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`

/**
* {@inheritdoc}
*
* @return the previous value associated with the specified key,
* or {@code null} if there was no mapping for the key
* @throws NullPointerException if the specified key or value is null
*/
public V putIfAbsent(K key, V value)

`

"if" check and then put, this is classic race condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I meant isn't following piece redundant:

            ConcurrentHashMap<String,Method> methodsByPropertyName = this.cache.get(clazz);
            if (methodsByPropertyName == null)
            {

because you've already used putIfAbsent inside the if.

Copy link
Contributor Author

@rolandhe rolandhe Oct 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using read-compare-write style, which would reduce “new” operation, and very few method call new concurrent map object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 thanks!

@JCgH4164838Gh792C124B5
Copy link
Contributor

Hello @rolandhe . Thank you for reporting the race condition scenario (and thanks to @lukaszlenart for advising about the PR conversation). 👍

I read the proposed code changes in the PR and subsequent comments, but remain unsure if the proposed changes will work as intended. Can you tell us approximately how many concurrent threads were involved (10's, 100's, 1000's, etc.) before the symptom started being evident ?

Also, could you provide a small example (inline in a comment here, or maybe a link to small reproducer project) that reproduces the condition (causing NoSuchPropertyException to be thrown with some consistency) ?

Since changes to the getGetMethod() and getSetMethod() could have some subtle side-effects to the behaviour and performance of OGNL (and the applications that use it), some extra caution may be needed when evaluating the proposed changes. A reproducer, that can be used to cause the condition, would help evaluate proposed changes.

There may be a trade-off involved in changes to the "two-level-cache logic" that is currently in place for OGNL. Structural changes to avoid the NoSuchPropertyException condition could change the performance characteristics of OGNL. For that reason, the probability/frequency of the issue in OGNL 3.1.x should probably be considered against performance impacts.

If the race condition can be eliminated (or reduced) without any substantial reduction in performance, or other undesirable side-effects, that would be ideal. If not, then it may be a "judgment call" as to what is "better" (and maybe even something requiring a flag/choice).

Thanks again, @rolandhe. Any additional details you can provide will help out. 😄

@rolandhe
Copy link
Contributor Author

rolandhe commented Oct 9, 2020

Hello @rolandhe . Thank you for reporting the race condition scenario (and thanks to @lukaszlenart for advising about the PR conversation). 👍

I read the proposed code changes in the PR and subsequent comments, but remain unsure if the proposed changes will work as intended. Can you tell us approximately how many concurrent threads were involved (10's, 100's, 1000's, etc.) before the symptom started being evident ?

Also, could you provide a small example (inline in a comment here, or maybe a link to small reproducer project) that reproduces the condition (causing NoSuchPropertyException to be thrown with some consistency) ?

Since changes to the getGetMethod() and getSetMethod() could have some subtle side-effects to the behaviour and performance of OGNL (and the applications that use it), some extra caution may be needed when evaluating the proposed changes. A reproducer, that can be used to cause the condition, would help evaluate proposed changes.

There may be a trade-off involved in changes to the "two-level-cache logic" that is currently in place for OGNL. Structural changes to avoid the NoSuchPropertyException condition could change the performance characteristics of OGNL. For that reason, the probability/frequency of the issue in OGNL 3.1.x should probably be considered against performance impacts.

If the race condition can be eliminated (or reduced) without any substantial reduction in performance, or other undesirable side-effects, that would be ideal. If not, then it may be a "judgment call" as to what is "better" (and maybe even something requiring a flag/choice).

Thanks again, @rolandhe. Any additional details you can provide will help out. 😄

I committed example that reproduce condition exception, it is ognl.race.RaceTestCase#testOgnlRace,,before you run this code you should remove lines 2147 and 2148 of OgnlRuntime.java. Maybe you would run many times and then produce the "NoSuchPropertyException".

@JCgH4164838Gh792C124B5
Copy link
Contributor

Hello @rolandhe (and everyone).

Thanks for providing the reproducer test code. 😄

Unfortunately, using the reproducer code I was unable to reproduce any failures or exceptions with the existing OGNL 3.2.16-SNAPSHOT code (many executions, with variations from a few threads to 1000s). Adding an additional test that called the original test multiple times with varied numbers of threads and iterations, along with cache clears, did not generate any exceptions either.

I also tried the test code with the OgnlRuntime from this PR (both "as-is" and with lines 2147-2148 commented out), using the same sorts of tests against the original code. There were no failures or exceptions in that case either.

It seems that in practical terms, the probability of the race condition in the original code appears to be very slight, otherwise we would expect to be able to reproduce it with some frequency (at least with 100's or 1000's of threads and many repeated runs).

I am not sure if the reproducer test, given it cannot (consistently) demonstrate the presence or absence of the race condition failure, should be included in the final revision of this PR ? Others might have different thoughts on that.

The good news is that after running multiple builds, with both the old code and the new code from this PR, there were no indications of any observable performance differences between the two. 😄

With no apparent performance impact seen, the proposed OgnlRuntime logic change seems OK to me. Good work! 👍 👏

@harawata
Copy link
Contributor

Just FYI, here is the repro I used when investigating the NoSuchPropertyException before (1) (2) .

import ognl.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;
import java.lang.reflect.*;

public class GetGetMethodRaceCondition {
  public static void main(String[] args) throws Exception {
    int run = Integer.valueOf(args[0]);
    List<Future<Object>> futures = new ArrayList<>();
    ExecutorService executor = Executors.newCachedThreadPool();
    try {
      IntStream.range(0, run).forEach(i -> {
        futures.add(executor.submit(() -> {
          return OgnlRuntime.getGetMethod(null, String.class, "bytes");
        }));
      });
      for (int i = 0; i < run; i++) {
        if (futures.get(i).get() == null) {
          System.out.println("!!! ERROR !!!! at round " + i);
        }
      }
    } finally {
      executor.shutdown();
    }
  }
}

To reproduce the issue reliably, I had to use a shell script like the below.

#!/bin/sh

# compile
javac GetGetMethodRaceCondition.java -cp ognl-3.2.15.jar

echo ''
# run rounds
for i in {1..100}
do
  java -cp ognl-3.2.15.jar:javassist-3.27.0-GA.jar:. GetGetMethodRaceCondition 100
done

p.s.
The direct cause of NoSuchPropertyException was in getReadMethod and it was fixed in #104 , so 3.2.15 won't throw NoSuchProprtyException even with this race condition exist, I believe.

@rolandhe
Copy link
Contributor Author

rolandhe commented Oct 13, 2020

Hello @rolandhe (and everyone).

Thanks for providing the reproducer test code. 😄

Unfortunately, using the reproducer code I was unable to reproduce any failures or exceptions with the existing OGNL 3.2.16-SNAPSHOT code (many executions, with variations from a few threads to 1000s). Adding an additional test that called the original test multiple times with varied numbers of threads and iterations, along with cache clears, did not generate any exceptions either.

I also tried the test code with the OgnlRuntime from this PR (both "as-is" and with lines 2147-2148 commented out), using the same sorts of tests against the original code. There were no failures or exceptions in that case either.

It seems that in practical terms, the probability of the race condition in the original code appears to be very slight, otherwise we would expect to be able to reproduce it with some frequency (at least with 100's or 1000's of threads and many repeated runs).

I am not sure if the reproducer test, given it cannot (consistently) demonstrate the presence or absence of the race condition failure, should be included in the final revision of this PR ? Others might have different thoughts on that.

The good news is that after running multiple builds, with both the old code and the new code from this PR, there were no indications of any observable performance differences between the two. 😄

With no apparent performance impact seen, the proposed OgnlRuntime logic change seems OK to me. Good work! 👍 👏

Please copy test code to your MASTER branch code, NOT this PR code, then remove or comment next two lines:

if (m == null) m = getReadMethod((target == null) ? null : target.getClass(), propertyName, null);

then run the test code, maybe run many times, I tested five times, and reproduce.

image

@JCgH4164838Gh792C124B5
Copy link
Contributor

Hi @rolandhe .

Sorry if it was not clear from the earlier comment, but I previously had copied and ran the test code against the master branch of OGNL 3.2. I tested both with the original branch state for getMethodValue(), as well as with the two lines commented out, but did not see any failures in multiple runs.

Today I repeated the same tests again many times manually, and was finally able to reproduce the failure (using the original OGNL 3.2 OgnlRuntime, with the two lines commented out of getMethodValue()). Without commenting out those two lines, I never saw the error being produced.

I also performed the test with the OgnlRuntime code from this PR in its original form, and with the same lines commented out in getMethodValue() for comparison, and did not see any failures in multiple runs.

Using a process like the sample @harawata provided above (thanks @harawata 😄 ), it was easier to produce the error more consistently as well.

Anyway, your PR's changes appears to eliminate the race without any noticeable side-effects. Thanks for your work on this. 👏

Copy link
Collaborator

@lukaszlenart lukaszlenart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@lukaszlenart lukaszlenart merged commit 99d53ae into orphan-oss:master Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants