Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-termination of WPI on NJR Benchmarks for Nullness Checker Annotations #6468

Open
nimakarimipour opened this issue Feb 26, 2024 · 2 comments

Comments

@nimakarimipour
Copy link
Contributor

nimakarimipour commented Feb 26, 2024

Tested on CF versions 3.34.0 and 3.42.0


I executed WPI on NJR benchmarks to infer annotations for Nullness Checker. And observed that in the majority of benchmarks, WPI consistently performs over a thousand to two thousands iterations without terminating.

I extracted one benchmark out of 288 existing benchmarks and made a repository with all my scripts to run the WPI available here. For your convenience, I stored the result of the first hundred iteration.diff as well in iterations directory for a quick view of the contents in the iteration diffs.

The repository is ready to use, after downloading checker framework jar (details in the repo readme), please run wpi/run_wpi.py which will execute wpi on the benchmarks in dataset.

Please let me know if I made a mistake in the configurations or you need more information to resolve this.

Best,

Nima

@kelloggm
Copy link
Contributor

I still don't know what's causing this, but I'm writing down some notes before I give up on this for the moment in case someone else has some insight here (and so that I can pick this up again later).

The iteration diffs are periodic, but the period (surprisingly) is 5 diffs! That is, diff -q iteration90.diff iteration95.diff has no output (i.e., those two files don't differ). However, there are differences between iteration90.diff and each of iteration91.diff, iteration92.diff, iteration93.diff, and iteration94.diff. I haven't yet figured out what causes this phenomenon, but it is extremely surprising to me: the only periodic behavior I've seen before from WPI before involved periods of exactly two - that is, where there is some kind of flip-flopping behavior. This isn't unique to the examples I gave above - the pattern begins with iteration9.diff and iteration14.diff. Before iteration 9, there is no periodic behavior.

The diffs also vary wildly in size. In the 5 period cycle (e.g., starting with iteration 9), the following hold:

  • 9 is small
  • 10 is huge
  • 11 is small
  • 12 is huge
  • 13 is very small, even smaller than 9 or 11

The oscillation seems to involve inferred monotonic non-null fields and contract annotations, though I'm not sure whether both are needed. My best guess at this point is that either the contract inference code is responsible or there is a bug in the logic for inferring monotonic non-null. My next plan is to investigate these two possibilities with the following experiments, which I don't have time to carry out right now (since the linked repo doesn't make it trivial to use a local copy of the Checker Framework):

  • to check if the contract inference code is the problem, I'd like to re-run the experiment using a different typechecker that will also induce significant numbers of contracts. The obvious choice is the Index Checker, but the RLC could also work. We should try both.
  • to check if monotonic non-null support is the issue, I'd disable the support for inferring monotonic non-null by removing the code in NullnessAnnotatedTypeFactory#wpiAdjustForUpdateField and NullnessAnnotatedTypeFactory#wpiAdjustForUpdateNonField. If the loop goes away after we do that, @MonotonicNonNull inference is definitely the problem; this seems like the most likely problem to me.

@nimakarimipour if you have time to carry out one or both of these two experiments tomorrow, that would be great. Hopefully I'll have some time to dig into this more later in the week, but if you want quick results you might have to do some of it yourself :)

@erfan-arvan
Copy link

Hello everyone,

I wanted to provide an update on our ongoing efforts to diagnose and resolve the issue with WPI not terminating on certain NJR benchmarks as reported by Nima (@nimakarimipour). I have set up a dedicated repository to document our debugging results, which can be found at wpi-njr-debug-results.

Following suggestions from Martin (@kelloggm), I conducted two key experiments to investigate the potential causes of the issue. The first experiment employed the Index Checker to determine if the contract inference code might be at fault. For the second experiment, support for inferring monotonic non-null annotations was disabled to assess its impact on the non-termination issue. In both scenarios, WPI successfully terminated after 9 iterations for the sample benchmark provided by Nima (here), leading us to believe that the inference of @MonotonicNonNull annotations may be closely associated with the observed problem.

Moving forward, I intend to examine the WPI rules and their implementations, particularly those related to @MonotonicNonNull inference, to gain a deeper understanding of the underlying issues. I will continue to share updates as more information becomes available and as we progress toward resolving this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants