Non-termination of WPI on NJR Benchmarks for Nullness Checker Annotations #6468

nimakarimipour · 2024-02-26T18:12:05Z

Tested on CF versions `3.34.0` and `3.42.0`

I executed WPI on NJR benchmarks to infer annotations for Nullness Checker. And observed that in the majority of benchmarks, WPI consistently performs over a thousand to two thousands iterations without terminating.

I extracted one benchmark out of 288 existing benchmarks and made a repository with all my scripts to run the WPI available here. For your convenience, I stored the result of the first hundred iteration.diff as well in iterations directory for a quick view of the contents in the iteration diffs.

The repository is ready to use, after downloading checker framework jar (details in the repo readme), please run wpi/run_wpi.py which will execute wpi on the benchmarks in dataset.

Please let me know if I made a mistake in the configurations or you need more information to resolve this.

Best,

Nima

The text was updated successfully, but these errors were encountered:

kelloggm · 2024-02-27T22:01:39Z

I still don't know what's causing this, but I'm writing down some notes before I give up on this for the moment in case someone else has some insight here (and so that I can pick this up again later).

The iteration diffs are periodic, but the period (surprisingly) is 5 diffs! That is, diff -q iteration90.diff iteration95.diff has no output (i.e., those two files don't differ). However, there are differences between iteration90.diff and each of iteration91.diff, iteration92.diff, iteration93.diff, and iteration94.diff. I haven't yet figured out what causes this phenomenon, but it is extremely surprising to me: the only periodic behavior I've seen before from WPI before involved periods of exactly two - that is, where there is some kind of flip-flopping behavior. This isn't unique to the examples I gave above - the pattern begins with iteration9.diff and iteration14.diff. Before iteration 9, there is no periodic behavior.

The diffs also vary wildly in size. In the 5 period cycle (e.g., starting with iteration 9), the following hold:

9 is small
10 is huge
11 is small
12 is huge
13 is very small, even smaller than 9 or 11

The oscillation seems to involve inferred monotonic non-null fields and contract annotations, though I'm not sure whether both are needed. My best guess at this point is that either the contract inference code is responsible or there is a bug in the logic for inferring monotonic non-null. My next plan is to investigate these two possibilities with the following experiments, which I don't have time to carry out right now (since the linked repo doesn't make it trivial to use a local copy of the Checker Framework):

to check if the contract inference code is the problem, I'd like to re-run the experiment using a different typechecker that will also induce significant numbers of contracts. The obvious choice is the Index Checker, but the RLC could also work. We should try both.
to check if monotonic non-null support is the issue, I'd disable the support for inferring monotonic non-null by removing the code in NullnessAnnotatedTypeFactory#wpiAdjustForUpdateField and NullnessAnnotatedTypeFactory#wpiAdjustForUpdateNonField. If the loop goes away after we do that, @MonotonicNonNull inference is definitely the problem; this seems like the most likely problem to me.

@nimakarimipour if you have time to carry out one or both of these two experiments tomorrow, that would be great. Hopefully I'll have some time to dig into this more later in the week, but if you want quick results you might have to do some of it yourself :)

erfan-arvan · 2024-03-10T14:08:22Z

Hello everyone,

I wanted to provide an update on our ongoing efforts to diagnose and resolve the issue with WPI not terminating on certain NJR benchmarks as reported by Nima (@nimakarimipour). I have set up a dedicated repository to document our debugging results, which can be found at wpi-njr-debug-results.

Following suggestions from Martin (@kelloggm), I conducted two key experiments to investigate the potential causes of the issue. The first experiment employed the Index Checker to determine if the contract inference code might be at fault. For the second experiment, support for inferring monotonic non-null annotations was disabled to assess its impact on the non-termination issue. In both scenarios, WPI successfully terminated after 9 iterations for the sample benchmark provided by Nima (here), leading us to believe that the inference of @MonotonicNonNull annotations may be closely associated with the observed problem.

Moving forward, I intend to examine the WPI rules and their implementations, particularly those related to @MonotonicNonNull inference, to gain a deeper understanding of the underlying issues. I will continue to share updates as more information becomes available and as we progress toward resolving this problem.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-termination of WPI on NJR Benchmarks for Nullness Checker Annotations #6468

Non-termination of WPI on NJR Benchmarks for Nullness Checker Annotations #6468

nimakarimipour commented Feb 26, 2024 •

edited

kelloggm commented Feb 27, 2024

erfan-arvan commented Mar 10, 2024

Non-termination of WPI on NJR Benchmarks for Nullness Checker Annotations #6468

Non-termination of WPI on NJR Benchmarks for Nullness Checker Annotations #6468

Comments

nimakarimipour commented Feb 26, 2024 • edited

Tested on CF versions 3.34.0 and 3.42.0

kelloggm commented Feb 27, 2024

erfan-arvan commented Mar 10, 2024

nimakarimipour commented Feb 26, 2024 •

edited

Tested on CF versions `3.34.0` and `3.42.0`