Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolves #1691: Lucene negative queries #1694

Merged
merged 3 commits into from Jun 3, 2022

Conversation

MMcM
Copy link
Contributor

@MMcM MMcM commented May 25, 2022

At root a new clause class that produces a negated boolean.
But since Lucene represents that as another Occur, the clause actually implements set subtraction.
The planner tries to maximize the cases where the subtraction is from a positive query rather than just *:*.

Includes #1675 because of conflicts.

@foundationdb-ci
Copy link
Contributor

AWS CodeBuild CI Report for Linux CentOS 7

  • CodeBuild project: fdb-record-layer-pr-proto2
  • Commit ID: 42daece
  • Result: SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

AWS CodeBuild CI Report for Linux CentOS 7

  • CodeBuild project: fdb-record-layer-pr-proto3
  • Commit ID: 42daece
  • Result: SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@MMcM MMcM requested a review from alecgrieser May 25, 2022 04:36
}
if (filterMask != null && filterMask.getUnsatisfiedFilters().isEmpty()) {
filterMask.setSatisfied(true);
}
if (!negatedChildren.isEmpty()) {
return new LuceneNotQuery(childClauses, negatedChildren);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone did something like lucene_clause_a OR NOT lucene_clause_b, could this transformation lead to this being translated into something like (lucene_clause_a) -(lucene_clause_b) (effectively turning the OR to an AND)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think so. LuceneNotQuery always has occur = MUST, so it is not possible for negatedChildren to be added to in the SHOULD case. Note how flattening only happens when occur matches.

clauses.addAll(notQuery.getNegatedChildren());
}
return new LuceneBooleanQuery(clauses, BooleanClause.Occur.SHOULD);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a missing case here for if occur is SHOULD instead of MUST or MUST_NOT? I'm a little worried that if we're not careful, this could result in accidentally including (*:*) in more queries than we'd like, but maybe that's a danger with negative queries more broadly

Copy link
Contributor Author

@MMcM MMcM May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A boolean query only has SHOULD or MUST. Those are the two branches of the if. NOT is not represented as one with MUST_NOT, but rather as MUST with a set of negated children.

I think this can be made clearer here by using switch. I did that.

More broadly, I tried a few approaches to reconcile the Boolean algebra trees that queries have with the Boolean queries with occurrence requirements that Lucene has. None of them were entirely satisfactory. One whose behavior was slightly more obvious was where the flattening and the addition of any *:* happened at bind time rather than plan time. In the end, though, I decided it was true to the intent of the API to have all such manipulation happen early and to have bind only be about deferring the concrete representation until the concrete values were known and no other rewriting.

@foundationdb-ci
Copy link
Contributor

AWS CodeBuild CI Report for Linux CentOS 7

  • CodeBuild project: fdb-record-layer-pr-proto2
  • Commit ID: d23fe98
  • Result: SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

AWS CodeBuild CI Report for Linux CentOS 7

  • CodeBuild project: fdb-record-layer-pr-proto2
  • Commit ID: 8133b78
  • Result: SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

AWS CodeBuild CI Report for Linux CentOS 7

  • CodeBuild project: fdb-record-layer-pr-proto3
  • Commit ID: 8133b78
  • Result: SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

alecgrieser
alecgrieser previously approved these changes Jun 3, 2022
@alecgrieser
Copy link
Contributor

I think the stuff in here regarding negative queries looks good. This includes the PR #1675, which might need to be resolved before we merge this one. I'm also a little concerned about merge skew with #1623, so getting another PRB run would probably be good.

MMcM added 3 commits June 2, 2022 17:22
At root a new clause class that produces a negated boolean.
But since Lucene represents that as another Occur, the clause actually implements set subtraction.
The planner tries to maximize the cases where the subtraction is from a positive query rather than just *:*.
@sonarcloud
Copy link

sonarcloud bot commented Jun 3, 2022

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

0.0% 0.0% Coverage
0.0% 0.0% Duplication

@foundationdb-ci
Copy link
Contributor

AWS CodeBuild CI Report for Linux CentOS 7

  • CodeBuild project: fdb-record-layer-pr-proto2
  • Commit ID: 2d8ddd8
  • Result: SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

AWS CodeBuild CI Report for Linux CentOS 7

  • CodeBuild project: fdb-record-layer-pr-proto3
  • Commit ID: 2d8ddd8
  • Result: SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)

@alecgrieser alecgrieser merged commit d86feac into FoundationDB:main Jun 3, 2022
@MMcM MMcM deleted the lucene-not-query branch June 3, 2022 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants