Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17274: allow JSON atomic updates to use multiple modifiers or a modifier like 'set' as a field name if child docs not enabled #2451

Merged
merged 9 commits into from May 11, 2024

Conversation

eukaryote
Copy link
Contributor

@eukaryote eukaryote commented May 9, 2024

Fixes problems with the JsonLoader's parsing of atomic update values that use multiple modifiers in a single update or reference a field name that matches a modifier (like set or add).

https://issues.apache.org/jira/browse/SOLR-17274

Description

Fixes two problems with JSON atomic updates where an atomic update value like {"set": {"set": "foo"}} (set a field named set to value foo) or {"foo": {"add": "bar", "remove": "baz"}} (add bar value to and remove baz values from field foo) would throw an exception due to isChildDoc thinking a nested doc was being updated. The exact exception message was:

Unable to index docs with children: the schema must include definitions for both a uniqueKey field and the '_root_' field, using the exact same fieldType

Solution

I modified the isChildDoc method of JsonLoader to check if the schema is usable for child docs, and to return false if not. I also had to revert a part of the logic in parseObjectFieldValue so that it would return all the modifier values when there are multiple. The isChildDoc behavior is unchanged if the schema does support child docs, so the same error with continue to be thrown in that case.

Tests

There weren't any atomic update tests that used the JSON loader previously, so I added a new schema file with the minimal fields needed to test this functionality and some tests for existing functionality via Json and for what is changed by this PR.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

The only tests that failed were the S3 tests, which I believe is common and not necessarily a problem if my changes shouldn't have affected the s3 functionality.

 - this modifies `isChildDoc` to always return false if child docs are
   not possible according to the schema, and restores part of the
   logic of `parseObjectFieldValue` so that it can handle
   multiple modifiers
 - re-enables the ability to use multiple modifiers in a single
   update, like `{"add": "foo", "remove": "bar"}`
 - re-enables the ability to have a field with a name like
   `set` or `remove` and be able to use updates like
   `{"set": {"set": "foo"}`
 - both of these are possible now but only if the schema
   doesn't support child/nested docs; if the schema does
   support them, then functionality is unchanged and either
   will continue throwing an exception
@eukaryote eukaryote changed the title SOLR-17274: allow JSON atomic updates to use multiple modifiers or a modifier like 'set' as a field name in atomic updates if child docs not enabled SOLR-17274: allow JSON atomic updates to use multiple modifiers or a modifier like 'set' as a field name if child docs not enabled May 9, 2024
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test

doc.setField("id", "1");
doc.setField(
"set",
new HashMap<String, String>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map.of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map.of doesn't seem to allow null values, so I can't do Map.of("set", null) here. What do you suggest instead of HashMap?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah well; its okay. I hate the syntax hack with the HashMap subclass but I see its appeal in being able to use it inline like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it looks like one of the checks is failing because of the instance initializer block, so I'll change it to use HashMap but avoid that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@dsmiley
Copy link
Contributor

dsmiley commented May 10, 2024

Thinking of wording for CHANGES.txt:

Improved disambiguation of atomic updates from child docs for JSON based updates (not an issue for other formats).
Multiple atomic updates to a field in one update is supported (again). Schema fields can be named "set", "add", etc. (again). These improvements only apply when the schema doesn't have the root field (no child docs).

@dsmiley
Copy link
Contributor

dsmiley commented May 10, 2024

Alternatively I rather like the title here:

allow JSON atomic updates to use multiple modifiers or a modifier like 'set' as a field name if child docs not enabled

@dsmiley
Copy link
Contributor

dsmiley commented May 10, 2024

I'll merge this tonight unless I hear to the contrary.

@dsmiley dsmiley merged commit 3b30f25 into apache:main May 11, 2024
3 checks passed
dsmiley pushed a commit that referenced this pull request May 14, 2024
…modifier like 'set' as a field name if child docs not enabled (#2451)

* fix two problems with json atomic update [SOLR-17274]

 - this modifies `isChildDoc` to always return false if child docs are
   not possible according to the schema, and restores part of the
   logic of `parseObjectFieldValue` so that it can handle
   multiple modifiers
 - re-enables the ability to use multiple modifiers in a single
   update, like `{"add": "foo", "remove": "bar"}`
 - re-enables the ability to have a field with a name like
   `set` or `remove` and be able to use updates like
   `{"set": {"set": "foo"}`
 - both of these are possible now but only if the schema
   doesn't support child/nested docs; if the schema does
   support them, then functionality is unchanged and either
   will continue throwing an exception

---------

Co-authored-by: Calvin Smith <casmith@lucasfilm.com>
Co-authored-by: David Smiley <dsmiley@apache.org>
(cherry picked from commit 3b30f25)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants