Reduce storage required for indexing - stop writing sp_name, res_type, and sp_updated to hfj_spidx_* tables #5941

volodymyr-korzh · 2024-05-15T17:05:51Z

Migration:

migrated all HFJ_SPIDX tables to allow SP_NAME and RES_TYPE columns to be nullable.
Migration runs with failureAllowed(), as both SP_NAME and RES_TYPE could be included in custom indexes, and SQL Server won't let us change nullability on columns with indexes pointing to them.

Optimization changes:

added IndexStorageOptimizationListener - it is applied only to HFJ_SPIDX tables - nulling SP_NAME, RES_TYPE SP_UPDATED if this feature is enabled.
IndexStorageOptimizationListener uses JpaSearchParamCache to recover SP_NAME, RES_TYPE from hash_identity after loading from DB.
HASH_IDENTITY field was added to BaseResourceIndexedSearchParam entity and removed from all inheritor entities.
Equals and hashCode methods for all ResourceIndexedSearchParam now using HASH_IDENTITY instead of sp_name and res_type. This is required to make ResourceIndexedSearchParam objects with and without optimization to be equal - to not cause unnecessary ResourceIndexedSearchParams updates. (as we are comparing db version of entities with in-memory built Search params)
Updated DaoSearchParamSynchronizer logic to check whether it is needed to update existing search parameters after isIndexStorageOptimized change
Exception is thrown during the startup if isIncludePartitionInSearchHashes and isIndexStorageOptimized are enabled on server. (isIncludePartitionInSearchHashes is not supported if isIndexStorageOptimized is set to true)
InMemoryResourceMatcher now uses hashIdentity to filter SearchParams instead of sp_name. (only if optimization is enabled)
BaseSearchParamPredicateBuilder now uses hashIdentity instead of SP_NAME, RES_TYPE to build a query. This way new optimization could work in pair with Enabled IndexMissingFields setting. (only if optimization is enabled)
Added new tests and documentation.

…_type and sp_updated columns of HFJ_SPIDX tables nullable

github-actions · 2024-05-15T17:06:58Z

This Pull Request has failed the formatting check

Please run mvn spotless:apply or mvn clean install -DskipTests to fix the formatting issues.

You can automate this auto-formatting process to execute on the git pre-push hook, by installing pre-commit and then calling pre-commit install --hook-type pre-push. This will cause formatting to run automatically whenever you push.

codecov · 2024-05-15T18:31:22Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.51%. Comparing base (497b9f2) to head (0e40bf1).
Report is 87 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #5941      +/-   ##
============================================
+ Coverage     83.39%   83.51%   +0.11%     
- Complexity    26927    27219     +292     
============================================
  Files          1681     1698      +17     
  Lines        103965   105204    +1239     
  Branches      13189    13308     +119     
============================================
+ Hits          86702    87858    +1156     
- Misses        11613    11656      +43     
- Partials       5650     5690      +40

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

...ver-model/src/main/java/ca/uhn/fhir/jpa/model/listener/IndexStorageOptimizationListener.java

… RES_TYPE after update/load

…-required-for-indexing-tables # Conflicts: # hapi-fhir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/migrate/tasks/HapiFhirJpaMigrationTasks.java

…eters if IndexMissingFields and optimizeIndexStorage are both enabled

…arameters

…rect configuration handling

… SP recovery, documentation updates

…-required-for-indexing-tables

… default

…-required-for-indexing-tables

tadgh

Approved pending various comments.

...main/resources/ca/uhn/hapi/fhir/changelog/7_4_0/5937-reduce-storage-for-sp-index-tables.yaml

hapi-fhir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/config/SearchConfig.java

tadgh · 2024-06-05T19:28:40Z

...ir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/migrate/tasks/HapiFhirJpaMigrationTasks.java

@@ -135,6 +135,104 @@ protected void init740() {
 					.toColumn("RES_ID")
 					.references("HFJ_RESOURCE", "RES_ID");
 		}
+
+		// Allow null values in SP_NAME, RES_TYPE columns for all HFJ_SPIDX_* tables. These are marked as failure
+		// allowed, since SQL Server won't let us change nullability on columns with indexes pointing to them.


so.... what happens on SQL server? they just cant use this feature?

I have added this comment related to SQL server to handle rare case when SP_NAME, RES_TYPE columns are included in custom(client) index.   

ca.uhn.fhir.jpa.model.entity.StorageSettings#setIndexMissingFields  * <p> * The following index may need to be added into the indexed tables such as <code>HFJ_SPIDX_TOKEN</code> * to improve the search performance while <code>:missing</code> is enabled. * <code>RES_TYPE, SP_NAME, SP_MISSING</code> * </p> */ public void setIndexMissingFields(IndexEnabledEnum theIndexMissingFields) { Validate.notNull(theIndexMissingFields, "theIndexMissingFields must not be null"); myIndexMissingFieldsEnabled = theIndexMissingFields; }

Here is a javadoc where we added a recommendation to create RES_TYPE, SP_NAME, SP_MISSING index if IndexMissingFields is enabled. If this index was created for SQL server DB, migration won’t change RES_TYPE, SP_NAME nullability. And yes, it this case new optimization feature won’t work.  

Since IndexMissingFields is disabled by default, and won’t be enabled in most cases (as it impact write performance and increases db size), there should be no issues with SQL serve DB.

Could we not just not run the migrations on sql server? instead of running them and having them fail? look into onlyappliestoplatforms()

But RES_TYPE, SP_NAME, SP_MISSING index will not be created in 99% cases, but we will drop support of SQL server DB for this feature. Do we need this ?

tadgh · 2024-06-05T19:29:32Z

.../src/main/java/ca/uhn/fhir/jpa/search/builder/predicate/BaseSearchParamPredicateBuilder.java

+		}
+		conditions.add(BinaryCondition.equalTo(getMissingColumn(), generatePlaceholder(theMissing)));
+
+		ComboCondition condition = ComboCondition.and(conditions.toArray());


I would like michael/ken/james to review this section

...ver-model/src/main/java/ca/uhn/fhir/jpa/model/listener/IndexStorageOptimizationListener.java

tadgh · 2024-06-05T19:33:38Z

...ver-model/src/main/java/ca/uhn/fhir/jpa/model/listener/IndexStorageOptimizationListener.java

+					indexedSearchParamOptional.get().getParameterName());
+		}
+	}
+}


Can you explain what this code is doing exactly? this last method.

This logic is added to recover res_type and sp_name values from search parameter hash_identity.

If new optimization feature is enable, when ResourceIndexedSearchParam entity is loaded from DB res_type and sp_name are null, so we reconstruct these values.

At first, this logic was added to allow usage of ResourceIndexedSearchParam.getParamName() InMemoryResourceMatcher.java. But later InMemoryResourceMatcher was tweaked to use hash_identity instead of ParamName if new feature is enabled.

As of now, recovered res_type and sp_name values are used mostly by tests. It is possible to completely switch all tests to use hash_identity instead. However, I think we can leave this logic, as res_type and sp_name values could be helpful during debugging/troubleshooting and it is relatively cheap operation performance-wise.

Note: res_type and sp_name values are not recovered if StorageSettings#isIndexOnContainedResources,
StorageSettings#isIndexOnContainedResourcesRecursively are enabled

hapi-fhir-jpaserver-model/src/main/java/ca/uhn/fhir/jpa/model/util/SearchParamHash.java

…-required-for-indexing-tables

…-required-for-indexing-tables # Conflicts: # hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamCoordsTest.java # hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamDateTest.java # hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamQuantityNormalizedTest.java # hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamQuantityTest.java # hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamStringTest.java # hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamTokenTest.java # hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamUriTest.java

Reduce storage required for indexing - migration to make sp_name, res…

2b7d735

…_type and sp_updated columns of HFJ_SPIDX tables nullable

volodymyr-korzh self-assigned this May 15, 2024

volodymyr-korzh linked an issue May 15, 2024 that may be closed by this pull request

Reduce storage required for indexing - stop writing sp_name, res_type, and sp_updated to hfj_spidx_* tables #5937

Open

volodymyr-korzh added 3 commits May 15, 2024 14:29

Reduce storage required for indexing - new setting and unit tests

3e8073a

Reduce storage required for indexing - implementation part 1

ff523ab

Reduce storage required for indexing - implementation part 1 fix

658d5d4

jamesagnew reviewed May 17, 2024

View reviewed changes

...ver-model/src/main/java/ca/uhn/fhir/jpa/model/listener/IndexStorageOptimizationListener.java Outdated Show resolved Hide resolved

volodymyr-korzh added 11 commits May 17, 2024 11:23

Reduce storage required for indexing - fixes

c18de2c

Reduce storage required for indexing - added restoring of SP_NAME and…

c411b12

… RES_TYPE after update/load

Merge remote-tracking branch 'origin/master' into 5937-reduce-storage…

f305e1d

…-required-for-indexing-tables # Conflicts: # hapi-fhir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/migrate/tasks/HapiFhirJpaMigrationTasks.java

Reduce storage required for indexing - Fixed search for missing param…

71483dd

…eters if IndexMissingFields and optimizeIndexStorage are both enabled

Reduce storage required for indexing - Fixed search refchain search p…

4ae9c13

…arameters

Reduce storage required for indexing - Migration updated, added incor…

886be87

…rect configuration handling

Reduce storage required for indexing - Tests update, updated logic of…

32c5555

… SP recovery, documentation updates

Merge remote-tracking branch 'origin/master' into 5937-reduce-storage…

18ed7fd

…-required-for-indexing-tables

Reduce storage required for indexing - IndexStorageOptimized false by…

8b836e1

… default

Reduce storage required for indexing - fixes

551ad25

Reduce storage required for indexing - docs update and changelog

fa81205

volodymyr-korzh marked this pull request as ready for review May 30, 2024 17:41

volodymyr-korzh requested a review from a team as a code owner May 30, 2024 17:41

volodymyr-korzh added 2 commits May 30, 2024 18:42

Reduce storage required for indexing - docs update

0b393fa

Merge remote-tracking branch 'origin/master' into 5937-reduce-storage…

5932478

…-required-for-indexing-tables

tadgh approved these changes Jun 5, 2024

View reviewed changes

volodymyr-korzh added 5 commits June 5, 2024 14:20

Reduce storage required for indexing - minor fixes

ee5e9c7

Merge remote-tracking branch 'origin/master' into 5937-reduce-storage…

f9402a3

…-required-for-indexing-tables

Reduce storage required for indexing - assertj migration

5d53fbf

Reduce storage required for indexing - javadoc update

0e40bf1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce storage required for indexing - stop writing sp_name, res_type, and sp_updated to hfj_spidx_* tables #5941

Reduce storage required for indexing - stop writing sp_name, res_type, and sp_updated to hfj_spidx_* tables #5941

volodymyr-korzh commented May 15, 2024 •

edited

github-actions bot commented May 15, 2024 •

edited

codecov bot commented May 15, 2024 •

edited

tadgh left a comment

tadgh Jun 5, 2024

volodymyr-korzh Jun 6, 2024

tadgh Jun 6, 2024

volodymyr-korzh Jun 6, 2024

tadgh Jun 5, 2024

tadgh Jun 5, 2024

volodymyr-korzh Jun 6, 2024

Reduce storage required for indexing - stop writing sp_name, res_type, and sp_updated to hfj_spidx_* tables #5941

Are you sure you want to change the base?

Reduce storage required for indexing - stop writing sp_name, res_type, and sp_updated to hfj_spidx_* tables #5941

Conversation

volodymyr-korzh commented May 15, 2024 • edited

github-actions bot commented May 15, 2024 • edited

codecov bot commented May 15, 2024 • edited

Codecov Report

tadgh left a comment

Choose a reason for hiding this comment

tadgh Jun 5, 2024

Choose a reason for hiding this comment

volodymyr-korzh Jun 6, 2024

Choose a reason for hiding this comment

tadgh Jun 6, 2024

Choose a reason for hiding this comment

volodymyr-korzh Jun 6, 2024

Choose a reason for hiding this comment

tadgh Jun 5, 2024

Choose a reason for hiding this comment

tadgh Jun 5, 2024

Choose a reason for hiding this comment

volodymyr-korzh Jun 6, 2024

Choose a reason for hiding this comment

volodymyr-korzh commented May 15, 2024 •

edited

github-actions bot commented May 15, 2024 •

edited

codecov bot commented May 15, 2024 •

edited