Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce storage required for indexing - stop writing sp_name, res_type, and sp_updated to hfj_spidx_* tables #5941

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

volodymyr-korzh
Copy link
Collaborator

@volodymyr-korzh volodymyr-korzh commented May 15, 2024

Migration:

  • migrated all HFJ_SPIDX tables to allow SP_NAME and RES_TYPE columns to be nullable.
  • Migration runs with failureAllowed(), as both SP_NAME and RES_TYPE could be included in custom indexes, and SQL Server won't let us change nullability on columns with indexes pointing to them.

Optimization changes:

  • added IndexStorageOptimizationListener - it is applied only to HFJ_SPIDX tables - nulling SP_NAME, RES_TYPE SP_UPDATED if this feature is enabled.
  • IndexStorageOptimizationListener uses JpaSearchParamCache to recover SP_NAME, RES_TYPE from hash_identity after loading from DB.
  • HASH_IDENTITY field was added to BaseResourceIndexedSearchParam entity and removed from all inheritor entities.
  • Equals and hashCode methods for all ResourceIndexedSearchParam now using HASH_IDENTITY instead of sp_name and res_type. This is required to make ResourceIndexedSearchParam objects with and without optimization to be equal - to not cause unnecessary ResourceIndexedSearchParams updates. (as we are comparing db version of entities with in-memory built Search params)
  • Updated DaoSearchParamSynchronizer logic to check whether it is needed to update existing search parameters after isIndexStorageOptimized change
  • Exception is thrown during the startup if isIncludePartitionInSearchHashes and isIndexStorageOptimized are enabled on server. (isIncludePartitionInSearchHashes is not supported if isIndexStorageOptimized is set to true)
  • InMemoryResourceMatcher now uses hashIdentity to filter SearchParams instead of sp_name. (only if optimization is enabled)
  • BaseSearchParamPredicateBuilder now uses hashIdentity instead of SP_NAME, RES_TYPE to build a query. This way new optimization could work in pair with Enabled IndexMissingFields setting. (only if optimization is enabled)
  • Added new tests and documentation.

…_type and sp_updated columns of HFJ_SPIDX tables nullable
Copy link

github-actions bot commented May 15, 2024

This Pull Request has failed the formatting check

Please run mvn spotless:apply or mvn clean install -DskipTests to fix the formatting issues.

You can automate this auto-formatting process to execute on the git pre-push hook, by installing pre-commit and then calling pre-commit install --hook-type pre-push. This will cause formatting to run automatically whenever you push.

Copy link

codecov bot commented May 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.51%. Comparing base (497b9f2) to head (0e40bf1).
Report is 87 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #5941      +/-   ##
============================================
+ Coverage     83.39%   83.51%   +0.11%     
- Complexity    26927    27219     +292     
============================================
  Files          1681     1698      +17     
  Lines        103965   105204    +1239     
  Branches      13189    13308     +119     
============================================
+ Hits          86702    87858    +1156     
- Misses        11613    11656      +43     
- Partials       5650     5690      +40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@volodymyr-korzh volodymyr-korzh marked this pull request as ready for review May 30, 2024 17:41
@volodymyr-korzh volodymyr-korzh requested a review from a team as a code owner May 30, 2024 17:41
Copy link
Collaborator

@tadgh tadgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending various comments.

@@ -135,6 +135,104 @@ protected void init740() {
.toColumn("RES_ID")
.references("HFJ_RESOURCE", "RES_ID");
}

// Allow null values in SP_NAME, RES_TYPE columns for all HFJ_SPIDX_* tables. These are marked as failure
// allowed, since SQL Server won't let us change nullability on columns with indexes pointing to them.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so.... what happens on SQL server? they just cant use this feature?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added this comment related to SQL server to handle rare case when SP_NAME, RES_TYPE columns are included in custom(client) index. 



ca.uhn.fhir.jpa.model.entity.StorageSettings#setIndexMissingFields
 
* <p>
 * The following index may need to be added into the indexed tables such as <code>HFJ_SPIDX_TOKEN</code>
 * to improve the search performance while <code>:missing</code> is enabled.
 * <code>RES_TYPE, SP_NAME, SP_MISSING</code>
 * </p>
 */
public void setIndexMissingFields(IndexEnabledEnum theIndexMissingFields) {
	Validate.notNull(theIndexMissingFields, "theIndexMissingFields must not be null");
	myIndexMissingFieldsEnabled = theIndexMissingFields;
}

Here is a javadoc where we added a recommendation to create RES_TYPE, SP_NAME, SP_MISSING index if IndexMissingFields is enabled. If this index was created for SQL server DB, migration won’t change RES_TYPE, SP_NAME nullability. And yes, it this case new optimization feature won’t work.



Since IndexMissingFields is disabled by default, and won’t be enabled in most cases (as it impact write performance and increases db size), there should be no issues with SQL serve DB.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not just not run the migrations on sql server? instead of running them and having them fail? look into onlyappliestoplatforms()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But RES_TYPE, SP_NAME, SP_MISSING index will not be created in 99% cases, but we will drop support of SQL server DB for this feature. Do we need this ?

}
conditions.add(BinaryCondition.equalTo(getMissingColumn(), generatePlaceholder(theMissing)));

ComboCondition condition = ComboCondition.and(conditions.toArray());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like michael/ken/james to review this section

indexedSearchParamOptional.get().getParameterName());
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this code is doing exactly? this last method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is added to recover res_type and sp_name values from search parameter hash_identity.

If new optimization feature is enable, when ResourceIndexedSearchParam entity is loaded from DB res_type and sp_name are null, so we reconstruct these values.

At first, this logic was added to allow usage of ResourceIndexedSearchParam.getParamName() InMemoryResourceMatcher.java. But later InMemoryResourceMatcher was tweaked to use hash_identity instead of ParamName if new feature is enabled.

As of now, recovered res_type and sp_name values are used mostly by tests. It is possible to completely switch all tests to use hash_identity instead. However, I think we can leave this logic, as res_type and sp_name values could be helpful during debugging/troubleshooting and it is relatively cheap operation performance-wise.

Note: res_type and sp_name values are not recovered if StorageSettings#isIndexOnContainedResources,
StorageSettings#isIndexOnContainedResourcesRecursively are enabled

…-required-for-indexing-tables

# Conflicts:
#	hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamCoordsTest.java
#	hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamDateTest.java
#	hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamQuantityNormalizedTest.java
#	hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamQuantityTest.java
#	hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamStringTest.java
#	hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamTokenTest.java
#	hapi-fhir-jpaserver-model/src/test/java/ca/uhn/fhir/jpa/model/entity/ResourceIndexedSearchParamUriTest.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce storage required for indexing - stop writing sp_name, res_type, and sp_updated to hfj_spidx_* tables
3 participants