[BugFix] deduplicate partition range for multi-table join mv #45773

mofeiatwork · 2024-05-17T01:26:33Z

Why I'm doing:

If the base tables of a multi-table join MV have different partition names, their partition ranges should be deduplicated instead of creating duplicate, overlapping partitions.
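To make the scenario concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes two base tables name the same range differently and that their partitions end up in one name-to-range map. `DayRange`, `dedupByRange`, and the partition names are hypothetical stand-ins (the real code works with `Range<PartitionKey>`), and the sketch deduplicates by exact range equality only.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class PartitionNameDedupSketch {
    // Hypothetical stand-in for Range<PartitionKey>: half-open interval [lower, upper).
    static final class DayRange {
        final String lower;
        final String upper;

        DayRange(String lower, String upper) {
            this.lower = lower;
            this.upper = upper;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof DayRange
                    && lower.equals(((DayRange) o).lower)
                    && upper.equals(((DayRange) o).upper);
        }

        @Override
        public int hashCode() {
            return Objects.hash(lower, upper);
        }
    }

    // Keeps only the first partition name seen for each distinct range.
    static Map<String, DayRange> dedupByRange(Map<String, DayRange> baseRanges) {
        Set<DayRange> seen = new HashSet<>();
        Map<String, DayRange> unique = new LinkedHashMap<>();
        for (Map.Entry<String, DayRange> e : baseRanges.entrySet()) {
            if (seen.add(e.getValue())) {
                unique.put(e.getKey(), e.getValue());
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Map<String, DayRange> base = new LinkedHashMap<>();
        base.put("p20240516", new DayRange("2024-05-16", "2024-05-17")); // name in one base table
        base.put("p1", new DayRange("2024-05-16", "2024-05-17"));        // same range, other name
        base.put("p20240517", new DayRange("2024-05-17", "2024-05-18"));
        System.out.println(dedupByRange(base).keySet()); // prints [p20240516, p20240517]
    }
}
```

Without the `seen` check, both `p20240516` and `p1` would be carried forward and the MV would be asked to create two partitions covering the same range.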

What I'm doing:

Fixes #issue

What type of PR is this:

Does this PR entail a change in behavior?

Yes, this PR will result in a change in behavior.
No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

Interface/UI changes: syntax, type conversion, expression evaluation, display information
Parameter changes: default values, similar parameters but with different default values
Policy changes: use new policy to replace old one, functionality automatically enabled
Feature removed
Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

I have added test cases for my bug fix or my new feature
This pr needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function
This is a backport pr

Bugfix cherry-pick branch check:

Signed-off-by: Murphy <mofei@starrocks.com>

starrocks-cr · 2024-05-17T01:28:54Z

fe/fe-core/src/main/java/com/starrocks/sql/common/SyncPartitionUtils.java

+            }
+        }
+        return differ != null ? differ.diff(unique, mvRangeMap) :
+                PartitionDiffer.simpleDiff(unique, mvRangeMap);
    }

    public static ListPartitionDiff getListPartitionDiff(Map<String, List<List<String>>> baseListMap,


The most risky bug in this code is:
Incorrect usage of RangeSet leading to logic errors or unexpected behavior.

You can modify the code like this:

    import com.google.common.collect.RangeSet;
    import com.google.common.collect.TreeRangeSet;
    // Other imports remain unchanged

    public static RangePartitionDiff getRangePartitionDiffOfSlotRef(Map<String, Range<PartitionKey>> baseRangeMap,
                                                                    Map<String, Range<PartitionKey>> mvRangeMap,
                                                                    PartitionDiffer differ) {
        // Original method contents up to the modification point
        RangeSet<PartitionKey> ranges = TreeRangeSet.create();
        Map<String, Range<PartitionKey>> unique = Maps.newHashMap();
        for (Map.Entry<String, Range<PartitionKey>> entry : baseRangeMap.entrySet()) {
            if (!ranges.encloses(entry.getValue())) {
                ranges.add(entry.getValue());
                unique.put(entry.getKey(), entry.getValue());
            }
        }
        return differ != null ? differ.diff(unique, mvRangeMap) :
                PartitionDiffer.simpleDiff(unique, mvRangeMap);
    }

Explanation:
The most critical issue with the original snippet is how it uses the RangeSet and checks for enclosure before adding a range. The flaw lies in a misunderstanding of the encloses method of Guava's RangeSet. Checking whether the RangeSet already encloses a range before adding it can miss cases where two non-overlapping but adjacent ranges could be unified into a single continuous range that should logically be treated as one. Without a detailed understanding of the surrounding application context and intended logic, this may not be the only risk, but it stands out because of the logical inconsistencies it can introduce when handling contiguous or overlapping ranges.

Furthermore, there is no direct bug that would lead to immediate crashes or exceptions. Rather, the concern is whether the logic accurately reflects the intent of the partition range manipulation, which is crucial for the proper functioning of any partitioning algorithm in a database system, particularly for range-based partitioning schemes.
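For readers who want to see how an enclosure check behaves with adjacent and partially overlapping ranges, the following is a rough sketch under stated assumptions. `IntRangeSet` is a hypothetical stand-in written for this note. It is meant to mimic, for half-open integer intervals, the coalescing behavior Guava documents for `TreeRangeSet` (connected ranges are merged on `add`); it is not the Guava implementation and not the code in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class EnclosureDedupSketch {
    // Hypothetical coalescing range set over half-open int intervals [lo, hi).
    // Maps each stored interval's lower bound to its upper bound; intervals stay disjoint.
    static final class IntRangeSet {
        private final TreeMap<Integer, Integer> spans = new TreeMap<>();

        // True if a single stored interval covers [lo, hi).
        boolean encloses(int lo, int hi) {
            Map.Entry<Integer, Integer> below = spans.floorEntry(lo);
            return below != null && below.getValue() >= hi;
        }

        // Adds [lo, hi), merging it with any overlapping or touching intervals.
        void add(int lo, int hi) {
            Map.Entry<Integer, Integer> below = spans.floorEntry(lo);
            if (below != null && below.getValue() >= lo) {
                lo = below.getKey();
                hi = Math.max(hi, below.getValue());
            }
            Map.Entry<Integer, Integer> next = spans.ceilingEntry(lo);
            while (next != null && next.getKey() <= hi) {
                hi = Math.max(hi, next.getValue());
                spans.remove(next.getKey());
                next = spans.ceilingEntry(lo);
            }
            spans.put(lo, hi);
        }
    }

    // Same loop shape as the review snippet: keep a range only if it is not already enclosed.
    static List<String> keptNames(Map<String, int[]> baseRanges) {
        IntRangeSet ranges = new IntRangeSet();
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, int[]> e : baseRanges.entrySet()) {
            int lo = e.getValue()[0];
            int hi = e.getValue()[1];
            if (!ranges.encloses(lo, hi)) {
                ranges.add(lo, hi);
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, int[]> base = new LinkedHashMap<>();
        base.put("a", new int[] {1, 3});
        base.put("b", new int[] {3, 5}); // adjacent to a; the set coalesces to [1, 5)
        base.put("c", new int[] {1, 5}); // enclosed by the union of a and b, so skipped
        base.put("d", new int[] {4, 7}); // partial overlap only, so still kept
        System.out.println(keptNames(base)); // prints [a, b, d]
    }
}
```

In this sketch a range that spans two earlier adjacent ranges is dropped, while a range that only partially overlaps earlier ones is kept. Whether either outcome is desirable depends on the partitioning semantics the MV refresh expects.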

Signed-off-by: Murphy <mofei@starrocks.com>

sonarcloud · 2024-05-17T20:33:13Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

github-actions · 2024-05-17T21:50:41Z

[FE Incremental Coverage Report]

✅ pass : 12 / 12 (100.00%)

file detail

	path	covered_line	new_line	coverage	not_covered_line_detail
🔵	com/starrocks/sql/common/SyncPartitionUtils.java	10	10	100.00%	[]
🔵	com/starrocks/connector/ConnectorPartitionTraits.java	2	2	100.00%	[]

github-actions · 2024-05-17T21:50:52Z

[BE Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

github-actions · 2024-05-20T16:47:11Z

@Mergifyio backport branch-3.3

mergify · 2024-05-20T16:47:35Z

backport branch-3.3

✅ Backports have been created

#45954 [BugFix] deduplicate partition range for multi-table join mv (backport #45773) has been created for branch branch-3.3 but encountered conflicts

Signed-off-by: Murphy <mofei@starrocks.com> (cherry picked from commit 2bb8e40) # Conflicts: # fe/fe-core/src/test/java/com/starrocks/scheduler/PartitionBasedMvRefreshTest.java

…#45773) (#45954) Co-authored-by: Murphy <96611012+mofeiatwork@users.noreply.github.com>

deduplicate partition range for multi-table join mv

251ca43

Signed-off-by: Murphy <mofei@starrocks.com>

github-actions bot added the 3.3 label May 17, 2024

mergify bot assigned mofeiatwork May 17, 2024

starrocks-cr bot reviewed May 17, 2024

check partition type

34f5ca0

Signed-off-by: Murphy <mofei@starrocks.com>

mofeiatwork requested a review from a team as a code owner May 17, 2024 02:24

ABingHuang approved these changes May 20, 2024

LiShuMing approved these changes May 20, 2024

DorianZheng approved these changes May 20, 2024

mofeiatwork merged commit 2bb8e40 into StarRocks:main May 20, 2024
52 checks passed

github-actions bot removed the 3.3 label May 20, 2024

mergify bot mentioned this pull request May 20, 2024

[BugFix] deduplicate partition range for multi-table join mv (backport #45773) #45954

Merged

42 tasks

wanpengfei-git pushed a commit that referenced this pull request May 20, 2024

[BugFix] deduplicate partition range for multi-table join mv (backport …

bbe59e2

…#45773) (#45954) Co-authored-by: Murphy <96611012+mofeiatwork@users.noreply.github.com>

github-actions bot added the 3.3-merged label May 20, 2024
