-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BugFix] deduplicate partition range for multi-table join mv #45773
[BugFix] deduplicate partition range for multi-table join mv #45773
Conversation
Signed-off-by: Murphy <mofei@starrocks.com>
} | ||
} | ||
return differ != null ? differ.diff(unique, mvRangeMap) : | ||
PartitionDiffer.simpleDiff(unique, mvRangeMap); | ||
} | ||
|
||
public static ListPartitionDiff getListPartitionDiff(Map<String, List<List<String>>> baseListMap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Incorrect usage of RangeSet leading to logic errors or unexpected behavior.
You can modify the code like this:
import com.google.common.collect.RangeSet;
import com.google.common.collect.TreeRangeSet;
// Other imports remain unchanged
public static RangePartitionDiff getRangePartitionDiffOfSlotRef(Map<String, Range<PartitionKey>> baseRangeMap,
Map<String, Range<PartitionKey>> mvRangeMap,
PartitionDiffer differ) {
// Original method contents up to the modification point
RangeSet<PartitionKey> ranges = TreeRangeSet.create();
Map<String, Range<PartitionKey>> unique = Maps.newHashMap();
for (Map.Entry<String, Range<PartitionKey>> entry : baseRangeMap.entrySet()) {
if (!ranges.encloses(entry.getValue())) {
ranges.add(entry.getValue());
unique.put(entry.getKey(), entry.getValue());
}
}
return differ != null ? differ.diff(unique, mvRangeMap) :
PartitionDiffer.simpleDiff(unique, mvRangeMap);
}
Explanation:
The most critical issue with the original code snippet is how it uses the RangeSet
and checks for enclosures before adding a range. The flaw lies in the misunderstanding of the encloses
method from the Guava library's RangeSet
. When checking if the RangeSet
already encloses a particular range before adding it, you might miss scenarios where two non-overlapping but adjacent ranges could be unified into a single continuous range that logically should be treated as one. However, without a detailed understanding of the surrounding application context and intended logic, this might not be the only risk, but it stands out due to potential logical inconsistencies it introduces in handling ranges, especially with contiguous or overlapping ranges.
Furthermore, there was no direct bug that would lead to immediate crashes or exceptions. Rather, the issue here is about ensuring the logic accurately reflects the intention of partition range manipulation - which is crucial for the proper functioning of any partitioning algorithm in database systems, particularly for range-based partitioning schemes.
Signed-off-by: Murphy <mofei@starrocks.com>
Quality Gate passedIssues Measures |
[FE Incremental Coverage Report]✅ pass : 12 / 12 (100.00%) file detail
|
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
@Mergifyio backport branch-3.3 |
✅ Backports have been created
|
Signed-off-by: Murphy <mofei@starrocks.com> (cherry picked from commit 2bb8e40) # Conflicts: # fe/fe-core/src/test/java/com/starrocks/scheduler/PartitionBasedMvRefreshTest.java
Signed-off-by: Murphy <mofei@starrocks.com> (cherry picked from commit 2bb8e40) # Conflicts: # fe/fe-core/src/test/java/com/starrocks/scheduler/PartitionBasedMvRefreshTest.java
Why I'm doing:
What I'm doing:
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: