[BugFix] deduplicate partition range for multi-table join mv #45773

Merged

Conversation

mofeiatwork
Contributor

@mofeiatwork mofeiatwork commented May 17, 2024

Why I'm doing:

  • If the base tables of a multi-table join MV have different partition names for the same range, the partitions should be deduplicated instead of creating duplicate, overlapping partitions
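
To illustrate the problem, here is a minimal, self-contained sketch. It is not the code from this PR: `PartitionDedupSketch`, `IntRange`, and `dedupByRange` are hypothetical names, integer bounds stand in for `Range<PartitionKey>`, and deduplication is by exact range equality only (the PR itself works with a Guava `RangeSet`).

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class PartitionDedupSketch {
    /** Hypothetical stand-in for Range<PartitionKey>: a half-open range [lower, upper). */
    public static final class IntRange {
        final int lower;
        final int upper;

        public IntRange(int lower, int upper) {
            this.lower = lower;
            this.upper = upper;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof IntRange)) {
                return false;
            }
            IntRange r = (IntRange) o;
            return lower == r.lower && upper == r.upper;
        }

        @Override
        public int hashCode() {
            return Objects.hash(lower, upper);
        }
    }

    /** Keeps only the first partition name seen for each distinct range. */
    public static Map<String, IntRange> dedupByRange(Map<String, IntRange> baseRangeMap) {
        Set<IntRange> seen = new HashSet<>();
        Map<String, IntRange> unique = new LinkedHashMap<>();
        for (Map.Entry<String, IntRange> entry : baseRangeMap.entrySet()) {
            // Set.add returns false when an equal range was already recorded.
            if (seen.add(entry.getValue())) {
                unique.put(entry.getKey(), entry.getValue());
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Two base tables name the same daily range differently.
        Map<String, IntRange> base = new LinkedHashMap<>();
        base.put("p20240101", new IntRange(20240101, 20240102));
        base.put("p1", new IntRange(20240101, 20240102));
        base.put("p20240102", new IntRange(20240102, 20240103));
        System.out.println(dedupByRange(base).keySet()); // prints [p20240101, p20240102]
    }
}
```

Without the deduplication step, both `p20240101` and `p1` would be passed on as separate partitions covering the same range.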

What I'm doing:

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

Signed-off-by: Murphy <mofei@starrocks.com>
}
}
return differ != null ? differ.diff(unique, mvRangeMap) :
PartitionDiffer.simpleDiff(unique, mvRangeMap);
}

public static ListPartitionDiff getListPartitionDiff(Map<String, List<List<String>>> baseListMap,

The riskiest issue in this code is:
Incorrect use of RangeSet, which can lead to logic errors or unexpected behavior.

You can modify the code like this:

import com.google.common.collect.RangeSet;
import com.google.common.collect.TreeRangeSet;
// Other imports remain unchanged

public static RangePartitionDiff getRangePartitionDiffOfSlotRef(Map<String, Range<PartitionKey>> baseRangeMap,
                                                                 Map<String, Range<PartitionKey>> mvRangeMap,
                                                                 PartitionDiffer differ) {
    // Original method contents up to the modification point
    
    RangeSet<PartitionKey> ranges = TreeRangeSet.create();
    Map<String, Range<PartitionKey>> unique = Maps.newHashMap();
    for (Map.Entry<String, Range<PartitionKey>> entry : baseRangeMap.entrySet()) {
        if (!ranges.encloses(entry.getValue())) {
            ranges.add(entry.getValue());
            unique.put(entry.getKey(), entry.getValue());
        }
    }
    return differ != null ? differ.diff(unique, mvRangeMap) :
            PartitionDiffer.simpleDiff(unique, mvRangeMap);
}

Explanation:
The main risk in the original snippet is how it checks RangeSet.encloses before adding each range. Guava's TreeRangeSet coalesces connected ranges when they are added, so encloses is evaluated against the merged spans rather than against the individual partition ranges. A range that exactly spans two adjacent ranges added earlier is therefore reported as enclosed and skipped, while a range that only partially overlaps an existing one is not enclosed and is still added. Without more knowledge of the surrounding application and the intended logic, this may not be the only risk, but it stands out because it can produce inconsistent results for contiguous or overlapping ranges.

This is not a bug that causes immediate crashes or exceptions. The concern is that the logic should accurately reflect the intended handling of partition ranges, which is essential for range-based partitioning in a database system.
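
The interaction between adjacent ranges and the encloses check can be reproduced without Guava. The sketch below is a hypothetical, simplified stand-in (`CoalescingRangeSetSketch` is not a Guava or StarRocks class) that uses integer half-open ranges and merges connected ranges on add, which is the behavior Guava documents for RangeSet implementations that support add.

```java
import java.util.Map;
import java.util.TreeMap;

public class CoalescingRangeSetSketch {
    // Maps lower bound -> upper bound of disjoint, non-touching ranges [lower, upper).
    private final TreeMap<Integer, Integer> ranges = new TreeMap<>();

    /** Adds [lower, upper), merging it with every range it overlaps or touches. */
    public void add(int lower, int upper) {
        if (lower >= upper) {
            return; // empty ranges are ignored
        }
        // Extend to the left if the preceding range overlaps or touches.
        Map.Entry<Integer, Integer> left = ranges.floorEntry(lower);
        if (left != null && left.getValue() >= lower) {
            lower = left.getKey();
            upper = Math.max(upper, left.getValue());
        }
        // Absorb every stored range that starts inside the (growing) new range.
        Map.Entry<Integer, Integer> next = ranges.ceilingEntry(lower);
        while (next != null && next.getKey() <= upper) {
            upper = Math.max(upper, next.getValue());
            ranges.remove(next.getKey());
            next = ranges.ceilingEntry(lower);
        }
        ranges.put(lower, upper);
    }

    /** Returns true if a single stored (merged) range contains [lower, upper). */
    public boolean encloses(int lower, int upper) {
        Map.Entry<Integer, Integer> left = ranges.floorEntry(lower);
        return left != null && left.getValue() >= upper;
    }

    public static void main(String[] args) {
        CoalescingRangeSetSketch set = new CoalescingRangeSetSketch();
        set.add(1, 5);
        set.add(5, 10); // touches [1, 5), so the set now holds the single range [1, 10)
        System.out.println(set.encloses(1, 10)); // prints true: spans two adjacent ranges
        System.out.println(set.encloses(3, 12)); // prints false: only partially overlaps
    }
}
```

In the deduplication loop above, a partition covering `[1, 10)` would be skipped after `[1, 5)` and `[5, 10)` had been added, while a partition covering `[3, 12)` would still be added despite overlapping them.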

Signed-off-by: Murphy <mofei@starrocks.com>
@mofeiatwork mofeiatwork requested a review from a team as a code owner May 17, 2024 02:24
sonarcloud bot commented May 17, 2024

[FE Incremental Coverage Report]

pass : 12 / 12 (100.00%)

file detail

path                                                      covered_line  new_line  coverage  not_covered_line_detail
🔵 com/starrocks/sql/common/SyncPartitionUtils.java        10            10        100.00%   []
🔵 com/starrocks/connector/ConnectorPartitionTraits.java   2             2         100.00%   []

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@mofeiatwork mofeiatwork merged commit 2bb8e40 into StarRocks:main May 20, 2024
52 checks passed
@Mergifyio backport branch-3.3

@github-actions github-actions bot removed the 3.3 label May 20, 2024
Contributor

mergify bot commented May 20, 2024

backport branch-3.3

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request May 20, 2024
Signed-off-by: Murphy <mofei@starrocks.com>
(cherry picked from commit 2bb8e40)

# Conflicts:
#	fe/fe-core/src/test/java/com/starrocks/scheduler/PartitionBasedMvRefreshTest.java
mofeiatwork added a commit that referenced this pull request May 20, 2024
Signed-off-by: Murphy <mofei@starrocks.com>
(cherry picked from commit 2bb8e40)

# Conflicts:
#	fe/fe-core/src/test/java/com/starrocks/scheduler/PartitionBasedMvRefreshTest.java
wanpengfei-git pushed a commit that referenced this pull request May 20, 2024
…#45773) (#45954)

Co-authored-by: Murphy <96611012+mofeiatwork@users.noreply.github.com>