Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bucket verify: false overlapping block reports #2420

Closed
belm0 opened this issue Apr 13, 2020 · 9 comments
Closed

bucket verify: false overlapping block reports #2420

belm0 opened this issue Apr 13, 2020 · 9 comments
Labels

Comments

@belm0
Copy link
Contributor

belm0 commented Apr 13, 2020

Thanos, Prometheus and Golang version used: thanos v0.12.0-rc.1 (except for sidecar which is at v0.11.0)

Object Storage Provider: GCS

What happened: Ran bucket verify and it reported overlapping blocks. An 8h block recently written by compactor was overlapping four 2h blocks written by sidecar. But since the latest compactor implementation has a 48h block delete delay (#2136 ), this is expected.

snippet from bucket verify output:

level=warn ts=2020-04-13T03:48:30.537002524Z caller=overlapped_blocks.go:43 msg="found overlapped blocks" group=0@4233639831006398026 overlap="
[mint: 1586707200000, maxt: 1586714400000, range: 2h0m0s, blocks: 2]:
    <ulid: 01E5RSG6R668NAZESSQZED1TD7, mint: 1586707200000, maxt: 1586736000000, range: 8h0m0s>,
    <ulid: 01E5QW6MNFRSQEQ684CBK60CK2, mint: 1586707200000, maxt: 1586714400000, range: 2h0m0s>
[mint: 1586714400000, maxt: 1586721600000, range: 2h0m0s, blocks: 2]:
    <ulid: 01E5RSG6R668NAZESSQZED1TD7, mint: 1586707200000, maxt: 1586736000000, range: 8h0m0s>,
    <ulid: 01E5R32C6EXB1DRE58D4VB70Q4, mint: 1586714400000, maxt: 1586721600000, range: 2h0m0s
[mint: 1586721600000, maxt: 1586728800000, range: 2h0m0s, blocks: 2]:
    <ulid: 01E5RSG6R668NAZESSQZED1TD7, mint: 1586707200000, maxt: 1586736000000, range: 8h0m0s>,
    <ulid: 01E5R9Y2FFGXWAKPBQ4BGN925Q, mint: 1586721600000, maxt: 1586728800000, range: 2h0m0s>
[mint: 1586728800000, maxt: 1586736000000, range: 2h0m0s, blocks: 2]:
    <ulid: 01E5RSG6R668NAZESSQZED1TD7, mint: 1586707200000, maxt: 1586736000000, range: 8h0m0s>,
    <ulid: 01E5RGSSSXDTNA379XN3EVQYZ5, mint: 1586728800000, maxt: 1586736000000, range: 2h0m0s>
"

What you expected to happen: bucket verify should be aware of block delete delay, and not report overlaps within that range (especially if it's a perfect overlap between compactor and sidecar blocks)

How to reproduce it (as minimally and precisely as possible):
Run sidecar and compactor with default options. Check bucket verify after about 12 hours.

Full logs to relevant components:

thanos-compact logs of the relevant compaction

``` E 2020-04-13T03:31:25.241222230Z level=info ts=2020-04-13T03:31:25.241006853Z caller=compact.go:633 group="0@{replica=\"main\"}" groupKey=0@4233639831006398026 msg="compaction available and planned; downloading blocks" plan="[/var/thanos/compact/compact/0@4233639831006398026/01E5QW6MNFRSQEQ684CBK60CK2 /var/thanos/compact/compact/0@4233639831006398026/01E5R32C6EXB1DRE58D4VB70Q4 /var/thanos/compact/compact/0@4233639831006398026/01E5R9Y2FFGXWAKPBQ4BGN925Q /var/thanos/compact/compact/0@4233639831006398026/01E5RGSSSXDTNA379XN3EVQYZ5]"

E 2020-04-13T03:32:05.960222018Z level=info ts=2020-04-13T03:32:05.959913817Z caller=compact.go:692 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="downloaded and verified blocks; compacting blocks" plan="[/var/thanos/compact/compact/0@4233639831006398026/01E5QW6MNFRSQEQ684CBK60CK2 /var/thanos/compact/compact/0@4233639831006398026/01E5R32C6EXB1DRE58D4VB70Q4 /var/thanos/compact/compact/0@4233639831006398026/01E5R9Y2FFGXWAKPBQ4BGN925Q /var/thanos/compact/compact/0@4233639831006398026/01E5RGSSSXDTNA379XN3EVQYZ5]" duration=40.718598845s

E 2020-04-13T03:32:24.846681182Z level=info ts=2020-04-13T03:32:24.846395189Z caller=fetcher.go:451 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=98.250223ms cached=56 returned=56 partial=0

E 2020-04-13T03:33:24.815335272Z level=info ts=2020-04-13T03:33:24.815094264Z caller=fetcher.go:451 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=66.97198ms cached=56 returned=56 partial=0

E 2020-04-13T03:33:38.196496958Z level=info ts=2020-04-13T03:33:38.19619836Z caller=compact.go:441 msg="compact blocks" count=4 mint=1586707200000 maxt=1586736000000 ulid=01E5RSG6R668NAZESSQZED1TD7 sources="[01E5QW6MNFRSQEQ684CBK60CK2 01E5R32C6EXB1DRE58D4VB70Q4 01E5R9Y2FFGXWAKPBQ4BGN925Q 01E5RGSSSXDTNA379XN3EVQYZ5]" duration=1m32.236141704s

E 2020-04-13T03:33:38.268094915Z level=info ts=2020-04-13T03:33:38.26783171Z caller=compact.go:722 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="compacted blocks" new=01E5RSG6R668NAZESSQZED1TD7 blocks="[/var/thanos/compact/compact/0@4233639831006398026/01E5QW6MNFRSQEQ684CBK60CK2 /var/thanos/compact/compact/0@4233639831006398026/01E5R32C6EXB1DRE58D4VB70Q4 /var/thanos/compact/compact/0@4233639831006398026/01E5R9Y2FFGXWAKPBQ4BGN925Q /var/thanos/compact/compact/0@4233639831006398026/01E5RGSSSXDTNA379XN3EVQYZ5]" duration=1m32.307778181s overlapping_blocks=false

E 2020-04-13T03:34:07.430519788Z level=info ts=2020-04-13T03:34:07.430315711Z caller=compact.go:764 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="uploaded block" result_block=01E5RSG6R668NAZESSQZED1TD7 duration=25.371003415s

E 2020-04-13T03:34:07.522599830Z level=info ts=2020-04-13T03:34:07.522353634Z caller=compact.go:792 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="marking compacted block for deletion" old_block=01E5QW6MNFRSQEQ684CBK60CK2

E 2020-04-13T03:34:07.553891769Z level=info ts=2020-04-13T03:34:07.553709099Z caller=block.go:157 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="block has been marked for deletion" block=01E5QW6MNFRSQEQ684CBK60CK2

E 2020-04-13T03:34:07.654412028Z level=info ts=2020-04-13T03:34:07.654176246Z caller=compact.go:792 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="marking compacted block for deletion" old_block=01E5R32C6EXB1DRE58D4VB70Q4

E 2020-04-13T03:34:07.729202698Z level=info ts=2020-04-13T03:34:07.728845712Z caller=block.go:157 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="block has been marked for deletion" block=01E5R32C6EXB1DRE58D4VB70Q4

E 2020-04-13T03:34:07.819677228Z level=info ts=2020-04-13T03:34:07.819412578Z caller=compact.go:792 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="marking compacted block for deletion" old_block=01E5R9Y2FFGXWAKPBQ4BGN925Q

E 2020-04-13T03:34:07.854043215Z level=info ts=2020-04-13T03:34:07.853813446Z caller=block.go:157 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="block has been marked for deletion" block=01E5R9Y2FFGXWAKPBQ4BGN925Q

E 2020-04-13T03:34:08.031032346Z level=info ts=2020-04-13T03:34:08.030789754Z caller=compact.go:792 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="marking compacted block for deletion" old_block=01E5RGSSSXDTNA379XN3EVQYZ5

E 2020-04-13T03:34:08.080332509Z level=info ts=2020-04-13T03:34:08.080062267Z caller=block.go:157 group="0@{replica="main"}" groupKey=0@4233639831006398026 msg="block has been marked for deletion" block=01E5RGSSSXDTNA379XN3EVQYZ5

</p>
</details>
@stale
Copy link

stale bot commented May 13, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label May 13, 2020
@belm0
Copy link
Contributor Author

belm0 commented May 13, 2020

@bwplotka I think this is a real bug introduced in v0.12.0, unless it has since been fixed

@stale stale bot removed the stale label May 13, 2020
@stale
Copy link

stale bot commented Jun 12, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 12, 2020
@stale
Copy link

stale bot commented Jun 19, 2020

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jun 19, 2020
@GiedriusS
Copy link
Member

Need to check if this is still valid.

@GiedriusS GiedriusS reopened this Jun 19, 2020
@stale stale bot removed the stale label Jun 19, 2020
@belm0
Copy link
Contributor Author

belm0 commented Jul 14, 2020

bucket verify doesn't seem to be aware of pending deletions, and falsely reports overlapping blocks after a compaction. Would someone please confirm?

@belm0
Copy link
Contributor Author

belm0 commented Jul 15, 2020

Evidence: thanos-compact is happy and in steady state, while bucket verify reports overlapping blocks. If there were really overlapping active blocks, thanos-compact would halt. @GiedriusS

thanos-compact log:

ts=2020-07-15T06:45:05Z caller=compact.go:887 msg="start sync of metas"
ts=2020-07-15T06:45:08Z caller=compact.go:892 msg="start of GC"
ts=2020-07-15T06:45:08Z caller=compact.go:904 msg="start of compactions"
ts=2020-07-15T06:45:09Z caller=compact.go:936 msg="compaction iterations done"
ts=2020-07-15T06:45:09Z caller=compact.go:308 msg="start first pass of downsampling"
ts=2020-07-15T06:45:12Z caller=compact.go:316 msg="start second pass of downsampling"
ts=2020-07-15T06:45:16Z caller=compact.go:323 msg="downsampling iterations done"
ts=2020-07-15T06:45:20Z caller=retention.go:30 msg="start optional retention"
ts=2020-07-15T06:45:20Z caller=retention.go:45 msg="optional retention apply done"
ts=2020-07-15T06:45:20Z caller=clean.go:33 msg="started cleaning of aborted partial uploads"
ts=2020-07-15T06:45:20Z caller=clean.go:60 msg="cleaning of aborted partial uploads done"
ts=2020-07-15T06:45:20Z caller=blocks_cleaner.go:43 msg="started cleaning of blocks marked for deletion"
ts=2020-07-15T06:45:20Z caller=blocks_cleaner.go:57 msg="cleaning of blocks marked for deletion done"

bucket verify:

$ /bin/thanos tools bucket verify --objstore.config-file=/etc/config/object-store.yaml
level=warn ts=2020-07-15T07:40:27.677258445Z caller=overlapped_blocks.go:43 msg="found overlapped blocks" group=0@4233639831006398026 overlap="
[mint: 1593676800000, maxt: 1593684000000, range: 2h0m0s, blocks: 4]: <ulid: 01ED70V6WGV50P7WTVCHTHXDD1, mint: 1593043200000, maxt: 1594252800000, range: 336h0m0s>, <ulid: 01ED6BFVJFR4WWVBZEP8QKN9VK, mint: 1593561600000, maxt: 1593734400000, range: 48h0m0s>, <ulid: 01ED627072JGKYJT1YRXQF0WVT, mint: 1593676800000, maxt: 1593705600000, range: 8h0m0s>, <ulid: 01EC7JY0VJTQ80TBMX6Q0AA2VV, mint: 1593676800000, maxt: 1593684000000, range: 2h0m0s>
[mint: 1593864000000, maxt: 1593871200000, range: 2h0m0s, blocks: 4]: <ulid: 01ED70V6WGV50P7WTVCHTHXDD1, mint: 1593043200000, maxt: 1594252800000, range: 336h0m0s>, <ulid: 01ED6D6PYMNA7JKPTP3P38BN3Z, mint: 1593734400000, maxt: 1593907200000, range: 48h0m0s>, <ulid: 01ED63H1CEW5S5EA9G3A0NY9S4, mint: 1593849600000, maxt: 1593878400000, range: 8h0m0s>, <ulid: 01ECD5EXBH2FNZJH812YBF1ZJN, mint: 1593864000000, maxt: 1593871200000, range: 2h0m0s>
...

@stale
Copy link

stale bot commented Aug 14, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Aug 14, 2020
@stale
Copy link

stale bot commented Aug 21, 2020

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Aug 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants