Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition around TestQuerySharding_Correctness #7808

Open
narqo opened this issue Apr 4, 2024 · 2 comments
Open

Race condition around TestQuerySharding_Correctness #7808

narqo opened this issue Apr 4, 2024 · 2 comments

Comments

@narqo
Copy link
Contributor

narqo commented Apr 4, 2024

This build on the CI has failed due to race condition detected

WARNING: DATA RACE
Read at 0x00c0016ee5c0 by goroutine 3679:
  github.com/prometheus/prometheus/promql.(*evaluator).vectorSelectorSingle()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1972 +0x334
  github.com/prometheus/prometheus/promql.(*evaluator).eval()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1718 +0x3bd3
  github.com/prometheus/prometheus/promql.(*evaluator).rangeEval()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1125 +0x279
  github.com/prometheus/prometheus/promql.(*evaluator).eval()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1429 +0x6459
  github.com/prometheus/prometheus/promql.(*evaluator).Eval()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1065 +0x149
  github.com/prometheus/prometheus/promql.(*Engine).execEvalStmt()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:774 +0x19af
  github.com/prometheus/prometheus/promql.(*Engine).exec()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:656 +0x59c
  github.com/prometheus/prometheus/promql.(*query).Exec()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:244 +0x226
  github.com/grafana/mimir/pkg/frontend/querymiddleware.(*downstreamHandler).Do()
      /__w/mimir/mimir/pkg/frontend/querymiddleware/querysharding_test.go:2156 +0xe1
  github.com/grafana/mimir/pkg/frontend/querymiddleware.(*querySharding).Do()
      /__w/mimir/mimir/pkg/frontend/querymiddleware/querysharding.go:133 +0x1068
  github.com/grafana/mimir/pkg/frontend/querymiddleware.TestQuerySharding_Correctness.func1.1.1()
      /__w/mimir/mimir/pkg/frontend/querymiddleware/querysharding_test.go:733 +0x375
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Previous write at 0x00c0016ee5c0 by goroutine 3666:
  github.com/prometheus/prometheus/model/histogram.(*FloatHistogram).CopyTo()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/model/histogram/float_histogram.go:93 +0x1f7
  github.com/prometheus/prometheus/storage.(*SampleRingIterator).AtFloatHistogram()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/storage/buffer.go:391 +0x77
  github.com/prometheus/prometheus/promql.(*evaluator).matrixIterSlice()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:2176 +0x950
  github.com/prometheus/prometheus/promql.(*evaluator).eval()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1518 +0xa076
  github.com/prometheus/prometheus/promql.(*evaluator).rangeEval()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1125 +0x279
  github.com/prometheus/prometheus/promql.(*evaluator).eval()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1364 +0x4b32
  github.com/prometheus/prometheus/promql.(*evaluator).Eval()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:1065 +0x149
  github.com/prometheus/prometheus/promql.(*Engine).execEvalStmt()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:774 +0x19af
  github.com/prometheus/prometheus/promql.(*Engine).exec()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:656 +0x59c
  github.com/prometheus/prometheus/promql.(*query).Exec()
      /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/promql/engine.go:244 +0x226
  github.com/grafana/mimir/pkg/frontend/querymiddleware.(*downstreamHandler).Do()
      /__w/mimir/mimir/pkg/frontend/querymiddleware/querysharding_test.go:2156 +0xe1
  github.com/grafana/mimir/pkg/frontend/querymiddleware.(*shardedQuerier).handleEmbeddedQueries.func1()
      /__w/mimir/mimir/pkg/frontend/querymiddleware/sharded_queryable.go:113 +0x157
  github.com/grafana/dskit/concurrency.ForEachJob.func1()
      /__w/mimir/mimir/vendor/github.com/grafana/dskit/concurrency/runner.go:119 +0xb8
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /__w/mimir/mimir/vendor/golang.org/x/sync/errgroup/errgroup.go:78 +0x91

Goroutine 3679 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1742 +0x825
  github.com/grafana/mimir/pkg/frontend/querymiddleware.TestQuerySharding_Correctness.func1.1()
      /__w/mimir/mimir/pkg/frontend/querymiddleware/querysharding_test.go:722 +0x3c4
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 3666 (running) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /__w/mimir/mimir/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x124
  github.com/grafana/dskit/concurrency.ForEachJob()
      /__w/mimir/mimir/vendor/github.com/grafana/dskit/concurrency/runner.go:112 +0x2cb
  github.com/grafana/mimir/pkg/frontend/querymiddleware.(*shardedQuerier).handleEmbeddedQueries()
      /__w/mimir/mimir/pkg/frontend/querymiddleware/sharded_queryable.go:112 +0x18c
  github.com/grafana/mimir/pkg/frontend/querymiddleware.(*shardedQuerier).Select()
      /__w/mimir/mimir/pkg/frontend/querymiddleware/sharded_queryable.go:103 +0x6ef
  github.com/grafana/mimir/pkg/storage/lazyquery.LazyQuerier.Select.func1()
      /__w/mimir/mimir/pkg/storage/lazyquery/lazyquery.go:53 +0xc2
==================
···
@narqo
Copy link
Contributor Author

narqo commented Apr 4, 2024

This part rings a bell

···
github.com/prometheus/prometheus/model/histogram.(*FloatHistogram).CopyTo()
     /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/model/histogram/float_histogram.go:93 +0x1f7
  github.com/prometheus/prometheus/storage.(*SampleRingIterator).AtFloatHistogram()
     /__w/mimir/mimir/vendor/github.com/prometheus/prometheus/storage/buffer.go:391 +0x77

@krajorama, I recall you're were looking at something similar around a race in FloatHistogram. Could you have a look if that's already a known problem

@krajorama
Copy link
Contributor

hi, yes, I have a workaround for this: #7504

Shouldn't happen in production.

I have some ideas about how to actually fix in prometheus, but it takes some research. prometheus/prometheus#13679

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants