*: Support multi-tenancy on object storage using custom prefix #1318

github-edouard-devouge · 2019-07-10T09:21:41Z

Support multi-tenancy on object storage using custom prefix

Context / Use case

On a multi-cluster kubernetes setup with multiple prometheus instances (one per namespace) using PrometheusOperator. Each prometheus instance embed its own thanos sidecar and for performance & scalability purpose data chuncks are sharded accross multiple s3 buckets.
For each prometheus instance, there is a dedicated thanos store and a thanos compactor working on its own bucket. One thanos query is setup to map/reduce promql queries across multiple stores and thanos-sidecars.

Problems

With this setup there is a proliferation of s3 buckets.
Bucket creation lifecycle is not as dynamic as kubernetes resources (kubectl apply versus terraform apply). Require use of multiple tools.

Question / feature request

Is there a way to share a bucket across multiple thanos instances (keeping the sharding property) ?

Otherwise, a cool feature could be the ability to add a custom prefix on thanos objects (objstore.config). So we could take advantage of s3 bucket multi-tenancy.

e.g : s3://<my-bucket-name>/**<custom-prefix>**/<thanos-objects>

The text was updated successfully, but these errors were encountered:

github-edouard-devouge · 2019-07-31T14:31:33Z

Design proposal to discuss

Scope :

Enrich objstore configuration to support a custom prefix in s3 object storage key name
Focus on AWS S3 implementation for a first contribution

Expected result

[Before] Current flat bucket file tree :

  bucket-root
  ├── 01DH3V0E80CBEK8SK00A93SH7Z
  │    ├── chunks
  │    ├── index
  │    └── meta.json
  ├── 01DG4BDF8R3FDSZ3EM6MVHVTFX
  │    ├── chunks
  │    ├── index
  │    └── meta.json
  ├──  01DH3V01JXFHC0C9ABCM7MZGP4
  │    ├── chunks
  │    ├── index
  │    └── meta.json
  └── debug
       └── metas
           ├── 01DH3V0E80CBEK8SK00A93SH7Z.json
           ├── 01DG4BDF8R3FDSZ3EM6MVHVTFX.json
           └──  01DH3V01JXFHC0C9ABCM7MZGP4.json

[After] Expected prefixed bucket file tree :

  bucket-root
  └── my
      └── custom
          └── prefix
              ├── 01DH3V0E80CBEK8SK00A93SH7Z
              │    ├── chunks
              │    ├── index
              │    └── meta.json
              ├── 01DG4BDF8R3FDSZ3EM6MVHVTFX
              │    ├── chunks
              │    ├── index
              │    └── meta.json
              ├──  01DH3V01JXFHC0C9ABCM7MZGP4
              │    ├── chunks
              │    ├── index
              │    └── meta.json
              └── debug
                   └── metas
                       ├── 01DH3V0E80CBEK8SK00A93SH7Z.json
                       ├── 01DG4BDF8R3FDSZ3EM6MVHVTFX.json
                       └──  01DH3V01JXFHC0C9ABCM7MZGP4.json

Implementation steps

Add a new prefix flag in objstore.config for type s3

type: S3
config:
    prefix: "my/custom/prefix/"
    bucket: ""
    endpoint: ""
    region: ""
    access_key: ""
    insecure: false
  [...]

Write some unit tests in pkg/objstore/s3/s3_test.go

Edit pkg/objstore/s3/s3.go to :

Extend the configuration data structure type Config struct with Prefix string yaml:"perfix"
Validate prefix format :

func validate(conf Config) error {
  [...]
  if conf.Prefix != "" && len(conf.Prefix)>=1024 {
		return errors.New("prefix is too long (limited by Amazon at 1024 bytes long)")
	}
	return nil
}
[...]
prefix = strings.TrimSuffix(prefix, "/") + "/"

Append prefix string to object name for each minio call requiring an object name. For instance:

// Delete removes the object with the given name.
func (b *Bucket) Delete(ctx context.Context, name string) error {
	return b.client.RemoveObject(b.name, prefix + name)
}

Update the doc : docs/storage.md
Create a PR

Forecats of impacted files :

povilasv · 2019-08-02T13:16:05Z

The proposal LGTM 🥇

Nathan-Okolita-Aimtheory · 2019-08-16T13:15:46Z

@edevouge is there any progress with this issue?

github-edouard-devouge · 2019-08-19T07:54:43Z

@Nathan-Okolita-Aimtheory: a pull request ( #1392 ) implementing this feature is work-in-progress by @jaredallard

brancz · 2019-08-20T07:37:34Z

I'm not against this per se, but we already have a multi-tenancy mechanism approved in the thanos receive component, that is label-set based. If we do something about multi-tenancy in the object store bucket, then that should be label-set based as well, as opposed to path based.

Nathan-Okolita-Aimtheory · 2019-08-20T11:28:44Z

@brancz would your label based implementation look like this in the Bucket.yml:
type: S3 config: label: "tenant_label" bucket: "bucket" endpoint: "s3" region: "region" access_key: "access_key" secret_key: "secret_key" insecure: false signature_version2: true
Would the Thanos store gateway need to index the entire s3 bucket to return just the label desired?

brancz · 2019-08-20T12:12:34Z

A tenant is a label-set, so not just a single label. So that specific part would rather be something like:

tenantLabels:
- label1
- label2

bigkraig · 2019-10-14T20:21:41Z

@brancz I'm not fully familiar with the label based tenants, but segregating in the bucket path allows you to use one bucket & policies for multiple Thanos deployments, that may not want to share Thanos services.

At least in my case, I have different teams with different products. By leveraging path restrictions I can reduce my S3 bucket sprawl. It also means that teams that may want to share a common query endpoint can do so, without exposing themselves to cardinality bombs by bad deployments. If someone does something bad, it only impacts their bucket location.

@edevouge @jaredallard are either of you continuing to work on the existing PR?

dlahoza · 2019-12-11T07:48:08Z

These changes would allow lifecycle management inside the multipurpose buckets.
As an example, we use an S3 bucket per cluster, and we maintain multiple clusters in a single account.
It's convenient to use a single bucket, replicate it, delete it with all the data at once.
And the best part is that we can apply a life-cycle policy on the Thanos data by using the prefix of a subdirectory.
And this is crucial for us.

AnandDevan · 2020-01-08T02:34:34Z

We host all monitoring related configs, backup etc. under single bucket and would like to use that bucket to host thanos data as well. This feature would be super useful.

domino7 · 2020-01-16T16:16:21Z

Elimination of multiple buckets maintenance is definitive adventage.
One more requirement here will be to ensure that clients will not affect each other performance and stability (eg with query like {"__name__"=~".+"} setup for single client/hashring should be affected at most. It shouldn't be necessary to index and query metrics from all clients - which may be the case for labels-based segregation suggested by @brancz ). Solution with /path prefix and dedicated read access(stores, queriers) for each client seems to address this issues.

I played a little bit with PR from @Dlag (PR mentioned above) and menaged to setup configuration with read and write access, however I spotted that compaction is affected when accessing data from s3 subdirectories, failing with write compaction: chunk 8 not found: segment index 0 out of range error similar to #1300 . Setup was tested with thanos 0.10 with additional s3 prefixes configuration (configuration with thanos v0.10 and dedicated s3 buckets instead of subdirectories works fine for the same dataset).
That is one more thing that should be verified for this feature.

stale · 2020-02-15T16:52:26Z

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

Pryz · 2020-02-20T15:44:44Z

Kind of the same use case here where I need to set a prefix because Thanos data will not be the only data I would stored in the bucket. Seems like two PRs have been closed for this issue but I'm having hard time to understand what the blocker is.

How can we move forward here ? :)

stale · 2020-03-21T16:09:13Z

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

Nathan-Okolita-Aimtheory · 2020-03-23T11:26:59Z

I am still hoping this issue will be picked up. We have an environment with many Prometheus instances many of which are deployed through automation. Having to provision many s3 buckets is not a very appealing prospect.

stale · 2020-04-22T18:21:49Z

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

chadlwilson · 2020-04-23T01:27:22Z

I think there is still a lot of interest here. This would make it a lot easier to operate a single Thanos that can aggregate metrics for N teams' prometheuses.

pracucci · 2020-04-23T06:59:07Z

Introducing the ability to configure an optional bucket prefix would also be a nice step forward towards Cortex blocks storage and Thanos bucket store interoperability.

bwplotka · 2020-10-14T22:24:42Z

Thanks for the explanations @markmsmith @roidelapluie @dsayan154 I think I am happy with some simple prefix to make sure you can store blocks deeper in bucket (or have it compatible with Cortex).

@underrun:

Can you clarify how to avoid the issue @mei-ling is having without prefixes, which, if i understand, would allow multiple store gateway / compactor to work on the same bucket - one per cluster as desired in that case?

You just point multiple sidecars to the same bucket, same directory. Since Prometheus has unique external labels we can distinguish between blocks. Directory does not matter for Thanos. (:

I'm also a little confused by how people are using "tenancy" here as i feel like that's almost the opposite of this issue. does "multi-tenant" here mean having one set of infrastructure across an arbitrary number of prometheus monitored by one "thanos" with possibly multiple store gateway / compactor? This is the use case I feel like prefixes help with. Or does it mean having multiple teams with different sets of infrastructure who want to share one "thanos" for only monitoring what they are responsible for?

I think both things are the same right? Multiple metric sources (e.g Prometheus + sidecar) from multiple potentially isolated teams, uploading to the same bucket, thus allowing whatever set of compactors/store gateways thanks to sharding.

@dsayan154

In case, a tenant stops subscribing to this solution, the tenant cluster's data should be deleted from the objectore store.

Deletion of data for a defined prefix is a valid issue/ request you can put on Thanos and we can guide/help you write reliable CLI / tooling to make it work. (complexity no 1) (: You can even build on top of that and automated this purely with some deletion API tenant. Building on top of prefix is really not just hacking prefixes for multitenancy. Our semantics for multitenancy is around labels, so would be nice if your setup will follow the same rules so we can work together on those complexities you mentioned (: Per complexity number 2. You are right this is some scalability limitation, however further away (you can have easily 100k items for iter API and it's quite large number of blocks (something like 1y of ~100 tenants) for Thanos with healthy compaction). This is however easy to solve by copying the blocks new dirs if needed etc. Let's create an issue to discuss potential solutions. What I am saying is that prefix will not solve this (very potential) problem: What if within tenant you have that number of blocks?

@roidelapluie

With the prefix, it is possible to use the excellent Thanos block viewer on a cortex backend. Currently, Cortex does not really have a nice UI like Thanos to visualize this.

YES! cc @Oghenebrume50 @prmsrswt @kunal-kushwaha for motivation to get block viewer natively for Prometheus (and Complex a well!)

stale · 2020-12-13T22:34:44Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

pracucci · 2020-12-14T07:43:20Z

Still valid

stale · 2021-04-18T22:57:12Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

midnightconman · 2021-04-19T00:23:56Z

Still valid?

pete-leese · 2021-05-20T08:59:23Z

What is the current status of this? :)

stale · 2021-07-19T18:26:14Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

dsayan154 · 2021-07-20T04:17:49Z

still valid

oronsh · 2021-09-05T16:09:49Z

Why won't you do it dynamic rather than config based?
This way, each store api could handle multiple prefixes for multiple tenants.
I guess this would need support in the compactor as well.

stale · 2022-01-09T10:34:00Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

markmsmith · 2022-01-09T17:28:52Z

Still needed.

yuha0 · 2022-02-04T19:52:45Z

Hi @bwplotka

You just point multiple sidecars to the same bucket, same directory. Since Prometheus has unique external labels we can distinguish between blocks. Directory does not matter for Thanos. (:

This will put a huge and unnecessary pressure on each thanos store process, as well as the singleton compactor process.

For instance, say now I create a new cluster with a new thanos deployment in it, pointing to an existing bucket where other existing old clusters are using as metrics storage backend, the thanos store will immediately require a huge amount of resources (disk, CPU, memory) to process all the existing metrics data in that bucket, even when the new cluster hasn't even started generating a single data point.

This problem is more obvious on the compactor component since it is required to be a singleton process. Losing this one single process means losing data compaction globally, which is a huge risk.

Our current mitigation is just to create one bucket for each "tenant" (Kubernetes cluster) so each one can mind their own resource requirements for thanos store and compactor. But this means a lot of buckets.

tomereli · 2022-04-10T12:57:48Z

Any update on this? Is it at all planned to be added at some point of time? @bwplotka

bwplotka · 2022-04-29T11:40:10Z

Hi, all again! Thanks all for keeping this up to date - it's really important to solve all the important cases you might have. Please let us know if this is still valid and for what reasons--we never said no to this, but honestly, we still don't see a clear need. Let me explain:

Currently, all Thanos components support multi-tenancy thanks to external labels:

Receivers uploads block with tenant external label per tenant's block
All bucket read components like Store and Compactor have selector flag mechanism that allows you to choose what blocks to handle. You can totally run any of those and say "give me blocks only for tenant X". You can even shard instances of store per tenants. All explained here (let us know if things are unclear we can expand on this document). IMO this solves cases like:

For instance, say now I create a new cluster with a new thanos deployment in it, pointing to an existing bucket where other existing old clusters are using as metrics storage backend, the thanos store will immediately require a huge amount of resources (disk, CPU, memory) to process all the existing metrics data in that bucket, even when the new cluster hasn't even started generating a single data point.

Tools exists that inject matchers into query e.g https://github.com/prometheus-community/prom-label-proxy (we might consider adding this natively in Querier)

The only need for this kind implementation might be useful when we hit limit of number of directories to scan in single bucket, Prefix might in theory help here for scan speed up. But these reasons seem to be very different to what we see from responses here and might have different solutions (e.g having 2 buckets etc.)

@roidelapluie mentioned

With the prefix, it is possible to use the excellent thanos block viewer on a cortex backend. Currently Cortex does not really have a nice UI like thanos to visualize this.

If I understand it right the thanos tool commands might be needed to be extend by selector feature too.

Let us know if we miss anything here 🤗 Thanks!

hanjm · 2022-04-29T13:40:38Z

Thanks your detailed note. I think the solutions like having 2 or more buckets need a heavy resource operatation . when create a new tenant , using bucket prefix is more easily than apply create a bucket.

tomereli · 2022-04-29T13:53:29Z

The reason bucket prefix support is appealing to us and probably others, is since we use silo multi-tenancy model for our tenants, meaning that each tenant's bucket must not be accessible to another tenant.
By this we also mean Thanos itself - we want to enforce this with resource policies in AWS based on JWT custom claims, which will decouple authorization logic from code entirely.
Having single bucket and using Thanos multitenancy just doesn't fit this use case so we're forced to use a Thanos per tenant plus a bucket per tenant.

markmsmith · 2022-05-02T22:05:37Z

The use case I posted in May of 2020 is still relevant today, and the lack of this feature blocks us from reducing the number of buckets we need to spin up for each tenant within an AWS account:
#1318 (comment)

yeya24 · 2022-06-24T02:12:11Z

Closed by #5337.

GiedriusS added component: receive feature request/improvement labels Jul 11, 2019

povilasv mentioned this issue Aug 5, 2019

Allow usage of folders within S3 #697

Closed

jaredallard mentioned this issue Aug 9, 2019

feat(pkg/objstore/s3): support for prefixes #1392

Closed

bwplotka mentioned this issue Oct 24, 2019

Feat/blocks cortexproject/cortex#1695

Merged

bwplotka mentioned this issue Dec 10, 2019

objstore/s3, docs/storage.md: Added 'path' field for S3 objstore #1862

Closed

stale bot added the stale label Feb 15, 2020

stale bot removed the stale label Feb 20, 2020

stale bot added the stale label Mar 21, 2020

stale bot removed the stale label Mar 23, 2020

stale bot added the stale label Apr 22, 2020

stale bot removed the stale label Apr 23, 2020

stale bot added the stale label Dec 13, 2020

stale bot removed the stale label Dec 14, 2020

kakkoyun changed the title ~~Support multi-tenancy on object storage using custom prefix~~ *: Support multi-tenancy on object storage using custom prefix Feb 12, 2021

stale bot added the stale label Apr 18, 2021

stale bot removed the stale label Apr 19, 2021

stale bot added the stale label Jul 19, 2021

stale bot removed the stale label Jul 20, 2021

GiedriusS mentioned this issue Nov 22, 2021

Storage: Add Oracle Cloud Infrastructure Object Storage Bucket support #4661

Closed

2 tasks

stale bot added the stale label Jan 9, 2022

stale bot removed the stale label Jan 9, 2022

jademcosta mentioned this issue May 6, 2022

sidecar/compact/store/receiver - Add the prefix option to buckets #5337

Merged

2 tasks

yeya24 closed this as completed Jun 24, 2022

*: Support multi-tenancy on object storage using custom prefix #1318

*: Support multi-tenancy on object storage using custom prefix #1318

Comments

github-edouard-devouge commented Jul 10, 2019

Support multi-tenancy on object storage using custom prefix

Context / Use case

Problems

Question / feature request

github-edouard-devouge commented Jul 31, 2019 • edited

Design proposal to discuss

Scope :

Expected result

Implementation steps

Forecats of impacted files :

povilasv commented Aug 2, 2019

Nathan-Okolita-Aimtheory commented Aug 16, 2019

github-edouard-devouge commented Aug 19, 2019

brancz commented Aug 20, 2019

Nathan-Okolita-Aimtheory commented Aug 20, 2019

brancz commented Aug 20, 2019

bigkraig commented Oct 14, 2019

dlahoza commented Dec 11, 2019

AnandDevan commented Jan 8, 2020

domino7 commented Jan 16, 2020

stale bot commented Feb 15, 2020

Pryz commented Feb 20, 2020

stale bot commented Mar 21, 2020

Nathan-Okolita-Aimtheory commented Mar 23, 2020

stale bot commented Apr 22, 2020

chadlwilson commented Apr 23, 2020

pracucci commented Apr 23, 2020

bwplotka commented Oct 14, 2020

stale bot commented Dec 13, 2020

pracucci commented Dec 14, 2020

stale bot commented Apr 18, 2021

midnightconman commented Apr 19, 2021

pete-leese commented May 20, 2021

stale bot commented Jul 19, 2021

dsayan154 commented Jul 20, 2021

oronsh commented Sep 5, 2021

stale bot commented Jan 9, 2022

markmsmith commented Jan 9, 2022

yuha0 commented Feb 4, 2022

tomereli commented Apr 10, 2022 • edited

bwplotka commented Apr 29, 2022

hanjm commented Apr 29, 2022

tomereli commented Apr 29, 2022

markmsmith commented May 2, 2022

yeya24 commented Jun 24, 2022

github-edouard-devouge commented Jul 31, 2019 •

edited

tomereli commented Apr 10, 2022 •

edited