Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Has anyone tired using Postgres with multiple databases holding multiple tenants/schemas? #206

Open
JonMcCarthy7 opened this issue Aug 25, 2022 · 8 comments

Comments

@JonMcCarthy7
Copy link

I'd like to hear from people who have been able to successfully accomplish this type of architecture using this gem. I'm currently stuck with connection errors from constantly switching database connections. I think the route I need to take is better managing the connection_handler/connection_pools that connect to each database. Has anyone ran into something similar? Thanks

@SkyM
Copy link

SkyM commented Aug 26, 2022

@JonMcCarthy7 We have been using this in production for ~5 years. At this point we have >100 tenant databases with 1-5gb of data per database. I honestly consider this architecture decision one of the worst in my career. By all metrics we have 'successfully' accomplished this architecture, however it is turning into a major issue as we look forward to scaling and re-factoring our codebase. Backing out of this style multi-tenant database is a HUGE problem. Be advised.

For your specific issue, if you find yourself switching between multiple tenants per request or per lifecycle of a thread, you probably need to reconsider how you are separating data. You should definitely be looking at expanding thread pools and minimizing the amount of switching per request that you are performing.

@JonMcCarthy7
Copy link
Author

JonMcCarthy7 commented Aug 26, 2022

Hey @SkyM. Thanks for the response. Are you saying you did the 1 Tenant per DB approach? Sorry if I wasn't clear, in my case I'm trying to set up multiple tenants on each database/shard.

@SkyM
Copy link

SkyM commented Aug 26, 2022

We used MySQL and did one tenant per database with a master database to handle shared data.

@SkyM
Copy link

SkyM commented Aug 26, 2022

For context, you may want to read through this discussion on the original library: influitive/apartment#636

@oleksii-leonov
Copy link
Contributor

oleksii-leonov commented Aug 27, 2022

@JonMcCarthy7

We have been using the apartment gem for at least 5 years, and absolutely happy with it.

  • We using PostgreSQL with a separate schema for each tenant.
  • Our product is a B2B SaaS, designed for large companies in a specific industry. So, we have thousands of clients (not millions).
  • We have a significant amount of business logic and ~250 tables in each tenant, some data stored as JSONB.
  • Tenants could be quite large (hundreds of gigabytes).
  • We need strict data isolation between tenants.
  • It's extremely rare we need to collect data from different tenants.
  • We do switch to the proper tenant once per request. We never do switch inside requests. If we need to show some aggregated data, we use a background job to collect it first and then keep it in the shared schema.

If you have the same use case, this setup with apartment gem and PostgreSQL with a separate schema for each tenant will work like a charm:

  • It's much easier to migrate 1000 relatively small schemas (with millions of records in tables) than one DB/schema with tables with billions of records.
  • It's almost impossible to accidentally show one client's data to other clients. In the case of a classic setup with one DB/schema and hundreds of models with the company_id column — it's easy to accidentally miss scope.
  • It's easy to restore one particular client's schema from DB backup in case of some emergency (if the client accidentally deleted some critical data from its account) without touching other clients' data.

But there are use cases when this setup would not be beneficial:

  • If you expect to have millions of clients. Separate schemas are pretty expensive for DB (you need much more space and experience in tuning your DB for such load).
  • If you need to mix data between different clients easily.
  • If your business logic is relatively small and straightforward and it's possible to control scope.

@oleksii-leonov
Copy link
Contributor

@JonMcCarthy7

Sorry, I realized we use multiple DBs with tenants differently than you asked.

In our case, we have multiple DB servers and use separate schemas for each tenant. But we keep all the tenants on each of DBs.

This means we have schema "test-client-1" on all DB servers. But "DB server 1" has tables dedicated to one business domain and "DB server 2" has tables dedicated to another business domain. Each worker keeps a connection to all DBs and switches between tenants with our custom switch method (set search_path for all connected DBs). So, we don't sharding tenants between different DB servers.

@tjhanley
Copy link

@aleksejleonov

This means we have schema "test-client-1" on all DB servers. But "DB server 1" has tables dedicated to one business domain and "DB server 2" has tables dedicated to another business domain. Each worker keeps a connection to all DBs and switches between tenants with our custom switch method (set search_path for all connected DBs). So, we don't sharding tenants between different DB servers.

I am experimenting with multiple DB Hosts [#208]. I am experiencing a lot of issues. You mention a custom switch method with search_path. When I run the stock switch Apartment doesn't seem to find the tenant's schema on the secondary host.

irb(main):007:0> Apartment.tenants_with_config
=>
{"tenant_1"=>{"adapter"=>"postgresql", "host"=>"localhost", "port"=>5432, "encoding"=>"unicode", "database"=>"development"},
 "tenant_2"=>
  {"host"=>"some-host-ip",
   "adapter"=>"postgresql",
   "database"=>"tenant2_db",
   "username"=>"postgres",
   "port"=>5432,
   "encoding"=>"unicode",
   "password"=>"[redacted]"}}
irb(main):006:0> Apartment::Tenant.switch('tenant_2')
/Users/tom/.rbenv/versions/3.0.4/lib/ruby/gems/3.0.0/gems/ros-apartment-2.11.0/lib/apartment/adapters/postgresql_adapter.rb:139:in `raise_schema_connect_to_new': Could not set search path to schemas, they may be invalid: "tenant_2" "tenant_1". (Apartment::TenantNotFound)
Original error: ActiveRecord::StatementInvalid: Could not find schema tenant_2
/Users/tom/.rbenv/versions/3.0.4/lib/ruby/gems/3.0.0/gems/ros-apartment-2.11.0/lib/apartment/adapters/postgresql_adapter.rb:75:in `connect_to_new': Could not find schema tenant_2 (ActiveRecord::StatementInvalid)

The odd thing is if I issue a Apartment::Tenant.drop('tenant_2') command from the console the schema drops successfully.

irb(main):008:0> Apartment::Tenant.drop('tenant_2')
(90.3ms)  DROP SCHEMA "tenant_2" CASCADE
=> #<Thread::ConditionVariable:0x000000010baf2510>

@oleksii-leonov
Copy link
Contributor

oleksii-leonov commented Nov 14, 2022

@tjhanley

class ApplicationRecord < ActiveRecord::Base
  primary_abstract_class
end

class Db1Base < ApplicationRecord
  self.abstract_class = true
  establish_connection(:db1)
end

class Db2Base < ApplicationRecord
  self.abstract_class = true
  establish_connection(:db2)
end

class ModelMain < ApplicationRecord; end
class Model1 < Db1Base; end
class Model2 < Db2Base; end
module SwitchTenant
  def self.reset
    switch!(Apartment.default_tenant)
  end

  def self.switch!(tenant = nil)
    if tenant.nil?
      reset
    else
      # Main DB
      Apartment::Tenant.switch!(tenant)

      # DB1
      Db1Base.connection.schema_search_path = tenant
      Db1Base.connection.clear_query_cache

      # DB2
      Db2Base.connection.schema_search_path = tenant
      Db2Base.connection.clear_query_cache
    end
  end

  def self.switch(tenant = nil)
    switch!(tenant)
    yield
  ensure
    reset
  end
end
> SwitchTenant.switch!('tenant1')
# Now you will get all the data from the "tenant1" schema in each DB.
> ModelMain.all
> Model1.all
> Model2.all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants