
Drop firebase in favor of always alive postgres instance + some sort of middleware for querying #20

Open
evamaxfield opened this issue Oct 15, 2021 · 6 comments
Labels
infrastructure (Build, upgrade, or remove a piece of infrastructure or core utility), proposal (A detailed proposal / spec for a CDP feature)

Comments

@evamaxfield (Member)

Idea / Feature

Replace Firebase Cloud Firestore with a Postgres instance (and Firebase Storage with a plain Google Cloud Storage bucket).

Use Case / User Story

  1. Firebase has given us a ton of problems in automated deployment / cookiecutter deploy; replacing it with something we can entirely control and manage would make things easier.
  2. Document storage generally has worse query performance than relational storage, and there is no join functionality. This affects our query performance drastically.
  3. Directly from a data user: document storage is just not what I think about when I think of a dataset.

Solution

In general, switch to a relational database (Postgres preferred simply because of how tested and trusted it is). File storage can be done using a basic Google Cloud Storage bucket.

I have seen promise with Supabase: https://github.com/supabase/supabase
It is entirely open source, it has a beautiful API for accessing data, and it bills itself as the relational database equivalent of Firebase, but, ya know, everything in its tech stack is open.

It is even open enough that we can just use their Docker images and self-host: https://supabase.io/docs/guides/hosting/overview

Alternatives

Much like the problems with Firebase, I am not sure how customizable Supabase is, or if we should even use it.

An alternative is to just run everything ourselves, but it seems like all of the Docker images and tools that Supabase has decided to use make a lot of sense.

Stakeholders

@isaacna @tohuynh

Major Components

These are rough steps:

  • experiment with different frameworks, libraries, etc.
  • decide on frameworks, libraries, etc. as a team
  • show an example working API in Python + JS and the storage components (with setup in infra-as-code)
    • make build should set up a container somewhere on GCP with the database, ports exposed, and public read enabled, for example
    • make add-test-data should simply add some fake values to the newly created infra
    • show in Python + JS that we can easily fetch data
  • come up with a rollout and migration strategy
  • migrate
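As a rough illustration of the middle experiment steps (nothing here reflects a decided stack; sqlite3 stands in for the GCP-hosted Postgres instance, and the table and column names are hypothetical, not the real CDP schema), "add test data, then show we can easily fetch it" might look like:

```python
import sqlite3

# In-memory SQLite stands in for the Postgres instance on GCP;
# the "event" table is a hypothetical placeholder schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, body TEXT, event_datetime TEXT)"
)

# "make add-test-data": insert a few fake values into the fresh infra
conn.executemany(
    "INSERT INTO event (body, event_datetime) VALUES (?, ?)",
    [("Full Council", "2021-10-15"), ("Transportation Committee", "2021-10-18")],
)
conn.commit()

# "show that we can easily fetch data"
rows = conn.execute(
    "SELECT body FROM event ORDER BY event_datetime"
).fetchall()
print(rows)  # [('Full Council',), ('Transportation Committee',)]
```

The JS half of the step would hit the same database through whatever client is chosen; the point is only that both sides read the one shared relational store.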

Dependencies

Other Notes

@evamaxfield evamaxfield added the proposal A detailed proposal / spec for a CDP feature label Oct 15, 2021
@tohuynh commented Oct 15, 2021

Document storage generally has worse query performance than relational storage, and there is no join functionality. This affects our query performance drastically.

True, but only if our database models are relational (which CDP database models are) and we don't denormalize them. Our error was trying to use a non-relational database to store relational data.
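For concreteness, this is the kind of server-side join Firestore cannot do, which otherwise forces N client-side lookups (hypothetical body/event tables; sqlite3 stands in for Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relational models: an event references the body that held it
conn.executescript("""
CREATE TABLE body (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE event (
    id INTEGER PRIMARY KEY,
    body_id INTEGER REFERENCES body(id),
    event_datetime TEXT
);
INSERT INTO body VALUES (1, 'Full Council');
INSERT INTO event VALUES (1, 1, '2021-10-15');
""")

# One round trip resolves the reference server-side
rows = conn.execute("""
    SELECT event.event_datetime, body.name
    FROM event
    JOIN body ON event.body_id = body.id
""").fetchall()
print(rows)  # [('2021-10-15', 'Full Council')]
```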

Overall, I support the switch since it could improve the speed of fetching data by a lot.

@evamaxfield (Member Author)

This is also related to CouncilDataProject/cdp-backend#167. Most relational database systems have model handling built in, and we may not need these models. Further, most of our issues are with the event gather pipeline and uploading data in chunks. I think with relational we can actually use a proper transaction? Maybe?

Finally, if we use a relational database, we can use Alembic for schema versioning (thank god).

@isaacna commented Oct 16, 2021

I agree with To in that it doesn't really make much sense to use a non-relational database when our schema is relational.

I think with relational we can actually use a proper transaction? Maybe?

Postgres supports transactions so I think this would be a huge plus. The way we handle bulk db uploads with store_event_processing_results in the pipeline currently is pretty cool, but I don't think we should have to do that in the first place.
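To sketch what a proper transaction buys us over hand-rolled chunked uploads (sqlite3 standing in for Postgres; the event/session tables are hypothetical stand-ins for what store_event_processing_results writes), the whole multi-table upload either lands or rolls back as one unit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical slice of the pipeline's output: a session belongs to an event
conn.executescript("""
CREATE TABLE event (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE session (
    id INTEGER PRIMARY KEY,
    event_id INTEGER NOT NULL REFERENCES event(id)
);
""")

try:
    with conn:  # commits on success, rolls back on any exception
        cur = conn.execute("INSERT INTO event (body) VALUES ('Full Council')")
        conn.execute(
            "INSERT INTO session (event_id) VALUES (?)", (cur.lastrowid,)
        )
        raise RuntimeError("simulated failure mid-upload")
except RuntimeError:
    pass

# Nothing was persisted: no half-uploaded event to clean up by hand
print(conn.execute("SELECT COUNT(*) FROM event").fetchone()[0])  # 0
```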

Also after just glancing briefly at the docs, Supabase seems to be really nice and also have much more active dev support and general usage than FireO.

I will say tho that the Python client they have seems to be community-made, and not actually from the Supabase devs themselves. Additionally, they don't have a native ORM generator for Python (at least not yet): https://supabase.io/docs/reference/javascript/generating-types

If an ORM is really desired, then we could use something like SQLAlchemy along with Supabase.

What customization use cases did you have in mind that CDP has that Supabase might not support?

@evamaxfield (Member Author)

I will say tho that the Python client they have seems to be community-made, and not actually from the Supabase devs themselves. Additionally, they don't have a native ORM generator for Python (at least not yet): https://supabase.io/docs/reference/javascript/generating-types

Yea, I think this is something we will simply have to discuss as a team. I personally preferred FireO and the ORM style because we weren't using a relational system with SQL. Like, FireO's querying is easier than Firebase's Python querying, imo. That was a large reason why I personally supported it, and if supabase-js / any sort of Python API client can handle it, I think we should just experiment a bit.

What customization use cases did you have in mind that CDP have that Supabase might not support?

It's basically always public read and related concerns: "How easy can we set this up with infrastructure-as-code?"

@evamaxfield evamaxfield added the infrastructure Build, upgrade, or remove, a piece of infrastructure or core utility label Oct 21, 2021
@tohuynh commented Nov 3, 2021

If we want to change the database, the frontend and backend code that interacts with the database would need to change as well.

We should add an intermediary data layer for both the frontend and backend. To interact with the database, the frontend and backend will go through the data layer. The advantage is if we decide to change the database again, we would only change the data layer. So long as the contract between the data layer and the frontend and backend doesn't change, we won't need to modify the frontend or backend at all.
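The data-layer idea above can be sketched with a small interface (everything here is hypothetical naming, not CDP code; in Python this could be a typing.Protocol, with a mirror interface in TypeScript for the frontend):

```python
from typing import Protocol


class EventStore(Protocol):
    """The contract: frontend/backend depend only on this interface,
    never on a Firestore or Postgres client directly."""

    def get_event(self, event_id: str) -> dict: ...


class InMemoryEventStore:
    """Stand-in backend; a PostgresEventStore or SupabaseEventStore
    would satisfy the same contract."""

    def __init__(self, events: dict):
        self._events = events

    def get_event(self, event_id: str) -> dict:
        return self._events[event_id]


def render_event_title(store: EventStore, event_id: str) -> str:
    # Caller code is unchanged if the storage backend is swapped out
    return store.get_event(event_id)["body"]


store = InMemoryEventStore({"abc": {"body": "Full Council"}})
print(render_event_title(store, "abc"))  # Full Council
```

Swapping databases again would then only mean writing one new class that satisfies EventStore; render_event_title and everything above it never changes.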

@evamaxfield (Member Author)

Ahhhhh I see. Good point @tohuynh. So this would be the "middleware" component.
