The schema sets the column mergestat.container_images.url as unique, which stops additional syncs from being added for the same repository. #1142

Open
josipradic opened this issue Dec 1, 2023 · 2 comments

Comments

@josipradic

Hello everyone, great work on this amazing project! 👏🏼

We've developed numerous MergeStat syncs for internal purposes, collecting data from our apps and inserting outcomes into Postgres. Our code resides in a Git repository, containing Dockerfiles and main.ts scripts. Our approach involves building images and pushing them to the same registry's repository but with varying tags. This work aligns seamlessly with our existing CI integration for automated build and deployment.

However, we're encountering an issue with MergeStat's schema, which imposes a unique constraint (unique_container_images_url) on the url column of the mergestat.container_images table:

(https://github.com/mergestat/mergestat/blob/81d136637e2f2062327e30838310f9c9a8df7906/migrations/900000000000064_update_explore_add_saved_explores.up.sql#L229C50-L229C50)

This constraint restricts us from adding more syncs from our repository with the same URL but different tags, as the schema demands uniqueness for URLs.

We've worked around this by dropping the uniqueness constraint with ALTER TABLE mergestat.container_images DROP CONSTRAINT IF EXISTS unique_container_images_url;, which lets us add all our syncs.

Could this action potentially cause other functionalities to fail?

We're curious about the rationale behind making the url column unique.

Alternatively, we could push each sync into its own repository. However, as mentioned earlier, we view this as a single project and would prefer to avoid splitting it up, especially if the uniqueness of the column might be reconsidered or removed in the near future.

@amenowanna
Contributor

@josipradic The main reason for that constraint is to avoid adding a sync twice when both copies interact with the same sync tables. The pattern we have is that a sync is responsible for maintaining a certain set of tables for a specific repo. So if you run two syncs that touch the same tables for the same repo, you introduce a race condition between the syncs. Your approach is very interesting and I love hearing that it is working for you. I think removing the constraint makes sense in your case, but I don't see it as something I would change in the core project. Once a migration has run it won't run again, so the constraint would only come back when/if you start from scratch. Would love to learn more about your use cases. We have a community Slack and would love to get to know you and your team more!
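
To make the race described above concrete, here is a minimal TypeScript sketch. It is not MergeStat code: the in-memory table, the makeSync and runInterleaved helpers, and the assumption that a sync refreshes its table by first deleting the repo's rows and then inserting a fresh snapshot are all hypothetical, chosen only to show how two syncs that each believe they own the same table for the same repo can leave it in a state neither intended.

```typescript
// Hypothetical in-memory stand-in for a table that a sync maintains.
type Row = { repo: string; source: string; value: number };
type Step = () => void;

let table: Row[] = [];

// Assumption: a sync refreshes its table in two steps,
// (1) delete the repo's existing rows, (2) insert a fresh snapshot.
function makeSync(repo: string, source: string, values: number[]): Step[] {
  return [
    () => { table = table.filter((row) => row.repo !== repo); },
    () => { for (const value of values) table.push({ repo, source, value }); },
  ];
}

// Runs two syncs with their steps alternating, as could happen when
// both are scheduled at once against the same tables.
function runInterleaved(a: Step[], b: Step[]): void {
  const steps = Math.max(a.length, b.length);
  for (let i = 0; i < steps; i++) {
    a[i]?.();
    b[i]?.();
  }
}

// Two registrations for one repo (for example, two tags of the same image).
runInterleaved(
  makeSync("example/repo", "sync:v1", [1, 2]),
  makeSync("example/repo", "sync:v2", [1, 2]),
);

// Each sync expected to leave only its own 2 rows; both snapshots survive.
console.log(table.length); // 4
```

Both delete steps run before either insert step, so the table ends up with both snapshots. Syncs that write to disjoint tables, as in the setup described in this issue, would not interfere with each other this way.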

@josipradic
Author

josipradic commented Dec 7, 2023

@amenowanna Thanks for the fast reply. Yes, I think I understood why you went in that direction, but I was curious whether there was any other reason for it. For example, you have a sync that combines pull requests and pull request commits, which means we must make sure we are not running the original pull-requests and pull-request-commits syncs at the same time, or we will end up in a race condition like you said.

We are building a small internal project: syncs that connect to various APIs and load the resulting insights into PostgreSQL. We were thinking of maintaining all the syncs in the same repository (both Git and registry). That's the only reason I wanted to understand how MergeStat thinks about this issue and how likely it is to change in future development.

Would love to join your Slack community.

Looking forward to your latest version (it should include a fix for scheduling job intervals), because we find it very costly to spin up jobs every minute. 😅
