
Metric v2 migration #42613

Merged: 21 commits merged from metric-v2-migration into metrics-v2 on May 20, 2024

Conversation

@metamben (Contributor) commented May 13, 2024

fixes #42186
fixes #42187
fixes #42188
fixes #42189
fixes #42190
fixes #42191

@metamben metamben requested a review from camsaul as a code owner May 13, 2024 22:19
@metabase-bot metabase-bot bot added the .Team/QueryProcessor :hammer_and_wrench: label May 13, 2024
@metamben metamben requested a review from snoe May 13, 2024 22:31

replay-io bot commented May 13, 2024

Status: Complete
Commit: f907cef
Results: 1 failed, 3 flaky, 2563 passed

@snoe (Contributor) left a comment

LGTM but probably worth thinking about the couple of questions.

src/metabase/db/custom_migrations/metrics_v2.clj (outdated)

- column:
    name: dataset_query_metrics_v2_migration_backup
    remarks: The copy of dataset_query before the metrics v2 migration
    type: ${text.type}

@snoe (Contributor) commented May 16, 2024

I wonder if this should be a separate table [report-card-id, old-query]

Having this hang around on report_card could be rather annoying long-term.

@metamben (Author) commented May 16, 2024

I was considering this too. Do you know of prior art? I assumed we would drop that field a few releases later. We would have to do that with a separate table too, I guess. I wonder if such an extra column can mess up serdes (although the tests don't fail).

Contributor commented:

Yeah, my concern was serdes as well, just from the cleanliness of the serialized cards (in a git backed serdes system) I think it would probably be better separate if it's not too much more cumbersome

Member commented:

I think creating a separate table is ultimately going to be more code and more noise, I think we should just have SerDes ignore the backup column entirely. It's only there in case we broke things anyway

Member commented:

Populating the backup column with values can and should be done in Liquibase. Look at migration 85 for example. Only do stuff in Clojure migrations if they can't be done in SQL

@metamben (Author) commented:

@camsaul, the custom migration populates this field only if it rewrites dataset_query. Are you suggesting always setting this field, or adding a condition approximating what's in the custom migration or something else?

Member commented:

See my other comment, I think it's probably fine to just always set this Field since it's just copying pointers and we're already doing a full table scan anyway.

@metamben (Author) commented May 16, 2024

Wouldn't that imply that on rollback we lose changes that we could keep? I ranked the importance of this higher than speed.

@camsaul (Member) commented May 16, 2024

I think you have to put all of your Fixes #issue on different lines otherwise it's not going to link them to this PR and close them automatically

(:id card)
(-> rewritten
    (assoc :dataset_query_metrics_v2_migration_backup (:dataset_query card)
           :updated_at (Instant/now))))))))

Member commented:

I think it's usually better to use current_timestamp or now() (or now(3) for MySQL/MariaDB to get millisecond precision, although that's probably not important in this case) so you don't have to worry about weirdness with timezone conversion if this is a datetime or timestamp [without time zone] column, but I guess it doesn't really matter that much.
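
For illustration, a minimal sketch of what that suggestion might look like in the Clojure migration, assuming Toucan2's update! passes the changes map through to HoneySQL unchanged so a raw SQL expression can stand in for the Clojure-side Instant/now (the exact form below is not from this PR):

;; sketch only: `card` and `rewritten` are the bindings from the surrounding
;; migration code; the database generates the timestamp instead of the JVM
(t2/update! :report_card (:id card)
            (assoc rewritten
                   :dataset_query_metrics_v2_migration_backup (:dataset_query card)
                   :updated_at [:raw "current_timestamp"]))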

@metamben (Author) commented:

Yes, in fact, I'm not 100% sure that updating this column is what everybody would wish.

@metamben (Author) commented:

I've removed this occurrence altogether, but there is another one. Is this important? As long as setting the timestamp works, I think we are fine.

(t2/update! :report_card
            (:id card)
            (-> rewritten
                (assoc :dataset_query_metrics_v2_migration_backup (:dataset_query card)

Member commented:

I think it's probably easier to just do

UPDATE report_card
SET dataset_query_metrics_v2_migration_backup = dataset_query

or

UPDATE report_card
SET dataset_query_metrics_v2_migration_backup = dataset_query
WHERE dataset_query LIKE '%["metric",%'

in a Liquibase migration to back up all the Card queries in one go, then you have a discrete step that is super easy to reverse because all you need to do is

UPDATE report_card
SET dataset_query = dataset_query_metrics_v2_migration_backup
WHERE dataset_query_metrics_v2_migration_backup IS NOT NULL

those should both be reasonably quick (other than the full table scan) since the strings should be stored separately anyway so it's just copying pointers rather than entire strings.

Not sure if

WHERE dataset_query LIKE '%["metric",%'

would actually make it faster. On the one hand you're avoiding a lot of updates, but on the other hand you now have to fetch and scan every string. It's simpler just to copy the value for every Card.

@metamben (Author) commented:

I think knowing which cards actually got rewritten (and might have to be restored) is more important than speed, but I can be convinced otherwise.

:made_public_by_id
;; we don't expect a description for this column because it should never change
;; once created by the migration
:dataset_query_metrics_v2_migration_backup nil} col)

Member commented:

Same suggestion: just change the assertion from = to =? instead of adding this.
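
For reference, a minimal sketch of the difference, assuming the =? approximate-equality assertion from Metabase's test helpers, which only checks the keys named in the expected map (the surrounding test shape here is hypothetical):

;; with =, every column that can appear in the actual metadata map has to be
;; enumerated or explicitly excluded, which is why the nil entry above exists
(is (= expected-column-metadata actual-column-metadata))

;; with =?, keys absent from the expected map (such as the new backup column)
;; are simply ignored, so no exclusion entry is needed
(is (=? {:name    "dataset_query"
         :remarks string?}
        actual-column-metadata))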

@metamben (Author) commented:

Can you be a bit more precise about what you mean? This is an explicit set of fields we want to exclude from testing.

@camsaul (Member) left a comment

Other than relatively small style suggestions, my main concern here is that the migration is doing things really inefficiently. You're doing things like

(doseq [card (t2/select :report_card :collection_id 1)]
  (t2/delete! :report_card :id (:id card)))

which is pretty slow when you could have just done

(t2/delete! :report_card :collection_id 1)

instead. This can be pretty important since it affects startup time and we're dealing with potentially hundreds of things that need migration -- one DML query versus 100 is a pretty big performance difference at startup time.

But it would have been even better to make most of these things plain SQL/Liquibase migrations in the first place anyway. It's way easier to write efficient and unbuggy SQL queries than it is to write Clojure migrations.

Creating the migrations collection, granting it perms, and populating the backup column for dataset_query can all be done in Liquibase migrations easily, as can reversing those migrations.

The only thing that really needs to be done as a Clojure migrations is the part where you migrate queries for Cards that referenced v1 Metrics.

Migrating V1 Metrics to Cards could be done in pure SQL if you wanted to. Something like this would probably do the trick

INSERT INTO report_card
  (name, type, creator_id, migrated_from_v1_metric_id, dataset_query, database_id, table_id, collection_id)
SELECT
  concat(metric.name, ' (Migrated from metric ', cast(metric.id AS text), ')'),
  'metric',
  metric.creator_id,
  metric.id,
  -- Maybe using native JSON manipulation functions here would be easier
  concat(
    '{"database":', 
    cast(t.db_id AS text),
    ',"type":"query","query":', 
    concat(
      trim(trailing '}' FROM metric.definition),
      ',"source-table":',
      metric.table_id,
      '}'
    ), 
    '}'
  ),
  t.db_id,
  metric.table_id,
  (SELECT id FROM collection WHERE slug = 'migrated_metrics_v1')
FROM metric
LEFT JOIN metabase_table t
  ON metric.table_id = t.id;

You could add a new column called migrated_from_v1_metric_id and then build a map of migrated_from_v1_metric_id => id to migrate queries in Clojure land.
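
A minimal sketch of that lookup, assuming Toucan2 and the migrated_from_v1_metric_id column suggested above:

;; sketch: map each migrated v1 metric id to the id of the Card created for it,
;; for use by the Clojure code that rewrites queries referencing v1 Metrics
(def metric-id->card-id
  (into {}
        (map (juxt :migrated_from_v1_metric_id :id))
        (t2/select [:report_card :id :migrated_from_v1_metric_id]
                   :migrated_from_v1_metric_id [:not= nil])))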

For the Clojure migrations, select-reducible + reduce is going to be a lot more efficient than select, which fetches everything into memory at once and then iterates over it with doseq; even more so if you only fetch the columns you actually need.
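
And a sketch of the reducible approach, assuming Toucan2's reducible-select (the Toucan 2 counterpart of select-reducible) and a hypothetical rewrite-query helper standing in for the real rewriting logic:

;; sketch: stream cards through reduce instead of realizing them all in memory,
;; fetching only the columns the rewrite actually needs
(reduce (fn [updated card]
          (if-let [rewritten (rewrite-query (:dataset_query card))]
            (do (t2/update! :report_card (:id card) {:dataset_query rewritten})
                (inc updated))
            updated))
        0
        (t2/reducible-select [:report_card :id :dataset_query]))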

@camsaul (Member) commented May 16, 2024

Ok, I was thinking about doing the Metric => Card migration in SQL a little more, and the only real tricky part here is populating dataset_query. In Postgres, using native JSON manipulation functions, you can do this instead of all the concat and trim shenanigans:

jsonb_set(
  jsonb_set(
    jsonb_set('{"type":"query"}'::jsonb, 
      '{database}', to_jsonb(t.db_id)),
    '{query}', metric.definition::jsonb),
  '{query, source-table}', to_jsonb(metric.table_id)
)::text

This seems a lot more readable to me, the only downside is you'd have to do different versions for MySQL and H2, but maybe you'd have to do that anyway. Take a look at v42.00-001 for an example of using native JSON functions in a DB migration

@camsaul (Member) commented May 16, 2024

Ok so I guess I misspoke a bit and the version of H2 we're using doesn't have enough JSON functions to make this work with native JSON manipulation in H2. So I guess you'd have to do concat/trim there

@metamben (Author) commented:

@camsaul, I have the feeling that either we go all in and do the whole migration in SQL, or we keep the question-rewriting part in Clojure, in which case it makes little sense to do the v2 metric card creation in SQL, because that's orders of magnitude less work. I expect the updates to the question cards to dominate the whole process.

I'm not super keen on moving the whole thing to SQL, because reproducing all the tests I did with the Clojure-land rewrite would be quite a large amount of work. Unless you think it's crucial to do as much as possible in standard Liquibase, I would prefer to keep the data-manipulating parts of the migration together.

@metamben metamben requested a review from camsaul May 17, 2024 17:02
@ranquild ranquild merged commit 2fe6e7e into metrics-v2 May 20, 2024
21 checks passed
@ranquild ranquild deleted the metric-v2-migration branch May 20, 2024 12:26