Add metadata upload to Mussel by team #751

Closed
wants to merge 19 commits into from

Conversation

hanyuli1995
Collaborator

@hanyuli1995 hanyuli1995 commented Apr 23, 2024

Summary

We want to upload one key-value pair per team to Mussel: the key identifies the team, and the value is the list of GroupBy entities that belong to that team.

2024-04-28 23:12:51 INFO  MetadataStore:280 - Putting metadata for
key: group_bys/cs_ds
conf: ["group_bys/cs_ds/host.trip_stage.v1","group_bys/cs_ds/user.message_intent.v1"]

Why / Goal

Test Plan

Checklist

  • Documentation update

Reviewers

@nikhilsimha

@hanyuli1995 hanyuli1995 changed the title from "[wip]Add metadata upload to Mussel by team" to "Add metadata upload to Mussel by team" on Apr 23, 2024
def putConf(configPath: String): Future[Seq[Boolean]] = {
  // derive team from path to file
  def pathToTeam(confPath: String): String = {
    // capture <conf_type>/<team> as key, e.g. joins/team
Contributor

Can you add an example of input and output of the function to the comments?

Collaborator Author

@nikhilsimha updated
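
For reference, a minimal sketch of the input/output behavior discussed in this thread. This is not the code from the diff: the `ConfPaths` object, the `ConfTypes` set, and the `Option` return type are assumptions made for illustration (the signature in the diff returns a plain `String`).

```scala
object ConfPaths {
  // Assumed set of conf-type directory names; only these two appear in this PR.
  private val ConfTypes = Set("group_bys", "joins")

  /** Returns "<conf_type>/<team>" for a path shaped like
    * ".../<conf_type>/<team>/<file>", or None when no such shape is found.
    */
  def pathToTeam(confPath: String): Option[String] = {
    val parts = confPath.split("/").filter(_.nonEmpty).toList
    // Slide a three-segment window over the path and take the first window
    // that starts with a known conf type.
    parts.sliding(3).collectFirst {
      case List(confType, team, _) if ConfTypes(confType) => s"$confType/$team"
    }
  }
}
```

With this sketch, `ConfPaths.pathToTeam("group_bys/cs_ds/host.trip_stage.v1")` yields `Some("group_bys/cs_ds")`, and a path without a recognized conf-type directory yields `None`.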

@nikhilsimha
Contributor

Let's generalize this even more.

We eventually want to put a lot more metadata:

Team/keys -> gbs
Team/keys -> joins
Team/keys -> features
Team -> stagingqueries

Feature -> confs
Table -> confs
Feature -> GroupBy
Feature -> joins
groupBy -> joins

For auto complete search:
Constant -> list of teams
Constant -> list of keys
Constant -> list of groupBys
Constant -> list of features
Constant -> list of joins

I believe we can write a method that extracts all this metadata in one traversal, without having to parse the files repeatedly.

We need a map of endpoint name -> key value extractor
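
One way the "endpoint name -> key value extractor" idea could look, as a rough sketch only. The `Conf` case class, the endpoint names, and the feature names in the usage note are hypothetical stand-ins, not types or constants from this codebase.

```scala
object MetadataExtractors {
  // Hypothetical, simplified stand-in for a parsed conf file.
  final case class Conf(confType: String, team: String, name: String, features: Seq[String])

  // An extractor turns one conf into zero or more (key, value) pairs for its endpoint.
  type Extractor = Conf => Seq[(String, String)]

  // Endpoint names are illustrative only.
  val extractors: Map[String, Extractor] = Map(
    "CONFS_BY_TEAM"    -> ((c: Conf) => Seq(s"${c.confType}/${c.team}" -> c.name)),
    "CONFS_BY_FEATURE" -> ((c: Conf) => c.features.map(f => f -> c.name)),
    "TEAMS"            -> ((c: Conf) => Seq("teams" -> c.team))
  )

  /** Visits every conf once, runs all extractors on it, and returns
    * endpoint -> key -> distinct values.
    */
  def extractAll(confs: Seq[Conf]): Map[String, Map[String, Seq[String]]] = {
    val emitted = for {
      conf                <- confs
      (endpoint, extract) <- extractors.toSeq
      (key, value)        <- extract(conf)
    } yield (endpoint, key, value)

    emitted.groupBy(_._1).map { case (endpoint, rows) =>
      endpoint -> rows.groupBy(_._2).map { case (key, kvs) => key -> kvs.map(_._3).distinct }
    }
  }
}
```

Adding another mapping from the list above would then mean adding one entry to `extractors`; the traversal itself stays unchanged.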

@hanyuli1995
Collaborator Author

Hi @nikhilsimha, I think we have two types of key-value pairs:

  1. keys -> gbConf and keys -> joinConf are one-to-one mappings and already exist. We can easily combine them by providing a parent path that includes both joins and groupBys. The same goes for key -> features, feature -> confs, table -> confs, feature -> GroupBy, and feature -> joins. Each of these k-v pairs can be derived from a single file, so they can all be produced in one traversal.

  2. team -> gbs, team -> joins, and team -> features are one-to-many mappings, so we cannot derive a put request by looking at a single file. We need to traverse all the files to build the sequence first and then generate the k-v pairs.

I am adding two separate functions for now, putConfByTeam and putConfByName. We can get all the k-v pairs from (1) in putConfByName by adding new put requests, and all the k-v pairs from (2) in putConfByTeam. Do you think we need to merge these two functions? I think it is doable, but I am not sure whether there is a clean way to do it in Scala.
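
As a sketch of how the two kinds of pairs might share a single pass, assuming conf names shaped like `<conf_type>/<team>/<name>`: the `PutRequest` type, the `build` function, and the hand-rolled JSON array below are illustrative assumptions, not the actual putConfByTeam or putConfByName implementations.

```scala
object ConfPutRequests {
  // Hypothetical stand-in for whatever request type the metadata store accepts.
  final case class PutRequest(key: String, value: String)

  // "<conf_type>/<team>/<name>" -> "<conf_type>/<team>"; assumes exactly this shape.
  private def teamKey(confName: String): String =
    confName.split("/").take(2).mkString("/")

  // Minimal JSON array of strings; assumes names contain no quotes or backslashes.
  private def jsonArray(items: Seq[String]): String =
    items.map(i => "\"" + i + "\"").mkString("[", ",", "]")

  /** Takes (confName, confJson) pairs and returns the one-to-one requests
    * followed by the one-to-many team requests.
    */
  def build(confs: Seq[(String, String)]): Seq[PutRequest] = {
    // Single pass: emit the one-to-one request immediately and accumulate names per team.
    val (byName, namesByTeam) =
      confs.foldLeft((Vector.empty[PutRequest], Map.empty[String, Vector[String]])) {
        case ((requests, teams), (name, confJson)) =>
          val key = teamKey(name)
          (requests :+ PutRequest(name, confJson),
           teams.updated(key, teams.getOrElse(key, Vector.empty[String]) :+ name))
      }
    // One-to-many requests can only be finalized once every conf has been seen.
    val byTeam = namesByTeam.toSeq.sortBy(_._1).map { case (team, names) =>
      PutRequest(team, jsonArray(names))
    }
    byName ++ byTeam
  }
}
```

For the two GroupBys in the log from the summary, the team request produced by this sketch has key `group_bys/cs_ds` and a value listing both conf names as a JSON array.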
