Fill anchors instead of quoted anchors #28

WilliamHarvey97 · 2022-04-06T13:27:13Z

In my custom vega-lite template, I tried do to something like this to compute the sum of squared residuals:

           "transform": [
                {
                    "joinaggregate": [
                        {
                            "op": "mean",
                            "field": "lab",
                            "as": "mean_y"
                        }
                    ]
                },
                {
                    "calculate": "pow(datum.<DVC_METRIC_Y> - datum.<DVC_METRIC_X>,2)",
                    "as": "SR"
                },
                {
                    "joinaggregate": [
                        {
                            "op": "sum",
                            "field": "SR",
                            "as": "SSR"
                        }
                    ]
                }
            ]

However, it seems impossible to do this because it looks like dvc-render replace only a quoted anchor like f'"{cls.anchor(name)}"' instead of just the anchor f'{cls.anchor(name)}'

dvc-render/src/dvc_render/vega_templates.py

Line 78 in 422bfe6

return f'"{cls.anchor(name)}"'

I didn't dig a lot into this, but I was just wondering if this could just be this way so we could use anchors on datum?

Thank you!

The text was updated successfully, but these errors were encountered:

dberenbaum · 2022-04-11T12:51:58Z

@pared Thoughts on this?

pared · 2022-04-11T13:20:09Z

The use case makes perfect sense. However, there is a certain reason why we quote the anchors, and this is because we treat the templates as dictionaries (not only in code, but for example in integrations, like vscode). Having quoted anchors allows us to parse and dump from/to json. We could think about involving some logic regarding the quotes. In the case provided by @WilliamHarvey97 we could detect that anchors are used within a string and it's safe to replace them without quotes.

dberenbaum · 2022-04-11T14:48:34Z

@pared Maybe we aren't being clear about whether templates are:

JSON documents where anchors are JSON values to be replaced OR
Text documents where anchors are strings to be replaced.

Is VS Code expecting each anchor to be injected as a value to a JSON key or as a string into a text doc? cc @mattseddon

pared · 2022-04-11T15:33:11Z

@dberenbaum the main problem is that templates are both. When replacing the anchors we treat them as text files. Inside DVC and the integrations we treat them as JSON dictionaries, for convenient parsing. Also, as I recall we might be relying on the current behavior in some places (@mattseddon don't you change the colors of the plots somewhere?).

mattseddon · 2022-04-11T22:26:36Z

@dberenbaum we are currently holding the template as a string so that we can do a String.replace and coerce the string into JSON like this:

  JSON.parse(
    template.replace('"<DVC_METRIC_DATA>"', JSON.stringify(datapoints))
  ) as TopLevelSpec

@pared we update the colors by overwriting that part of the encoding with:

  encoding: {
    color: {
      legend: { disable: true },
      scale: colorScale
    }
  }

Where colorScale has this type: { domain: string[]; range: string[] }.

I am open to this changing we just need to be careful not to break anything.

Note: the way that our code is currently setup could be improved as we actually convert the initial CLI output into a JSON object and then back to a string, and then back to JSON once we know which datapoints we want to display.

pared · 2022-04-12T09:59:20Z

So I guess we have two ways to go:

Stop treating the templates as json dictionaries - that would require synchronous changes for DVC, Studio and VSCode, makes debugging a bit harder
Add functionality to fill anchors that are not entrenched within quotes, it would require a bit of logic inside template but should be possible to do.

There still exists possibility to treat templates only as dictionaries. In this way, we would remove anchors and would need to provide path within dict and data to fill it with. Mentioning it only for historical accuracy, as it would require users to learn vega to create their own templates.

daavoo · 2022-04-12T10:24:20Z

Related #23 ?

dberenbaum · 2022-04-12T18:02:33Z

There still exists possibility to treat templates only as dictionaries. In this way, we would remove anchors and would need to provide path within dict and data to fill it with. Mentioning it only for historical accuracy, as it would require users to learn vega to create their own templates.

@pared Could you explain more or show an example of what templates would look like without anchors?

Also, it might be fine to not allow use cases like pow(datum.<DVC_METRIC_Y> - datum.<DVC_METRIC_X>,2) as long as we clearly document it. We may need to update the docs anyway to clarify current behavior (are the anchors <DVC_METRIC_X> or "<DVC_METRIC_X>")?

One more thought: it seems like only <DVC_METRIC_DATA> is ever anything other than a string, so maybe it makes sense to do a simple replace without quotes for everything other than that anchor?

pared · 2022-04-14T15:33:00Z

One more thought: it seems like only <DVC_METRIC_DATA> is ever anything other than a string, so maybe it makes sense to do a simple replace without quotes for everything other than that anchor?

That would make sense.

Regarding the no-anchors approach:
Every templates is simply a dict.
So, if we know how vega works, we will know where should we put particular fillers. eg. when inserting data we would need to dovega_dict["data"] = {DVC_PROVIDED_DATA} in the filling code.

The issue gets a bit complicated when talking about different types of plots. For example, in our simplest plot, we can provide configuration for x and y axis and color as a dictionary with proper structure. This gets a bit complicated in more advanced templates, like confusion matrix, where we have to provide x and y for some data transformations that we perform. When creating it, it seemed much easier to ask users to create their own template (with their own data) and in the end just replace their data references with proper anchors.

pared · 2022-04-22T10:13:02Z

BTW @WilliamHarvey97 what kind of plot are you trying to implement? Maybe we could approach it in some other way.

WilliamHarvey97 · 2022-04-23T21:13:59Z

I was implementing an actual v.s. predicted plot. I wanted to add MAPE, RMSE, R2 and other useful metrics to show at the top of the plot. We often share our results within my team and I found it useful to include these metrics at the top so we don't need to dig further.

In the end, we chose to version png image. We wanted to use the color of each point to encode something else and in the current implementation of dvc-render, the color is reserved for version encoding. I prefer the side-to-side comparison of png.

pared · 2022-04-25T11:29:59Z

@WilliamHarvey97 Got it. Well, what you are trying to achieve should be the scope of iterative/dvc#7086. Would you mind sharing some example of your use case? I am preparing some backward incompatible changes for dvc 3.0 and knowing more use cases might help me define it better.

daniel-falk · 2022-05-09T13:51:39Z

This would be a really nice feature to support since a lot of use cases depends on the calculate transform. In my case I wanted to normalize across all the elements in each revision. I ended up doing a "hack" and using the sum operator of the joinaggregate transform grouped over a unique combination of values, this way I could rename a field specified with an anchor to a fixed string which I then could use in the calculate transform. Like this:

  "transform": [
    {
      "joinaggregate": [{
        "op": "sum",
        "field": "<DVC_METRIC_Y>",
        "as": "count"
      }],
      "groupby": ["rev"]
    },
    {
      "joinaggregate": [{
          "op": "sum",
          "field": "<DVC_METRIC_Y>",
          "as": "variable"
      }],
      "groupby": ["<DVC_METRIC_X>", "rev"]
    },
    {
      "calculate": "datum.variable/datum.count",
      "as": "variable_as_percent"
    }
  ],

Here is a working example in vega-lite:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {
        "aggregate": 4208.201290480213,
        "prop": "prop1",
        "rev": "workspace"
      },
      {
        "aggregate": 1841.9859846942586,
        "prop": "prop2",
        "rev": "workspace"
      },
      {
        "aggregate": 1051.8727099035762,
        "prop": "prop3",
        "rev": "workspace"
      },
      {
        "aggregate": 740.6659046177122,
        "prop": "prop4",
        "rev": "workspace"
      },
      {
        "aggregate": 483.1788367367855,
        "prop": "prop5",
        "rev": "workspace"
      },
      {
        "aggregate": 567.9014562535242,
        "prop": "prop6",
        "rev": "workspace"
      },
      {
        "aggregate": 2183.434726669379,
        "prop": "prop7",
        "rev": "workspace"
      },
      {
        "aggregate": 2208.201290480213,
        "prop": "prop1",
        "rev": "HEAD"
      },
      {
        "aggregate": 1841.9859846942586,
        "prop": "prop2",
        "rev": "HEAD"
      },
      {
        "aggregate": 1051.8727099035762,
        "prop": "prop3",
        "rev": "HEAD"
      },
      {
        "aggregate": 740.6659046177122,
        "prop": "prop4",
        "rev": "HEAD"
      },
      {
        "aggregate": 483.1788367367855,
        "prop": "prop5",
        "rev": "HEAD"
      },
      {
        "aggregate": 567.9014562535242,
        "prop": "prop6",
        "rev": "HEAD"
      },
      {
        "aggregate": 2083.434726669379,
        "prop": "prop7",
        "rev": "HEAD"
      }
    ]
  },
  "mark": "line",
  "transform": [
    {
      "joinaggregate": [{"op": "sum", "field": "aggregate", "as": "count"}],
      "groupby": ["rev"]
    },
    {
      "joinaggregate": [{"op": "sum", "field": "aggregate", "as": "variable"}],
      "groupby": ["prop", "rev"]
    },
    {"calculate": "datum.variable/datum.count", "as": "variable_as_percent"}
  ],
  "encoding": {
    "x": {"field": "prop"},
    "y": {"field": "variable_as_percent", "type": "quantitative"},
    "color": {"field": "rev"}
  }
}

pared · 2022-05-11T13:53:36Z

@daniel-falk
Good idea with the workaround. I will try to look into supporting this once iterative/dvc#7086 is done.

daavoo added the feature request label Apr 6, 2022

dberenbaum added the p2-medium label Apr 18, 2022

daavoo removed the p2-medium label Dec 20, 2022

skshetry mentioned this issue Feb 15, 2023

render: avoid deepcopying iterative/dvc#9033

Merged

mattseddon mentioned this issue Oct 5, 2023

Fill template anchors inside custom template strings #145

Merged

dberenbaum closed this as completed in #145 Oct 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fill anchors instead of quoted anchors #28

Fill anchors instead of quoted anchors #28

WilliamHarvey97 commented Apr 6, 2022 •

edited

dberenbaum commented Apr 11, 2022

pared commented Apr 11, 2022

dberenbaum commented Apr 11, 2022

pared commented Apr 11, 2022

mattseddon commented Apr 11, 2022

pared commented Apr 12, 2022 •

edited

daavoo commented Apr 12, 2022

dberenbaum commented Apr 12, 2022

pared commented Apr 14, 2022

pared commented Apr 22, 2022

WilliamHarvey97 commented Apr 23, 2022 •

edited

pared commented Apr 25, 2022

daniel-falk commented May 9, 2022

pared commented May 11, 2022

Fill anchors instead of quoted anchors #28

Fill anchors instead of quoted anchors #28

Comments

WilliamHarvey97 commented Apr 6, 2022 • edited

dberenbaum commented Apr 11, 2022

pared commented Apr 11, 2022

dberenbaum commented Apr 11, 2022

pared commented Apr 11, 2022

mattseddon commented Apr 11, 2022

pared commented Apr 12, 2022 • edited

daavoo commented Apr 12, 2022

dberenbaum commented Apr 12, 2022

pared commented Apr 14, 2022

pared commented Apr 22, 2022

WilliamHarvey97 commented Apr 23, 2022 • edited

pared commented Apr 25, 2022

daniel-falk commented May 9, 2022

pared commented May 11, 2022

WilliamHarvey97 commented Apr 6, 2022 •

edited

pared commented Apr 12, 2022 •

edited

WilliamHarvey97 commented Apr 23, 2022 •

edited