Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commas in tag values are not escaped #83

Open
martin-sucha opened this issue Apr 5, 2019 · 7 comments
Open

Commas in tag values are not escaped #83

martin-sucha opened this issue Apr 5, 2019 · 7 comments

Comments

@martin-sucha
Copy link
Contributor

Tags are a slice of strings in the documentation, but datadog-go documentation does not mention valid values for tags. datadog-go removes newlines from tags, but does not remove/escape other invalid characters like comma (,). Comma in tag value will be interpreted as multiple tags since tags are comma-separated in the UDP datagrams.

The page at https://docs.datadoghq.com/developers/guide/what-best-practices-are-recommended-for-naming-metrics-and-tags/ lists recommeded practice for naming tags, perhaps datadog-go should provide a function to sanitize the tag values according to those rules?

@mrVanboy
Copy link

mrVanboy commented Apr 8, 2019

FYI we implemented our custom sanitizers for tags which follows the best practices from DataDog documentation and I believe that we can create PR after maintainers of this repo confirm that sanitizers should be in this package.

@arbll
Copy link
Member

arbll commented Nov 28, 2019

Hi there,

Feel free to open a PR, sanitizing the tags makes sense to me.

I am a bit worried about the potential performance hit, make sure to mesure it if you create one ;).

Arthur

@jmacd
Copy link

jmacd commented Dec 9, 2019

I referenced this from an OpenTelemetry metrics SDK issue in which we are looking to address the same problem. I started poking around hoping that DataDog had already set a convention for how to encode tag values with commas in them. @arbll the problem is that your agent will have to understand the convention that is used, so this issue can't be solved in the client alone.

How does DataDog intend to support tag values with commas in them?

@arbll
Copy link
Member

arbll commented Dec 10, 2019

Hi @jmacd,
From the data I can gather it seems that commas are not supported at all in tags, even in the backend:

May contain alphanumerics, underscores, minuses, colons, periods, and slashes. Other characters are converted to underscores.

I think in this library it would make sense to sanitize by either removing commas or replacing them by underscores.

@jmacd
Copy link

jmacd commented Dec 10, 2019

@arbll The document linked above, which I believe you are quoting from, describes naming tags, i.e., the valid set of characters permissible in a tag name. Restricting tag names is fine. It does not mention a restriction on possible tag values.

Anyway, in as much as I am currently responsible for the OpenTelemetry metrics specification, I will not approve any such sanitization of tag values and if this is something Datadog won't spec out, we'll probably end up creating a new specification or not supporting dogstatsd out of the box.

One way to spec this out would be to follow the W3C specification for trace context, which specifies how to encode distributed context tags in URL encoding. I.e., commas would be encoded using %2C.

@jmacd
Copy link

jmacd commented Dec 10, 2019

Here is the specification I refer to: https://w3c.github.io/correlation-context/#value-format

@bai
Copy link

bai commented Dec 16, 2019

Please correct if I'm wrong — from what I see, DataDog Agent doesn't have "tag name" and "tag value" distinction for metrics, those are parsed [0] directly into []string [1], using , as separator, meaning it's always just a single string "tag" (e.g., a valid list of tags: []string{"foo:bar", "baz:hey", "foo"})

Note that according to statsd/dogstatsd spec, "tag names" are not unique, so as per "valid list of tags" example above, tag foo is used twice per metric: once with value, once without. Assuming that's why DataDog uses string slice instead of hashmap for storing tags.

[0] https://github.com/DataDog/datadog-agent/blob/master/pkg/dogstatsd/parser.go#L85
[1] https://github.com/DataDog/datadog-agent/blob/master/pkg/metrics/metric_sample.go#L69

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants