Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support regex matching against values in attributesprocessor #979

Closed
dneray opened this issue May 16, 2020 · 6 comments
Closed

Support regex matching against values in attributesprocessor #979

dneray opened this issue May 16, 2020 · 6 comments
Assignees
Projects

Comments

@dneray
Copy link

dneray commented May 16, 2020

I have some use cases where it would be useful to be able to parse the attribute values into multiple fields.

For example my application has a field
http.url: http://example.com/path?queryParam1=value1,queryParam2=value2

and I would like to parse out the fields into:
http_protocol: http
http_domain: example.com
http_path: path
http_query_params: queryParam1=value1,queryParam2=value2

This is what I am thinking for the configuration:

  attributes/insert:
    actions:
      - from_attribute: "http.url"
        action: insert
        regex: ^(?P<http_protocol>.*):\/\/(?P<http_domain>.*)\/(?P<http_path>.*)(\?|\&)(?P<http_query_params>.*)
@ccaraman
Copy link
Contributor

To address this comment https://github.com/open-telemetry/opentelemetry-collector/pull/981/files#r426882973 in #981

I propose changing the configuration to look like

type ActionValue struct {
	// Only one of each can be specified.

        // Value specifies the value to populate for the key.
	// The type of the value is inferred from the configuration.
	// This can't be combined with other fields in this struct.
	RawValue interface{} `mapstructure:"raw_value"`

	// FromAttribute specifies the attribute from the span to use to populate
	// the value. If the attribute doesn't exist, no action is performed.
	FromAttribute string `mapstructure:"from_attribute"`

	// A regex may be specified in place of a value for INSERT, UPDATE or UPSERT
	// The keys will be inferred based on the names of the matcher groups provided
	// And the names will be inferred based on the values of the matcher group
	Regex string `mapstructure:"regex"`

	// HASH calculates the SHA-1 hash of an existing value and overwrites the value
	// with it's SHA-1 hash result. This only works with upsert/update. 
	Hash boolean 'mapstructure:"hash'`

}

// ActionKeyValue specifies the attribute key to act upon.
type ActionKeyValue struct {
	// Key specifies the attribute to act upon.
	// This is a required field.
	Key string `mapstructure:"key"`

	// Value specifies the value to populate for the key.
	// Depending on the 
	ActionValue ActionValue `mapstructure:"value"`

	// Action specifies the type of action to perform.
	// The set of values are {INSERT, UPDATE, UPSERT, DELETE}.
	// Both lower case and upper case are supported.
	// INSERT - Inserts the key/value to spans when the key does not exist.
	//          No action is applied to spans where the key already exists.
	//          Either Value or FromAttribute must be set.
	// UPDATE - Updates an existing key with a value. No action is applied
	//          to spans where the key does not exist.
	//          Either Value or FromAttribute must be set.
	// UPSERT - Performs insert or update action depending on the span
	//          containing the key. The key/value is insert to spans
	//          that did not originally have the key. The key/value is updated
	//          for spans where the key already existed.
	//          Either Value or FromAttribute must be set.
	// DELETE - Deletes the attribute from the span. If the key doesn't exist,
	//          no action is performed.
	// This is a required field.
	Action Action `mapstructure:"action"`
}

Resulting in sample configs:

  attributes/regex_insert:
    actions:
      - key: "http.url"
        value:
        	regex: ^(?P<http_protocol>.*):\/\/(?P<http_domain>.*)\/(?P<http_path>.*)(\?|\&)(?P<http_query_params>.*)
        action: insert

  attributes/insert:
    actions:
      - key: "string key"
        value:
          from_attribute: "anotherkey"
        action: insert

cc @bogdandrutu and @tigrannajaryan for thoughts.

@jrcamp
Copy link
Contributor

jrcamp commented Jun 12, 2020

We've discussed doing something like this in the filtering proposal where the action (match type) is a key.

Also is doing key/value necessary or can it use a regular yaml map? I don't think you'd ever be inserting into the same key would you?

For instance, an alternative to the above config:

attributes:
  insert:
    "http.url":
        regex: ^(?P<http_protocol>.*):\/\/(?P<http_domain>.*)\/(?P<http_path>.*)(\?|\&)(?P<http_query_params>.*)

    "string key": 
        from_attribute: "anotherkey"

Edit: removed value, don't think it's needed.

@jrcamp
Copy link
Contributor

jrcamp commented Jun 12, 2020

Also we're adding support for nested arrays and maps in attributes to the protocol. If we're going to redesign the config do we need to go ahead and take that into account? It's starting to hurt my brain given attributes can be arbitrarily nested..

@jrcamp
Copy link
Contributor

jrcamp commented Jun 12, 2020

I feel like once we start going down the nested array/map hole things could get complicated trying to express it in yaml. Maybe we just need a small DSL. (Actually, maybe just borrow Python syntax entirely, but it wouldn't be implemented in Python):

# Delete nested
del attr["foo"]["bar"]

# Add dictionary
attr["bar"] = {"one": "two"}

# Overwrite list
attr["list"] = [1,2,3]

# Append to list
attr["list"] += [4,5,6]

# regex
attr["http.url"]  = /^(?P<http_protocol>.*):\/\/(?P<http_domain>.*)\/(?P<http_path>.*)(\?|\&)(?P<http_query_params>.*)/

# from another key
attr["string key"] = attr["another key"]

Obviously the operations it can do would be fairly limited but I think it's going to be way easier to understand than more yaml.

tigrannajaryan pushed a commit that referenced this issue Jun 24, 2020
Note: All of the cudos go to @dneray for the logic/testing/effort in #981

This pr is based off of it with the following modifications

- The config looks like
```
- key : <key to use for applying the rule too>
  pattern: <the regex pattern with named submatchers>
  action: extract
```
I think that the original PR is going in the right way for what we can the internal logic to be but as the issue #979 and #1170 indicate we need to take a step back and redesign the attributes processor (or make it clear of its limitations).  This pr unblocks the required functionality. 

**Link to tracking Issue:** #979 

**Link to follow up issue:** #1170 

**Testing:** Added tests for extracting.

**Documentation:** Updated processor readme.
wyTrivail pushed a commit to mxiamxia/opentelemetry-collector that referenced this issue Jul 13, 2020
Note: All of the cudos go to @dneray for the logic/testing/effort in open-telemetry#981

This pr is based off of it with the following modifications

- The config looks like
```
- key : <key to use for applying the rule too>
  pattern: <the regex pattern with named submatchers>
  action: extract
```
I think that the original PR is going in the right way for what we can the internal logic to be but as the issue open-telemetry#979 and open-telemetry#1170 indicate we need to take a step back and redesign the attributes processor (or make it clear of its limitations).  This pr unblocks the required functionality. 

**Link to tracking Issue:** open-telemetry#979 

**Link to follow up issue:** open-telemetry#1170 

**Testing:** Added tests for extracting.

**Documentation:** Updated processor readme.
@bogdandrutu
Copy link
Member

@ccaraman can you please extract your new proposal and comments from @jrcamp to a new "feature request" to enhance the filter config and close this bug?

@ccaraman
Copy link
Contributor

To wrap up this issue - regex support is in the attributes processor. The follow up questions around supporting new data types for the processor and coming up with a more comprehensive configuration is tracked by #1170

Collector automation moved this from To do to Done Jul 13, 2020
MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this issue Nov 11, 2021
http.ResponseWriters may implement additional interfaces
(http.CloseNotifier, http.Flusher, http.Hijacker, http.Pusher,
io.ReaderFrom) that get lost when the ResponseWriter is wrapped in
another object. This change uses the httpsnoop package to wrap the
ResponseWriter so that the resulting object implements any of the
optional interfaces that the original ResponseWriter implements as
well as using the replacement ResponseWriter methods that gather
information for tracing.
hughesjj pushed a commit to hughesjj/opentelemetry-collector that referenced this issue Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Collector
  
Done
Development

No branches or pull requests

4 participants