Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(regex_parser transform): Add support for nested field names #1967

Closed
wants to merge 2 commits into from

Conversation

bruceg
Copy link
Member

@bruceg bruceg commented Mar 2, 2020

Closes #1812

Bruce Guenter added 2 commits March 2, 2020 14:46
The upstream regex crate only supports alpha-numeric characters plus
underscore in group names. Our fork adds support for field notation
(period and square braces) in the `extend-group-names` branch. Use this
branch until upstream supports more characters in group names.

Signed-off-by: Bruce Guenter <bruce@timber.io>
Signed-off-by: Bruce Guenter <bruce@timber.io>
@bruceg bruceg added type: enhancement A value-adding code change that enhances its existing functionality. transform: regex_parser labels Mar 2, 2020
@bruceg bruceg requested a review from a user March 2, 2020 20:51
@binarylogic binarylogic assigned ghost Mar 2, 2020
@ghost
Copy link

ghost commented Mar 3, 2020

I have two questions:

  1. Would it make sense to add a test (and, if it doesn't work, also an implementation) for array indexes? For example, array.field[0].x.

  2. Because the PR is not likely to be merged soon (because the maintainers of regex crate previously expressed some concerns about adding this without a careful spec) can we put the fork to the lib directory? I was recently thinking about doing the same to leveldb-sys crate which we maintain ourselves too.

@bruceg
Copy link
Member Author

bruceg commented Mar 3, 2020

  1. It would make sense to add such a test. However, nothing else appears to be testing this (at least not in tests/behavior), which makes me wonder if it shouldn't go in a separate issue.
  2. I have no opinion one way or another, except that I think this would work best as a submodule rather than a full import of the regex source tree.

@binarylogic
Copy link
Contributor

And just noting. I would like to standardize around a field access syntax for consistency. If we're baking this into the regex crate, as well as other parts of Vector, it makes sense to document this somewhere and/or centralize parsing this format.

@ghost
Copy link

ghost commented Mar 3, 2020

@binarylogic
This format is already standard in all of the the Vector's components because it is implemented in the core (reading and writing fields in this format automatically operate on internal nested structures, a new implementation of it instead of the old flatten/unflatten was added in #1902 and passed existing unit and behavior tests). However, I agree that we need to have a dedicated place for it in the documentation (#1975 represents it).

@bruceg

I have no opinion one way or another, except that I think this would work best as a submodule rather than a full import of the regex source tree.

My concern is that subsequent changes, even with the submodule approach, would require coordinated effort between multiple repositories, while putting the full source to Vector's repository would allow to bundle each subsequent change into a single PR.

@binarylogic
Copy link
Contributor

binarylogic commented Mar 9, 2020

From planning

Option 1

Given the caveats of maintaining a fork of the regex crate we've decided to add an option to place fields into a namespace, similar to the target option in the geoip transform. Use can then subsequently move fields around with the rename_fields transform. Concrete changes are:

  • Add a target option that functions like the one referenced above.
  • Add docs on pairing this with the rename_fields transform to achieve more customized naming.

Option 2

Implement special encoding of . characters that would allow us to accomplish this without modifying the regex crate.

@bruceg
Copy link
Member Author

bruceg commented Mar 9, 2020

Closing this due to discussion on the main issue

@bruceg bruceg closed this Mar 9, 2020
@binarylogic binarylogic deleted the regex-parser-nested-fields branch April 24, 2020 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A value-adding code change that enhances its existing functionality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

regex_parser transform doesn't work with nested fields
2 participants