Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make DataType a separate type within Component #10069

Closed

Conversation

ankitpatel96
Copy link
Contributor

@ankitpatel96 ankitpatel96 commented May 1, 2024

Addresses #9429

There were several approaches to implementing this, but I decided to make Type an interface that continues to be embedded in ID.

I split Type into ComponentType and DataType - ComponentType represents the names of components and DataType is an enum for Traces, Logs, and Metrics.

This keeps ID with the same basic API, which eases this transition. The largest change is NewType will automatically create a DataType instead of ComponentType for the strings "traces", "logs", and "metrics". In addition, ID's zero value now contains a null value for Type - since it is now an interface.

Copy link

codecov bot commented May 1, 2024

Codecov Report

Attention: Patch coverage is 80.76923% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 91.63%. Comparing base (671dcbc) to head (c6b2623).

Files Patch % Lines
component/config.go 68.75% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10069      +/-   ##
==========================================
- Coverage   91.65%   91.63%   -0.02%     
==========================================
  Files         356      356              
  Lines       16843    16855      +12     
==========================================
+ Hits        15437    15445       +8     
- Misses       1063     1067       +4     
  Partials      343      343              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

component/config.go Outdated Show resolved Hide resolved
func mustNewDataType(strType string) DataType {
return MustNewType(strType)
}
type DataType string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use a struct here to disallow users to create new DataType's

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... I understand that's a risk. I've tried to mitigate it with the logic in pipelines/config.go that make sure its a member of signalNameToDataType.

Do you think its still worth wrapping it in a struct?

@@ -109,18 +109,23 @@ func callValidateIfPossible(v reflect.Value) error {
return nil
}

// Type is the component type as it is used in the config.
type Type struct {
type Type interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to keep ID basically the same, we make Type an interface and implement it with ComponentType and DataType

@ankitpatel96 ankitpatel96 marked this pull request as ready for review May 3, 2024 01:31
@ankitpatel96 ankitpatel96 requested a review from a team as a code owner May 3, 2024 01:31
@ankitpatel96 ankitpatel96 requested a review from dmitryax May 3, 2024 01:31
@ankitpatel96 ankitpatel96 changed the title Datatype Type Interface Make DataType a separate type within Component May 3, 2024
component/config.go Show resolved Hide resolved
}

// ComponentType is the component type as it is used in the config.
type ComponentType struct { //revive:disable-line:exported
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the linter directive needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linter doesn't like it when type FooBar is in a package Foo. It thinks Foo/FooBar is annoying to pronounce. I think in this case having component/ComponentType is a good exemption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the linter here. The difference between component.Type and component.ComponentType is very not obvious at a glance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// DataTypeFromSignal takes in a string of a datatype, and returns a DataType and a bool.
// The bool is set to true if the string corresponds to a supported DataType, else it is False.
// if there is a matching DataType, it returns the matching DataType enum. If not, it returns an empty string.
func DataTypeFromSignal(s string) (DataType, bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would instead implement UnmarshalText for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea!

processor/memorylimiterprocessor/memorylimiter_test.go Outdated Show resolved Hide resolved
exporter/exporterhelper/common_test.go Outdated Show resolved Hide resolved
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
Creates ComponentType and DataType - ComponentType represents the names of components and DataType is an enum for Traces,Logs, and Metrics (and future signal types!).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ComponentType represents the names of components

I think we need to be very precise here not to overload the words type or name. Currently, for components we use ID to refer to type[/name]. So when you say "names of components", I think this should actually say Type of component. It may be helpful to give a couple clear examples for users as well. "The type of component, e.g. otlp, filelog"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great feedback - my use of the word name didn't account for name actually being embedded within ID

@@ -109,18 +110,26 @@ func callValidateIfPossible(v reflect.Value) error {
return nil
}

// Type is the component type as it is used in the config.
type Type struct {
// Type represents the names of receivers (otlp, filelog, etc),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

}

// ComponentType is the component type as it is used in the config.
type ComponentType struct { //revive:disable-line:exported
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the linter here. The difference between component.Type and component.ComponentType is very not obvious at a glance.

func mustNewDataType(strType string) DataType {
return MustNewType(strType)
}
type DataType string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to rework all of this, it would make sense to me to move data type into a different package, since things other than components can be types.

@djaglowski
Copy link
Member

This PR seems to add complexity but I'm not clear what the benefit is. The original issue suggested moving DataType (logs, metrics, traces) out of the Component package, which I agree with. IMO, if we're reworking this, the most intuitive model is to have signal.Type (logs, metrics, traces), and component.Type (otlp, filelog, etc), and component.ID which is a pair of signal.Type and component.Type.

mx-psi pushed a commit that referenced this pull request May 9, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
`newBaseExporter` needs a `DataType` - up until now we've been passing
in a `Type`. In preparation of splitting `DataType` and `Type`, pass in
a real `DataType`.

<!-- Issue number if applicable -->
#### Link to tracking issue
related to
#9429



In preparation for
#10069
@mx-psi
Copy link
Member

mx-psi commented May 9, 2024

component.ID which is a pair of signal.Type and component.Type.

A component.ID currently is a pair of component.Type and a (possibly empty) name for the component (e.g. otlp/name). We use the fact that component.DataType is also a component.Type to also use it for pipeline names (e.g. traces/sampled). Making it a pair of signal.Type and component.Type wouldn't work

mx-psi pushed a commit that referenced this pull request May 9, 2024
…#10128)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
In an upcoming PR, I change Type to be an interface. This means the zero
value of Type will be nil - which will cause this test to fail.
Initializing ID instead of relying on the zero value fixes this

<!-- Issue number if applicable -->
#### Link to tracking issue
related to
#9429


<!--Please delete paragraphs that you did not use before submitting.-->
In preparation for
#10069
@djaglowski
Copy link
Member

A component.ID currently is a pair of component.Type and a (possibly empty) name for the component (e.g. otlp/name). We use the fact that component.DataType is also a component.Type to also use it for pipeline names (e.g. traces/sampled). Making it a pair of signal.Type and component.Type wouldn't work

Ah, you're right, crossed wires. My suggested alternative was hastily written.

I am primarily concerned about the proposal to have both component.Type and component.ComponentType. This introduces a problem which is in my opinion worse than the one we're trying to fix.

@ankitpatel96
Copy link
Contributor Author

If we're going to rework all of this, it would make sense to me to move data type into a different package, since things other than components can be types.

I totally agree! I have a followup PR planned where I will move it out. Didn't make sense for me to do it all in one PR. I also plan to rename it to SignalType to better match the rest of OpenTelemetry.

I am primarily concerned about the proposal to have both component.Type and component.ComponentType. This introduces a problem which is in my opinion worse than the one we're trying to fix.

I can totally see the concern here - Type was already pretty confusing to understand. However I do think that the idea of having an interface like Type that we can fulfill with either ComponentType or DataType is a good one. Very open to other names... I'll brainstorm but honestly I don't have any good ones right now.

bogdandrutu added a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request May 10, 2024
**Description:** <Describe what has changed.>
In open-telemetry/opentelemetry-collector#10069,
I am making Type an interface. This means the zero value of Type will be
nil - which will cause this test to fail. Initializing ID instead of
relying on the zero value fixes this

---------

Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
@ankitpatel96
Copy link
Contributor Author

I think something I should have done in the beginning is lay out the problem and how I arrived at this solution:

If we want to make DataType its own type - we have to adjust Type somehow. As Pablo said:

A component.ID currently is a pair of component.Type and a (possibly empty) name for the component (e.g. otlp/name). We use the fact that component.DataType is also a component.Type to also use it for pipeline names (e.g. traces/sampled).

We have to somehow change either ID or Type to make this work.

I considered a few options:

  1. Creating a DataTypeID just for Pipelines - containing a DataType and string name. Other components would just use the same ID as they do today. I decided against this because its a little ugly to have both a DataTypeID and ID. I'd have to change a lot of pipeline and connector code to make this work - and there might have been API changes as well.

  2. I could make DataType and Type share an interface such that ID was a pair of ThisCommonInterface (this a placeholder name) and a string name. I decided against this because it would be an invasive change to the ID API - specifically changing the ID.Type method to return ThisCommonInterface would affect a huge amount of code.

  3. Making Type an interface. This is relatively easy to do, and not much code outside of component/ has to changed. The API stays the same and its logical to have ComponentType and DataType share an interface.

I hope this explains how I decided on approach 3. Is this approach something that everyone agrees with? If we can get to consensus on the overall approach I can finish iterating on this PR and get it to a mergeable state.

jlg-io pushed a commit to jlg-io/opentelemetry-collector-contrib that referenced this pull request May 14, 2024
**Description:** <Describe what has changed.>
In open-telemetry/opentelemetry-collector#10069,
I am making Type an interface. This means the zero value of Type will be
nil - which will cause this test to fail. Initializing ID instead of
relying on the zero value fixes this

---------

Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
@mx-psi
Copy link
Member

mx-psi commented May 15, 2024

From the Collector stability discussion, we decided to first focus in if the approach of having an interface and two types that implement said interface (the type for component types and the type for signal types) makes sense, and decide on what names to use for said types/interfaces later.

@TylerHelmuth
Copy link
Member

This feels like a pretty confusing topic, so I'm gonna try to define a bunch of terms for use in this comment.

Currently we have the following important terms:

  • Kind: represents the kind of component, such as receiver or exporter.
  • Type: The unique identifier for a component. This value MUST be unique between Kind of components. Ex: otlphttp, transform.
  • Component ID: this is the full name of the component that user defines in their configuration. It is made up of the Type of components and optional string we call name. type[/name]. This value MUST be unique betweenKind of components. Ex: otlphttp/honeycomb, transform/k8s-events.
  • DataType: Is a special Type that represents a OTel Signal, currently traces, metrics, or logs. An important consequence of DataType being a Type is that it can be used in a Component ID. We currently use this to make named pipelines, such as traces/honeycomb.

With those terms defined, I want to poke at why we use Component ID to represent both a configured component, such as otlphttp/honeycomb, and a pipeline, such as traces/honeycomb. While they are technically both identifiers in the config, it feels to me like they server very different purposes. otlphttp/honeyomb identifies a specific component within the config and is used as the identifier for its placement in the collector's pipeline(s). traces/honeycomb is a specific pipeline and expects that Component IDs will be added to the different Kind lists (receivers, processors, exporters).

Importantly I think these two identifiers need their own naming restrictions. A Type is restricted to the regex ^[a-zA-Z][0-9a-zA-Z_]{0,62}$. That is a great regex when Type is being used to identify a component, but not when identifying a pipeline. For a pipeline DataType needs to be restricted to traces, metrics, and logs exactly. We do this with some special DataType vars, but there isn't anything in our APIs that enforces that the Component IDs used in the pipelines.Config struct be one of these vars.

I feel we'd really benefit from stopping the use of Component ID to represent both a component and a pipeline. I think Component ID and Type should refer to only components and that we should introduce a new concept of Signal and Pipeline ID. Similar to Component ID, Pipeline ID would be made up of a Signal and an optional string name. signal[/name]. Signal would be force to be traces, metrics or logs.

With this solution we have no more overlap between names and no mixing of concepts in the structs. I will admit that I only took a brief look through service/internal/graph and I'm guess it causes some havoc in there, but I think the separation of concerns will be worth it.

@djaglowski
Copy link
Member

@TylerHelmuth's analysis makes sense to me. There is commonality in the structure of type[/name] but it's simple enough that sharing any kind of dependency between the two is really just unnecessary complication.

I will admit that I only took a brief look through service/internal/graph and I'm guess it causes some havoc in there, but I think the separation of concerns will be worth it.

I could be forgetting some details but I think it may be less complicated than the changes in this PR.

@TylerHelmuth
Copy link
Member

I also think that the implementation of a Pipeline ID and Signal would not (should not?) live in component. If component exists as a definition of what a Component is in the collector, I do not think we need to "leak" pipeline concepts into the package. We should probably define Signal in a config module and Pipeline ID in service somewhere.

@ankitpatel96
Copy link
Contributor Author

ankitpatel96 commented May 17, 2024

Hey @TylerHelmuth, thanks for the comprehensive feedback!
This was definitely an option I considered - see option 1 that I mentioned in my previous comment.

I can implement this other option - there will be a bunch of changes in pipeline and connector code as well as perhaps some API changes - but this is probably the right time to make those changes! You also make a good point of the pipeline API silently accepting Types that aren't DataType - that's definitely something we can improve.

I think it'll be better if I close this PR and open a different PR for this change.

@ankitpatel96
Copy link
Contributor Author

I also think that the implementation of a Pipeline ID and Signal would not (should not?) live in component. If component exists as a definition of what a Component is in the collector, I do not think we need to "leak" pipeline concepts into the package. We should probably define Signal in a config module and Pipeline ID in service somewhere.

I agree with this! My original plan was to

  1. change the backing type for DataType
  2. Move DataType to another package
  3. Rename DataType to Signal or something like that

I will say I'm not sure Signal should live in a config module - I think that there should be a generally available enum for the different Signal's supported by the collector - and as far as I can tell DataType is the only option.

andrzej-stencel pushed a commit to andrzej-stencel/opentelemetry-collector that referenced this pull request May 27, 2024
…etry#10127)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
`newBaseExporter` needs a `DataType` - up until now we've been passing
in a `Type`. In preparation of splitting `DataType` and `Type`, pass in
a real `DataType`.

<!-- Issue number if applicable -->
#### Link to tracking issue
related to
open-telemetry#9429



In preparation for
open-telemetry#10069
andrzej-stencel pushed a commit to andrzej-stencel/opentelemetry-collector that referenced this pull request May 27, 2024
…open-telemetry#10128)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
In an upcoming PR, I change Type to be an interface. This means the zero
value of Type will be nil - which will cause this test to fail.
Initializing ID instead of relying on the zero value fixes this

<!-- Issue number if applicable -->
#### Link to tracking issue
related to
open-telemetry#9429


<!--Please delete paragraphs that you did not use before submitting.-->
In preparation for
open-telemetry#10069
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants