Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change AccountService from Go to DotNet (auto) #1538

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

RassK
Copy link

@RassK RassK commented Apr 19, 2024

Changes

Fixes #589

Changes AccountingService from Go to .NET service.
Uses OpenTelemetry .NET Automatic Instrumentation.

NOTE: It's a first preview of demo app that is instrumented with OpenTelemetry .NET Automatic instrumentation. Some parts are still TBD. Please review and add comments about improvements and missing parts / docs.

Merge Requirements

For new features contributions please make sure you have completed the following
essential items:

  • CHANGELOG.md updated to document new feature additions
  • Appropriate documentation updates in the docs
  • Appropriate Helm chart updates in the helm-charts

Maintainers will not merge until the above have been completed. If you're unsure
which docs need to be changed ping the
@open-telemetry/demo-approvers.

src/accountingservice/Dockerfile Outdated Show resolved Hide resolved
src/accountingservice/Dockerfile Outdated Show resolved Hide resolved
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 30, 2024
@github-actions github-actions bot added the helm-update-required Requires an update to the Helm chart when released label Apr 30, 2024
@RassK
Copy link
Author

RassK commented Apr 30, 2024

Example Jaeger traces:
image
image

CC: @lachmatt @Kielek Should parentless Kafka consume requests be filtered? Seems without additional context these just trash the output.

@RassK RassK marked this pull request as ready for review April 30, 2024 12:57
@RassK RassK requested a review from a team as a code owner April 30, 2024 12:57
@RassK RassK requested a review from Kielek April 30, 2024 12:57
@RassK
Copy link
Author

RassK commented Apr 30, 2024

After removing EOF

image

@RassK RassK changed the title [Draft] Change AccountService from Go to DotNet (auto) Change AccountService from Go to DotNet (auto) Apr 30, 2024
@puckpuck puckpuck added docs-update-required Requires documentation update and removed Stale labels Apr 30, 2024
@puckpuck
Copy link
Contributor

puckpuck commented Apr 30, 2024

@RassK Can you add a Chanlog entry for this? Also, we will need to update docs and the Helm chart for this. If you need help to prep a PR in those areas, please let us know.

Screenshot 2024-05-13 at 10 53 00 AM

@RassK
Copy link
Author

RassK commented May 6, 2024

@puckpuck thanks for helping to guide the last steps missing. I have updated the changelog. Would be glad to accept some help with other missing parts as well. It's my first PR to this repo.

@Kielek
Copy link
Contributor

Kielek commented May 10, 2024

LGTM, I think this can be merged as the PR to opentlemetry.io is ready.
As I understand, helm charts should not be affected. We are changing just one type of service to another.

@puckpuck
Copy link
Contributor

We will need to bump up the memory for the service in the Helm chart, but nothing beyond that. We can merge this before having to do anything in Helm, since Helm is bound to a release and not nightlies.

I want to keep the label on this PR so we can track all items that need to be updated when doing our next release.

@puckpuck
Copy link
Contributor

When I run this branch, I notice clock skews with the accounting service, which causes issues with the trace waterfall. I'm not sure what this service does differently than our other .NET service (cart), but we don't see the same clock skew there. @RassK can you take a look at this?

@RassK
Copy link
Author

RassK commented May 14, 2024

@lachmatt could you recheck that kafka instrumentation, I think that is the source of this issue.

@RassK
Copy link
Author

RassK commented May 15, 2024

@puckpuck So consulted with @lachmatt and we reached conclusion that it is not the issue with clock skew.

In the .NET the Kafka span's instrumentation start time is the start of the polling. Previously we had issues with the amount of spans, so every poll (even an empty one) created a span. Now as we by default filter such spans, I could lower the polling max timeout to default (100ms), so accounting service's Kafka consume spans could be now <100ms, which can be still huge compared to some other services reporting processing time around 1ms. The default setting now renders the Jaeger UI in more understandable way... but we cannot make the UI look more beautiful just because of the messaging behaviour, it still has to reflect the actual logic.

I see it more like Jaeger could improve theirs's UI by using more dynamic time axle and collapse big portions like this (10s wait time).

Here is the improved look >
image

@julianocosta89
Copy link
Member

@RassK wouldn't that be better handled in a linked span, instead of a child span?

We did that for fraudDetectionService.

@RassK
Copy link
Author

RassK commented May 15, 2024

@julianocosta89 that's unfortunately not configurable in auto instrumentation.

@puckpuck
Copy link
Contributor

Something is wrong with the .NET SDK here. No other SDKs start the span when you start polling.

@lachmatt
Copy link

lachmatt commented May 15, 2024

@puckpuck
This is a behavior of Kafka client instrumentation which is part of OTel .NET Autoinstrumentation.
Semantic conventions for messaging recommend to create receive spans for pull-based scenarios.
Most of the existing instrumentations for Kafka client are creating process spans, but these were problematic, as explained in OTEP, and in PR.

@puckpuck
Copy link
Contributor

We should investigate doing this so the trace doesn't look like accounting service is starting before everything else, which looks like this should be done with a process consumer span. In its current state, this makes the consumption of the PlaceOrder trace very hard to do and will only raise questions from users about why the accounting service started before the request itself.

@lachmatt
Copy link

@RassK wouldn't that be better handled in a linked span, instead of a child span?

Creation context is used as a parent, in addition to being added as a link. This scenario is described in OTEP and was discussed when working on the instrumentation.

We should investigate doing this so the trace doesn't look like accounting service is starting before everything else, which looks like this should be done with a process consumer span. In its current state, this makes the consumption of the PlaceOrder trace very hard to do and will only raise questions from users about why the accounting service started before the request itself.

@pyohannes What would be the recommendation from Messaging SIG side in such scenario?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-update-required Requires documentation update helm-update-required Requires an update to the Helm chart when released
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add dotnet auto instrumentation
5 participants