Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote Sampling - XRay (Part 1) (Fetching Sampling Rules) #1536

Closed

Conversation

bhautikpip
Copy link
Contributor

This is the intial PR for X-Ray remote sampling. The scope of this PR is to introduce data model for the business logic of remote sampling, adding support for fetching sampling rules and stroring.

Upcoming PRs just a heads up

  1. Support for getting sampling targets from AWS X-Ray + Business logic for centralized reservoir + fallback sampling (traceIDRationBasedSampler)
  2. Validate rules are being matched using matching logic

bhautikpip and others added 30 commits February 4, 2021 16:21
Copy link

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's possible, I would recommend a further split, first PR would just have the xrayClient

samplers/aws/xray/client.go Outdated Show resolved Hide resolved
samplers/aws/xray/client.go Outdated Show resolved Hide resolved
samplers/aws/xray/client.go Outdated Show resolved Hide resolved
samplers/aws/xray/remote_sampler_config.go Outdated Show resolved Hide resolved
samplers/aws/xray/client.go Outdated Show resolved Hide resolved
}
}

func (rs *RemoteSampler) startRulePoller(ctx context.Context) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only have one poller for rules and targets, notably there should only be one goroutine that is fetching rules or targets based on their appropriate tick. This removes needing to think about any threadsafety issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about that because we typically refresh sampling rules every 5 mins and sampling targets every 10 seconds.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the same goroutine should be polling rules every 5 min or polling targets every 10s, but you shouldn't have two goroutines that may execute at the same time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't execute at a same time and even if it will then those would be 2 separate processes right? I don't want to overburden the getSamplingRules and getSamplingTargets logic in a single goroutine. I think this would look much cleaner. Also, I'm not sure if we can combine those in a single goroutine programmatically as we would require both the logic to run in an infinite for loop.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't execute at a same time

It will because the timing will overlap sometimes.

The logic including any potential cleanliness doesn't change that much I think

for {
  select {
    case <-rulesTicker.C():
      fetchRules()
    case <-targetsTicker.C()
      fetchTargets()
  }
}

This ensures the pollers are serialized onto a single thread to avoid simultaneously updating.

That being said, another thing to keep in mind is that targets fetch isn't a fixed tick, it is a value returned in the actual targets response

https://docs.aws.amazon.com/xray/latest/api/API_SamplingTargetDocument.html#xray-Type-SamplingTargetDocument-Interval

So presumably you can't use a ticker for the targets fetch, I'm not sure if there is any better pattern in Go but I guess it may involve a goroutine that sleeps and signals a channel

for {
  case := <-rulesTicker.C()
    fetchRules()
  case <-targetsChan
    fetchTargets()
}

fetchTargets() {
  ...
  nextFetch = min(rules.intervals)
  s.manifest = newManifest
  go func() {
    time.Sleep(nextFetch)
    targetsChan <- struct{}
}

var _ rand.Source = (*lockedSource)(nil)
var _ rand.Source64 = (*lockedSource64)(nil)

type lockedSource struct {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised we need an entire locked random to compute jitter. I don't see anything like it in this other library, do we really need this?

https://github.com/lthibault/jitterbug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to add all the methods for this. Not sure using this lib is a good idea since we have like custom implementations. @Aneurysm9 any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jitterbug appears to depend on a rand.Source implementation, which is how it ignores concurrency issues. It pushes that off to the source. The default source used by the package-global functions in the math/rand package is safe for concurrent use, but sources created with NewSource() are not and need locking like this to be made safe for concurrent use.

I suppose the question here is whether the default source can be used.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I was referencing jitterbug just to look at a different implementation which looked much simpler. I don't think we need to use it as a dependency but if the default source can work then we should just use that I guess

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the default source in a library is generally avoided because it needs to be seeded to be effectively used in most cases but, as a library author, you shouldn't be making the choice to seed it for an application that might not want it seeded and you also can't guarantee that another library doesn't come along later and re-seed it with a fixed value. I do really wish that new sources were also safe for concurrent access.

Can the problem be avoided by simply having more sources? Can we have each actor with its own source and thus avoid concurrency concerns?

samplers/aws/xray/sampling_rule.go Outdated Show resolved Hide resolved
samplers/aws/xray/sampling_rule.go Outdated Show resolved Hide resolved
samplers/aws/xray/client.go Outdated Show resolved Hide resolved
@bhautikpip
Copy link
Contributor Author

If it's possible, I would recommend a further split, first PR would just have the xrayClient

Thanks for your review :) It's hard to break down these PRs further. I have tried to abstract core logic as much possible since this PR would add all the data model and the next PR would have all the core logic.

Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only a partial review.

Please address build and lint errors in addition to the comments made.

samplers/aws/xray/client.go Outdated Show resolved Hide resolved
samplers/aws/xray/client.go Outdated Show resolved Hide resolved
samplers/aws/xray/client.go Outdated Show resolved Hide resolved
samplers/aws/xray/client.go Outdated Show resolved Hide resolved
samplers/aws/xray/client.go Outdated Show resolved Hide resolved
samplers/aws/xray/remote_sampler.go Outdated Show resolved Hide resolved
samplers/aws/xray/remote_sampler.go Outdated Show resolved Hide resolved
samplers/aws/xray/remote_sampler.go Outdated Show resolved Hide resolved
samplers/aws/xray/remote_sampler.go Outdated Show resolved Hide resolved
samplers/aws/xray/remote_sampler.go Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
)

type xrayClient struct {
// http client for sending unsigned proxied requests to the collector
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// http client for sending unsigned proxied requests to the collector
// http client for sending sampling requests to the collector

}

// newClient returns an HTTP client with proxy endpoint
func newClient(d string) *xrayClient {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func newClient(d string) *xrayClient {
func newClient(addr string) *xrayClient {


// newClient returns an HTTP client with proxy endpoint
func newClient(d string) *xrayClient {
endpoint := "http://" + d
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably check if the address already has a scheme. If that should be strictly forbidden, then there should be some validation closer to the user I guess.

Note that while not the first version, eventually this sampler needs to be able to support more configuration, for example TLS or auth headers, as users don't necessarily run collectors as plaintext sidecars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think we strictly expect string like "127.0.0.1:8080" but yeah you're right I will add some validation around this to double sure.


endpointURL, err := url.Parse(endpoint)
if err != nil {
globalLogger.Error(err, "unable to parse endpoint from string")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newClient needs to return error this is a fatal issue


// getSamplingRules calls the collector(aws proxy enabled) for sampling rules
func (p *xrayClient) getSamplingRules(ctx context.Context) (*getSamplingRulesOutput, error) {
statisticsByte, err := json.Marshal(getSamplingRulesInput{})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

statistics?

tempManifest.refreshedAt = now

// assign temp manifest to original copy/one sync refresh.
rs.mu.Lock()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be for another PR, but wondering if volatile write / read is an appropriate use for atomic in Go too, but not sure

}

// sets polling interval for sampling rules
func WithSamplingRulesPollingInterval(polingInterval time.Duration) Option {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func WithSamplingRulesPollingInterval(polingInterval time.Duration) Option {
func WithSamplingRulesPollingInterval(pollingInterval time.Duration) Option {

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate it's positive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

}

// sets custom proxy endpoint
func WithEndpoint(endpoint string) Option {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate endpoint expectations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would just take in endpoint and validate it in NewRemoteSampler method since this is a config function doesn't makes sense to validate it here and also not sure if it's a good idea to return from here.

// setting global logger
globalLogger = cfg.logger
// set global verbosity to log info logs
stdr.SetVerbosity(1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't modify global state in this library

// rule represents a centralized sampling rule
type rule struct {
// Centralized reservoir for keeping track of reservoir usage
reservoir *reservoir
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should leave out reservoir from this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think it's getting used in createRule method to populate some of the values we're getting from getSamplingRules call.

func (rs *remoteSampler) ShouldSample(parameters sdktrace.SamplingParameters) sdktrace.SamplingResult {
// ToDo: add business logic for remote sampling

return sdktrace.TraceIDRatioBased(0.05).ShouldSample(parameters)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just need a WithInitialSampler option and a field to store it in right? Not sure of the rearranging

}

// setting global logger
globalLogger = cfg.logger
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed this - we shouldn't have a globallogger, logger needs to be on the sampler object itself. This isn't a globally initialized library

}
body := bytes.NewReader(statisticsByte)

req, err := http.NewRequestWithContext(ctx, http.MethodPost, p.endpoint.String()+"/GetSamplingRules", body)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a struct that won't be initialized many times so having 2 fields is basically no overhead, while computing a string on every poll is overhead.

defer output.Body.Close()

var samplingRulesOutput *getSamplingRulesOutput
if err := json.NewDecoder(output.Body).Decode(&samplingRulesOutput); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe json.Unmarshal() handles this correctly, but it would be nicer to see

var samplingRulesOutput getSamplingRulesOutput

i.e., drop the *, not the &

var _ rand.Source64 = (*lockedSource64)(nil)

type lockedSource struct {
mu sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go doesn't provide a synchronized random number out of the box, so this is OK but I agree it'd be better to hand this to the user--let them pass in a rand.Source--they'll have such a lock figured out somewhere on their own, typically.

// ToDo: other fields will be used in business logic for remote sampling
// reservoir is a reservoir distributed among all running instances of the SDK
type reservoir struct {
// Quota assigned to client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(What's happening here, with all the commented-out fields?)

func (rs *remoteSampler) ShouldSample(parameters sdktrace.SamplingParameters) sdktrace.SamplingResult {
// ToDo: add business logic for remote sampling

return sdktrace.TraceIDRatioBased(0.05).ShouldSample(parameters)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it makes sense to plug in an initial value even though this PR is already big, it's the natural default.


// ToDo: other fields will be used in business logic for remote sampling
// rule represents a centralized sampling rule
type rule struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you place a reference to some documentation on the AWS side where manifest and rule are documented, please?

@bhautikpip
Copy link
Contributor Author

Closing this in favor of this complete PR for remote sampling support

@bhautikpip bhautikpip closed this Mar 1, 2022
plantfansam referenced this pull request in plantfansam/opentelemetry-go-contrib Mar 18, 2022
…jaeger (#1536)

* Bump google.golang.org/api in /exporters/trace/jaeger

Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.39.0 to 0.40.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/master/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.39.0...v0.40.0)

Signed-off-by: dependabot[bot] <support@github.com>

* Auto-fix go.sum changes in dependent modules

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants