New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remote Sampling - XRay (Part 1) (Fetching Sampling Rules) #1536
Remote Sampling - XRay (Part 1) (Fetching Sampling Rules) #1536
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's possible, I would recommend a further split, first PR would just have the xrayClient
samplers/aws/xray/remote_sampler.go
Outdated
} | ||
} | ||
|
||
func (rs *RemoteSampler) startRulePoller(ctx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should only have one poller for rules and targets, notably there should only be one goroutine that is fetching rules or targets based on their appropriate tick. This removes needing to think about any threadsafety issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about that because we typically refresh sampling rules every 5 mins and sampling targets every 10 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the same goroutine should be polling rules every 5 min or polling targets every 10s, but you shouldn't have two goroutines that may execute at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't execute at a same time and even if it will then those would be 2 separate processes right? I don't want to overburden the getSamplingRules
and getSamplingTargets
logic in a single goroutine. I think this would look much cleaner. Also, I'm not sure if we can combine those in a single goroutine programmatically as we would require both the logic to run in an infinite for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't execute at a same time
It will because the timing will overlap sometimes.
The logic including any potential cleanliness doesn't change that much I think
for {
select {
case <-rulesTicker.C():
fetchRules()
case <-targetsTicker.C()
fetchTargets()
}
}
This ensures the pollers are serialized onto a single thread to avoid simultaneously updating.
That being said, another thing to keep in mind is that targets fetch isn't a fixed tick, it is a value returned in the actual targets response
So presumably you can't use a ticker for the targets fetch, I'm not sure if there is any better pattern in Go but I guess it may involve a goroutine that sleeps and signals a channel
for {
case := <-rulesTicker.C()
fetchRules()
case <-targetsChan
fetchTargets()
}
fetchTargets() {
...
nextFetch = min(rules.intervals)
s.manifest = newManifest
go func() {
time.Sleep(nextFetch)
targetsChan <- struct{}
}
var _ rand.Source = (*lockedSource)(nil) | ||
var _ rand.Source64 = (*lockedSource64)(nil) | ||
|
||
type lockedSource struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised we need an entire locked random to compute jitter. I don't see anything like it in this other library, do we really need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to add all the methods for this. Not sure using this lib is a good idea since we have like custom implementations. @Aneurysm9 any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jitterbug appears to depend on a rand.Source
implementation, which is how it ignores concurrency issues. It pushes that off to the source. The default source used by the package-global functions in the math/rand
package is safe for concurrent use, but sources created with NewSource()
are not and need locking like this to be made safe for concurrent use.
I suppose the question here is whether the default source can be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I was referencing jitterbug just to look at a different implementation which looked much simpler. I don't think we need to use it as a dependency but if the default source can work then we should just use that I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the default source in a library is generally avoided because it needs to be seeded to be effectively used in most cases but, as a library author, you shouldn't be making the choice to seed it for an application that might not want it seeded and you also can't guarantee that another library doesn't come along later and re-seed it with a fixed value. I do really wish that new sources were also safe for concurrent access.
Can the problem be avoided by simply having more sources? Can we have each actor with its own source and thus avoid concurrency concerns?
Thanks for your review :) It's hard to break down these PRs further. I have tried to abstract core logic as much possible since this PR would add all the data model and the next PR would have all the core logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only a partial review.
Please address build and lint errors in addition to the comments made.
samplers/aws/xray/client.go
Outdated
) | ||
|
||
type xrayClient struct { | ||
// http client for sending unsigned proxied requests to the collector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// http client for sending unsigned proxied requests to the collector | |
// http client for sending sampling requests to the collector |
samplers/aws/xray/client.go
Outdated
} | ||
|
||
// newClient returns an HTTP client with proxy endpoint | ||
func newClient(d string) *xrayClient { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func newClient(d string) *xrayClient { | |
func newClient(addr string) *xrayClient { |
samplers/aws/xray/client.go
Outdated
|
||
// newClient returns an HTTP client with proxy endpoint | ||
func newClient(d string) *xrayClient { | ||
endpoint := "http://" + d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably check if the address already has a scheme. If that should be strictly forbidden, then there should be some validation closer to the user I guess.
Note that while not the first version, eventually this sampler needs to be able to support more configuration, for example TLS or auth headers, as users don't necessarily run collectors as plaintext sidecars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think we strictly expect string like "127.0.0.1:8080"
but yeah you're right I will add some validation around this to double sure.
samplers/aws/xray/client.go
Outdated
|
||
endpointURL, err := url.Parse(endpoint) | ||
if err != nil { | ||
globalLogger.Error(err, "unable to parse endpoint from string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newClient
needs to return error
this is a fatal issue
samplers/aws/xray/client.go
Outdated
|
||
// getSamplingRules calls the collector(aws proxy enabled) for sampling rules | ||
func (p *xrayClient) getSamplingRules(ctx context.Context) (*getSamplingRulesOutput, error) { | ||
statisticsByte, err := json.Marshal(getSamplingRulesInput{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
statistics?
tempManifest.refreshedAt = now | ||
|
||
// assign temp manifest to original copy/one sync refresh. | ||
rs.mu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be for another PR, but wondering if volatile write / read is an appropriate use for atomic
in Go too, but not sure
} | ||
|
||
// sets polling interval for sampling rules | ||
func WithSamplingRulesPollingInterval(polingInterval time.Duration) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func WithSamplingRulesPollingInterval(polingInterval time.Duration) Option { | |
func WithSamplingRulesPollingInterval(pollingInterval time.Duration) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate it's positive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
} | ||
|
||
// sets custom proxy endpoint | ||
func WithEndpoint(endpoint string) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate endpoint expectations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would just take in endpoint and validate it in NewRemoteSampler
method since this is a config function doesn't makes sense to validate it here and also not sure if it's a good idea to return from here.
// setting global logger | ||
globalLogger = cfg.logger | ||
// set global verbosity to log info logs | ||
stdr.SetVerbosity(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't modify global state in this library
// rule represents a centralized sampling rule | ||
type rule struct { | ||
// Centralized reservoir for keeping track of reservoir usage | ||
reservoir *reservoir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should leave out reservoir from this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think it's getting used in createRule
method to populate some of the values we're getting from getSamplingRules
call.
func (rs *remoteSampler) ShouldSample(parameters sdktrace.SamplingParameters) sdktrace.SamplingResult { | ||
// ToDo: add business logic for remote sampling | ||
|
||
return sdktrace.TraceIDRatioBased(0.05).ShouldSample(parameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You just need a WithInitialSampler
option and a field to store it in right? Not sure of the rearranging
} | ||
|
||
// setting global logger | ||
globalLogger = cfg.logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed this - we shouldn't have a globallogger, logger needs to be on the sampler object itself. This isn't a globally initialized library
samplers/aws/xray/client.go
Outdated
} | ||
body := bytes.NewReader(statisticsByte) | ||
|
||
req, err := http.NewRequestWithContext(ctx, http.MethodPost, p.endpoint.String()+"/GetSamplingRules", body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a struct that won't be initialized many times so having 2 fields is basically no overhead, while computing a string on every poll is overhead.
defer output.Body.Close() | ||
|
||
var samplingRulesOutput *getSamplingRulesOutput | ||
if err := json.NewDecoder(output.Body).Decode(&samplingRulesOutput); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe json.Unmarshal() handles this correctly, but it would be nicer to see
var samplingRulesOutput getSamplingRulesOutput
i.e., drop the *
, not the &
var _ rand.Source64 = (*lockedSource64)(nil) | ||
|
||
type lockedSource struct { | ||
mu sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go doesn't provide a synchronized random number out of the box, so this is OK but I agree it'd be better to hand this to the user--let them pass in a rand.Source--they'll have such a lock figured out somewhere on their own, typically.
// ToDo: other fields will be used in business logic for remote sampling | ||
// reservoir is a reservoir distributed among all running instances of the SDK | ||
type reservoir struct { | ||
// Quota assigned to client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(What's happening here, with all the commented-out fields?)
func (rs *remoteSampler) ShouldSample(parameters sdktrace.SamplingParameters) sdktrace.SamplingResult { | ||
// ToDo: add business logic for remote sampling | ||
|
||
return sdktrace.TraceIDRatioBased(0.05).ShouldSample(parameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, it makes sense to plug in an initial value even though this PR is already big, it's the natural default.
|
||
// ToDo: other fields will be used in business logic for remote sampling | ||
// rule represents a centralized sampling rule | ||
type rule struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you place a reference to some documentation on the AWS side where manifest
and rule
are documented, please?
Closing this in favor of this complete PR for remote sampling support |
…jaeger (#1536) * Bump google.golang.org/api in /exporters/trace/jaeger Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.39.0 to 0.40.0. - [Release notes](https://github.com/googleapis/google-api-go-client/releases) - [Changelog](https://github.com/googleapis/google-api-go-client/blob/master/CHANGES.md) - [Commits](googleapis/google-api-go-client@v0.39.0...v0.40.0) Signed-off-by: dependabot[bot] <support@github.com> * Auto-fix go.sum changes in dependent modules Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com> Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
This is the intial PR for X-Ray remote sampling. The scope of this PR is to introduce data model for the business logic of remote sampling, adding support for fetching sampling rules and stroring.
Upcoming PRs just a heads up