Skip to content
This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

stats/view: add Flush to immediately report all collected data points regardless of buffering #862

Open
lychung83 opened this issue Aug 9, 2018 · 13 comments · May be fixed by #922
Open
Assignees

Comments

@lychung83
Copy link

Problem:

When exiting a program running opencensus monitoring, some of the data are not passed to exporter. This happens because as one can see in following code, reportUsage() is called only by timer, so when program exists, all data passed to opencensus after last timer firing event are simply lost.

func (w *worker) start() {
	for {
		select {
		case cmd := <-w.c:
			cmd.handleCommand(w)
		case <-w.timer.C:
			w.reportUsage(time.Now())
		case <-w.quit:
			w.timer.Stop()
			close(w.c)
			w.done <- true
			return
		}
	}
}

Solution proposal:

  1. Make func (w *worker) stop() to be (indirectly) callable by client of package.
  2. Make func (w *worker) start() to call w.reportUsage() before returning when <-w.quit is triggered.
@odeke-em
Copy link
Member

odeke-em commented Sep 4, 2018

Thank you for filing this issue @lychung83!

So currently, (*worker).stop() is NOT called by any function nor is case <-w.quit triggered by any exit condition. The app just exits if exiting.

I think the idiomatic fix here would to add a global function to view called say End() which will invoke (*worker)(defaultWorker).stop and that should involve passing w.reportUsage as you've suggested. However, that's going to cause a send on closed channel panic with the current stateful design of the worker.

/cc @rakyll @Ramonza

@rakyll
Copy link
Contributor

rakyll commented Sep 4, 2018

I think we need a global Flush function that reports all the collected points.

@odeke-em
Copy link
Member

odeke-em commented Sep 4, 2018

@rakyll perhaps I am bikeshedding or not, on the name: I thought of Flush too, but that name implies that continued usage after its invocation is allowed, yet we want a function that is terminally invoked, hence the recommendation of End :)

@rakyll
Copy link
Contributor

rakyll commented Sep 4, 2018

Flush function I was suggesting is not stopping the worker, it only flushes the unprocessed data points. Exporting a function that stops the worker might need more thinking because we never planned to give control of the worker to the user entirely.

Also, in practice, should you care about what's collected after you flushed due to the termination signal?

@odeke-em
Copy link
Member

odeke-em commented Sep 4, 2018

Flush function I was suggesting is not stopping the worker, it only flushes the unprocessed data points.

Roger that, that would work. And in deed, users can invoke defer view.Flush() before their main exits so it kills 2 birds with 1 stone.

Also, in practice, should you care about what's collected after you flushed due to the termination signal?

@lychung83 is this an actual problem that you've encountered or were you hypothesizing based off reading the code? I too don't think it should matter, and moreoever we haven't even provided a stop method nor does the worker know when we are terminating except by the Go runtime cleaning up.

@lychung83
Copy link
Author

To me, Flush() (not stopping the worker) seems enough. My use cases are:

  1. used at the end of the program to report any remaining data points in the worker.
  2. used for the test program to make sure all data points put by any measure.M() call to the exporter inspected by test w/o calling time.Sleep() for certain amount of time that's enough to fire worker.c

@odeke-em
Copy link
Member

Okay, am retitling this issue then as Flush is request that is different from the original request of when a program is exiting.

@odeke-em odeke-em changed the title Make it possible for opencensus to pass all remaining data to registered exporters when program exits. stats: add Flush to immediately report all collected data points regardless of buffering Sep 20, 2018
@odeke-em odeke-em changed the title stats: add Flush to immediately report all collected data points regardless of buffering stats/view: add Flush to immediately report all collected data points regardless of buffering Sep 20, 2018
odeke-em added a commit to orijtech/opencensus-go that referenced this issue Sep 21, 2018
Flush is a global function that immediately reports
all collected data by the worker, regardless of the
reporting duration or buffering.

Added tests but also ensured that calling (*worker).stop()
then Flush doesn't block and returns ASAP if the state
is "quit". The required protected w.quit from double close
but also using a select to detect cases of forever waiting
channel `w.quit` which was closed.

Fixes census-instrumentation#862
odeke-em added a commit to orijtech/opencensus-go that referenced this issue Sep 21, 2018
Flush is a global function that immediately reports
all collected data by the worker, regardless of the
reporting duration or buffering.

Added tests but also ensured that calling (*worker).stop()
then Flush doesn't block and returns ASAP if the state
is "quit". The required protected w.quit from double close
but also using a select to detect cases of forever waiting
channel `w.quit` which was closed.

Fixes census-instrumentation#862
@odeke-em odeke-em linked a pull request Sep 21, 2018 that will close this issue
@razvanh
Copy link

razvanh commented Sep 21, 2018

@semistrict
Copy link
Contributor

I don't think we should have a separate function to flush each Exporter and one to flush the internal stats buffer. We should have a single global Flush/Shutdown/Close function that only returns when everything has been flushed. It's also important that things get flushed in the right order: flushing the internal stats before flushing the buffer in the exporter will not work as intended. Users should not have to find this out themselves.

odeke-em added a commit to census-instrumentation/opencensus-service that referenced this issue Oct 5, 2018
Instrument interceptor/opencensus with stats to record
the number of received spans. This is accomplished by
a helper to process spans that sends over spans even
if the number of spans is 0. Also record with tag_key
"opencensus_interceptor" whose value is the name
of the respective interceptor.

The test to ensure that 0 length spans are also added
is currently disabled because:
* Issue census-instrumentation/opencensus-go#862
is not yet implemented which requests that the OpenCensus-Go
stats worker provide a method Flush to flush all data. Without
it, metrics will contain stale data from previous tests.

Add tests to lock-in this behavior.

Updates #63
odeke-em added a commit to census-instrumentation/opencensus-service that referenced this issue Oct 5, 2018
Instrument interceptor/opencensus with stats to record
the number of received spans. This is accomplished by
a helper to process spans that sends over spans even
if the number of spans is 0. Also record with tag_key
"opencensus_interceptor" whose value is the name
of the respective interceptor.

The test to ensure that 0 length spans are also added
is currently disabled because:
* Issue census-instrumentation/opencensus-go#862
is not yet implemented which requests that the OpenCensus-Go
stats worker provide a method Flush to flush all data. Without
it, metrics will contain stale data from previous tests.

Add tests to lock-in this behavior.

Updates #63
odeke-em added a commit to census-instrumentation/opencensus-service that referenced this issue Oct 5, 2018
Instrument interceptor/opencensus with stats to record
the number of received spans. This is accomplished by
a helper to process spans that sends over spans even
if the number of spans is 0. Also record with tag_key
"opencensus_interceptor" whose value is the name
of the respective interceptor.

The test to ensure that 0 length spans are also added
is currently disabled because:
* Issue census-instrumentation/opencensus-go#862
is not yet implemented which requests that the OpenCensus-Go
stats worker provide a method Flush to flush all data. Without
it, metrics will contain stale data from previous tests.

Add tests to lock-in this behavior.

Updates #63
odeke-em added a commit to census-instrumentation/opencensus-service that referenced this issue Oct 6, 2018
Instrument interceptor/opencensus with stats to record
the number of received spans. This is accomplished by
a helper to process spans that sends over spans even
if the number of spans is 0. Also record with tag_key
"opencensus_interceptor" whose value is the name
of the respective interceptor.

The test to ensure that 0 length spans are also added
is currently disabled because:
* Issue census-instrumentation/opencensus-go#862
is not yet implemented which requests that the OpenCensus-Go
stats worker provide a method Flush to flush all data. Without
it, metrics will contain stale data from previous tests.

Add tests to lock-in this behavior.

Updates #63
odeke-em added a commit to census-instrumentation/opencensus-service that referenced this issue Oct 7, 2018
Instrument interceptor/opencensus with stats to record
the number of received spans. This is accomplished by
a helper to process spans that sends over spans even
if the number of spans is 0. Also record with tag_key
"opencensus_interceptor" whose value is the name
of the respective interceptor.

The test to ensure that 0 length spans are also added
is currently disabled because:
* Issue census-instrumentation/opencensus-go#862
is not yet implemented which requests that the OpenCensus-Go
stats worker provide a method Flush to flush all data. Without
it, metrics will contain stale data from previous tests.

Add tests to lock-in this behavior.

Updates #63
odeke-em added a commit to census-instrumentation/opencensus-service that referenced this issue Oct 7, 2018
Instrument interceptor/opencensus with stats to record
the number of received spans. This is accomplished by
a helper to process spans that sends over spans even
if the number of spans is 0. Also record with tag_key
"opencensus_interceptor" whose value is the name
of the respective interceptor.

The test to ensure that 0 length spans are also added
is currently disabled because:
* Issue census-instrumentation/opencensus-go#862
is not yet implemented which requests that the OpenCensus-Go
stats worker provide a method Flush to flush all data. Without
it, metrics will contain stale data from previous tests.

Add tests to lock-in this behavior.

Also replace grpc.NewServer with a stats enabled server helper call.
This enables tracing and metrics for each gRPC server by replacing
naked usages of grpc.NewServer with the new helper function:
    internal.GRPCServerWithObservabilityEnabled

Updates #63
donners pushed a commit to donners/opencensus-go that referenced this issue Mar 7, 2019
Flush is a global function that immediately reports
all collected data by the worker, regardless of the
reporting duration or buffering.

Added tests but also ensured that calling (*worker).stop()
then Flush doesn't block and returns ASAP if the state
is "quit". The required protected w.quit from double close
but also using a select to detect cases of forever waiting
channel `w.quit` which was closed.

Fixes census-instrumentation#862
@rghetia
Copy link
Contributor

rghetia commented Mar 27, 2019

For metrics, concept of Reader was added recently. With reader one can call ReadAndExport to export all data on program termination. It would work same as flush.

@rghetia rghetia added the P2 label May 6, 2019
fivesheep pushed a commit to fivesheep/opencensus-service that referenced this issue Jun 12, 2019
Instrument interceptor/opencensus with stats to record
the number of received spans. This is accomplished by
a helper to process spans that sends over spans even
if the number of spans is 0. Also record with tag_key
"opencensus_interceptor" whose value is the name
of the respective interceptor.

The test to ensure that 0 length spans are also added
is currently disabled because:
* Issue census-instrumentation/opencensus-go#862
is not yet implemented which requests that the OpenCensus-Go
stats worker provide a method Flush to flush all data. Without
it, metrics will contain stale data from previous tests.

Add tests to lock-in this behavior.

Also replace grpc.NewServer with a stats enabled server helper call.
This enables tracing and metrics for each gRPC server by replacing
naked usages of grpc.NewServer with the new helper function:
    internal.GRPCServerWithObservabilityEnabled

Updates census-instrumentation#63
@marwan-at-work
Copy link

marwan-at-work commented Nov 2, 2019

Hi @odeke-em, I noticed one of your commits mentioned in this issue addresses this problem directly, so I'm just wondering if you'll eventually open it as a PR or if there's anything blocking it. Thank you! 🎉

@odeke-em
Copy link
Member

odeke-em commented Nov 2, 2019 via email

@marwan-at-work
Copy link

@odeke-em Thank you for the response. This seems to me more of a bug than a feature since OpenCensus swallows metrics when a process exists.

In the same spirit, does that mean all other features and bug requests are frozen?

I recently opened this issue which, if my assumption is correct, would be quite noteworthy to address: #1181

Thanks again :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
7 participants