Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning the http response status code and error message from HttpExporter.export( ) #6306

Open
galahad098 opened this issue Mar 19, 2024 · 9 comments
Labels
Feature Request Suggest an idea for this project

Comments

@galahad098
Copy link

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
In my implementation of the OtlpHttpSpanExporter, I want to access the http response status code and error message in case the export fails. Internally, the HttpExporter.export( ) method returns a CompletableResultCode response indicating successful or failed export and logs the status code and error message. I would like to know if it's possible to return this information from the exporter.

Describe alternatives you've considered
An alternative would be to write a custom exporter which returns all the required info.

@galahad098 galahad098 added the Feature Request Suggest an idea for this project label Mar 19, 2024
@jack-berg
Copy link
Member

The main problem is that the export API is stable, so we would need to evolve the generic CompletableResultCode class to be able to hold supplementary contextual information about the result (success or failure). Its possible, and is something I've thought about before, but it doesn't seem worthwhile when you consider that exporters are almost always paired with batch processor or periodic metric reader which wouldn't do anything with this additional info.

Consuming this info would require a custom processor or metric reader. So I'm curious to learn more about your use case. How are you using the OtlpHttpSpanExporter? What kind of info would you expect if there was no http status code (i.e. it was a connect or timeout error), or if it failed after retrying several times? What would you do with this info if it were available?

@galahad098
Copy link
Author

So we basically have an otel extension wherein custom span processors write to a queue which is periodically flushed via the exporter. The spans are exported to a collector service which routes the data to another consuming service and returns status codes for 3 scenarios : complete success (all spans routed successfully), complete failure and partial success.
In case of partial success, we only want to retry for the failed messages in the exported batch. Thus, it would be helpful to consume this additional status code info on top of the success/failure of the CompleteableResultCode.

@galahad098
Copy link
Author

Hi @jack-berg , Do you think it would be possible to undertake the enhancement of CompletableResultCode class? Being able to read the response would be really useful for us.

@jack-berg
Copy link
Member

Check out #6348, which lays the groundwork for this by evolving the CompletableResultCode API so that exception details can be included.

@galahad098
Copy link
Author

galahad098 commented Apr 4, 2024

Thanks for taking this up.. So the exception thrown in case of failure will now contain the error details and the HTTP response status code? Is that correct? is it possible to return the response body as well?

@jack-berg
Copy link
Member

Thanks for taking this up.. So the exception thrown in case of failure will now contain the error details and the HTTP response status code? Is that correct? is it possible to return the response body as well?

Additional PR(s) are needed for exporters to include exception in CompletableResultCode, and to decide how to represent what types of information is included in the exception. We don't do anything at all with the response body today, so including it will present an additional challenge.

@galahad098
Copy link
Author

Okay. I'll go through the guidelines for contribution and please let me know if I could pitch in anywhere. Thanks

@trask
Copy link
Member

trask commented Apr 5, 2024

In case of partial success, we only want to retry for the failed messages in the exported batch. Thus, it would be helpful to consume this additional status code info on top of the success/failure of the CompleteableResultCode.

@jack-berg is this something that could make sense to implement directly in the OTLP exporter?

Or if we can't change the default retry behavior, maybe a hook that allows people to override the default retry behavior?

@jack-berg
Copy link
Member

This is quite complicated to implement at the moment, regardless of where:

  • We currently don't parse the response body at all. We hand roll serialization, but this would require hand rolling deserialization, which we don't have prior art for in this repo.
  • The structure of OTLP responses do not make it easy to determine which records in a payload were rejected and which succeeded. See code here and relevant proto messages below. Notice how there is no structured representation of which records succeeded / failed. IIRC correctly, they punted on that when adding the support for partial success responses. Theoretically, a server could encode that information in the ExportTracePartialSuccess.error_message field, but I don't know of any prior art of standards that exist for how this would be encoded.
  • Assuming you could determine which records succeeded, we then need to slice up the original Collection<SpanData> and remove the entries that succeeded. Then determine how the existing retry rules interact with partial successes.

I think the reality is that as of today, partial success responses are not designed for this type of partial retry scenario. You can see in the protobuf comments that its intended to help a developer make config changes to address the problem, not for an exporter to interpret and automatically perform some a retry routine:

A developer-facing human-readable message in English. It should be used
// either to explain why the server rejected parts of the data during a partial
// success or to convey warnings/suggestions during a full success. The message
// should offer guidance on how users can address such issues.

maybe a hook that allows people to override the default retry behavior?

I could be open to this, but doing so is complicated so would want to see evidence that several people are running into issues. And still, it might be more appropriate to solve this by: 1. Passing failure details back via CompletableResultCode. 2. Disabling retry. 3. Pairing the exporter with a custom processor which interprets the error from CompletableResultCode and implements retry at the processor level.

message ExportTraceServiceResponse {
  // The details of a partially successful export request.
  //
  // If the request is only partially accepted
  // (i.e. when the server accepts only parts of the data and rejects the rest)
  // the server MUST initialize the `partial_success` field and MUST
  // set the `rejected_<signal>` with the number of items it rejected.
  //
  // Servers MAY also make use of the `partial_success` field to convey
  // warnings/suggestions to senders even when the request was fully accepted.
  // In such cases, the `rejected_<signal>` MUST have a value of `0` and
  // the `error_message` MUST be non-empty.
  //
  // A `partial_success` message with an empty value (rejected_<signal> = 0 and
  // `error_message` = "") is equivalent to it not being set/present. Senders
  // SHOULD interpret it the same way as in the full success case.
  ExportTracePartialSuccess partial_success = 1;
}

message ExportTracePartialSuccess {
  // The number of rejected spans.
  //
  // A `rejected_<signal>` field holding a `0` value indicates that the
  // request was fully accepted.
  int64 rejected_spans = 1;

  // A developer-facing human-readable message in English. It should be used
  // either to explain why the server rejected parts of the data during a partial
  // success or to convey warnings/suggestions during a full success. The message
  // should offer guidance on how users can address such issues.
  //
  // error_message is an optional field. An error_message with an empty value
  // is equivalent to it not being set.
  string error_message = 2;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Request Suggest an idea for this project
Projects
None yet
Development

No branches or pull requests

3 participants