Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add error return to the Registrar interface functions #782

Closed
wants to merge 1 commit into from

Conversation

jstoja
Copy link

@jstoja jstoja commented Oct 24, 2018

The Register and Deregister method are logging errors but not
returning any error, which makes error handling not possible.
The implementation of the error is meant to reflect error from the
client api, not from what we consider an error (re-registering, ...).

  • Some test cases haven't been altered since errors are expected.
  • Eureka error handling is purely an error return from the client, trying to register several time is not considered as an error.

Closes #780.

The Register and Deregister method are logging errors but not
returning any error, which makes error handling not possible.
The implementation of the error is meant to reflect error from the
client api, not from what we consider an error (re-registering, ...).
Copy link
Member

@peterbourgon peterbourgon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are fundamental questions about service behavior during Register/Deregister errors. What do you think? What use case motivates this change for you?

r.logger.Log("err", err)
} else {
r.logger.Log("action", "register")
}
if r.service.TTL != nil {
go r.loop()
}
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Register returns an error, it shouldn't do other things, like go r.loop, don't you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the loop tries to register every TTL, so I feel that I still should let the loop be launched. I don't know how this should be handled.

p.logger.Log("err", err)
} else {
p.logger.Log("action", "register")
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't log an error you also return.

p.logger.Log("err", err)
} else {
p.logger.Log("action", "deregister")
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't log an error you also return.

r.quitmtx.Lock()
defer r.quitmtx.Unlock()
if r.quit != nil {
close(r.quit)
r.quit = nil
}
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, if Deregister returns an error, it shouldn't do other stuff.

I think this reveals a fundamental design tension: what should an application do if its service discovery Register or Deregister action fails? Can it do anything useful with that error besides log it? Arguably, Register failure might want to terminate the service, in some circumstances, but if Deregister fails, can/should we do anything?

if err := r.client.Register(r.service); err != nil {
r.logger.Log("err", err)
return
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log or return, not both.

@@ -93,4 +95,5 @@ func (r *Registrar) Deregister() {
close(r.quit)
r.quit = nil
}
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar questions about Deregister as above.

@jstoja
Copy link
Author

jstoja commented Oct 24, 2018

Hey thanks for the review and the very pertinent comments!

The reason for returning an error is exactly the use-case you pointed out.

  • When the registration fails, the error might be useful to implement a retry, or simply correctly exit. I agree that this is arguably useful, since we could also query the registry to check if we're correctly there.

  • When the de-registration fail, we could also retry it. Some service registry systems like Consul would just mark the service as un-healthy and only remove it from the pool after a some time.

I would prefer having an error returned and do something about it (or just not using it), than having nothing and try to find out somehow if my action succeeded.

The log/return thing was let like this because I don't wanted to change the actual logging pattern, but I'll change this as soon as I have time to.

@peterbourgon
Copy link
Member

Until now we've taken the approach that an error in Register or Deregister isn't fatal, and that we should log it, but keep doing whatever it is we're doing in the background, in the hopes that it eventually resolves. I understand that this PR supposes it would be nice to let the application know when these things fail, and in the abstract I agree, but we've discovered that changing the semantics of errors in these methods is more subtle than we anticipated.

By convention, a non-nil error means the function failed, and any other value returned by the it is undefined; I don't think we can break that convention. Should we have another "channel" (not necessarily a Go chan) on which to report errors to the application? Is that added complexity in every constructor, or Register/Deregister call, worth the value it brings? In case of error, should we keep any registration loop goroutines running, or terminate them?

I don't have good answers to these questions. I could be convinced there are good answers to them, but I don't think what we have right now is it.

@jstoja
Copy link
Author

jstoja commented Oct 31, 2018

I think the first issue here is that there are 2 registration workflow present here. The (1) first one is a simple service register where it will hold the service until a potential healthcheck fail for a defined amount of time, or it simply un-register. The (2) second one is a an insert in the k/v store with a TTL, that needs a loop to be kept present.
Even if both have similar semantics from go-kit point of view (the current service being registered somewhere), the expectation of how it's running is different.
In the case of a TTL with a loop (2), if the service is not registered on the first call, it might be in a later one. In the case of the service register (1), if the first call fails, then we won't ever be registered if we're not retrying in the application (where we currently have no feedback if it did succeed or not).
Maybe we could try to:

  1. add a loop on the (1) (even if that feels a bit weird) to be sure that we're registered.
  2. add a new method to check the registration object and see if we're correctly registered (in the case of the (1) we can have the actual service object and in (2) the actual key with the current tick ?). I'm not very fond of calling a method that doesn't return any error and then calling another one to get the error.
  3. improve the documentation with some boilerplate code checking directly in the service discovery system if we're correctly registered. This would mean no change in the Registrar API but more application code.

@yramanovich
Copy link

I agree that the Register and Deregister methods don't return any errors it's kind strange and @jstoja proposal makes sense. The problem is that different implementations follow different ways. In case of consul it's easy - we can just return error and that's it. If to speak about eureka it's more complicated: uses loops, heartbeats and so on.
So i see 2 ways:

  1. we can just try to register and return error if fails - developer should handle error and write retry logic.
  2. implement loop style and return <-chan error, so it'll be possible to see errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants