Add error return to the Registrar interface functions #782

jstoja · 2018-10-24T08:54:03Z

The Register and Deregister method are logging errors but not
returning any error, which makes error handling not possible.
The implementation of the error is meant to reflect error from the
client api, not from what we consider an error (re-registering, ...).

Some test cases haven't been altered since errors are expected.
Eureka error handling is purely an error return from the client, trying to register several time is not considered as an error.

Closes #780.

The Register and Deregister method are logging errors but not returning any error, which makes error handling not possible. The implementation of the error is meant to reflect error from the client api, not from what we consider an error (re-registering, ...).

peterbourgon

I think there are fundamental questions about service behavior during Register/Deregister errors. What do you think? What use case motivates this change for you?

peterbourgon · 2018-10-24T16:11:02Z

sd/etcd/registrar.go

 		r.logger.Log("err", err)
 	} else {
 		r.logger.Log("action", "register")
 	}
 	if r.service.TTL != nil {
 		go r.loop()
 	}
+	return err


If Register returns an error, it shouldn't do other things, like go r.loop, don't you think?

Actually the loop tries to register every TTL, so I feel that I still should let the loop be launched. I don't know how this should be handled.

peterbourgon · 2018-10-24T16:12:19Z

sd/consul/registrar.go

 		p.logger.Log("err", err)
-	} else {
-		p.logger.Log("action", "register")
+		return err


Don't log an error you also return.

peterbourgon · 2018-10-24T16:12:43Z

sd/consul/registrar.go

 		p.logger.Log("err", err)
-	} else {
-		p.logger.Log("action", "deregister")
+		return err


Don't log an error you also return.

peterbourgon · 2018-10-24T16:16:58Z

sd/etcd/registrar.go

 	r.quitmtx.Lock()
 	defer r.quitmtx.Unlock()
 	if r.quit != nil {
 		close(r.quit)
 		r.quit = nil
 	}
+	return err


Same thing here, if Deregister returns an error, it shouldn't do other stuff.

I think this reveals a fundamental design tension: what should an application do if its service discovery Register or Deregister action fails? Can it do anything useful with that error besides log it? Arguably, Register failure might want to terminate the service, in some circumstances, but if Deregister fails, can/should we do anything?

peterbourgon · 2018-10-24T16:21:23Z

sd/etcdv3/registrar.go

 	if err := r.client.Register(r.service); err != nil {
 		r.logger.Log("err", err)
-		return
+		return err


Log or return, not both.

peterbourgon · 2018-10-24T16:21:53Z

sd/etcdv3/registrar.go

@@ -93,4 +95,5 @@ func (r *Registrar) Deregister() {
 		close(r.quit)
 		r.quit = nil
 	}
+	return err


Similar questions about Deregister as above.

jstoja · 2018-10-24T16:39:18Z

Hey thanks for the review and the very pertinent comments!

The reason for returning an error is exactly the use-case you pointed out.

When the registration fails, the error might be useful to implement a retry, or simply correctly exit. I agree that this is arguably useful, since we could also query the registry to check if we're correctly there.
When the de-registration fail, we could also retry it. Some service registry systems like Consul would just mark the service as un-healthy and only remove it from the pool after a some time.

I would prefer having an error returned and do something about it (or just not using it), than having nothing and try to find out somehow if my action succeeded.

The log/return thing was let like this because I don't wanted to change the actual logging pattern, but I'll change this as soon as I have time to.

peterbourgon · 2018-10-29T17:04:45Z

Until now we've taken the approach that an error in Register or Deregister isn't fatal, and that we should log it, but keep doing whatever it is we're doing in the background, in the hopes that it eventually resolves. I understand that this PR supposes it would be nice to let the application know when these things fail, and in the abstract I agree, but we've discovered that changing the semantics of errors in these methods is more subtle than we anticipated.

By convention, a non-nil error means the function failed, and any other value returned by the it is undefined; I don't think we can break that convention. Should we have another "channel" (not necessarily a Go chan) on which to report errors to the application? Is that added complexity in every constructor, or Register/Deregister call, worth the value it brings? In case of error, should we keep any registration loop goroutines running, or terminate them?

I don't have good answers to these questions. I could be convinced there are good answers to them, but I don't think what we have right now is it.

jstoja · 2018-10-31T09:01:18Z

I think the first issue here is that there are 2 registration workflow present here. The (1) first one is a simple service register where it will hold the service until a potential healthcheck fail for a defined amount of time, or it simply un-register. The (2) second one is a an insert in the k/v store with a TTL, that needs a loop to be kept present.
Even if both have similar semantics from go-kit point of view (the current service being registered somewhere), the expectation of how it's running is different.
In the case of a TTL with a loop (2), if the service is not registered on the first call, it might be in a later one. In the case of the service register (1), if the first call fails, then we won't ever be registered if we're not retrying in the application (where we currently have no feedback if it did succeed or not).
Maybe we could try to:

add a loop on the (1) (even if that feels a bit weird) to be sure that we're registered.
add a new method to check the registration object and see if we're correctly registered (in the case of the (1) we can have the actual service object and in (2) the actual key with the current tick ?). I'm not very fond of calling a method that doesn't return any error and then calling another one to get the error.
improve the documentation with some boilerplate code checking directly in the service discovery system if we're correctly registered. This would mean no change in the Registrar API but more application code.

yramanovich · 2018-11-13T13:08:38Z

I agree that the Register and Deregister methods don't return any errors it's kind strange and @jstoja proposal makes sense. The problem is that different implementations follow different ways. In case of consul it's easy - we can just return error and that's it. If to speak about eureka it's more complicated: uses loops, heartbeats and so on.
So i see 2 ways:

we can just try to register and return error if fails - developer should handle error and write retry logic.
implement loop style and return <-chan error, so it'll be possible to see errors.

peterbourgon requested changes Oct 24, 2018

View reviewed changes

peterbourgon closed this Apr 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add error return to the Registrar interface functions #782

Add error return to the Registrar interface functions #782

jstoja commented Oct 24, 2018

peterbourgon left a comment

peterbourgon Oct 24, 2018

jstoja Oct 26, 2018

peterbourgon Oct 24, 2018

peterbourgon Oct 24, 2018

peterbourgon Oct 24, 2018

peterbourgon Oct 24, 2018

peterbourgon Oct 24, 2018

jstoja commented Oct 24, 2018

peterbourgon commented Oct 29, 2018

jstoja commented Oct 31, 2018

yramanovich commented Nov 13, 2018

Add error return to the Registrar interface functions #782

Add error return to the Registrar interface functions #782

Conversation

jstoja commented Oct 24, 2018

peterbourgon left a comment

Choose a reason for hiding this comment

peterbourgon Oct 24, 2018

Choose a reason for hiding this comment

jstoja Oct 26, 2018

Choose a reason for hiding this comment

peterbourgon Oct 24, 2018

Choose a reason for hiding this comment

peterbourgon Oct 24, 2018

Choose a reason for hiding this comment

peterbourgon Oct 24, 2018

Choose a reason for hiding this comment

peterbourgon Oct 24, 2018

Choose a reason for hiding this comment

peterbourgon Oct 24, 2018

Choose a reason for hiding this comment

jstoja commented Oct 24, 2018

peterbourgon commented Oct 29, 2018

jstoja commented Oct 31, 2018

yramanovich commented Nov 13, 2018