Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

passthrough: return error if endpoint is empty and opt.Dialer is nil when building resolver #5732

Merged
merged 3 commits into from Nov 16, 2022

Conversation

huangchong94
Copy link
Contributor

@huangchong94 huangchong94 commented Oct 19, 2022

Empty string is clearly not a valid target when default scheme is passthrough and no custom dialer set. But grpc.Dial will not return error in this case until client starts making RPCs, user will get errors saying transport: Error while dialing dial tcp: missing address.

It would be better fail fast and notify user empty target is invalid before RPC calls.

This PR fixes this issue by checking 1. is endpoint empty 2. is there a custom dialer in passthroughBuilder.Build, if endpoint is empty and there is no custom dialer, an error will be returned.

RELEASE NOTES:

  • client: return an error from Dial if an empty target is passed and no custom dialer is present; the ClientConn would otherwise be unable to connect and perform RPCs

@huangchong94 huangchong94 changed the title passthrough: return error if endpoint is empty and opt.Dialer is nil when building passthrough resolver passthrough: return error if endpoint is empty and opt.Dialer is nil when building resolver Oct 19, 2022
Copy link
Contributor

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the colon character issue.


const scheme = "passthrough"

type passthroughBuilder struct{}

func (*passthroughBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
if target.Endpoint == "" && opts.Dialer == nil {
return nil, errors.New("passthrough resolver:missing address")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ":" character looks wrong. Please make sure you are using the proper ASCII character (0x3A), and there should be a space after it.


const scheme = "passthrough"

type passthroughBuilder struct{}

func (*passthroughBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
if target.Endpoint == "" && opts.Dialer == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we could do even stronger validation here (verify it could work when sent to net.Dial, e.g. split host/port & parse IP), but this is an improvement anyway.

@dfawley dfawley added this to the 1.51 Release milestone Oct 19, 2022
@dfawley dfawley added the Type: Behavior Change Behavior changes not categorized as bugs label Oct 19, 2022
@dfawley
Copy link
Contributor

dfawley commented Oct 19, 2022

Somehow this change still makes me nervous. I have a feeling we'll break someone that is doing something unexpected, like using interceptors to circumvent the underlying connection. Maybe this isn't worth doing.

@huangchong94
Copy link
Contributor Author

huangchong94 commented Oct 20, 2022

Somehow this change still makes me nervous. I have a feeling we'll break someone that is doing something unexpected, like using interceptors to circumvent the underlying connection. Maybe this isn't worth doing.

If we consider empty string is not a valid passthrough target without custom dialer, I think we should return error early in grpc.Dial, even if someone is using interceptors to circumvent the underlying connection, maybe we don't need maintain compatibility for this kind of usage?

Just like if someone is using an invalid dns target eg. dns:///foo:, grpc.Dial will return error, no matter what interceptor he is using.

@huangchong94
Copy link
Contributor Author

huangchong94 commented Oct 20, 2022

To provide more context: We're using grpc-go building microservices, and we usually initialize rpc clients during service startup, if initialization fail, server process will exit immediately.

When we deployed a new feature someone forgot to fill out some config struct in some config file, which caused target become empty string, grpc.Dial didn't return error and service started successfully, everything looked fine, but as new feature rolling out, traffic coming in, we began to receive alert messages saying transport: Error while dialing dial tcp: missing address. In this case, it would be better fail early and fail fast during startup time than getting errors during running time.

Certainly we will add more validation logic in our services, but if this validation is done by grpc-go itself, many others will benefit from it too.

@dfawley
Copy link
Contributor

dfawley commented Oct 21, 2022

Don't get me wrong, I agree with the changes here from a theoretical standpoint. However, we need to be very careful to not break any reasonable use cases. Maybe this should wait until #1786.

Consider also that, instead of an empty address, your engineer had a typo in the hostname in the target string. In that case, we wouldn't error out early, either. The system you're building needs to be tolerant to these types of situations anyway, so I don't think this is a high priority thing to change.

@huangchong94
Copy link
Contributor Author

Don't get me wrong, I agree with the changes here from a theoretical standpoint. However, we need to be very careful to not break any reasonable use cases. Maybe this should wait until #1786.

Consider also that, instead of an empty address, your engineer had a typo in the hostname in the target string. In that case, we wouldn't error out early, either. The system you're building needs to be tolerant to these types of situations anyway, so I don't think this is a high priority thing to change.

OK. I understand your concern about compatibility issue. If this PR should wait for #1786, you can close it or if you need me add more validation logic in build, I will find time to implement it.

@zasweq zasweq modified the milestones: 1.51 Release, 1.52 Release Nov 8, 2022
@dfawley
Copy link
Contributor

dfawley commented Nov 15, 2022

I talked this over with the other language leads and it looks like in C, it's impossible for the channel to fail at creation time, but in Java the channel can fail to be created, and it fails in the case where the target string is empty. The reason it fails is a bit different, however: it can't determine the authority to set for connections. So, I think we can pretty safely accept this without risk of breaking any realistic use cases. It's possible someone is using a channel with interceptors and not ever actually using the channel itself, but this isn't a supported use case, and there would be workarounds, too.

@dfawley dfawley assigned easwars and unassigned dfawley Nov 15, 2022
@dfawley dfawley requested a review from easwars November 15, 2022 17:33
@dfawley
Copy link
Contributor

dfawley commented Nov 15, 2022

@easwars, could you please make a second review?


const scheme = "passthrough"

type passthroughBuilder struct{}

func (*passthroughBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
if target.Endpoint == "" && opts.Dialer == nil {
return nil, errors.New("passthrough resolver: missing address")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

address here can lead to confusion because an address in this context usually refers to addresses returned by the resolver. Could you please change the error message to be passthrough: received empty target in Build().

@easwars easwars merged commit 817c1e8 into grpc:master Nov 16, 2022
1 check passed
jronak pushed a commit to jronak/grpc-go that referenced this pull request Nov 21, 2022
@yurishkuro
Copy link

@dfawley tend to agree with your reasons for not changing the existing behavior, especially given the example with invalid host name. The existing behavior was actually useful for testing error paths, those tests broke after this change. I was able to fix it with invalid host name instead (jaegertracing/jaeger#4149).

silvestre added a commit to cloudfoundry/app-autoscaler-release that referenced this pull request Jan 11, 2023
With grpc/grpc-go#5732 the grpc library
validates the presence of the endpoint information:
> client: return an error from Dial if an empty target is passed and no custom
> dialer is present; the ClientConn would otherwise be unable to connect and perform RPCs
@taylow
Copy link

taylow commented Jan 11, 2023

As @huangchong94 mentioned, a missing config now causes this to error much earlier in the process, as we have just found out. I think this is much clearer/more reliable, though the error message wasn't clear until doing a bit of poking around 😅 I understand the reason for not using the term address, and that the parameter for Dial is called target, but it did require us to search for the error to figure out why our services weren't starting.

silvestre added a commit to cloudfoundry/app-autoscaler-release that referenced this pull request Jan 11, 2023
With grpc/grpc-go#5732 the grpc library
validates the presence of the endpoint information:
> client: return an error from Dial if an empty target is passed and no custom
> dialer is present; the ClientConn would otherwise be unable to connect and perform RPCs
app-autoscaler-ci-bot pushed a commit to cloudfoundry/app-autoscaler-release that referenced this pull request Jan 11, 2023
* fix(deps): update module google.golang.org/grpc to v1.52.0

* `make package-specs`

* Provide fake endoint in tests

With grpc/grpc-go#5732 the grpc library
validates the presence of the endpoint information:
> client: return an error from Dial if an empty target is passed and no custom
> dialer is present; the ClientConn would otherwise be unable to connect and perform RPCs

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Silvestre Zabala <silvestre.zabala@sap.com>
@yurishkuro
Copy link

@taylow may be more useful for that specific use case, but the point is that (a) it changed the existing behavior, thus breaking other stuff, and (b) it's inconsistent since "missing config" fails immediately, but "config with typo" still behaves as before and only fails on attempts to send.

@marcelosousa
Copy link

As with @yurishkuro, I found that the previous behaviour was useful for testing purposes.
This change does breaks the tests in the Go library I'm working on (reviewpad/reviewpad#610) and I find it quite hackish to introduce dummy host names.

acrmp added a commit to cloudfoundry/loggregator-release that referenced this pull request Feb 27, 2023
- Remove unused BOSH property and associated config field from traffic
  controller
- Pass GRPCAddress rather than UDPAddress when creating a test server.
  gRPC-Go v1.52.0+ will error if the provided endpoint is empty:
  grpc/grpc-go#5732
acrmp added a commit to cloudfoundry/loggregator-release that referenced this pull request Feb 28, 2023
- Remove unused BOSH property and associated config field from traffic
  controller
- Pass GRPCAddress rather than UDPAddress when creating a test server.
  gRPC-Go v1.52.0+ will error if the provided endpoint is empty:
  grpc/grpc-go#5732
acrmp added a commit to cloudfoundry/loggregator-release that referenced this pull request Feb 28, 2023
- Remove unused BOSH property and associated config field from traffic
  controller
- Pass GRPCAddress rather than UDPAddress when creating a test server.
  gRPC-Go v1.52.0+ will error if the provided endpoint is empty:
  grpc/grpc-go#5732

(cherry picked from commit 63cf49e)
ctlong pushed a commit to cloudfoundry/loggregator-release that referenced this pull request Mar 6, 2023
- Remove unused BOSH property and associated config field from traffic
  controller
- Pass GRPCAddress rather than UDPAddress when creating a test server.
  gRPC-Go v1.52.0+ will error if the provided endpoint is empty:
  grpc/grpc-go#5732

(cherry picked from commit 63cf49e)
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Behavior Change Behavior changes not categorized as bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants