Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation CSDS (xDS Config Dump) #25038

Merged
merged 12 commits into from
Mar 17, 2021
Merged

Implementation CSDS (xDS Config Dump) #25038

merged 12 commits into from
Mar 17, 2021

Conversation

lidizheng
Copy link
Contributor

@lidizheng lidizheng commented Dec 21, 2020

The xDS config dump in JSON is done. But the surface API for C++ is work-in-progress. If wanted, we can also add a Python interface.

Related https://github.com/protocolbuffers/upb/issues/370


This change is Reviewable

@lidizheng lidizheng added area/core lang/core release notes: yes Indicates if PR needs to be in release notes labels Dec 21, 2020
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an early review. This looks like a good start.

Please let me know if you have any questions. Thanks!

Reviewed 325 of 325 files at r1.
Reviewable status: all files reviewed, 24 unresolved discussions (waiting on @lidizheng)


src/core/ext/xds/xds_api.h, line 46 at r1 (raw file):

// default.
bool XdsSecurityEnabled();
bool XdsCsdsEnabled();

I don't think this is needed. The presence of the CSDS implementation doesn't actually affect our communication with the xDS server, so I don't think it needs to be gated.


src/core/ext/xds/xds_api.h, line 426 at r1 (raw file):

  // Synchronization status of xDS configs against the management server.
  enum ClientConfigStatus {

As per the style guide, enums and typedefs need to be declared before methods:

https://google.github.io/styleguide/cppguide.html#Declaration_Order


src/core/ext/xds/xds_api.h, line 454 at r1 (raw file):

    CdsUpdateMap cds_update_map;
    EdsUpdateMap eds_update_map;
    XdsJsonMap xds_json_map;

It does not make sense to have a separate map for this. We already have a way to return data for each individual resource, and this JSON string is just another piece of information for the individual resource. So instead, let's just add a Json field to each of the LdsUpdate, RdsUpdate, CdsUpdate, and EdsUpdate structs.


src/core/ext/xds/xds_api.h, line 478 at r1 (raw file):

  // Creates a ClientConfig message and returns it in JSON.
  std::string CreateClientConfigInJson(

This method should not need to take so many different maps as input. It should basically just need one map per resource type, and those maps should basically be the existing maps in the XdsClient that contain all of the necessary data.


src/core/ext/xds/xds_api.cc, line 122 at r1 (raw file):

}

// TODO(lidiz): Check to see if xDS security is enabled. This will be removed

This is not for xDS security. :)

Also, as mentioned in the .h file, I don't think we need this to begin with.


src/core/ext/xds/xds_api.cc, line 930 at r1 (raw file):

  size_t estimated_length = upb_json_encode(msg, m, ext_pool, 0, nullptr, 0, &status);
  // If encode failed
  if (estimated_length == (size_t)-1) {

What can cause this to fail? Can we just make this an assert?


src/core/ext/xds/xds_api.cc, line 932 at r1 (raw file):

  if (estimated_length == (size_t)-1) {
    const char *error_message = upb_status_errmsg(&status);
    gpr_log(GPR_DEBUG, "SerializeToJsonString!!! ERR %s", error_message);

We should not log in low-level library code like this. We're returning the error message to the caller, and they can log it if they want to.


src/core/ext/xds/xds_api.cc, line 936 at r1 (raw file):

    return "";
  }
  char* data = (char*)upb_arena_malloc(arena, estimated_length + 1);

Please use C++-style casts.


src/core/ext/xds/xds_api.cc, line 940 at r1 (raw file):

  size_t output_length = upb_json_encode(msg, m, ext_pool, 0, data, estimated_length + 1, &status);
  GPR_ASSERT(estimated_length == output_length);
  *error = GRPC_ERROR_NONE;

No need to do this explicitly. The caller should initialize to this value before calling this function, so this function can set it only if there's an error.


src/core/ext/xds/xds_api.cc, line 941 at r1 (raw file):

  GPR_ASSERT(estimated_length == output_length);
  *error = GRPC_ERROR_NONE;
  gpr_log(GPR_DEBUG, "SerializeToJsonString!!! %p %zu %s", msg, estimated_length, data);

Please remove.


src/core/ext/xds/xds_api.cc, line 942 at r1 (raw file):

  *error = GRPC_ERROR_NONE;
  gpr_log(GPR_DEBUG, "SerializeToJsonString!!! %p %zu %s", msg, estimated_length, data);
  return std::string(data);

Please use std::string(data, output_length) to avoid having the std::string ctor figure out the length of the string on its own.


src/core/ext/xds/xds_api.cc, line 945 at r1 (raw file):

}

inline grpc_error* ParseFromJsonString(const std::string& json_str, upb_msg *msg, const upb_msgdef *m, upb_arena* arena, const upb_symtab *ext_pool) {

Why do we need to parse JSON? This shouldn't be necessary.


src/core/ext/xds/xds_api.cc, line 955 at r1 (raw file):

}

google_protobuf_Timestamp* GrpcMillisToTimestamp(upb_arena* arena, const grpc_millis& value) {

grpc_millis is a POD type, so it can be passed in by value.


src/core/ext/xds/xds_api.cc, line 957 at r1 (raw file):

google_protobuf_Timestamp* GrpcMillisToTimestamp(upb_arena* arena, const grpc_millis& value) {
  google_protobuf_Timestamp* timestamp = google_protobuf_Timestamp_new(arena);
  // TODO(lidiz): not sure which type of clock to use.

I think the way you have this is fine.


src/core/ext/xds/xds_api.cc, line 1080 at r1 (raw file):

  upb_strview upb_lds_type_url = upb_strview_make(XdsApi::kLdsTypeUrl, strlen(XdsApi::kLdsTypeUrl));
  upb_strview upb_lds_version = StdStringToUpbString(resource_version_map[XdsApi::kLdsTypeUrl]);
  auto* listener_config_dump = envoy_admin_v3_ListenersConfigDump_new(arena.ptr());

It seems really cumbersome to construct a upb object and then dump it to JSON. The raw data we're saving for each individual resource is already in JSON form, so this is basically parsing the JSON to convert to upb and then converting back to JSON.

Why not just generate the JSON directly? That's what we do for channelz, and it works fine.

Also, note that if we do generate JSON directly instead of using upb, then there's no reason this method needs to be in the XdsApi class to begin with. This can be implemented directly in XdsClient::DumpClientConfigInJson().


src/core/ext/xds/xds_client.h, line 345 at r1 (raw file):

  // Stores the timestamp when the resource was last successfully updated.
  std::map<std::string /*type*/, grpc_millis /*last_update*/> update_time_map_;

I don't think it's sufficient to store the timestamp data on a per-resource-type basis, because we actually need to report it for each individual resource.

Note that for all resource types except for CDS and LDS, the server can send only those resources that have changed since the last update. If the client is subscribed to resources A, B, and C, but only resource A changes, the server can send a response containing a new version of A without affecting B or C. When that happens, A will have a newer timestamp than B or C, even though all three are the same resource type.

I think that instead of having a separate map here, we should just add a timestamp field to the existing structs for each resource type (i.e., all of the places where you added the raw_json field above).


src/core/ext/xds/xds_client.h, line 348 at r1 (raw file):

  // Stores the synchronization status of each resource.
  std::map<std::string /*type*/, XdsApi::ClientConfigStatus /*status*/> resource_status_map_;

As discussed in envoyproxy/envoy#14431, I think we're going to want to store this status on a per-resource basis as well, not a per-resource-type basis. So instead of having a separate map here, let's just add an XdsApi::ClientConfigStatus field to the structs where you added the raw_json field above.


src/core/ext/xds/xds_client.cc, line 260 at r1 (raw file):

  void AcceptEdsUpdate(XdsApi::EdsUpdateMap eds_update_map);

  void CacheResourceJsonStrings(const std::string& type_url, XdsApi::XdsJsonMap xds_json_map);

This should not be a separate method. We should just store the JSON in the existing struct that we're already updating in the AcceptLdsUpdate(), AcceptRdsUpdate(), AcceptCdsUpdate(), and AcceptEdsUpdate() methods.


src/core/ext/xds/xds_client.cc, line 2255 at r1 (raw file):

  XdsApi::XdsJsonMap listener_json_map, route_json_map, cluster_json_map, endpoint_json_map;
  for (auto& p : listener_map_) {
    gpr_log(GPR_DEBUG, "lds!!! %s", p.second.raw_json.c_str());

Please remove these log lines.


test/cpp/end2end/xds_end2end_test.cc, line 7188 at r1 (raw file):

  CheckRpcSendOk(kNumRpcs);
  grpc_error* error;
  grpc_core::RefCountedPtr<grpc_core::XdsClient> xds_client = grpc_core::XdsClient::GetOrCreate(&error);

We should not reach into internal APIs like this in an end-to-end test. Instead, let's implement the CSDS RPC service and get the data using that.


test/cpp/end2end/xds_end2end_test.cc, line 7318 at r1 (raw file):

INSTANTIATE_TEST_SUITE_P(XdsTest, CsdsTest,
                         ::testing::Values(TestType(false, true),

There's no point in running this with the xds resolver, and load reporting doesn't matter for this test. So let's just run it with TestType(false, false).


third_party/upb/BUILD, line 1 at r1 (raw file):

load(

We're not supposed to pull in this file from the upb repo. Please delete.


third_party/wyhash/wyhash.h, line 1 at r1 (raw file):

/* Copyright 2020 王一 Wang Yi <godspeed_china@yeah.net>

Where did this come from? Why do we need it? Is it okay to include this file in our code in terms of licensing issues?


tools/codegen/core/gen_upb_api.sh, line 176 at r1 (raw file):

# remove files that don't belong under upb-generated
find $UPB_OUTPUT_DIR -name "*.upbdefs.c" -type f -delete

None of these find commands are needed anymore.

Copy link
Contributor Author

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the early review! I need advice about several trade-offs. Once we settled the direction, I will continue to resolve other comments.

Reviewable status: all files reviewed, 24 unresolved discussions (waiting on @lidizheng and @markdroth)


src/core/ext/xds/xds_api.h, line 46 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I don't think this is needed. The presence of the CSDS implementation doesn't actually affect our communication with the xDS server, so I don't think it needs to be gated.

Removed.

With CSDS, it consumes slightly more memory and cycles. Well, it's always better to have it work out-of-box.


src/core/ext/xds/xds_api.h, line 426 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

As per the style guide, enums and typedefs need to be declared before methods:

https://google.github.io/styleguide/cppguide.html#Declaration_Order

Reordered the enum before the first method (the ctor). Following this rule, AdsParseResult looks like a counter example?


src/core/ext/xds/xds_api.h, line 454 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

It does not make sense to have a separate map for this. We already have a way to return data for each individual resource, and this JSON string is just another piece of information for the individual resource. So instead, let's just add a Json field to each of the LdsUpdate, RdsUpdate, CdsUpdate, and EdsUpdate structs.

It did create some difficulty to find the right abstraction to avoid adding JSON-related field to *Update structs. The idea was commented in the CSDS design doc.

https://docs.google.com/document/d/15a2ohrjOlpbazSJXqh-aFUXmi9uKAIygoZJDrR-ctqw/edit?disco=AAAAG8DgxyI

I'm okay either way. The implemented way (or the solution proposed in the CSDS comment) has a good separation between CSDS and existing code, so it's less coupled. The latest commented solution can simplify the logic. Which solution would you prefer? It affects many other choices below.


src/core/ext/xds/xds_api.cc, line 122 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This is not for xDS security. :)

Also, as mentioned in the .h file, I don't think we need this to begin with.

Removed.

Good catch. Copied the comment from XdsSecurity...


src/core/ext/xds/xds_api.cc, line 930 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

What can cause this to fail? Can we just make this an assert?

I did encountered one failure: the Any encoded type wasn't presented in the symbol table. From the json_encode.c, there are many other possible failures:

  • Out-of-range date, timestamp, duration
  • Several bad Any packaging problem (bad type url, unrecognized type, etc.)
  • Some snake case enforcement
  • Field exist but value is missing

Assert seems too harsh? Alternatively, we can log the error and move on. So, even if the cache failed, xDS is still usable. What's your recommended way of handling this?


src/core/ext/xds/xds_api.cc, line 932 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

We should not log in low-level library code like this. We're returning the error message to the caller, and they can log it if they want to.

Done.


src/core/ext/xds/xds_api.cc, line 936 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please use C++-style casts.

Using static_cast.


src/core/ext/xds/xds_api.cc, line 940 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

No need to do this explicitly. The caller should initialize to this value before calling this function, so this function can set it only if there's an error.

Done.


src/core/ext/xds/xds_api.cc, line 941 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please remove.

Done.


src/core/ext/xds/xds_api.cc, line 942 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please use std::string(data, output_length) to avoid having the std::string ctor figure out the length of the string on its own.

Done. TIL.


src/core/ext/xds/xds_api.cc, line 945 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Why do we need to parse JSON? This shouldn't be necessary.

If we cache the resources in JSON, then when we are trying assemble the ClientConfig proto message, we will need to parse the JSON string and build the huge proto, and finally convert it back to JSON. Alternatively, we could store the xDS resources in upb? WDYT?


src/core/ext/xds/xds_api.cc, line 955 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

grpc_millis is a POD type, so it can be passed in by value.

TIL. Done.


src/core/ext/xds/xds_api.cc, line 957 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I think the way you have this is fine.

TODO removed.


src/core/ext/xds/xds_api.cc, line 1080 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

It seems really cumbersome to construct a upb object and then dump it to JSON. The raw data we're saving for each individual resource is already in JSON form, so this is basically parsing the JSON to convert to upb and then converting back to JSON.

Why not just generate the JSON directly? That's what we do for channelz, and it works fine.

Also, note that if we do generate JSON directly instead of using upb, then there's no reason this method needs to be in the XdsApi class to begin with. This can be implemented directly in XdsClient::DumpClientConfigInJson().

Technically, with ProtoBuf, we have more contract enforcement. If we want to build JSON string directly, should we use string concatenation or our json.h?


test/cpp/end2end/xds_end2end_test.cc, line 7318 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

There's no point in running this with the xds resolver, and load reporting doesn't matter for this test. So let's just run it with TestType(false, false).

Done.


third_party/wyhash/wyhash.h, line 1 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Where did this come from? Why do we need it? Is it okay to include this file in our code in terms of licensing issues?

This is needed by the upb upgrade. Esun has a better solution than directly include it under our "third_party" in #25037.


tools/codegen/core/gen_upb_api.sh, line 176 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

None of these find commands are needed anymore.

Should be taken care of in #25037.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you didn't push any new commits, so I don't see any of the changes that you said were done. So I'm leaving those comments unresolved for now.

Please let me know if you have any questions. Thanks!

Reviewable status: all files reviewed, 23 unresolved discussions (waiting on @lidizheng and @markdroth)


src/core/ext/xds/xds_api.h, line 46 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Removed.

With CSDS, it consumes slightly more memory and cycles. Well, it's always better to have it work out-of-box.

The main reason for adding env var guards for new features is that if we add code that starts to use an existing xDS field that we were not previously using, then if an existing xDS server is populating that field, gRPC will automatically start using it. If we have not yet done enough integration testing to prove that the new feature works right, we don't want people to get the new behavior yet.

This isn't an issue for CSDS, because it doesn't change the behavior of the client, so that's why I don't think this guard is needed. Storing the data doesn't hurt anything if no one actually makes a CSDS request.


src/core/ext/xds/xds_api.h, line 426 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Reordered the enum before the first method (the ctor). Following this rule, AdsParseResult looks like a counter example?

You're right, that one is in the wrong place. Feel free to move it up to right before the ctor.


src/core/ext/xds/xds_api.h, line 454 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

It did create some difficulty to find the right abstraction to avoid adding JSON-related field to *Update structs. The idea was commented in the CSDS design doc.

https://docs.google.com/document/d/15a2ohrjOlpbazSJXqh-aFUXmi9uKAIygoZJDrR-ctqw/edit?disco=AAAAG8DgxyI

I'm okay either way. The implemented way (or the solution proposed in the CSDS comment) has a good separation between CSDS and existing code, so it's less coupled. The latest commented solution can simplify the logic. Which solution would you prefer? It affects many other choices below.

You're right, I had forgotten about the issue of not wanting to copy the JSON fields when we return the data to the watchers. What I said in the design doc comment was right: we don't want to add the fields to the the ?dsUpdate structs themselves.

This leaves two questions:

  1. How do we return the JSON from the XdsApi::ParseAdsResponse() method?
  2. How do we store the JSON in the XdsClient object?

For (2), what I said in the design doc comment is also correct: we should store the JSON as a new field in the XdsClient::{Listener,RouteConfig,Cluster,Endpoint}State structs.

For (1), I think the right way to do this is to change the definition of the [LRCE]dsUpdateMap such that the map value is a struct containing both the corresponding [LRCE]dsUpdate struct and the JSON string. For example, for LdsUpdateMap, we can change the definition to this:

struct LdsResourceData {
  LdsUpdate resource;
  std::string json;
};
using LdsUpdateMap = std::map<std::string /* resource_name */, LdsResourceData>;

My rule of thumb here is that we should never create more than one map with the same key. There's basically never any good reason to do that; it makes the code less efficient and harder to understand. If we already have an existing map with that key, add the new value you want to track to the map's value rather than creating a new map.


src/core/ext/xds/xds_api.cc, line 930 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

I did encountered one failure: the Any encoded type wasn't presented in the symbol table. From the json_encode.c, there are many other possible failures:

  • Out-of-range date, timestamp, duration
  • Several bad Any packaging problem (bad type url, unrecognized type, etc.)
  • Some snake case enforcement
  • Field exist but value is missing

Assert seems too harsh? Alternatively, we can log the error and move on. So, even if the cache failed, xDS is still usable. What's your recommended way of handling this?

I guess you're right, an assert is too harsh. This is fine then.

Does this mean that if the proto we're dumping contains an Any field and the content of that Any field is unknown, that the entire proto will fail to be dumped to JSON? That seems sub-optimal. Maybe talk to Josh to see if there's some way we can avoid that?


src/core/ext/xds/xds_api.cc, line 945 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

If we cache the resources in JSON, then when we are trying assemble the ClientConfig proto message, we will need to parse the JSON string and build the huge proto, and finally convert it back to JSON. Alternatively, we could store the xDS resources in upb? WDYT?

It is not necessary to parse and then re-dump the JSON. There are two reasonable ways to do this:

  1. Store the dumped JSON for the individual resources as Json objects instead of as std::string. Then we can construct the overall JSON for CSDS as a large Json object, and dump it to a string just once at the end.
  2. Store the dumped JSON for the indivudual resources as std::string and construct the overall JSON for CSDS by directly concatenating strings.

Either of those approaches allows doing exactly what we want with doing any extra parsing or conversions.


src/core/ext/xds/xds_api.cc, line 1080 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Technically, with ProtoBuf, we have more contract enforcement. If we want to build JSON string directly, should we use string concatenation or our json.h?

See my comment above. Either of those approaches is fine.

Node that the actual CSDS service implementation in the wrapped language layer is going to need to convert the JSON string returned by C-core into a proto anyway, so we will still have proto contract enforcement here.

Copy link
Contributor Author

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new batch contains the fix of the caching mechanism (literally map reduce) and CSDS response construction (from upb to Json). Previous fixes for comments are also included, sorry for the confusion.

The test case and C++ wrapper is still WIP.

Reviewable status: all files reviewed, 23 unresolved discussions (waiting on @lidizheng and @markdroth)


src/core/ext/xds/xds_api.h, line 426 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

You're right, that one is in the wrong place. Feel free to move it up to right before the ctor.

Done.


src/core/ext/xds/xds_api.h, line 454 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

You're right, I had forgotten about the issue of not wanting to copy the JSON fields when we return the data to the watchers. What I said in the design doc comment was right: we don't want to add the fields to the the ?dsUpdate structs themselves.

This leaves two questions:

  1. How do we return the JSON from the XdsApi::ParseAdsResponse() method?
  2. How do we store the JSON in the XdsClient object?

For (2), what I said in the design doc comment is also correct: we should store the JSON as a new field in the XdsClient::{Listener,RouteConfig,Cluster,Endpoint}State structs.

For (1), I think the right way to do this is to change the definition of the [LRCE]dsUpdateMap such that the map value is a struct containing both the corresponding [LRCE]dsUpdate struct and the JSON string. For example, for LdsUpdateMap, we can change the definition to this:

struct LdsResourceData {
  LdsUpdate resource;
  std::string json;
};
using LdsUpdateMap = std::map<std::string /* resource_name */, LdsResourceData>;

My rule of thumb here is that we should never create more than one map with the same key. There's basically never any good reason to do that; it makes the code less efficient and harder to understand. If we already have an existing map with that key, add the new value you want to track to the map's value rather than creating a new map.

Rule noted. Added *ResourceData struct, and updated the corresponding code.


src/core/ext/xds/xds_api.h, line 478 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This method should not need to take so many different maps as input. It should basically just need one map per resource type, and those maps should basically be the existing maps in the XdsClient that contain all of the necessary data.

This method is removed, as its logic being hauled into xds_client.cc.


src/core/ext/xds/xds_api.cc, line 122 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Removed.

Good catch. Copied the comment from XdsSecurity...

Done.


src/core/ext/xds/xds_api.cc, line 930 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I guess you're right, an assert is too harsh. This is fine then.

Does this mean that if the proto we're dumping contains an Any field and the content of that Any field is unknown, that the entire proto will fail to be dumped to JSON? That seems sub-optimal. Maybe talk to Josh to see if there's some way we can avoid that?

Updated the serialization error handling to log only. It should not block the proceeding of the entire xDS-flow.

About the Any problem, it should be orthogonal to this thread. Let's wait for Josh's reply.


src/core/ext/xds/xds_api.cc, line 936 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Using static_cast.

Done.


src/core/ext/xds/xds_api.cc, line 945 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

It is not necessary to parse and then re-dump the JSON. There are two reasonable ways to do this:

  1. Store the dumped JSON for the individual resources as Json objects instead of as std::string. Then we can construct the overall JSON for CSDS as a large Json object, and dump it to a string just once at the end.
  2. Store the dumped JSON for the indivudual resources as std::string and construct the overall JSON for CSDS by directly concatenating strings.

Either of those approaches allows doing exactly what we want with doing any extra parsing or conversions.

Used Json from json.h to construct the string. I hope the compiler can prevent as much error as possible, while concatenating strings might hide small errors that tolerated by the JSON parser.


src/core/ext/xds/xds_api.cc, line 955 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

TIL. Done.

Function removed.


src/core/ext/xds/xds_api.cc, line 957 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

TODO removed.

TIL we should better use REALTIME for timestamp that might be used outside of the gRPC library. Today, we only have gpr_format_timespec which assumes the timespec is using REALTIME.


src/core/ext/xds/xds_api.cc, line 1080 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

See my comment above. Either of those approaches is fine.

Node that the actual CSDS service implementation in the wrapped language layer is going to need to convert the JSON string returned by C-core into a proto anyway, so we will still have proto contract enforcement here.

The JSON deserialization is removed.


src/core/ext/xds/xds_client.h, line 345 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I don't think it's sufficient to store the timestamp data on a per-resource-type basis, because we actually need to report it for each individual resource.

Note that for all resource types except for CDS and LDS, the server can send only those resources that have changed since the last update. If the client is subscribed to resources A, B, and C, but only resource A changes, the server can send a response containing a new version of A without affecting B or C. When that happens, A will have a newer timestamp than B or C, even though all three are the same resource type.

I think that instead of having a separate map here, we should just add a timestamp field to the existing structs for each resource type (i.e., all of the places where you added the raw_json field above).

The last update time data is now split into each resource. The update logic makes sense, I thought only incremental ADS can update resources in batches.

However, the config status (synchronization state) and version is not yet split into each resource. I would like to defer this until we have incremental ADS?


src/core/ext/xds/xds_client.h, line 348 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

As discussed in envoyproxy/envoy#14431, I think we're going to want to store this status on a per-resource basis as well, not a per-resource-type basis. So instead of having a separate map here, let's just add an XdsApi::ClientConfigStatus field to the structs where you added the raw_json field above.

See above. There might be changes down the road. I added a TODO to finish this part once incremental ADS is available.


src/core/ext/xds/xds_client.cc, line 260 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This should not be a separate method. We should just store the JSON in the existing struct that we're already updating in the AcceptLdsUpdate(), AcceptRdsUpdate(), AcceptCdsUpdate(), and AcceptEdsUpdate() methods.

Updated. Done.


src/core/ext/xds/xds_client.cc, line 2255 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please remove these log lines.

Removed.


test/cpp/end2end/xds_end2end_test.cc, line 7318 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Done.

I found it makes more sense to run with TestType(true, false, true) and TestType(true, false, false). The xDS resolver seems essential for the test xDS server to generate xDS resources. We should also test the dump with RDS on and off.


third_party/upb/BUILD, line 1 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

We're not supposed to pull in this file from the upb repo. Please delete.

Delegated to the upb upgrade PR.


third_party/wyhash/wyhash.h, line 1 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

This is needed by the upb upgrade. Esun has a better solution than directly include it under our "third_party" in #25037.

Removed.


tools/codegen/core/gen_upb_api.sh, line 176 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Should be taken care of in #25037.

Removed.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good!

I have two significant comments. One is about how upb handles Any fields. The other is about making sure we come to consensus on the right behavior of CSDS w.r.t. per-resource status reporting. If we need to add fields to the CSDS protos, let's send a PR to do that instead of waiting for someone else to do it. :)

Please let me know if you have any questions about any of this. Thanks!

Reviewed 22 of 23 files at r2.
Reviewable status: all files reviewed, 23 unresolved discussions (waiting on @lidizheng)


src/core/ext/xds/xds_api.h, line 437 at r2 (raw file):

  // Synchronization status of xDS configs against the management server.
  enum ClientConfigStatus {

This doesn't appear to be used anywhere in the XdsApi code. Is it still needed?


src/core/ext/xds/xds_api.h, line 441 at r2 (raw file):

    CLIENT_UNKNOWN,
    // Client requested the config but hasn't received any config from
    // management

Please move this back to the previous line where it was.


src/core/ext/xds/xds_api.h, line 448 at r2 (raw file):

    // Client received the config and replied with NACK. Notably, the attached
    // config dump is not the NACKed version, but the most recent accepted one.
    // If

Same here.


src/core/ext/xds/xds_api.cc, line 930 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Updated the serialization error handling to log only. It should not block the proceeding of the entire xDS-flow.

About the Any problem, it should be orthogonal to this thread. Let's wait for Josh's reply.

Leaving this one unresolved pending resolution of the Any problem. My suspicion is that we need to somehow add all embedded message types into the upb symtab in order to make this work properly.


src/core/ext/xds/xds_api.cc, line 935 at r2 (raw file):

  GPR_ASSERT(estimated_length == output_length);
  // Parse the string into JSON structs
  return Json::Parse(std::string(data, output_length), error);

The parameter here can use absl::string_view instead of std::string, to avoid making an unnecessary allocation and copy of the string.


src/core/ext/xds/xds_api.cc, line 1489 at r2 (raw file):

    // Serialize into JSON and store it in the LdsUpdateMap
    grpc_error* error = GRPC_ERROR_NONE;
    envoy_extensions_filters_network_http_connection_manager_v3_HttpConnectionManager_getmsgdef(

What exactly does this call do? Is this needed to load the HttpConnectionManager msgdef into the symtab so that it can be dumped to JSON from inside the Any field? If so, does this mean that we're going to need to explicitly call the equivalent function for every message type that might possibly be embedded in an Any field of any xDS proto?

Does this actually need to be invoked once for every call to LdsResponseParse()? I believe that the symtab has the same lifetime as the XdsApi object, so once this message type is loaded, maybe it doesn't need to be loaded again?


src/core/ext/xds/xds_api.cc, line 1495 at r2 (raw file):

        symtab, &error);
    if (error != GRPC_ERROR_NONE) {
      gpr_log(GPR_ERROR, "Failed to serialize LDS: %s",

This should not log. Just return the error.


src/core/ext/xds/xds_api.cc, line 1499 at r2 (raw file):

      GRPC_ERROR_UNREF(error);
    } else {
      (*lds_update_map)[listener_name].json = std::move(parsed_json);

Let's not do the map lookup twice.

XdsApi::LdsResourceData& lds_resource_data = (*lds_update_map)[listener_name];
lds_resource_data.json = std::move(parsed_json);
XdsApi::LdsUpdate& lds_update = lds_resource_data.resource;

src/core/ext/xds/xds_api.cc, line 1624 at r2 (raw file):

        &error);
    if (error != GRPC_ERROR_NONE) {
      gpr_log(GPR_ERROR, "Failed to serialize RDS: %s",

Don't log, just return the error.


src/core/ext/xds/xds_api.cc, line 1628 at r2 (raw file):

      GRPC_ERROR_UNREF(error);
    } else {
      (*rds_update_map)[route_config_name].json = std::move(parsed_json);

Don't do the map lookup twice.


src/core/ext/xds/xds_api.cc, line 1794 at r2 (raw file):

        symtab, &error);
    if (error != GRPC_ERROR_NONE) {
      gpr_log(GPR_ERROR, "Failed to serialize CDS: %s",

Don't log, just return the error.


src/core/ext/xds/xds_api.cc, line 1798 at r2 (raw file):

      GRPC_ERROR_UNREF(error);
    } else {
      (*cds_update_map)[cluster_name].json = std::move(parsed_json);

Don't do the map lookup twice.


src/core/ext/xds/xds_api.cc, line 2075 at r2 (raw file):

        symtab, &error);
    if (error != GRPC_ERROR_NONE) {
      gpr_log(GPR_ERROR, "Failed to serialize EDS: %s",

Don't log, just return the error.


src/core/ext/xds/xds_api.cc, line 2079 at r2 (raw file):

      GRPC_ERROR_UNREF(error);
    } else {
      (*eds_update_map)[eds_service_name].json = std::move(parsed_json);

Don't do the map lookup twice.


src/core/ext/xds/xds_client.h, line 345 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

The last update time data is now split into each resource. The update logic makes sense, I thought only incremental ADS can update resources in batches.

However, the config status (synchronization state) and version is not yet split into each resource. I would like to defer this until we have incremental ADS?

Let's say that a client subscribes to resources RDS resources A and B. It gets an initial update at version 1, which it ACKs, because both resources are valid. Then it gets an RDS update for version 2 that includes only B, because A has not changed. However, the updated version of B is invalid, so the client NACKs and continues using version 1 of B. Now the state of resource A is {current_version=1, latest_version=1, latest_state=ACK}, and the state of resource B is {current_version=1, latest_version=2, latest_state=NACK}.

How is this situation to be represented in CSDS?


src/core/ext/xds/xds_client.h, line 348 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

See above. There might be changes down the road. I added a TODO to finish this part once incremental ADS is available.

I don't think this has anything to do with the incremental API. See my comments in the Envoy issue. I think we need to come to agreement with the Envoy folks on what the exact semantics of CSDS are before we go further on this.


src/core/ext/xds/xds_client.h, line 261 at r2 (raw file):

    Json raw_json;
    // The last updated timestamp
    gpr_timespec update_time;

Why not represent this as grpc_millis?


src/core/ext/xds/xds_client.cc, line 2230 at r2 (raw file):

std::string XdsClient::DumpClientConfigInJson() {
  // Listener part
  Json dynamic_listeners = Json::Array{};

Instead of saying Json foo = Json::Array, just say Json::Array foo. That way, you can interact with the array directly, and you can add it to a Json object later in exactly the same way.

Same thing throughout.


src/core/ext/xds/xds_client.cc, line 2238 at r2 (raw file):

    (*any_packed.mutable_object())["@type"] = XdsApi::kLdsTypeUrl;
    // Creates a DynamicListenerState JSON
    Json dynamic_listener_state = Json::Object{

Similarly, this can just be Json::Object dynamic_listener_state{.

Same thing throughout.


src/core/ext/xds/xds_client.cc, line 2239 at r2 (raw file):

    // Creates a DynamicListenerState JSON
    Json dynamic_listener_state = Json::Object{
        // TODO(lidiz): update to the source of individual version

What does this mean?


src/core/ext/xds/xds_client.cc, line 2240 at r2 (raw file):

    Json dynamic_listener_state = Json::Object{
        // TODO(lidiz): update to the source of individual version
        {"version_info", resource_version_map_[XdsApi::kLdsTypeUrl]},

When converting a protobuf field name to JSON, the default conversion removes the underscore and capitalizes the next character, so this should be versionInfo.

Same thing throughout.


src/core/ext/xds/xds_client.cc, line 2250 at r2 (raw file):

    };
    // Append to the config dump
    dynamic_listeners.mutable_array()->push_back(std::move(dynamic_listener));

If you change dynamic_listeners to Json::Array as I suggested above, then you can remove the mutable_array()-> part here.


test/cpp/end2end/xds_end2end_test.cc, line 7318 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

I found it makes more sense to run with TestType(true, false, true) and TestType(true, false, false). The xDS resolver seems essential for the test xDS server to generate xDS resources. We should also test the dump with RDS on and off.

Ah, right -- if you don't run with the xds resolver, you get only EDS resources, none of the other types. Other than that, it should work, but I agree that it's better to test with the xds resolver.

Anyway, this looks good.

Copy link
Contributor Author

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review. The handling of Any fields may require some processing in XdsApi ctor. As for adding fields, see comments below.

Reviewed 6 of 23 files at r2.
Reviewable status: all files reviewed, 23 unresolved discussions (waiting on @lidizheng and @markdroth)


src/core/ext/xds/xds_api.h, line 437 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This doesn't appear to be used anywhere in the XdsApi code. Is it still needed?

It's used in the XdsClient. It's the C++ representation of the client config status in CSDS, so I found it makes more sense to stay in XdsApi?


src/core/ext/xds/xds_api.h, line 441 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please move this back to the previous line where it was.

Done.


src/core/ext/xds/xds_api.h, line 448 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Same here.

Done.


src/core/ext/xds/xds_api.cc, line 930 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Leaving this one unresolved pending resolution of the Any problem. My suspicion is that we need to somehow add all embedded message types into the upb symtab in order to make this work properly.

Added as PreloadUpbSymbols to be invoked by XdsApi's ctor. What do you think the severity of serialization failure should be?


src/core/ext/xds/xds_api.cc, line 935 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

The parameter here can use absl::string_view instead of std::string, to avoid making an unnecessary allocation and copy of the string.

Done.


src/core/ext/xds/xds_api.cc, line 1489 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

What exactly does this call do? Is this needed to load the HttpConnectionManager msgdef into the symtab so that it can be dumped to JSON from inside the Any field? If so, does this mean that we're going to need to explicitly call the equivalent function for every message type that might possibly be embedded in an Any field of any xDS proto?

Does this actually need to be invoked once for every call to LdsResponseParse()? I believe that the symtab has the same lifetime as the XdsApi object, so once this message type is loaded, maybe it doesn't need to be loaded again?

Yes, we will need to load the symbols for embedded Any fields, but the number is not huge.

This is lifted into an individual function PreloadUpbSymbols, which loads all symbols needed for parsing Any. Currently, the only the HttpConnectionManager message symbols missing is complained by upb. In foreseeable future, we need to preload all HTTPFilters as they are encoded as Any.

On the other hand, google.protobuf.Any present in 93 xDS protobuf files across versions to date. It could be quite verbose to preload all of them, and potentially slowing us down. Since the size needed to preload is small, I would recommend to just preload the ones needed immediately.


src/core/ext/xds/xds_api.cc, line 1495 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This should not log. Just return the error.

Changed to return the error.

If we return the error, the xDS flow will be interrupted by failed JSON serialization. Do you think that's a good tradeoff?


src/core/ext/xds/xds_api.cc, line 1499 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Let's not do the map lookup twice.

XdsApi::LdsResourceData& lds_resource_data = (*lds_update_map)[listener_name];
lds_resource_data.json = std::move(parsed_json);
XdsApi::LdsUpdate& lds_update = lds_resource_data.resource;

Good idea! Changed.


src/core/ext/xds/xds_api.cc, line 1624 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Don't log, just return the error.

Done.


src/core/ext/xds/xds_api.cc, line 1628 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Don't do the map lookup twice.

Done.


src/core/ext/xds/xds_api.cc, line 1794 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Don't log, just return the error.

Done.


src/core/ext/xds/xds_api.cc, line 1798 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Don't do the map lookup twice.

Done.


src/core/ext/xds/xds_api.cc, line 2075 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Don't log, just return the error.

Done.


src/core/ext/xds/xds_api.cc, line 2079 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Don't do the map lookup twice.

Done.


src/core/ext/xds/xds_client.h, line 345 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Let's say that a client subscribes to resources RDS resources A and B. It gets an initial update at version 1, which it ACKs, because both resources are valid. Then it gets an RDS update for version 2 that includes only B, because A has not changed. However, the updated version of B is invalid, so the client NACKs and continues using version 1 of B. Now the state of resource A is {current_version=1, latest_version=1, latest_state=ACK}, and the state of resource B is {current_version=1, latest_version=2, latest_state=NACK}.

How is this situation to be represented in CSDS?

You are right, I missed the case that the update may be partial, so the version could split between individual resource in the same xDS type.

CSDS has reserved the fields for version info in individual resources. Updated to provide both per-xds-type version and per-resource version. The per-xds-type version map is kept for the sake of easier ACK message creation and update subscription.

Though the per-resource version info field is not yet implemented by the Traffic Director, we can be a bit ahead of them.


src/core/ext/xds/xds_client.h, line 348 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I don't think this has anything to do with the incremental API. See my comments in the Envoy issue. I think we need to come to agreement with the Envoy folks on what the exact semantics of CSDS are before we go further on this.

We haven't heard from them for envoyproxy/envoy#14431 for a while. For the first version we ship, as comment above, we can have per-resource version, but it would be challenging to get per-resource synchronization status:

The first issue of per-resource synchronization status is circular dependency. The dumped config is defined in config_dump.proto and sync status is defined in csds.proto, which depends on config_dump.proto.

The second issue is overlap. We could know which resource has been NACKed by comparing the config dump from the gRPC process and the management server.

The third issue is compatibility with existing config dumps. Since the change is going to happen in the config dump, it will apply to Envoy and all xDS servers. Do we want them to ignore this field or implement it in future?


src/core/ext/xds/xds_client.h, line 261 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Why not represent this as grpc_millis?

What's the rule of thumb here? I'm using gpr_timespec here because previous use case (Channelz trace) uses gpr_format_timespec to convert time into RFC3339 and stores the timestamp in gpr_timespec.


src/core/ext/xds/xds_client.cc, line 2230 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of saying Json foo = Json::Array, just say Json::Array foo. That way, you can interact with the array directly, and you can add it to a Json object later in exactly the same way.

Same thing throughout.

Done. TIL the C++ ctor supports implicit type conversion like this.


src/core/ext/xds/xds_client.cc, line 2238 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Similarly, this can just be Json::Object dynamic_listener_state{.

Same thing throughout.

Done. TIL.


src/core/ext/xds/xds_client.cc, line 2239 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

What does this mean?

Updated to fetch the true version of individual resource, instead of the latest version of the type of resource.


src/core/ext/xds/xds_client.cc, line 2240 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

When converting a protobuf field name to JSON, the default conversion removes the underscore and capitalizes the next character, so this should be versionInfo.

Same thing throughout.

Done.


src/core/ext/xds/xds_client.cc, line 2250 at r2 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

If you change dynamic_listeners to Json::Array as I suggested above, then you can remove the mutable_array()-> part here.

Done.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moving in the right direction!

Please let me know if you have any questions. Thanks!

Reviewed 4 of 4 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @lidizheng)


src/core/ext/xds/xds_api.h, line 437 at r2 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

It's used in the XdsClient. It's the C++ representation of the client config status in CSDS, so I found it makes more sense to stay in XdsApi?

If it's not used here, I would say just move it to xds_client.cc.


src/core/ext/xds/xds_api.cc, line 930 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Added as PreloadUpbSymbols to be invoked by XdsApi's ctor. What do you think the severity of serialization failure should be?

The core problem is that if we can't get a valid JSON dump here, then we can't include the resource in the CSDS response. We need to figure out how to handle that.

I think there are two main cases to worry about here:

  1. The proto contains an Any field that we don't look at, so we don't actually have any code to look at that field, which means that we probably don't have the proto linked in to begin with. (I'm not worried about the case where a new Any field gets added to the API, because our code wouldn't print that field anyway. But I am worried about the case where the server starts populating an existing Any field with a new payload type that we don't know about.)

  2. The proto contains an Any field that we do look at, and we do have that proto linked in, but we forgot to add that proto to the symtab, so upb can't dump it to JSON.

The only reasonable way I can see to handle case 1 would be to have upb format the content of the Any field as bytes in the JSON output. But I don't know if upb can do that, and I also don't know if the protobuf library will be able to handle that format when it converts the JSON back to proto in the wrapped language layer. This would be something you'd need to investigate with Josh.

For case 2, we might be able to ameliorate it by having some sort of presubmit check that ensures that if we add a new proto, we remember to add it to the symtab. Although if we solve case 1 by encoding as bytes, then that would also be a reasonable fallback for this case as well.

If we address those two cases, then I think serialization failures should be very rare. If so, then we can probably handle serialization failures by just returning an error, which will cause the update to be NACKed.


src/core/ext/xds/xds_api.cc, line 1495 at r2 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Changed to return the error.

If we return the error, the xDS flow will be interrupted by failed JSON serialization. Do you think that's a good tradeoff?

See above. I think it's a question of how common a case this is. If it's rare, then I think we could tolerate just failing here.

I do agree that it would be better to not reject a config for which JSON serialization fails if we can actually parse and use the config. But if we do that, then it seems like the resource would just wind up being missing in the CSDS response, which doesn't seem like a good state to be in either. I don't see any other reasonable options.


src/core/ext/xds/xds_client.h, line 348 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

We haven't heard from them for envoyproxy/envoy#14431 for a while. For the first version we ship, as comment above, we can have per-resource version, but it would be challenging to get per-resource synchronization status:

The first issue of per-resource synchronization status is circular dependency. The dumped config is defined in config_dump.proto and sync status is defined in csds.proto, which depends on config_dump.proto.

The second issue is overlap. We could know which resource has been NACKed by comparing the config dump from the gRPC process and the management server.

The third issue is compatibility with existing config dumps. Since the change is going to happen in the config dump, it will apply to Envoy and all xDS servers. Do we want them to ignore this field or implement it in future?

Taking your issues one at a time:

  1. If we have to, we can define a new enum in config_dump.proto. Or maybe we can move the enum definition to a new file that is used by both csds.proto and config_dump.proto.

  2. I don't understand the problem here. The goal of this effort is to expose the client's state. We should not need to compare with the management server's state to determine that (and it would basically be impossible to get both states in any sort of atomic way anyway). We want to know whether or not the client has NACKed the resource, and the client should be able to tell us that directly.

  3. This is no different than any other place where we add a new field to an existing API. It's fine for existing implementations not to populate the new field.

I've made a proposal in envoyproxy/envoy#14431 about what changes we need. Please go ahead and create a PR proposing something, and we can iterate on that.


src/core/ext/xds/xds_client.h, line 261 at r2 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

What's the rule of thumb here? I'm using gpr_timespec here because previous use case (Channelz trace) uses gpr_format_timespec to convert time into RFC3339 and stores the timestamp in gpr_timespec.

You can store it as grpc_millis and then convert to gpr_timestamp when generating the JSON, so that you can still use gpr_format_timespec().

In general, we use grpc_millis internally.


src/core/ext/xds/xds_client.h, line 263 at r3 (raw file):

    gpr_timespec update_time;
    // The version of the resource
    std::string version;

As per my proposal in envoyproxy/envoy#14431, I think we should record both the version that we're currently using and the most recent version that we've seen.


src/core/ext/xds/xds_client.h, line 353 at r3 (raw file):

  // Stores the synchronization status of each resource.
  std::map<std::string /*type*/, XdsApi::ClientConfigStatus /*status*/>

It looks like even if we do store the status on a per-resource basis, we'll still need to populate the per-resource-type status at a high level, so we'll still need this data. However, it does not make sense to have resource_version_map_ and resource_status_map_ be two different maps containing exactly the same set of keys. Instead, please combine them into a single map whose value is a struct containing the version and status.


src/core/ext/xds/xds_client.cc, line 2230 at r2 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Done. TIL the C++ ctor supports implicit type conversion like this.

Yup. That's why the style guide requires declaring most single-argument ctors as explicit. :)

@lidizheng
Copy link
Contributor Author

PR #25272 might not be merged quickly. So, for testing, I guess I will create a method to abstract out how the test case get the config dump. The method will access internal method (XdsClient) for now, and fetch info using CSDS stub in future.

@markdroth
Copy link
Member

Instead of trying to abstract this out, for now, let's just add a core test that tests the XdsClient API directly. As part of a subsequent PR that actually implements the CSDS RPC service in C++, you can add separate tests for the RPC service.

@lidizheng
Copy link
Contributor Author

lidizheng commented Feb 18, 2021

Update: after copying the Proto from Envoy to gRPC, I'm able to get the CSDS service running in gRPC++. However, there is yet another mismatched field somewhere between the ground truth proto and our copied version, which caused segfault:

Thread 1 "xds_end2end_tes" received signal SIGSEGV, Segmentation fault.
0x000000000063f382 in google::protobuf::internal::RepeatedPtrFieldBase::Get<google::protobuf::RepeatedPtrField<envoy::service::status::v3::PerXdsConfig>::TypeHandler> (this=0x807160, index=0) at external/com_google_protobuf/src/google/protobuf/repeated_field.h:1707
1707	external/com_google_protobuf/src/google/protobuf/repeated_field.h: No such file or directory.
(gdb) bt
#0  0x000000000063f382 in google::protobuf::internal::RepeatedPtrFieldBase::Get<google::protobuf::RepeatedPtrField<envoy::service::status::v3::PerXdsConfig>::TypeHandler> (this=0x807160, index=0)
    at external/com_google_protobuf/src/google/protobuf/repeated_field.h:1707
#1  0x000000000063f1ee in google::protobuf::RepeatedPtrField<envoy::service::status::v3::PerXdsConfig>::Get (this=0x807160, index=0)
    at external/com_google_protobuf/src/google/protobuf/repeated_field.h:2173
#2  0x000000000063f1c4 in envoy::service::status::v3::ClientConfig::_internal_xds_config (this=0x807150, index=0)
    at bazel-out/k8-dbg/bin/src/proto/grpc/testing/xds/v3/csds.pb.h:1293
#3  0x000000000063ec3b in envoy::service::status::v3::ClientConfig::xds_config (this=0x807150, index=0)
    at bazel-out/k8-dbg/bin/src/proto/grpc/testing/xds/v3/csds.pb.h:1297
#4  0x00000000005b4af8 in grpc::testing::(anonymous namespace)::ClientStatusDiscoveryServiceTest_TestXdsClientJsonDump_Test::TestBody (this=0x7e01c0) at test/cpp/end2end/xds_end2end_test.cc:8760
...

Link to the segfaulting location.

Also, pasting the dumped client config:

{
  "xdsConfig": [{
    "listenerConfig": {
      "dynamicListeners": [{
        "activeState": {
          "lastUpdated": "2021-02-17T16:24:40.124707135Z",
          "listener": {
            "@type": "type.googleapis.com/envoy.config.listener.v3.Listener",
            "apiListener": {
              "apiListener": {
                "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager",
                "httpFilters": [{
                  "name": "router",
                  "typedConfig": {
                    "@type": "type.googleapis.com/envoy.extensions.filters.http.router.v3.Router"
                  }
                }],
                "rds": {
                  "configSource": {
                    "ads": {}
                  },
                  "routeConfigName": "route_config_name"
                }
              }
            },
            "name": "server.example.com"
          },
          "versionInfo": "1"
        },
        "clientStatus": 3,
        "name": "server.example.com"
      }],
      "versionInfo": "1"
    }
  }, {
    "routeConfig": {
      "dynamicRouteConfigs": [{
        "clientStatus": 3,
        "lastUpdated": "2021-02-17T16:24:40.136707143Z",
        "routeConfig": {
          "@type": "type.googleapis.com/envoy.config.route.v3.RouteConfiguration",
          "name": "route_config_name",
          "virtualHosts": [{
            "domains": ["*"],
            "routes": [{
              "match": {
                "prefix": ""
              },
              "route": {
                "cluster": "cluster_name"
              }
            }]
          }]
        },
        "versionInfo": "1"
      }]
    }
  }, {
    "clusterConfig": {
      "dynamicActiveClusters": [{
        "clientStatus": 4,
        "cluster": {
          "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
          "edsClusterConfig": {
            "edsConfig": {
              "ads": {}
            },
            "serviceName": "eds_service_name"
          },
          "name": "cluster_name",
          "type": "EDS"
        },
        "errorState": {
          "details": "{\"created\":\"@1613607880.171090360\",\"description\":\"errors parsing CDS response\",\"file\":\"src/core/ext/xds/xds_api.cc\",\"file_line\":2295,\"referenced_errors\":[{\"created\":\"@1613607880.171083628\",\"description\":\"cluster_name: DiscoveryType is not valid.\",\"file\":\"src/core/ext/xds/xds_api.cc\",\"file_line\":2053}]}",
          "lastUpdateAttempt": "2021-02-17T16:24:40.170707208Z",
          "versionInfo": "3"
        },
        "lastUpdated": "2021-02-17T16:24:40.147707209Z",
        "versionInfo": "1"
      }],
      "versionInfo": "1"
    }
  }, {
    "endpointConfig": {
      "dynamicEndpointConfigs": [{
        "clientStatus": 3,
        "endpointConfig": {
          "@type": "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
          "clusterName": "eds_service_name",
          "endpoints": [{
            "lbEndpoints": [{
              "endpoint": {
                "address": {
                  "socketAddress": {
                    "address": "127.0.0.1",
                    "portValue": 7303
                  }
                }
              }
            }],
            "loadBalancingWeight": 3,
            "locality": {
              "region": "xds_default_locality_region",
              "subZone": "locality0",
              "zone": "xds_default_locality_zone"
            }
          }]
        },
        "lastUpdated": "2021-02-17T16:24:40.158707209Z",
        "versionInfo": "1"
      }]
    }
  }]
}

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good!

Please let me know if you have any questions. Thanks!

Reviewed 1 of 23 files at r2, 16 of 23 files at r6, 28 of 28 files at r7.
Reviewable status: all files reviewed, 30 unresolved discussions (waiting on @lidizheng)


src/core/ext/xds/xds_api.h, line 420 at r7 (raw file):

      ClusterLoadReport>;

  // Parses an ADS response.

The first line of this comment should go next to ParseAdsResponse(), since it's about the method, not the struct.


src/core/ext/xds/xds_bootstrap.h, line 91 at r7 (raw file):

  const XdsServer& server() const { return servers_[0]; }
  const Node* node() const { return node_.get(); }
  const Json& raw_node() const { return raw_node_; }

I don't think the changes to XdsBootstrap are needed, since we don't need to populate the node field in the CSDS response.


src/core/ext/xds/xds_client.h, line 352 at r7 (raw file):

      bool send_all_clusters, const std::set<std::string>& clusters);

  static void UpdateResourceMetadataAcked(ResourceMetadata& resource_metadata,

If this method is modifying resource_metadata, then that parameter should (a) be passed in as a pointer instead of a reference and (b) should be the last parameter of the method instead of the first.


src/core/ext/xds/xds_client.h, line 352 at r7 (raw file):

      bool send_all_clusters, const std::set<std::string>& clusters);

  static void UpdateResourceMetadataAcked(ResourceMetadata& resource_metadata,

These two static methods (this one and CreateUpdateFailureStateJson() below) do not need to be methods on this class. Instead, they can be standalone functions in an anonymous namespace in the .cc file.


src/core/ext/xds/xds_client.h, line 355 at r7 (raw file):

                                          std::string version,
                                          grpc_millis update_time);
  void UpdatePerResourceStatesByParsedResult(XdsApi::AdsParseResult& result);

This parameter should be passed in as either a const reference or a non-const pointer.


src/core/ext/xds/xds_client.cc, line 1181 at r7 (raw file):

}

void XdsClient::UpdateResourceMetadataAcked(ResourceMetadata& resource_metadata,

These two methods are methods on XdsClient, but they're in the middle of a bunch of methods on XdsClient::ChannelState::AdsCallState. Please move them down to the appropriate place in the file. Note that methods should be defined in the .cc file in the same order in which they are declared in the .h file.


src/core/ext/xds/xds_client.cc, line 1197 at r7 (raw file):

  if (result.parse_error == GRPC_ERROR_NONE) {
    // ADS update is accepted
    for (auto& p : result.lds_update_map) {

In the ACK case, which should be the common case, this will cause us to iterate over every resource in the response twice, once here and a second time in the Acccept?dsUpdate() methods. I think it would be better to just call UpdateResourceMetadataAcked() from inside of those methods, where we are already iterating over the result.

This method, UpdatePerResourceStatesByParsedResult(), should be called only in the NACK case (and you can rename it accordingly).


src/core/ext/xds/xds_client.cc, line 1220 at r7 (raw file):

    // ADS update is rejected and the resource names in the failed update is
    // available.
    const std::string& details = grpc_error_string(result.parse_error);

grpc_error_string() returns a const char*, so this is not a reference, it's actually a new string. And the new string isn't actually necessary here, because we're making a new copy for each resource below anyway.

Instead, I suggest making this an absl::string_view to avoid unnecessary allocation.


src/core/ext/xds/xds_client.cc, line 2302 at r7 (raw file):

}

std::string XdsClient::DumpClientConfigInJson() {

This function needs to acquire the mutex before reading the internal state of the XdsClient.


src/core/ext/xds/xds_client.cc, line 2332 at r7 (raw file):

    dynamic_listeners.push_back(std::move(dynamic_listener));
  }
  // Dump the "requested" listeners

Why are we doing this only for listeners and not for the other resource types?


src/core/ext/xds/xds_client.cc, line 2334 at r7 (raw file):

  // Dump the "requested" listeners
  for (auto& name :
       chand_->ads_calld()->GetPendingResourceNames(XdsApi::kLdsTypeUrl)) {

Consider the case where the client gets a resource from the server, then the xDS stream is terminated and the client reestablishes it, and then the client sends the initial request for the resources it was previously subscribed to on the original stream. In this case, the client will continue caching and using the old resources, so the same resource name will show up in both listener_map_ and in chand_->ads_calld()->GetPendingResourceNames(XdsApi::kLdsTypeUrl). This will result in duplicate entries for the same resource name.

To fix this, I think we need to get the set of pending resources first, then remove entries from that set as we iterate over listener_map_. That way, when we're done iterating over listener_map_, we know that anything left in the set of pending resources is actually a new pending resource that wasn't already in the cache.


src/core/ext/xds/xds_client.cc, line 2425 at r7 (raw file):

  // Creates the ClientConfig message
  Json::Object client_config{
      // TODO(lidiz): add node information once we have a proper bootstrap JSON

I don't think we need to populate the node field in the CSDS response. That's necessary when the CSDS server is an xDS server, not when it's an xDS client.


src/cpp/server/csds/csds.h, line 37 at r7 (raw file):

    : public envoy::service::status::v3::ClientStatusDiscoveryService::Service {
 public:
  // A streaming call that responds client status for each request.

Are we not going to support streaming?


src/cpp/server/csds/csds.h, line 52 at r7 (raw file):

 private:
  // Access the XdsClient in Core for ClientConfig.
  absl::Status DumpClientConfig(

This doesn't need to be a method of this class; it can be a standalone function in an anonymous namespace in the .cc file.


src/cpp/server/csds/csds.cc, line 44 at r7 (raw file):

                                       grpc::protobuf::Message* message) {
  grpc::protobuf::json::JsonParseOptions options;
  options.case_insensitive_enum_parsing = true;

Why is this needed? Can't we just generate the enum values with the appropriate case?


src/cpp/server/csds/csds.cc, line 55 at r7 (raw file):

  auto xds_client = grpc_core::XdsClient::GetOrCreate(&error);
  if (error != GRPC_ERROR_NONE) {
    return absl::UnavailableError(

We need to call GRPC_ERROR_UNREF(error) before returning, to avoid memory leak.


test/cpp/end2end/xds_end2end_test.cc, line 2288 at r7 (raw file):

    void RegisterAllServices(ServerBuilder* builder) override {
      builder->RegisterService(&csds_service_);
    };

No need for semicolons following method definitions.

Same on the next two lines.


test/cpp/end2end/xds_end2end_test.cc, line 2289 at r7 (raw file):

      builder->RegisterService(&csds_service_);
    };
    void StartAllServices() override{};

Please add a space after override. Same on the next line. (I'm surprised clang-format didn't complain about this.)


test/cpp/end2end/xds_end2end_test.cc, line 8719 at r7 (raw file):

  }

  const envoy::service::status::v3::ClientConfig& FetchClientConfig() {

The return type can't be a reference, because the object containing the value you're returning goes out of scope when the function returns. Just return it by value (non-const).


test/cpp/end2end/xds_end2end_test.cc, line 8724 at r7 (raw file):

    Status status = csds_stub_->FetchClientStatus(
        &context, envoy::service::status::v3::ClientStatusRequest(), &response);
    EXPECT_TRUE(status.ok()) << "code=" << status.error_code()

This can use ASSERT instead of EXPECT.


test/cpp/end2end/xds_end2end_test.cc, line 8726 at r7 (raw file):

    EXPECT_TRUE(status.ok()) << "code=" << status.error_code()
                             << " message=" << status.error_message();
    GPR_ASSERT(response.config_size() == 1);

Please use ASSERT_EQ() here.


test/cpp/end2end/xds_end2end_test.cc, line 8731 at r7 (raw file):

 private:
  AdminServerThread* admin_server_thread_;

Suggest using unique_ptr<> here.


test/cpp/end2end/xds_end2end_test.cc, line 8748 at r7 (raw file):

      BuildEdsResource(args, DefaultEdsServiceName()));
  // Send several RPCs to ensure the xDS setup works
  WaitForAllBackends();

This doesn't actually accomplish anything, because there's only one backend in this test.


test/cpp/end2end/xds_end2end_test.cc, line 8750 at r7 (raw file):

  WaitForAllBackends();
  CheckRpcSendOk(kNumRpcs);

Please remove blank line.


test/cpp/end2end/xds_end2end_test.cc, line 8751 at r7 (raw file):

  CheckRpcSendOk(kNumRpcs);

  auto cluster = default_cluster_;

There's no guarantee that the client sees theses changes before you call CSDS. If you want to guarantee that the client sees a given change, then you need to verify that the channel's behavior changes as a result of the change.


test/cpp/end2end/xds_end2end_test.cc, line 8757 at r7 (raw file):

  cluster.set_name(kClusterName2);
  balancers_[0]->ads_service()->SetCdsResource(cluster);

Please remove blank line.


test/cpp/end2end/xds_end2end_test.cc, line 8761 at r7 (raw file):

      FetchClientConfig();
  // Ensure there are 4 xDS config: Listener, Route, Cluster, Endpoint
  EXPECT_EQ(4, client_config.xds_config_size());

We need to actually look at the contents here.


test/cpp/end2end/xds_end2end_test.cc, line 8761 at r7 (raw file):

      FetchClientConfig();
  // Ensure there are 4 xDS config: Listener, Route, Cluster, Endpoint
  EXPECT_EQ(4, client_config.xds_config_size());

If we're not running with RDS (i.e., the server is sending the RouteConfig in the LDS response), then there will be only 3.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting closer. The structure of the matchers in the tests still needs some work.

Please let me know if you have any questions. Thanks!

Reviewed 19 of 24 files at r16, 5 of 5 files at r17.
Reviewable status: all files reviewed, 24 unresolved discussions (waiting on @lidizheng)


src/core/ext/xds/xds_api.h, line 479 at r15 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

This is used directly in xds_client.cc. If we further nest it, it might make the line setting the status quite long. Can we keep current layering?

I think it's more important to make the API understandable than it is to minimize line length. Putting this enum inside of the only struct where it is ever used makes clear that it's not something anyone needs to know about if they're not using that struct. If that weren't important, then we'd never have any nested structs, and everything would always be defined at the global level.


src/core/ext/xds/xds_api.h, line 496 at r17 (raw file):

    NACKED
  };
  static_assert(static_cast<int>(envoy_admin_v3_UNKNOWN) ==

I don't think we're actually using the UNKNOWN value anywhere anymore, so this particular assert isn't needed. We probably don't need that element in our enum, either.


src/core/ext/xds/xds_api.h, line 533 at r17 (raw file):

      std::map<absl::string_view /*resource_name*/, const ResourceMetadata*>;

  struct PerXdsResourceMetadata {

Please call this ResourceTypeMetadata. The prefix PerXds doesn't really tell the reader what this is for.


src/core/ext/xds/xds_api.h, line 538 at r17 (raw file):

  };

  using PerXdsResourceMetadataMap =

Please call this ResourceTypeMetadataMap.


src/core/ext/xds/xds_api.cc, line 2981 at r15 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Done.

For consistency with the other helper functions, please make the EncodingContext the first argument.


src/core/ext/xds/xds_api.cc, line 2990 at r15 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Done.

Same here: Please make EncodingContext the first argument.


src/core/ext/xds/xds_api.cc, line 3052 at r15 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Envoy's ConfigDump only has "version_info" field in LDS and CDS, but not RDS and EDS. The ultimate solution might be updating their proto again.

Ah, I see -- for RDS and EDS, it's in the nested Dynamic{Route,Endpoint}Config struct instead. Okay, that's fine.


src/core/ext/xds/xds_api.cc, line 3034 at r17 (raw file):

  // Create the "containers" for xDS configs
  auto* client_config = envoy_service_status_v3_ClientConfig_new(arena.ptr());
  auto* listener_per_xds_config =

How about reordering this code to do everything related to a given resource type together in one place? I think that would make it easier to follow.

auto* client_config = envoy_service_status_v3_ClientConfig_new(arena.ptr());
for (const auto& p : per_xds_resource_metadata_map) {
  absl::string_view resource_type = p.first;
  const PerXdsResourceMetadata& resource_type_metadata = p.second;
  if (resource_type == kLdsUri) {
    auto* listener_per_xds_config =
        envoy_service_status_v3_ClientConfig_add_xds_config(client_config,
                                                            arena.ptr());
    auto* listener_config_dump =
        envoy_admin_v3_ListenersConfigDump_new(arena.ptr());
    envoy_service_status_v3_PerXdsConfig_set_listener_config(
        listener_per_xds_config, listener_config_dump);
    for (const auto& q : resource_type_metadata.metadata_map) {
      absl::string_view resource_name = q.first;
      const ResourceMetadata& metadata = *q.second;
      // ...add LDS resource to listener_config_dump...
    }
  } else if (resource_type == kRdsUri) {
    // ...
  } else if (resource_type == kCdsUri) {
    // ...
  } else if (resource_type == kEdsUri) {
     // ...
  }
}

You might even be able to move some of the code for the individual resources into a templated helper function to avoid some of the duplication.


src/core/ext/xds/xds_api.cc, line 3082 at r17 (raw file):

  // Put the xDS configs into "containers"
  for (auto& per_xds_resource_metadata : per_xds_resource_metadata_map) {
    const absl::string_view& type_url = per_xds_resource_metadata.first;

absl::string_view has sufficiently small storage size that there's no real overhead of copying, so it should be treated as a POD type. No need to make this a const reference.

Same thing throughout.


src/core/ext/xds/xds_client.cc, line 54 at r15 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Removed.

You removed it from xds_client.h, but it's still here in xds_client.cc.


src/core/ext/xds/xds_client.cc, line 2274 at r15 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Done. Just curious here, when I'm using the type url, I wonder why don't we use an enum type in most part and have a enum-string map if the real value is needed?

Please use else for the kRdsTypeUrl case as well.

I thought about using an enum originally, but it turns out not to buy us much. We'd basically be constantly converting back and forth between them, and there are a couple of cases where the code also needs to be able to handle arbitrary resource types, although they shouldn't happen in practice.


src/core/ext/xds/xds_client.cc, line 886 at r17 (raw file):

// Build a resource metadata struct for ADS result accepting methods and CSDS.
XdsApi::ResourceMetadata CreateResourceMetadataAcked(
    std::string& serialized_proto, const std::string& version,

These first two arguments should be passed by value, not as a references.


src/core/ext/xds/xds_client.cc, line 891 at r17 (raw file):

  resource_metadata.serialized_proto = std::move(serialized_proto);
  resource_metadata.update_time = update_time;
  resource_metadata.version = version;

Please use std::move() here.


src/core/ext/xds/xds_client.cc, line 936 at r17 (raw file):

    // Update the listener state.
    listener_state.update = std::move(lds_update);
    listener_state.meta = CreateResourceMetadataAcked(p.second.serialized_proto,

Please use std::move() for both the serialized proto and the version.

Same thing for all resource types.


src/core/ext/xds/xds_client.cc, line 1864 at r17 (raw file):

  MutexLock lock(&mu_);
  ListenerState& listener_state = listener_map_[listener_name_str];
  listener_state.meta.client_status = XdsApi::REQUESTED;

This is incorrect. If there is already an existing watcher for this listener, then we may already have gotten the resource from the server, so we don't want to ovewrite the state here.

Given that you changed the struct definition to initialize this field to REQUESTED, I think there's no need to expicitly set it at all here. If the entry for this resource didn't already exist, then this field will get this value automatically, and if the entry did already exist, then we don't want to override its value anyway. So I think this line can just be removed.

Same thing for all resource types.


src/core/ext/xds/xds_client.cc, line 2276 at r17 (raw file):

        resource_metadata = &it->second.meta;
      }
    } else if (resource_metadata == nullptr) {

This one should probably not use an else, since you want to check even if we used one of the branches above.


src/core/ext/xds/xds_client.cc, line 2289 at r17 (raw file):

  MutexLock lock(&mu_);
  XdsApi::PerXdsResourceMetadataMap per_xds_resource_metadata_map;
  // Update per-xds-type version if available, this version corresponding to the

Please write this one resource type at a time, so it's easier to follow:

auto& lds_data = per_xds_resource_metadata_map[XdsApi::kLdsTypeUrl];
lds_data.version = resource_version_map_[XdsApi::kLdsTypeUrl];
for (const auto& p : listener_map_) {
  lds_data.resource_metadata_map[p.first] = &p.second.meta;
}

Then repeat for each resource type.


src/cpp/server/csds/csds.cc, line 44 at r15 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

For Ruby/PHP/Python, I'm sure we will be fine to directly use C++ API. Nico and I talked about this before. mmx@ used to work on a project of moving the Core surface API to C++.

To be clear, I'm not saying this approach won't work; I'm saying that it's not a good idea. We should not have wrapped languages reaching into C-core internals. If that was okay, then why do we have a C-core surface API in the first place? I think that if the wrapped languages need something, we should explicitly add a method for it in the C-core API.

The question of whether the C-core API can be moved to C++ is a completely orthogonal question. AFAIK, the main blocker for that is actually C#, but that may go away if we agree to migrate C# users to the .NET implementation. But there may be other blockers as well. That's a change that would need a lot more discussion.


test/cpp/end2end/xds_end2end_test.cc, line 8775 at r9 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Done.

Updated most CSDS tests to use matchers. Except the error_state tests, since I haven't figure out how to turn matcher into a soft failure. Checking the ADS response state is not good enough, because even if our ADS implementation received a NACK that doesn't mean the XdsClient states are fully updated.

The matcher syntax is kind of verbose... and the matcher failure is kind of hard to read, if the input is a proto message: https://paste.googleplex.com/5323531878400000. It turns the message into bytes, which means very little to human.

I don't understand what you mean by "I haven't figure out how to turn matcher into a soft failure". Why are the error tests any different from the other tests?


test/cpp/end2end/xds_end2end_test.cc, line 9120 at r11 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Well, I'm open to other validation methods.

Hmmm... Actually, in this case, the config will never be valid, so maybe we don't need the deadline at all. What if you just send the RPC without any deadline and wait for it to fail with UNAVAILABLE? The RPC should not be completed by the client until it has attempted to fetch the config and found part of it missing.


test/cpp/end2end/xds_end2end_test.cc, line 8701 at r15 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Updated to a condition check.

Without this line, the timeout is 15s which is already too long to wait and the wait time multiplier is 4.

This condition check is incredibly fragile. An engineer who adds a test in the future is not going to know that having its name include "DoesNotExist" will cause magic behavior in the test framework.

Please either do this unconditionally or make it an explicitly configured parameter somehow. Do not do it magically based on the name of the test including a particular substring.


test/cpp/end2end/xds_end2end_test.cc, line 9666 at r17 (raw file):

    }
    XdsEnd2endTest::SetUp();
    // The ServerThread picks port using PortSaver, but PortSaver will be reset

I removed PortSaver in #25561, so this comment is probably no longer relevant.


test/cpp/end2end/xds_end2end_test.cc, line 9752 at r17 (raw file):

}

MATCHER_P(

The goal of defining these matchers is to move the verbosity out of the individual tests, so that the individual tests is easier to read. But the way you've structured these matchers actually has the opposite effect: the individual tests are much more verbose and harder to follow.

Instead of defining a separate matcher for each individual field in the proto (especially like this, where the matcher makes hard-coded assumptions about the first element of two different repeated fields), we should have one matcher for each proto message that we want to verify, and the matcher should take a separate parameter for each field in the proto message that we care about. And they should be written in a way that provides easy composition of matchers.

For example, let's say you have the following matchers for a RouteConfiguration:

MATCHER_Pn(EqualsRoute, ..., "equals Route") {
  // ...verifies all fields in Route...
}

MATCHER_P3(EqualsVirtualHost, name, domains_matcher, routes_matcher, "equals VirtualHost") {
  bool retval = true;
  // Check name.
  if (arg.name() != name) {
    *result_listener << "expected name " << name << ", got " << arg.name();
    retval = false;
  }
  // Check domains.
  if (!::testing::ExplainMatchResult(domains_matcher, arg.domains(), result_listener)) {
    retval = false;
  }
  // Check routes.
  if (!::testing::ExplainMatchResult(routes_matcher, arg.routes(), result_listener)) {
    retval = false;
  }
  return retval;
}

MATCHER_P2(EqualsRouteConfiguration, name, vhost_matcher, "equals RouteConfiguration") {
  bool retval = true;
  // Check name.
  if (arg.name() != name) {
    *result_listener << "expected name " << name << ", got " << arg.name();
    retval = false;
  }
  // Check list of virtual hosts.
  if (!::testing::ExplainMatchResult(vhost_matcher, arg.virtual_hosts(), result_listener)) {
    retval = false;
  }
  return retval;
}

Now you can invoke this in a test like this:

EqualsRouteConfiguration(
    "route_config_name",
    ::testing::ElementsAre(  // list of virtual hosts
        EqualsVirtualHost(
            kServerName,  // vhost name
            ::testing::ElementsAre(kServerName),  // domains
            ::testing::ElementsAre(
                EqualsRoute(...)))));

This makes the tests much easier to read and ensures that we don't miss anything.


test/cpp/end2end/xds_end2end_test.cc, line 9821 at r17 (raw file):

      ::testing::AllOf(
          ::testing::Property(&envoy::config::core::v3::Node::id,
                              ::testing::Eq("xds_end2end_test")),

I don't think you actually need to use ::testing::Eq() here; I think it's fine to just use the bare string, which will implicitly match equality.

There are actually very few places where ::testing::Eq() is actually needed. You can probably remove it everywhere you're using it.


test/cpp/end2end/xds_end2end_test.cc, line 9834 at r17 (raw file):

  EXPECT_THAT(
      client_config.xds_config(),
      ::testing::Contains(::testing::Property(

Instead of checking that the list contains each individual entry that we care about, please use ::testing::UnorderedElementsAre(). This ensures that we see exactly the right set of elements.

If you can't hard-code the matchers for all of the elements due to conditionals (such as whether or not we're using RDS), then you can construct an array of matchers and pass it to ::testing::UnorderedElementsAreArray().


test/cpp/end2end/xds_end2end_test.cc, line 9841 at r17 (raw file):

                  ::testing::Eq("1")),
              ::testing::Property(
                  &envoy::admin::v3::ListenersConfigDump::static_listeners_size,

Instead of checking that the size is 0, check that the elements list is empty:

::testing::Property(
    &envoy::admin::v3::ListenersConfigDump::static_listeners,
    ::testing::ElementsAre()),

Copy link
Contributor Author

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I have updated the structure of the matchers and addressed other comments. (If the matchers still doesn't work, can you point me to an ideal usage of matchers?)

Reviewed 16 of 24 files at r16.
Reviewable status: all files reviewed, 24 unresolved discussions (waiting on @markdroth)


src/core/ext/xds/xds_api.h, line 479 at r15 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I think it's more important to make the API understandable than it is to minimize line length. Putting this enum inside of the only struct where it is ever used makes clear that it's not something anyone needs to know about if they're not using that struct. If that weren't important, then we'd never have any nested structs, and everything would always be defined at the global level.

Done. Previously, I found XdsApi::ACKED looks nice, and hoped to use it for other scenarios. But in this case, the ClientResourceStatus enum is only used by ResourceMetadata. I agree it is indeed a better idea to nest the struct.


src/core/ext/xds/xds_api.h, line 496 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I don't think we're actually using the UNKNOWN value anywhere anymore, so this particular assert isn't needed. We probably don't need that element in our enum, either.

Removed the UNKNOWN enum.


src/core/ext/xds/xds_api.h, line 533 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please call this ResourceTypeMetadata. The prefix PerXds doesn't really tell the reader what this is for.

Updated. PerXds is following Envoy's naming style envoy.service.status.v3.PerXdsConfig.


src/core/ext/xds/xds_api.h, line 538 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please call this ResourceTypeMetadataMap.

Updated.


src/core/ext/xds/xds_api.cc, line 2981 at r15 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

For consistency with the other helper functions, please make the EncodingContext the first argument.

Updated.


src/core/ext/xds/xds_api.cc, line 2990 at r15 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Same here: Please make EncodingContext the first argument.

Done.


src/core/ext/xds/xds_api.cc, line 3034 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

How about reordering this code to do everything related to a given resource type together in one place? I think that would make it easier to follow.

auto* client_config = envoy_service_status_v3_ClientConfig_new(arena.ptr());
for (const auto& p : per_xds_resource_metadata_map) {
  absl::string_view resource_type = p.first;
  const PerXdsResourceMetadata& resource_type_metadata = p.second;
  if (resource_type == kLdsUri) {
    auto* listener_per_xds_config =
        envoy_service_status_v3_ClientConfig_add_xds_config(client_config,
                                                            arena.ptr());
    auto* listener_config_dump =
        envoy_admin_v3_ListenersConfigDump_new(arena.ptr());
    envoy_service_status_v3_PerXdsConfig_set_listener_config(
        listener_per_xds_config, listener_config_dump);
    for (const auto& q : resource_type_metadata.metadata_map) {
      absl::string_view resource_name = q.first;
      const ResourceMetadata& metadata = *q.second;
      // ...add LDS resource to listener_config_dump...
    }
  } else if (resource_type == kRdsUri) {
    // ...
  } else if (resource_type == kCdsUri) {
    // ...
  } else if (resource_type == kEdsUri) {
     // ...
  }
}

You might even be able to move some of the code for the individual resources into a templated helper function to avoid some of the duplication.

Split the config dump logic to 4 sub-functions, like other parts of the code. And updated the main function to the suggested flow.


src/core/ext/xds/xds_api.cc, line 3082 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

absl::string_view has sufficiently small storage size that there's no real overhead of copying, so it should be treated as a POD type. No need to make this a const reference.

Same thing throughout.

TIL, done. It's like a pointer. Here we don't need to worry about lifespan of the "name" field strings, so we can use them.


src/core/ext/xds/xds_client.cc, line 54 at r15 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

You removed it from xds_client.h, but it's still here in xds_client.cc.

Done.


src/core/ext/xds/xds_client.cc, line 2274 at r15 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please use else for the kRdsTypeUrl case as well.

I thought about using an enum originally, but it turns out not to buy us much. We'd basically be constantly converting back and forth between them, and there are a couple of cases where the code also needs to be able to handle arbitrary resource types, although they shouldn't happen in practice.

Done. Sounds good.


src/core/ext/xds/xds_client.cc, line 886 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

These first two arguments should be passed by value, not as a references.

Done.


src/core/ext/xds/xds_client.cc, line 891 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please use std::move() here.

Done.


src/core/ext/xds/xds_client.cc, line 936 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please use std::move() for both the serialized proto and the version.

Same thing for all resource types.

Done.


src/core/ext/xds/xds_client.cc, line 1864 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This is incorrect. If there is already an existing watcher for this listener, then we may already have gotten the resource from the server, so we don't want to ovewrite the state here.

Given that you changed the struct definition to initialize this field to REQUESTED, I think there's no need to expicitly set it at all here. If the entry for this resource didn't already exist, then this field will get this value automatically, and if the entry did already exist, then we don't want to override its value anyway. So I think this line can just be removed.

Same thing for all resource types.

Removed. Good catch!

So, in case of both pending and active, we will still present the xDS config as ACKED instead of REQUESTED.


src/core/ext/xds/xds_client.cc, line 2276 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This one should probably not use an else, since you want to check even if we used one of the branches above.

Oops, good catch!


src/core/ext/xds/xds_client.cc, line 2289 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please write this one resource type at a time, so it's easier to follow:

auto& lds_data = per_xds_resource_metadata_map[XdsApi::kLdsTypeUrl];
lds_data.version = resource_version_map_[XdsApi::kLdsTypeUrl];
for (const auto& p : listener_map_) {
  lds_data.resource_metadata_map[p.first] = &p.second.meta;
}

Then repeat for each resource type.

Done.


src/cpp/server/csds/csds.cc, line 44 at r15 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

To be clear, I'm not saying this approach won't work; I'm saying that it's not a good idea. We should not have wrapped languages reaching into C-core internals. If that was okay, then why do we have a C-core surface API in the first place? I think that if the wrapped languages need something, we should explicitly add a method for it in the C-core API.

The question of whether the C-core API can be moved to C++ is a completely orthogonal question. AFAIK, the main blocker for that is actually C#, but that may go away if we agree to migrate C# users to the .NET implementation. But there may be other blockers as well. That's a change that would need a lot more discussion.

Added as grpc_dump_xds_configs in grpc.h.

I took another look into our Cython layer. It gave me the wrong impression that we are free to use any Core header (public or not). Currently, all non-public headers imported in Cython is related to custom IO manager, which we expect it will be gone soon. So, it's better to keep our Core headers usage straight.


test/cpp/end2end/xds_end2end_test.cc, line 8775 at r9 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I don't understand what you mean by "I haven't figure out how to turn matcher into a soft failure". Why are the error tests any different from the other tests?

In the end, updated to ::testing::Value to check if the error_state is propagated. Previously, I thought we can only use EXPECT_THAT to start a matching process, but there actually are several more entrances.


test/cpp/end2end/xds_end2end_test.cc, line 9120 at r11 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Hmmm... Actually, in this case, the config will never be valid, so maybe we don't need the deadline at all. What if you just send the RPC without any deadline and wait for it to fail with UNAVAILABLE? The RPC should not be completed by the client until it has attempted to fetch the config and found part of it missing.

When the RPC fail with UNAVAILABLE, it means the xDS configs are in DOES_NOT_EXIST status. But this test case hopes to verify if the xDS configs will have an intermediate REQUESTED state, so it only gives a 1 second timeout here.


test/cpp/end2end/xds_end2end_test.cc, line 8701 at r15 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This condition check is incredibly fragile. An engineer who adds a test in the future is not going to know that having its name include "DoesNotExist" will cause magic behavior in the test framework.

Please either do this unconditionally or make it an explicitly configured parameter somehow. Do not do it magically based on the name of the test including a particular substring.

Added a new test class CsdsShortAdsTimeoutTest. Good catch!


test/cpp/end2end/xds_end2end_test.cc, line 9666 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I removed PortSaver in #25561, so this comment is probably no longer relevant.

Removed.


test/cpp/end2end/xds_end2end_test.cc, line 9752 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

The goal of defining these matchers is to move the verbosity out of the individual tests, so that the individual tests is easier to read. But the way you've structured these matchers actually has the opposite effect: the individual tests are much more verbose and harder to follow.

Instead of defining a separate matcher for each individual field in the proto (especially like this, where the matcher makes hard-coded assumptions about the first element of two different repeated fields), we should have one matcher for each proto message that we want to verify, and the matcher should take a separate parameter for each field in the proto message that we care about. And they should be written in a way that provides easy composition of matchers.

For example, let's say you have the following matchers for a RouteConfiguration:

MATCHER_Pn(EqualsRoute, ..., "equals Route") {
  // ...verifies all fields in Route...
}

MATCHER_P3(EqualsVirtualHost, name, domains_matcher, routes_matcher, "equals VirtualHost") {
  bool retval = true;
  // Check name.
  if (arg.name() != name) {
    *result_listener << "expected name " << name << ", got " << arg.name();
    retval = false;
  }
  // Check domains.
  if (!::testing::ExplainMatchResult(domains_matcher, arg.domains(), result_listener)) {
    retval = false;
  }
  // Check routes.
  if (!::testing::ExplainMatchResult(routes_matcher, arg.routes(), result_listener)) {
    retval = false;
  }
  return retval;
}

MATCHER_P2(EqualsRouteConfiguration, name, vhost_matcher, "equals RouteConfiguration") {
  bool retval = true;
  // Check name.
  if (arg.name() != name) {
    *result_listener << "expected name " << name << ", got " << arg.name();
    retval = false;
  }
  // Check list of virtual hosts.
  if (!::testing::ExplainMatchResult(vhost_matcher, arg.virtual_hosts(), result_listener)) {
    retval = false;
  }
  return retval;
}

Now you can invoke this in a test like this:

EqualsRouteConfiguration(
    "route_config_name",
    ::testing::ElementsAre(  // list of virtual hosts
        EqualsVirtualHost(
            kServerName,  // vhost name
            ::testing::ElementsAre(kServerName),  // domains
            ::testing::ElementsAre(
                EqualsRoute(...)))));

This makes the tests much easier to read and ensures that we don't miss anything.

Split the matching logic to 21 matchers. The test bodies should be more readable now.


test/cpp/end2end/xds_end2end_test.cc, line 9821 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I don't think you actually need to use ::testing::Eq() here; I think it's fine to just use the bare string, which will implicitly match equality.

There are actually very few places where ::testing::Eq() is actually needed. You can probably remove it everywhere you're using it.

Removed. TIL there are these special cases.


test/cpp/end2end/xds_end2end_test.cc, line 9834 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of checking that the list contains each individual entry that we care about, please use ::testing::UnorderedElementsAre(). This ensures that we see exactly the right set of elements.

If you can't hard-code the matchers for all of the elements due to conditionals (such as whether or not we're using RDS), then you can construct an array of matchers and pass it to ::testing::UnorderedElementsAreArray().

Done. Added a section to prepare the matchers for RDS on or off.


test/cpp/end2end/xds_end2end_test.cc, line 9841 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of checking that the size is 0, check that the elements list is empty:

::testing::Property(
    &envoy::admin::v3::ListenersConfigDump::static_listeners,
    ::testing::ElementsAre()),

Good idea. Done.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting closer!

The matcher structure is much nicer now. I've added a few comments to make it even clearer.

Please let me know if you have any questions. Thanks!

Reviewed 6 of 6 files at r18.
Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @lidizheng)


src/core/ext/xds/xds_api.h, line 522 at r18 (raw file):

      std::map<absl::string_view /*type_url*/, ResourceTypeMetadata>;
  static_assert(
      static_cast<int>(envoy_admin_v3_REQUESTED) ==

Instead of casting both enum values to ints, just cast one to the other:

static_assert(
    static_cast<ResourceMetadata::ClientResourceStatus>(envoy_admin_v3_REQUESTED) ==
        ResourceMetadata::ClientResourceStatus::REQUESTED,
    "");

src/core/ext/xds/xds_api.h, line 592 at r18 (raw file):

  // Assemble the client config proto message and return the serialized result.
  std::string AssembleClientConfig(
      const ResourceTypeMetadataMap& per_xds_resource_metadata_map);

Please call the parameter resource_type_metadata_map.


src/core/ext/xds/xds_api.cc, line 2997 at r18 (raw file):

namespace {
google_protobuf_Timestamp* GrpcMillisToTimestamp(const EncodingContext& context,
                                                 const grpc_millis& value) {

grpc_millis is a pod type, so it can be passed in by value instead of as a const reference.


src/core/ext/xds/xds_api.cc, line 3124 at r18 (raw file):

  upb_strview kCdsTypeUrlUpb = upb_strview_makez(XdsApi::kCdsTypeUrl);
  auto* cluster_config_dump =
      envoy_admin_v3_ClustersConfigDump_new(context.arena);

Can't this just call envoy_service_status_v3_PerXdsConfig_mutable_cluster_config() instead of creating the ClusterConfig object and then adding it into the PerXdsConfig?


src/core/ext/xds/xds_api.cc, line 3178 at r18 (raw file):

  envoy_service_status_v3_PerXdsConfig_set_endpoint_config(
      per_xds_config, endpoint_config_dump);

Please remove blank lines within functions.


src/core/ext/xds/xds_client.cc, line 2289 at r17 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Done.

Please set the version field for each resource type separately instead of having a separate loop for it, since this avoids duplicate map lookups in per_xds_resource_metadata_map.


src/core/ext/xds/xds_client.cc, line 2283 at r18 (raw file):

std::string XdsClient::DumpClientConfigBinary() {
  MutexLock lock(&mu_);
  XdsApi::ResourceTypeMetadataMap per_xds_resource_metadata_map;

Please call this resource_type_metadata_map.


src/core/ext/xds/xds_client.cc, line 2368 at r18 (raw file):

// The returned bytes may contain NULL(0), so we can't use c-string.
grpc_slice grpc_dump_xds_configs() {

This is part of the C-core API, so you need to declare a grpc_core::ExecCtx instance at the start of the function.


src/cpp/server/csds/csds.cc, line 29 at r18 (raw file):

#include <string>

#include "src/core/ext/xds/xds_client.h"

This include is no longer needed.


test/cpp/end2end/xds_end2end_test.cc, line 9120 at r11 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

When the RPC fail with UNAVAILABLE, it means the xDS configs are in DOES_NOT_EXIST status. But this test case hopes to verify if the xDS configs will have an intermediate REQUESTED state, so it only gives a 1 second timeout here.

Good point. Okay, I guess we'll just have to see if this causes flakiness.


test/cpp/end2end/xds_end2end_test.cc, line 8701 at r15 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Added a new test class CsdsShortAdsTimeoutTest. Good catch!

The condition check on the test name is still here. Please remove it.


test/cpp/end2end/xds_end2end_test.cc, line 9725 at r18 (raw file):

};

#define XDS_FIELD(name) \

Please don't use macros like this. Just inline the corresponding expressions.


test/cpp/end2end/xds_end2end_test.cc, line 9752 at r18 (raw file):

           "equals DynamicListener") {
  bool ok = true;
  ok &= XDS_MATCH(has_warming_state, false);

These two should use ::testing::ElementsAre() instead.


test/cpp/end2end/xds_end2end_test.cc, line 9769 at r18 (raw file):

}

MATCHER_P2(EqListener, name, api_listener, "equals Listener") {

Please call this EqApiListener.


test/cpp/end2end/xds_end2end_test.cc, line 9789 at r18 (raw file):

  ok &= XDS_FIELD(name);
  ok &= ::testing::ExplainMatchResult(
      cluster_name, arg.virtual_hosts(0).routes(0).route().cluster(),

Instead of hard-coding the indexes here, this should be something like:

ok &= ::testing::ExplainMatchResult(
    ::testing::ElementsAre(
        ::testing::ElementsAre(
            ::testing::Property(
                &envoy::config::route::v3::Route::route,
                ::testing::Property(
                    &envoy::config::route::v3::RouteAction::cluster,
                    cluster_name)))),
    arg.virtual_hosts(), result_listener);

test/cpp/end2end/xds_end2end_test.cc, line 9860 at r18 (raw file):

  ok &= XDS_FIELD(cluster_name);
  ok &= ::testing::ExplainMatchResult(port,
                                      arg.endpoints(0)

Similar comment as on line 9789 above.


test/cpp/end2end/xds_end2end_test.cc, line 9868 at r18 (raw file):

                                      result_listener);
  ok &= ::testing::ExplainMatchResult(
      weight, arg.endpoints(0).load_balancing_weight().value(),

Same here.


test/cpp/end2end/xds_end2end_test.cc, line 9937 at r18 (raw file):

                         "envoy.lb.does_not_support_overprovisioning")));
  // Prepare matches for RDS on or off
  ::testing::Matcher<google::protobuf::Any> api_listener_matcher = ::testing::_;

Even in the RDS case, this should still check the contents of the api_listener field.


test/cpp/end2end/xds_end2end_test.cc, line 9939 at r18 (raw file):

  ::testing::Matcher<google::protobuf::Any> api_listener_matcher = ::testing::_;
  ::testing::Matcher<envoy::admin::v3::RoutesConfigDump>
      route_config_dump_matcher = ::testing::_;

In the non-RDS case, we should verify that there are no entries returned for RDS.


test/cpp/end2end/xds_end2end_test.cc, line 9961 at r18 (raw file):

                  "1", ::testing::ElementsAre(EqDynamicListener(
                           kServerName,
                           EqDynamicListenerState(

Instead of having tests directly use EqDynamicListenerState, UnpackListener, and EqListener, how about just calling those automatically from inside of EqDynamicListener? That way, the code here will be more readable:

EqDynamicListener(kServerName, "1", ClientResourceStatus::ACKED, api_listener_matcher)

Similar comments for EqDynamicRouteConfig, EqDynamicCluster, and EqDynamicEndpointConfig.


test/cpp/end2end/xds_end2end_test.cc, line 10008 at r18 (raw file):

  balancers_[0]->ads_service()->SetLdsResource(listener);
  // The old xDS configs should still be effective.
  SetNextResolution({});

There's no need to send new resolution results here, for either the parent channel or the xDS channel.


test/cpp/end2end/xds_end2end_test.cc, line 10021 at r18 (raw file):

                "1",
                ::testing::ElementsAre(EqDynamicListener(
                    kServerName, EqDynamicListenerState("1", ::testing::_),

We should check that the original listener resource is still reported here.

Same thing for all resource type NACK tests.


test/cpp/end2end/xds_end2end_test.cc, line 10239 at r18 (raw file):

};

TEST_P(CsdsShortAdsTimeoutTest, XdsConfigDumpDoesNotExist) {

Please also test this for the other 3 resource types.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

Please let me know if you have any questions. Thanks!

Reviewed 9 of 9 files at r19.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @lidizheng)


test/cpp/end2end/xds_end2end_test.cc, line 9752 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

These two should use ::testing::ElementsAre() instead.

Sorry, nevermind this -- for some reason, I thought these were repeated fields.


test/cpp/end2end/xds_end2end_test.cc, line 10021 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

We should check that the original listener resource is still reported here.

Same thing for all resource type NACK tests.

Still need to take care of this.


test/cpp/end2end/xds_end2end_test.cc, line 9860 at r19 (raw file):

MATCHER_P(UnpackListener, matcher, "is a Listener") {
  Listener config;
  arg.UnpackTo(&config);

This matcher should fail if UnpackTo() returns false.

Same for all of the Unpack* matchers.


test/cpp/end2end/xds_end2end_test.cc, line 9891 at r19 (raw file):

           api_listener_matcher, error_state, "equals DynamicListener") {
  bool ok = true;
  ok &= !arg.has_warming_state();

This should either use ::testing::ExplainMatchResult() or manually add output to result_listener when the check fails.

Same for the next one.


test/cpp/end2end/xds_end2end_test.cc, line 9994 at r19 (raw file):

                         "envoy.lb.does_not_support_overprovisioning")));
  // Prepare matches for RDS on or off
  ::testing::Matcher<google::protobuf::Any> api_listener_matcher = ::testing::_;

Might as well not bother initializing either of these to ::testing::_, since we're now setting them below in all cases.

Copy link
Contributor Author

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the reviews! Introduced more matchers and test cases, hopefully, it can make the test more readable.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @markdroth)


src/core/ext/xds/xds_api.h, line 522 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of casting both enum values to ints, just cast one to the other:

static_assert(
    static_cast<ResourceMetadata::ClientResourceStatus>(envoy_admin_v3_REQUESTED) ==
        ResourceMetadata::ClientResourceStatus::REQUESTED,
    "");

Good idea!


src/core/ext/xds/xds_api.h, line 592 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please call the parameter resource_type_metadata_map.

Done.


src/core/ext/xds/xds_api.cc, line 2997 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

grpc_millis is a pod type, so it can be passed in by value instead of as a const reference.

Done.


src/core/ext/xds/xds_api.cc, line 3124 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Can't this just call envoy_service_status_v3_PerXdsConfig_mutable_cluster_config() instead of creating the ClusterConfig object and then adding it into the PerXdsConfig?

Good catch! Updated, I used the suggested pattern for LDS/RDS, but forgot to update CDS/EDS as well.


src/core/ext/xds/xds_api.cc, line 3178 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please remove blank lines within functions.

Done.


src/core/ext/xds/xds_client.cc, line 2289 at r17 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please set the version field for each resource type separately instead of having a separate loop for it, since this avoids duplicate map lookups in per_xds_resource_metadata_map.

Can we keep the way it is? The resource_version_map_ isn't guaranteed to have an entry for each of the 4 xDS types. If we don't use a loop, we will need to use the iterator-find pattern.

More specifically, we may save the lookups for resource_type_metadata_map but it costs us to lookup 4 times for resource_version_map_ which is more expensive than iterating and less cache friendly than iterating through it.


src/core/ext/xds/xds_client.cc, line 2283 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please call this resource_type_metadata_map.

Done.


src/core/ext/xds/xds_client.cc, line 2368 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This is part of the C-core API, so you need to declare a grpc_core::ExecCtx instance at the start of the function.

Added both ExecCtx and ApplicationCallbackExecCtx. Just curious, I was referencing Channelz's surface API, should Channelz also add these two context?


src/cpp/server/csds/csds.cc, line 29 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This include is no longer needed.

Done.


test/cpp/end2end/xds_end2end_test.cc, line 8701 at r15 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

The condition check on the test name is still here. Please remove it.

Removed.


test/cpp/end2end/xds_end2end_test.cc, line 9725 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please don't use macros like this. Just inline the corresponding expressions.

Done. As discussed offline, we should not add new macros to Core or tests.


test/cpp/end2end/xds_end2end_test.cc, line 9769 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please call this EqApiListener.

Technically, this matcher expects a envoy.config.listener.v3.Listener message. Should we still need to rename it to EqApiListener?

https://github.com/envoyproxy/envoy/blob/main/api/envoy/config/listener/v3/listener.proto#L39


test/cpp/end2end/xds_end2end_test.cc, line 9789 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of hard-coding the indexes here, this should be something like:

ok &= ::testing::ExplainMatchResult(
    ::testing::ElementsAre(
        ::testing::ElementsAre(
            ::testing::Property(
                &envoy::config::route::v3::Route::route,
                ::testing::Property(
                    &envoy::config::route::v3::RouteAction::cluster,
                    cluster_name)))),
    arg.virtual_hosts(), result_listener);

Done.


test/cpp/end2end/xds_end2end_test.cc, line 9860 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Similar comment as on line 9789 above.

Done.


test/cpp/end2end/xds_end2end_test.cc, line 9868 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Same here.

Done.


test/cpp/end2end/xds_end2end_test.cc, line 9937 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Even in the RDS case, this should still check the contents of the api_listener field.

Done.


test/cpp/end2end/xds_end2end_test.cc, line 9939 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

In the non-RDS case, we should verify that there are no entries returned for RDS.

Done.


test/cpp/end2end/xds_end2end_test.cc, line 9961 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of having tests directly use EqDynamicListenerState, UnpackListener, and EqListener, how about just calling those automatically from inside of EqDynamicListener? That way, the code here will be more readable:

EqDynamicListener(kServerName, "1", ClientResourceStatus::ACKED, api_listener_matcher)

Similar comments for EqDynamicRouteConfig, EqDynamicCluster, and EqDynamicEndpointConfig.

Done. Compressed the depth of nesting matchers.


test/cpp/end2end/xds_end2end_test.cc, line 10008 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

There's no need to send new resolution results here, for either the parent channel or the xDS channel.

Removed.


test/cpp/end2end/xds_end2end_test.cc, line 10021 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Still need to take care of this.

Done.


test/cpp/end2end/xds_end2end_test.cc, line 10239 at r18 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please also test this for the other 3 resource types.

3 more test cases added.


test/cpp/end2end/xds_end2end_test.cc, line 9860 at r19 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This matcher should fail if UnpackTo() returns false.

Same for all of the Unpack* matchers.

Good catch! Done.


test/cpp/end2end/xds_end2end_test.cc, line 9891 at r19 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This should either use ::testing::ExplainMatchResult() or manually add output to result_listener when the check fails.

Same for the next one.

Good catch! Done.


test/cpp/end2end/xds_end2end_test.cc, line 9994 at r19 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Might as well not bother initializing either of these to ::testing::_, since we're now setting them below in all cases.

Done.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's just one small nit remaining. Feel free to merge after addressing.

Thanks for doing this!

Reviewed 1 of 1 files at r20.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @lidizheng)


src/core/ext/xds/xds_client.cc, line 2368 at r18 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Added both ExecCtx and ApplicationCallbackExecCtx. Just curious, I was referencing Channelz's surface API, should Channelz also add these two context?

If it's not already doing so, it should be, yes. Feel free to send me a separate PR for that.


test/cpp/end2end/xds_end2end_test.cc, line 9860 at r19 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Good catch! Done.

I think that if ok is false after UnpackTo() runs, then there's no point in invoking the matcher for the contents, because we know it will fail.

Copy link
Contributor Author

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the 20 rounds of review and hundreds of comments! Really appreciate the help!

Reviewed 4 of 9 files at r19.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @markdroth)


test/cpp/end2end/xds_end2end_test.cc, line 9860 at r19 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I think that if ok is false after UnpackTo() runs, then there's no point in invoking the matcher for the contents, because we know it will fail.

Done!

@lidizheng
Copy link
Contributor Author

Known failure: both interop tests failed due to #25737.

@lidizheng lidizheng merged commit 27de24a into grpc:master Mar 17, 2021
lidizheng added a commit that referenced this pull request Mar 17, 2021
lidizheng added a commit that referenced this pull request Mar 17, 2021
lidizheng added a commit that referenced this pull request Mar 17, 2021
lidizheng added a commit to lidizheng/grpc that referenced this pull request Mar 18, 2021
lidizheng added a commit to lidizheng/grpc that referenced this pull request Mar 19, 2021
lidizheng added a commit to lidizheng/grpc that referenced this pull request Mar 22, 2021
lidizheng added a commit that referenced this pull request Mar 22, 2021
* Revert "Revert "CSDS Implementation (#25038)" (#25745)"

This reverts commit 98fd4e1.

* Add xDS special Bazel build rules

* Add 2 todos to remove the added rules
@lidizheng lidizheng changed the title CSDS Implementation Implementation CSDS (xDS Config Dump) Mar 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants