Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CSM O11y test] Ping GMP endpoint during test for debugging purpose #33

Merged
merged 29 commits into from
May 21, 2024

Conversation

stanley-cheung
Copy link
Contributor

@stanley-cheung stanley-cheung commented Feb 7, 2024

  • Ping GMP endpoint in client and server pod <pod IP>:9464/metrics during the CSM Observability test
  • This will provide insight into what GMP sees from the OTel plugin, before sending metrics to Cloud Monitoring
  • This will help debug in case we get these extra metrics time series without a valid data point, in a future test flake.

- Ping GMP endpoint in client and server pod curl localhost:9464/metrics during test
- This will provide insight into what GMP gets from the OTel plugin before sending metrics to Cloud Monitoring
@sergiitk
Copy link
Member

sergiitk commented Feb 7, 2024

I most definitely won't recommend this approach. Instead, we should consider making normal HTTP requests using requests or urllib3.

@stanley-cheung stanley-cheung marked this pull request as draft February 7, 2024 18:44
- But this doesn't work yet. Can't quite figure out what's the exact request URL to use. Tried quite a few combinations and none of them works.
@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Feb 7, 2024

I most definitely won't recommend this approach. Instead, we should consider making normal HTTP requests using requests or urllib3.

Do you know what the request URL should I be using? I tried quite a few combinations but none of them works

  • http://{server pod name}.{server namespace}.svc.cluster.local:9464/metrics
  • http://{server pod name}.{server namespace}.pod.cluster.local:9464/metrics
  • http://{frontend service name}.{server namespace}.svc.cluster.local:9464/metrics
  • http://{server pod IP replaced dots with dashes}.{server namespace}.pod.cluster.local:9464/metrics
  • http://{server pod IP}:9464/metrics

If I supply {server pod name}, {server namespace}, {container name} into pod_exec and then run curl -s localhost:9464/metrics, it does work. So it seems like it's a matter arranging {server pod name}, {server namespace}, {container name} into the correct FQDN?

I just realized I am running the requests.get() from the main test driver (csm_observability_test.py). That is outside of the cluster isn't it? So it won't understand the pod IP address or anything .svc.cluster.local isn't it.

- `enable_csm_observability` is now a constructor parameter for the client runner and the server runner, instead of a `run()` parameter.
- If `enable_csm_observability`, we want to do a port forwarding for the `pod_monitoring_port = 9464`, such that we can issue a GET request against it in the CSM test
- Added `monitoring_port` and `monitoring_host` as class variable to the client runner and server runner.
- Override `_xds_test_server_for_pod` in the `GammaServerRunner` class
- Minimize the places where we need to hardcode the port `9464`
@stanley-cheung stanley-cheung changed the title [CSM O11y test] [DO NOT MERGE] Ping GMP endpoint regularly for debug purpose [CSM O11y test] Ping GMP endpoint regularly for debug purpose Feb 8, 2024
@stanley-cheung stanley-cheung changed the title [CSM O11y test] Ping GMP endpoint regularly for debug purpose [CSM O11y test] Ping GMP endpoint during test for debugging purpose Feb 8, 2024
@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Feb 8, 2024

@stanley-cheung stanley-cheung marked this pull request as ready for review February 8, 2024 01:03
@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Feb 8, 2024

tests/gamma/csm_observability_test.py Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
framework/test_app/runners/k8s/k8s_xds_client_runner.py Outdated Show resolved Hide resolved
tests/gamma/csm_observability_test.py Outdated Show resolved Hide resolved
tests/gamma/csm_observability_test.py Outdated Show resolved Hide resolved
framework/test_app/runners/k8s/gamma_server_runner.py Outdated Show resolved Hide resolved
framework/test_app/runners/k8s/gamma_server_runner.py Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
- Added a PrometheusLogger class to write the prometheus endpoint log to a separate log file related to logs_subdir
- monitoring_port is now stored in XdsTestClient and XdsTestServer rather than the runner
- use rpc_host instead of a separate monitoring_host
- put the constant DEFAULT_MONITORING_PORT in the KubernetesBaseRunner
- fixed requirements.txt and requirements.lock
- removed a previous hack is_legit_time_series()
- renamed function to ping_prometheus_endpoint instead of referring it as GMP
@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Feb 21, 2024

@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Feb 21, 2024

@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Mar 5, 2024

Copy link
Member

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with

  1. FIxing the log collection arg
  2. Running full csm test suite + lb test suite

tests/gamma/csm_observability_test.py Outdated Show resolved Hide resolved
tests/gamma/csm_observability_test.py Outdated Show resolved Hide resolved
@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented May 17, 2024

@stanley-cheung
Copy link
Contributor Author

@sergiitk Incorporated latest feedback. Using ClientDeploymentArgs now. Will do ServerDeploymentArgs in a separate PR.

Want to do a final check and merge? Thanks.

Copy link
Member

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that a bunch grpc/core/master/linux/psm-csm test run failed with

Traceback (most recent call last):
  File "/tmp/tmp.2ObQJ9ydNu/psm-interop/tests/app_net_ssa_test.py", line 111, in test_session_affinity_policy
    test_client: _XdsTestClient = self.startTestClient(
  File "/tmp/tmp.2ObQJ9ydNu/psm-interop/framework/xds_k8s_testcase.py", line 928, in startTestClient
    return self._start_test_client(test_server.xds_uri, **kwargs)
  File "/tmp/tmp.2ObQJ9ydNu/psm-interop/framework/xds_k8s_testcase.py", line 831, in _start_test_client
    test_client = self.client_runner.run(
  File "/tmp/tmp.2ObQJ9ydNu/psm-interop/framework/test_app/runners/k8s/k8s_xds_client_runner.py", line 188, in run
    **self.deployment_args.as_dict(),
AttributeError: 'NoneType' object has no attribute 'as_dict'

@sergiitk
Copy link
Member

The lb test suite failed with the same error.

@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented May 20, 2024

@sergiitk sergiitk merged commit d07a3ab into grpc:main May 21, 2024
7 checks passed
@stanley-cheung stanley-cheung deleted the ping-gmp-endpoint branch May 21, 2024 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants