unify metrics #2897

Open
BulatSaif opened this issue Mar 26, 2024 · 5 comments

@BulatSaif
Contributor

We are setting up a Kusama relay and noticed that alerts created for Westend will not work for Kusama. Prometheus metrics have unique names for each chain, which makes it impossible to reuse Grafana dashboards or alerts.

The current format embeds the bridge name and lane ID in the metric name itself:

BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_source_at_target_block_number{}=2889751

Possible new format:

substrate_relay_best_source_at_target_block_number{name="BridgeHubRococo_to_BridgeHubWestend", lane="00000002"}=2889751

All metrics should share the same prefix (e.g. substrate_relay), and name and lane should be set as labels.
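As a rough illustration of the proposed mapping, here is a minimal Python sketch that rewrites a legacy message-lane metric name into the suggested prefix-plus-labels form. The regular expression and the `unify` helper are assumptions derived only from the sample names in this issue; they are not part of substrate-relay.

```python
import re

# Assumed legacy layout: <SOURCE>_to_<TARGET>_MessageLane_<LANE>_<metric>
LEGACY_MESSAGE_LANE = re.compile(
    r"^(?P<name>[A-Za-z0-9]+_to_[A-Za-z0-9]+)"
    r"_MessageLane_(?P<lane>[0-9a-fA-F]{8})"
    r"_(?P<metric>\w+)$"
)

PREFIX = "substrate_relay"  # common prefix suggested in this issue


def unify(legacy_name: str) -> str:
    """Move the bridge name and lane ID from the metric name into labels."""
    m = LEGACY_MESSAGE_LANE.match(legacy_name)
    if m is None:
        raise ValueError(f"not a message-lane metric: {legacy_name}")
    return (
        f'{PREFIX}_{m["metric"]}'
        f'{{name="{m["name"]}", lane="{m["lane"]}"}}'
    )


legacy = (
    "BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002"
    "_best_source_at_target_block_number"
)
print(unify(legacy))
```

With names in this shape, a single dashboard query or alert rule can select on the name and lane labels instead of being duplicated per chain pair.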

Here are the current metrics:
# HELP BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_source_at_target_block_number Best block number at the source_at_target
# TYPE BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_source_at_target_block_number gauge
BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_source_at_target_block_number 2889751
# HELP BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_source_block_number Best block number at the source
# TYPE BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_source_block_number gauge
BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_source_block_number 2891780
# HELP BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_target_at_source_block_number Best block number at the target_at_source
# TYPE BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_target_at_source_block_number gauge
BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_target_at_source_block_number 3069239
# HELP BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_target_block_number Best block number at the target
# TYPE BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_target_block_number gauge
BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_best_target_block_number 3071212
# HELP BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_is_source_and_source_at_target_using_different_forks Whether the best finalized source block at target node is different (value 1) from the corresponding block at the source node
# TYPE BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_is_source_and_source_at_target_using_different_forks gauge
BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_is_source_and_source_at_target_using_different_forks 0
# HELP BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_is_target_and_target_at_source_using_different_forks Whether the best finalized source block at target node is different (value 1) from the corresponding block at the source node
# TYPE BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_is_target_and_target_at_source_using_different_forks gauge
BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_is_target_and_target_at_source_using_different_forks 0
# HELP BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_lane_state_nonces Nonces of the lane state
# TYPE BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_lane_state_nonces gauge
BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_lane_state_nonces{type="source_latest_confirmed"} 395
BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_lane_state_nonces{type="source_latest_generated"} 395
BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_lane_state_nonces{type="target_latest_confirmed"} 394
BridgeHubRococo_to_BridgeHubWestend_MessageLane_00000002_lane_state_nonces{type="target_latest_received"} 395
# HELP BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_source_at_target_block_number Best block number at the source_at_target
# TYPE BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_source_at_target_block_number gauge
BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_source_at_target_block_number 3069239
# HELP BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_source_block_number Best block number at the source
# TYPE BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_source_block_number gauge
BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_source_block_number 3071212
# HELP BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_target_at_source_block_number Best block number at the target_at_source
# TYPE BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_target_at_source_block_number gauge
BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_target_at_source_block_number 2889751
# HELP BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_target_block_number Best block number at the target
# TYPE BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_target_block_number gauge
BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_best_target_block_number 2891780
# HELP BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_is_source_and_source_at_target_using_different_forks Whether the best finalized source block at target node is different (value 1) from the corresponding block at the source node
# TYPE BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_is_source_and_source_at_target_using_different_forks gauge
BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_is_source_and_source_at_target_using_different_forks 0
# HELP BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_is_target_and_target_at_source_using_different_forks Whether the best finalized source block at target node is different (value 1) from the corresponding block at the source node
# TYPE BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_is_target_and_target_at_source_using_different_forks gauge
BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_is_target_and_target_at_source_using_different_forks 0
# HELP BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_lane_state_nonces Nonces of the lane state
# TYPE BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_lane_state_nonces gauge
BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_lane_state_nonces{type="source_latest_confirmed"} 345
BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_lane_state_nonces{type="source_latest_generated"} 345
BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_lane_state_nonces{type="target_latest_confirmed"} 344
BridgeHubWestend_to_BridgeHubRococo_MessageLane_00000002_lane_state_nonces{type="target_latest_received"} 345
# HELP Rococo_to_BridgeHubWestend_Sync_best_source_at_target_block_number Best block number at the source_at_target
# TYPE Rococo_to_BridgeHubWestend_Sync_best_source_at_target_block_number gauge
Rococo_to_BridgeHubWestend_Sync_best_source_at_target_block_number 9738473
# HELP Rococo_to_BridgeHubWestend_Sync_best_source_block_number Best block number at the source
# TYPE Rococo_to_BridgeHubWestend_Sync_best_source_block_number gauge
Rococo_to_BridgeHubWestend_Sync_best_source_block_number 9738473
# HELP Rococo_to_BridgeHubWestend_Sync_is_source_and_source_at_target_using_different_forks Whether the best finalized source block at target node is different (value 1) from the corresponding block at the source node
# TYPE Rococo_to_BridgeHubWestend_Sync_is_source_and_source_at_target_using_different_forks gauge
Rococo_to_BridgeHubWestend_Sync_is_source_and_source_at_target_using_different_forks 0
# HELP Westend_to_BridgeHubRococo_Sync_best_source_at_target_block_number Best block number at the source_at_target
# TYPE Westend_to_BridgeHubRococo_Sync_best_source_at_target_block_number gauge
Westend_to_BridgeHubRococo_Sync_best_source_at_target_block_number 20126360
# HELP Westend_to_BridgeHubRococo_Sync_best_source_block_number Best block number at the source
# TYPE Westend_to_BridgeHubRococo_Sync_best_source_block_number gauge
Westend_to_BridgeHubRococo_Sync_best_source_block_number 20126282
# HELP Westend_to_BridgeHubRococo_Sync_is_source_and_source_at_target_using_different_forks Whether the best finalized source block at target node is different (value 1) from the corresponding block at the source node
# TYPE Westend_to_BridgeHubRococo_Sync_is_source_and_source_at_target_using_different_forks gauge
Westend_to_BridgeHubRococo_Sync_is_source_and_source_at_target_using_different_forks 0
# HELP at_BridgeHubRococo_relay_BridgeHubWestendMessages_balance Balance of the BridgeHubWestendMessages relay account at the BridgeHubRococo
# TYPE at_BridgeHubRococo_relay_BridgeHubWestendMessages_balance gauge
at_BridgeHubRococo_relay_BridgeHubWestendMessages_balance 199.99850050765502
# HELP at_BridgeHubRococo_relay_BridgeHubWestendMessages_reward_for_msgs_from_BridgeHubWestend_on_lane_00000002 Reward of the BridgeHubWestendMessages relay account at BridgeHubRococo for delivering messages from BridgeHubWestend on lane [0, 0, 0, 2]
# TYPE at_BridgeHubRococo_relay_BridgeHubWestendMessages_reward_for_msgs_from_BridgeHubWestend_on_lane_00000002 gauge
at_BridgeHubRococo_relay_BridgeHubWestendMessages_reward_for_msgs_from_BridgeHubWestend_on_lane_00000002 0.001094992722
# HELP at_BridgeHubRococo_relay_BridgeHubWestendMessages_reward_for_msgs_to_BridgeHubWestend_on_lane_00000002 Reward of the BridgeHubWestendMessages relay account at BridgeHubRococo for delivering messages confirmations from BridgeHubWestend on lane [0, 0, 0, 2]
# TYPE at_BridgeHubRococo_relay_BridgeHubWestendMessages_reward_for_msgs_to_BridgeHubWestend_on_lane_00000002 gauge
at_BridgeHubRococo_relay_BridgeHubWestendMessages_reward_for_msgs_to_BridgeHubWestend_on_lane_00000002 0.000356661027
# HELP at_BridgeHubWestend_relay_BridgeHubRococoMessages_balance Balance of the BridgeHubRococoMessages relay account at the BridgeHubWestend
# TYPE at_BridgeHubWestend_relay_BridgeHubRococoMessages_balance gauge
at_BridgeHubWestend_relay_BridgeHubRococoMessages_balance 555.094752173248
# HELP at_BridgeHubWestend_relay_BridgeHubRococoMessages_reward_for_msgs_from_BridgeHubRococo_on_lane_00000002 Reward of the BridgeHubRococoMessages relay account at BridgeHubWestend for delivering messages from BridgeHubRococo on lane [0, 0, 0, 2]
# TYPE at_BridgeHubWestend_relay_BridgeHubRococoMessages_reward_for_msgs_from_BridgeHubRococo_on_lane_00000002 gauge
at_BridgeHubWestend_relay_BridgeHubRococoMessages_reward_for_msgs_from_BridgeHubRococo_on_lane_00000002 0.152400542016
# HELP at_BridgeHubWestend_relay_BridgeHubRococoMessages_reward_for_msgs_to_BridgeHubRococo_on_lane_00000002 Reward of the BridgeHubRococoMessages relay account at BridgeHubWestend for delivering messages confirmations from BridgeHubRococo on lane [0, 0, 0, 2]
# TYPE at_BridgeHubWestend_relay_BridgeHubRococoMessages_reward_for_msgs_to_BridgeHubRococo_on_lane_00000002 gauge
at_BridgeHubWestend_relay_BridgeHubRococoMessages_reward_for_msgs_to_BridgeHubRococo_on_lane_00000002 44.687927316672
# HELP process_cpu_usage_percentage Process CPU usage
# TYPE process_cpu_usage_percentage gauge
process_cpu_usage_percentage 0.6120887398719788
# HELP process_memory_usage_bytes Process memory (resident set size) usage
# TYPE process_memory_usage_bytes gauge
process_memory_usage_bytes 81155588096
# HELP substrate_relay_build_info A metric with a constant '1' value labeled by version
# TYPE substrate_relay_build_info gauge
substrate_relay_build_info{commit="ccf18d62-clean",version="1.0.1"} 1
# HELP system_average_load System load average
# TYPE system_average_load gauge
system_average_load{over="15min"} 0.43
system_average_load{over="1min"} 0.37
system_average_load{over="5min"} 0.41
@bkontur bkontur self-assigned this Mar 27, 2024
@bkontur
Contributor

bkontur commented Mar 27, 2024

@BulatSaif I can take a look. How time-sensitive is this?

@BulatSaif
Contributor Author

BulatSaif commented Mar 27, 2024

@BulatSaif I can take a look. How time-sensitive is this?

It's not critical. Ideally we would like to have it before the Polkadot-Kusama relay launch, but it is probably too late for such a feature request.

@bkontur
Contributor

bkontur commented Mar 27, 2024

@BulatSaif I can take a look. How time-sensitive is this?

It's not critical. Ideally we would like to have it before the Polkadot-Kusama relay launch, but it is probably too late for such a feature request.

ok, let me prioritize it

@bkontur
Contributor

bkontur commented Mar 27, 2024

I started with some investigation and analysis of what we have now.
https://prometheus.io/docs/practices/naming/#labels

Hints:

  • BridgeHubRococo_to_BridgeHubWestend for messages - SOURCE_NAME / TARGET_NAME
  • Rococo_to_BridgeHubWestend or Westend_to_BridgeHubRococo for finality - SOURCE_NAME / TARGET_NAME
  • at_BridgeHubRococo_relay_BridgeHubWestendMessages - C::NAME / account.tag
  • Rococo_to_BridgeHubWestend_Parachains_best_parachain_block_number_at_source - RelayChain::NAME / TargetChain::NAME + Source:PARACHAIN_ID

Questions / open points

  • Do we want to run separate substrate-relay processes/Docker containers (e.g. one per lane, or, instead of the complex relayer, separate relays for finality, parachains, messages, ...)? (resolved: see comment)
    • Do they share the same Prometheus host/port?
      • What about the relayer for BridgeHubRococo vs RococoBulletin? The same Prometheus host/port?
    • How do we identify which relayer instance metrics such as system_average_load, process_cpu_usage_percentage, process_memory_usage_bytes or substrate_relay_build_info belong to?
      • maybe add --relayer-name as an optional startup parameter and attach it to every metric as a label, e.g. relayer=XYZ?
  • Add a name or bridge label to all metrics (including process_cpu_usage_percentage or system_average_load)?
    • it could identify the relayer deployment: BridgeHubRococo <> BridgeHubWestend, BridgeHubKusama vs BridgeHubPolkadot, BridgeHubRococo vs RococoBulletin or BridgeHubPolkadot vs PolkadotBulletin
    • I don't know, maybe we don't need it now?
  • Convert CHAIN prefixes to labels, as @BulatSaif suggests
    • for SOURCE_NAME / TARGET_NAME or RelayChain::NAME / TargetChain::NAME it is OK I guess: name=<CHAIN1_to_CHAIN2>
    • but for C::NAME / account.tag it should be two labels: {name=<CHAIN>, account=<account.tag>}
      • if we use different/multiple relayer accounts, account.tag will not be sufficient; maybe we should change it to the account address, e.g. 5GxRGwT8bU1JeBPTUXc7LEjZMxNrK8MyL2NJnkWFQJTQ4sii, or a short form such as 5GxRGw...
  • ...

TODO

  • convert every prefix which contains a lane to a label, e.g. {lane="00000002"}
    • I think adding lane as a label makes sense, because in the near future we could have another lane for governance between CollectivesPolkadot vs Kusama (so we could reuse the same relayer instance for several lanes)
  • it looks like the same could apply to Source:PARACHAIN_ID, so for prefixes like _Parachains we can convert the paraId part to a label, e.g. paraId=1002 (I know this is not our case now, because we use just BridgeHubParaIds)
  • convert SOURCE_NAME / TARGET_NAME or RelayChain::NAME / TargetChain::NAME to the label name=<CHAIN1_to_CHAIN2>
  • fix docs for {}_Parachains_{} samples - they do not contain PARA_ID https://github.com/paritytech/parity-bridges-common/blob/polkadot-staging/relays/parachains/src/parachains_loop.rs#L124-L127
  • ...
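To make the conversions listed above more concrete, the following Python sketch splits two of the other legacy name shapes (finality sync and relay account metrics) into a bare metric name plus labels. The patterns and the `split_legacy_name` helper are hypothetical and inferred only from the sample metric names earlier in this thread; the label names follow the suggestions above and nothing here reflects the relayer's actual code.

```python
import re

# Hypothetical patterns inferred from the sample names in this thread.
SYNC = re.compile(
    r"^(?P<name>[A-Za-z0-9]+_to_[A-Za-z0-9]+)_Sync_(?P<metric>\w+)$"
)
ACCOUNT = re.compile(
    r"^at_(?P<chain>[A-Za-z0-9]+)_relay_(?P<account>[A-Za-z0-9]+)_(?P<metric>\w+)$"
)
LANE_SUFFIX = re.compile(r"^(?P<metric>\w+)_on_lane_(?P<lane>[0-9a-fA-F]{8})$")


def split_legacy_name(legacy: str):
    """Return (metric, labels) for a legacy finality or account metric name."""
    m = SYNC.match(legacy)
    if m:
        return m["metric"], {"name": m["name"]}

    m = ACCOUNT.match(legacy)
    if m:
        metric = m["metric"]
        labels = {"name": m["chain"], "account": m["account"]}
        # Reward metrics also carry the lane as a name suffix; move it to a label.
        # The remaining metric name may still embed a chain name.
        lane = LANE_SUFFIX.match(metric)
        if lane:
            metric, labels["lane"] = lane["metric"], lane["lane"]
        return metric, labels

    raise ValueError(f"unrecognised legacy metric name: {legacy}")


print(split_legacy_name("Rococo_to_BridgeHubWestend_Sync_best_source_block_number"))
print(split_legacy_name("at_BridgeHubRococo_relay_BridgeHubWestendMessages_balance"))
```

The sketch deliberately leaves the `_Parachains` shape out, since the thread notes that those names do not currently contain the parachain ID.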

@BulatSaif
Contributor Author

How do we identify which relayer instance metrics such as system_average_load, process_cpu_usage_percentage, process_memory_usage_bytes or substrate_relay_build_info belong to?

When we scrape, we populate metrics with additional labels. For example:

at_BridgeHubKusama_relay_BridgeHubPolkadotMessages_balance{container="bridges-common-relay", domain="parity-chains", instance="10.148.27.93:9615", job="bridges-common-relay", namespace="bridges-common-relay", pod="bridges-common-relay-7f6c4fc687-mp57l", prometheus="monitoring/prometheus-stack-kube-prom-prometheus", service="bridges-common-relay"}

Combining the namespace and pod labels is enough to locate which instance it is.
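A small Python sketch of that idea: it pulls the label set out of a scraped sample line and derives an instance key from namespace and pod. The `parse_labels` and `instance_key` helpers are illustrative assumptions, and the simple regular expression does not handle escaped quotes inside label values.

```python
import re

# Matches key="value" pairs; assumes values contain no escaped quotes.
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_labels(sample: str) -> dict:
    """Extract the {key="value", ...} label set from one sample line."""
    start, end = sample.index("{"), sample.rindex("}")
    return dict(LABEL.findall(sample[start + 1:end]))


def instance_key(labels: dict) -> tuple:
    """Identify the relayer instance by its namespace and pod labels."""
    return labels["namespace"], labels["pod"]


sample = (
    "at_BridgeHubKusama_relay_BridgeHubPolkadotMessages_balance{"
    'container="bridges-common-relay", instance="10.148.27.93:9615", '
    'namespace="bridges-common-relay", '
    'pod="bridges-common-relay-7f6c4fc687-mp57l"}'
)
print(instance_key(parse_labels(sample)))
```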
