Some debug-span and debug-log changes to help with filtering during tracing analysis #11289

tayfunelmas · 2024-05-10T19:44:58Z

This is not the complete list of changes I plan to make, but I wanted to get a first batch reviewed.

There are two kinds of changes applied to debug logs and traces that would help filter them when analyzing traces (either through the new tracing UI or through the log files):

1. Simplify the name, e.g. instead of a full sentence (unless a sentence makes more sense), use an identifier-like name (in most cases reflecting the type of operation or function). Move the params substituted in the string to separate log/span fields.
2. Add missing fields and attempt to standardize the field names, so that when filtering traces for a certain analysis we know which fields are available across the logs/traces for the same kind of entity/operation. For example:

- shard_id for the shard id (also prefer shard_id over UID, as the UID carries an extra version and can be obtained by other means)
- sync_hash for the hash of the state-sync
- height for the block height (use last_hash and prev_hash if it is not clear from the context)
- block_hash for the hash of the block
- chunk_hash/chunk_hashes for the hash of the chunk
- part_id for the id of the state-sync part
- error for the error type or message
- sync_type for the sync type (e.g. block, head, state)
- height_included for the block height at which a chunk is included
- height_created for the block height at which a chunk is created
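To illustrate why moving parameters out of the message and into consistently named fields helps later filtering, here is a minimal std-only sketch. It does not use the `tracing` crate; `Record`, `record`, `filter_by_field` and `sample_records` are hypothetical names invented for this illustration, and the sample values are made up.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one exported span or event.
#[derive(Debug, Clone)]
struct Record {
    target: &'static str,
    name: &'static str,
    fields: BTreeMap<&'static str, String>,
}

/// Build a record from an identifier-like name plus separate fields.
fn record(target: &'static str, name: &'static str, fields: &[(&'static str, &str)]) -> Record {
    Record {
        target,
        name,
        fields: fields.iter().map(|(k, v)| (*k, v.to_string())).collect(),
    }
}

/// Keep only records that carry `key` with exactly `value`.
fn filter_by_field<'a>(records: &'a [Record], key: &str, value: &str) -> Vec<&'a Record> {
    records
        .iter()
        .filter(|r| r.fields.get(key).map(String::as_str) == Some(value))
        .collect()
}

/// Made-up sample data using the field names proposed above.
fn sample_records() -> Vec<Record> {
    vec![
        record("client", "get_state_response_part", &[("shard_id", "0"), ("part_id", "7")]),
        record("client", "produce_chunk", &[("shard_id", "1"), ("height", "100")]),
        record("network", "send_chunk", &[("shard_id", "0"), ("height", "100")]),
    ]
}

fn main() {
    let records = sample_records();
    // Because every record names the shard the same way, one filter covers all of them.
    for r in filter_by_field(&records, "shard_id", "0") {
        println!("{} {} {:?}", r.target, r.name, r.fields);
    }
}
```

If the shard id were interpolated into a free-form sentence instead, the same query would need a separate text pattern for each message.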

codecov · 2024-05-10T20:11:51Z

Codecov Report

Attention: Patch coverage is 57.33333% with 32 lines in your changes are missing coverage. Please review.

Project coverage is 71.10%. Comparing base (6e96382) to head (53febb4).
Report is 1 commits behind head on master.

Files	Patch %	Lines
chain/chain/src/chain.rs	35.71%	5 Missing and 4 partials ⚠️
.../stateless_validation/chunk_endorsement_tracker.rs	0.00%	2 Missing and 5 partials ⚠️
...alidation/partial_witness/partial_witness_actor.rs	0.00%	5 Missing and 1 partial ⚠️
...idation/partial_witness/partial_witness_tracker.rs	0.00%	0 Missing and 2 partials ⚠️
...src/stateless_validation/state_witness_producer.rs	50.00%	0 Missing and 2 partials ⚠️
chain/client/src/sync_jobs_actor.rs	60.00%	1 Missing and 1 partial ⚠️
chain/chain-primitives/src/error.rs	0.00%	0 Missing and 1 partial ⚠️
chain/chain/src/chain_update.rs	80.00%	0 Missing and 1 partial ⚠️
...nt/src/stateless_validation/chunk_validator/mod.rs	50.00%	1 Missing ⚠️
...client/src/stateless_validation/shadow_validate.rs	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #11289      +/-   ##
==========================================
+ Coverage   71.08%   71.10%   +0.01%     
==========================================
  Files         783      783              
  Lines      156875   156813      -62     
  Branches   156875   156813      -62     
==========================================
- Hits       111517   111495      -22     
+ Misses      40522    40495      -27     
+ Partials     4836     4823      -13

Flag	Coverage Δ
backward-compatibility	`0.24% <0.00%> (+<0.01%)`	⬆️
db-migration	`0.24% <0.00%> (+<0.01%)`	⬆️
genesis-check	`1.39% <0.00%> (+<0.01%)`	⬆️
integration-tests	`37.12% <57.33%> (-0.04%)`	⬇️
linux	`68.82% <53.33%> (+0.03%)`	⬆️
linux-nightly	`70.53% <57.33%> (+<0.01%)`	⬆️
macos	`52.22% <37.33%> (+0.05%)`	⬆️
pytests	`1.61% <0.00%> (+<0.01%)`	⬆️
sanity-checks	`1.40% <0.00%> (+<0.01%)`	⬆️
unittests	`65.52% <40.00%> (+0.01%)`	⬆️
upgradability	`0.29% <0.00%> (+<0.01%)`	⬆️

nagisa · 2024-05-13T06:06:10Z

chain/chain/src/chain.rs

@@ -2525,7 +2519,7 @@ impl Chain {
            "get_state_response_part",
            shard_id,
            part_id,
-            %sync_hash)
+            ?sync_hash)


Why was this change necessary? By default, Display is intended for anything that reaches the user's eyes, including logs...

robin-near · 2024-05-13T17:47:46Z

Looks like they are the same:

impl fmt::Debug for CryptoHash {
    fn fmt(&self, fmtr: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(self, fmtr)
    }
}

impl fmt::Display for CryptoHash {
    fn fmt(&self, fmtr: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.to_base58_impl(|encoded| fmtr.write_str(encoded))
    }
}

(Though for hashes in particular, I have a habit of using Debug, because I think the Ethereum libraries implement Display by displaying the hash with an ellipsis in the middle and it's really annoying. I guess Near code doesn't do that, but the habit and fear have already formed :) )

tayfunelmas

Also for consistency, because in the majority of other places hashes are logged with Debug.
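In `tracing` macros the `%` sigil records a field with its `Display` implementation and `?` records it with `Debug`, so the two spellings in the diff produce identical output whenever `Debug` simply delegates to `Display`. A minimal sketch of that delegation pattern, using a hypothetical `DemoHash` type (not nearcore's `CryptoHash`) and hex instead of base58 purely for brevity:

```rust
use std::fmt;

/// Hypothetical stand-in for a hash type; not nearcore's `CryptoHash`.
struct DemoHash([u8; 4]);

impl fmt::Display for DemoHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Hex keeps the sketch dependency-free; the real type prints base58.
        for byte in &self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}

impl fmt::Debug for DemoHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegate to Display so that `{:?}` and `{}` render identically.
        fmt::Display::fmt(self, f)
    }
}

/// Render the same value through both formatting traits.
fn render_both(hash: &DemoHash) -> (String, String) {
    (format!("{}", hash), format!("{:?}", hash))
}

fn main() {
    let (display, debug) = render_both(&DemoHash([0xde, 0xad, 0xbe, 0xef]));
    println!("display={display} debug={debug}");
}
```

With a type like this, choosing between the two sigils is a matter of consistency rather than output.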

robin-near

What's the significance of the "target" parameter in the tracing analysis? How should we choose the target going forward?

nagisa · 2024-05-13T19:45:53Z

> What's the significance of the "target" parameter in the tracing analysis?

It aids the ability to filter (out) the (un)interesting parts of the code. I personally think that a single word that's used in too many spans/events is insufficiently precise, but this is something that each of us needs to discover independently, unfortunately.

robin-near · 2024-05-14T19:03:40Z

@nagisa I understand that it aids filtering at the logging level, but this is talking about tracing where we have the ability to take in all traces and then filter/analyze them later. So I wanted to know how @tayfunelmas is using the target tag in his analysis, and if anyone changes the tags in the future, what impact it would have on the tracing analysis tooling.

nagisa · 2024-05-14T20:05:53Z

> but this is talking about tracing where we have the ability to take in all traces and then filter/analyze them later.

It isn't actually practical to send everything that's traced off to somewhere in many cases. It is already a constant >500KiB/s of ingest with the few things we trace at the debug level, and that's already enough for grafana tempo to start dropping some of the ingest traffic at our fairly conservative default ingest limits. And if we increased those limits, our monthly cost would increase as well (this holds true regardless of where we ship the traces off to, unless they are held locally for immediate inspection, such as with your tool; but even then storage ain't free…)

If we didn't filter our traces at the emitter we'd be looking at potentially dozens of megabytes of traces per second. Components like the compiler or hyper are particularly chatty. So even for traces it is important to have well thought-out targets and levels. Otherwise even just gathering a useful trace becomes a chore (as we recently found out -- enabling host function tracing ended up with traces so truncated that they were largely incomprehensible…)
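To put the ingest figures above in perspective, here is a small worked calculation. The 500 KiB/s figure is the lower bound quoted in the comment; the 24 MiB/s figure is only an assumed illustrative value for "dozens of megabytes", not a measurement.

```rust
const SECONDS_PER_DAY: f64 = 86_400.0;
const KIB_PER_GIB: f64 = 1024.0 * 1024.0;

/// Convert a sustained ingest rate in KiB/s into GiB per day.
fn gib_per_day(kib_per_second: f64) -> f64 {
    kib_per_second * SECONDS_PER_DAY / KIB_PER_GIB
}

fn main() {
    // Rate quoted in the comment (a lower bound).
    let reported = gib_per_day(500.0);
    // Assumed value standing in for "dozens of megabytes per second".
    let assumed_unfiltered = gib_per_day(24.0 * 1024.0);
    println!("{reported:.1} GiB/day at 500 KiB/s");
    println!("{assumed_unfiltered:.1} GiB/day at an assumed 24 MiB/s");
}
```

At the quoted rate this is roughly 41 GiB per day; at the assumed unfiltered rate it is about 2 TiB per day.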

tayfunelmas · 2024-05-14T21:28:01Z

> What's the significance of the "target" parameter in the tracing analysis? How should we choose the target going forward?

It is both useful and not. First, I am planning to rely primarily on the span/log names and the fields. The target will only help narrow down the logs; the data that needs to be processed may still be large, but that is better than having everything. I expect that for certain analyses (e.g. chunk/block production), the interesting pieces of the logs will come from certain targets such as "client" and "network", which is why I changed the target names for the events that I think are interesting.

nagisa · 2024-05-16T09:55:51Z

> How should we choose the target going forward?

Our style guide, by the way, has some guidance on that. It reads as such:

> Always specify the target explicitly. A good default value to use is the crate name, or the module path (e.g. chain::client) so that events and spans common to a topic can be grouped together. This grouping can later be used for customizing which events to output.
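As a rough illustration of how target-based grouping can be used to decide which events to output, here is a std-only sketch of a prefix-matching filter. It is a simplified model and not the implementation used by `tracing` or `tracing-subscriber`; `Level`, `Directive`, `matches`, `enabled` and `example_directives` are hypothetical names, and the chosen levels are made up.

```rust
/// Verbosity levels, ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical rule: emit `target` (and its `::` children) up to `max_level`.
struct Directive {
    target: &'static str,
    max_level: Level,
}

/// True if `candidate` equals `rule_target` or is nested below it.
fn matches(rule_target: &str, candidate: &str) -> bool {
    candidate == rule_target
        || candidate
            .strip_prefix(rule_target)
            .map_or(false, |rest| rest.starts_with("::"))
}

/// The most specific matching directive decides; with no match, nothing is emitted.
fn enabled(directives: &[Directive], target: &str, level: Level) -> bool {
    directives
        .iter()
        .filter(|d| matches(d.target, target))
        .max_by_key(|d| d.target.len())
        .map_or(false, |d| level <= d.max_level)
}

/// Made-up configuration: verbose for "client", quieter for "network".
fn example_directives() -> Vec<Directive> {
    vec![
        Directive { target: "client", max_level: Level::Debug },
        Directive { target: "network", max_level: Level::Info },
    ]
}

fn main() {
    let directives = example_directives();
    for (target, level) in [("client", Level::Debug), ("network", Level::Debug), ("hyper", Level::Warn)] {
        println!("{target} at {level:?}: {}", enabled(&directives, target, level));
    }
}
```

The point of the sketch is that a filter like this is only as precise as the targets themselves: a chatty component that shares a target with interesting events cannot be silenced separately.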

tayfunelmas added 10 commits May 6, 2024 13:44

Add more tracing

0f440e9

Change target=client for some non-client code.

ad2164c

Replace stateless-validation target with client target.

e24c008

Merge branch 'more-tracing' of https://github.com/tayfunelmas/nearcore …

8757492

…into more-tracing

More update

2f92a67

Remove unnecessary msg_type

27ad166

Merge branch 'master' into more-tracing

0f04ed7

Update

a53995e

Remove incorrect message

74cdddd

More message changes

e6c8e82

tayfunelmas requested a review from a team as a code owner May 10, 2024 19:44

tayfunelmas requested review from saketh-are and robin-near and removed request for saketh-are May 10, 2024 19:44

Merge branch 'master' into more-tracing

3f748ab

nagisa reviewed May 13, 2024

robin-near approved these changes May 13, 2024

tayfunelmas and others added 4 commits May 14, 2024 14:00

Update chain/chain/src/chain.rs

8aad28a

Co-authored-by: Simonas Kazlauskas <github@kazlauskas.me>

Update chain/chain/src/chain.rs

a1ab52b

Co-authored-by: Simonas Kazlauskas <github@kazlauskas.me>

Update chain/chain/src/chain.rs

9386357

Co-authored-by: Simonas Kazlauskas <github@kazlauskas.me>

Update chain/client/src/stateless_validation/chunk_endorsement_tracke…

54c3aec

…r.rs Co-authored-by: Simonas Kazlauskas <github@kazlauskas.me>

tayfunelmas added 4 commits May 14, 2024 14:29

Address comments

411c709

Merge branch 'more-tracing' of https://github.com/tayfunelmas/nearcore …

d2602ce

…into more-tracing

Merge branch 'master' into more-tracing

d771bf1

Merge branch 'master' into more-tracing

7a17d74

Merge branch 'master' into more-tracing

53febb4

tayfunelmas enabled auto-merge May 17, 2024 19:32

tayfunelmas added this pull request to the merge queue May 17, 2024

Merged via the queue into near:master with commit 7d5a6c5 May 17, 2024
27 of 29 checks passed

tayfunelmas deleted the more-tracing branch May 17, 2024 20:07
