Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: allow &[u8] to be recorded as event/span field #2954

Merged
merged 4 commits into from May 3, 2024

Conversation

mladedav
Copy link
Contributor

Motivation

Users may want to pass data to Subscribes which is not valid UTF-8. Currently, it would have to be encoded into &str to be passed as a field value.

Solution

This branch adds a record_byte_slice method to Visit. It has an implementation falling back to record_debug so it is not be a breaking change.

JsonVisitor got an overridden implementation as it should not use Debug output and encode it as a string, but rather store the bytes as an array.

PrettyVisitor go an overridden implementation to make the output more pretty.

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Overall, this looks good, but I had a few small suggestions.

@@ -224,6 +224,11 @@ pub trait Visit {
self.record_debug(field, &value)
}

/// Visit a byte slice.
fn record_byte_slice(&mut self, field: &Field, value: &[u8]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, take it or leave it: i might just call this record_bytes, personally? but it's fine either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I went with your suggestion.

@@ -224,6 +224,11 @@ pub trait Visit {
self.record_debug(field, &value)
}

/// Visit a byte slice.
fn record_byte_slice(&mut self, field: &Field, value: &[u8]) {
self.record_debug(field, &value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be nice to have the default record_debug call format the bytes in hexadecimal rather than in decimal, since code that records a byte slice is probably trying to record binary data rather than a list of integers?

Copy link
Contributor Author

@mladedav mladedav Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to writing out the hex representation in packed way, e.g. a7e93f. You didn't mean just changing decimal to hexadecimal as in [a7, e9, 3f], right?

There is also a question of whether to include quotes (I guess not, record_debug will most likely add them) or leading 0x (I don't think it's used with longer sequences of bytes but not sure).

tracing-subscriber/src/fmt/format/pretty.rs Outdated Show resolved Hide resolved
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more feedback on how we format byte slices by default --- let me know what you think?

Comment on lines 295 to 299
for byte in self.0 {
f.write_fmt(format_args!("{byte:02x}"))?;
}

Ok(())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little bit on the fence as to whether we want the default format for a byte slice to be just the raw bytes without delimiters, or whether we ought to include delimiters by default.

Consider a byte array like [0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64]. This format will currently output the string:

68656c6c6f20776f726c64

This has the advantage of being compact, but it's kind of hard to pick out the individual bytes visually. On the other hand, if we changed the format to use debug_list(), like this:

Suggested change
for byte in self.0 {
f.write_fmt(format_args!("{byte:02x}"))?;
}
Ok(())
// Unfortunately, we must use a struct with a custom `Debug`
// impl here, because we can't return `format_args!` from
// the `map` closure, as it creates a reference to the scope
// that the macro is invoked in.
struct HexByte(u8);
impl fmt::Debug for HexByte {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{:02x}", self.0)
}
}
f.debug_list()
.entries(self.0.iter().copied().map(HexByte))
.finish()

we could print out something much closer to the original Rust array syntax:

[68, 65, 6c, 6c, 6f, 20, 77, 6f, 72, 6c, 64]

This, IMO, makes it much easier to see the value of an individual byte position in the sequence. On the other hand, it has the disadvantage of being substantially longer --- there are two additional characters per byte (the comma and the space), doubling the length of the output. That could be a problem for long slices.

And, if the Subscriber implementation formats record_debug fields with {:#}, the slice will be printed out vertically, like this:

[
    68,
    65,
    6c,
    6c,
    6f,
    20,
    77,
    6f,
    72,
    6c,
    64,
]

which I think is kind of awful to see in logs.

As a compromise between not delimiting the bytes at all and using two characters per byte, we could also just output a space between each byte, without a comma or the array brackets, like this:

Suggested change
for byte in self.0 {
f.write_fmt(format_args!("{byte:02x}"))?;
}
Ok(())
let mut bytes = self.0.iter();
if let Some(byte) = bytes.next() {
write!(f, "{byte:02x}")?;
for byte in bytes {
write!(f, " {byte:02x}")?;
}
}
Ok(())

That would produce output like this:

68 65 6c 6c 6f 20 77 6f 72 6c 64

IMO, this is probably the best default, as it makes it easier to visually inspect the byte at the _n_th index, but it only adds one character per byte, so the output isn't quite as long, and it doesn't behave horribly if the subscriber chooses to use :#? to format the debug value. On the other hand, it doesn't look as close to the Rust array literal syntax, and in general, we try to prefer Debug output that looks as close to Rust syntax as possible. But, in this case, I don't think that's as important as concerns like length. So, this is the option I would probably pick, but, I'm open to being convinced otherwise --- what do you think?

Here's a playground with all the different options, for reference: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e8c8623d74c80ef8bcc5d57c2370915f

Copy link
Contributor Author

@mladedav mladedav Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The :#? issue can be solved in both cases by constructing the whole thing manually so then it would be just a question of whether tot also add the braces and commas or not.

I actually think the version with braces is better because of slices of lengths 0 and 1. The entry seems to be completely missing for the empty slice and could be easily mistaken for a number when the length is 1.

// b"" -- alt is the same
hexbytes1 = 
hexbytes2 = []
hexbytes3 = 
// b" "
hexbytes1 = 20
hexbytes2 = [20]
hexbytes3 = 20

-- fmt alt mode --

hexbytes1 (alt) = 20
hexbytes2 (alt) = [
    20,
]
hexbytes3 (alt) = 20

That however just argues for the inclusion of braces but we could still not use commas to save the space.

There's also still a risk to misrepresent the data because someone could not realize what I just used in the example isn't 20 but 0x20 but I agree adding the 0x everywhere would be too wasteful. And since we're already not precise about things like this, I don't think the commas are really needed.

So I would go with this (this is what I pushed here now):

[68 65 6c 6c 6f 20 77 6f 72 6c 64]

For completeness, there's also the option to use commas instead of spaces to have it shorter but closer to a correct syntax (which it still isn't because of the 0x prefix) but I feel that also sacrifices a bit of readability as it's more cramped.

[68,65,6c,6c,6f,20,77,6f,72,6c,64]

Comment on lines 491 to 494
fn record_byte_slice(&mut self, field: &Field, value: &[u8]) {
self.values
.insert(field.name(), serde_json::Value::from(value));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the JSON format is intended to be machine-readable, I think we should continue to record byte slices as a JSON array of numeric values, rather than as strings. That way, a tool that deserializes the JSON output will still see a slice of bytes and can decide how to format those bytes on its own, or perform more processing of the byte values.

On the other hand, I can imagine that users who aren't doing much processing of JSON logs might want to be able to just get a hex string in their JSON. Perhaps we should make this a toggleable option for the JSON formatter? I'm fine with adding that in a separate PR, if you prefer, but for now, I think a JSON array ought to be the default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put back the implementation using arrays for now. I'll make separate PR later for the option.

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, thank you!

`record_byte_slice` renamed to `record_bytes`, default implementation provides a type that prints hex representation of the bytes in its debug impl, removed both overriding implementations (json and pretty) because hex is good enough for both
@hawkw hawkw enabled auto-merge (squash) May 3, 2024 17:06
@hawkw
Copy link
Member

hawkw commented May 3, 2024

okay, i've enabled auto merge so this should merge once CI passes — let's merge master to pick up the Clippy fixes?

@hawkw hawkw merged commit 690a9a6 into tokio-rs:master May 3, 2024
55 checks passed
@mladedav mladedav deleted the dm/visit-slice branch May 3, 2024 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants