tracing: allow `&[u8]` to be recorded as event/span field #2954

mladedav · 2024-04-25T22:45:16Z

Motivation

Users may want to pass data to Subscribes which is not valid UTF-8. Currently, it would have to be encoded into &str to be passed as a field value.

Solution

This branch adds a record_byte_slice method to Visit. It has an implementation falling back to record_debug so it is not be a breaking change.

JsonVisitor got an overridden implementation as it should not use Debug output and encode it as a string, but rather store the bytes as an array.

PrettyVisitor go an overridden implementation to make the output more pretty.

hawkw

Thanks for the PR! Overall, this looks good, but I had a few small suggestions.

hawkw · 2024-04-25T23:10:30Z

tracing-core/src/field.rs

@@ -224,6 +224,11 @@ pub trait Visit {
        self.record_debug(field, &value)
    }

+    /// Visit a byte slice.
+    fn record_byte_slice(&mut self, field: &Field, value: &[u8]) {


nit, take it or leave it: i might just call this record_bytes, personally? but it's fine either way.

Sure, I went with your suggestion.

hawkw · 2024-04-25T23:11:53Z

tracing-core/src/field.rs

@@ -224,6 +224,11 @@ pub trait Visit {
        self.record_debug(field, &value)
    }

+    /// Visit a byte slice.
+    fn record_byte_slice(&mut self, field: &Field, value: &[u8]) {
+        self.record_debug(field, &value)


it might be nice to have the default record_debug call format the bytes in hexadecimal rather than in decimal, since code that records a byte slice is probably trying to record binary data rather than a list of integers?

I changed it to writing out the hex representation in packed way, e.g. a7e93f. You didn't mean just changing decimal to hexadecimal as in [a7, e9, 3f], right?

There is also a question of whether to include quotes (I guess not, record_debug will most likely add them) or leading 0x (I don't think it's used with longer sequences of bytes but not sure).

tracing-subscriber/src/fmt/format/pretty.rs

hawkw

Some more feedback on how we format byte slices by default --- let me know what you think?

hawkw · 2024-04-26T17:15:53Z

tracing-core/src/field.rs

+        for byte in self.0 {
+            f.write_fmt(format_args!("{byte:02x}"))?;
+        }
+
+        Ok(())


I'm a little bit on the fence as to whether we want the default format for a byte slice to be just the raw bytes without delimiters, or whether we ought to include delimiters by default.

Consider a byte array like [0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64]. This format will currently output the string:

68656c6c6f20776f726c64

This has the advantage of being compact, but it's kind of hard to pick out the individual bytes visually. On the other hand, if we changed the format to use debug_list(), like this:

Suggested change

for byte in self.0 {

f.write_fmt(format_args!("{byte:02x}"))?;

}

Ok(())

// Unfortunately, we must use a struct with a custom `Debug`

// impl here, because we can't return `format_args!` from

// the `map` closure, as it creates a reference to the scope

// that the macro is invoked in.

struct HexByte(u8);

impl fmt::Debug for HexByte {

fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {

write!(f, "{:02x}", self.0)

}

}

f.debug_list()

.entries(self.0.iter().copied().map(HexByte))

.finish()

we could print out something much closer to the original Rust array syntax:

[68, 65, 6c, 6c, 6f, 20, 77, 6f, 72, 6c, 64]

This, IMO, makes it much easier to see the value of an individual byte position in the sequence. On the other hand, it has the disadvantage of being substantially longer --- there are two additional characters per byte (the comma and the space), doubling the length of the output. That could be a problem for long slices.

And, if the Subscriber implementation formats record_debug fields with {:#}, the slice will be printed out vertically, like this:

[ 68, 65, 6c, 6c, 6f, 20, 77, 6f, 72, 6c, 64, ]

which I think is kind of awful to see in logs.

As a compromise between not delimiting the bytes at all and using two characters per byte, we could also just output a space between each byte, without a comma or the array brackets, like this:

Suggested change

for byte in self.0 {

f.write_fmt(format_args!("{byte:02x}"))?;

}

Ok(())

let mut bytes = self.0.iter();

if let Some(byte) = bytes.next() {

write!(f, "{byte:02x}")?;

for byte in bytes {

write!(f, " {byte:02x}")?;

}

}

Ok(())

That would produce output like this:

68 65 6c 6c 6f 20 77 6f 72 6c 64

IMO, this is probably the best default, as it makes it easier to visually inspect the byte at the _n_th index, but it only adds one character per byte, so the output isn't quite as long, and it doesn't behave horribly if the subscriber chooses to use :#? to format the debug value. On the other hand, it doesn't look as close to the Rust array literal syntax, and in general, we try to prefer Debug output that looks as close to Rust syntax as possible. But, in this case, I don't think that's as important as concerns like length. So, this is the option I would probably pick, but, I'm open to being convinced otherwise --- what do you think?

Here's a playground with all the different options, for reference: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e8c8623d74c80ef8bcc5d57c2370915f

The :#? issue can be solved in both cases by constructing the whole thing manually so then it would be just a question of whether tot also add the braces and commas or not.

I actually think the version with braces is better because of slices of lengths 0 and 1. The entry seems to be completely missing for the empty slice and could be easily mistaken for a number when the length is 1.

// b"" -- alt is the same hexbytes1 = hexbytes2 = [] hexbytes3 = // b" " hexbytes1 = 20 hexbytes2 = [20] hexbytes3 = 20 -- fmt alt mode -- hexbytes1 (alt) = 20 hexbytes2 (alt) = [ 20, ] hexbytes3 (alt) = 20

That however just argues for the inclusion of braces but we could still not use commas to save the space.

There's also still a risk to misrepresent the data because someone could not realize what I just used in the example isn't 20 but 0x20 but I agree adding the 0x everywhere would be too wasteful. And since we're already not precise about things like this, I don't think the commas are really needed.

So I would go with this (this is what I pushed here now):

[68 65 6c 6c 6f 20 77 6f 72 6c 64]

For completeness, there's also the option to use commas instead of spaces to have it shorter but closer to a correct syntax (which it still isn't because of the 0x prefix) but I feel that also sacrifices a bit of readability as it's more cramped.

[68,65,6c,6c,6f,20,77,6f,72,6c,64]

hawkw · 2024-04-26T17:19:10Z

tracing-subscriber/src/fmt/format/json.rs

-    fn record_byte_slice(&mut self, field: &Field, value: &[u8]) {
-        self.values
-            .insert(field.name(), serde_json::Value::from(value));
-    }


Since the JSON format is intended to be machine-readable, I think we should continue to record byte slices as a JSON array of numeric values, rather than as strings. That way, a tool that deserializes the JSON output will still see a slice of bytes and can decide how to format those bytes on its own, or perform more processing of the byte values.

On the other hand, I can imagine that users who aren't doing much processing of JSON logs might want to be able to just get a hex string in their JSON. Perhaps we should make this a toggleable option for the JSON formatter? I'm fine with adding that in a separate PR, if you prefer, but for now, I think a JSON array ought to be the default.

I've put back the implementation using arrays for now. I'll make separate PR later for the option.

hawkw

looks good to me, thank you!

`record_byte_slice` renamed to `record_bytes`, default implementation provides a type that prints hex representation of the bytes in its debug impl, removed both overriding implementations (json and pretty) because hex is good enough for both

…json has its own implementation

hawkw · 2024-05-03T17:07:39Z

okay, i've enabled auto merge so this should merge once CI passes — let's merge master to pick up the Clippy fixes?

mladedav requested review from hawkw, carllerche, davidbarsky and a team as code owners April 25, 2024 22:45

hawkw requested changes Apr 25, 2024

View reviewed changes

hawkw reviewed Apr 26, 2024

View reviewed changes

hawkw approved these changes Apr 27, 2024

View reviewed changes

mladedav added 3 commits May 3, 2024 16:18

tracing: allow &[u8] to be recorded as event/span field

0c47131

review fixes

7d935e8

`record_byte_slice` renamed to `record_bytes`, default implementation provides a type that prints hex representation of the bytes in its debug impl, removed both overriding implementations (json and pretty) because hex is good enough for both

review second round - debug impl output changed to [de ad be ef] and …

6ac7c0a

…json has its own implementation

mladedav force-pushed the dm/visit-slice branch from dd26124 to 6ac7c0a Compare May 3, 2024 14:19

Merge branch 'master' into dm/visit-slice

9d60548

hawkw enabled auto-merge (squash) May 3, 2024 17:06

hawkw merged commit 690a9a6 into tokio-rs:master May 3, 2024
55 checks passed

mladedav deleted the dm/visit-slice branch May 3, 2024 20:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tracing: allow `&[u8]` to be recorded as event/span field #2954

tracing: allow `&[u8]` to be recorded as event/span field #2954

mladedav commented Apr 25, 2024

hawkw left a comment

hawkw Apr 25, 2024

mladedav Apr 26, 2024

hawkw Apr 25, 2024

mladedav Apr 26, 2024 •

edited

hawkw left a comment

hawkw Apr 26, 2024

mladedav Apr 26, 2024 •

edited

hawkw Apr 26, 2024

mladedav Apr 26, 2024

hawkw left a comment

hawkw commented May 3, 2024

-        for byte in self.0 {
-            f.write_fmt(format_args!("{byte:02x}"))?;
-        }
-        Ok(())
+        // Unfortunately, we must use a struct with a custom `Debug`
+        // impl here, because we can't return `format_args!` from
+        // the `map` closure, as it creates a reference to the scope
+        // that the macro is invoked in.
+        struct HexByte(u8);
+        impl fmt::Debug for HexByte {
+            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+                write!(f, "{:02x}", self.0)
+            }
+        }
+        f.debug_list()
+            .entries(self.0.iter().copied().map(HexByte))
+            .finish()

-        for byte in self.0 {
-            f.write_fmt(format_args!("{byte:02x}"))?;
-        }
-        Ok(())
+        let mut bytes = self.0.iter();
+        if let Some(byte) = bytes.next() {
+            write!(f, "{byte:02x}")?;
+            for byte in bytes {
+                write!(f, " {byte:02x}")?;
+            }
+        }
+        Ok(())

tracing: allow &[u8] to be recorded as event/span field #2954

tracing: allow &[u8] to be recorded as event/span field #2954

Conversation

mladedav commented Apr 25, 2024

Motivation

Solution

hawkw left a comment

Choose a reason for hiding this comment

hawkw Apr 25, 2024

Choose a reason for hiding this comment

mladedav Apr 26, 2024

Choose a reason for hiding this comment

hawkw Apr 25, 2024

Choose a reason for hiding this comment

mladedav Apr 26, 2024 • edited

Choose a reason for hiding this comment

hawkw left a comment

Choose a reason for hiding this comment

hawkw Apr 26, 2024

Choose a reason for hiding this comment

mladedav Apr 26, 2024 • edited

Choose a reason for hiding this comment

hawkw Apr 26, 2024

Choose a reason for hiding this comment

mladedav Apr 26, 2024

Choose a reason for hiding this comment

hawkw left a comment

Choose a reason for hiding this comment

hawkw commented May 3, 2024

tracing: allow `&[u8]` to be recorded as event/span field #2954

tracing: allow `&[u8]` to be recorded as event/span field #2954

mladedav Apr 26, 2024 •

edited

mladedav Apr 26, 2024 •

edited