Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add valuable support for Uuids #583

Open
QnnOkabayashi opened this issue Feb 5, 2022 · 9 comments
Open

Add valuable support for Uuids #583

QnnOkabayashi opened this issue Feb 5, 2022 · 9 comments

Comments

@QnnOkabayashi
Copy link
Contributor

Motivation
The Tokio team is releasing a new crate, valuable, which is intended to be used for passing in custom types as fields in logs. Since Uuids are commonly used in applications that also use tracing like Kanidm, we should implement the Valuable trait behind a feature gate.

Solution
Derive the Valuable trait.

#[cfg_attr(feature = "valuable", derive(Valuable))]
struct Uuid(Bytes);

Alternatives

  1. Continue to record Uuids as strings.
  2. Create a wrapper type around the Uuid and rely on using as_bytes in the trait implementation. This would be incredibly annoying though, because it would mean applications would have to use:
let uuid = Uuid::new_v4();
tracing::info!(my_id = MyWrapper(uuid).as_value(), "the uuid");

Hopefully tracing will add a sigil to reduce verbosity with as_value() similar to how ? and % are used for Debug and Display formatting, but avoiding a wrapper type would be ideal.

Is it blocking?
No

Anything else?
The valuable crate is still in the very early stages, and could possibly have breaking changes. However, I think the very minimal API that this crate would depend on (just deriving the trait) is unlikely to change.

@KodrAus
Copy link
Member

KodrAus commented Feb 10, 2022

I've been expecting this for a while, so we've already got a bit of a plan in place 🙂 We've got zerocopy as an example of bringing in unstable dependencies in the stable library. So we can do the same for uuid.

As for representation, I think we'd probably be most useful implementing Valuable manually with a text-based format (tokio-rs/valuable#43 (comment)), rather than deriving it and getting a list of raw bytes. The reason being users in that logging context can recognize a stringified UUID as being a UUID, but 16 bytes doesn't give any clues. What I'm not sure of is whether we can/should implement as_value differently to visit so that you get bytes from one and a string from the other:

impl valuable::Valuable for Uuid {
    // If you ask for a value you get our byte representation
    fn as_value(&self) -> valuable::Value {
        valuable::Valuable::as_value(&self.0)
    }

    // If you visit you get a hyphenated string
    fn visit(&self, v: &mut dyn valuable::Visit) {
        v.visit_value(valuable::Value::Displayable(self.hyphenated()))
    }
}

My gut feel on that though is "no that's a bad idea".

@QnnOkabayashi
Copy link
Contributor Author

I think it would be better to visit Uuids as sequences of bytes, since the string representation is already possible in tracing because any Debug type can be passed in as dyn Debug. However, there's no way to get the raw bytes. Visiting the bytes would allow crates like tracing-forest to easily read in Uuids and pass them back to the application if desired without any extra overhead of formatting and parsing.

In terms of 16 bytes not representing the semantic meaning of a Uuid: values passed to subscribers have an associated &'static str name. If a subscriber is expecting a Uuid, then the crate can specify which name to use. Otherwise, it can get passed to the dyn Debug visitor.

@KodrAus
Copy link
Member

KodrAus commented Feb 14, 2022

So for the tracing-forest use-case what you'd like to do is receive a value, determine that it's a Uuid by its associated name, and then use something like Uuid::from_slice on the result of visit_primitive_slice to capture that value until it's ready to be re-emitted? Relying on stringly-typed APIs to detect that you're looking at a UUID and converting it one way or another seems a little dubious to me, but it could be ok if that's the direction valuable recommends.

I think part of the problem here may be that valuable implies tracing as a primary use-case, which in turn implies a preference for human-readable representations of your data, since you want it to be understood on the other end. If we were just considering valuable support independently of the diagnostics usecase then using the raw 16-byte representation would be perfectly suitable. If tracing-forest would need to match on Uuid's associated name either way, then my preference would still be for using a text-based representation for UUIDs. That maximizes interoperability with tooling at the other end without forcing stringly-typed matches on all users, and the parser is not expensive. If you already need to potentially allocate maps and strings to cache values then parsing UUIDs isn't likely to have much extra impact.

@QnnOkabayashi
Copy link
Contributor Author

I agree that stringly-typed APIs are kinda sus. However, the official blog post specifically uses an example to inspect an HTTP header type with very specific fields. Furthermore, valuable allows for structs to have named fields, so we could make the Value representation have the bytes as a named field for further checking. In terms of ergonomics, valuable is intended to offer you a zero cost view into a type, and stringifying defeats this purpose: it should be up to the Subscriber both a) if it even wants to format the Uuid and b) which format to use.

Another detail to consider is that providing the bytes for a Valuable implementation forces nothing new on tracing users. It's just as easy to pass in the Uuid as a dyn Debug, which already does exactly what you're proposing and is stable.

@KodrAus
Copy link
Member

KodrAus commented Feb 14, 2022

Hmm, when would you expect an end-user of tracing to capture a Uuid using its Valuable implementation over its Debug? I was under the impression that tracing may choose to replace their entire value API with valuable in the future so if that’s the case then users won’t really have a choice to make between a human-readable Debug and machine-readable Valuable. Today I also wouldn’t expect anybody to make that choice anyway if the default compiles.

I think what’s missing from this discussion that’s going to keep us going in circles at the moment is more input on potential usecases for a valuable implementation. It’s a brand new API so we don’t have access to a lot of that, but I’m just a bit reluctant to commit to a direction without a bigger picture to know what we gain or lose by opting for one representation over another.

@QnnOkabayashi
Copy link
Contributor Author

Using tracing:

An idea I had for tracing-forest was to allow for different contexts, so that disjoint concurrent tasks can map to entirely different contexts with their own log trees, without having to enumerate each contexts at compile time. This could use Uuids within the Subscriber to map a log to its context, without ever emitting the ids of each context to the logs. If there were many logs, this would require many unseen and unnecessary formats.

Outside of tracing

To address your "going in circles" point: I think valuable fills a very similar role to serde, and will show up in similar places. Like valuable, serde also has an abstract introspection layer. This layer relies on a Serializer to visit and translate it into a tangible format. On the other hand, valuable is just the first half of this, an abstract introspection layer. Thus, the entire responsibility of the Serializer is expected of the user of valuable::Values. Because of this, I claim that valuable is not meant to serve as the end-of-the-line processing unit; it's an intermediate tool that is followed by more processing, just like how Serializers do more work on the data once it's recorded. This applies to tracing as well.

Uuid's Serialize implementation even makes the distinction that it is up to the Serializer how the Uuid should be serialized. The current Serialize implementation looks like this:

if serializer.is_human_readable() {
    serializer
        .serialize_str(&self.to_hyphenated().encode_lower(&mut [0; 36]))
} else {
    serializer.serialize_bytes(self.as_bytes())
}

Making the Valuable trait return a string instead of the raw bytes would be analogous to implementing Serialize like this (I didn't verify this compiles, but the idea is here):

// analogous to what's exposed by the `Valuable` impl
let value = self.to_hyphenated().encode_lower(&mut [0; 36]);

// analogous to what the downstream `valuable` user has to write
if serializer.is_human_readable() {
    serializer
        .serialize_str(value)
} else {
    serializer.serialize_bytes(Uuid::parse_str(value).unwrap().as_bytes())
}

This example seems dumb, because all these decisions are already made known to us (we can call Serializer::is_human_readable). However, valuable basically gives us this introspection before any decisions are made, meaning that we should be as unassuming as possible, and therefore do the least amount of work possible.

Also, since this would be an unstable feature, wouldn't it still be okay to change the implementation down the line before it stabilizes if we decide we want to? If this is the case that I think we should try it this way and we can get feedback on it first.

@KodrAus
Copy link
Member

KodrAus commented Feb 22, 2022

Yeh, I can see the value of the role valuable fills, and I think it's a great one. The issue we've got is that its introspection tools are great for compound values like maps, lists, and enums, but it doesn't give us much for primitive values like UUIDs, IP addresses, and such to describe their content in a way that can be recognized by a person and inspected programmatically. The Value::Path variant seems to suggest a path where these types end up called out specially. I wouldn't want to suggest there should be a Value::Uuid variant though. As much as I'd like to treat valuable as an independent tool for what it is, I don't think we can really take a direction that penalizes the primary diagnostics use-case that tracing suggests.

I was thinking we could just copy the approach used for IP addresses in UUID, but I see we haven't got them yet in valuable. I think this is something we'll need some input from the valuable maintainers on, because we might be one of the first libraries looking closely at how to implement it. I'll go and open an issue over in valuable and cc you.

@KodrAus
Copy link
Member

KodrAus commented Feb 22, 2022

I've opened tokio-rs/valuable#87 to get some input from the valuable maintainers.

@KodrAus
Copy link
Member

KodrAus commented May 4, 2022

Coming back around to this, it looks like we don't have a clear best path forward, so as much as I'd like to keep the default implementation of valuable useful for UUIDs in diagnostics I think the most direct representation of the data (that is, as bytes) is a reasonable starting point.

We can follow the same approach as zerocopy and add an unstable valuable feature to uuid with an implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants