Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InvalidUri errors could probably include the invalid bytes #571

Open
hawkw opened this issue Nov 3, 2022 · 8 comments
Open

InvalidUri errors could probably include the invalid bytes #571

hawkw opened this issue Nov 3, 2022 · 8 comments

Comments

@hawkw
Copy link
Contributor

hawkw commented Nov 3, 2022

The InvalidUri error type is returned when trying to parse an input as a URI, if the bytes are not a valid URI. This error type does not contain the value that we were trying to parse, but does indicate how the input was invalid.

Since all methods of parsing a value as a Uri either take a Bytes or convert the input (string or byte slice) into a Bytes, the error type probably could include the input, and format it as part of their fmt::Display/fmt::Debug implementations. Uri::from_shared takes a Bytes by value, so returning it in the error value shouldn't have a significant performance cost --- if we're trying to parse a URI from an &str or &[u8], we will have already copied it in order to convert it into a Bytes, and if the input was itself a Bytes, the reference count has already been incremented in order to call Uri::from_shared.

This would make it much easier for an application to display the invalid input to the user, which is desirable, especially in cases where the invalid URI came from user input.

@LucioFranco
Copy link
Member

Would it make sense in the display to limit the amount of viewable bytes? That said, I think this makes sense to me.

@hawkw
Copy link
Contributor Author

hawkw commented Nov 4, 2022

Would it make sense in the display to limit the amount of viewable bytes? That said, I think this makes sense to me.

Yeah, we probably want to have some kind of "common sense limit" on how much of the URI we print...

@seanmonstar
Copy link
Member

On the one hand, better errors are a great user-centric improvement, I love em! If there's a way to improve it for users, even just some of the time, that'd be wonderful.

On the other hand, here's some concerns we'd need to overcome:

  • Some of the parsing methods take Bytes, which is cheap to clone, or even just own, similar to the TryFrom<String>. But those taking a slice, there'd be a cost to copying into a new vec to store in the error.
  • If it were a Bytes, would storing a clone of it in an error message make the original buffer possibly live way longer than one assumed (considering the Bytes could be a small slice of a bigger one that took a big read from the socket)?
    • Would it be a case for a Bytes method that forced a copy if the diff were big enough?
  • The parse data usually comes from an outsider, would printing that data as part of the error message have any security considerations? I imagine the answer is a tentative yes, as in "people could screw up and print it in a way that an attacker can do bad things", so it wouldn't be our fault, but it could be making it easier for some to be pwned.
  • Likewise from a privacy point of view: would the data possibly have PII that people logging these errors weren't expecting?

@hawkw
Copy link
Contributor Author

hawkw commented Nov 4, 2022

  • Some of the parsing methods take Bytes, which is cheap to clone, or even just own, similar to the TryFrom<String>. But those taking a slice, there'd be a cost to copying into a new vec to store in the error.

Hmm, don't the parsing methods that take a slice convert the slice into a Bytes before trying to parse it, anyway? E.g.

http/src/uri/mod.rs

Lines 713 to 715 in 34a9d6b

fn try_from(t: &'a [u8]) -> Result<Self, Self::Error> {
Uri::from_shared(Bytes::copy_from_slice(t))
}

If I understand the code correctly, it looks like all paths that construct a Uri from a &[u8] or an &str will already copy the bytes into a Bytes before trying to parse, and then the Bytes is dropped if the URI is invalid? Or am I misunderstanding something here?

  • The parse data usually comes from an outsider, would printing that data as part of the error message have any security considerations? I imagine the answer is a tentative yes, as in "people could screw up and print it in a way that an attacker can do bad things", so it wouldn't be our fault, but it could be making it easier for some to be pwned.

  • Likewise from a privacy point of view: would the data possibly have PII that people logging these errors weren't expecting?

This is a valid concern --- perhaps we would want to add a method to the InvalidUri error to access the input bytes, rather than always including it in the Debug implementation? That way, the application can choose whether or not to actually log the URI value. The downside is that this would probably mean that, for hyper users, hyper would need to actually expose the http::uri::InvalidUri as a source and/or downcast target for its error types, or else there would be no way for the application to explicitly choose to display the invalid input...

@hawkw
Copy link
Contributor Author

hawkw commented Nov 4, 2022

To elaborate on the motivation here a bit, the particular use-case I'm thinking of with this feature request is one where the errors are being returned by a hyper server. In that case, there isn't currently a good way for the application to choose whether or not it wants to display the invalid input, because the buffer that the bytes were read into is owned by hyper, and it's hyper's HTTP parser that's calling into http::Uri's parsing functions, rather than the application itself. In that case, the application can't easily access the portion of the buffer that contained the invalid URI, without doing something really unpleasant like wrapping the underlying IO resource to buffer all the bytes, and essentially reimplementing most of the HTTP parser just to find potentially invalid URIs.

For applications where the primary interaction with URI parsing is passing a user input that might be a URI into hyper, it would be much easier for the application to just hang onto the input and choose to log it if hyper returns an invalid URI error. So, I'm mainly concerned about the server use case here.

@seanmonstar
Copy link
Member

don't the parsing methods that take a slice convert the slice into a Bytes before trying to parse it, anyway?

Uh, well, woops. That could be improved...

To elaborate on the motivation here a bit, the particular use-case I'm thinking of with this feature request is one where the errors are being returned by a hyper server.

Making better error messages is already splendid motivation, I didn't mean to seem against the idea. I think we should always help people understand what went wrong. Just wanted to make sure we didn't miss anything while doing so.

@hawkw
Copy link
Contributor Author

hawkw commented Nov 4, 2022

don't the parsing methods that take a slice convert the slice into a Bytes before trying to parse it, anyway?

Uh, well, woops. That could be improved...

Hmm, if we're planning to change those functions to only copy the bytes into a Bytes if the URI is valid, I suppose we could make the input component of the error optional, and only include it when the URI was constructed from a Bytes or an already-owned String etc? But I think that's probably only a good idea if we're going to include the input in the Display output, rather than allowing explicit access to it, since it seems a bit weird for the error to sometimes contain the invalid input and sometimes not contain it, depending on what type the URI was parsed from...

To elaborate on the motivation here a bit, the particular use-case I'm thinking of with this feature request is one where the errors are being returned by a hyper server.

Making better error messages is already splendid motivation, I didn't mean to seem against the idea. I think we should always help people understand what went wrong. Just wanted to make sure we didn't miss anything while doing so.

We're in agreement on that --- you didn't come off as against it at all! I also want to consider all the possible reasons we shouldn't do this. I mainly thought it was worth explaining the motivation in order to see whether there might be other ways to surface invalid inputs in that don't have the same drawbacks as always including it in the error (e.g. some kind of new API in Hyper?)...

@hawkw
Copy link
Contributor Author

hawkw commented Apr 20, 2023

Quick ping on this, I'd be happy to open a PR to add it if we can come to consensus on the constraints!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants