Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe audit #560

Closed
wants to merge 1 commit into from
Closed

Unsafe audit #560

wants to merge 1 commit into from

Conversation

JoshMcguigan
Copy link
Contributor

Hello,

I just did a quick unsafe audit, directed here from rust-secure-code/safety-dance#51.

I was able to remove the unsafe block from decode_utf8_lossy. Also, I removed the duplicate definition of decode_utf8_lossy by allowing src/form_urlencoded.rs to import it from percent_encoding/lib.rs rather than src/query_encoding.rs.

All other unsafe blocks look appropriate to me, as they ensure we don't waste CPU cycles checking the validity of bytes that we already have confirmed in some way as safe to convert to UTF-8, but I added comments to justify why they are safe.

@SimonSapin
Copy link
Member

Thanks for doing this audit!

It looks like the new decode_utf8_lossy spends more CPU cycles: if the first UTF-8 error is near the end of a large string, almost all of the string will be scanned twice. Once in String::from_utf8 and again from the start in String::from_utf8_lossy.

Also, percent-encoding is a crate that can be used separately, and that has a public API governed by SemVer. decode_utf8_lossy is short enough that I’d prefer duplicating it rather than making it part of the stable API.

idna/src/punycode.rs Outdated Show resolved Hide resolved
@JoshMcguigan JoshMcguigan force-pushed the unsafe-audit branch 3 times, most recently from f332e0c to 592f776 Compare November 3, 2019 12:41
@JoshMcguigan
Copy link
Contributor Author

@SimonSapin Thanks for the review.

It looks like the new decode_utf8_lossy spends more CPU cycles: if the first UTF-8 error is near the end of a large string, almost all of the string will be scanned twice. Once in String::from_utf8 and again from the start in String::from_utf8_lossy.

Good catch, I was thinking only about allocations. I've rolled this back, but added some small comments/tweaks that I think improve readability. Let me know what you think.

Also, percent-encoding is a crate that can be used separately, and that has a public API governed by SemVer. decode_utf8_lossy is short enough that I’d prefer duplicating it rather than making it part of the stable API.

This makes sense. For now, I've rolled back the de-duplication and added a note so if those functions are modified in the future they can be changed in both places. It seems the ideal case would be for this functionality to live in the standard library. I'm thinking the function signature of String::from_utf8_lossy could look like:

impl String {
    fn from_utf8_lossy(bytes: T) -> Cow<str>
        where T: LossyConvertibleBytes;
}

Where LossyConvertibleBytes would be implemented for &[u8], [u8], and Cow<[u8]>. Do you see any issues with this? I know this discussion would ultimately need to happen on the Rust repo, but I just wanted to sanity check it here first.

@SimonSapin
Copy link
Member

Regarding the standard library’s from_utf8_lossy: rather than a new trait, maybe Into<Cow<'a [u8]>> would work? Either way, feel free to pursue this separately but this repo won’t be able to rely on it any time soon :)

@JoshMcguigan
Copy link
Contributor Author

Is there any further work you'd like to see here before merging?

@Shnatsel
Copy link

There seem to be no outstanding concerns on this PR. Is anything still blocking the merge?

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #607) made this pull request unmergeable. Please resolve the merge conflicts.

src/form_urlencoded.rs Outdated Show resolved Hide resolved
@JoshMcguigan JoshMcguigan force-pushed the unsafe-audit branch 3 times, most recently from be2660f to fa1d0ff Compare June 20, 2020 05:18
@JoshMcguigan
Copy link
Contributor Author

I've rebased and updated this based on the latest changes. Let me know if this needs anything else.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably 3b587ea) made this pull request unmergeable. Please resolve the merge conflicts.

@djc
Copy link
Contributor

djc commented Aug 24, 2020

I've applied some minor formatting tweaks to your commit and pushed it as 3b587ea. Thanks for all the work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants