Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some safety docs #720

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Add some safety docs #720

wants to merge 2 commits into from

Conversation

Manishearth
Copy link

I was performing a safety review of this crate and found the existing string comments a bit hard to follow. In particular, a lot of them refer to functions outside of the module for their invariants, which feels sketchy, until you realize that most of them are relying on relatively local invariants based on constants.

I added explicit safety comments for all of the unsafe in toml_edit. However I'm unable to figure out why mlb_quotes and mll_quotes are safe. There is no call to bytes there, and it seems to be using a terminated parser.

.map(|b: &[u8]| unsafe { from_utf8_unchecked(b, "`digit` and `_` filter out non-ASCII") })
.parse_next(input)
/// Safety-usable invariant upheld by only using `digit` and `_` in the parser

Check failure

Code scanning / clippy

expected one of ., ;, ?, }, or an operator, found doc comment /// Safety-usable invariant upheld by only using digitand_ in the parser Error

expected one of ., ;, ?, }, or an operator, found doc comment /// Safety-usable invariant upheld by only using digitand_ in the parser
.map(|b: &[u8]| unsafe { from_utf8_unchecked(b, "`digit` and `_` filter out non-ASCII") })
.parse_next(input)
/// Safety-usable invariant upheld by only using `digit` and `_` in the parser

Check failure

Code scanning / clippy

expected one of ., ;, ?, }, or an operator, found doc comment /// Safety-usable invariant upheld by only using digitand_ in the parser Error

expected one of ., ;, ?, }, or an operator, found doc comment /// Safety-usable invariant upheld by only using digitand_ in the parser
Comment on lines +258 to 259
// Safety: `digit` only produces ASCII
.map(|b: &[u8]| unsafe { from_utf8_unchecked(b, "`is_ascii_digit` filters out on-ASCII") })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicating the debug assertion message. Is that a specific special comment syntax you are needing?

The comment is stale from our conversion from is_ascii_digit to DIGIT

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epage Yeah so this comment is twofold, for one I wasn't sure what the debug message was referring to, and the other thing is that it is a bit of a norm to justify safety with a Safety: comment. I think the debug message mostly satisfies the need, but I got super confused as an unsafe reviewer so decided to just do it the proper way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand the standard convention but we are double checking the invariants in debug builds and I want those panics to carry over what assumption is being broken, so we have the message already and I don't want to be doubling up on the message. Or put another way, this is an extra level of protection over a simple comment.

@@ -101,6 +102,7 @@ pub(crate) fn is_unquoted_char(c: u8) -> bool {
UNQUOTED_CHAR.contains_token(c)
}

// Safety-usable invariant: UNQUOTED_CHAR is only ASCII ranges
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's useful to make safety as local as possible; split out what needs to be verified. If some function is depending on some other function's behavior, it's useful to document that on the function (or constant). For two reasons:

  • when verifying, you don't have to chase down invariants and make sense of them each time
  • when modifying code, it's clear what additional properties one must uphold.

I'm following a higher standard of unsafe code commenting, one that I do not necessarily apply during review but try to follow to make the lives of future reviewers easier when properly unsafe commenting a crate.

(The general framing for that higher standard is "unsafe code should be documented such that an unsafe reviewer can easily validate the code whilst maintaining minimal state in their head")

So this isn't 100% necessary, but I felt if I was adding comments I might as well do this anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that this is likely to go stale without a linter that can process the links of safety invariants.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yes. Ideally people maintaining unsafe code also keep this up to date, but if not, it'll get caught on future unsafe reviews when updates are pushed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, had these comments been there, my safety review would have been much less work. If these comments are there and incorrect, it would still be less work because it's quite easy to verify incorrectness when it's local.

Comment on lines 221 to 222
// Safety: ???
.map(|b| unsafe { from_utf8_unchecked(b, "`bytes` out non-ASCII") })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t. However I'm unable to figure out why mlb_quotes and mll_quotes are safe. There is no call to bytes there, and it seems to be using a terminated parser.

terminated says to ignore the second parser, only returning the result of the first parser. So in this case, we are effectively calling from_utf8_unchecked("\""). So the invariant we are upholding is that " is ASCII.

@epage
Copy link
Member

epage commented Apr 27, 2024

btw some core assumptions to the parser are

  • While we parse using bytes, the only content that can enter this parser started as &str and so we can assume the data is valid UTF-8
  • At every point, we should only be splitting text on character boundaries

So while the other safety comments are more specific, we have an extra layer of invariants helping out.

@Manishearth
Copy link
Author

  • While we parse using bytes, the only content that can enter this parser started as &str and so we can assume the data is valid UTF-8
  • At every point, we should only be splitting text on character boundaries

Yes, however these assumptions are extremely nonlocal and thus overall not very useful for unsafe review.

@Manishearth
Copy link
Author

Fixed comments about terminated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants