Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass "new" tests #537

Merged
merged 15 commits into from Jan 7, 2020
Merged

Pass "new" tests #537

merged 15 commits into from Jan 7, 2020

Conversation

o0Ignition0o
Copy link
Contributor

@o0Ignition0o o0Ignition0o commented Aug 2, 2019

current status :

test result: ok. 63 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Please note I tried my best to follow the spec, but it sometimes (often?) felt like I was just hacking my way until tests passed.

I know the PR is huge, I would love to split it (and then squash it) if it makes sense.

Mentioning @nox , @SimonSapin and @valenting because they're the contributors I've seen the most in the commit logs :)

Please let me know if there's anything I can do to improve the PR, and thank you for your time!


This change is Reviewable

@o0Ignition0o o0Ignition0o force-pushed the 40_tests_left branch 3 times, most recently from f3fbe89 to 3e4557c Compare August 3, 2019 21:31
@o0Ignition0o o0Ignition0o changed the title WIP: Pass "new" tests Pass "new" tests Aug 3, 2019
@o0Ignition0o o0Ignition0o marked this pull request as ready for review August 3, 2019 22:28
@o0Ignition0o
Copy link
Contributor Author

The goal would be to merge this branch into the tests one, before merging both I suppose.
I can of course also make a pull request directly to master if need be

@o0Ignition0o
Copy link
Contributor Author

Rebased into master, and fixed the merge issues.
I think a lot of us are on PTO right now, please let me know when you're back :)

@nox
Copy link
Contributor

nox commented Aug 19, 2019

The goal would be to merge this branch into the tests one, before merging both I suppose.
I can of course also make a pull request directly to master if need be

I don't understand this part.

@o0Ignition0o
Copy link
Contributor Author

o0Ignition0o commented Aug 19, 2019

I don't understand this part.

My bad, I finally changed the target branch to master. the sentence is not relevant anymore

@nox
Copy link
Contributor

nox commented Aug 19, 2019

Could you maybe squash some of the commits together to make the tryouts commits and whatnot disappear, to help with review?

@o0Ignition0o
Copy link
Contributor Author

o0Ignition0o commented Aug 19, 2019

Sure, would you rather have one huge commit, or several smaller ones ?
I know the PR is pretty big, it's probably a PITA to review, I'm willing to do anything I can to ease the review process, I just don't know how to make it easier to read.

@nox
Copy link
Contributor

nox commented Aug 19, 2019

Small ones are ok as long as they are self-contained, for example the commits for setter fixes are ok, but the ones with commented-out assertions aren't.

@o0Ignition0o o0Ignition0o force-pushed the 40_tests_left branch 4 times, most recently from 03df7ad to 6aa53cc Compare August 19, 2019 13:17
@o0Ignition0o
Copy link
Contributor Author

I've narrowed it down to 5-6 commits, not sure I can do more though :/

@nox
Copy link
Contributor

nox commented Aug 19, 2019

That's great already! Thanks a lot for that.

src/host.rs Outdated
fn from(host: Host<S>) -> HostInternal {
match host {
Host::Domain(ref s) if s.to_string().is_empty() => HostInternal::None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is this doing? This allocates a new string every time. Do we really need the generics here? Couldn't it be From<Host<String>> or From<Host<&'_ str>>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually don't need the generics here.
The main goal of this change is to make sure HostInternal::None is returned when an empty domain is given.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can it be empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a list of the failing tests that occur if I don't add this check:

failures:
    "file://hi/x".host = "" 
    "file://hi/x".hostname = "" 
    "file://y/".host = "loc%41lhost" 
    "file://y/".hostname = "loc%41lhost" 

Maybe the fix should occur somewhere else, ie after the host parsing ?

........................................................................................................................................................................thread '"file://hi/x".hostname = "" ' panicked at 'called `Result::unwrap()` on an `Err` value: "!( !host_str.is_empty() ) for URL \"file:///x\""', src/libcore/result.rs.:999:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
..thread '"file://hi/x".host = "" ' panicked at 'called `Result::unwrap()` on an `Err` value: "!( !host_str.is_empty() ) for URL \"file:///x\""', src/libcore/result.rs:999:5
.......F.F...thread '"file://y/".host = "loc%41lhost" ' panicked at 'called `Result::unwrap()` on an `Err` value: "!( !host_str.is_empty() ) for URL \"file:///\""thread '', "file://y/".hostname = "loc%41lhost" src/libcore/result.rs' panicked at ':called `Result::unwrap()` on an `Err` value: "!( !host_str.is_empty() ) for URL \"file:///\""999', :src/libcore/result.rs5:
999:5
..F.F............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the <From<Host<S>>>::from call done in those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I m not really sure this is the question you're asking me, but the conversion I'm trying to work on is host.into()

src/parser.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@nox nox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a complete review yet.

src/parser.rs Outdated Show resolved Hide resolved
src/parser.rs Outdated Show resolved Hide resolved
src/parser.rs Outdated Show resolved Hide resolved
src/parser.rs Outdated Show resolved Hide resolved
src/quirks.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated
let mut has_host = true; // FIXME
parser.parse_path_start(scheme_type, &mut has_host, parser::Input::new(path));
if scheme_type.is_file() {
parser::trim_path(&mut parser.serialization, path_start);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the spec for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to dig quite a bit to figure out how to pass the path tests.
All I could find was this discussion.

Here's a list of test that fail when the path isn't trimmed:

running 713 tests
..........................................................................................................................................................thread '"file:///unicorn".pathname = "//\\/" File URLs and (back)slashes' panicked at 'assertion failed: `(left == right)`
  left: `"file://////"`,
 right: `"file:///"`.', tests/data.rs:192:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
thread '"file:///unicorn".pathname = "//monkey/..//" File URLs and (back)slashes' panicked at 'assertion failed: `(left == right)`
  left: `"file://///"`,
 right: `"file:///"`.', tests/data.rs:192:5
..........FF.......thread '"file://monkey/".pathname = "\\\\" File URLs and (back)slashes' panicked at 'assertion failed: `(left == right)`
  left: `"file://monkey//"`,
 right: `"file://monkey/"`', .tests/data.rs:192:5
.F.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
failures:

failures:
    "file:///unicorn".pathname = "//\\/" File URLs and (back)slashes
    "file:///unicorn".pathname = "//monkey/..//" File URLs and (back)slashes
    "file://monkey/".pathname = "\\\\" File URLs and (back)slashes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a spec issue, please link to a spec commit.

If I understand correctly, you trim the path only after it was built and not trimmed? Couldn't it just not be pushing empty segments on repeated slashes in the first place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just found https://bugzilla.mozilla.org/show_bug.cgi?id=1351603, but I still cannot find any spec commit :/
I ll have a look at parse_path_start and try to not push the empty segments

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was the spec bug for the tests: whatwg/url#278

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If url’s scheme is "file" and c is the EOF code point, U+003F (?), or U+0023 (#), 

then while url’s path’s size is greater than 1 and url’s path[0] is the empty string, 

validation error, remove the first item from url’s path.

is probably the culprit ! thanks for the hint @valenting :D

There's probably a way to avoid pushing the empty segments, which I ll work on now that I see the spec :)

}

pub fn trim_tab_and_newlines(original_input: &'i str) -> Self {
let input = original_input.trim_matches(ascii_tab_or_new_line);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmh, when trimming those, are they no longer syntax violations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my bad, they are !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand anymore what this code is supposed to be doing anymore. Why are we not trimming c0_control_or_space anymore? This method looks oddly like with_log now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As defined in the basic url parser spec, removing C0 control or space from input only occurs when If url is not given.

Removing tabs and newlines occurs either way though.

This means we need to use different parser::Input s.

I think It would be more explicit if with_log / trim_tab_and_newlines mentionned that, ie
parser::Input::with_base_url / parser::Input::no_base_url or something, or having a factory that would decide on the best parser to build depending on having a base_url.

Setters would use the former, and the default parser would use the latter.
What do you think ?

src/parser.rs Outdated Show resolved Hide resolved
src/parser.rs Outdated Show resolved Hide resolved
src/parser.rs Outdated
return;
}
// If url’s scheme is "file", path’s size is 1, and path[0] is a normalized Windows drive letter, then return.
let segments: Vec<&str> = self.serialization[path_start..]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems bad to me that we allocate a vector all the time here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, I'll try to figure something else out :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it ! I'm pushing it as a fixup, will do a rebase once we're both safisfied with most of the fixes

Copy link
Collaborator

@valenting valenting left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Let's wait for #566 to be closed and we can merge this.

Reviewed 2 of 3 files at r9.
Reviewable status: all files reviewed, 1 unresolved discussion

@valenting valenting dismissed nox’s stale review January 7, 2020 11:01

Nox doesn't have time for the review. I've done it instead.

@o0Ignition0o
Copy link
Contributor Author

@valenting Just rebased ! 🤞

@valenting valenting merged commit 9cd6467 into servo:master Jan 7, 2020
@o0Ignition0o o0Ignition0o deleted the 40_tests_left branch January 7, 2020 14:01
valenting added a commit to valenting/rust-url that referenced this pull request Jan 8, 2020
Issue servo#537 fixed a large number of small bugs that affected spec
compliance. We should publish a new crate version with these
changes.
valenting added a commit to valenting/rust-url that referenced this pull request Jan 8, 2020
PR servo#537 fixed a large number of issues which affected compliance
with the URL spec. We should release a new crate version with
these changes.
@valenting valenting mentioned this pull request Jan 8, 2020
bors-servo pushed a commit that referenced this pull request Jan 8, 2020
Update version to 2.1.1

PR #537 fixed a large number of issues which affected compliance
with the URL spec. We should release a new crate version with
these changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants