Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Html approver mangles html with boolean attributes #44

Open
markijbema opened this issue Mar 10, 2014 · 9 comments
Open

Html approver mangles html with boolean attributes #44

markijbema opened this issue Mar 10, 2014 · 9 comments

Comments

@markijbema
Copy link
Contributor

the following (valid) html:

<!DOCTYPE html>
<html>
<title>Hoi</title>
<script async defer src=\"http://foo.com/bar.js\"></script>
<h1>yo</h1>

becomes this:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><script>window.FactlinkProxiedUri = "http://www.example.org/foo?bar=baz";</script><script></script>defer
               src="http://localhost:8000/lib/dist/factlink_loader.js?o=proxy"
               onload="__internalFactlinkState('proxyLoaded')"
         &gt;</html>
@kytrinyx
Copy link
Contributor

See also #7

@hotgazpacho
Copy link
Contributor

So, the html posted above is not a valid HTML document. It's an incomplete fragment (no <body> tag set, no enclosing </html>). How should this be treated? Should it become a valid document, or be left as a fragment?

@hotgazpacho
Copy link
Contributor

To clarify, if we tell nokogiri to treat it as a fragment (after telling it to parse as HTML instead of XML), we get the following:

<title>Hoi</title><script async defer src="http://foo.com/bar.js"></script><h1>yo</h1>

However, if we tell nokogiri to treat it as a document, we get this:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hoi</title>
<script async defer src="http://foo.com/bar.js"></script>
</head>
<body><h1>yo</h1></body>
</html>

Which is correct?

@markijbema
Copy link
Contributor Author

I would say the latter. If we care about the exact bytes/chars, we can still use text. If we care about content/structure of html, the latter will result in more useful diffs (which is very useful in pullrequests)

@markijbema
Copy link
Contributor Author

I simplified the example I think. I'm not sure whether actual validity was an issue, and removing the html opening tag would make it valid html5

@kytrinyx
Copy link
Contributor

kytrinyx commented Apr 4, 2014

It's important to NOT change the output.

  1. A lot of the testing is for legacy things that do not have valid structure, and also sometimes for fragments that get loaded elsewhere as part of a larger page.
  2. If the document is invalid, I don't want my approval to show me valid HTML. The golden master file should contain what the response returned -- the only change should be whitespace.

If I have an HTML fragment, I want to have standardised formatting (so :text is not useful in that case), which is why I introduced nokogiri in the first place (I tried using Tidy, but couldn't figure out how to get it running). This avoids spurious whitespace failures.

@kytrinyx
Copy link
Contributor

kytrinyx commented Apr 4, 2014

(Sorry it took so long to respond. I'm at a conference, and I find it very hard to keep track of issues/pull requests during conferences).

@markijbema
Copy link
Contributor Author

The first half of your comment contradicts the point about the spacing though. Spaces can be very significant in html, and changing it can break your app. Those changes would go undetected, but others wouldn't. I think we agree the only use of a formatter is to change the output. The question is the amount of change it should apply. To me, allowing changing spacing, but not other stuff seems a bit arbitrary.

I don't think it's wise to use the same formatter for html documents, and html fragments. I never assumed this formatter was aso used for html fragments. I'm not sure about the semantics of the XML parsers, but html fragments don't need to be valid xml documents (for instance, they don't need a root).

Regarding invalid documents there are still multiple strategies. Do you think this is an invalid strategie for valid documents? For invalid documents we could (for instance)

  1. raise an exception, and fail the test
  2. output the unformatted html verbatim.

@kytrinyx
Copy link
Contributor

kytrinyx commented Apr 5, 2014

Spaces can be very significant in html, and changing it can break your app.

Ah, see this is where I fall down because I don't know front-end stuff! OK, cool. Thanks so much for clarifying.

I think that either of these strategies could work. @hotgazpacho suggested just outputting a warning. That might be the best thing to do since it's really up to the user what the real thing they're trying to verify is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants