Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid html produced from our default HTML Pipeline #941

Open
timdiggins opened this issue Mar 9, 2022 · 3 comments
Open

Invalid html produced from our default HTML Pipeline #941

timdiggins opened this issue Mar 9, 2022 · 3 comments

Comments

@timdiggins
Copy link
Collaborator

Looking at #940 , I realised that while the markdown content:

**http://www.example.com**

will generate something anodyne like:

<!DOCTYPE html>
<title>whatever</title>
<body>
<strong><a href="http://www.example.com"></strong>
</body>

which is fully valid.

But with onebox enabled , then the following markdown:

**https://github.com/thredded/thredded/commit/7276f52e17e2bc80b1fe49e8894bf35c4e2e724a**

will result in a complex structure like:

<!DOCTYPE html>
<title>whatever</title>
<body>
  </strong><strong>
    <header class="source">...clipped...</header>
    <article class="onebox-body">...clipped...</article>
    <div class="onebox-metadata"></div>
    <div style="clear: both"></div>
  </strong>
</body>

This is invalid html5 because flow elements can't be contained inside a phrasing element like <strong>(https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-strong-element). A oneboxed result will I think always generate flow elements not purely phrasing elements.

Browsers will handle this, but shouldn't we be somehow fixing this? -- maybe not oneboxing if it's a phrasing element? or dropping the phrasing? Sounds complicated either way.

Thoughts @glebm ?

@glebm
Copy link
Collaborator

glebm commented Mar 9, 2022

We have extract_block_from_paragraph! which works around a similar issue for paragraphs. (

def extract_block_from_paragraph!(element)
)

Perhaps we can add something similar for bolding/italics/underscore?

@timdiggins
Copy link
Collaborator Author

Do you think maybe we shouldn't attempt to perform onebox if it's inside a bold / italic/ underscore.

That might mean adapting the following line:

next unless url.present? && url == element.content && on_its_own_line?(element)

and replace on_its_own_line? with a new method allows_flow_content? (which can just look at what tag the parent element is).

Sound plausible?

@glebm
Copy link
Collaborator

glebm commented Mar 10, 2022

Yes, that makes sense. It should still be allowed within paragraphs though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants