The ruby rtc element is incorrectly processed. #1294

Closed
rhdunn opened this issue Jan 7, 2020 · 2 comments

rhdunn commented Jan 7, 2020

Given the markup in the example from https://www.w3.org/TR/2001/REC-ruby-20010531/#complex:

<ruby>
  <rbc>
    <rb>10</rb>
    <rb>31</rb>
    <rb>2002</rb>
  </rbc>
  <rtc>
    <rt>Month</rt>
    <rt>Day</rt>
    <rt>Year</rt>
  </rtc>
  <rtc>
    <rt rbspan="3">Expiration Date</rt>
  </rtc>
</ruby>

the jsoup parser treats the rtc element as an unknown element that gets closed immediately. This causes it to serialize in XML mode as:

<rtc></rtc><rt>Month</rt><rt>Day</rt><rt>Year</rt>

I have checked the behaviour of Firefox and Chrome, and they preserve the rtc element structure, e.g.:

<rtc><rt>Month</rt><rt>Day</rt><rt>Year</rt></rtc>

The rtc element is supported in the W3C HTML spec [1], but not the WHATWG spec. Also, even though the rbc element is not listed in either of those (only in the Ruby Annotations specification), the jsoup parser preserves the rbc element structure.

[1] https://www.w3.org/TR/2014/REC-html5-20141028/text-level-semantics.html#the-rtc-element

Tricker-z added a commit to Tricker-z/jsoup that referenced this issue May 3, 2020
WinstonHuTiger added a commit to InanisV/jsoup that referenced this issue May 8, 2020
final merge for the progress report
wudiiv11 added a commit to InanisV/jsoup that referenced this issue May 8, 2020
@jhy jhy closed this as completed in 220a3b2 Feb 20, 2023
@jhy jhy added this to the 1.16.1 milestone Feb 20, 2023
@jhy jhy added the fixed label Feb 20, 2023

jhy commented Feb 20, 2023

Thanks -- fixed. I brought the implementation up to the current spec as defined by WHATWG (scroll to 'A start tag whose tag name is one of: "rb", "rtc"' and the following "rp" / "rt" section).
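
As a rough illustration of those two spec rules (a toy sketch, not jsoup's actual implementation), the block below models a stack of open elements in plain Java. The class `RubySketch`, its token format, and the helper names are all hypothetical. The scope check is simplified to "any open ruby element", the end-tag handling skips the spec's special-element check, and attributes are not handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

/** Toy tree builder covering only the ruby-related start-tag rules. */
final class RubySketch {
    // Elements popped by the spec's "generate implied end tags" step.
    private static final Set<String> IMPLIED_END =
        Set.of("dd", "dt", "li", "optgroup", "option", "p", "rb", "rp", "rt", "rtc");

    static final class Node {
        final String name;
        final List<Object> children = new ArrayList<>(); // Node or String (text)
        Node(String name) { this.name = name; }
    }

    /** Tokens: "<name>" is a start tag, "</name>" an end tag, anything else is text. */
    static String build(String... tokens) {
        Node root = new Node("#root");
        Deque<Node> open = new ArrayDeque<>(); // head of the deque is the current node
        open.push(root);
        for (String t : tokens) {
            if (t.startsWith("</")) {
                closeElement(open, t.substring(2, t.length() - 1));
            } else if (t.startsWith("<")) {
                String name = t.substring(1, t.length() - 1);
                if (name.equals("rb") || name.equals("rtc")) {
                    // "rb" / "rtc": generate implied end tags with no exception.
                    if (hasOpen(open, "ruby")) generateImpliedEndTags(open, null);
                } else if (name.equals("rp") || name.equals("rt")) {
                    // "rp" / "rt": same, except an open rtc element is left open.
                    if (hasOpen(open, "ruby")) generateImpliedEndTags(open, "rtc");
                }
                Node n = new Node(name);
                open.peek().children.add(n);
                open.push(n);
            } else {
                open.peek().children.add(t);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Object c : root.children) serialize(c, sb);
        return sb.toString();
    }

    // Simplification: the real "in scope" check also stops at scope markers.
    private static boolean hasOpen(Deque<Node> open, String name) {
        for (Node n : open) if (n.name.equals(name)) return true;
        return false;
    }

    private static void generateImpliedEndTags(Deque<Node> open, String except) {
        while (IMPLIED_END.contains(open.peek().name) && !open.peek().name.equals(except)) {
            open.pop();
        }
    }

    private static void closeElement(Deque<Node> open, String name) {
        if (!hasOpen(open, name)) return; // stray end tag: ignored in this sketch
        while (!open.pop().name.equals(name)) {
            // keep popping until the matching element has been removed
        }
    }

    private static void serialize(Object o, StringBuilder sb) {
        if (o instanceof Node) {
            Node n = (Node) o;
            sb.append('<').append(n.name).append('>');
            for (Object c : n.children) serialize(c, sb);
            sb.append("</").append(n.name).append('>');
        } else {
            sb.append(o);
        }
    }

    public static void main(String[] args) {
        // End tags for rt and rtc are omitted on purpose to exercise the implied-end-tag logic.
        System.out.println(build("<ruby>", "<rtc>", "<rt>", "Month", "<rt>", "Day",
                                 "<rtc>", "<rt>", "Expiration Date", "</ruby>"));
    }
}
```

With these rules, each rt stays nested inside its enclosing rtc, and a new rtc closes the previous one, so the sketch's `main` yields `<ruby><rtc><rt>Month</rt><rt>Day</rt></rtc><rtc><rt>Expiration Date</rt></rtc></ruby>`.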

Note that in jsoup (as in browsers), tags that aren't explicitly defined are still supported; they get default treatment. The bug described above was not because "rtc" was unknown. It was because there was explicit handling for it, and the spec has changed since that handling was implemented.

I'd appreciate it if interested users can test and review, and raise any issues found.

jhy commented Feb 20, 2023

Also, I checked that Tag defines the tags ruby, rp, rt, which are the only elements defined in the element spec - https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-ruby-element

Those are all marked as inline (phrasing). Other tags (rtc, rc) are not explicitly defined and so get treated as block. If those are used commonly in the wild, we could add those to Tag as inline to help formatting.
