IndexOutOfBoundsException with nested anchors #1608

Closed
TheLQ opened this issue Aug 8, 2021 · 3 comments
Labels
duplicate This is a duplicate issue or root-cause of another issue


TheLQ commented Aug 8, 2021

The following HTML is invalid, but it comes from a third-party site I do not control.

<a>
    <b>
        <div>
            <a>test</a>
        </div>
    </b>
</a>

Fails with

java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
        at java.base/java.util.ArrayList.rangeCheckForAdd(ArrayList.java:756)
        at java.base/java.util.ArrayList.add(ArrayList.java:481)
        at org.jsoup.parser.HtmlTreeBuilder.pushWithBookmark(HtmlTreeBuilder.java:640)
        at org.jsoup.parser.HtmlTreeBuilderState$7.inBodyEndTagAdoption(HtmlTreeBuilderState.java:879)
        at org.jsoup.parser.HtmlTreeBuilderState$7.inBodyEndTag(HtmlTreeBuilderState.java:731)
        at org.jsoup.parser.HtmlTreeBuilderState$7.process(HtmlTreeBuilderState.java:288)
        at org.jsoup.parser.HtmlTreeBuilder.process(HtmlTreeBuilder.java:149)
        at org.jsoup.parser.TreeBuilder.processEndTag(TreeBuilder.java:108)
        at org.jsoup.parser.HtmlTreeBuilderState$7.inBodyStartTag(HtmlTreeBuilderState.java:307)
        at org.jsoup.parser.HtmlTreeBuilderState$7.process(HtmlTreeBuilderState.java:286)
        at org.jsoup.parser.HtmlTreeBuilder.process(HtmlTreeBuilder.java:149)
        at org.jsoup.parser.TreeBuilder.runParser(TreeBuilder.java:76)
        at org.jsoup.parser.TreeBuilder.parse(TreeBuilder.java:51)
        at org.jsoup.parser.Parser.parseInput(Parser.java:49)
        at org.jsoup.helper.DataUtil.parseInputStream(DataUtil.java:191)
        at org.jsoup.helper.DataUtil.load(DataUtil.java:72)
        at org.jsoup.Jsoup.parse(Jsoup.java:135)

If you remove the <b> or the <div>, it works fine. In the original page the first anchor and bold weren't closed, so they spilled over into the message content div, which contained a link.

Tested against 1.14.1
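
For readers following the trace: its top frames show `ArrayList.add(index, element)` being called with index 2 on a list of size 1. The sketch below reproduces only that generic failure mode with the standard library. It is not jsoup's code; the class name `StaleIndexDemo`, the helper `insertAtBookmark`, and the idea that the index went stale because the list shrank after the position was remembered are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleIndexDemo {
    // Hypothetical helper (not part of jsoup): inserts an element at a
    // previously remembered position and reports what happened.
    static String insertAtBookmark(List<String> stack, int bookmark, String element) {
        try {
            // ArrayList.add(int, E) rejects any index greater than size().
            stack.add(bookmark, element);
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        List<String> stack = new ArrayList<>(Arrays.asList("a", "b", "div"));
        int bookmark = 2;      // position remembered while the list had 3 entries

        stack.remove("div");
        stack.remove("b");     // the list shrinks to size 1, so the bookmark is stale

        // Inserting at the stale position fails the range check.
        System.out.println(insertAtBookmark(stack, bookmark, "a"));
    }
}
```

An insert at an index equal to the current size still succeeds (it appends), so only an index strictly greater than the size triggers the exception.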

@jhy jhy closed this as completed in 04735f9 Aug 11, 2021
@jhy jhy added the duplicate This is a duplicate issue or root-cause of another issue label Aug 11, 2021
Owner

jhy commented Aug 11, 2021

Thanks! This is the same issue we found in #1576 (caught by the fuzzer). It is fixed in mainline and will be released in 1.14.2.

Author

TheLQ commented Aug 12, 2021

Thanks for being proactive with a fuzzer. It's rare that projects do that.

Not sure how I missed that ticket, I really did search first... sorry about that.

Owner

jhy commented Aug 12, 2021

Thanks, and no worries at all. Plus, we got a cleaner testcase out of it.
