Node.absUrl does not work with telephone numbers. #1610

Closed
FlameDuck opened this issue Aug 13, 2021 · 1 comment

@FlameDuck

Calling absUrl on a Node whose attribute holds a telephone link (a tel: URL) returns an empty string.

As far as I can tell this is a violation of the contract, which states:

"An absolute URL if one could be made, or an empty string (not null) if the attribute was missing or could not be made successfully into a URL."

Here is a small JUnit test that reproduces the problem: it fails when it should pass.

import static org.junit.jupiter.api.Assertions.assertFalse;

import org.jsoup.Jsoup;
import org.junit.jupiter.api.Test;

class AbsUrlTest {
    // Every link, including the tel: one, should resolve to a non-blank absolute URL.
    @Test
    void test() {
        Jsoup.parse("<!DOCTYPE html>\n" +
                        "<html>\n" +
                        "<head>\n" +
                        "    <title>Not Empty</title>\n" +
                        "</head>\n" +
                        "<body>\n" +
                        "<a href=\"/example.html\">Example Hyperlink</a>\n" +
                        "<a href=\"mailto:example@example.com\">Example Mailto Link</a>\n" +
                        "<a href=\"tel:202-456-1414 \">Example Telephone Link</a>\n" +
                        "</body>\n" +
                        "</html>\n", "http://localhost/")
                .select("a[href]")
                .stream().map(node -> node.absUrl("href"))
                .forEach(absUrl -> assertFalse(absUrl.isBlank()));
    }
}
jhy closed this as completed in 50ff710 on Aug 14, 2021
jhy added this to the 1.14.2 milestone on Aug 14, 2021
jhy (Owner) commented on Aug 14, 2021

Thanks, fixed. absUrl was failing when constructing the value as a java.net.URL, because Java throws a MalformedURLException if no URL stream handler is defined for the scheme.
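
To illustrate the failure mode described above, here is a minimal sketch that checks whether java.net.URL accepts a given string. The class name UrlSchemeDemo and the helper canConstruct are made up for this example and are not part of jsoup. On a stock JDK there is a built-in handler for http but none for tel, so the second call is expected to hit the exception path.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlSchemeDemo {
    /** Hypothetical helper: true if java.net.URL can be constructed from spec. */
    static boolean canConstruct(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // Thrown for "unknown protocol" when no stream handler exists for the scheme.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canConstruct("http://localhost/example.html"));
        System.out.println(canConstruct("tel:202-456-1414"));
    }
}
```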

So the change is to catch that exception and, if the URL starts with a syntactically valid URI scheme, pass it through as-is.
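
The approach described above could look roughly like the following. This is only a sketch of the idea, not the code from commit 50ff710: the class AbsUrlSketch, the method resolve, and the scheme regex (modeled on the RFC 3986 scheme grammar) are assumptions made for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Pattern;

public class AbsUrlSketch {
    // Assumed check: a letter, then letters, digits, '+', '.', or '-', then ':'.
    private static final Pattern VALID_SCHEME =
            Pattern.compile("^[a-zA-Z][a-zA-Z0-9+.-]*:");

    /** Hypothetical resolver: absolute URL if one can be made, else "". */
    static String resolve(String baseUrl, String relUrl) {
        try {
            URL base = new URL(baseUrl);
            return new URL(base, relUrl).toExternalForm();
        } catch (MalformedURLException e) {
            // No stream handler (e.g. tel:) or no usable base URL.
            // Pass the value through unchanged if it carries its own scheme.
            return VALID_SCHEME.matcher(relUrl).find() ? relUrl : "";
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://localhost/", "/example.html"));
        System.out.println(resolve("http://localhost/", "tel:202-456-1414"));
    }
}
```

With this shape, a relative path still resolves against the base, a tel: link comes back unchanged, and a value with neither a usable base nor a scheme still yields an empty string, which matches the documented contract.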
