Unbound prefixes not handled #1341

Open
SimonSchmid opened this issue Mar 17, 2020 · 3 comments · May be fixed by #1682

Comments

@SimonSchmid

Hello,
I want to report an issue I am having with jsoup. I have not found a similar issue, so I am creating a new one.

I created a toy example that illustrates the issue:

<!doctype html>
<html lang="de">
    <head>
    </head>
    <body>
        <test:h1>UnboundPrefix</test:h1>
        <svg width="180" height="180" xlink:href="UnboundPrefix">
            <rect x="20" y="20" rx="20" ry="20" width="100" height="100" style="fill:lightgray; stroke:#1c87c9; stroke-width:4;"/>
        </svg>
    </body>
</html>

This page contains two unbound prefixes: one in a tag name (test:h1) and one in an attribute name (xlink:href). jsoup does not handle these as described in https://html.spec.whatwg.org/#creating-and-inserting-nodes and https://html.spec.whatwg.org/#coercing-an-html-dom-into-an-infoset. According to those sections, the first case (tag) should be handled as follows: <test:h1> becomes <testU00003Ah1>. The second case is handled by adding the xlink namespace to the html tag.
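To make the first case concrete, here is a minimal sketch of the name coercion described in the "coercing an HTML DOM into an infoset" section: every character the XML API would not accept in a name is replaced by the letter U followed by the six uppercase hex digits of its code point. The class NameCoercion and its methods are hypothetical names, not part of jsoup, and the isAllowed check is a simplified stand-in for the real XML name rules.

```java
// Hypothetical helper, not jsoup API: sketches the "UXXXXXX" coercion of names.
final class NameCoercion {
    private NameCoercion() {}

    // Assumption: a simplified approximation of XML NCName rules. Letters and '_'
    // may appear anywhere; digits, '-' and '.' only after the first character.
    static boolean isAllowed(int codePoint, boolean first) {
        if (Character.isLetter(codePoint) || codePoint == '_') {
            return true;
        }
        return !first && (Character.isDigit(codePoint) || codePoint == '-' || codePoint == '.');
    }

    // Replaces each disallowed character with 'U' plus its code point as
    // six uppercase hexadecimal digits, e.g. ':' (U+003A) becomes "U00003A".
    static String coerce(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (isAllowed(cp, i == 0)) {
                out.appendCodePoint(cp);
            } else {
                out.append('U').append(String.format("%06X", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Under these assumptions, NameCoercion.coerce("test:h1") returns testU00003Ah1, and NameCoercion.coerce("xlink:href") returns xlinkU00003Ahref.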

Without the unbound prefixes being fixed, I have issues using XPath. It would be nice if jsoup handles such cases.
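As an illustration of why unbound prefixes are a problem for XML tooling in general, the sketch below feeds small fragments to the JDK's namespace-aware XML parser. It is not necessarily the exact failure I hit with XPath, and the class UnboundPrefixDemo is a hypothetical name used only for this example. A prefixed name without a matching xmlns declaration is rejected, while a coerced name or a declared prefix is accepted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks whether a fragment survives namespace-aware parsing.
final class UnboundPrefixDemo {
    private UnboundPrefixDemo() {}

    static boolean parsesNamespaceAware(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // An unbound prefix is reported as a fatal parse error.
            return false;
        }
    }
}
```

For example, parsesNamespaceAware("<body><test:h1>x</test:h1></body>") returns false, while the same fragment with the element renamed to testU00003Ah1 parses without error.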

Regards,
Simon

@SimonSchmid

Is this something that will be addressed anytime soon?

@lexamxu

lexamxu commented Mar 5, 2021

Hi, we are a student group and we would like to fix this bug. We can't guarantee that we will be able to fix it, but we would like to try.

duanyang25 added a commit to duanyang25/jsoup that referenced this issue Dec 4, 2021
@duanyang25

duanyang25 commented Dec 6, 2021

Hi @SimonSchmid. I am an undergraduate student. One of my Software Engineering courses this semester requires us to fix issues on GitHub.

I understand the first case, but I am confused by the second case, "one within an attribute". What is the expected output for the second case? Could you explain a bit more what you mean by "The second case is handled by adding the xlink namespace to the html tag."? Thank you very much.

As I understand it, the second case is xlink:href="UnboundPrefix". So you want to access the value UnboundPrefix with the name xlink:href, right?

I am currently working on converting the : character to its U00003A form so that jsoup can return the coerced name for the first case. I may need more information about the second case, though.

I now understand what you want for the second case from the link you provided (https://html.spec.whatwg.org/#coercing-an-html-dom-into-an-infoset). You may want to look up the attribute by the key "xlinkU00003Ahref" rather than "xlink:href". Please take a look at PR #1682.

duanyang25 added a commit to duanyang25/jsoup that referenced this issue Dec 7, 2021