Are there any ways to find NodeId? #376

Closed
yigitkonur opened this issue Apr 15, 2021 · 3 comments

Comments

yigitkonur commented Apr 15, 2021

While parsing HTML, we also need to find a NodeId, like the scraper.rs library does in Rust:

image

Is there any way to find the relevant NodeId info using goquery?

@yigitkonur yigitkonur changed the title Are there any ways to find NodeId in goquery? Are there any ways to find NodeId? Apr 15, 2021

yigitkonur (Author) commented

Actually, we need to find the position of the parsed divs in the HTML. A line position, a character position, or even a NodeId would do, as long as it gives us a hint about where the parsed values are located.

mna (Member) commented Apr 15, 2021

Hello Yiğit,

AFAIK the NodeId is not a "thing", i.e. not a DOM property or anything; it looks like it's just an incrementing ID that the Rust library assigns to each node in the tree. It doesn't exist as-is in goquery. But if the actual value is not important, only that it is unique, then storing a reference to the internal node (the *html.Node corresponding to a goquery.Selection) gives you a "unique identity" for that node. You can also access all the other relevant tree items from that node, as per the golang.org/x/net/html package's API: https://pkg.go.dev/golang.org/x/net/html#Node.

That being said, you also mention finding the position of elements in the HTML. There's something I implemented a while ago in a branch that could give you what you want (or a good starting point). Take a look at this issue and comment: #198 (comment); the code is here: https://github.com/PuerkitoBio/goquery/blob/wip-selector/utilities.go#L21-L75. Note that I don't remember how extensively it was tested, so use with care :)

Hope this helps!
Martin

mna (Member) commented Apr 21, 2021

Closing, as it seems that addressed your issue. Feel free to re-open if there's more to discuss.

@mna mna closed this as completed Apr 21, 2021