Nokogiri 1.5.0 on libxml2 2.7.8 reading HTML line numbers as "0" #613

Closed
kwhitaker opened this issue Feb 7, 2012 · 7 comments

Comments

@kwhitaker

I'm attempting to parse HTML files on CentOS 5, running Ruby 1.9.2, Nokogiri 1.5.0, and libxml2 2.7.8.

Parsing a file with code like this:

html = Nokogiri::HTML(File.read('index.html'))
html.css("a").each {|href| puts href.line}

results in "0" for every line number. If I instead parse it as xml:

html = Nokogiri::XML(File.read('index.html'))

the line numbers will be displayed correctly. I know there was a previous issue with libxml2 2.7.3, and I also know that CentOS comes with libxml2 2.6.2. However, I've followed the tutorial for installation on the site, and built Nokogiri against libxml2 2.7.8. Here's my nokogiri -v output:

# Nokogiri (1.5.0)
    --- 
    warnings: []

    nokogiri: 1.5.0
    ruby: 
      version: 1.9.2
      platform: x86_64-linux
      description: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
      engine: ruby
    libxml: 
      binding: extension
      compiled: 2.7.8
      loaded: 2.7.8

I do still technically have libxml 2.6.2 installed on the system via yum, but it doesn't look like it has affected the nokogiri build. Is there some other step I should be taking?
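
The "compiled" and "loaded" values in the output above can also be compared mechanically, which may help rule out a stray system libxml2. The following is only a sketch: the helper name `libxml_versions_match?` and the trimmed sample report are hypothetical, and it assumes the `nokogiri -v` output is valid YAML in the shape quoted above.

```ruby
require "yaml"

# Trimmed sample in the shape of the `nokogiri -v` output quoted above.
# The leading "# Nokogiri (...)" banner is simply a YAML comment.
VERSION_REPORT = <<REPORT
# Nokogiri (1.5.0)
---
warnings: []
nokogiri: 1.5.0
libxml:
  binding: extension
  compiled: 2.7.8
  loaded: 2.7.8
REPORT

# Hypothetical helper: true when the libxml2 version Nokogiri was compiled
# against matches the version that was loaded at runtime.
def libxml_versions_match?(report_text)
  libxml = YAML.load(report_text).fetch("libxml")
  libxml.fetch("compiled").to_s == libxml.fetch("loaded").to_s
end

puts libxml_versions_match?(VERSION_REPORT)
```

If the two versions differed, that would suggest the extension is picking up a different libxml2 at runtime than the one it was built against.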

As an aside, if I must end up using Nokogiri::XML to parse the html, will it work with HTML4 and HTML5 documents, as well as XHTML?

Thanks.

@flavorjones
Member

Hello!

Thanks for reporting this. I'm unable to reproduce it with:


# Nokogiri (1.5.0)
    ---
    warnings: []
    nokogiri: 1.5.0
    ruby:
      version: 1.9.2
      platform: x86_64-linux
      description: ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      compiled: 2.7.8
      loaded: 2.7.8

So perhaps the problem is specific either to your HTML file (can you provide it?) or to your patchlevel of Ruby 1.9.2 (you have p0, I have p290; can you upgrade it?).

@kwhitaker
Author

Thanks for the response! Unfortunately, upgrading our version of Ruby isn't really an option right now: all of our code has been built against p0, and we probably won't be upgrading for a while.

Here is the html file:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta name="viewport" content="width=device-width; height=device-height, initial-scale=1.0; maximum-scale=1.0; user-scalable=0;" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Dockers</title>
</head>
<body class="fullpage-vert" onunload="javascript:clearInterval(audioLoop);">
<div id="container">
    <div id="danceHolder">
        <img id="danceVid" src="1-1.jpg" width="320" height="480" alt="" />
    </div>
    <div id="introHolder">
        <img id="introVid" src="0-1.jpg" width="320" height="480" alt="" />
        <div id="ctabg"></div>
        <div id="cta1"></div>
        <div id="cta2"></div>
        <div id="cta3"></div>
        <div id="phone"></div>
        <div id="logo"></div>
    </div>
</div>
</body>
</html>

@flavorjones
Member

Well, I didn't mean "upgrade your production servers", I meant "can you try this on your dev machine with a different version of ruby". I'm trying to isolate what the cause could be, and as I mentioned before, we differ on the patchlevel of ruby we're running.

The HTML you included above doesn't appear to match well with the ruby script you included in the original post, since there are no "a" elements in it. That said, if I change the script to search for "div", I see line numbers appropriately, so we're left with either:

a) it has something to do with the version of Ruby you're on, or
b) it is something else that we don't know about yet

Please let me know if you're able to reproduce with a newer version of 1.9.2!

@flavorjones
Member

Closing, pending more information from the original reporter.

@jeremy
Contributor

jeremy commented Oct 26, 2018

#1658

@flavorjones
Member

@jeremy Can you expand a bit on why you're linking that PR to this issue? They're both discussing line numbers, but aren't directly related in a causal or solution-y way.

@jeremy
Contributor

jeremy commented Oct 27, 2018

My bad. They naively appeared related in both cause and solution.
