This repository has been archived by the owner on Aug 26, 2023. It is now read-only.

JRuby support #24

Closed
voondo opened this issue May 21, 2015 · 17 comments

Comments

@voondo

voondo commented May 21, 2015

It would be great if we could support JRuby

voondo changed the title from "JRuby" to "JRuby support" on May 21, 2015
@musaffa

musaffa commented May 31, 2015

+1 for JRuby support.

@headius

headius commented Jan 27, 2017

+1 for JRuby support two years later :-D

@headius

headius commented Nov 14, 2017

+1 again...could it just be FFI based? The ext is super small.

@headius

headius commented Nov 14, 2017

link rgrove/sanitize#166

@stevecheckoway
Collaborator

@headius Forgive my ignorance, but the extension pulls in the entire gumbo parser. How would this work with JRuby and FFI?

@rubys
Owner

rubys commented Nov 14, 2017

Count me as ignorant too :-)

@headius

headius commented Nov 14, 2017

FFI would provide the bindings to the library, so the only remaining part would be getting the library alone to build when using JRuby. I assume that should be a straightforward make process.

@headius

headius commented Feb 23, 2018

Circling back once again. I'm working with Discourse folks to get JRuby working, and this is one of the stumbling blocks. https://meta.discourse.org/t/getting-discourse-running-on-jruby/81273/5

A good example of how to both build a native library AND use FFI to bind it is in the sassc gem: https://github.com/sass/sassc-ruby

@maxpospischil

+1 for JRuby support

@stevecheckoway
Collaborator

It's not entirely clear to me how this would work without substantial rewriting. As I understand it, https://github.com/ffi/ffi is a standard foreign-function interface. It lets you call functions defined in dynamic libraries.

Nokogumbo does get compiled to a dynamic library, but it links to libruby. E.g., on my Mac with Ruby installed via MacPorts, I have

$ otool -L lib/nokogumbo/nokogumbo.bundle
lib/nokogumbo/nokogumbo.bundle:
	/opt/local/lib/libruby.2.4.dylib (compatibility version 2.4.0, current version 2.4.4)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

And although it doesn't show up in the list of libraries it links against, it references a bunch of symbols from libxml2 that it expects Nokogiri to provide.

$ nm -u lib/nokogumbo/nokogumbo.bundle |grep -Ei 'xml|html'
_Nokogiri_wrap_xml_document
_cNokogiriXmlSyntaxError
_htmlNewDocNoDtD
_xmlAddChild
_xmlCreateIntSubset
_xmlDocSetRootElement
_xmlNewCDataBlock
_xmlNewDocComment
_xmlNewDocNode
_xmlNewDocText
_xmlNewProp

Of course, the JRuby version of Nokogiri doesn't include libxml2 and indeed wouldn't have any of those symbols because it uses some Java library for XML. That's no problem because Nokogumbo can be compiled such that it uses libruby to interact with Nokogiri. But that's still using libruby.

The only way I see this working would be to compile gumbo-parser as a dynamic library (which is easy to do), use FFI to call functions in the library and reimplement nokogumbo.c in Ruby.
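To make the first half of that idea concrete, here is a minimal sketch of binding gumbo through FFI. It assumes the `ffi` gem is installed and that a shared library named `gumbo` exposing the upstream gumbo-parser 0.10.x C API (`gumbo_parse`, `gumbo_destroy_output`, `kGumboDefaultOptions`) is on the loader path. `GumboFFI`, `GUMBO_AVAILABLE`, and `gumbo_parses?` are hypothetical names used only for illustration; none of this is Nokogumbo code, and it does not touch the harder part, which is building Nokogiri nodes.

```ruby
# Sketch only: assumes the `ffi` gem and a shared libgumbo exposing the
# upstream gumbo-parser 0.10.x API. GumboFFI and gumbo_parses? are
# hypothetical names, not part of Nokogumbo.
begin
  require 'ffi'

  module GumboFFI
    extend FFI::Library
    ffi_lib 'gumbo' # raises LoadError if the shared library is missing

    # GumboOutput* gumbo_parse(const char* buffer);
    attach_function :gumbo_parse, [:string], :pointer
    # void gumbo_destroy_output(const GumboOptions* options, GumboOutput* output);
    attach_function :gumbo_destroy_output, [:pointer, :pointer], :void

    # Address of the exported global `kGumboDefaultOptions`.
    DEFAULT_OPTIONS = ffi_libraries.first.find_variable('kGumboDefaultOptions')
    raise LoadError, 'kGumboDefaultOptions not found' if DEFAULT_OPTIONS.nil?
  end

  GUMBO_AVAILABLE = true
rescue LoadError
  # Either the ffi gem or libgumbo is not installed on this machine.
  GUMBO_AVAILABLE = false
end

# Parses `html` with gumbo and frees the result again.
# Returns nil when the bindings could not be loaded.
def gumbo_parses?(html)
  return nil unless GUMBO_AVAILABLE

  output = GumboFFI.gumbo_parse(html)
  return false if output.null?

  GumboFFI.gumbo_destroy_output(GumboFFI::DEFAULT_OPTIONS, output)
  true
end
```

Calling `gumbo_parses?('<p>hi</p>')` only exercises the parse and free round trip. Everything nokogumbo.c does with the resulting tree would still have to be reimplemented on top of this.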

Right now, the approach we use while parsing is to create node structures in C (from Gumbo) and then convert them into different node structures using libxml2. And once the whole tree is built, wrap that in a Ruby object.

Compiling without libxml2 is worse. It creates the C structures (from Gumbo) and then creates Ruby objects (Nokogiri::XML::Nodes) for each one, each of which creates the underlying libxml2 structure.

Going the FFI route would be worse still. It'd create the C structures (from Gumbo), FFI would create a Ruby object for each of those, and those objects would then need to be converted to Nokogiri::XML::Nodes, each of which would create the underlying libxml2 structure.
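The extra layer described here can be illustrated with FFI struct wrappers. The sketch below assumes the `ffi` gem and the `GumboOutput` / `GumboVector` field layout from the upstream gumbo-parser 0.10.x header. `GumboVectorStruct`, `GumboOutputStruct`, and `output_summary` are hypothetical names. Every read through these wrappers allocates a Ruby object on top of the C structure, before any Nokogiri node exists.

```ruby
# Sketch only: struct layouts assumed from upstream gumbo.h (0.10.x).
begin
  require 'ffi'
  FFI_AVAILABLE = true
rescue LoadError
  FFI_AVAILABLE = false
end

if FFI_AVAILABLE
  # typedef struct { void** data; unsigned int length; unsigned int capacity; } GumboVector;
  class GumboVectorStruct < FFI::Struct
    layout :data,     :pointer,
           :length,   :uint,
           :capacity, :uint
  end

  # typedef struct { GumboNode* document; GumboNode* root; GumboVector errors; } GumboOutput;
  class GumboOutputStruct < FFI::Struct
    layout :document, :pointer,
           :root,     :pointer,
           :errors,   GumboVectorStruct
  end
end

# Wraps a raw GumboOutput* in a Ruby-side struct view and reads two fields.
# Returns nil when the ffi gem is not installed.
def output_summary(output_ptr)
  return nil unless FFI_AVAILABLE

  out = GumboOutputStruct.new(output_ptr)
  { has_root: !out[:root].null?, error_count: out[:errors][:length] }
end
```

A full tree walk would need similar wrappers for `GumboNode` and its union members, which is where most of the per-node overhead would come from.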

I'm not a fan of this approach so that means we'd need two separate implementations. I'm also not a huge fan of keeping the two in sync, but maybe it wouldn't be too bad?

The other possible approach would be to write some Java to implement the tree construction. One advantage would be its ability to interface directly with the Nokogiri classes, which are implemented in Java for JRuby.

@rubys
Owner

rubys commented Sep 1, 2018

Charles Nutter has offered to help: https://twitter.com/headius/status/1035744876955086848

@dometto

dometto commented Sep 4, 2019

@rubys @headius do you still have plans to work on this?

@jeremyhaile

Any updates? Would love to use this in JRuby!

@headius

headius commented Jan 13, 2020

To address @stevecheckoway's question... we would use FFI to do everything that the extension does, but call out to gumbo directly as a C library. You are right, though, the integration with Nokogiri would be problematic.

An alternative might be to write a JRuby extension using our Java FFI (https://github.com/jnr/jnr-ffi) to call out to gumbo and integrate directly with the Java-based Nokogiri.

I'm not sure about the best path forward right now. Calling gumbo wouldn't be hard, but integrating with Nokogiri is trickier.

@headius

headius commented Jan 13, 2020

Note there are projects out there like ffi_gen that might help us generate FFI bindings for Gumbo, if that's the path that makes sense.

@oshanz

oshanz commented Apr 18, 2020

+1 for JRuby support

@flavorjones
Collaborator

Superseded by sparklemotion/nokogiri#2227
