Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document for JRuby users the best practices for thread safety (doc.xpath is not thread-safe on JRuby) #40

Open
kares opened this issue Jan 31, 2017 · 10 comments

Comments

@kares
Copy link

kares commented Jan 31, 2017

somehow, I got the impression that xpath is thread-safe (even under JRuby)

... or at least fixable (with the xpath-support cache) when its just an expr (no binds/fn-handler passed)

but was proven wrong as I added a test :

      def test_concurrent_xpath
        # skip("no need to run under MRI") unless Nokogiri.jruby?
        doc = Nokogiri::XML(File.open(XPATH_FILE))

        threads = []; errors = []
        4.times do |i|
          threads << Thread.start do
            begin
              sleep 0.01 * i
              assert_equal 11, doc.xpath('.//category').size
              assert_equal 1165, doc.xpath('//module').size
              assert_equal 10353, doc.xpath('//module/param').size
              assert_equal 14, doc.xpath('//param[@name="href"]').size
            rescue Exception => ex
              errors << ex
            end
          end
        end
        threads.each(&:join)
        raise errors.first if errors.any?
      end

it either reports incorrect results or in a better case fails with an error such as :

Nokogiri::XML::TestXPath#test_concurrent_xpath:
Java::JavaLang::ArrayIndexOutOfBoundsException: -1
    org.apache.xml.utils.IntStack.push(IntStack.java:86)
    org.apache.xpath.XPathContext.pushCurrentNode(XPathContext.java:793)
    org.apache.xpath.axes.PredicatedNodeTest.acceptNode(PredicatedNodeTest.java:468)
    org.apache.xpath.axes.DescendantIterator.nextNode(DescendantIterator.java:222)
    org.apache.xpath.axes.NodeSequence.nextNode(NodeSequence.java:335)
    org.apache.xpath.axes.NodeSequence.runTo(NodeSequence.java:494)
    org.apache.xml.dtm.ref.DTMNodeList.<init>(DTMNodeList.java:81)
    org.apache.xpath.objects.XNodeSet.nodelist(XNodeSet.java:346)
    nokogiri.XmlXpathContext.tryGetNodeSet(XmlXpathContext.java:190)
    nokogiri.XmlXpathContext.node_set(XmlXpathContext.java:163)
    nokogiri.XmlXpathContext.evaluate(XmlXpathContext.java:130)

this is caused by re-using the same cached xpath object between threads.
on the other hand, maybe there never was an official guarantee regarding thread-safety.

@flavorjones
Copy link
Member

@kares has submitted a fix for this in sparklemotion/nokogiri#1597

@flavorjones
Copy link
Member

Ah, my mistake, sparklemotion/nokogiri#1597 does not contain a fix for this. Ignore my previous comment.

flavorjones referenced this issue in sparklemotion/nokogiri Feb 10, 2017
This test was submitted in #1596 by @kares.
@flavorjones
Copy link
Member

I've pushed a branch, 1596_kares_xpath-not-threadsafe, with this failing test in it.

@kares
Copy link
Author

kares commented Feb 10, 2017

@flavorjones thanks, was more interested in your input on this (if you have any or know anyone who does).

somehow I assumed its meant to be (that doc.xpath is "officially") thread-safe, but of course with the caching done as is I was proven wrong (doing sparklemotion/nokogiri#1597 I assumed that its only the binding cases that are problematic but its the whole underlying cached internal XPath structure). was thinking about introducing a system property to switch to a thread-local cache as it would make sense in some scenarios (but probably not to have it on by default) ?

basically current status is: doc.xpath not thread-safe due the caching, without the caching its (terribly slow but) thread-safe. so the switch might have several states:

  • -Dnokogiri.xpath.cache=true (default)
  • -Dnokogiri.xpath.cache=thread
  • -Dnokogiri.xpath.cache=false

@flavorjones
Copy link
Member

@jvshahid Any thoughts on this?

flavorjones referenced this issue in sparklemotion/nokogiri Jan 22, 2019
This test was submitted in #1596 by @kares.
@flavorjones
Copy link
Member

flavorjones commented Jan 22, 2019

I've rebased the branch 1596_kares_xpath-not-threadsafe just to try to keep this issue up to date. Still not sure what the right thing to do is.

@jvshahid
Copy link
Member

was thinking about introducing a system property to switch to a thread-local cache as it would make sense in some scenarios (but probably not to have it on by default) ?

Why not ? I don't the idea of introducing more flags that the user has to discover and configure properly. I think Nokogiri should DTRT and have a thread local cache.

@jvshahid
Copy link
Member

I researched this a little bit more. The underlying libraries aren't thread safe. Java DOM's specification doesn't require implementation and Xerces certainly is indeed not thread safe according to their FAQ 1. I think the right thing to do is to document this fact somewhere on the docs site, i.e. best practices for concurrent usage of Nokogiri.

@flavorjones flavorjones changed the title doc.xpath is not thread-safe (JRuby) Document for JRuby users the best practices for thread safety (doc.xpath is not thread-safe on JRuby) Jan 11, 2020
@flavorjones
Copy link
Member

Updated title to indicate the plan is to document thread-safety limitations, and am deleting the branch with the failing test.

@kares
Copy link
Author

kares commented Jan 15, 2020

I don't the idea of introducing more flags that the user has to discover and configure properly. I think Nokogiri should DTRT and have a thread local cache.

👍 should be the default despite it having a negative performance impact,
some 'power' users might want to change to a non-thread-safe cache option or maybe not ...

@flavorjones flavorjones transferred this issue from sparklemotion/nokogiri Nov 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants