
Support JRuby and fix travis failures #108

Merged · 15 commits · Feb 18, 2020

Conversation

@Adithya-copart (Contributor) commented Feb 4, 2020

Re-opening #106 here to eliminate the noise from the numerous debugging commits.
I find it easier to trigger Travis builds than to run the specs locally.

The changes in this PR also include @gaffneyc's changes from #107:

  1. Work around the lack of read_int64, read(:pointer), and read_string_to_null in JRuby's FFI (see the sketch after this list).
  2. Avoid FFI::AutoPointer objects inside Rdkafka::Consumer and explicitly call rd_kafka_destroy during close.
  3. Avoid calls to rd_kafka_consumer_poll after the consumer is closed.
  4. Use RbConfig::CONFIG to check the host OS, which is JRuby friendly.
  5. Switch to the confluentinc/cp-kafka and confluentinc/cp-zookeeper images, which are licensed under the Apache 2.0 license.
  6. Add JRuby to the list of rubies in .travis.yml.
  7. Remove the usage of AutoPointer, as it leads to closing the socket connections when a child process created using Kernel#fork exits, which could cause segmentation faults.
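
A minimal sketch of the workaround style in item 1, with hypothetical helper names (not the PR's exact code). JRuby's FFI at the time lacked Pointer#read(:pointer) and Pointer#read_int64, while the older typed get_* accessors work on both implementations:

```ruby
require "ffi"

# Hypothetical compatibility helpers. JRuby's FFI (as of 9.2.x) lacked
# Pointer#read(:pointer) and Pointer#read_int64; the get_* accessors
# with an explicit offset are available on both CRuby and JRuby.
def read_int64_compat(ptr)
  ptr.respond_to?(:read_int64) ? ptr.read_int64 : ptr.get_int64(0)
end

def read_pointer_compat(ptr)
  ptr.respond_to?(:read_pointer) ? ptr.read_pointer : ptr.get_pointer(0)
end
```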

Findings:

The E, [2020-01-29T06:17:58.952506 #9416] ERROR -- : rdkafka: [thrd:GroupCoordinator]: 1/1 brokers are down line in the specs is harmless, as per confluentinc/confluent-kafka-dotnet#1129 (comment).

I'll gladly rebase this PR if you wish to merge #107 first.

Adithya Pentela and others added 3 commits February 3, 2020 19:22
Free ptr objects in ensure blocks and add a workaround for the lack of the read(:pointer), read_string_to_null, and read_int64 FFI methods in JRuby

Add JRuby 9.2.9.0 to travis

WIP: Debugging.
Modify error_spec and skip the spec using fork in JRuby.
Use RbConfig to check for macOS.

WIP: Free @native_kafka in the consumer/producer during close

Manually free the AutoPointer using ruby instead of a C call

Switch to Confluent docker images

ProducerSpec: Closing the IO in forked process

Avoiding AutoPointer instances being copied into forked process

Alter the consumer spec to test for the message produced

Remove consumer commit after assertions and increase the handle wait for JRuby, for reasons still unknown

Rely on GC to call the finalizer to close @native_kafka socket

Revert "Rely on GC to call the finalizer to close @native_kafka socket"

This reverts commit 38ea6d6.

Skip AutoPointer for consumer class

Remove rd_kafka_consumer_close as rd_kafka_destroy takes care of it for consumers

Adding rd_kafka_consumer_close and ensure it is only called once

Skip fork spec

Revert "Skip fork spec"

This reverts commit d4cfb58.

Remove autorelease and free pointers accessed using read_pointer
When converting a Rdkafka::TopicPartitionList to a native type, Rdkafka
was depending on AutoPointer to eventually free them. There are a couple
of cases where a client handle returns a pointer to the application that
the application is then required to free, which could be leading to test
instability.

This takes a pass at ensuring all uses of native TopicPartitionList
instances (even those returned) are deterministically freed.
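
A sketch of the deterministic-free pattern this commit describes. Here to_native_tpl is a hypothetical conversion helper, while rd_kafka_assign and rd_kafka_topic_partition_list_destroy are real librdkafka functions attached in Rdkafka::Bindings:

```ruby
# Convert the Ruby TopicPartitionList to a native list, use it, and free it
# deterministically in an ensure block instead of relying on AutoPointer/GC.
tpl_ptr = list.to_native_tpl # hypothetical helper returning an FFI::Pointer
begin
  Rdkafka::Bindings.rd_kafka_assign(native_kafka, tpl_ptr)
ensure
  Rdkafka::Bindings.rd_kafka_topic_partition_list_destroy(tpl_ptr)
end
```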

Ensure poll returns nil after a consumer is closed
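
A simplified sketch of that guard, assuming a @closed flag set by #close (hypothetical; the gem's real #poll also wraps the native message and handles errors):

```ruby
# Once the consumer is closed, short-circuit to nil instead of calling into
# the destroyed librdkafka handle.
def poll(timeout_ms)
  return nil if @closed
  message_ptr = Rdkafka::Bindings.rd_kafka_consumer_poll(@native_kafka, timeout_ms)
  message_ptr.null? ? nil : message_ptr
end
```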

Add the read_array_of_uint64 workaround for JRuby
…stroy will result in a segmentation fault but not #free
@Adithya-copart changed the title from "Jruby" to "Support JRuby and fix travis failures" on Feb 4, 2020
@gaffneyc (Contributor) commented Feb 4, 2020

Just as a note, it looks like JRuby is working on updating their version of FFI for the new APIs.

…4_t.

Use AutoPointer#free in producer instead of rd_kafka_destroy to close @native_kafka instances if consumer#close is not called.
@Adithya-copart (Contributor, Author) commented

Even though this build is green, there is a segmentation fault in the forked process.

The AutoPointer objects result in a call to rd_kafka_destroy after GC in both the parent and the forked process. The second call to rd_kafka_destroy results in a segmentation fault because the handle has already been freed.
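
A minimal sketch of that failure mode, with a hypothetical raw_handle (FFI::AutoPointer and its releaser argument are real FFI API):

```ruby
require "ffi"

# raw_handle: pointer returned by rd_kafka_new (assumed). The releaser
# registered before fork is inherited by the child, so GC in BOTH processes
# eventually calls rd_kafka_destroy on the same handle.
handle = FFI::AutoPointer.new(raw_handle, Rdkafka::Bindings.method(:rd_kafka_destroy))

fork { exit } # child exits -> its finalizers run -> first destroy
Process.wait
# parent GC later runs the same releaser again -> second destroy -> segfault
```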

@thijsc (Collaborator) commented Feb 5, 2020

Looking very good, thanks for your hard work on this! I guess we should get rid of the last usage of AutoPointer.

@Adithya-copart (Contributor, Author) commented

The debug log level shows this before the ERROR -- : rdkafka: [thrd:GroupCoordinator]: 1/1 brokers are down line:

D, [2020-02-05T10:33:49.874983 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839": delete consume_test_topic [1]
D, [2020-02-05T10:33:49.874996 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839": waiting for 1 toppar(s), 2 unassignment(s), 0 commit(s) (state up, join-state wait-unassign) before terminating
D, [2020-02-05T10:33:49.875009 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839" received op REPLY:FETCH_STOP in state up (join state wait-unassign, v5) for consume_test_topic [1]
D, [2020-02-05T10:33:49.875022 #97694] DEBUG -- : rdkafka: [thrd:main]: Unassign not done yet (1 wait_unassign, 1 assigned, 0 wait commit, join state wait-unassign): FETCH_STOP done
D, [2020-02-05T10:33:49.875043 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839" received op PARTITION_LEAVE in state up (join state wait-unassign, v5) for consume_test_topic [2]
D, [2020-02-05T10:33:49.875058 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839": delete consume_test_topic [2]
D, [2020-02-05T10:33:49.875070 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839": waiting for 0 toppar(s), 1 unassignment(s), 0 commit(s) (state up, join-state wait-unassign) before terminating
D, [2020-02-05T10:33:49.875083 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839" received op REPLY:FETCH_STOP in state up (join state wait-unassign, v5) for consume_test_topic [2]
D, [2020-02-05T10:33:49.875095 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839": unassign done in state up (join state wait-unassign): without new assignment: FETCH_STOP done
D, [2020-02-05T10:33:49.875108 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839": leave (in state up)
D, [2020-02-05T10:33:49.875133 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839" changed join state wait-unassign -> init (v5, state up)
D, [2020-02-05T10:33:49.875169 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839": waiting for 0 toppar(s), 0 unassignment(s), 0 commit(s), wait-leave, (state up, join-state init) before terminating
D, [2020-02-05T10:33:49.875186 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839": waiting for 0 toppar(s), 0 unassignment(s), 0 commit(s), wait-leave, (state up, join-state init) before terminating
D, [2020-02-05T10:33:49.878075 #97694] DEBUG -- : rdkafka: [thrd:main]: LeaveGroup response received in state up
D, [2020-02-05T10:33:49.878115 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839" changed state up -> term (v5, join-state init)
D, [2020-02-05T10:33:49.878147 #97694] DEBUG -- : rdkafka: [thrd:main]: Group "ruby-test-178839" broker localhost:9092/1 is no longer coordinator
E, [2020-02-05T10:33:49.878195 #97694] ERROR -- : rdkafka: [thrd:GroupCoordinator]: 1/1 brokers are down

@thijsc Looks like this is similar to confluentinc/confluent-kafka-dotnet#1129, and the reason is this.

@Adithya-copart (Contributor, Author) commented

Looks like Travis encountered a segmentation fault while cleaning up after running the specs, even after getting rid of AutoPointer. I'm not sure which object is triggering this segmentation fault.

We might need debugging enabled in Travis, as the specs pass locally for me against Ruby 2.6.3.

The JRuby delivery callback spec failure is intermittent on Travis and always passes locally.
I will bump the timeout and see if it becomes consistent.

@Adithya-copart (Contributor, Author) commented Feb 16, 2020

@thijsc 6d7b472 ensures that all socket connections are closed, which fixed the Travis segmentation fault.

Twelve socket connections were open by the time Kernel#fork was called in the specs, so these connections were shared between the parent and forked process and could be responsible for the segmentation fault on Travis.

I modified the spec in 3e5331a to make sure the producer is instantiated in the child process. I think sharing sockets between processes should not be encouraged; it's easier to create a new producer in the forked process (see the sketch below).
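
A sketch of that pattern, with assumed broker and topic values (Rdkafka::Config#producer, Producer#produce, and the delivery handle's wait are the gem's public API):

```ruby
# Create the producer inside the child so no librdkafka socket created in
# the parent is shared across Kernel#fork.
pid = fork do
  config = Rdkafka::Config.new("bootstrap.servers" => "localhost:9092")
  producer = config.producer
  producer.produce(topic: "consume_test_topic", payload: "from the child").wait
  producer.close
end
Process.wait(pid)
```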

I changed the behavior of Consumer#close and Producer#close to make sure rd_kafka_destroy is only called once. This has the side effect of returning true when the socket is actually closed and nil on subsequent calls, as sketched below. Is it acceptable to change the return value?
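
A minimal sketch of the changed semantics, assuming a @closed flag (simplified; rd_kafka_consumer_close and rd_kafka_destroy are the real librdkafka calls attached in Rdkafka::Bindings):

```ruby
# First call leaves the group and destroys the handle, returning true;
# subsequent calls are no-ops returning nil.
def close
  return nil if @closed
  @closed = true
  Rdkafka::Bindings.rd_kafka_consumer_close(@native_kafka)
  Rdkafka::Bindings.rd_kafka_destroy(@native_kafka)
  true
end
```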

@thijsc (Collaborator) commented Feb 18, 2020

Thanks so much for your hard work on this!

@rearden-steel commented

Seems that you have not updated lib/rdkafka/version.rb to reflect the version change.

@Adithya-copart (Contributor, Author) commented

@rearden-steel I'm assuming that it'll be done by the maintainers when publishing a new release.
