Fix build #98
I think this might be a docker compose issue; I have no clue why it randomly stalls. We never see this happen when running this gem in production. Any ideas are welcome.
Yesterday I was running some tests, and the consumer behaviour has now changed. If it helps, I modified the wait_for_assignment function to raise an exception:
I also looked for Kafka + Docker issues but had no luck.
Spent some more time on this, not getting anywhere. Tempted to get rid of the compose setup and run Kafka/Zookeeper locally again.
Symptoms locally are that the connection to the broker is lost:
On Travis a bunch of other things break, but I'm trying to get the build green locally first.
Tried some bisection, but this is tricky since older librdkafka versions don't work anymore because of the timespec linking issue. @edenhill do you have any suggestions?
What's the problem? The resolution errors for localhost?
Tried running this with wider FD limits, with the same result. The test suite runs fine for a while, and runs fine if you run individual tests. If you run the whole set, it is unable to keep a connection after a while.
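For anyone wanting to repeat the FD limit check from inside the test process, here is a minimal sketch using Ruby's standard `Process.getrlimit` and `Process.setrlimit`. The 4096 target is an arbitrary value picked for illustration; it is not a number from this thread, and the thread does not say how the limits were actually widened.

```ruby
# Read the current open-file limits for this process.
soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
puts "open file limit before: soft=#{soft} hard=#{hard}"

# Assumption: 4096 is an arbitrary illustrative target, not a value from the thread.
# Never lower the existing soft limit and never exceed the hard limit.
target = [[soft, 4096].max, hard].min
Process.setrlimit(Process::RLIMIT_NOFILE, target, hard)

new_soft, new_hard = Process.getrlimit(Process::RLIMIT_NOFILE)
puts "open file limit after:  soft=#{new_soft} hard=#{new_hard}"
```

Running this at the top of the spec helper would at least confirm which limits the suite really runs under, which helps rule FD exhaustion in or out.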
As part of some work to look into the build failures (see karafka#98) I wanted an easy way to run an automated bisect against different librdkafka commits. This task is designed to be used during development and the default task should still be used when installing the gem.
After a bit of poking I discovered that removing the AutoPointer from Config#native_kafka gets the tests running consistently on 1.2.2 for me locally.
@gaffneyc Nice find! That suggests there's something wrong with the termination of the client. Going to poke around in this direction too.
@gaffneyc I tried using the handle without the destroy call, and the test suite still loses the connection about halfway through. Did you make any other changes?
Still completely stuck on this issue, I have no leads at the moment. Behaviour I'm seeing:
If anybody has any suggestions on which direction to prod further, I'd be very happy to hear them. @edenhill any new thoughts on this maybe?
@thijsc No, sorry, I spent a couple of hours on it one night and wasn't able to get it working. My theory was that using AutoPointer was causing issues because the timing of object collection is non-deterministic, and most (all?) objects need to be freed before the handle can be destroyed, otherwise it blocks. To test the theory I ended up starting a new client from scratch and had enough luck with it that I just... kept... going. I recently open sourced it here: https://github.com/deadmanssnitch/kafka/ and we're using it in production as of this week.
Hm, not super happy about that honestly. I think we don't need further fragmentation and would rather welcome your contributions here.
I'm sorry 😞 it wasn't my intention to upset you or fragment the community. It started as a proof of concept and I felt that it diverged enough from rdkafka that it made sense to be its own thing.
You are of course within your rights to do whatever you want. It just seems to me that we're better off with one wrapper gem if there is no real divergence in approach. Shall we discuss moving these two projects back into one via another channel? In the meantime I tried removing all destroy calls to see if that fixes the consumer spec; it doesn't. So I think incorrect cleanup causing this is a red herring.
Feel free to email me at my github username at gmail. Two thoughts to quickly decide whether AutoPointer is causing the issues. First, when a test fails, run the GC and then check ObjectSpace to see how many AutoPointers or other Rdkafka objects are still in memory. Second, I believe when calling
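The first check suggested above (run the GC, then look at ObjectSpace) could be sketched roughly like this. `live_object_counts` and `FakeHandle` are hypothetical names made up for this example so that it runs without ffi or rdkafka loaded; in the real suite you would pass classes such as `FFI::AutoPointer` instead.

```ruby
# Hypothetical helper, not part of rdkafka-ruby: force a GC run, then count
# the instances of each given class (including subclasses) that are still alive.
def live_object_counts(*classes)
  GC.start
  classes.each_with_object({}) do |klass, counts|
    counts[klass.name] = ObjectSpace.each_object(klass).count
  end
end

# Stand-in for a native handle wrapper so the sketch is self-contained.
class FakeHandle; end

# Keep three instances referenced so they survive the GC run.
handles = Array.new(3) { FakeHandle.new }

counts = live_object_counts(FakeHandle)
puts counts.inspect
```

Calling the helper from an `after` hook when a spec fails would show whether wrappers are piling up between tests. Note that a single `GC.start` does not guarantee every unreachable object has been collected, so the numbers are an upper bound rather than an exact count.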
I actually just removed all autopointers and destroy calls, thus introducing a bunch of memory leaks. Same symptoms.
@thijsc I took a pass at removing AutoPointer from TopicPartitionList (see #107) as there are some cases where librdkafka can return a pointer to a TPL (see #subscription for example) and the application is now required to call I was able to get the tests all passing locally with the change.
All passing except for one segfault where the Consumer or Producer's
@gaffneyc I had a similar segfault in a forked process when the autopointer releaser call to I think I made some progress in fixing the build failure by switching the docker images in #106. All the jobs failed with the same two spec failures. Update: I wonder if
Build was fixed in #108
Attempt to fix the build. Also upgrades to librdkafka 1.2.2.
All individual tests are green based on some random sampling. It just loses connection to the broker after a while. Re-running failing specs immediately after produced a green result.
Not sure why this is breaking exactly.