Skip to content

Client segfaults when threads are used #78

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
criccomini opened this issue Nov 18, 2016 · 12 comments
Closed

Client segfaults when threads are used #78

criccomini opened this issue Nov 18, 2016 · 12 comments

Comments

@criccomini
Copy link

As part of #40, we started noticing that our tests were failing with a segfault in the Travis CI container, but only for Python 3.4.

We were unable to reproduce this locally on our laptops, but with Docker, and using a Travis CI container, we can. Here are the steps to trigger a segfault:

# Start Docker
docker run -it quay.io/travisci/travis-python /bin/bash

Once you're in your Docker container, run these commands:

sudo add-apt-repository ppa:fkrull/deadsnakes
sudo apt-get update
sudo apt-get install python3.4
pip install pip==7.1.0
pip install virtualenv==13.1.2
virtualenv -p /usr/bin/python3.4 /test
source /test/bin/activate
cd ~
git clone --depth=50 https://github.com/confluentinc/confluent-kafka-python.git
cd confluent-kafka-python/
git fetch origin pull/40/head:40
git checkout 40
export LD_LIBRARY_PATH=$PWD/tmp-build/lib
bash tools/bootstrap-librdkafka.sh v0.9.2 tmp-build
pip install pytest-timeout
pip install -v --global-option=build_ext --global-option="-Itmp-build/include/:/opt/python/3.4.2/include/python3.4m" --global-option="-Ltmp-build/lib" . .[avro]
git checkout 8148a79026d7324c34a24a03ba9043ebe2f931fd
py.test -v --timeout 20 --ignore=tmp-build --import-mode append

This will show a segfault before the tests finish.

Furthermore, I'm able to trigger this on master without any of our changes. After the above commands, run:

git checkout master

Then apply this diff:

diff --git a/tests/test_threads.py b/tests/test_threads.py
index 09f6467..e4f76e8 100644
--- a/tests/test_threads.py
+++ b/tests/test_threads.py
@@ -64,6 +64,35 @@ def test_thread_safety():
 
     print('Done')
 
+import time
+class myThread (threading.Thread):
+    def __init__(self, threadID, name, counter):
+        threading.Thread.__init__(self)
+        self.threadID = threadID
+        self.name = name
+        self.counter = counter
+    def run(self):
+        print("Starting " + self.name)
+        print_time(self.name, self.counter, 5)
+        print("Exiting " + self.name)
+
+def print_time(threadName, delay, counter):
+    while counter:
+        if exitFlag:
+            threadName.exit()
+        time.sleep(delay)
+        print("%s: %s" % (threadName, time.ctime(time.time())))
+        counter -= 1
+
+def test_foo():
+    # Create new threads
+    thread1 = myThread(1, "Thread-1", 1)
+    thread2 = myThread(2, "Thread-2", 2)
+
+    # Start new Threads
+    thread1.start()
+    thread2.start()
+

It's just some example of Python threading that I found on Stack Overflow. After the diff is applied, run the same py.test command again. It will again segfault.

@criccomini
Copy link
Author

Note had to install Python 3.4 on Ubuntu Precise via third party package because the Travis Precise image comes with Python 3.2, and that was giving me trouble with pip/virtualenv.

@criccomini
Copy link
Author

Also, 3.4 is what's used in the Travis CI in #40, where we first saw the segfaults. A proper way to replicate the exact environment would be to use Travis' CLI and build plugin to generate the real build.sh command that's run in the actual environment, but this seemed good enough to replicate the seg faults.

@criccomini
Copy link
Author

Here's the output of master with the above thread patch applied:

# py.test -v --timeout 20 --ignore=tmp-build --import-mode append -s
=========================================================================================================================================================================== test session starts ============================================================================================================================================================================
platform linux -- Python 3.4.5, pytest-3.0.4, py-1.4.31, pluggy-0.4.0 -- /test/bin/python3.4
cachedir: .cache
rootdir: /root/confluent-kafka-python, inifile: tox.ini
plugins: timeout-1.2.0
timeout: 20.0s method: signal
collected 19 items 

tests/test_Consumer.py::test_basic_api OK: poll() timeout
PASSED
tests/test_Consumer.py::test_on_commit SKIPPED
tests/test_Consumer.py::test_subclassing PASSED
tests/test_KafkaError.py::test_error_cb %3|1479486831.870|FAIL|rdkafka#producer-3| 127.0.0.1:1/bootstrap: Connect to ipv4#127.0.0.1:1 failed: Connection refused
error_cb KafkaError{code=_TRANSPORT,val=-195,str="127.0.0.1:1/bootstrap: Connect to ipv4#127.0.0.1:1 failed: Connection refused"}
error_cb KafkaError{code=_ALL_BROKERS_DOWN,val=-187,str="1/1 brokers are down"}
PASSED
tests/test_KafkaError.py::test_subclassing PASSED
tests/test_Producer.py::test_basic_api delivery <class 'str'>
PASSED
tests/test_Producer.py::test_subclassing PASSED
tests/test_TopicPartition.py::test_sort PASSED
tests/test_TopicPartition.py::test_cmp PASSED
tests/test_TopicPartition.py::test_hash PASSED
tests/test_TopicPartition.py::test_subclassing PASSED
tests/test_docs.py::test_verify_docs PASSED
tests/test_enums.py::test_enums -168
27
PASSED
tests/test_enums.py::test_tstype_enums PASSED
tests/test_misc.py::test_version Using confluent_kafka module version 0.9.2 (0x90100)
Using librdkafka version 0.9.1.2-128-g7018d2 (0x902ff)
PASSED
tests/test_misc.py::test_error_cb %3|1479486833.213|FAIL|rdkafka#consumer-7| localhost:65531/bootstrap: Connect to ipv4#127.0.0.1:65531 failed: Connection refused
%3|1479486834.226|FAIL|rdkafka#consumer-7| localhost:65531/bootstrap: Connect to ipv4#127.0.0.1:65531 failed: Connection refused
PASSED
tests/test_misc.py::test_stats_cb PASSED
tests/test_threads.py::test_thread_safety 1 Flush took 1.000
2 Flush took 1.001
1 Intentional callback crash: ok
1 Done
4 Flush took 2.002
2 Flush took 1.001
2 Done
3 Flush took 2.002
4 Intentional callback crash: ok
4 Done
3 Flush took 1.000
3 Done
Done
PASSED
tests/test_threads.py::test_foo Fatal Python error: Couldn't create autoTLSkey mapping
Segmentation fault

@criccomini
Copy link
Author

This discussion makes me pretty nervous. The producer callback seems fine, but I noticed some callbacks in the consumer as well. I haven't dug into this much, but I did notice that I have to comment out EVERY instance of Consumer() and Producer() in the tests in order to get the segfault to go away.

If I remove test_Consumer.py, test_Producer.py, and test_threads.py(), the tests on #40 pass. If I add just a single call to Consumer() or Producer() the tests segfault.

@criccomini
Copy link
Author

Attaching a diff off of master since #40 has had a force push applied to it that lost the checksum I mentioned. Not super relevant, since you can reproduce this off of master with the thread diff above, but just for historical purposes.

broken-pr-40.diff.zip

@edenhill
Copy link
Contributor

The provided test_threads.py patch doesn't really work (undefined variables, func parameter order error), can you provide an example that works(but segfaults)?

@criccomini
Copy link
Author

Issues with that part of the code are irrelevant. It seg faults before it even gets there.

I have double checked. Follow the instructions at the top of the issue, but just use master (instead of PR 40). Put the thread diff in /tmp/diff.diff and apply it (with bad params or whatever):

patch -p1 </tmp/diff.diff

Then run the tests using the command shown at the top of the isssue. It seg faults.

If you feel compelled to re-order the params, and fix the exit_flag variable, go for it. It doesn't matter.

@edenhill
Copy link
Contributor

The crash occurs when Producer or Consumer initprocs raise an exception

@criccomini
Copy link
Author

:(

Any idea why?

@edenhill
Copy link
Contributor

Yes sir, fix is on the way!

@edenhill
Copy link
Contributor

Thank you for the repro env and everything

@criccomini
Copy link
Author

Woot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants