Proper logger #148
We could expose a log callback, though it is not totally clear how it should be called. In librdkafka v0.9.4 I added support for routing logs to a queue which can then be polled through a poll() call, and this is what the Python client should be using to make sure the callbacks are called from a more Python-friendly context, thread-wise. There are a couple of options here:
I'm open to all suggestions.
I would also like some control over the logging of librdkafka within this Python library. The first thing I looked for was whether the log_cb is available in Python. I wouldn't mind if the log is delayed for some reason, but having no control over the output and where to log is worse.
Is there any value in trying to integrate with some Python logging framework?
Not sure if I get the question. For example, within Python I use the standard python logging module (https://docs.python.org/3.6/library/logging.html). Now that I think about it, I guess the information you already log is good enough.
@ewencp comments on best (..) way to provide logs from librdkafka to Python land?
Any temporary workaround for getting the stderr/stdout info into a log file without having to do some file descriptor juggling? i.e. http://eli.thegreenplace.net/2015/redirecting-all-kinds-of-stdout-in-python/
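For reference, the file-descriptor juggling the linked article describes looks roughly like this (the log file name and structure are just illustrative); since librdkafka writes to fd 2 directly, reassigning sys.stderr alone would not capture its output:

```python
import os

# Duplicate the real stderr fd, point fd 2 at a log file while the client
# runs, then restore it. This is the kind of fd-level redirection the
# article above describes; the file name is arbitrary.
log_file = open("librdkafka-stderr.log", "ab")
saved_stderr_fd = os.dup(2)
os.dup2(log_file.fileno(), 2)
try:
    pass  # ... create and use the Producer/Consumer here ...
finally:
    os.dup2(saved_stderr_fd, 2)   # restore the original stderr
    os.close(saved_stderr_fd)
    log_file.close()
```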
I think I like option #1, "serve this log queue from a background thread, calling the callbacks on this thread. This will provide the snappiest log experience (minimal delay between logging and displaying the logs) but it also means there will be user callbacks called on a non-main thread which might trip people up."

Anyone using this is only going to be feeding the logs into their own logging system, so it should not really trip people up that it is invoked in the context of a different thread than their main thread, I would think.

In terms of giving the app some control over the format, I don't know. Would it make sense for the app to provide a log object as returned by Python's logging.getLogger()? That way the app could set up their formatter and log level any way they wanted, and confluent-kafka-python could just call log.info, log.debug, etc. Then there would be no need for callbacks.
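A rough sketch of what that could look like from the application side (the `logger` configuration key and the call sites are hypothetical here; this only illustrates the getLogger() idea, not an existing API):

```python
import logging

# Application side: configure an ordinary stdlib logger with whatever
# formatter and level is wanted.
logger = logging.getLogger("my-kafka-app")
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
logger.addHandler(handler)

# Hypothetical: hand the logger to the client so it can call
# logger.debug()/logger.info() itself instead of invoking a raw callback.
conf = {"bootstrap.servers": "localhost:9092", "logger": logger}
```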
@edenhill Python's built-in logging is considered best practice, so it's not crazy to go directly to that. However, the callback obviously gives more flexibility. Not everyone uses the built-in module (e.g. afaik some WSGI servers like uWSGI have their own logging mechanism), but those might end up just being different handlers in Python's logging (or could be adapted as such).

In terms of multiple threads, I'm not worried about coalescing them all in C and then processing in Python as long as messages within a thread can be maintained in order. However, option 1 does have the drawback that logs can potentially get backlogged. This may always be a problem in Python due to the GIL, so I assume there is some back pressure. If there is, then using a shared C queue only temporarily alleviates the issue; putting back pressure on the original threads might actually be a better solution. (And if there isn't back pressure, then there is an infinite queue that needs fixing.)

Option 2 seems bad if logs can have long delays before processing. This seems like it will lead to debugging nightmares, since logs that should have been written may not have been. Any async processing can result in this case, but at least the background thread should be processing the messages as fast as possible.

Option 1 and piping directly to Python's logging library seems the easiest for the user, but has the lag problem. Option 1 and leaving it up to the user to implement the callback provides flexibility. Processing log messages directly and eating the cost of the GIL and Python code to process the message (which we may want to better understand the relative costs of) seems the "most correct" solution, in that you make the threads deal with the general slowness of the rest of the code.

I'm curious to know what the actual cost of the last option is, or what the exact implications are for that thread if things move slow (e.g. what if you were just on a slow processor? Would that cause things to break?).
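On the "different handlers" point: a server-specific logging mechanism can usually be bridged into stdlib logging with a small Handler subclass. A minimal sketch, where the write function is a placeholder for whatever the host environment actually provides:

```python
import logging

class ServerLogHandler(logging.Handler):
    """Bridge stdlib logging records into some external logging mechanism."""

    def __init__(self, write_line):
        super().__init__()
        # Placeholder for e.g. a WSGI server's own log-write function.
        self._write_line = write_line

    def emit(self, record):
        try:
            self._write_line(self.format(record))
        except Exception:
            self.handleError(record)

# 'print' just stands in for the real sink here.
logging.getLogger("confluent_kafka").addHandler(ServerLogHandler(print))
```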
Hi, any update on this issue?
I have a branch which has not yet been pushed, will file a PR soon.
Added to v0.11.4
I seem to get a lot of log messages to stdout/stderr that look more or less like this:
%3|1488984668.648|FAIL|Master_BE_23.79.234.201#producer-2| 23.79.234.201:9092/bootstrap: Receive failed: Disconnected
For correlation with other events from other logs, and for general human readability, it would be nice if the timestamp were in a form more like "2017-03-08 16:11:36.095 UTC" or something. Is there a way to control this from the config block I hand to my Consumer or Producer?
Similarly, is there a way to tell confluent-kafka-python to put all this in a log file instead of stdout/stderr?
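If/when librdkafka logs can be routed through Python's logging module (which is what this issue is about), both of those become ordinary formatter/handler configuration. A sketch, with an illustrative file name and logger name:

```python
import logging
import time

# Human-readable, UTC timestamps and a file destination via stdlib logging.
formatter = logging.Formatter(
    fmt="%(asctime)s.%(msecs)03d UTC %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S")
formatter.converter = time.gmtime   # make the "UTC" label accurate

handler = logging.FileHandler("kafka-client.log")
handler.setFormatter(formatter)

logger = logging.getLogger("confluent_kafka")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
```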