Logging breaks with binary output or between multi-byte UTF-8 characters

...and also introduces significant performance overhead.

Currently the logger a) attempts to parse all yielded segments of data as UTF-8, and b) calls `logger.debug` with all output. This causes a few problems.

1. If the output of the command, on stdout or stderr, is non-decodable binary data (e.g. an image), the `String(data:encoding:)` will fail. This doesn't actually cause any issues, it just returns nil and fails to log, but it does mean that the logging silently doesn't log anything.
2. That process incurs an overhead. If calling with a significant amount of data (megabytes or more), this introduces a significant performance penalty.
3. Even if the output on stdout or stderr of the command is UTF-8, this also does not work as expected because the process may not be yielding data aligned to UTF-8 codepoint boundaries. For example, an emoji may be two bytes, if these bytes were yielded separately, two incorrect strings would be produced instead of one correct emoji.

I see the reason for logging, but I think this currently makes the library less usable in different use-cases. I think a good option would be to:

1. Make this logging optional, beyond debug log level (as this may still be a useful level for those reading non-UTF-8 data), with a flag on the class that disables any attempt to parse the bytes.
2. When enabled, attempt to parse as a string, but fall back to printing the bytes, perhaps in hexadecimal format.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Logging breaks with binary output or between multi-byte UTF-8 characters #213

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Logging breaks with binary output or between multi-byte UTF-8 characters #213

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions