Skip to content

Logging breaks with binary output or between multi-byte UTF-8 characters #213

@danpalmer

Description

@danpalmer

...and also introduces significant performance overhead.

Currently the logger a) attempts to parse all yielded segments of data as UTF-8, and b) calls logger.debug with all output. This causes a few problems.

  1. If the output of the command, on stdout or stderr, is non-decodable binary data (e.g. an image), the String(data:encoding:) will fail. This doesn't actually cause any issues, it just returns nil and fails to log, but it does mean that the logging silently doesn't log anything.
  2. That process incurs an overhead. If calling with a significant amount of data (megabytes or more), this introduces a significant performance penalty.
  3. Even if the output on stdout or stderr of the command is UTF-8, this also does not work as expected because the process may not be yielding data aligned to UTF-8 codepoint boundaries. For example, an emoji may be two bytes, if these bytes were yielded separately, two incorrect strings would be produced instead of one correct emoji.

I see the reason for logging, but I think this currently makes the library less usable in different use-cases. I think a good option would be to:

  1. Make this logging optional, beyond debug log level (as this may still be a useful level for those reading non-UTF-8 data), with a flag on the class that disables any attempt to parse the bytes.
  2. When enabled, attempt to parse as a string, but fall back to printing the bytes, perhaps in hexadecimal format.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions