
reading invalid text data, other encodings #1792

@JeffBezanson

Description


We probably need to change one or two behaviors for reading invalid data or data in unknown encodings. There are two cases: reading a Char, and byte-oriented functions like readuntil/readline.

readuntil can reasonably be defined in terms of bytes: just read everything until a given byte value is encountered. The advantage is that you can at least get at the data without explicit support for every encoding. Currently we might return an invalid UTF8String, from which you can still get the (unaltered) data. I don't know whether that is the best approach. Maybe there should be a lower-level routine that returns a byte array. We also need functions that do the same for fixed-width encodings (16-bit, 32-bit).
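A minimal sketch of the byte-level idea, written in Python for illustration only. The helper names read_until_bytes and read_until_units are hypothetical (they are not Julia functions), the input is assumed to be a buffered binary stream such as io.BytesIO, and including the delimiter in the result is an assumed design choice. Nothing is decoded, so invalid data comes back unaltered; the fixed-width variant compares whole 16- or 32-bit code units so a delimiter byte is never matched inside a larger unit.

```python
import io


def read_until_bytes(stream, delim):
    """Hypothetical helper: read raw bytes up to and including the byte `delim`.

    No decoding is attempted, so bytes that are invalid in any encoding
    are returned exactly as they appeared in the stream.
    """
    buf = bytearray()
    while True:
        b = stream.read(1)
        if not b:              # end of stream: return whatever was read
            break
        buf += b
        if b[0] == delim:      # stop right after the delimiter byte
            break
    return bytes(buf)


def read_until_units(stream, delim, width, byteorder="little"):
    """Hypothetical fixed-width variant: `width` is 2 (16-bit) or 4 (32-bit).

    Each code unit is compared as a whole integer against `delim`, so a
    delimiter value is never matched across or inside a unit boundary.
    """
    buf = bytearray()
    while True:
        unit = stream.read(width)
        buf += unit
        if len(unit) < width:  # end of stream; a trailing partial unit is kept as-is
            break
        if int.from_bytes(unit, byteorder) == delim:
            break
    return bytes(buf)


# Usage: 0xE9 on its own is not valid UTF-8, but the raw bytes survive.
data = io.BytesIO(b"caf\xe9\nrest")
print(read_until_bytes(data, ord("\n")))  # b'caf\xe9\n'
```

The byte-level version would stop too early on UTF-16 input containing U+010A (encoded little-endian as 0A 01), which is why a separate unit-aware routine seems necessary for fixed-width encodings.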

I don't think reading a Char can be done reasonably without knowing the encoding. The best immediate change I can think of is to raise an error on invalid data when reading a UTF-8-encoded Char.
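A minimal sketch of the proposed "error on invalid data" behavior for reading one UTF-8 character, again in Python and assuming a buffered binary stream. The name read_utf8_char is hypothetical, and raising ValueError (and returning None at end of stream) are assumed conventions. The lead byte determines the expected sequence length; Python's strict UTF-8 decoder then rejects bad continuation bytes, overlong forms, and surrogates.

```python
import io


def read_utf8_char(stream):
    """Hypothetical helper: read one UTF-8 character from a binary stream.

    Returns None at end of stream and raises ValueError on invalid or
    truncated data instead of producing a bogus character.
    """
    lead = stream.read(1)
    if not lead:
        return None
    b0 = lead[0]
    # Expected sequence length from the lead byte (RFC 3629 ranges).
    if b0 < 0x80:
        n = 1
    elif 0xC2 <= b0 <= 0xDF:
        n = 2
    elif 0xE0 <= b0 <= 0xEF:
        n = 3
    elif 0xF0 <= b0 <= 0xF4:
        n = 4
    else:
        # Stray continuation byte, overlong lead (C0/C1), or above U+10FFFF (F5..FF).
        raise ValueError(f"invalid UTF-8 lead byte 0x{b0:02X}")
    rest = stream.read(n - 1)
    if len(rest) < n - 1:
        raise ValueError("truncated UTF-8 sequence at end of stream")
    try:
        # The strict decoder catches what the lead byte alone cannot:
        # bad continuation bytes, overlong 3/4-byte forms, and surrogates.
        return (lead + rest).decode("utf-8", errors="strict")
    except UnicodeDecodeError as e:
        raise ValueError(f"invalid UTF-8 sequence {(lead + rest)!r}") from e


# Usage: two valid characters followed by an invalid byte.
stream = io.BytesIO("h\u00e9".encode("utf-8") + b"\xff")
print(read_utf8_char(stream))  # h
print(read_utf8_char(stream))  # é
try:
    read_utf8_char(stream)
except ValueError as err:
    print("error:", err)       # error: invalid UTF-8 lead byte 0xFF
```

One design choice to note: when a continuation byte is invalid, this sketch has already consumed the whole expected sequence, so the caller cannot resynchronize at the offending byte. A stricter reader could instead inspect continuation bytes one at a time and stop at the first bad one.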

Metadata

Assignees: No one assigned
Labels: needs decision (A decision on this change is needed)