Improved handling of strings and unicode #234

@ddevault

Treating []u8 as strings is incorrect. []u8 is an array of octets, not an array of characters. Zig should support Unicode more explicitly and enforce the distinction between []u8 and str in the language and standard library.

I propose adding a rune type, which holds one Unicode codepoint. The underlying storage mechanism isn't relevant to the programmer, who can only assume it's an int capable of holding any Unicode codepoint. On platforms whose pointers are sufficiently sized, it should probably be a usize under the covers. I also propose adding a str type, which is opaque but offers length and indexing in runes. The underlying string encoding is also not important to the programmer, but some possible strategies include always using UTF-8 or UTF-32 (UCS-4), or upgrading the encoding as necessary to fit the runes the user attempts to place in it.
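To make the proposed semantics concrete, here is a rough sketch in Python, whose built-in `str` and `bytes` types happen to draw a similar line. It is only an analogy: Python's `str` stands in for the proposed `str`, `bytes` stands in for `[]u8`, and an `int` from `ord()` stands in for a rune. None of these names are Zig APIs.

```python
# Analogy only: Python str ~ proposed `str`, bytes ~ `[]u8`, ord() result ~ `rune`.
text = "na\u00efve \U0001F642"   # "naïve" followed by a space and an emoji

# Length and indexing on the string are in codepoints, not bytes.
rune_count = len(text)               # 7 codepoints
third_rune = ord(text[2])            # U+00EF, the whole 'ï'

# The octet view only exists after an explicit encoding step.
octets = text.encode("utf-8")
byte_count = len(octets)             # 11 bytes in UTF-8
third_byte = octets[2]               # 0xC3, only the first half of 'ï'

# The largest Unicode codepoint is U+10FFFF, so a rune needs at least 21 bits.
max_rune_bits = (0x10FFFF).bit_length()

print(rune_count, byte_count, hex(third_rune), hex(third_byte), max_rune_bits)
```

Indexing the octets at position 2 lands in the middle of a character, which is the kind of mistake the str/[]u8 distinction is meant to rule out.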

The standard library should also provide functions for manipulating strings separately from []u8, along with helper functions to convert str to []u8 and back again in arbitrary encodings.
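As a sketch of what such conversion helpers might look like, again using Python as a stand-in: the function names `str_to_octets` and `octets_to_str` are hypothetical and only illustrate the shape of the API, with the encoding passed explicitly at every boundary between text and octets.

```python
# Hypothetical helper names; Python's str.encode / bytes.decode do the real work.
def str_to_octets(text: str, encoding: str) -> bytes:
    """Convert text to octets in the named encoding."""
    return text.encode(encoding)


def octets_to_str(data: bytes, encoding: str) -> str:
    """Convert octets back to text; raises UnicodeDecodeError on invalid input."""
    return data.decode(encoding)


sample = "h\u00e9llo"  # "héllo", 5 codepoints

utf8 = str_to_octets(sample, "utf-8")       # 'é' takes 2 bytes
utf16 = str_to_octets(sample, "utf-16-le")  # 2 bytes per codepoint here
utf32 = str_to_octets(sample, "utf-32-le")  # 4 bytes per codepoint

# The same text has a different octet length in each encoding.
sizes = (len(utf8), len(utf16), len(utf32))

# Octets that are not valid in the requested encoding are rejected, not guessed at.
try:
    octets_to_str(b"\xff", "utf-8")
    invalid_rejected = False
except UnicodeDecodeError:
    invalid_rejected = True
```

Because the conversion can fail, the decode direction would need to return an error in a Zig design as well; how that is surfaced is left open here.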


Labels: proposal (This issue suggests modifications. If it also has the "accepted" label then it is planned.)
