Improved handling of strings and unicode

Treating []u8 as a string is incorrect. A []u8 is a slice of octets, not a sequence of characters. Zig should support Unicode more explicitly and enforce the distinction between []u8 and a dedicated string type (str, proposed below) in both the language and the standard library.
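The distinction can be made concrete with a small sketch. It is written in Rust only as an analogy, because Rust already separates octets (`as_bytes`) from code points (`chars`). The same text has a different length depending on which one you count, and cutting the octets at an arbitrary index can split a character in half.

```rust
// Analogy in Rust: `str` is stored as UTF-8 octets, `char` is one code point.

/// Number of octets in the UTF-8 encoding of `s`.
fn byte_len(s: &str) -> usize {
    s.as_bytes().len()
}

/// Number of Unicode code points in `s`.
fn codepoint_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let word = "naïve"; // 'ï' is U+00EF, which takes two octets in UTF-8
    println!("octets: {}", byte_len(word)); // octets: 6
    println!("code points: {}", codepoint_len(word)); // code points: 5

    // Taking the first three octets keeps only the first half of 'ï',
    // so the result is no longer valid UTF-8.
    let prefix = &word.as_bytes()[..3];
    println!("valid UTF-8: {}", std::str::from_utf8(prefix).is_ok()); // valid UTF-8: false
}
```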

I propose adding a rune type, which holds one Unicode code point. Its underlying storage is not relevant to the programmer, who may only assume it is an integer wide enough to hold any code point (21 bits suffice, since the highest code point is U+10FFFF). On platforms with sufficiently large pointers, it could simply be a usize under the covers.

I also propose adding an opaque str type that reports its length in runes and supports indexing by rune. Its underlying encoding is likewise hidden from the programmer. Possible strategies include always using UTF-8, always using UTF-32 (UCS-4), or upgrading the encoding as needed to fit the runes the user stores in it.
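A minimal sketch of the proposed semantics follows, again in Rust as a stand-in for Zig. `Rune` and `Str` are hypothetical names mirroring this proposal. The fixed-width UTF-32 storage is just one of the strategies listed above, chosen here because it makes `len` and `at` O(1). Callers only ever see runes, never the encoding.

```rust
use std::str::Utf8Error;

/// Hypothetical `Rune`: one Unicode scalar value. Rust's `char` already
/// rejects surrogates and values above U+10FFFF.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rune(char);

impl Rune {
    /// Returns None if `cp` is a surrogate or exceeds U+10FFFF.
    pub fn from_u32(cp: u32) -> Option<Rune> {
        char::from_u32(cp).map(Rune)
    }

    /// The numeric code point value.
    pub fn value(self) -> u32 {
        self.0 as u32
    }
}

/// Hypothetical opaque `Str` using fixed-width (UTF-32) storage.
/// The field is private, so callers cannot depend on the encoding.
pub struct Str {
    runes: Vec<Rune>,
}

impl Str {
    /// Builds a `Str` from UTF-8 octets, failing on invalid input.
    pub fn from_utf8(bytes: &[u8]) -> Result<Str, Utf8Error> {
        let s = std::str::from_utf8(bytes)?;
        Ok(Str { runes: s.chars().map(Rune).collect() })
    }

    /// Length in runes, not octets.
    pub fn len(&self) -> usize {
        self.runes.len()
    }

    pub fn is_empty(&self) -> bool {
        self.runes.is_empty()
    }

    /// The rune at index `i`, or None if out of range.
    pub fn at(&self, i: usize) -> Option<Rune> {
        self.runes.get(i).copied()
    }
}

fn main() {
    let s = Str::from_utf8("día".as_bytes()).unwrap();
    println!("{} runes", s.len()); // 3 runes
    println!("{:?}", s.at(1).map(Rune::value)); // Some(237), i.e. U+00ED 'í'
}
```

A variable-width representation (UTF-8, or upgrading on demand) would save memory at the cost of O(n) indexing or an auxiliary index. Either way, the public surface stays the same.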

The standard library should also provide functions for manipulating str values separately from []u8, along with helpers to convert between str and []u8 in arbitrary encodings.
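Conversion helpers of this kind might look like the following sketch. The `Encoding` enum, `encode`, and `decode` are hypothetical names, and only two encodings are shown. Rust's `String` stands in for the proposed str. Decoding returns `None` when the bytes are not valid in the requested encoding, so malformed input never silently becomes a string.

```rust
/// Hypothetical set of supported byte encodings.
#[derive(Clone, Copy, Debug)]
enum Encoding {
    Utf8,
    Utf16Le,
}

/// Converts text to octets in the requested encoding.
fn encode(s: &str, enc: Encoding) -> Vec<u8> {
    match enc {
        Encoding::Utf8 => s.as_bytes().to_vec(),
        // Each UTF-16 code unit becomes two little-endian octets.
        Encoding::Utf16Le => s.encode_utf16().flat_map(|u| u.to_le_bytes()).collect(),
    }
}

/// Converts octets back to text, or returns None if they are malformed.
fn decode(bytes: &[u8], enc: Encoding) -> Option<String> {
    match enc {
        Encoding::Utf8 => String::from_utf8(bytes.to_vec()).ok(),
        Encoding::Utf16Le => {
            if bytes.len() % 2 != 0 {
                return None; // a dangling half of a code unit
            }
            let units: Vec<u16> = bytes
                .chunks_exact(2)
                .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
                .collect();
            String::from_utf16(&units).ok() // rejects unpaired surrogates
        }
    }
}

fn main() {
    let text = "€5"; // U+20AC followed by '5'
    println!("{:x?}", encode(text, Encoding::Utf8)); // [e2, 82, ac, 35]
    println!("{:x?}", encode(text, Encoding::Utf16Le)); // [ac, 20, 35, 0]
    let round_trip = decode(&encode(text, Encoding::Utf16Le), Encoding::Utf16Le);
    println!("{:?}", round_trip); // Some("€5")
}
```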

Improved handling of strings and unicode #234

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Improved handling of strings and unicode #234

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions