4 files changed: +9 -8 lines changed
@@ -18,7 +18,8 @@ representable in a given `AbstractChar` type.
 Internally, an `AbstractChar` type may use a variety of encodings. Conversion
 to `UInt32` will not reveal this encoding because it always returns the
 Unicode value of the character. (Typically, the raw encoding can be obtained
-via [`reinterpret`](@ref).)
+via [`reinterpret`](@ref).) Character I/O uses UTF-8 by default for all
+character types, regardless of their internal encoding.
 """
 AbstractChar

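Review note: the docstring above distinguishes the Unicode value of a character from its raw encoding and from what character I/O writes. A minimal sketch of that distinction for the built-in `Char`, assuming a current Julia 1.x release (the variable names are illustrative only):

```julia
c = '\u00e9'                      # LATIN SMALL LETTER E WITH ACUTE

# Conversion to UInt32 yields the Unicode code point, not the stored bits.
cp = UInt32(c)

# reinterpret exposes the raw bits; Char keeps its UTF-8 bytes left-aligned.
raw = reinterpret(UInt32, c)

# Character I/O emits UTF-8: this character takes two bytes.
io = IOBuffer()
nbytes = write(io, c)
bytes = take!(io)

println((cp, raw, nbytes, bytes))
```

For a different `AbstractChar` subtype the `raw` value would depend on that type's internal layout, while `cp` and the bytes written would be expected to stay the same.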
@@ -148,8 +149,7 @@ hash(x::Char, h::UInt) =
 # fallbacks:
 isless(x::AbstractChar, y::AbstractChar) = isless(Char(x), Char(y))
 ==(x::AbstractChar, y::AbstractChar) = Char(x) == Char(y)
-hash(x::AbstractChar, h::UInt) =
-    hash_uint64(((UInt32(x) + UInt64(0xd060fad0)) << 32) ⊻ UInt64(h))
+hash(x::AbstractChar, h::UInt) = hash(Char(x), h)
 widen(::Type{T}) where {T<:AbstractChar} = T

 -(x::AbstractChar, y::AbstractChar) = Int(x) - Int(y)
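Review note: routing the `hash` fallback through `Char(x)` keeps it consistent with the `==` fallback, so a custom character type that compares equal to a `Char` should also hash like it. The sketch below assumes a current Julia 1.x release, where a new `AbstractChar` subtype supplies `codepoint` and a `UInt32` constructor; `ASCIIChar` is a hypothetical type invented for illustration and is not part of this change.

```julia
# Hypothetical one-byte character type (illustration only).
struct ASCIIChar <: AbstractChar
    byte::UInt8
    ASCIIChar(byte::UInt8) = new(byte)
end

# Minimal interface assumed for Julia 1.x: a UInt32 constructor and codepoint.
ASCIIChar(cp::UInt32) = ASCIIChar(UInt8(cp))
Base.codepoint(c::ASCIIChar) = UInt32(c.byte)

a = ASCIIChar('a')

same_value = a == 'a'               # == fallback compares Char(a) with 'a'
same_hash = hash(a) == hash('a')    # hash fallback delegates to hash(Char(a), h)
found = a in Set(['a'])             # lookup relies on both agreeing

println((same_value, same_hash, found))
```

With equality and hashing in agreement, values of different character types can share keys in a `Dict` or `Set`.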
@@ -14,8 +14,8 @@ about strings:
 * String indexing is done in terms of these code units:
   * Characters are extracted by `s[i]` with a valid string index `i`
   * Each `AbstractChar` in a string is encoded by one or more code units
-  * Only the index of the first code unit of a `AbstractChar` is a valid index
-  * The encoding of a `AbstractChar` is independent of what precedes or follows it
+  * Only the index of the first code unit of an `AbstractChar` is a valid index
+  * The encoding of an `AbstractChar` is independent of what precedes or follows it
   * String encodings are [self-synchronizing] – i.e. `isvalid(s, i)` is O(1)

 [self-synchronizing]: https://en.wikipedia.org/wiki/Self-synchronizing_code
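Review note: the indexing rules listed above can be checked on the built-in UTF-8 `String`. A small sketch, assuming a current Julia 1.x release; the sample string is arbitrary:

```julia
s = "αβγ"                        # each Greek letter takes two UTF-8 code units

n = ncodeunits(s)                # total number of code units
valid = collect(eachindex(s))    # indices of the first code unit of each character

first_ok = isvalid(s, 1)         # start of 'α'
second_ok = isvalid(s, 2)        # continuation byte, not a valid index
second_char = s[3]               # character starting at index 3
next_index = nextind(s, 1)       # next valid index after 1

println((n, valid, first_ok, second_ok, second_char, next_index))
```

Indexing at a position that is not the first code unit of a character, such as `s[2]` here, throws a `StringIndexError` instead of returning a partial character.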
@@ -410,7 +410,7 @@ If `count` is provided, replace at most `count` occurrences.
 or a regular expression.
 If `r` is a function, each occurrence is replaced with `r(s)`
 where `s` is the matched substring (when `pat`is a `Regex` or `AbstractString`) or
-character (when `pat` is a `AbstractChar` or a collection of `AbstractChar`).
+character (when `pat` is an `AbstractChar` or a collection of `AbstractChar`).
 If `pat` is a regular expression and `r` is a `SubstitutionString`, then capture group
 references in `r` are replaced with the corresponding matched text.
 To remove instances of `pat` from `string`, set `r` to the empty `String` (`""`).
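Review note: the cases this docstring describes can be illustrated as follows. The sketch assumes the `pat => r` pair form of `replace` from Julia 1.x, which may differ from the call signature in the revision this hunk was taken from; the input strings are arbitrary.

```julia
# Function replacement with a string pattern: r receives the matched substring.
upper_sub = replace("hello world", "o" => uppercase)

# Function replacement with a character pattern: r receives the character.
upper_chr = replace("hello", 'l' => uppercase)

# Limit the number of replacements with the count keyword.
limited = replace("aaaa", 'a' => 'b'; count=2)

# Regex with a SubstitutionString: \1 and \2 refer to capture groups.
swapped = replace("2018-02", r"(\d+)-(\d+)" => s"\2/\1")

# Remove every occurrence by replacing with the empty string.
removed = replace("banana", "an" => "")

println((upper_sub, upper_chr, limited, swapped, removed))
```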
@@ -28,8 +28,9 @@ There are a few noteworthy high-level features about Julia's strings:
     additional `AbstractString` subtypes (e.g. for other encodings). If you define a function expecting
     a string argument, you should declare the type as `AbstractString` in order to accept any string
     type.
-  * Like C and Java, but unlike most dynamic languages, Julia has a first-class type representing
-    a single character, called `AbstractChar`. This is just a special kind of 32-bit primitive type whose numeric value represents a Unicode code point.
+  * Like C and Java, but unlike most dynamic languages, Julia has a first-class type for representing
+    a single character, called `AbstractChar`. The built-in `Char` subtype of `AbstractChar`
+    is a 32-bit primitive type that can represent any Unicode character.
   * As in Java, strings are immutable: the value of an `AbstractString` object cannot be changed.
     To construct a different string value, you construct a new string from parts of other strings.
   * Conceptually, a string is a *partial function* from indices to characters: for some index values,