diff --git a/Documentation/Evolution/RegexSyntax.md b/Documentation/Evolution/RegexSyntaxRunTimeConstruction.md similarity index 92% rename from Documentation/Evolution/RegexSyntax.md rename to Documentation/Evolution/RegexSyntaxRunTimeConstruction.md index 97e3b45da..d0e04a1f7 100644 --- a/Documentation/Evolution/RegexSyntax.md +++ b/Documentation/Evolution/RegexSyntaxRunTimeConstruction.md @@ -2,7 +2,7 @@ Hello, we want to issue an update to [Regular Expression Literals](https://forums.swift.org/t/pitch-regular-expression-literals/52820) and prepare for a formal proposal. The great delimiter deliberation continues to unfold, so in the meantime, we have a significant amount of surface area to present for review/feedback: the syntax _inside_ a regex literal. Additionally, this is the syntax accepted from a string used for run-time regex construction, so we're devoting an entire pitch/proposal to the topic of _regex syntax_, distinct from the result builder DSL or the choice of delimiters for literals. --> -# Run-time Regex Construction +# Regex Syntax and Run-time Construction - Authors: [Hamish Knight](https://github.com/hamishknight), [Michael Ilseman](https://github.com/milseman) @@ -16,7 +16,7 @@ The overall story is laid out in [Regex Type and Overview][overview] and each in Swift aims to be a pragmatic programming language, striking a balance between familiarity, interoperability, and advancing the art. Swift's `String` presents a uniquely Unicode-forward model of string, but currently suffers from limited processing facilities. -`NSRegularExpression` can construct a processing pipeline from a string containing [ICU regular expression syntax][icu-syntax]. However, it is inherently tied to ICU's engine and thus it operates over a fundamentally different model of string than Swift's `String`. It is also limited in features and carries a fair amount of Objective-C baggage. 
+`NSRegularExpression` can construct a processing pipeline from a string containing [ICU regular expression syntax][icu-syntax]. However, it is inherently tied to ICU's engine and thus it operates over a fundamentally different model of string than Swift's `String`. It is also limited in features and carries a fair amount of Objective-C baggage, such as the need to translate between `NSRange` and `Range`. ```swift let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"# @@ -42,7 +42,7 @@ func processEntry(_ line: String) -> Transaction? { } ``` -Fixing these fundamental limitations requires migrating to a completely different engine and type system representation. This is the path we're proposing with `Regex`, outlined in [Regex Type and Overview][overview]. Details on the semantic mismatch between ICU and Swift's `String` is discussed in [Unicode for String Processing][pitches]. +Fixing these fundamental limitations requires migrating to a completely different engine and type system representation. This is the path we're proposing with `Regex`, outlined in [Regex Type and Overview][overview]. Details on the semantic differences between ICU's string model and Swift's `String` are discussed in [Unicode for String Processing][pitches]. Run-time construction is important for tools and editors. For example, SwiftPM allows the user to provide a regular expression to filter tests via `swift test --filter`. @@ -60,7 +60,6 @@ let regex: Regex<(Substring, Substring, Substring, Substring, Substring)> = try! Regex(compiling: pattern) ``` - ### Syntax We propose accepting a syntactic "superset" of the following existing regular expression engines: @@ -80,11 +79,87 @@ Regex syntax will be part of Swift's source-compatibility story as well as its b ## Detailed Design - +We propose initializers to declare and compile a regex from syntax. Upon failure, these initializers throw compilation errors, such as for syntax or type errors. 
API for retrieving error information is future work. + +```swift +extension Regex { + /// Parse and compile `pattern`, resulting in a strongly-typed capture list. + public init(compiling pattern: String, as: Output.Type = Output.self) throws +} +extension Regex where Output == AnyRegexOutput { + /// Parse and compile `pattern`, resulting in an existentially-typed capture list. + public init(compiling pattern: String) throws +} +``` + +We propose `AnyRegexOutput` for capture types not known at compilation time, alongside casting API to convert to a strongly-typed capture list. + +```swift +/// A type-erased regex output +public struct AnyRegexOutput { + /// Creates a type-erased regex output from an existing output. + /// + /// Use this initializer to fit a regex with strongly typed captures into the + /// use site of a dynamic regex, i.e. one that was created from a string. + public init<Output>(_ match: Regex<Output>.Match) + + /// Returns a typed output by converting the underlying value to the specified + /// type. + /// + /// - Parameter type: The expected output type. + /// - Returns: The output, if the underlying value can be converted to the + /// output type, or nil otherwise. + public func `as`<Output>(_ type: Output.Type) -> Output? +} +extension AnyRegexOutput: RandomAccessCollection { + public struct Element { + /// The range over which a value was captured. `nil` for no-capture. + public var range: Range<String.Index>? + + /// The slice of the input over which a value was captured. `nil` for no-capture. + public var substring: Substring? + + /// The captured value. `nil` for no-capture. + public var value: Any? + } + + // Trivial collection conformance requirements -We propose the following syntax for regex. 
+ public var startIndex: Int { get } + + public var endIndex: Int { get } + + public var count: Int { get } + + public func index(after i: Int) -> Int + + public func index(before i: Int) -> Int + + public subscript(position: Int) -> Element +} +``` + +We propose adding an API to `Regex.Match` to cast the output type to a concrete one. A regex match will lazily create a `Substring` on demand, so casting the match itself saves ARC traffic compared with extracting and casting the output. + +```swift +extension Regex.Match where Output == AnyRegexOutput { + /// Creates a type-erased regex match from an existing match. + /// + /// Use this initializer to fit a regex match with strongly typed captures into the + /// use site of a dynamic regex match, i.e. one that was created from a string. + public init<Output>(_ match: Regex<Output>.Match) + + /// Returns a typed match by converting the underlying values to the specified + /// types. + /// + /// - Parameter type: The expected output type. + /// - Returns: A match generic over the output type if the underlying values can be converted to the + /// output type. Returns `nil` otherwise. + public func `as`<Output>(_ type: Output.Type) -> Regex<Output>.Match? +} +``` + +The rest of this proposal will be a detailed and exhaustive definition of our proposed regex syntax.
Grammar Notation @@ -856,6 +931,12 @@ We are deferring runtime support for callouts from regex literals as future work ## Alternatives Considered +### Failable inits + +There are many ways for compilation to fail, from syntactic errors to unsupported features to type mismatches. In the general case, run-time compilation errors are not recoverable by a tool without modifying the user's input. Even then, the thrown errors contain valuable information as to why compilation failed. For example, SwiftPM presents any errors directly to the user. + +As proposed, the errors thrown will be the same errors presented to the Swift compiler, tracking fine-grained source locations with specific reasons why compilation failed. Defining a rich error API is future work, as these errors are rapidly evolving and it is too early to lock in the ABI. + ### Skip the syntax diff --git a/Sources/_StringProcessing/Regex/AnyRegexOutput.swift b/Sources/_StringProcessing/Regex/AnyRegexOutput.swift index cac0e46c3..bd0fc47c9 100644 --- a/Sources/_StringProcessing/Regex/AnyRegexOutput.swift +++ b/Sources/_StringProcessing/Regex/AnyRegexOutput.swift @@ -37,6 +37,7 @@ extension Regex.Match where Output == AnyRegexOutput { } } +/// A type-erased regex output public struct AnyRegexOutput { let input: String fileprivate let _elements: [ElementRepresentation] @@ -70,6 +71,7 @@ extension AnyRegexOutput { /// Returns a typed output by converting the underlying value to the specified /// type. + /// /// - Parameter type: The expected output type. /// - Returns: The output, if the underlying value can be converted to the /// output type, or nil otherwise. @@ -119,13 +121,20 @@ extension AnyRegexOutput: RandomAccessCollection { fileprivate let representation: ElementRepresentation let input: String + /// The range over which a value was captured. `nil` for no-capture. public var range: Range<String.Index>? { representation.bounds } + /// The slice of the input over which a value was captured. `nil` for no-capture. 
public var substring: Substring? { range.map { input[$0] } } + + /// The captured value. `nil` for no-capture. + public var value: Any? { + fatalError("FIXME: Not implemented") + } } public var startIndex: Int { @@ -152,3 +161,23 @@ extension AnyRegexOutput: RandomAccessCollection { .init(representation: _elements[position], input: input) } } + +extension Regex.Match where Output == AnyRegexOutput { + /// Creates a type-erased regex match from an existing match. + /// + /// Use this initializer to fit a regex match with strongly typed captures into the + /// use site of a dynamic regex match, i.e. one that was created from a string. + public init<Output>(_ match: Regex<Output>.Match) { + fatalError("FIXME: Not implemented") + } + + /// Returns a typed match by converting the underlying values to the specified + /// types. + /// + /// - Parameter type: The expected output type. + /// - Returns: A match generic over the output type if the underlying values can be converted to the + /// output type. Returns `nil` otherwise. + public func `as`<Output>(_ type: Output.Type) -> Regex<Output>.Match? { + fatalError("FIXME: Not implemented") + } +}