diff --git a/Documentation/Evolution/ProposalOverview.md b/Documentation/Evolution/ProposalOverview.md
new file mode 100644
index 000000000..24ac0301c
--- /dev/null
+++ b/Documentation/Evolution/ProposalOverview.md
@@ -0,0 +1,55 @@
+
+# Regex Proposals
+
+## Regex Type and Overview
+
+- [Pitch](https://forums.swift.org/t/pitch-regex-type-and-overview/56029)
+- Proposal: To-be-scheduled
+
+Presents basic Regex type and gives an overview of how everything fits into the overall story
+
+
+## Regex Builder DSL
+
+- [Pitch thread](https://forums.swift.org/t/pitch-regex-builder-dsl/56007)
+
+Covers the result builder approach and basic API.
+
+
+## Run-time Regex Construction
+
+- Pitch thread: [Regex Syntax](https://forums.swift.org/t/pitch-regex-syntax/55711)
+    + Brief: Syntactic superset of PCRE2, Oniguruma, ICU, UTS\#18, etc.
+
+Covers the "interior" syntax, extended syntaxes, run-time construction of a regex from a string, and details of `AnyRegexOutput`.
+
+Note: The above pitch drills into the syntax, the revised pitch including two initializers and existential details is still under development.
+
+## Regex Literals
+
+- [Draft](https://github.com/apple/swift-experimental-string-processing/pull/187)
+- (Old) original pitch:
+    + [Thread](https://forums.swift.org/t/pitch-regular-expression-literals/52820)
+    + [Update](https://forums.swift.org/t/pitch-regular-expression-literals/52820/90)
+
+
+## String processing algorithms
+
+- [Pitch thread](https://forums.swift.org/t/pitch-regex-powered-string-processing-algorithms/55969)
+
+Proposes a slew of Regex-powered algorithms.
+
+Introduces `CustomMatchingRegexComponent`, which is a monadic-parser style interface for external parsers to be used as components of a regex.
+
+## Unicode for String Processing
+
+- Draft: TBD
+- (Old) [Character class definitions](https://forums.swift.org/t/pitch-character-classes-for-string-processing/52920)
+
+Covers three topics:
+
+- Proposes literal and DSL API for library-defined character classes, Unicode scripts and properties, and custom character classes.
+- Proposes literal and DSL API for options that affect matching behavior.
+- Defines how Unicode scalar-based classes are extended to grapheme clusters in the different semantic and other matching modes.
+
+
diff --git a/Documentation/Evolution/RegexSyntax.md b/Documentation/Evolution/RegexSyntax.md
index faa327176..9f4c6e8a0 100644
--- a/Documentation/Evolution/RegexSyntax.md
+++ b/Documentation/Evolution/RegexSyntax.md
@@ -10,27 +10,56 @@ Hello, we want to issue an update to [Regular Expression Literals](https://forum
 A regex declares a string processing algorithm using syntax familiar across a variety of languages and tools throughout programming history. We propose the ability to create a regex at run time from a string containing regex syntax (detailed here), API for accessing the match and captures, and a means to convert between an existential capture representation and concrete types.
 
-The overall story is laid out in [Regex Type and Overview](https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/RegexTypeOverview.md) and each individual component is tracked in [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107).
+The overall story is laid out in [Regex Type and Overview][overview] and each individual component is tracked in [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107).
 
 ## Motivation
 
 Swift aims to be a pragmatic programming language, striking a balance between familiarity, interoperability, and advancing the art. Swift's `String` presents a uniquely Unicode-forward model of string, but currently suffers from limited processing facilities.
 
-
+`NSRegularExpression` can construct a processing pipeline from a string containing [ICU regular expression syntax][icu-syntax]. However, it is inherently tied to ICU's engine and thus it operates over a fundamentally different model of string than Swift's `String`. It is also limited in features and carries a fair amount of Objective-C baggage.
+
+```swift
+let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
+let nsRegEx = try! NSRegularExpression(pattern: pattern)
-The full string processing effort includes a regex type with strongly typed captures, the ability to create a regex from a string at runtime, a compile-time literal, a result builder DSL, protocols for intermixing 3rd party industrial-strength parsers with regex declarations, and a slew of regex-powered algorithms over strings.
+func processEntry(_ line: String) -> Transaction? {
+  let range = NSRange(line.startIndex..
+We propose run-time construction of `Regex` from a best-in-class treatment of familiar regular expression syntax. A `Regex` is generic over its `Output`, which includes capture information. This may be an existential `AnyRegexOutput`, or a concrete type provided by the user.
+
+```swift
+let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
+let regex = try! Regex(compiling: pattern)
+// regex: Regex
+
+let regex: Regex<(Substring, Substring, Substring, Substring, Substring)> =
+  try! Regex(compiling: pattern)
+```
+
 ### Syntax
@@ -866,3 +895,9 @@ This proposal regards _syntactic_ support, and does not necessarily mean that ev
 [unicode-scripts]: https://www.unicode.org/reports/tr24/#Script
 [unicode-script-extensions]: https://www.unicode.org/reports/tr24/#Script_Extensions
 [balancing-groups]: https://docs.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions#balancing-group-definitions
+[overview]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/RegexTypeOverview.md
+[pitches]: https://github.com/apple/swift-experimental-string-processing/issues/107
+
+
+
+
diff --git a/Documentation/Evolution/RegexTypeOverview.md b/Documentation/Evolution/RegexTypeOverview.md
index 504111181..55e9963cc 100644
--- a/Documentation/Evolution/RegexTypeOverview.md
+++ b/Documentation/Evolution/RegexTypeOverview.md
@@ -14,7 +14,7 @@ We propose addressing this basic shortcoming through an effort we are calling re
 3. A literal for compile-time construction of a regex with statically-typed captures, enabling powerful source tools.
 4. An expressive and composable result-builder DSL, with support for capturing strongly-typed values.
 5. A modern treatment of Unicode semantics and string processing.
-6. A treasure trove of string processing algorithms, along with library-extensible protocols enabling industrial-strength parsers to be used seamlessly as regex components.
+6. A slew of regex-powered string processing algorithms, along with library-extensible protocols enabling industrial-strength parsers to be used seamlessly as regex components.
 
 This proposal provides details on \#1, the `Regex` type and captures, and gives an overview of how each of the other proposals fit into regex in Swift.