From 5bf6f4514f91a12dbf407253770b4acafe1519e3 Mon Sep 17 00:00:00 2001 From: Michael Ilseman Date: Fri, 22 Apr 2022 08:42:43 -0600 Subject: [PATCH 1/2] Rename custom match prefix protocol and add doc comments --- Documentation/Evolution/ProposalOverview.md | 2 +- Documentation/Evolution/RegexTypeOverview.md | 10 ++--- .../Evolution/StringProcessingAlgorithms.md | 21 +++++----- .../Regex/CustomComponents.swift | 39 +++++++++++++++++++ .../Regex/DSLConsumers.swift | 29 -------------- Tests/RegexBuilderTests/CustomTests.swift | 12 +++--- Tests/RegexBuilderTests/RegexDSLTests.swift | 4 +- 7 files changed, 64 insertions(+), 53 deletions(-) create mode 100644 Sources/_StringProcessing/Regex/CustomComponents.swift delete mode 100644 Sources/_StringProcessing/Regex/DSLConsumers.swift diff --git a/Documentation/Evolution/ProposalOverview.md b/Documentation/Evolution/ProposalOverview.md index 4346932b5..5b3fb99db 100644 --- a/Documentation/Evolution/ProposalOverview.md +++ b/Documentation/Evolution/ProposalOverview.md @@ -39,7 +39,7 @@ Covers the "interior" syntax, extended syntaxes, run-time construction of a rege Proposes a slew of Regex-powered algorithms. -Introduces `CustomMatchingRegexComponent`, which is a monadic-parser style interface for external parsers to be used as components of a regex. +Introduces `CustomPrefixMatchRegexComponent`, which is a monadic-parser style interface for external parsers to be used as components of a regex. ## Unicode for String Processing diff --git a/Documentation/Evolution/RegexTypeOverview.md b/Documentation/Evolution/RegexTypeOverview.md index 6eed648f0..68dd6ccc7 100644 --- a/Documentation/Evolution/RegexTypeOverview.md +++ b/Documentation/Evolution/RegexTypeOverview.md @@ -231,7 +231,7 @@ The result builder allows for inline failable value construction, which particip Swift regexes describe an unambiguous algorithm, where choice is ordered and effects can be reliably observed. 
For example, a `print()` statement inside the `TryCapture`'s transform function will run whenever the overall algorithm naturally dictates an attempt should be made. Optimizations can only elide such calls if they can prove it is behavior-preserving (e.g. "pure"). -`CustomMatchingRegexComponent`, discussed in [String Processing Algorithms][pitches], allows industrial-strength parsers to be used a regex components. This allows us to drop the overly-permissive pre-parsing step: +`CustomPrefixMatchRegexComponent`, discussed in [String Processing Algorithms][pitches], allows industrial-strength parsers to be used as regex components. This allows us to drop the overly-permissive pre-parsing step: ```swift func processEntry(_ line: String) -> Transaction? { @@ -431,7 +431,7 @@ Regular expressions have a deservedly mixed reputation, owing to their historica * "Regular expressions are bad because you should use a real parser" - In other systems, you're either in or you're out, leading to a gravitational pull to stay in when... you should get out - - Our remedy is interoperability with real parsers via `CustomMatchingRegexComponent` + - Our remedy is interoperability with real parsers via `CustomPrefixMatchRegexComponent` - Literals with refactoring actions provide an incremental off-ramp from regex syntax to result builders and real parsers * "Regular expressions are bad because ugly unmaintainable syntax" - We propose literals with source tools support, allowing for better syntax highlighting and analysis @@ -516,7 +516,7 @@ Regex are compiled into an intermediary representation and fairly simple analysi ### Future work: parser combinators -What we propose here is an incremental step towards better parsing support in Swift using parser-combinator style libraries. The underlying execution engine supports recursive function calls and mechanisms for library extensibility.
`CustomMatchingRegexComponent`'s protocol requirement is effectively a [monadic parser](https://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf), meaning `Regex` provides a regex-flavored combinator-like system. +What we propose here is an incremental step towards better parsing support in Swift using parser-combinator style libraries. The underlying execution engine supports recursive function calls and mechanisms for library extensibility. `CustomPrefixMatchRegexComponent`'s protocol requirement is effectively a [monadic parser](https://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf), meaning `Regex` provides a regex-flavored combinator-like system. An issues with traditional parser combinator libraries are the compilation barriers between call-site and definition, resulting in excessive and overly-cautious backtracking traffic. These can be eliminated through better [compilation techniques](https://core.ac.uk/download/pdf/148008325.pdf). As mentioned above, Swift's support for custom static compilation is still under development. 
@@ -565,9 +565,9 @@ Regexes are often used for tokenization and tokens can be represented with Swift ### Future work: baked-in localized processing -- `CustomMatchingRegexComponent` gives an entry point for localized processors +- `CustomPrefixMatchRegexComponent` gives an entry point for localized processors - Future work includes (sub?)protocols to communicate localization intent --> -[pitches]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md \ No newline at end of file +[pitches]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md diff --git a/Documentation/Evolution/StringProcessingAlgorithms.md b/Documentation/Evolution/StringProcessingAlgorithms.md index 74416ae63..f461f1976 100644 --- a/Documentation/Evolution/StringProcessingAlgorithms.md +++ b/Documentation/Evolution/StringProcessingAlgorithms.md @@ -8,7 +8,7 @@ We propose: 1. New regex-powered algorithms over strings, bringing the standard library up to parity with scripting languages 2. Generic `Collection` equivalents of these algorithms in terms of subsequences -3. `protocol CustomMatchingRegexComponent`, which allows 3rd party libraries to provide their industrial-strength parsers as intermixable components of regexes +3. `protocol CustomPrefixMatchRegexComponent`, which allows 3rd party libraries to provide their industrial-strength parsers as intermixable components of regexes This proposal is part of a larger [regex-powered string processing initiative](https://forums.swift.org/t/declarative-string-processing-overview/52459). Throughout the document, we will reference the still-in-progress [`RegexProtocol`, `Regex`](https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/StronglyTypedCaptures.md), and result builder DSL, but these are in flux and not formally part of this proposal. 
Further discussion of regex specifics is out of scope of this proposal and better discussed in another thread (see [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107) for links to relevant threads). @@ -132,7 +132,7 @@ Parsing a currency string such as `$3,020.85` with regex is also tricky, as it c ### Complex string processing -We propose a `CustomMatchingRegexComponent` protocol which allows types from outside the standard library participate in regex builders and `RegexComponent` algorithms. This allows types, such as `Date.ParseStrategy` and `FloatingPointFormatStyle.Currency`, to be used directly within a regex: +We propose a `CustomPrefixMatchRegexComponent` protocol which allows types from outside the standard library to participate in regex builders and `RegexComponent` algorithms. This allows types, such as `Date.ParseStrategy` and `FloatingPointFormatStyle.Currency`, to be used directly within a regex: ```swift let dateRegex = Regex { @@ -169,22 +169,23 @@ We also propose the following regex-powered algorithms as well as their generic ## Detailed design -### `CustomMatchingRegexComponent` +### `CustomPrefixMatchRegexComponent` -`CustomMatchingRegexComponent` inherits from `RegexComponent` and satisfies its sole requirement; Conformers can be used with all of the string algorithms generic over `RegexComponent`. +`CustomPrefixMatchRegexComponent` inherits from `RegexComponent` and satisfies its sole requirement; Conformers can be used with all of the string algorithms generic over `RegexComponent`. ```swift -/// A protocol for custom match functionality. -public protocol CustomMatchingRegexComponent : RegexComponent { - /// Match the input string within the specified bounds, beginning at the given index, and return - /// the end position (upper bound) of the match and the matched instance.
+/// A protocol allowing custom types to function as regex components by +/// providing the raw functionality backing `prefixMatch`. +public protocol CustomPrefixMatchRegexComponent : RegexComponent { + /// Process the input string within the specified bounds, beginning at the given index, and return + /// the end position (upper bound) of the match and the produced output. /// - Parameters: /// - input: The string in which the match is performed. /// - index: An index of `input` at which to begin matching. /// - bounds: The bounds in `input` in which the match is performed. /// - Returns: The upper bound where the match terminates and a matched instance, or `nil` if /// there isn't a match. - func match( + func consuming( _ input: String, startingAt index: String.Index, in bounds: Range @@ -198,7 +199,7 @@ public protocol CustomMatchingRegexComponent : RegexComponent { We use Foundation `FloatingPointFormatStyle.Currency` as an example for protocol conformance. It would implement the `match` function with `Match` being a `Decimal`. It could also add a static function `.localizedCurrency(code:)` as a member of `RegexComponent`, so it can be referred as `.localizedCurrency(code:)` in the `Regex` result builder: ```swift -extension FloatingPointFormatStyle.Currency : CustomMatchingRegexComponent { +extension FloatingPointFormatStyle.Currency : CustomPrefixMatchRegexComponent { public func match( _ input: String, startingAt index: String.Index, diff --git a/Sources/_StringProcessing/Regex/CustomComponents.swift b/Sources/_StringProcessing/Regex/CustomComponents.swift new file mode 100644 index 000000000..e8111555c --- /dev/null +++ b/Sources/_StringProcessing/Regex/CustomComponents.swift @@ -0,0 +1,39 @@ +//===----------------------------------------------------------------------===// +// +// This source file is part of the Swift.org open source project +// +// Copyright (c) 2021-2022 Apple Inc. 
and the Swift project authors +// Licensed under Apache License v2.0 with Runtime Library Exception +// +// See https://swift.org/LICENSE.txt for license information +// +//===----------------------------------------------------------------------===// + +@available(SwiftStdlib 5.7, *) +/// A protocol allowing custom types to function as regex components by +/// providing the raw functionality backing `prefixMatch`. +public protocol CustomPrefixMatchRegexComponent: RegexComponent { + /// Process the input string within the specified bounds, beginning at the given index, and return + /// the end position (upper bound) of the match and the produced output. + /// - Parameters: + /// - input: The string in which the match is performed. + /// - index: An index of `input` at which to begin matching. + /// - bounds: The bounds in `input` in which the match is performed. + /// - Returns: The upper bound where the match terminates and a matched instance, or `nil` if + /// there isn't a match. + func consuming( + _ input: String, + startingAt index: String.Index, + in bounds: Range + ) throws -> (upperBound: String.Index, output: RegexOutput)? +} + +@available(SwiftStdlib 5.7, *) +extension CustomPrefixMatchRegexComponent { + public var regex: Regex { + let node: DSLTree.Node = .matcher(RegexOutput.self, { input, index, bounds in + try consuming(input, startingAt: index, in: bounds) + }) + return Regex(node: node) + } +} diff --git a/Sources/_StringProcessing/Regex/DSLConsumers.swift b/Sources/_StringProcessing/Regex/DSLConsumers.swift deleted file mode 100644 index eb8ace8d3..000000000 --- a/Sources/_StringProcessing/Regex/DSLConsumers.swift +++ /dev/null @@ -1,29 +0,0 @@ -//===----------------------------------------------------------------------===// -// -// This source file is part of the Swift.org open source project -// -// Copyright (c) 2021-2022 Apple Inc. 
and the Swift project authors -// Licensed under Apache License v2.0 with Runtime Library Exception -// -// See https://swift.org/LICENSE.txt for license information -// -//===----------------------------------------------------------------------===// - -@available(SwiftStdlib 5.7, *) -public protocol CustomMatchingRegexComponent: RegexComponent { - func match( - _ input: String, - startingAt index: String.Index, - in bounds: Range - ) throws -> (upperBound: String.Index, output: RegexOutput)? -} - -@available(SwiftStdlib 5.7, *) -extension CustomMatchingRegexComponent { - public var regex: Regex { - let node: DSLTree.Node = .matcher(RegexOutput.self, { input, index, bounds in - try match(input, startingAt: index, in: bounds) - }) - return Regex(node: node) - } -} diff --git a/Tests/RegexBuilderTests/CustomTests.swift b/Tests/RegexBuilderTests/CustomTests.swift index d17c3a142..269f9ebaa 100644 --- a/Tests/RegexBuilderTests/CustomTests.swift +++ b/Tests/RegexBuilderTests/CustomTests.swift @@ -14,13 +14,13 @@ import _StringProcessing @testable import RegexBuilder // A nibbler processes a single character from a string -private protocol Nibbler: CustomMatchingRegexComponent { +private protocol Nibbler: CustomPrefixMatchRegexComponent { func nibble(_: Character) -> RegexOutput? } extension Nibbler { // Default implementation, just feed the character in - func match( + func consuming( _ input: String, startingAt index: String.Index, in bounds: Range @@ -49,10 +49,10 @@ private struct Asciibbler: Nibbler { } } -private struct IntParser: CustomMatchingRegexComponent { +private struct IntParser: CustomPrefixMatchRegexComponent { struct ParseError: Error, Hashable {} typealias RegexOutput = Int - func match(_ input: String, + func consuming(_ input: String, startingAt index: String.Index, in bounds: Range ) throws -> (upperBound: String.Index, output: Int)? 
{ @@ -71,7 +71,7 @@ private struct IntParser: CustomMatchingRegexComponent { } } -private struct CurrencyParser: CustomMatchingRegexComponent { +private struct CurrencyParser: CustomPrefixMatchRegexComponent { enum Currency: String, Hashable { case usd = "USD" case ntd = "NTD" @@ -84,7 +84,7 @@ private struct CurrencyParser: CustomMatchingRegexComponent { } typealias RegexOutput = Currency - func match(_ input: String, + func consuming(_ input: String, startingAt index: String.Index, in bounds: Range ) throws -> (upperBound: String.Index, output: Currency)? { diff --git a/Tests/RegexBuilderTests/RegexDSLTests.swift b/Tests/RegexBuilderTests/RegexDSLTests.swift index cc5afda39..4bcea83ab 100644 --- a/Tests/RegexBuilderTests/RegexDSLTests.swift +++ b/Tests/RegexBuilderTests/RegexDSLTests.swift @@ -855,9 +855,9 @@ class RegexDSLTests: XCTestCase { var patch: Int var dev: String? } - struct SemanticVersionParser: CustomMatchingRegexComponent { + struct SemanticVersionParser: CustomPrefixMatchRegexComponent { typealias RegexOutput = SemanticVersion - func match( + func consuming( _ input: String, startingAt index: String.Index, in bounds: Range From 49fe84aa81473f88484c7255c70217369c3cd16e Mon Sep 17 00:00:00 2001 From: Michael Ilseman Date: Fri, 22 Apr 2022 09:26:44 -0600 Subject: [PATCH 2/2] Update algo proposal prose --- .../Evolution/StringProcessingAlgorithms.md | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/Documentation/Evolution/StringProcessingAlgorithms.md b/Documentation/Evolution/StringProcessingAlgorithms.md index f461f1976..edefbd19b 100644 --- a/Documentation/Evolution/StringProcessingAlgorithms.md +++ b/Documentation/Evolution/StringProcessingAlgorithms.md @@ -10,7 +10,7 @@ We propose: 2. Generic `Collection` equivalents of these algorithms in terms of subsequences 3. 
`protocol CustomPrefixMatchRegexComponent`, which allows 3rd party libraries to provide their industrial-strength parsers as intermixable components of regexes -This proposal is part of a larger [regex-powered string processing initiative](https://forums.swift.org/t/declarative-string-processing-overview/52459). Throughout the document, we will reference the still-in-progress [`RegexProtocol`, `Regex`](https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/StronglyTypedCaptures.md), and result builder DSL, but these are in flux and not formally part of this proposal. Further discussion of regex specifics is out of scope of this proposal and better discussed in another thread (see [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107) for links to relevant threads). +This proposal is part of a larger [regex-powered string processing initiative](https://github.com/apple/swift-evolution/blob/main/proposals/0350-regex-type-overview.md); the status of each proposal is tracked [here](https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md). Further discussion of regex specifics is out of scope of this proposal and better discussed in their relevant reviews. ## Motivation @@ -91,18 +91,18 @@ Note: Only a subset of Python's string processing API are included in this table ### Complex string processing -Even with the API additions, more complex string processing quickly becomes unwieldy. Up-coming support for authoring regexes in Swift help alleviate this somewhat, but string processing in the modern world involves dealing with localization, standards-conforming validation, and other concerns for which a dedicated parser is required. +Even with the API additions, more complex string processing quickly becomes unwieldy.
String processing in the modern world involves dealing with localization, standards-conforming validation, and other concerns for which a dedicated parser is required. Consider parsing the date field `"Date: Wed, 16 Feb 2022 23:53:19 GMT"` in an HTTP header as a `Date` type. The naive approach is to search for a substring that looks like a date string (`16 Feb 2022`), and attempt to post-process it as a `Date` with a date parser: ```swift let regex = Regex { - capture { - oneOrMore(.digit) + Capture { + OneOrMore(.digit) " " - oneOrMore(.word) + OneOrMore(.word) " " - oneOrMore(.digit) + OneOrMore(.digit) } } @@ -128,7 +128,7 @@ DEBIT 03/24/2020 IRX tax payment ($52,249.98) Parsing a currency string such as `$3,020.85` with regex is also tricky, as it can contain localized and currency symbols in addition to accounting conventions. This is why Foundation provides industrial-strength parsers for localized strings. -## Proposed solution +## Proposed solution ### Complex string processing @@ -136,13 +136,13 @@ We propose a `CustomPrefixMatchRegexComponent` protocol which allows types from ```swift let dateRegex = Regex { - capture(dateParser) + Capture(dateParser) } let date: Date = header.firstMatch(of: dateRegex).map(\.result.1) let currencyRegex = Regex { - capture(.localizedCurrency(code: "USD").sign(strategy: .accounting)) + Capture(.localizedCurrency(code: "USD").sign(strategy: .accounting)) } let amount: [Decimal] = statement.matches(of: currencyRegex).map(\.result.1) @@ -167,16 +167,16 @@ We also propose the following regex-powered algorithms as well as their generic |`matches(of:)`| Returns a collection containing all matches of the specified `RegexComponent` | -## Detailed design +## Detailed design ### `CustomPrefixMatchRegexComponent` -`CustomPrefixMatchRegexComponent` inherits from `RegexComponent` and satisfies its sole requirement; Conformers can be used with all of the string algorithms generic over `RegexComponent`. 
+`CustomPrefixMatchRegexComponent` inherits from `RegexComponent` and satisfies its sole requirement. Conformers can be used with all of the string algorithms generic over `RegexComponent`. ```swift /// A protocol allowing custom types to function as regex components by /// providing the raw functionality backing `prefixMatch`. -public protocol CustomPrefixMatchRegexComponent : RegexComponent { +public protocol CustomPrefixMatchRegexComponent: RegexComponent { /// Process the input string within the specified bounds, beginning at the given index, and return /// the end position (upper bound) of the match and the produced output. /// - Parameters: @@ -200,7 +200,7 @@ We use Foundation `FloatingPointFormatStyle.Currency` as an example for ```swift extension FloatingPointFormatStyle.Currency : CustomPrefixMatchRegexComponent { - public func match( + public func consuming( _ input: String, startingAt index: String.Index, in bounds: Range
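
The renamed requirement, `consuming(_:startingAt:in:)`, describes a prefix-consuming parser: starting at `index`, either return the index where the match ends together with a produced output, or return `nil` for no match. The following is a minimal, self-contained sketch of that contract only. It deliberately does not import `_StringProcessing`; `PrefixConsumer`, `IntConsumer`, `LiteralConsumer`, and `Then` are hypothetical names introduced for illustration and are not part of this patch. `Then` shows the sequencing step behind the "monadic parser" description: the second consumer starts where the first one stopped.

```swift
// Hypothetical stand-in with the same requirement shape as
// `CustomPrefixMatchRegexComponent.consuming(_:startingAt:in:)`.
// It is NOT the library protocol and does not plug into `Regex`.
protocol PrefixConsumer {
    associatedtype Output
    func consuming(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) throws -> (upperBound: String.Index, output: Output)?
}

// Consumes a non-empty run of ASCII digits and produces its Int value.
struct IntConsumer: PrefixConsumer {
    func consuming(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) -> (upperBound: String.Index, output: Int)? {
        var i = index
        while i < bounds.upperBound, input[i].isASCII, input[i].isNumber {
            input.formIndex(after: &i)
        }
        // No digits (or a value that overflows Int) means "no match".
        guard i > index, let value = Int(input[index..<i]) else { return nil }
        return (i, value)
    }
}

// Matches an exact literal; produces no meaningful output.
struct LiteralConsumer: PrefixConsumer {
    var literal: String
    func consuming(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) -> (upperBound: String.Index, output: Void)? {
        var i = index
        for ch in literal {
            guard i < bounds.upperBound, input[i] == ch else { return nil }
            input.formIndex(after: &i)
        }
        return (i, ())
    }
}

// Runs `first`, then runs `second` from the index where `first` stopped.
// Returns nil if either step fails; otherwise pairs up both outputs.
struct Then<A: PrefixConsumer, B: PrefixConsumer>: PrefixConsumer {
    var first: A
    var second: B
    func consuming(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) throws -> (upperBound: String.Index, output: (A.Output, B.Output))? {
        guard let a = try first.consuming(input, startingAt: index, in: bounds),
              let b = try second.consuming(input, startingAt: a.upperBound, in: bounds)
        else { return nil }
        return (b.upperBound, (a.output, b.output))
    }
}

// Usage: an integer, a comma, then another integer.
let text = "42,7 rest"
let pair = Then(first: IntConsumer(),
                second: Then(first: LiteralConsumer(literal: ","), second: IntConsumer()))
if let match = try? pair.consuming(text, startingAt: text.startIndex,
                                   in: text.startIndex..<text.endIndex) {
    print(match.output.0, match.output.1.1)  // prints "42 7"
}
```

Because each step hands back its end index, a consumer does not need to know what follows it. In the patch, the default `regex` implementation wraps `consuming` in a `.matcher` node, so the regex engine would presumably continue matching the rest of the pattern from the returned `upperBound` in the same way.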