Commit 81bc5d0

Updates for algorithms proposal (#319)
* Rename custom match prefix protocol and add doc comments
* Update algo proposal prose
1 parent 8dd8470 commit 81bc5d0

7 files changed: +75 -64 lines changed

Documentation/Evolution/ProposalOverview.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -39,7 +39,7 @@ Covers the "interior" syntax, extended syntaxes, run-time construction of a rege
3939

4040
Proposes a slew of Regex-powered algorithms.
4141

42-
Introduces `CustomMatchingRegexComponent`, which is a monadic-parser style interface for external parsers to be used as components of a regex.
42+
Introduces `CustomPrefixMatchRegexComponent`, which is a monadic-parser style interface for external parsers to be used as components of a regex.
4343

4444
## Unicode for String Processing
4545

Documentation/Evolution/RegexTypeOverview.md

Lines changed: 5 additions & 5 deletions
@@ -231,7 +231,7 @@ The result builder allows for inline failable value construction, which particip
231231

232232
Swift regexes describe an unambiguous algorithm, where choice is ordered and effects can be reliably observed. For example, a `print()` statement inside the `TryCapture`'s transform function will run whenever the overall algorithm naturally dictates an attempt should be made. Optimizations can only elide such calls if they can prove it is behavior-preserving (e.g. "pure").
233233

234-
`CustomMatchingRegexComponent`, discussed in [String Processing Algorithms][pitches], allows industrial-strength parsers to be used a regex components. This allows us to drop the overly-permissive pre-parsing step:
234+
`CustomPrefixMatchRegexComponent`, discussed in [String Processing Algorithms][pitches], allows industrial-strength parsers to be used a regex components. This allows us to drop the overly-permissive pre-parsing step:
235235

236236
```swift
237237
func processEntry(_ line: String) -> Transaction? {
@@ -431,7 +431,7 @@ Regular expressions have a deservedly mixed reputation, owing to their historica
431431

432432
* "Regular expressions are bad because you should use a real parser"
433433
- In other systems, you're either in or you're out, leading to a gravitational pull to stay in when... you should get out
434-
- Our remedy is interoperability with real parsers via `CustomMatchingRegexComponent`
434+
- Our remedy is interoperability with real parsers via `CustomPrefixMatchRegexComponent`
435435
- Literals with refactoring actions provide an incremental off-ramp from regex syntax to result builders and real parsers
436436
* "Regular expressions are bad because ugly unmaintainable syntax"
437437
- We propose literals with source tools support, allowing for better syntax highlighting and analysis
@@ -516,7 +516,7 @@ Regex are compiled into an intermediary representation and fairly simple analysi
516516

517517
### Future work: parser combinators
518518

519-
What we propose here is an incremental step towards better parsing support in Swift using parser-combinator style libraries. The underlying execution engine supports recursive function calls and mechanisms for library extensibility. `CustomMatchingRegexComponent`'s protocol requirement is effectively a [monadic parser](https://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf), meaning `Regex` provides a regex-flavored combinator-like system.
519+
What we propose here is an incremental step towards better parsing support in Swift using parser-combinator style libraries. The underlying execution engine supports recursive function calls and mechanisms for library extensibility. `CustomPrefixMatchRegexComponent`'s protocol requirement is effectively a [monadic parser](https://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf), meaning `Regex` provides a regex-flavored combinator-like system.
520520

521521
An issues with traditional parser combinator libraries are the compilation barriers between call-site and definition, resulting in excessive and overly-cautious backtracking traffic. These can be eliminated through better [compilation techniques](https://core.ac.uk/download/pdf/148008325.pdf). As mentioned above, Swift's support for custom static compilation is still under development.
522522

@@ -565,9 +565,9 @@ Regexes are often used for tokenization and tokens can be represented with Swift
565565
566566
### Future work: baked-in localized processing
567567
568-
- `CustomMatchingRegexComponent` gives an entry point for localized processors
568+
- `CustomPrefixMatchRegexComponent` gives an entry point for localized processors
569569
- Future work includes (sub?)protocols to communicate localization intent
570570
571571
-->
572572

573-
[pitches]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md
573+
[pitches]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md

Documentation/Evolution/StringProcessingAlgorithms.md

Lines changed: 22 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -8,9 +8,9 @@ We propose:
88

99
1. New regex-powered algorithms over strings, bringing the standard library up to parity with scripting languages
1010
2. Generic `Collection` equivalents of these algorithms in terms of subsequences
11-
3. `protocol CustomMatchingRegexComponent`, which allows 3rd party libraries to provide their industrial-strength parsers as intermixable components of regexes
11+
3. `protocol CustomPrefixMatchRegexComponent`, which allows 3rd party libraries to provide their industrial-strength parsers as intermixable components of regexes
1212

13-
This proposal is part of a larger [regex-powered string processing initiative](https://forums.swift.org/t/declarative-string-processing-overview/52459). Throughout the document, we will reference the still-in-progress [`RegexProtocol`, `Regex`](https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/StronglyTypedCaptures.md), and result builder DSL, but these are in flux and not formally part of this proposal. Further discussion of regex specifics is out of scope of this proposal and better discussed in another thread (see [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107) for links to relevant threads).
13+
This proposal is part of a larger [regex-powered string processing initiative](https://github.com/apple/swift-evolution/blob/main/proposals/0350-regex-type-overview.md), the status of each proposal is tracked [here](https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md). Further discussion of regex specifics is out of scope of this proposal and better discussed in their relevant reviews.
1414

1515
## Motivation
1616

@@ -91,18 +91,18 @@ Note: Only a subset of Python's string processing API are included in this table
9191

9292
### Complex string processing
9393

94-
Even with the API additions, more complex string processing quickly becomes unwieldy. Up-coming support for authoring regexes in Swift help alleviate this somewhat, but string processing in the modern world involves dealing with localization, standards-conforming validation, and other concerns for which a dedicated parser is required.
94+
Even with the API additions, more complex string processing quickly becomes unwieldy. String processing in the modern world involves dealing with localization, standards-conforming validation, and other concerns for which a dedicated parser is required.
9595

9696
Consider parsing the date field `"Date: Wed, 16 Feb 2022 23:53:19 GMT"` in an HTTP header as a `Date` type. The naive approach is to search for a substring that looks like a date string (`16 Feb 2022`), and attempt to post-process it as a `Date` with a date parser:
9797

9898
```swift
9999
let regex = Regex {
100-
capture {
101-
oneOrMore(.digit)
100+
Capture {
101+
OneOrMore(.digit)
102102
" "
103-
oneOrMore(.word)
103+
OneOrMore(.word)
104104
" "
105-
oneOrMore(.digit)
105+
OneOrMore(.digit)
106106
}
107107
}
108108

@@ -128,21 +128,21 @@ DEBIT 03/24/2020 IRX tax payment ($52,249.98)
128128
Parsing a currency string such as `$3,020.85` with regex is also tricky, as it can contain localized and currency symbols in addition to accounting conventions. This is why Foundation provides industrial-strength parsers for localized strings.
129129

130130

131-
## Proposed solution
131+
## Proposed solution
132132

133133
### Complex string processing
134134

135-
We propose a `CustomMatchingRegexComponent` protocol which allows types from outside the standard library participate in regex builders and `RegexComponent` algorithms. This allows types, such as `Date.ParseStrategy` and `FloatingPointFormatStyle.Currency`, to be used directly within a regex:
135+
We propose a `CustomPrefixMatchRegexComponent` protocol which allows types from outside the standard library participate in regex builders and `RegexComponent` algorithms. This allows types, such as `Date.ParseStrategy` and `FloatingPointFormatStyle.Currency`, to be used directly within a regex:
136136

137137
```swift
138138
let dateRegex = Regex {
139-
capture(dateParser)
139+
Capture(dateParser)
140140
}
141141

142142
let date: Date = header.firstMatch(of: dateRegex).map(\.result.1)
143143

144144
let currencyRegex = Regex {
145-
capture(.localizedCurrency(code: "USD").sign(strategy: .accounting))
145+
Capture(.localizedCurrency(code: "USD").sign(strategy: .accounting))
146146
}
147147

148148
let amount: [Decimal] = statement.matches(of: currencyRegex).map(\.result.1)
@@ -167,24 +167,25 @@ We also propose the following regex-powered algorithms as well as their generic
167167
|`matches(of:)`| Returns a collection containing all matches of the specified `RegexComponent` |
168168

169169

170-
## Detailed design
170+
## Detailed design
171171

172-
### `CustomMatchingRegexComponent`
172+
### `CustomPrefixMatchRegexComponent`
173173

174-
`CustomMatchingRegexComponent` inherits from `RegexComponent` and satisfies its sole requirement; Conformers can be used with all of the string algorithms generic over `RegexComponent`.
174+
`CustomPrefixMatchRegexComponent` inherits from `RegexComponent` and satisfies its sole requirement. Conformers can be used with all of the string algorithms generic over `RegexComponent`.
175175

176176
```swift
177-
/// A protocol for custom match functionality.
178-
public protocol CustomMatchingRegexComponent : RegexComponent {
179-
/// Match the input string within the specified bounds, beginning at the given index, and return
180-
/// the end position (upper bound) of the match and the matched instance.
177+
/// A protocol allowing custom types to function as regex components by
178+
/// providing the raw functionality backing `prefixMatch`.
179+
public protocol CustomPrefixMatchRegexComponent: RegexComponent {
180+
/// Process the input string within the specified bounds, beginning at the given index, and return
181+
/// the end position (upper bound) of the match and the produced output.
181182
/// - Parameters:
182183
/// - input: The string in which the match is performed.
183184
/// - index: An index of `input` at which to begin matching.
184185
/// - bounds: The bounds in `input` in which the match is performed.
185186
/// - Returns: The upper bound where the match terminates and a matched instance, or `nil` if
186187
/// there isn't a match.
187-
func match(
188+
func consuming(
188189
_ input: String,
189190
startingAt index: String.Index,
190191
in bounds: Range<String.Index>
@@ -198,8 +199,8 @@ public protocol CustomMatchingRegexComponent : RegexComponent {
198199
We use Foundation `FloatingPointFormatStyle<Decimal>.Currency` as an example for protocol conformance. It would implement the `match` function with `Match` being a `Decimal`. It could also add a static function `.localizedCurrency(code:)` as a member of `RegexComponent`, so it can be referred as `.localizedCurrency(code:)` in the `Regex` result builder:
199200

200201
```swift
201-
extension FloatingPointFormatStyle<Decimal>.Currency : CustomMatchingRegexComponent {
202-
public func match(
202+
extension FloatingPointFormatStyle<Decimal>.Currency : CustomPrefixMatchRegexComponent {
203+
public func consuming(
203204
_ input: String,
204205
startingAt index: String.Index,
205206
in bounds: Range<String.Index>
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
//===----------------------------------------------------------------------===//
2+
//
3+
// This source file is part of the Swift.org open source project
4+
//
5+
// Copyright (c) 2021-2022 Apple Inc. and the Swift project authors
6+
// Licensed under Apache License v2.0 with Runtime Library Exception
7+
//
8+
// See https://swift.org/LICENSE.txt for license information
9+
//
10+
//===----------------------------------------------------------------------===//
11+
12+
@available(SwiftStdlib 5.7, *)
13+
/// A protocol allowing custom types to function as regex components by
14+
/// providing the raw functionality backing `prefixMatch`.
15+
public protocol CustomPrefixMatchRegexComponent: RegexComponent {
16+
/// Process the input string within the specified bounds, beginning at the given index, and return
17+
/// the end position (upper bound) of the match and the produced output.
18+
/// - Parameters:
19+
/// - input: The string in which the match is performed.
20+
/// - index: An index of `input` at which to begin matching.
21+
/// - bounds: The bounds in `input` in which the match is performed.
22+
/// - Returns: The upper bound where the match terminates and a matched instance, or `nil` if
23+
/// there isn't a match.
24+
func consuming(
25+
_ input: String,
26+
startingAt index: String.Index,
27+
in bounds: Range<String.Index>
28+
) throws -> (upperBound: String.Index, output: RegexOutput)?
29+
}
30+
31+
@available(SwiftStdlib 5.7, *)
32+
extension CustomPrefixMatchRegexComponent {
33+
public var regex: Regex<RegexOutput> {
34+
let node: DSLTree.Node = .matcher(RegexOutput.self, { input, index, bounds in
35+
try consuming(input, startingAt: index, in: bounds)
36+
})
37+
return Regex(node: node)
38+
}
39+
}
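
The contract of the renamed `consuming(_:startingAt:in:)` requirement can be sketched outside the library. The block below is only an illustration and is not part of this commit: `PrefixMatching` is a hypothetical stand-in protocol that copies the shape of the requirement shown above (it does not inherit from `RegexComponent`), and `DigitRun` is a hypothetical conformer. It assumes that returning `nil` means "no match at this position" and that throwing reports a hard failure, as the doc comments and the `throws` signature suggest.

```swift
// Hypothetical stand-in mirroring the shape of the requirement in the diff.
// It is NOT the real protocol and has no connection to RegexComponent.
protocol PrefixMatching {
  associatedtype Output
  func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: Output)?
}

// Hypothetical conformer: consumes a run of ASCII digits and produces an Int.
struct DigitRun: PrefixMatching {
  struct Overflow: Error {}

  func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: Int)? {
    var end = index
    // Advance while still inside the bounds and looking at an ASCII digit.
    while end < bounds.upperBound, input[end].isASCII, input[end].isNumber {
      end = input.index(after: end)
    }
    // No digit at the start position: signal "no match" by returning nil.
    guard end > index else { return nil }
    // A digit run that does not fit in Int is reported by throwing.
    guard let value = Int(input[index..<end]) else { throw Overflow() }
    return (upperBound: end, output: value)
  }
}

let text = "42 apples"
let result = try DigitRun().consuming(
  text,
  startingAt: text.startIndex,
  in: text.startIndex..<text.endIndex
)
```

Here `result` carries the index just past the consumed prefix together with the parsed value, which is the pair a caller would need in order to resume matching after the custom component.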

Sources/_StringProcessing/Regex/DSLConsumers.swift

Lines changed: 0 additions & 29 deletions
This file was deleted.

Tests/RegexBuilderTests/CustomTests.swift

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -14,13 +14,13 @@ import _StringProcessing
1414
@testable import RegexBuilder
1515

1616
// A nibbler processes a single character from a string
17-
private protocol Nibbler: CustomMatchingRegexComponent {
17+
private protocol Nibbler: CustomPrefixMatchRegexComponent {
1818
func nibble(_: Character) -> RegexOutput?
1919
}
2020

2121
extension Nibbler {
2222
// Default implementation, just feed the character in
23-
func match(
23+
func consuming(
2424
_ input: String,
2525
startingAt index: String.Index,
2626
in bounds: Range<String.Index>
@@ -49,10 +49,10 @@ private struct Asciibbler: Nibbler {
4949
}
5050
}
5151

52-
private struct IntParser: CustomMatchingRegexComponent {
52+
private struct IntParser: CustomPrefixMatchRegexComponent {
5353
struct ParseError: Error, Hashable {}
5454
typealias RegexOutput = Int
55-
func match(_ input: String,
55+
func consuming(_ input: String,
5656
startingAt index: String.Index,
5757
in bounds: Range<String.Index>
5858
) throws -> (upperBound: String.Index, output: Int)? {
@@ -71,7 +71,7 @@ private struct IntParser: CustomMatchingRegexComponent {
7171
}
7272
}
7373

74-
private struct CurrencyParser: CustomMatchingRegexComponent {
74+
private struct CurrencyParser: CustomPrefixMatchRegexComponent {
7575
enum Currency: String, Hashable {
7676
case usd = "USD"
7777
case ntd = "NTD"
@@ -84,7 +84,7 @@ private struct CurrencyParser: CustomMatchingRegexComponent {
8484
}
8585

8686
typealias RegexOutput = Currency
87-
func match(_ input: String,
87+
func consuming(_ input: String,
8888
startingAt index: String.Index,
8989
in bounds: Range<String.Index>
9090
) throws -> (upperBound: String.Index, output: Currency)? {

Tests/RegexBuilderTests/RegexDSLTests.swift

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -863,9 +863,9 @@ class RegexDSLTests: XCTestCase {
863863
var patch: Int
864864
var dev: String?
865865
}
866-
struct SemanticVersionParser: CustomMatchingRegexComponent {
866+
struct SemanticVersionParser: CustomPrefixMatchRegexComponent {
867867
typealias RegexOutput = SemanticVersion
868-
func match(
868+
func consuming(
869869
_ input: String,
870870
startingAt index: String.Index,
871871
in bounds: Range<String.Index>
