
Commit 4b5da35

Add wholeMatch and prefixMatch
Add the functions to the string processing algorithms proposal and implement the change. Move the functions from `String` and `Substring` extensions to `BidirectionalCollection`. Add tests for `firstMatch`, `wholeMatch`, and `prefixMatch` that use a custom `BidirectionalCollection` type.
1 parent b24d3ea commit 4b5da35

4 files changed

+188
-24
lines changed

Documentation/Evolution/StringProcessingAlgorithms.md

Lines changed: 17 additions & 6 deletions
@@ -162,10 +162,11 @@ We also propose the following regex-powered algorithms as well as their generic
162162
|`replace(:with:subrange:maxReplacements)`| Replaces all occurrences of the sequence matching the given `RegexComponent` or sequence with a given collection |
163163
|`split(by:)`| Returns the longest possible subsequences of the collection around elements equal to the given separator |
164164
|`firstMatch(of:)`| Returns the first match of the specified `RegexComponent` within the collection |
165+
|`wholeMatch(of:)`| Matches the specified `RegexComponent` in the collection as a whole |
166+
|`prefixMatch(of:)`| Matches the specified `RegexComponent` against the collection at the beginning |
165167
|`matches(of:)`| Returns a collection containing all matches of the specified `RegexComponent` |
166168

167169

168-
169170
## Detailed design
170171

171172
### `CustomMatchingRegexComponent`
@@ -389,7 +390,7 @@ extension BidirectionalCollection where SubSequence == Substring {
389390
}
390391
```
391392

392-
#### First match
393+
#### Match
393394

394395
```swift
395396
extension BidirectionalCollection where SubSequence == Substring {
@@ -398,6 +399,16 @@ extension BidirectionalCollection where SubSequence == Substring {
398399
/// - Returns: The first match of `regex` in the collection, or `nil` if
399400
/// there isn't a match.
400401
public func firstMatch<R: RegexComponent>(of regex: R) -> RegexMatch<R.Match>?
402+
403+
/// Match a regex in its entirety.
404+
/// - Parameter r: The regex to match against.
405+
/// - Returns: The match if there is one, or `nil` if none.
406+
public func wholeMatch<R: RegexComponent>(of r: R) -> Regex<R.Output>.Match?
407+
408+
/// Match part of the regex, starting at the beginning.
409+
/// - Parameter r: The regex to match against.
410+
/// - Returns: The match if there is one, or `nil` if none.
411+
public func prefixMatch<R: RegexComponent>(of r: R) -> Regex<R.Output>.Match?
401412
}
402413
```
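
As a rough illustration of how the three entry points differ, the sketch below runs one regex over the same input. It assumes a Swift 5.7 or later toolchain in which these methods are available on `String` (on Apple platforms, a deployment target that includes `Regex`). The names `digits` and `input` are illustrative and not part of the proposal.

```swift
// Sketch only: assumes Swift 5.7+ where these methods ship with the toolchain.
// `digits` and `input` are illustrative names, not part of the proposal.
let digits = #/[0-9]+/#          // Regex<Substring>
let input = "007bob2023"

// firstMatch(of:) searches the collection for the earliest match.
let first = input.firstMatch(of: digits)?.output

// prefixMatch(of:) succeeds only if a match begins at the start of the collection.
let prefix = input.prefixMatch(of: digits)?.output

// wholeMatch(of:) succeeds only if the regex matches the entire collection.
let whole = input.wholeMatch(of: digits)?.output

print(first ?? "nil", prefix ?? "nil", whole ?? "nil")  // prints "007 007 nil"
```

Here `firstMatch` and `prefixMatch` agree because the digits happen to sit at the start of the input; for an input such as `"bob007"`, `prefixMatch` would be expected to return `nil` while `firstMatch` still finds `"007"`.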
403414

@@ -473,7 +484,7 @@ extension RangeReplaceableCollection where SubSequence == Substring {
473484
/// - Returns: A new collection in which all occurrences of subsequence
474485
/// matching `regex` in `subrange` are replaced by `replacement`.
475486
public func replacing<R: RegexComponent, Replacement: Collection>(
476-
_ regex: R,
487+
_ r: R,
477488
with replacement: Replacement,
478489
subrange: Range<Index>,
479490
maxReplacements: Int = .max
@@ -489,7 +500,7 @@ extension RangeReplaceableCollection where SubSequence == Substring {
489500
/// - Returns: A new collection in which all occurrences of subsequence
490501
/// matching `regex` are replaced by `replacement`.
491502
public func replacing<R: RegexComponent, Replacement: Collection>(
492-
_ regex: R,
503+
_ r: R,
493504
with replacement: Replacement,
494505
maxReplacements: Int = .max
495506
) -> Self where Replacement.Element == Element
@@ -502,7 +513,7 @@ extension RangeReplaceableCollection where SubSequence == Substring {
502513
/// - maxReplacements: A number specifying how many occurrences of the
503514
/// sequence matching `regex` to replace. Default is `Int.max`.
504515
public mutating func replace<R: RegexComponent, Replacement: Collection>(
505-
_ regex: R,
516+
_ r: R,
506517
with replacement: Replacement,
507518
maxReplacements: Int = .max
508519
) where Replacement.Element == Element
@@ -609,4 +620,4 @@ Trimming a string from both sides shares a similar story. For example, `"ababa".
609620

610621
### Future API
611622

612-
Some Python functions are not currently included in this proposal, such as trimming the suffix from a string/collection. This pitch aims to establish a pattern for using `RegexComponent` with string processing algorithms, so that further enhancement can to be introduced to the standard library easily in the future, and eventually close the gap between Swift and other popular scripting languages.
623+
Some common string processing functions are not currently included in this proposal, such as trimming the suffix from a string/collection and finding overlapping ranges of matched substrings. This pitch aims to establish a pattern for using `RegexComponent` with string processing algorithms, so that further enhancements can be introduced to the standard library easily in the future, eventually closing the gap between Swift and other popular scripting languages.

Sources/_StringProcessing/Algorithms/Matching/FirstMatch.swift

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ extension BidirectionalCollection {
3939

4040
extension BidirectionalCollection where SubSequence == Substring {
4141
@available(SwiftStdlib 5.7, *)
42+
@_disfavoredOverload
4243
func firstMatch<R: RegexComponent>(
4344
of regex: R
4445
) -> _MatchResult<RegexConsumer<R, Self>>? {

Sources/_StringProcessing/Regex/Match.swift

Lines changed: 9 additions & 18 deletions
@@ -159,32 +159,23 @@ extension Regex {
159159
}
160160

161161
@available(SwiftStdlib 5.7, *)
162-
extension String {
162+
extension BidirectionalCollection where SubSequence == Substring {
163+
/// Match a regex in its entirety.
164+
/// - Parameter r: The regex to match against.
165+
/// - Returns: The match if there is one, or `nil` if none.
163166
public func wholeMatch<R: RegexComponent>(
164167
of r: R
165168
) -> Regex<R.RegexOutput>.Match? {
166-
try? r.regex.wholeMatch(in: self)
169+
try? r.regex.wholeMatch(in: self[...])
167170
}
168171

172+
/// Match part of the regex, starting at the beginning.
173+
/// - Parameter r: The regex to match against.
174+
/// - Returns: The match if there is one, or `nil` if none.
169175
public func prefixMatch<R: RegexComponent>(
170176
of r: R
171177
) -> Regex<R.RegexOutput>.Match? {
172-
try? r.regex.prefixMatch(in: self)
173-
}
174-
}
175-
176-
@available(SwiftStdlib 5.7, *)
177-
extension Substring {
178-
public func wholeMatch<R: RegexComponent>(
179-
of r: R
180-
) -> Regex<R.RegexOutput>.Match? {
181-
try? r.regex.wholeMatch(in: self)
182-
}
183-
184-
public func prefixMatch<R: RegexComponent>(
185-
of r: R
186-
) -> Regex<R.RegexOutput>.Match? {
187-
try? r.regex.prefixMatch(in: self)
178+
try? r.regex.prefixMatch(in: self[...])
188179
}
189180
}
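
Since the methods are declared on `BidirectionalCollection where SubSequence == Substring`, they can also be called on slices such as `Substring`. The sketch below assumes the behavior of a released Swift 5.7+ toolchain, where matching is confined to the slice's own bounds rather than its base string. The names `number`, `line`, and `value` are hypothetical.

```swift
// Sketch only: assumes Swift 5.7+; all names here are illustrative.
let number = #/[0-9]+/#
let line = "id=42"
let value = line.dropFirst(3)    // Substring covering "42"

// The slice consists only of digits, so a whole match succeeds on it.
let wholeOfSlice = value.wholeMatch(of: number)?.output

// The full string starts with "id=", so neither a whole nor a prefix match succeeds.
let wholeOfLine = line.wholeMatch(of: number)?.output
let prefixOfLine = line.prefixMatch(of: number)?.output

print(wholeOfSlice ?? "nil", wholeOfLine ?? "nil", prefixOfLine ?? "nil")
```

This is the case that distinguishes passing the slice itself to the regex from passing its base string: only the former respects the slice's `startIndex` and `endIndex`.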
190181

Tests/RegexBuilderTests/CustomTests.swift

Lines changed: 161 additions & 0 deletions
@@ -133,6 +133,51 @@ func customTest<Match: Equatable>(
133133
}
134134
}
135135

136+
// Test support
137+
struct Concat : Equatable {
138+
var wrapped: String
139+
init(_ name: String, _ suffix: Int?) {
140+
if let suffix = suffix {
141+
wrapped = name + String(suffix)
142+
} else {
143+
wrapped = name
144+
}
145+
}
146+
}
147+
148+
extension Concat : Collection {
149+
typealias Index = String.Index
150+
typealias Element = String.Element
151+
152+
var startIndex: Index { return wrapped.startIndex }
153+
var endIndex: Index { return wrapped.endIndex }
154+
155+
subscript(position: Index) -> Element {
156+
return wrapped[position]
157+
}
158+
159+
func index(after i: Index) -> Index {
160+
return wrapped.index(after: i)
161+
}
162+
}
163+
164+
extension Concat: BidirectionalCollection {
165+
typealias Indices = String.Indices
166+
typealias SubSequence = String.SubSequence
167+
168+
func index(before i: Index) -> Index {
169+
return wrapped.index(before: i)
170+
}
171+
172+
var indices: Indices {
173+
wrapped.indices
174+
}
175+
176+
subscript(bounds: Range<Index>) -> Substring {
177+
Substring(wrapped[bounds])
178+
}
179+
}
180+
136181
class CustomRegexComponentTests: XCTestCase {
137182
// TODO: Refactor below into more exhaustive, declarative
138183
// tests.
@@ -467,4 +512,120 @@ class CustomRegexComponentTests: XCTestCase {
467512
)
468513

469514
}
515+
516+
517+
func testMatchVariants() {
518+
func customTest<Match: Equatable>(
519+
_ regex: Regex<Match>,
520+
_ input: Concat,
521+
expected: (wholeMatch: Match?, firstMatch: Match?, prefixMatch: Match?),
522+
file: StaticString = #file, line: UInt = #line
523+
) {
524+
let wholeResult = input.wholeMatch(of: regex)?.output
525+
let firstResult = input.firstMatch(of: regex)?.output
526+
let prefixResult = input.prefixMatch(of: regex)?.output
527+
XCTAssertEqual(wholeResult, expected.wholeMatch, file: file, line: line)
528+
XCTAssertEqual(firstResult, expected.firstMatch, file: file, line: line)
529+
XCTAssertEqual(prefixResult, expected.prefixMatch, file: file, line: line)
530+
}
531+
532+
typealias CaptureMatch1 = (Substring, Int?)
533+
func customTest(
534+
_ regex: Regex<CaptureMatch1>,
535+
_ input: Concat,
536+
expected: (wholeMatch: CaptureMatch1?, firstMatch: CaptureMatch1?, prefixMatch: CaptureMatch1?),
537+
file: StaticString = #file, line: UInt = #line
538+
) {
539+
let wholeResult = input.wholeMatch(of: regex)?.output
540+
let firstResult = input.firstMatch(of: regex)?.output
541+
let prefixResult = input.prefixMatch(of: regex)?.output
542+
XCTAssertEqual(wholeResult?.0, expected.wholeMatch?.0, file: file, line: line)
543+
XCTAssertEqual(wholeResult?.1, expected.wholeMatch?.1, file: file, line: line)
544+
545+
XCTAssertEqual(firstResult?.0, expected.firstMatch?.0, file: file, line: line)
546+
XCTAssertEqual(firstResult?.1, expected.firstMatch?.1, file: file, line: line)
547+
548+
XCTAssertEqual(prefixResult?.0, expected.prefixMatch?.0, file: file, line: line)
549+
XCTAssertEqual(prefixResult?.1, expected.prefixMatch?.1, file: file, line: line)
550+
}
551+
552+
var regex = Regex {
553+
OneOrMore(.digit)
554+
}
555+
556+
customTest(regex, Concat("amy", 2023), expected:(nil, "2023", nil)) // amy2023
557+
customTest(regex, Concat("amy2023", nil), expected:(nil, "2023", nil))
558+
customTest(regex, Concat("amy", nil), expected:(nil, nil, nil))
559+
customTest(regex, Concat("", 2023), expected:("2023", "2023", "2023")) // 2023
560+
customTest(regex, Concat("bob012b", 2023), expected:(nil, "012", nil)) // bob012b2023
561+
customTest(regex, Concat("bob012b", nil), expected:(nil, "012", nil))
562+
customTest(regex, Concat("007bob", 2023), expected:(nil, "007", "007"))
563+
customTest(regex, Concat("", nil), expected:(nil, nil, nil))
564+
565+
regex = Regex {
566+
OneOrMore(CharacterClass("a"..."z"))
567+
}
568+
569+
customTest(regex, Concat("amy", 2023), expected:(nil, "amy", "amy")) // amy2023
570+
customTest(regex, Concat("amy", nil), expected:("amy", "amy", "amy"))
571+
customTest(regex, Concat("amy2022-bob", 2023), expected:(nil, "amy", "amy")) // amy2022-bob2023
572+
customTest(regex, Concat("", 2023), expected:(nil, nil, nil)) // 2023
573+
customTest(regex, Concat("bob012b", 2023), expected:(nil, "bob", "bob")) // bob012b2023
574+
customTest(regex, Concat("bob012b", nil), expected:(nil, "bob", "bob"))
575+
customTest(regex, Concat("007bob", 2023), expected:(nil, "bob", nil))
576+
customTest(regex, Concat("", nil), expected:(nil, nil, nil))
577+
578+
regex = Regex {
579+
OneOrMore {
580+
CharacterClass("A"..."Z")
581+
OneOrMore(CharacterClass("a"..."z"))
582+
Repeat(.digit, count: 2)
583+
}
584+
}
585+
586+
customTest(regex, Concat("Amy12345", nil), expected:(nil, "Amy12", "Amy12"))
587+
customTest(regex, Concat("Amy", 2023), expected:(nil, "Amy20", "Amy20"))
588+
customTest(regex, Concat("Amy", 23), expected:("Amy23", "Amy23", "Amy23"))
589+
customTest(regex, Concat("", 2023), expected:(nil, nil, nil)) // 2023
590+
customTest(regex, Concat("Amy23 Boba17", nil), expected:(nil, "Amy23", "Amy23"))
591+
customTest(regex, Concat("amy23 Boba17", nil), expected:(nil, "Boba17", nil))
592+
customTest(regex, Concat("Amy23 boba17", nil), expected:(nil, "Amy23", "Amy23"))
593+
customTest(regex, Concat("amy23 Boba", 17), expected:(nil, "Boba17", nil))
594+
customTest(regex, Concat("Amy23Boba17", nil), expected:("Amy23Boba17", "Amy23Boba17", "Amy23Boba17"))
595+
customTest(regex, Concat("Amy23Boba", 17), expected:("Amy23Boba17", "Amy23Boba17", "Amy23Boba17"))
596+
customTest(regex, Concat("23 Boba", 17), expected:(nil, "Boba17", nil))
597+
598+
let twoDigitRegex = Regex {
599+
OneOrMore {
600+
CharacterClass("A"..."Z")
601+
OneOrMore(CharacterClass("a"..."z"))
602+
Capture(Repeat(.digit, count: 2)) { Int($0) }
603+
}
604+
}
605+
606+
customTest(twoDigitRegex, Concat("Amy12345", nil), expected: (nil, ("Amy12", 12), ("Amy12", 12)))
607+
customTest(twoDigitRegex, Concat("Amy", 12345), expected: (nil, ("Amy12", 12), ("Amy12", 12)))
608+
customTest(twoDigitRegex, Concat("Amy", 12), expected: (("Amy12", 12), ("Amy12", 12), ("Amy12", 12)))
609+
customTest(twoDigitRegex, Concat("Amy23 Boba", 17), expected: (nil, firstMatch: ("Amy23", 23), prefixMatch: ("Amy23", 23)))
610+
customTest(twoDigitRegex, Concat("amy23 Boba20", 23), expected:(nil, ("Boba20", 20), nil))
611+
customTest(twoDigitRegex, Concat("Amy23Boba17", nil), expected:(("Amy23Boba17", 17), ("Amy23Boba17", 17), ("Amy23Boba17", 17)))
612+
customTest(twoDigitRegex, Concat("Amy23Boba", 17), expected:(("Amy23Boba17", 17), ("Amy23Boba17", 17), ("Amy23Boba17", 17)))
613+
614+
let millennium = Regex {
615+
CharacterClass("A"..."Z")
616+
OneOrMore(CharacterClass("a"..."z"))
617+
Capture { Repeat(.digit, count: 4) } transform: { v -> Int? in
618+
guard let year = Int(v) else { return nil }
619+
return year > 2000 ? year : nil
620+
}
621+
}
622+
623+
customTest(millennium, Concat("Amy2025", nil), expected: (("Amy2025", 2025), ("Amy2025", 2025), ("Amy2025", 2025)))
624+
customTest(millennium, Concat("Amy", 2025), expected: (("Amy2025", 2025), ("Amy2025", 2025), ("Amy2025", 2025)))
625+
customTest(millennium, Concat("Amy1995", nil), expected: (("Amy1995", nil), ("Amy1995", nil), ("Amy1995", nil)))
626+
customTest(millennium, Concat("Amy", 1995), expected: (("Amy1995", nil), ("Amy1995", nil), ("Amy1995", nil)))
627+
customTest(millennium, Concat("amy2025", nil), expected: (nil, nil, nil))
628+
customTest(millennium, Concat("amy", 2025), expected: (nil, nil, nil))
629+
}
470630
}
631+
