Merged
Changes from 7 commits
121 changes: 121 additions & 0 deletions Sources/_StringProcessing/RegexDSL/Assertion.swift
@@ -0,0 +1,121 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift.org open source project
//
// Copyright (c) 2021-2022 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
//
//===----------------------------------------------------------------------===//

import _MatchingEngine

public struct Anchor {
  internal enum Kind {
    case startOfSubject
    case endOfSubjectBeforeNewline
    case endOfSubject
    case firstMatchingPositionInSubject
    case textSegmentBoundary
    case startOfLine
    case endOfLine
    case wordBoundary
  }
Member
Is this meant to be a listing of built-in assertions, or is each of these a kind of assertion someone could write?


  var kind: Kind
  var isInverted: Bool = false
}

extension Anchor: RegexProtocol {
  var astAssertion: AST.Atom.AssertionKind {
    if !isInverted {
      switch kind {
      case .startOfSubject: return .startOfSubject
      case .endOfSubjectBeforeNewline: return .endOfSubjectBeforeNewline
      case .endOfSubject: return .endOfSubject
      case .firstMatchingPositionInSubject: return .firstMatchingPositionInSubject
      case .textSegmentBoundary: return .textSegment
      case .startOfLine: return .startOfLine
      case .endOfLine: return .endOfLine
      case .wordBoundary: return .wordBoundary
      }
    } else {
      switch kind {
      case .startOfSubject: fatalError("Not yet supported")
      case .endOfSubjectBeforeNewline: fatalError("Not yet supported")
      case .endOfSubject: fatalError("Not yet supported")
      case .firstMatchingPositionInSubject: fatalError("Not yet supported")
      case .textSegmentBoundary: return .notTextSegment
      case .startOfLine: fatalError("Not yet supported")
      case .endOfLine: fatalError("Not yet supported")
Member Author
We don't currently have a representation for these negated assertions in the AST, since things like notWordBoundary are represented as specific individual cases. These fatalError'd cases don't have a regex literal equivalent, but would be available if we use an API like Assertion.wordBoundary.inverted.

Member
Why are we going to an AST here?

Member Author
DSLTree tracks assertions using AST.Atom.AssertionKind right now.
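
The point in this thread, that only some negated assertions have a dedicated case to map to, can be sketched in isolation. This is a minimal model rather than the PR's code: `AssertionKind` and `AnchorSketch` are hypothetical stand-ins for `AST.Atom.AssertionKind` and `Anchor`, trimmed to four kinds, and returning `nil` in place of `fatalError` is an assumption made only so the gap is visible without trapping.

```swift
// Hypothetical stand-in for AST.Atom.AssertionKind (subset only).
enum AssertionKind: Equatable {
  case startOfLine, endOfLine
  case wordBoundary, notWordBoundary
  case textSegment, notTextSegment
}

// Hypothetical stand-in for the PR's Anchor type.
struct AnchorSketch {
  enum Kind { case startOfLine, endOfLine, wordBoundary, textSegmentBoundary }
  var kind: Kind
  var isInverted = false

  // nil marks a negated assertion that has no dedicated case to map to.
  var assertionKind: AssertionKind? {
    switch (kind, isInverted) {
    case (.startOfLine, false): return .startOfLine
    case (.endOfLine, false): return .endOfLine
    case (.wordBoundary, false): return .wordBoundary
    case (.wordBoundary, true): return .notWordBoundary
    case (.textSegmentBoundary, false): return .textSegment
    case (.textSegmentBoundary, true): return .notTextSegment
    case (.startOfLine, true), (.endOfLine, true): return nil
    }
  }
}

let negatedWord = AnchorSketch(kind: .wordBoundary, isInverted: true)
let negatedLine = AnchorSketch(kind: .startOfLine, isInverted: true)
```

Here `negatedWord.assertionKind` maps to a real case, while `negatedLine.assertionKind` is `nil`, which corresponds to the cases that currently hit `fatalError("Not yet supported")`.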

      case .wordBoundary: return .notWordBoundary
      }
    }
  }

  public var regex: Regex<Substring> {
    Regex(node: .atom(.assertion(astAssertion)))
  }
}

// MARK: - Public API

extension Anchor {
  public static var startOfSubject: Anchor {
    Anchor(kind: .startOfSubject)
  }

  public static var endOfSubjectBeforeNewline: Anchor {
    Anchor(kind: .endOfSubjectBeforeNewline)
  }

  public static var endOfSubject: Anchor {
    Anchor(kind: .endOfSubject)
  }

  // TODO: Are we supporting this?
  // public static var resetStartOfMatch: Anchor {
  //   Anchor(kind: resetStartOfMatch)
  // }

  public static var firstMatchingPositionInSubject: Anchor {
    Anchor(kind: .firstMatchingPositionInSubject)
  }

  public static var textSegmentBoundary: Anchor {
    Anchor(kind: .textSegmentBoundary)
  }

  public static var startOfLine: Anchor {
    Anchor(kind: .startOfLine)
  }

  public static var endOfLine: Anchor {
    Anchor(kind: .endOfLine)
  }

  public static var wordBoundary: Anchor {
    Anchor(kind: .wordBoundary)
  }

  public var inverted: Anchor {
    var result = self
    result.isInverted.toggle()
    return result
  }
Member
Would we want an isInverted then? Also, is it the case that all anchors can be inverted?

Member Author
I don't know that a property makes sense if we aren't going to also expose kind as public API, and the purpose of this is just to carry the regex wrapper.

Member Author
Everything in this PR can be inverted; it just needs a little more plumbing. If we want to provide the functionality of a "reset match" assertion, that could just be a separate function or type, since it isn't an anchor anyway.
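
As a small illustration of how `inverted` behaves as written in the diff: it returns a flipped copy and leaves the receiver alone, so applying it twice round-trips. `InvertibleAnchor` is a hypothetical stand-in that models only the inversion flag, not the PR's `Anchor`.

```swift
// Hypothetical stand-in; only the inversion state of an anchor is modeled.
struct InvertibleAnchor: Equatable {
  var isInverted = false

  // Returns a flipped copy; the receiver is left unchanged.
  var inverted: InvertibleAnchor {
    var result = self
    result.isInverted.toggle()
    return result
  }
}

let plain = InvertibleAnchor()
let negated = plain.inverted
let roundTripped = negated.inverted
```

Because the type has value semantics, `plain` is unaffected by building `negated`, and `roundTripped` compares equal to `plain`.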

}

public func lookahead<R: RegexProtocol>(
  isNegative: Bool = false,
Contributor
While Boolean properties should generally read like an assertion about the receiver, I'm not sure that's quite as useful for function parameters. In this case, isNegative does not form a phrase with the base name to produce a Boolean result; it is an instruction from the caller that modifies the callee's behavior. As such, I wonder if we should call it negative instead.

Member Author
You're right, this is like the second parameter in split(separator: "-", omittingEmptySubsequences: false). 👍🏻

  @RegexBuilder _ content: () -> R
) -> Regex<R.Match> {
  Regex(node: .group(isNegative ? .negativeLookahead : .lookahead, content().regex.root))
}

public func lookahead<R: RegexProtocol>(
  _ component: R,
  isNegative: Bool = false
) -> Regex<R.Match> {
  Regex(node: .group(isNegative ? .negativeLookahead : .lookahead, component.regex.root))
}
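
To make the naming suggestion from the review thread concrete, here is a rough sketch of how a call site might read with a `negative:` label. `lookaheadPattern` is a hypothetical helper that only renders pattern text; it is not part of the PR, and the diff at this commit still uses `isNegative`.

```swift
// Hypothetical helper: renders a lookahead group as pattern text.
// `negative` is the label suggested in review, not the name used in the diff.
func lookaheadPattern(_ inner: String, negative: Bool = false) -> String {
  let opener = negative ? "(?!" : "(?="
  return opener + inner + ")"
}

let positive = lookaheadPattern("[0-9]")
let negated = lookaheadPattern("[0-9]", negative: true)
```

At the call site the label reads as an instruction to the callee, in the same way `omittingEmptySubsequences:` does in `split`.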
25 changes: 25 additions & 0 deletions Tests/RegexTests/RegexDSLTests.swift
@@ -180,6 +180,31 @@ class RegexDSLTests: XCTestCase {
}
}
}

  func testAssertions() throws {
    try _testDSLCaptures(
      ("aaaaab", "aaaaab"),
      ("caaaaab", nil),
      ("aaaaabc", nil),
      captureType: Substring.self, ==)
    {
      Anchor.startOfLine
      "a".+
      "b"
      Anchor.endOfLine
    }

    try _testDSLCaptures(
      ("aaaaa1", "aaaaa1"),
      ("aaaaa", nil),
      ("aaaaab", nil),
      captureType: Substring.self, ==)
    {
      "a".+
      lookahead(CharacterClass.digit)
      CharacterClass.word
    }
  }
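
For readers more used to textual patterns, the two test bodies above read roughly like `^a+b$` and `a+(?=\d)\w`. That correspondence is an assumption (in particular that `CharacterClass.digit` and `CharacterClass.word` behave like `\d` and `\w`), and the sketch below checks the same inputs with Foundation's `NSRegularExpression` as a stand-in, not with the DSL under review. `matchesWhole` is a hypothetical helper.

```swift
import Foundation

// Hypothetical helper: true when `pattern` matches the whole of `input`.
func matchesWhole(_ pattern: String, _ input: String) -> Bool {
  guard let regex = try? NSRegularExpression(pattern: pattern) else { return false }
  let whole = NSRange(input.startIndex..., in: input)
  guard let match = regex.firstMatch(in: input, options: [], range: whole) else {
    return false
  }
  // Require the match to cover the entire input.
  return match.range.location == whole.location && match.range.length == whole.length
}

// Assumed textual equivalents of the two DSL bodies in testAssertions.
let anchored = "^a+b$"
let withLookahead = "a+(?=\\d)\\w"
```

With these patterns, "aaaaab" is accepted by `anchored` while "caaaaab" and "aaaaabc" are rejected, and "aaaaa1" is accepted by `withLookahead` while "aaaaa" and "aaaaab" are rejected, mirroring the expectations in the test.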

  func testNestedGroups() throws {
    try _testDSLCaptures(