Skip to content

Conversation

@ViktorHofer
Copy link
Owner

@ViktorHofer ViktorHofer commented Mar 6, 2018

Related: https://github.com/dotnet/corefx/issues/24145

Reviewers please read this first:

Implement & expose new Span/Memory APIs

The APIs aren't approved by or even submitted to the API review board yet. I had to make sure that what I had planned was even possible before taking it to the board. That's why this PR targets my master branch and not the corefx's one.

I introduced Span/Memory in places where it makes most sense.

Memory overloads: Match, Matches (return the provided input text in the Match object)
Span overloads: Escape, IsMatch, Replace, Match.Result

Replace with a MatchEvaluator is a special situation as the evaluator allows to reference members in the internal used Match object. As we are using a "light version" of Match (we are not setting the input Memory to avoid conversion from Span to Memory) for APIs that operate but not return a Match object, a "full" Match must be created here. To avoid copy costs from Span to Memory I'm pinning the Span for the lifetime of the Replace operation until the evaluation is done.

Break inheritance between RegexInterpreter and RegexRunner

I was discussing this with Stephen before. The current architecture makes it hard/impossible to modernize the Interpreter and introduce goodies like Span/MemoryPool because RegexRunner itself is exposed because of the CompileToAssembly functionality, which we currently don't support in Core. As the Interpreter itself is internal it makes sense to get rid of this burden and modernize it.

Currently the relevant code is copied from RegexRunner to RegexInterpreter but I play with idea of changing it to a by ref type and introduce ValueListBuilder members for the stacks. This should remove a lot of duplicate code. More about that later. Therefore please do yourself a favor and skip reviewing the code additions in RegexInterpreter.cs!

The CompiledRunner currently boxes the input string to a Memory when creating the Match object. We are limited with introducing changes here and I plan to discuss breaking compatibility with netfx for assemblies that were compiled with CompileToAssembly. More about that later, not part of this discussion/PR.

Tests

I added tests for the new overloads in all places where the string counterpart were used. As the string and the Span/Memory overloads share the same implementation, coverage should be fine. If you find any holes, please let me know.

Minor changes

  • Replace code paths especially for RTL optimized by avoiding an additional data structure for the reversed transformed inputs
  • Add missing xmldocs to exposed APIs where it was missing

Performance

Early Perfview/BenchmarkDotNet tests show that the existing string overloads benefit from the changes and aren't slower than they were before. More benchmarks and traces will follow soon!

I played with various inputs i.e. the regex-redux benchmark. After hacking the Shared ArrayPool (resizing the max array size) I could see with Perfview that 99% of the string GC cleanup was removed and replaced by borrowed char arrays which of course currently don't get cleaned up: https://github.com/dotnet/coreclr/issues/7747.

API Review

Ref struct for Match and siblings (Capture & Group).

I had a discussion with Jan offline and he pointed out that we might want to introduce a ref struct MatchValue type that is returned by APIs that take Span/Memory as an input.

The problem with just using the current class Match is that it gives you unsecure access to the Span. For example, you can send the Match object to other thread and start working on the Span there, while the current thread unwinds and frees the memory.
ref struct MatchValue would avoid this issue
(And also saved the alllocation)

The issues with that is that we currently have the following hiearchy: Match --> Group --> Capture and that Groups and Match contain collections of Captures/Groups.

Yes, the flip from class to valuetype tends to be like this. E.g. when we have introduced ValueTask to CoreFX, a bunch of parallel ValueSomething types went with it.
It is a topic for API review discussion
One option is to just not have Span version of the APIs that returns these collections or have callbacks

startat overload

For things like Regex.Replace, the startat argument means copy everything up to startat to the destination Span, and then run regular Replace that does not take startat method. So this looks like a convenience method to me - it saves you from typing a tiny bit of code in rare cases to achieve the same effect.

Should we add these startat convenience overloads for Span also?
If yes, this commit should be reverted bf7d7f9

RegexSplitEnumerator RTL yield order

If you call the Span version of Regex.Split and pass RegexOptions.RightToLeft to it the yield order of the enumerator will also be right to left as we start looking for matches from right to left. The current implementation (which is not an enumerator!) reverses the captured strings before returning.

RegexSplitEnumerator GetEnumerator (see ref diff)

I'm not aware of any other cases in the BCL where we have a GetEnumerator method like this on a struct enumerator. I understand you want it to be able to directly foreach the results without introducing an enumerable struct to serve as the return type, but I'm not sure this is a pattern we want to introduce. You should be sure to highlight this as part of any API review discussion.

Proposed APIs

This diff contains the Memory overloads and the MatchEvaluator overloads. See discussion above if we should introduce new ref types for Match, Group & Capture.

namespace System.Text.RegularExpressions
{
    public partial class Capture
    {
        internal Capture() { }
        public int Index { get { throw null; } }
        public int Length { get { throw null; } }
        public string Value { get { throw null; } }
+       public ReadOnlySpan<char> ValueSpan { get { throw null; } }
        public override string ToString() { throw null; }
    }
    public partial class Match : System.Text.RegularExpressions.Group
    {
        internal Match() { }
        public static System.Text.RegularExpressions.Match Empty { get { throw null; } }
        public virtual System.Text.RegularExpressions.GroupCollection Groups { get { throw null; } }
        public System.Text.RegularExpressions.Match NextMatch() { throw null; }
        public virtual string Result(string replacement) { throw null; }
+       public virtual bool TryResult(string replacement, Span<char> destination, out int charsWritten) { throw null; }
        public static System.Text.RegularExpressions.Match Synchronized(System.Text.RegularExpressions.Match inner) { throw null; }
    }
    public partial class Regex : System.Runtime.Serialization.ISerializable
    {
+       public ref struct SplitEnumerator
+       {
+           public ReadOnlySpan<char> Current { get { throw null; } }
+           public SplitEnumerator GetEnumerator() { throw null; }
+           public bool MoveNext() { throw null; }
+       }
        protected internal System.Collections.Hashtable caps;
        protected internal System.Collections.Hashtable capnames;
        protected internal int capsize;
        protected internal string[] capslist;
        protected internal System.Text.RegularExpressions.RegexRunnerFactory factory;
        public static readonly System.TimeSpan InfiniteMatchTimeout;
        protected internal System.TimeSpan internalMatchTimeout;
        protected internal string pattern;
        protected internal System.Text.RegularExpressions.RegexOptions roptions;
        protected Regex() { }
        protected Regex(System.Runtime.Serialization.SerializationInfo info, System.Runtime.Serialization.StreamingContext context) { }
        public Regex(string pattern) { }
        public Regex(string pattern, System.Text.RegularExpressions.RegexOptions options) { }
        public Regex(string pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { }
        public static int CacheSize { get { throw null; } set { } }
        [System.CLSCompliant(false)]
        protected System.Collections.IDictionary Caps { get { throw null; } set { } }
        [System.CLSCompliant(false)]
        protected System.Collections.IDictionary CapNames { get { throw null; } set { } }
        public System.TimeSpan MatchTimeout { get { throw null; } }
        public System.Text.RegularExpressions.RegexOptions Options { get { throw null; } }
        public bool RightToLeft { get { throw null; } }
        public static string Escape(string str) { throw null; }
+       public static bool TryEscape(ReadOnlySpan<char> str, Span<char> destination, out int charsWritten) { throw null; }
        public string[] GetGroupNames() { throw null; }
        public int[] GetGroupNumbers() { throw null; }
        public string GroupNameFromNumber(int i) { throw null; }
        public int GroupNumberFromName(string name) { throw null; }
        protected void InitializeReferences() { }
        public bool IsMatch(string input) { throw null; }
+       public bool IsMatch(ReadOnlySpan<char> input) { throw null; }
        public bool IsMatch(string input, int startat) { throw null; }
        public static bool IsMatch(string input, string pattern) { throw null; }
        public static bool IsMatch(string input, string pattern, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static bool IsMatch(string input, string pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static bool IsMatch(ReadOnlySpan<char> input, string pattern, System.Text.RegularExpressions.RegexOptions options = System.Text.RegularExpressions.RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
        public System.Text.RegularExpressions.Match Match(string input) { throw null; }
+       public System.Text.RegularExpressions.Match Match(ReadOnlyMemory<char> input) { throw null; }
        public System.Text.RegularExpressions.Match Match(string input, int startat) { throw null; }
        public System.Text.RegularExpressions.Match Match(string input, int beginning, int length) { throw null; }
        public static System.Text.RegularExpressions.Match Match(string input, string pattern) { throw null; }
        public static System.Text.RegularExpressions.Match Match(string input, string pattern, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static System.Text.RegularExpressions.Match Match(string input, string pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static System.Text.RegularExpressions.Match Match(ReadOnlyMemory<char> input, string pattern, System.Text.RegularExpressions.RegexOptions options = RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
        public System.Text.RegularExpressions.MatchCollection Matches(string input) { throw null; }
+       public System.Text.RegularExpressions.MatchCollection Matches(ReadOnlyMemory<char> input) { throw null; }
        public System.Text.RegularExpressions.MatchCollection Matches(string input, int startat) { throw null; }
        public static System.Text.RegularExpressions.MatchCollection Matches(string input, string pattern) { throw null; }
        public static System.Text.RegularExpressions.MatchCollection Matches(string input, string pattern, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static System.Text.RegularExpressions.MatchCollection Matches(string input, string pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static System.Text.RegularExpressions.MatchCollection Matches(ReadOnlyMemory<char> input, string pattern, System.Text.RegularExpressions.RegexOptions options = RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
        public string Replace(string input, string replacement) { throw null; }
        public string Replace(string input, string replacement, int count) { throw null; }
+       public bool TryReplace(ReadOnlySpan<char> input, string replacement, Span<char> destination, out int charsWritten, int count = -1) { throw null; }
        public string Replace(string input, string replacement, int count, int startat) { throw null; }
        public static string Replace(string input, string pattern, string replacement) { throw null; }
        public static string Replace(string input, string pattern, string replacement, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static string Replace(string input, string pattern, string replacement, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static bool TryReplace(ReadOnlySpan<char> input, string pattern, string replacement, Span<char> destination, out int charsWritten, System.Text.RegularExpressions.RegexOptions options = System.Text.RegularExpressions.RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
        public static string Replace(string input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator) { throw null; }
        public static string Replace(string input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static string Replace(string input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static bool TryReplace(ReadOnlySpan<char> input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator, Span<char> destination, out int charsWritten, System.Text.RegularExpressions.RegexOptions options = System.Text.RegularExpressions.RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
        public string Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator) { throw null; }
        public string Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator, int count) { throw null; }
+       public bool TryReplace(ReadOnlySpan<char> input, System.Text.RegularExpressions.MatchEvaluator evaluator, Span<char> destination, out int charsWritten, int count = -1) { throw null; }
        public string Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator, int count, int startat) { throw null; }
        public string[] Split(string input) { throw null; }
        public string[] Split(string input, int count) { throw null; }
        public string[] Split(string input, int count, int startat) { throw null; }
+       public SplitEnumerator Split(ReadOnlySpan<char> input, int count = 0) { throw null; }
        public static string[] Split(string input, string pattern) { throw null; }
        public static string[] Split(string input, string pattern, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static string[] Split(string input, string pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static SplitEnumerator Split(ReadOnlySpan<char> input, string pattern, RegexOptions options = RegexOptions.None, TimeSpan? matchTimeout = null) { throw null; }
        void System.Runtime.Serialization.ISerializable.GetObjectData(System.Runtime.Serialization.SerializationInfo si, System.Runtime.Serialization.StreamingContext context) { }
        public override string ToString() { throw null; }
        public static string Unescape(string str) { throw null; }
+       public static bool TryUnescape(ReadOnlySpan<char> str, Span<char> destination, out int charsWritten) { throw null; }
        protected bool UseOptionC() { throw null; }
        protected bool UseOptionR() { throw null; }
        protected internal static void ValidateMatchTimeout(System.TimeSpan matchTimeout) { }
    }
}

@ViktorHofer ViktorHofer self-assigned this Mar 6, 2018
@ViktorHofer
Copy link
Owner Author

cc @stephentoub @jkotas @danmosemsft @joshfree @vancem

_pos += value.Length;
}

public unsafe void AppendReversed(ReadOnlySpan<char> value)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas are you fine with adding these helper to ValueStringBuilder or should I fork them off?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add this in your local copy (ValueStringBuilder.AppendReverse.cs). I do not think it should be in the common one.

@@ -0,0 +1,114 @@
// Licensed to the .NET Foundation under one or more agreements.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub as discussed offline, I reused your NativeOwnedMemory here.

@jkotas
Copy link

jkotas commented Mar 6, 2018

CompileToAssembly functionality, which we currently don't support in Core.

Are you sure that it is not supported? I believe that people are using it - with the desktop tool.


namespace System.Buffers
{
internal sealed class CharOwnedMemory : OwnedMemory<char>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand why this is needed. Why can't you just pass the Span or pinned pointer around (wrapped in a struct so that it does not look ugly)?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only used for Regex.Replace with a MatchEvaluator overload. As the evaluator can reference any public member of the Match object it could reference the Value or the ValueMemory. As I store input string in the Match object as a ReadOnlyMemory (https://github.com/ViktorHofer/corefx/pull/1/files#diff-49b12395ba18f98f80686a1e251ab22cR46) the Span needs to be converted. I don't see any other way here without allocating the text in a string.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is ValueMemory a memory? Why it is not a Span instead?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because 1) the Match object is returned to caller and 2) the type itself isn't a byref type, I can't store the Span in there, even if I want to.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you would have to store a pinned pointer in there. It is what you are doing already - I do not see why you need an extra level of indirection to store a pinned pointer.

Copy link

@jkotas jkotas Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that instead of CharOwnedMemory - that is relative expensive because of it is an object with finalizer - you can have a struct like this instead:

struct MemoryOrPinnedSpan
{
    Memory<char> _memory;
    char* _pinnedSpan;
    int _length;

    Span<char> AsSpan()
    {
        return (_pinnedSpan != null) ? new Span(_pinnedSpan, _length) : _memory.AsSpan();
    }
}

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and with that you also propose to change the exposed ValueMemory to ValueSpan? I just want to be clear that we are talking here about all API calls not just about the special Replace case.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Also, my preference would be to have the API secure-by-design so that it is impossible to get dangling pointers. It would mean to introduce a new ref struct MatchValue

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked with Jan offline, will incorporate his feedback and will put additional API review feedback in a remarks section at the top.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API calls which are unrelated to the Span pinning (everything except Replace with MatchEvaluator) "suffer" under the additional wrapper which introduces two additional sizeof(int) members. Not really painful but worth to point out...

public int _capsize;
public ExclusiveReference _runnerref;
public SharedReference _replref;
public WeakReference<RegexReplacement> _replref;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to make unrelated changes in separate PRs so that big changes like this are easier to review.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I should put that change back into a separate PR where it was to begin with. I paid attention to not add unrelated work but failed with this one.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created dotnet#27791 for the SharedReference work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you push up a change to eliminate this from the diff now maybe

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ViktorHofer
Copy link
Owner Author

ViktorHofer commented Mar 6, 2018

Are you sure that it is not supported? I believe that people are using it - with the desktop tool.

I meant to produce a compiled regex assembly with Core, not to reference it.

@ViktorHofer ViktorHofer force-pushed the RegexSpan branch 2 times, most recently from b80ed5e to 02263ac Compare March 7, 2018 00:53
@ViktorHofer
Copy link
Owner Author

I addressed the initial feedback. Eagerly waiting for more opinions. Thanks!

@danmoseley
Copy link
Collaborator

@JeremyKuhne you might be interested in a look also.


return repl.Replacement(this);
// Writes the ValueStringBuilder's content either into the output Span or returns
// a string dependening on the targetSpan switch.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit, typo

}
}

public static string CopyOutput(this ValueStringBuilder vsb, Span<char> destination, bool reverse, bool targetSpan, out int charsWritten)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems clumsy to have a method that returns either a string (and doesn't use 2 parameters) or null (and uses those parameters)

Why not have two overloads? One that has the last two parameters and returns void, the other that does not and returns string.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for CopyInput

Copy link
Owner Author

@ViktorHofer ViktorHofer Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We follow this pattern also in other places in the BCL, e.g. in BigInteger: https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Numerics/src/System/Numerics/BigNumber.cs#L506 done by @stephentoub. IMO this is the easiest way for code sharing without unnecessary duplicate members for both Span and string.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub opinion?


public MemoryOrPinnedSpan(char* pinnedSpan, int length)
{
_pinnedSpan = pinnedSpan;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit, it is easier to read if initialization order is consistent and matches field order.

Copy link
Owner Author

@ViktorHofer ViktorHofer Mar 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually sort instance fields by their visibility level. I switched the ordering now, I hope that's fine.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mix of public and private fields looks pretty weird to me.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's already changed with the latest commit.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was for what you have in the latest commit. I would write it like this:

private readonly ReadOnlyMemory<T> _memory; 

private readonly char* _pinnedSpan;
private readonly int _length; 

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok cool, but I need to access the length from outside.

Copy link
Owner Author

@ViktorHofer ViktorHofer Mar 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, hit enter to soon. The MatchCollection needs to access the length, that's why I made it public here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So make a property for it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually I'm fine with just exposing the field, if it's readonly, especially if the surrounding class is internal. But as we shouldn't spend too much time in discussing such a minor implementation detail I'll change it to a prop.

@danmoseley
Copy link
Collaborator

This change is very likely too risky for 2.1. You might consider whether there is any part that would be acceptably low risk that could go in 2.1 for perf.

public bool IsMatch(string input) { throw null; }
public bool IsMatch(ReadOnlySpan<char> input) { throw null; }
public bool IsMatch(string input, int startat) { throw null; }
public bool IsMatch(ReadOnlySpan<char> input, int startat) { throw null; }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overloads that take additional index are not necessary with Span. The caller can slice the Span instead.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas the index here called startat does not mark the start of the string but the location where to start searching. This has different semantics as you can start searching at position 6 but the original input string will still be the full string. I think this isn't relevant for Regex.IsMatch but for at least Regex.Split and Regex.Replace and probably also for Regex.Match.

Copy link

@jkotas jkotas Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For things like Regex.Replace, the startat argument means copy everything up to startat to the destination Span, and then run regular Replace that does not take startat method. So this looks like a convenience method to me - it saves you from typing a tiny bit of code in rare cases to achieve the same effect.

A good question for API review discussion whether this convenience method is worth having.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you describe also applies to at least Regex.Split. I need to look into the other calls if there any differences. Thanks for the input, I will add it to the points to discuss during API review.

@ViktorHofer
Copy link
Owner Author

This change is very likely too risky for 2.1. You might consider whether there is any part that would be acceptably low risk that could go in 2.1 for perf.

I agree that the change's risk level is at least medium. The tests cover all the added overloads but I don't have any data yet of how much this is regressing Regex.Match with RegexOptions.Compiled flag enabled (because of the added string --> AsMemory() -> the PinnedSpanOrMemory wrapper -> 2x).

@danmoseley
Copy link
Collaborator

Risk is partly because I also don't believe we have good enough functional or perf tests yet.

@ViktorHofer
Copy link
Owner Author

I would like to submit this for api-review for Tuesday (or Thursday, or whenever is time). Therefore it would be great if I could gather more feedback until then.

My task till next week is to post benchmarks for every API call with interpreted and compiled mode and post perfview stats.

@jkotas
Copy link

jkotas commented Mar 8, 2018

I think the biggest risk is in the API design. We do not have experience and established patterns on how to do these APIs well. I do not think that the proposed APIs are right, except for the simple ones that just take Span and do not have expose any object model or callbacks. I think a good way to reduce the risk would be the just expose these and leave the rest for later.

When we have done similar Span-based APIs for paths or string operations a while ago, it took us multiple iterations (after the API was approved) to figure out what the APIs should actually do and what they shape should actually be.

@ViktorHofer
Copy link
Owner Author

Sounds good to me! I'm fine with just exposing the Span overloads for now. I'm not that happy either with the current Memory overloads as their sole purpose is to store the supplied input text in the returned object. If we accept Span here instead of Memory we would need to copy the input manually into a Memory, as pinning doesn't work here. Not ideal either... I agree that with the tools and shapes we currently have, it's not trivial to design a perfect API here.

@ViktorHofer
Copy link
Owner Author

I.e. the Regex.Split path which currently returns string[] could also be optimized for performance and reduced allocation with Span but because there's now way to return a collection of Spans, we are bounded here. Do we have any plans to solve that limitation @jkotas? I'm not involved in these design discussions, that's why I'm asking.

@jkotas
Copy link

jkotas commented Mar 8, 2018

return a collection of Spans

The current pattern for this is to return it via ref-like enumerator.

@ViktorHofer
Copy link
Owner Author

ViktorHofer commented Mar 8, 2018

Could you please point to some code/PR/issue where I can read more about that? Couldn't find anything related in search.

@jkotas
Copy link

jkotas commented Mar 8, 2018

  • Span/ReadOnlySpan.GetEnumerator() is ref-like regular enumerator
  • FileSystemEnumerator and ref struct FileSystemEntry is callback based enumerator

@ViktorHofer
Copy link
Owner Author

Thanks Jan. I added a RegexSplitEnumerator in the latest commit.

public static string Replace(string input, string pattern, string replacement) { throw null; }
public static string Replace(string input, string pattern, string replacement, System.Text.RegularExpressions.RegexOptions options) { throw null; }
public static string Replace(string input, string pattern, string replacement, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
public static bool TryReplace(ReadOnlySpan<char> input, Span<char> destination, out int charsWritten, string pattern, string replacement, System.Text.RegularExpressions.RegexOptions options = System.Text.RegularExpressions.RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
Copy link

@stephentoub stephentoub Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others may disagree, but I think the destination and charsWritten should be moved to the end of the fixed arguments. Same for other such APIs.

MarcoRossignoli and others added 22 commits June 16, 2018 20:35
Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
These should not need baselining after dotnet#30438.
The bug has been fixed in ProjectN 3 years ago.

Fixes #16747
Fixes #21198
Remove SharedReference helper and use WeakRefrence<T> directly instead.
Add comments for code paths that were unclear.
Break inheritance between RegexInterpreter and RegexRunner.
Use MemoryOrPinnedSpan instead of custom OwnedMemory type.
Expose Match.ValueSpan and remove Match.ValueMemory
Add a ref Split enumerator to reduce allocations, expose it in ref and
add tests.
Make PinnedSpanOrMemory struct Empty a singleton.
Minor code styling.
Destination span and out int in method signature reordered
Move SplitEnumerator inside Regex
Minor code cleanup & fix refs
Remove MemoryOrPinnedSpan empty cache
ViktorHofer pushed a commit that referenced this pull request Sep 7, 2018
* Fix ServiceController name population perf

* Split tests

* Remove dead field

* Remove new use of DangerousGetHandle

* SafeHandle all the things!

* VSB #1

* VSB #2

* Fix GLE

* Initialize machineName in ctor

* Test for empty name ex

* Null names

* Inadvertent edit

* Unix build

* Move interop into class

* Reverse SafeHandle for HAllocGlobal

* Fix tests

* Disable test for NETFX

* CR feedback

* Pattern matching on VSB

* Direct call

* typo
ViktorHofer pushed a commit that referenced this pull request Aug 9, 2019
* Json prototype (#1)

Basic API for Json writable DOM with scenarios including collection initialization, accessing and modifying Json nodes.

* Json prototype - transformation API  (#2)

* transformation API added
* assertions to existing scenarios added

* Json prototype (#1)

Basic API for Json writable DOM with scenarios including collection initialization, accessing and modifying Json nodes.

* Json prototype - transformation API  (#2)

* transformation API added
* assertions to existing scenarios added

* JsonNumber implementation and tests (#3)

* IEquatables added
* JsonNumber implementation and unit tests added
* pragmas deleted
* review comments included, more tests added
* decimal support added to JsonNumber
* decimal support added to JsonArray and JsonObject

* all unimplemented classes and methods with accompanying tests removed

* First part of documentation added

* documentation completed

* missing exceptions added

* JsonElement changes removed

* part of the review comments included

* work on review comments

* code refactor

* more decimal tests added using MemberData

* more decimal tests added using MemberData

* more test cases added

* equals summary adjusted, equals tests added

* more Equals tests added, GetHashCode tests added, minor changes

* scientifing notation support added, rational numbers tests fixes

* rational overflow tests added

* ulong maxvalue tests added to rational types

* presision problems fixes

* exception strings fixed

* CI failing fixes (hopefully), review comments included

* missing == tests added to achieve 100% branch coverage

* review comments included

* trailing whitespaces fixes

* equals comments added

* equals object refactored to call quals json number

* merge fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.