Description
Today, StringBuilder has a problem: the only ways to read characters out of it are to index it one character at a time ([]) or to call one of the ToString APIs to convert it to a string (and then index that). Both of these are slow.
Typically what people actually do is call ToString, which forces the creation of a large string. Often the very next thing that happens is that the String is written to a file, leaving behind a large, dead string on the GC heap.
With the advent of Span there is now an efficient, safe mechanism for extracting the data in the StringBuilder.
The proposed API is
/// <summary>
/// ForEachChunk is an efficient way of accessing all the characters in the StringBuilder.
/// It is an alternative to calling ToString() on the StringBuilder.
///
/// The 'callback' delegate is called 0 or more times, each time being passed a chunk of
/// the string as a ReadOnlySpan of characters (in order). This is repeated until all
/// the characters in the StringBuilder have been passed back through 'callback'.
/// </summary>
/// <param name="callback">A method that is called repeatedly, being passed a chunk
/// of the StringBuilder with each call. If 'false' is returned by the callback
/// then ForEachChunk terminates immediately. </param>
public void ForEachChunk(Func<ReadOnlySpan<char>, bool> callback)
The actual implementation is simple:
public void ForEachChunk(Func<ReadOnlySpan<char>, bool> callback)
{
    VerifyClassInvariant();
    ForEachChunk(callback, this);
}

private bool ForEachChunk(Func<ReadOnlySpan<char>, bool> callback, StringBuilder chunk)
{
    // Chunks are linked newest-first, so emit the earlier chunks before this one.
    var previous = chunk.m_ChunkPrevious;
    if (previous != null && !ForEachChunk(callback, previous))
        return false;

    // The fields here might be changing and inaccurate if threads are racing, but the validation that
    // ReadOnlySpan does on construction ensures that the values, even if inaccurate, are 'in bounds'.
    // m_ChunkOffset is the chunk's logical offset within the whole builder, not an index into
    // m_ChunkChars, so the span starts at 0.
    return callback(new ReadOnlySpan<char>(chunk.m_ChunkChars, 0, chunk.m_ChunkLength));
}
The intent is that this new API can be a building block for extension methods on Stream (or other APIs that act like streams) that operate on StringBuilders directly, so that intermediate strings need not be formed.
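As a rough illustration of that building-block idea, here is a minimal sketch of a writer extension layered on a chunk callback. Several things in it are assumptions rather than part of the proposal: ForEachChunk is written as an extension method built on StringBuilder.GetChunks() (available in .NET Core 3.0 and later) as a stand-in for the proposed instance method, the ChunkCallback delegate is hypothetical (used because ReadOnlySpan<char> is a ref struct and is not accepted as a generic type argument to Func in older C# versions), and the name WriteChunks is made up for this example.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical delegate type standing in for Func<ReadOnlySpan<char>, bool>.
public delegate bool ChunkCallback(ReadOnlySpan<char> chunk);

public static class StringBuilderChunkExtensions
{
    // Stand-in for the proposed ForEachChunk: passes each chunk, in order,
    // to the callback and stops as soon as the callback returns false.
    public static void ForEachChunk(this StringBuilder builder, ChunkCallback callback)
    {
        foreach (ReadOnlyMemory<char> chunk in builder.GetChunks())
        {
            if (!callback(chunk.Span))
                return;
        }
    }

    // Hypothetical helper: writes the builder's contents chunk by chunk,
    // without first materializing one large string via ToString().
    public static void WriteChunks(this TextWriter writer, StringBuilder builder)
    {
        builder.ForEachChunk(chunk =>
        {
            writer.Write(chunk); // TextWriter.Write(ReadOnlySpan<char>)
            return true;         // keep going until every chunk is written
        });
    }
}
```

With this in place, a caller could write something like writer.WriteChunks(builder) where it would otherwise call writer.Write(builder.ToString()). Note that the lambda captures writer, so it still allocates a closure and a delegate on each call.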
Question: A variation of this is to have an explicit context parameter that is passed to the callback.
public void ForEachChunk<T>(Func<ReadOnlySpan<char>, T, bool> callback, T context = null) where T : class
While this is not as convenient to use, it lets the user avoid creating closures (which force the allocation of a Func), which is faster (and the goal of this API is all about being faster).
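To make the closure point concrete, here is a minimal sketch of how the context-passing variant might be used. It rests on the same kind of assumptions as any illustration of an unshipped API: ForEachChunk<T> is written as an extension method over StringBuilder.GetChunks() (.NET Core 3.0 and later) rather than as the proposed instance method, ChunkCallback<T> is a hypothetical delegate standing in for Func<ReadOnlySpan<char>, T, bool>, WriteTo is a made-up helper name, and the default value and class constraint from the signature above are omitted for brevity. The static lambda requires C# 9 or later.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical delegate type standing in for Func<ReadOnlySpan<char>, T, bool>.
public delegate bool ChunkCallback<T>(ReadOnlySpan<char> chunk, T context);

public static class StringBuilderContextExtensions
{
    // Stand-in for the proposed context-passing ForEachChunk<T>: the context
    // is handed back to the callback on every call instead of being captured.
    public static void ForEachChunk<T>(this StringBuilder builder, ChunkCallback<T> callback, T context)
    {
        foreach (ReadOnlyMemory<char> chunk in builder.GetChunks())
        {
            if (!callback(chunk.Span, context))
                return;
        }
    }

    // Hypothetical helper: the lambda is marked 'static', so the compiler
    // rejects any captured variable; the writer travels through 'context'.
    public static void WriteTo(this StringBuilder builder, TextWriter writer)
    {
        builder.ForEachChunk(static (chunk, w) =>
        {
            w.Write(chunk);
            return true;
        }, writer);
    }
}
```

Because the lambda captures nothing, the compiler can create the delegate once and reuse it, so repeated calls to WriteTo do not allocate a new closure or delegate.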