
Provide StringBuilder.ForEachChunk for efficient access to characters in a StringBuilder #22371

@vancem


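Today, StringBuilder has a problem: the only ways to read the characters out of a StringBuilder are to read them one character at a time with the indexer (`sb[i]`), or to call one of the ToString overloads to convert it to a string (and then index that). Both are slow. Each indexer call may have to walk the internal chunk list backwards from the last chunk to find the chunk that holds index `i`, and ToString allocates a new string and copies every character into it.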

Typically, what people actually do is call ToString, which forces the creation of a large string. Often the very next thing that happens is that the string is written to a file, leaving a large, dead string behind on the GC heap.

With the advent of Span<T>, there is now an efficient and safe mechanism for extracting the data in the StringBuilder.

The proposed API is:

        /// <summary>
        /// ForEachChunk is an efficient way of accessing all the characters in the StringBuilder.
        /// It is an alternative to calling ToString() on the StringBuilder.
        ///
        /// The 'callback' delegate is called 0 or more times, each time being passed a chunk of
        /// the string as a ReadOnlySpan of characters (in order). This is repeated until all
        /// the characters in the StringBuilder have been passed back through 'callback'.
        /// </summary>
        /// <param name="callback">A method that is called repeatedly, being passed a chunk
        /// of the StringBuilder with each call. If the callback returns 'false',
        /// ForEachChunk terminates immediately.</param>
        public void ForEachChunk(Func<ReadOnlySpan<char>, bool> callback)
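
As a sketch only, the behavior described by this doc comment can be emulated outside the BCL as an extension method over `StringBuilder.GetChunks()`, which .NET Core 3.0 and later provide and which enumerates the builder's chunks in order as `ReadOnlyMemory<char>`. The `ChunkCallback` delegate, the `StringBuilderChunkExtensions` class and the `ContainsChar` helper are hypothetical names. A custom delegate is assumed here because `ReadOnlySpan<char>` is a ref struct and cannot be used as a generic type argument to `Func<>` on older C#/.NET versions.

```csharp
using System;
using System.Text;

// Hypothetical delegate type, used instead of Func<ReadOnlySpan<char>, bool>
// because ReadOnlySpan<char> is a ref struct and cannot be a generic type
// argument on older C#/.NET versions.
public delegate bool ChunkCallback(ReadOnlySpan<char> chunk);

public static class StringBuilderChunkExtensions
{
    // Sketch of the proposed ForEachChunk, emulated with the real
    // StringBuilder.GetChunks() enumerator (.NET Core 3.0+), which yields
    // the builder's chunks in order, oldest first.
    public static void ForEachChunk(this StringBuilder builder, ChunkCallback callback)
    {
        if (builder == null) throw new ArgumentNullException(nameof(builder));
        if (callback == null) throw new ArgumentNullException(nameof(callback));

        foreach (ReadOnlyMemory<char> chunk in builder.GetChunks())
        {
            // A 'false' return from the callback stops the enumeration early.
            if (!callback(chunk.Span))
                return;
        }
    }

    // Example consumer: searches for a character without calling ToString()
    // and stops at the first chunk that contains it.
    public static bool ContainsChar(this StringBuilder builder, char value)
    {
        bool found = false;
        builder.ForEachChunk(chunk =>
        {
            found = chunk.IndexOf(value) >= 0;
            return !found; // keep going only while the character is not found
        });
        return found;
    }
}
```

The early-exit contract matters for searches like `ContainsChar`: once a chunk contains the character, the remaining chunks are never touched.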

The actual implementation is simple:

        public void ForEachChunk(Func<ReadOnlySpan<char>, bool> callback)
        {
            if (callback == null)
                throw new ArgumentNullException(nameof(callback));

            VerifyClassInvariant();
            ForEachChunk(callback, this);
        }

        private static bool ForEachChunk(Func<ReadOnlySpan<char>, bool> callback, StringBuilder chunk)
        {
            // The chunk list is linked from the most recently written chunk back to the
            // first one, so recurse to emit all earlier chunks before this one.
            StringBuilder previous = chunk.m_ChunkPrevious;
            if (previous != null && !ForEachChunk(callback, previous))
                return false;

            // If threads are racing, these fields might be changing and inaccurate, but the
            // validation that ReadOnlySpan does on construction ensures that the values, even
            // if inaccurate, are in bounds. A chunk's characters always start at index 0 of
            // m_ChunkChars; m_ChunkOffset is the chunk's logical position in the whole
            // string, not an index into the array.
            return callback(new ReadOnlySpan<char>(chunk.m_ChunkChars, 0, chunk.m_ChunkLength));
        }
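
The recursion in this implementation delivers chunks in order by first walking `m_ChunkPrevious` back to the oldest chunk, which costs one stack frame per chunk. The following sketch models the chunk list with a hypothetical `Chunk` class (standing in for the internal `m_ChunkChars`, `m_ChunkLength` and `m_ChunkPrevious` fields; `ChunkWalker` and `ChunkCallback` are likewise hypothetical) and contrasts the recursive walk with an iterative one that uses an explicit stack, an alternative that may be worth considering for builders with very many chunks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of StringBuilder's internal chunk list. Each chunk's
// characters start at index 0 of Chars, and Previous points to the chunk
// written before it, so the list runs from newest to oldest.
public sealed class Chunk
{
    public readonly char[] Chars;
    public readonly int Length;
    public readonly Chunk Previous;

    public Chunk(string text, Chunk previous)
    {
        Chars = text.ToCharArray();
        Length = text.Length;
        Previous = previous;
    }
}

public delegate bool ChunkCallback(ReadOnlySpan<char> chunk);

public static class ChunkWalker
{
    // Mirrors the proposal: recurse to the oldest chunk first, then invoke the
    // callback on the way back out. Uses one stack frame per chunk.
    public static bool ForEachRecursive(Chunk chunk, ChunkCallback callback)
    {
        if (chunk.Previous != null && !ForEachRecursive(chunk.Previous, callback))
            return false;
        return callback(new ReadOnlySpan<char>(chunk.Chars, 0, chunk.Length));
    }

    // Iterative alternative: push chunks newest-to-oldest onto an explicit
    // stack, then pop them so the callback sees them oldest-to-newest.
    public static bool ForEachIterative(Chunk last, ChunkCallback callback)
    {
        var pending = new Stack<Chunk>();
        for (Chunk c = last; c != null; c = c.Previous)
            pending.Push(c);

        while (pending.Count > 0)
        {
            Chunk c = pending.Pop();
            if (!callback(new ReadOnlySpan<char>(c.Chars, 0, c.Length)))
                return false;
        }
        return true;
    }
}
```

The iterative form trades recursion depth for a heap-allocated `Stack<Chunk>`, so which one is preferable depends on how many chunks a builder typically holds.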

The intent is that this new API can serve as a building block for extension methods on Stream (or other stream-like APIs) that operate on StringBuilders directly, so that no intermediate string needs to be created.
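To illustrate the building-block idea, here is a sketch of a hypothetical `WriteStringBuilder` extension on `Stream` that encodes each chunk directly into a reusable byte buffer. Because `ForEachChunk` is only proposed, the sketch enumerates chunks with the existing `StringBuilder.GetChunks()` (.NET Core 3.0+). A single `Encoder` is assumed across all chunks so that a surrogate pair split between two chunks is still encoded correctly.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamStringBuilderExtensions
{
    // Hypothetical extension method: writes the contents of 'builder' to
    // 'stream' in the given encoding without materializing a string.
    public static void WriteStringBuilder(this Stream stream, StringBuilder builder, Encoding encoding)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (builder == null) throw new ArgumentNullException(nameof(builder));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));

        // One Encoder for the whole builder: it carries a high surrogate left at
        // the end of one chunk over to the next chunk instead of mis-encoding it.
        Encoder encoder = encoding.GetEncoder();
        byte[] buffer = Array.Empty<byte>();

        foreach (ReadOnlyMemory<char> chunk in builder.GetChunks())
        {
            buffer = EnsureCapacity(buffer, encoding.GetMaxByteCount(chunk.Length));
            int count = encoder.GetBytes(chunk.Span, buffer, flush: false);
            stream.Write(buffer, 0, count);
        }

        // Flush any state the encoder is still holding (e.g. a trailing lone surrogate).
        buffer = EnsureCapacity(buffer, encoding.GetMaxByteCount(0));
        int tail = encoder.GetBytes(ReadOnlySpan<char>.Empty, buffer, flush: true);
        stream.Write(buffer, 0, tail);
    }

    // Reuses the buffer when it is large enough; otherwise allocates a bigger one.
    private static byte[] EnsureCapacity(byte[] buffer, int size) =>
        buffer.Length >= size ? buffer : new byte[size];
}
```

The only allocations are the encoder and a byte buffer sized to the largest chunk, instead of a string the size of the whole builder.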

Question: A variation of this is to have an explicit context parameter that is passed to the callback.

       public void ForEachChunk<T>(Func<ReadOnlySpan<char>, T, bool> callback, T context = null) where T : class

While this is less convenient to use, it lets callers avoid closures (which force the allocation of a closure object and a new Func delegate), and that is faster. Being faster is the whole point of this API.
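A sketch of the context-passing variant, again emulated over `StringBuilder.GetChunks()`; the `ChunkCallback<T>` delegate, the `CharCounter` state class and the `CountChar` helper are hypothetical. Because the lambda is marked `static` (C# 9+) it cannot capture locals, so the compiler can cache a single delegate instance; the caller's state travels through `context`, and the only per-call allocation is the state object the caller chose to create.

```csharp
using System;
using System.Text;

// Hypothetical delegate for the context-passing variant.
public delegate bool ChunkCallback<T>(ReadOnlySpan<char> chunk, T context);

// Hypothetical state object; its fields travel through 'context' instead of
// being captured by the lambda.
public sealed class CharCounter
{
    public char Target;
    public int Count;
}

public static class StringBuilderChunkExtensions
{
    // Sketch of ForEachChunk<T>, emulated with StringBuilder.GetChunks()
    // (.NET Core 3.0+): 'context' is passed unchanged to every callback.
    public static void ForEachChunk<T>(this StringBuilder builder, ChunkCallback<T> callback, T context = null)
        where T : class
    {
        if (builder == null) throw new ArgumentNullException(nameof(builder));
        if (callback == null) throw new ArgumentNullException(nameof(callback));

        foreach (ReadOnlyMemory<char> chunk in builder.GetChunks())
        {
            if (!callback(chunk.Span, context))
                return;
        }
    }

    // Counts occurrences of a character. The 'static' lambda cannot capture
    // locals, so all state is read from and written to the context object.
    public static int CountChar(this StringBuilder builder, char target)
    {
        var counter = new CharCounter { Target = target };
        builder.ForEachChunk(static (chunk, state) =>
        {
            foreach (char ch in chunk)
                if (ch == state.Target)
                    state.Count++;
            return true;
        }, counter);
        return counter.Count;
    }
}
```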

@stephentoub @jkotas @alexperovich @AlexGhiondea @davidfowl

Labels: api-needs-work (API needs work before it is approved; it is NOT ready for implementation), area-System.Memory
