Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Improvements in string handling in FSharp.Core and/or FSC #9501

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
abelbraaksma opened this issue Jun 18, 2020 · 4 comments
Closed

Improvements in string handling in FSharp.Core and/or FSC #9501

abelbraaksma opened this issue Jun 18, 2020 · 4 comments

Comments

@abelbraaksma
Copy link
Contributor

abelbraaksma commented Jun 18, 2020

Is your feature request related to a problem?
Strings are omnipresent and recent discussions and profiling have shown that improvements can be made here that may be applicable to other parts of the ecosystem.

I've created this issue to discuss some options.

Describe the solution you'd like
Several options have come forward and are up for discussion:

  • Find cases where StringBuilder is used and investigate if improvements are viable with StringBuilderCache (currently private in BCL, uses thread-local storage)
  • Where heavy string-copying is used locally, investigate use of ArrayPool, possibly cherry-picking from System.Buffer. The advantage here is that zeroing of memory is skipped. This cannot be used for shared instances of SB or strings.
  • On hot paths, check if using ReadOnlySpan and/or String.Create can help. This is .NET Standard 2.1, so we can only do that in FCS and tooling, I think
  • Where high GC pressure and/or LOH pressure is recognized we should perhaps consider using fixed, similar to how BCL does it. For string scenarios this can bring pressure down by 2x.
  • Use techniques shown in Performance of certain build-in functions, esp the ones related to Array manipulation #9390 with fixed-size char-arrays, possibly further improved with array pools.
  • Improve array manipulations by dipping into the ArrayPool technique and/or by speeding up array creation and copying in scenarios similar to Array.map/collect.
  • Investigate if we can use cpblk or initblk in certain scenarios. These have recently been highly improved in the CLR
  • Perhaps look into ZString, some techniques there (pooling, zero-allocation strings) may be helpful: https://github.com/Cysharp/ZString

Additional context
My idea is to discuss/brainstorm perf ideas (please check the prev. discussions) and the scope and then, where applicable, use separate issues for defined targets. One I've been working on silently has been Array.xxx functions, which are already highly optimized, but some measurements suggest there's room for further improvement.

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Jun 18, 2020

Yes, it can be sped up, array has direct access, string uses a pointer dereference for each char access to skip the first 8 bytes (which contains the length), but the speed up depends on other factors as well.

How would we write the string copy to be faster?

@KevinRansom (from here: #9470 (comment)) Well, the basic idea would be:

  • Use String.Copy in a local capacity
  • Modify the resulting string using fixed and pointers and/or Span
  • By returning the string from the function, it'll be eligible for normal treatment by the CLR
  • The up-and-coming deduplication algorithm in RyuJIT for strings is not affected by this:
    • the algorithm does a full scan during a blocking GC that includes compressing LOH. While this happens, user code cannot execute and fixed refs won't be touched
    • it only applies to survived state (survived after GC-0/1/2)
    • the algorithm is applied to the contents of strings, and doesn't build a static "id" per string

Advancing on that idea can be to replace String.Copy, depending on the scenario:

  • If we don't need the original string and change it, initblk would be fine too, a short while ago this was heavily optimized
  • If we only need temporary or local state, we can use Array.Pool if we can use char arrays, or stackalloc, which doesn't do mem-zeroing.

These last options are if we really have a need for speed, but on string-heavy implementations, these can make a real difference if applied to the right spots.

These timings are of interest to this discussion: https://xoofx.com/blog/2010/10/23/high-performance-memcpy-gotchas-in-c/. That was 2010. It would be good to repeat the exercise, esp now that initblk and cpblk (both currently not used by C#, but that will change) have been optimized.

Of interest to the discussion of optimizing copying of memory will be my next PR on String.replicate, the same technique I will propose for Array.create/replicate and the like. Preview on code and timings: #9390 (comment).

@KevinRansom
Copy link
Contributor

KevinRansom commented Jun 18, 2020 via email

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Jun 18, 2020

@KevinRansom: your whole mail includes a copy of the prev. message with outlook links etc...

I was hoping you knew of some magic way of not doing it that way

I do, in a way, but that's another technique we can only use outside of FSharp.Core. It uses a simple trick to change the ReadOnlySpan of a string into a r/w span. This way you don't need pointer arithmetic, just array access. The basic idea is this:

let spanCopy (mapper: char -> char) (str: string) =
    let len = String.length str

    let resultStr = String.Copy str
    let roSpan = str.AsSpan()
    let writeSpan = MemoryMarshal.CreateSpan(&MemoryMarshal.GetReference roSpan, len)
        
    let mutable i = 0
    while i < len do
        writeSpan.[i] <- mapper roSpan.[i]
        i <- i + 1

    resultStr

And there's of course String.Create and other helper functions with Span that are available, but cannot be used in FSharp.Core (but they can be used in FSC, I assume).

Though, I am not quite sure why the persistence against using of fixed in dedicated places (that is: local only, internal use only). We're already using IL emit anyway, and this is simpler. And here you're basically just removing overhead of exactly the same algorithm happening multiple times internally anyway. What I do in the fixed example is exactly what happens in the BCL all the time.

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Jun 19, 2020

My comment from the PR (#9470) and removed there (makes more sense here):

@KevinRansom From our sessions and timings, I think we can agree that if the target string-size is known, using ToCharArray() with array manipulation and calling String: char[] -> string at the end to create a standard string, is the best way forward.

For now, alternatives with fixed (unsafe manipulate string) and String.Create (safely manipulate result of FastAllocateString) are out of scope unless we can decide on a way to bring this in (dependency or dropping peverify verifiability), but if that step is, or can be taken, let's do that in another PR.

Furthermore, SB, buffered or not, doesn't help here because of the added overhead per iteration. Though I reckon there will be cases where the buffered SB will help, perhaps with String.init,I'll have to investigate.

but they can be used in FSC, I assume

We may find places in FSC that are amenable to better optimizations w.r.t. string or array allocations. Finding the right spots will be the difficult bit though.

@dotnet dotnet locked and limited conversation to collaborators Apr 20, 2022
@dsyme dsyme converted this issue into discussion #13021 Apr 20, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Projects
None yet
Development

No branches or pull requests

3 participants