[System.Globalization.Native] Fix small issues in CloneCollatorWithOptions and GetCollatorFromSortHandle #24100

filipnavara · 2019-04-18T20:30:06Z

The collator object returned by CloneCollatorWithOptions is cached so this is not a performance critical path. However, some mistakes were made in PR #22378 that made it perform a little less efficiently than it used to.

The performance-critical path in GetCollatorFromSortHandle was using binary search trees for something that could have been handled much easier with a small 32-entry lookup table. This is alternative fix to #24099 for an observed performance regression.

/cc @janvorli @stephentoub @jkotas

src/corefx/System.Globalization.Native/pal_collation.c

jkotas · 2019-04-19T15:11:39Z

Could you please resolve the conflicts?

Allocate initial array using malloc to avoid needlessly zeroing it Reinstate optimization for empty array returned from GetCustomRules lost in dotnet#22378

…size we can consume (5648 bytes)

…y stored as 32-entry lookup table

filipnavara · 2019-04-19T16:20:31Z

Could you please resolve the conflicts?

Done.

src/corefx/System.Globalization.Native/pal_collation.c

tarekgh · 2019-04-19T17:05:05Z

src/corefx/System.Globalization.Native/pal_collation.c

@@ -184,19 +157,18 @@ static UCharList* GetCustomRules(int32_t options, UColAttributeValue strength, i
    }

    // If we need to create customRules, the KanaType custom rule will be 88 kana characters * 4 = 352 chars long
-    // and the Width custom rule will be at least 215 halfwidth characters * 4 = 860 chars long.
+    // and the Width custom rule will be at most 212 halfwidth characters * 5 = 1060 chars long.


212 [](start = 49, length = 3)

How did you get 212 here?

looking at https://unicode.org/charts/PDF/UFF00.pdf, If considering all half width characters then this number would be really 201.

Also, the Kana characters should be 96?

I am ok to use a bigger numbers as I am not expecting the allocation will be much bigger anyway.

linking the original PR introduced creating this custom rule #2196

In reply to: 277037987 [](ancestors = 277037987)

How did you get 212 here?

The constant g_HalfFullCharsLength declared in the file. The description could be wrong though.

Also, the Kana characters should be 96?

Taken from the actual loop and comments in the same function.

tarekgh · 2019-04-19T17:12:03Z

src/corefx/System.Globalization.Native/pal_collation.c

@@ -206,15 +178,11 @@ static UCharList* GetCustomRules(int32_t options, UColAttributeValue strength, i
            // Hiragana is the range 3041 to 3096 & 309D & 309E
            if (hiraganaChar <= 0x3096 || hiraganaChar >= 0x309D) // characters between 3096 and 309D are not mapped to katakana


0x309D [](start = 58, length = 6)

This should be 309F now?

https://unicode.org/charts/PDF/U3040.pdf

In reply to: 277039452 [](ancestors = 277039452)

Do you prefer me to update it here or in a separate PR?

Maybe @eerhardt can comment why we didn't include 309F in the first place as this introduced awhile ago in Unicode 3.2. but you shouldn't block on this.

tarekgh · 2019-04-19T17:14:08Z

CC @eerhardt as he is the one originally introduced this code.

tarekgh

Would be nice if double check the constant values we used here. as I mentioned, using bigger values should be ok for allocation.

filipnavara · 2019-04-19T17:24:17Z

Would be nice if double check the constant values we used here.

I would be happy if someone takes a second look. I used the actual constants from the two loops in the CloneCollatorWithOptions function.

jkotas · 2019-04-19T18:35:46Z

src/corefx/System.Globalization.Native/pal_collation.c

                }
+                *(items++) = lowerChar;


This should be hardened for potential buffer overruns. It is dangerous to allocate a buffer using complex buffer size formula and then start filling it using complex algorithm assuming that the formula and algorithm are in sync.

I would keep the Add method method and assert+fail if the buffer is not big enough.

There is already an assert on both places where this is used (line 206 and 179).

I would make the method fail in release build too.

I don't think that's necessary. There's a very limited number of input values (31) and they should all be exercised through the CoreFX tests. The CI should be able to catch any error in the debug build and bail out on the assert.

I will change the calculation to get rid of the magic 88 value in favour of the constraints from the first loop.

I'll also go and check that CoreFX actually has tests for this.

The tests in https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs hit both of the loops in the GetCustomRules function. If the code was broken the asserts would fire on the CI.

src/corefx/System.Globalization.Native/pal_collation.c

eerhardt

I think this looks good.

I only have a slight concern that if we add new CompareOptions value in the future (like adding real StringSort support or a new enum value), we may need to change this implementation. But we haven't added one in a long time, so going with this approach for now seems fine. (Looks like we haven't added one since we added OrdinalIgnoreCase in the .NET Framework 2.0.)

sebastienros · 2019-05-07T18:22:17Z

This change has probably impacted the HttpClient benchmarks ... in a good way. From 80K to 105K RPS on Linux for our proxying scenario where headers have to be duplicated.

/cc @benaadams

eerhardt · 2019-05-07T18:29:46Z

@sebastienros - I am surprised by that. I would assume that scenario should be using Ordinal or OrdinalIgnoreCase options for header comparisons and not culture sensitive comparisons. But my understanding of HTTP is definitely not near the "expert" category, so I may be missing something here.

Is there any way to get comparisons before and after just this one change? And if we are showing major improvement - can we get a perfcollect for each?

sebastienros · 2019-05-07T18:32:33Z

We changed it recently in this PR https://github.com/aspnet/AspNetCore/pull/9537/files

I can get perfcollect traces, yes, and I did the comparison by only switching the builds of System.Globalization.Native from these versions:

3.0.0-preview6-27629-07 -> 3.0.0-preview6-27701-08
Microsoft.NetCore.App / Core FX
dotnet/corefx@7066a0c...787f87f

Microsoft.NetCore.App / Core CLR
8808f4e...7a24a53

sebastienros · 2019-05-07T18:41:28Z

Traces:
\scratch2\scratch\sebros\System.Globalization

eerhardt · 2019-05-07T19:16:12Z

Thanks! Downloading and taking a look now.

…tions and GetCollatorFromSortHandle (dotnet/coreclr#24100) * Fix allocation size calculation when resizing array Allocate initial array using malloc to avoid needlessly zeroing it Reinstate optimization for empty array returned from GetCustomRules lost in dotnet/coreclr#22378 * Avoid resizing arrays in GetCustomRules by preallocating the maximum size we can consume (5648 bytes) * Avoid creating a binary search tree for something that could be easily stored as 32-entry lookup table * Remove obsolete comment Commit migrated from dotnet/coreclr@a59ead8

filipnavara commented Apr 18, 2019

View reviewed changes

src/corefx/System.Globalization.Native/pal_collation.c Outdated Show resolved Hide resolved

filipnavara commented Apr 18, 2019

View reviewed changes

src/corefx/System.Globalization.Native/pal_collation.c Outdated Show resolved Hide resolved

filipnavara changed the title ~~[System.Globalization.Native] Fix small issues in CloneCollatorWithOptions~~ [System.Globalization.Native] Fix small issues in CloneCollatorWithOptions and GetCollatorFromSortHandle Apr 19, 2019

filipnavara added 3 commits April 19, 2019 18:19

Fix allocation size calculation when resizing array

9954386

Allocate initial array using malloc to avoid needlessly zeroing it Reinstate optimization for empty array returned from GetCustomRules lost in dotnet#22378

Avoid resizing arrays in GetCustomRules by preallocating the maximum …

e2651ba

…size we can consume (5648 bytes)

Avoid creating a binary search tree for something that could be easil…

6ccb8a5

…y stored as 32-entry lookup table

filipnavara force-pushed the sysglobnative2 branch from 1e894cc to 6ccb8a5 Compare April 19, 2019 16:20

jkotas reviewed Apr 19, 2019

View reviewed changes

src/corefx/System.Globalization.Native/pal_collation.c Outdated Show resolved Hide resolved

jkotas reviewed Apr 19, 2019

View reviewed changes

src/corefx/System.Globalization.Native/pal_collation.c Show resolved Hide resolved

Remove obsolete comment

0203fc3

tarekgh reviewed Apr 19, 2019

View reviewed changes

tarekgh approved these changes Apr 19, 2019

View reviewed changes

jkotas reviewed Apr 19, 2019

View reviewed changes

src/corefx/System.Globalization.Native/pal_collation.c Outdated Show resolved Hide resolved

Make the code a bit more obvious

9601144

eerhardt approved these changes Apr 24, 2019

View reviewed changes

jkotas merged commit a59ead8 into dotnet:master Apr 25, 2019

		@@ -206,15 +178,11 @@ static UCharList* GetCustomRules(int32_t options, UColAttributeValue strength, i
		// Hiragana is the range 3041 to 3096 & 309D & 309E
		if (hiraganaChar <= 0x3096 \|\| hiraganaChar >= 0x309D) // characters between 3096 and 309D are not mapped to katakana

[System.Globalization.Native] Fix small issues in CloneCollatorWithOptions and GetCollatorFromSortHandle #24100

[System.Globalization.Native] Fix small issues in CloneCollatorWithOptions and GetCollatorFromSortHandle #24100

Uh oh!

Conversation

filipnavara commented Apr 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jkotas commented Apr 19, 2019

Uh oh!

filipnavara commented Apr 19, 2019

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tarekgh commented Apr 19, 2019

Uh oh!

tarekgh left a comment

Choose a reason for hiding this comment

Uh oh!

filipnavara commented Apr 19, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

filipnavara Apr 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

eerhardt left a comment

Choose a reason for hiding this comment

Uh oh!

sebastienros commented May 7, 2019

Uh oh!

eerhardt commented May 7, 2019

Uh oh!

sebastienros commented May 7, 2019

Uh oh!

sebastienros commented May 7, 2019

Uh oh!

eerhardt commented May 7, 2019

Uh oh!

Uh oh!

filipnavara commented Apr 18, 2019 •

edited

Loading

filipnavara Apr 21, 2019 •

edited

Loading