Skip to content

cmd/compile: ppc64 function pointer call performance regression #42709

@pmur

Description

@pmur

There seems to be a large regression in performance when function pointer calls where changed to use the link register instead of the count register. Based on when this change went in, this likely affects all versions back to go1.14. The impact can be mitigated by setting the appropriate branch hint for this usage e.g

The regex benchmarks are a pretty good example of how much this
hint can help generic ppc64le code on a power9 machine:

name                          old time/op    new time/op     delta
Find                             606ns ± 0%      447ns ± 0%  -26.27%
FindAllNoMatches                 309ns ± 0%      205ns ± 0%  -33.72%
FindString                       609ns ± 0%      451ns ± 0%  -26.04%
FindSubmatch                     734ns ± 0%      594ns ± 0%  -19.07%
FindStringSubmatch               706ns ± 0%      574ns ± 0%  -18.83%
Literal                          177ns ± 0%      136ns ± 0%  -22.89%
NotLiteral                      4.69µs ± 0%     2.34µs ± 0%  -50.14%
MatchClass                      6.05µs ± 0%     3.26µs ± 0%  -46.08%
MatchClass_InRange              5.93µs ± 0%     3.15µs ± 0%  -46.86%
ReplaceAll                      3.15µs ± 0%     2.18µs ± 0%  -30.77%
AnchoredLiteralShortNonMatch     156ns ± 0%      109ns ± 0%  -30.61%
AnchoredLiteralLongNonMatch      192ns ± 0%      136ns ± 0%  -29.34%
AnchoredShortMatch               268ns ± 0%      209ns ± 0%  -22.00%
AnchoredLongMatch                472ns ± 0%      357ns ± 0%  -24.30%
OnePassShortA                   1.16µs ± 0%     0.87µs ± 0%  -25.03%
NotOnePassShortA                1.34µs ± 0%     1.20µs ± 0%  -10.63%
OnePassShortB                    940ns ± 0%      655ns ± 0%  -30.29%
NotOnePassShortB                 873ns ± 0%      703ns ± 0%  -19.52%
OnePassLongPrefix                258ns ± 0%      155ns ± 0%  -40.13%
OnePassLongNotPrefix             943ns ± 0%      529ns ± 0%  -43.89%
MatchParallelShared              591ns ± 0%      436ns ± 0%  -26.31%
MatchParallelCopied              596ns ± 0%      435ns ± 0%  -27.10%
QuoteMetaAll                     186ns ± 0%      186ns ± 0%   -0.16%
QuoteMetaNone                   55.9ns ± 0%     55.9ns ± 0%   +0.02%
Compile/Onepass                 9.64µs ± 0%     9.26µs ± 0%   -3.97%
Compile/Medium                  21.7µs ± 0%     20.6µs ± 0%   -4.90%
Compile/Hard                     174µs ± 0%      174µs ± 0%   +0.07%
Match/Easy0/16                  7.35ns ± 0%     7.34ns ± 0%   -0.11%
Match/Easy0/32                   116ns ± 0%       97ns ± 0%  -16.27%
Match/Easy0/1K                   592ns ± 0%      562ns ± 0%   -5.04%
Match/Easy0/32K                 12.6µs ± 0%     12.5µs ± 0%   -0.64%
Match/Easy0/1M                   556µs ± 0%      556µs ± 0%   -0.00%
Match/Easy0/32M                 17.7ms ± 0%     17.7ms ± 0%   +0.05%
Match/Easy0i/16                 7.34ns ± 0%     7.35ns ± 0%   +0.10%
Match/Easy0i/32                 2.82µs ± 0%     1.64µs ± 0%  -41.71%
Match/Easy0i/1K                 83.2µs ± 0%     48.2µs ± 0%  -42.06%
Match/Easy0i/32K                2.13ms ± 0%     1.80ms ± 0%  -15.34%
Match/Easy0i/1M                 68.1ms ± 0%     57.6ms ± 0%  -15.31%
Match/Easy0i/32M                 2.18s ± 0%      1.80s ± 0%  -17.52%
Match/Easy1/16                  7.36ns ± 0%     7.34ns ± 0%   -0.24%
Match/Easy1/32                   118ns ± 0%       96ns ± 0%  -18.72%
Match/Easy1/1K                  2.46µs ± 0%     1.58µs ± 0%  -35.65%
Match/Easy1/32K                 80.2µs ± 0%     54.6µs ± 0%  -31.92%
Match/Easy1/1M                  2.75ms ± 0%     1.88ms ± 0%  -31.66%
Match/Easy1/32M                 87.5ms ± 0%     59.8ms ± 0%  -31.62%
Match/Medium/16                 7.34ns ± 0%     7.34ns ± 0%   +0.01%
Match/Medium/32                 2.60µs ± 0%     1.50µs ± 0%  -42.61%
Match/Medium/1K                 78.1µs ± 0%     43.7µs ± 0%  -44.06%
Match/Medium/32K                2.08ms ± 0%     1.52ms ± 0%  -27.11%
Match/Medium/1M                 66.5ms ± 0%     48.6ms ± 0%  -26.96%
Match/Medium/32M                 2.14s ± 0%      1.60s ± 0%  -25.18%
Match/Hard/16                   7.35ns ± 0%     7.35ns ± 0%   +0.03%
Match/Hard/32                   3.58µs ± 0%     2.44µs ± 0%  -31.82%
Match/Hard/1K                    108µs ± 0%       75µs ± 0%  -31.04%
Match/Hard/32K                  2.79ms ± 0%     2.25ms ± 0%  -19.30%
Match/Hard/1M                   89.4ms ± 0%     72.2ms ± 0%  -19.26%
Match/Hard/32M                   2.91s ± 0%      2.37s ± 0%  -18.60%
Match/Hard1/16                  11.1µs ± 0%      8.3µs ± 0%  -25.07%
Match/Hard1/32                  21.4µs ± 0%     16.1µs ± 0%  -24.85%
Match/Hard1/1K                   658µs ± 0%      498µs ± 0%  -24.27%
Match/Hard1/32K                 12.2ms ± 0%     11.7ms ± 0%   -4.60%
Match/Hard1/1M                   391ms ± 0%      374ms ± 0%   -4.40%
Match/Hard1/32M                  12.6s ± 0%      12.0s ± 0%   -4.68%
Match_onepass_regex/16           870ns ± 0%      611ns ± 0%  -29.79%
Match_onepass_regex/32          1.58µs ± 0%     1.08µs ± 0%  -31.48%
Match_onepass_regex/1K          45.7µs ± 0%     30.3µs ± 0%  -33.58%
Match_onepass_regex/32K         1.45ms ± 0%     0.97ms ± 0%  -33.20%
Match_onepass_regex/1M          46.2ms ± 0%     30.9ms ± 0%  -33.01%
Match_onepass_regex/32M          1.46s ± 0%      0.99s ± 0%  -32.02%

name                          old alloc/op   new alloc/op    delta
Find                             0.00B           0.00B         0.00%
FindAllNoMatches                 0.00B           0.00B         0.00%
FindString                       0.00B           0.00B         0.00%
FindSubmatch                     48.0B ± 0%      48.0B ± 0%    0.00%
FindStringSubmatch               32.0B ± 0%      32.0B ± 0%    0.00%
Compile/Onepass                 4.02kB ± 0%     4.02kB ± 0%    0.00%
Compile/Medium                  9.39kB ± 0%     9.39kB ± 0%    0.00%
Compile/Hard                    84.7kB ± 0%     84.7kB ± 0%    0.00%
Match_onepass_regex/16           0.00B           0.00B         0.00%
Match_onepass_regex/32           0.00B           0.00B         0.00%
Match_onepass_regex/1K           0.00B           0.00B         0.00%
Match_onepass_regex/32K          0.00B           0.00B         0.00%
Match_onepass_regex/1M           5.00B ± 0%      3.00B ± 0%  -40.00%
Match_onepass_regex/32M           136B ± 0%        68B ± 0%  -50.00%

name                          old allocs/op  new allocs/op   delta
Find                              0.00            0.00         0.00%
FindAllNoMatches                  0.00            0.00         0.00%
FindString                        0.00            0.00         0.00%
FindSubmatch                      1.00 ± 0%       1.00 ± 0%    0.00%
FindStringSubmatch                1.00 ± 0%       1.00 ± 0%    0.00%
Compile/Onepass                   52.0 ± 0%       52.0 ± 0%    0.00%
Compile/Medium                     112 ± 0%        112 ± 0%    0.00%
Compile/Hard                       424 ± 0%        424 ± 0%    0.00%
Match_onepass_regex/16            0.00            0.00         0.00%
Match_onepass_regex/32            0.00            0.00         0.00%
Match_onepass_regex/1K            0.00            0.00         0.00%
Match_onepass_regex/32K           0.00            0.00         0.00%
Match_onepass_regex/1M            0.00            0.00         0.00%
Match_onepass_regex/32M           2.00 ± 0%       1.00 ± 0%  -50.00%

name                          old speed      new speed       delta
QuoteMetaAll                  75.2MB/s ± 0%   75.3MB/s ± 0%   +0.15%
QuoteMetaNone                  465MB/s ± 0%    465MB/s ± 0%   -0.02%
Match/Easy0/16                2.18GB/s ± 0%   2.18GB/s ± 0%   +0.10%
Match/Easy0/32                 276MB/s ± 0%    330MB/s ± 0%  +19.46%
Match/Easy0/1K                1.73GB/s ± 0%   1.82GB/s ± 0%   +5.29%
Match/Easy0/32K               2.60GB/s ± 0%   2.62GB/s ± 0%   +0.64%
Match/Easy0/1M                1.89GB/s ± 0%   1.89GB/s ± 0%   +0.00%
Match/Easy0/32M               1.89GB/s ± 0%   1.89GB/s ± 0%   -0.05%
Match/Easy0i/16               2.18GB/s ± 0%   2.18GB/s ± 0%   -0.10%
Match/Easy0i/32               11.4MB/s ± 0%   19.5MB/s ± 0%  +71.48%
Match/Easy0i/1K               12.3MB/s ± 0%   21.2MB/s ± 0%  +72.62%
Match/Easy0i/32K              15.4MB/s ± 0%   18.2MB/s ± 0%  +18.12%
Match/Easy0i/1M               15.4MB/s ± 0%   18.2MB/s ± 0%  +18.12%
Match/Easy0i/32M              15.4MB/s ± 0%   18.6MB/s ± 0%  +21.21%
Match/Easy1/16                2.17GB/s ± 0%   2.18GB/s ± 0%   +0.24%
Match/Easy1/32                 271MB/s ± 0%    333MB/s ± 0%  +23.07%
Match/Easy1/1K                 417MB/s ± 0%    648MB/s ± 0%  +55.38%
Match/Easy1/32K                409MB/s ± 0%    600MB/s ± 0%  +46.88%
Match/Easy1/1M                 381MB/s ± 0%    558MB/s ± 0%  +46.33%
Match/Easy1/32M                383MB/s ± 0%    561MB/s ± 0%  +46.25%
Match/Medium/16               2.18GB/s ± 0%   2.18GB/s ± 0%   -0.01%
Match/Medium/32               12.3MB/s ± 0%   21.4MB/s ± 0%  +74.13%
Match/Medium/1K               13.1MB/s ± 0%   23.4MB/s ± 0%  +78.73%
Match/Medium/32K              15.7MB/s ± 0%   21.6MB/s ± 0%  +37.23%
Match/Medium/1M               15.8MB/s ± 0%   21.6MB/s ± 0%  +36.93%
Match/Medium/32M              15.7MB/s ± 0%   21.0MB/s ± 0%  +33.67%
Match/Hard/16                 2.18GB/s ± 0%   2.18GB/s ± 0%   -0.03%
Match/Hard/32                 8.93MB/s ± 0%  13.10MB/s ± 0%  +46.70%
Match/Hard/1K                 9.48MB/s ± 0%  13.74MB/s ± 0%  +44.94%
Match/Hard/32K                11.7MB/s ± 0%   14.5MB/s ± 0%  +23.87%
Match/Hard/1M                 11.7MB/s ± 0%   14.5MB/s ± 0%  +23.87%
Match/Hard/32M                11.6MB/s ± 0%   14.2MB/s ± 0%  +22.86%
Match/Hard1/16                1.44MB/s ± 0%   1.93MB/s ± 0%  +34.03%
Match/Hard1/32                1.49MB/s ± 0%   1.99MB/s ± 0%  +33.56%
Match/Hard1/1K                1.56MB/s ± 0%   2.05MB/s ± 0%  +31.41%
Match/Hard1/32K               2.68MB/s ± 0%   2.80MB/s ± 0%   +4.48%
Match/Hard1/1M                2.68MB/s ± 0%   2.80MB/s ± 0%   +4.48%
Match/Hard1/32M               2.66MB/s ± 0%   2.79MB/s ± 0%   +4.89%
Match_onepass_regex/16        18.4MB/s ± 0%   26.2MB/s ± 0%  +42.41%
Match_onepass_regex/32        20.2MB/s ± 0%   29.5MB/s ± 0%  +45.92%
Match_onepass_regex/1K        22.4MB/s ± 0%   33.8MB/s ± 0%  +50.54%
Match_onepass_regex/32K       22.6MB/s ± 0%   33.9MB/s ± 0%  +49.67%
Match_onepass_regex/1M        22.7MB/s ± 0%   33.9MB/s ± 0%  +49.27%
Match_onepass_regex/32M       23.0MB/s ± 0%   33.9MB/s ± 0%  +47.14%

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsFixThe path to resolution is known, but the work has not been done.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions