I am curious what the 'correct' perf counter for 4K aliasing is. You mentioned ld_blocks.store_forward, but I was also wondering about the other counter, ld_blocks_partial.address_alias.
Here is the perf list description:
ld_blocks.store_forward
[loads blocked by overlapping with store buffer that cannot be forwarded]
ld_blocks_partial.address_alias
[False dependencies in MOB due to partial compare on address]
Here are the perf results on my machine:
$ perf stat -e ld_blocks_partial.address_alias,ld_blocks.store_forward ./a.out 4096
222
Performance counter stats for './a.out 4096':
6,852 ld_blocks_partial.address_alias:u
32 ld_blocks.store_forward:u
0.224647447 seconds time elapsed
$ perf stat -e ld_blocks_partial.address_alias,ld_blocks.store_forward ./a.out 4092
359
Performance counter stats for './a.out 4092':
132,139,399 ld_blocks_partial.address_alias:u
2,097,093 ld_blocks.store_forward:u
0.361229917 seconds time elapsed
As you can see, both counters differ hugely between the 4092 and 4096 runs.
Good point, I remember that I was also thinking about this, but it was some time ago 😅 Looking at ld_blocks_partial.address_alias and ld_blocks.store_forward, I suspect that store_forward reports the actual cases where forwarding was blocked, while address_alias reports cases where the block was due to "false" aliasing (i.e. forwarding would have been possible, but there was an alias). Either way, the two counters are clearly incremented in different situations, because their values are vastly different.
I would have to remind myself of this in more detail, my knowledge is not so deep in this area :) This is just a (probably wrong) guess.
Yes, I think ld_blocks_partial.address_alias counts cases where there was an initial "hit" in the store buffer loose net (i.e., the CPU thinks a load is going to forward from a store), but then when the full address was compared in the fine net, it was found to be a spurious hit due to 4K aliasing. This question and answers have some details about store forwarding and in particular "fine net" and "loose net".
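To make the loose net / fine net distinction concrete, here is a minimal sketch in C of the comparison described above. It is only a model, not how the hardware is implemented: it assumes the loose net compares just the low 12 address bits (the offset within a 4 KiB page) and it ignores access sizes. The names `loose_net_hit`, `fine_net_hit` and `is_4k_alias` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the loose net only looks at the low 12 bits of the address,
 * i.e. the offset within a 4 KiB page. */
#define LOOSE_NET_MASK ((uintptr_t)0xFFF)

/* Loose-net check: a partial compare on the low address bits only. */
bool loose_net_hit(uintptr_t store_addr, uintptr_t load_addr) {
    return (store_addr & LOOSE_NET_MASK) == (load_addr & LOOSE_NET_MASK);
}

/* Fine-net check: a compare on the full address. */
bool fine_net_hit(uintptr_t store_addr, uintptr_t load_addr) {
    return store_addr == load_addr;
}

/* A spurious hit: the partial compare matches but the full compare does not.
 * In this model, this is the situation that would be reported as an
 * address alias. */
bool is_4k_alias(uintptr_t store_addr, uintptr_t load_addr) {
    return loose_net_hit(store_addr, load_addr) &&
           !fine_net_hit(store_addr, load_addr);
}
```

In this model, a store to `0x10000` and a load from `0x11000` (exactly 4096 bytes apart) produce a spurious hit, while a load from the very same address is a real dependency rather than an alias.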
ld_blocks.store_forward measures some other type of block related to store forwarding, although I'm not actually sure what. Maybe when a load is predicted to forward, but the data is not available, or when a load can't be forwarded because it overlaps but is not fully contained within a store (example).
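The second guess above (a load that overlaps a store but is not fully contained in it) can be written down as a small predicate. This is a sketch of that guess only: real CPUs have additional, microarchitecture-specific forwarding rules, and nothing here confirms what ld_blocks.store_forward actually counts. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when the byte ranges [st, st + st_size) and [ld, ld + ld_size)
 * share at least one byte. */
bool overlaps(uintptr_t st, size_t st_size, uintptr_t ld, size_t ld_size) {
    return ld < st + st_size && st < ld + ld_size;
}

/* True when every byte read by the load was written by the store. */
bool fully_contained(uintptr_t st, size_t st_size, uintptr_t ld, size_t ld_size) {
    return ld >= st && ld + ld_size <= st + st_size;
}

/* Simplified model of the guess: forwarding is blocked when the load
 * overlaps the store but needs bytes the store does not provide. */
bool forward_blocked(uintptr_t st, size_t st_size, uintptr_t ld, size_t ld_size) {
    assert(st_size > 0 && ld_size > 0);
    return overlaps(st, st_size, ld, ld_size) &&
           !fully_contained(st, st_size, ld, ld_size);
}
```

For example, a 1-byte store followed by a 4-byte load from the same address would be classified as blocked in this model, whereas a 4-byte load from the upper half of an 8-byte store would not.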