4K aliasing correct perf counter #18

Open
skgbanga opened this issue Jan 2, 2020 · 2 comments

Comments

@skgbanga

skgbanga commented Jan 2, 2020

I am curious what the 'correct' perf counter for 4K aliasing is. You mention ld_blocks.store_forward, but I was wondering about the other counter, ld_blocks_partial.address_alias, as well.

Here is the perf list description:

  ld_blocks.store_forward                           
       [loads blocked by overlapping with store buffer that cannot be forwarded]
  ld_blocks_partial.address_alias                   
       [False dependencies in MOB due to partial compare on address]

Here are the perf results on my machine:

$ perf stat -e ld_blocks_partial.address_alias,ld_blocks.store_forward ./a.out 4096
222

 Performance counter stats for './a.out 4096':

             6,852      ld_blocks_partial.address_alias:u                                   
                32      ld_blocks.store_forward:u                                   

       0.224647447 seconds time elapsed

$ perf stat -e ld_blocks_partial.address_alias,ld_blocks.store_forward ./a.out 4092
359

 Performance counter stats for './a.out 4092':

       132,139,399      ld_blocks_partial.address_alias:u                                   
         2,097,093      ld_blocks.store_forward:u                                   

       0.361229917 seconds time elapsed

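As you can see, both counters differ hugely between 4096 and 4092: ld_blocks_partial.address_alias goes from 6,852 to 132,139,399, and ld_blocks.store_forward goes from 32 to 2,097,093.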

@Kobzol
Owner

Kobzol commented Dec 11, 2020

Good point, I remember that I was also thinking about this, but it was some time ago 😅 From ld_blocks_partial.address_alias and ld_blocks.store_forward, I suspect that maybe store_forward reports the actual cases where forwarding was blocked, and address_alias reports cases where it was due to "false" aliasing (i.e. when forwarding would be possible, but there was an alias). But obviously these two counters are incremented in different situations, because their values are vastly different.

I would have to remind myself of this in more detail, my knowledge is not so deep in this area :) This is just a (probably wrong) guess.

@travisdowns any hints? :)

@travisdowns
Contributor

Yes, I think ld_blocks_partial.address_alias counts cases where there was an initial "hit" in the store buffer loose net (i.e., the CPU thinks a load is going to forward from a store), but then when the full address was compared in the fine net, it was found to be a spurious hit due to 4K aliasing. This question and answers have some details about store forwarding and in particular "fine net" and "loose net".
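To make the two-stage check concrete, here is a rough Python model of it. This is only a sketch: it assumes the loose net compares just the low 12 address bits (the offset within a 4 KiB page) and the fine net compares the full address. The function names are made up for illustration, and the model ignores access sizes and everything else real hardware has to consider.

```python
# Toy model of the loose-net / fine-net comparison (an assumption-laden
# sketch, not a description of any specific microarchitecture).

PAGE_OFFSET_MASK = 0xFFF  # low 12 bits: offset within a 4 KiB page


def loose_net_hit(load_addr: int, store_addr: int) -> bool:
    """Assumed loose-net check: compare only the low 12 address bits."""
    return (load_addr & PAGE_OFFSET_MASK) == (store_addr & PAGE_OFFSET_MASK)


def fine_net_hit(load_addr: int, store_addr: int) -> bool:
    """Assumed fine-net check: compare the full address."""
    return load_addr == store_addr


def classify(load_addr: int, store_addr: int) -> str:
    """Classify a load against one earlier store in this toy model."""
    if not loose_net_hit(load_addr, store_addr):
        return "no match"   # load does not look dependent on the store
    if fine_net_hit(load_addr, store_addr):
        return "forward"    # genuine match, forwarding can proceed
    return "4K alias"       # spurious loose-net hit: a false dependency


if __name__ == "__main__":
    store = 0x1010
    for load in (0x1010, 0x2010, 0x1014):
        print(hex(load), classify(load, store))
```

In this model, only the "4K alias" outcome corresponds to what ld_blocks_partial.address_alias would count under the interpretation above.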

ld_blocks.store_forward measures some other type of block related to store forwarding, although I'm not actually sure what. Maybe when a load is predicted to forward, but the data is not available, or when a load can't be forwarded because it overlaps but is not fully contained within a store (example).
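The second guess (a load that overlaps a store without being fully contained in it) can be sketched in Python as a simple interval check. This is a hypothetical model: the helper names are invented, and actual forwarding rules vary by microarchitecture and are more detailed than plain containment.

```python
# Toy model of "overlaps but is not fully contained within a store".
# Byte ranges are half-open: [addr, addr + size).


def ranges_overlap(load_addr: int, load_size: int,
                   store_addr: int, store_size: int) -> bool:
    """True if the load and store touch at least one common byte."""
    return (load_addr < store_addr + store_size
            and store_addr < load_addr + load_size)


def load_contained(load_addr: int, load_size: int,
                   store_addr: int, store_size: int) -> bool:
    """True if every byte of the load lies inside the store."""
    return (store_addr <= load_addr
            and load_addr + load_size <= store_addr + store_size)


def forwarding_outcome(load_addr: int, load_size: int,
                       store_addr: int, store_size: int) -> str:
    """Classify a load against one earlier store in this toy model."""
    if not ranges_overlap(load_addr, load_size, store_addr, store_size):
        return "independent"   # no shared bytes, nothing to forward
    if load_contained(load_addr, load_size, store_addr, store_size):
        return "forwardable"   # the store holds all bytes the load needs
    return "blocked"           # partial overlap: the store cannot supply it all


if __name__ == "__main__":
    # 8-byte store at 0x100, followed by three different loads.
    print(forwarding_outcome(0x100, 4, 0x100, 8))
    print(forwarding_outcome(0x104, 8, 0x100, 8))
    print(forwarding_outcome(0x200, 4, 0x100, 8))
```

Only the "blocked" case matches the scenario guessed at above; whether ld_blocks.store_forward counts exactly that is not confirmed in this thread.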
