-
Notifications
You must be signed in to change notification settings - Fork 907
Optimizations to PML-CM #602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Refer to this link for build results (access rights to CI server needed): Build Log
Test FAILed. |
The proposed patch only works for predefined, and no gaps datatypes. The function opal_datatype_is_contiguous_memory_layout detects if the access pattern is contiguous starting from some memory location, but this memory location might be something else than |
Thanks @bosilca. Some tests failed with btl build. |
@bosilca - A little confused here: (commit: jithinjosepkl@fd9eb39) |
@jithinjosepkl using opal_datatype_is_contiguous_memory_layout returns true On Tue, May 26, 2015 at 2:02 PM, Jithin Jose [email protected]
|
@bosilca - got it, thanks. |
I don't see the point of the inline change. In the normal case the pml functions will never actually be inlined as they need a valid function pointer. In the best case you may inline the pml direct stuff but does that buy anything in terms of performance? |
@hjelmn - Yeah it helps only for the pml-cm direct build to reduce instruction counts. |
Signed-off-by: Jithin Jose <[email protected]>
Signed-off-by: Jithin Jose <[email protected]>
Signed-off-by: Jithin Jose <[email protected]>
4da296a
to
20c88e8
Compare
|
Refer to this link for build results (access rights to CI server needed): Build Log
Test FAILed. |
Signed-off-by: Jithin Jose <[email protected]>
20c88e8
to
c745854
Compare
|
Refer to this link for build results (access rights to CI server needed): Build Log
Test FAILed. |
- Follow up patch for 56869bf Signed-off-by: Jithin Jose <[email protected]>
|
Refer to this link for build results (access rights to CI server needed): |
@bosilca - I updated the patch so that the buffer address is calculated by adding datatype->true_lb. |
@jithinjosepkl thanks for fixing it. looks better now. |
@jithinjosepkl - could you please quantify these optimization? w/ before and after? ohh, sorry, just saw last 3 lines of PR description. |
@bosilca - thanks. |
@jithinjosepkl - thanks! does static build essential to get this numbers? |
Static build is not essential. I was just reporting the build that I used. Large message latency degradation could probably be noise. |
thanks, 5 iterations probably the reason for noise. interesting to see w/ 1k iterations |
Btw, I meant the numbers are averaged after 5 runs of benchmark, each running default number of iterations (iterations depend on message size). Used OSU benchmarks here. For large messages, I doubt if these changes could affect the performance. |
@hjelmn is sitting next to me and he's cool with this PR now; I think you should feel free to merge it. Excessive inlining can sometimes be a problem, though, so this is something to keep an eye on. |
Thanks @jsquyres. Yes, I agree on the comment about inlining. |
dl/dlopen: fix memory leak
Requesting comments on this PR.
These patches aim to optimize PML-CM layer.
Improvements (evaluated with static direct build PML-CM, MTL-OFI):
Please review this PR.
@jsquyres @rhc54 @tkordenbrock