
Optimizations to PML-CM #602


Merged
jsquyres merged 5 commits into open-mpi:master on Jun 2, 2015

Conversation

jithinjosepkl
Contributor

Requesting comments on this PR.

These patches aim to optimize the PML-CM layer.

  • Avoid datatype pack/unpack for contiguous data on homogeneous systems (see the sketch below)
  • Inline PML-CM
  • Avoid the extra ompi_proc lookup in homogeneous builds
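
A minimal sketch of the idea behind the first bullet. The helper name and structure are illustrative, not the actual patch; it uses the OPAL convertor helpers discussed later in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include "opal/datatype/opal_convertor.h"

/* Hypothetical fast-path check: if the convertor does not need pack/unpack
 * (contiguous data, homogeneous build), hand the payload to the MTL directly
 * instead of going through the pack/unpack machinery. */
static inline bool send_payload_is_contiguous(opal_convertor_t *convertor,
                                              void **payload, size_t *length)
{
    if (opal_convertor_need_buffers(convertor)) {
        return false;                                        /* must pack */
    }
    opal_convertor_get_current_pointer(convertor, payload);  /* first contiguous byte */
    opal_convertor_get_packed_size(convertor, length);       /* payload size in bytes */
    return true;
}
```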

Improvements (evaluated with a static direct build of PML-CM and MTL-OFI):

  • Reduced instruction count for the send path: 113 (down from 244)
  • Slight improvements in message rate (around 8%) and in small-message latency

Please review this PR.
@jsquyres @rhc54 @tkordenbrock

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/551/

Build Log (last 50 lines):

[...truncated 22010 lines...]
 3 0x000000000005996c mxm_error_signal_handler()  /var/tmp/OFED_topdir/BUILD/mxm-3.3.3052/src/mxm/util/debug/debug.c:616
 4 0x0000003d690329a0 killpg()  ??:0
===================
==== backtrace ====
 2 0x00000000000597fc mxm_handle_error()  /var/tmp/OFED_topdir/BUILD/mxm-3.3.3052/src/mxm/util/debug/debug.c:641
 3 0x000000000005996c mxm_error_signal_handler()  /var/tmp/OFED_topdir/BUILD/mxm-3.3.3052/src/mxm/util/debug/debug.c:616
 4 0x0000003d690329a0 killpg()  ??:0
===================
[... identical backtrace repeated 6 more times ...]
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node jenkins01 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Build step 'Execute shell' marked build as failure
[htmlpublisher] Archiving HTML reports...
[htmlpublisher] Archiving at BUILD level /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/cov_build to /var/lib/jenkins/jobs/gh-ompi-master-pr/builds/551/htmlreports/Coverity_Report
Setting commit status on GitHub for https://github.com/open-mpi/ompi/commit/b4cfa6dd88042bf0ec4c982b089d9a226308277a
[BFA] Scanning build for known causes...
[BFA] No failure causes found
[BFA] Done. 0s
Setting status of 4da296a13510dbce3a80535eba998a300769a634 to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/551/ and message: Build finished.

Test FAILed.

@bosilca
Member

bosilca commented May 26, 2015

The proposed patch only works for predefined datatypes with no gaps. The function opal_datatype_is_contiguous_memory_layout detects whether the access pattern is contiguous starting from some memory location, but that location might be something other than convertor->pBaseBuf. One should use opal_convertor_get_current_pointer to correctly get the pointer to the first contiguous byte of the memory layout.
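
For concreteness, the suggested call looks like this (a two-line sketch; the variable names are illustrative):

```c
void *first_byte;
/* Resolves the address of the first contiguous byte, which may differ from
 * convertor->pBaseBuf when the datatype has a leading gap. */
opal_convertor_get_current_pointer(convertor, &first_byte);
```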

@jithinjosepkl
Contributor Author

Thanks @bosilca.
I will rework the patch to use opal_convertor_get_current_pointer.

Some tests failed with btl build.
Let me take a look at them too.

@jithinjosepkl
Contributor Author

@bosilca - A little confused here:
I explicitly set convertor->pBaseBuf in the contiguous-memory-layout case, and later I get the buffer from convertor->pBaseBuf just before sending.
Since it is explicitly set, should I worry about using opal_convertor_get_current_pointer, or am I missing something else here?

(commit: jithinjosepkl@fd9eb39)

@bosilca
Member

bosilca commented May 26, 2015

@jithinjosepkl opal_datatype_is_contiguous_memory_layout returns true if you have a datatype with a gap in the beginning but a contiguous memory layout. Thus, using the user-supplied buffer in this context is wrong, as it doesn't account for the gap at the beginning. Adding datatype->ub might be the solution here. You can create a simple datatype using MPI_Type_create_resized to check.
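
For testing, one simple way to build a type whose contiguous data does not start at the user pointer is a single-block hindexed type with a nonzero displacement (an illustrative alternative to the MPI_Type_create_resized route suggested above; not part of the patch):

```c
#include <mpi.h>

/* Four ints preceded by an 8-byte gap: the memory layout is contiguous,
 * but the data starts at buf + 8 (true_lb == 8), not at buf itself. */
static MPI_Datatype make_gapped_type(void)
{
    MPI_Datatype gapped;
    int      blocklen = 4;
    MPI_Aint disp     = 8;
    MPI_Type_create_hindexed(1, &blocklen, &disp, MPI_INT, &gapped);
    MPI_Type_commit(&gapped);
    return gapped;
}
```

A fast path that sends elements of such a type straight from the user pointer would read from the wrong address unless the leading gap is accounted for.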


@jithinjosepkl
Contributor Author

@bosilca - got it, thanks.

@hjelmn
Member

hjelmn commented May 26, 2015

I don't see the point of the inline change. In the normal case the PML functions will never actually be inlined, since they need a valid function pointer. At best you may inline the PML direct-call path, but does that buy anything in terms of performance?

@jithinjosepkl
Contributor Author

@hjelmn - Yeah, it helps only for the PML-CM direct build, where it reduces the instruction count.
This might not directly impact performance, but all the patches together slightly improve performance for the direct build.

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/556/

Build Log (last 50 lines):

[...truncated 26219 lines...]
==== backtrace ====
 2 0x00000000000597fc mxm_handle_error()  /var/tmp/OFED_topdir/BUILD/mxm-3.3.3052/src/mxm/util/debug/debug.c:641
 3 0x000000000005996c mxm_error_signal_handler()  /var/tmp/OFED_topdir/BUILD/mxm-3.3.3052/src/mxm/util/debug/debug.c:616
 4 0x0000003d690329a0 killpg()  ??:0
 5 0x0000003d69089985 memcpy()  ??:0
 6 0x0000000000005338 mca_spml_ikrit_get_shm()  /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/oshmem/mca/spml/ikrit/spml_ikrit.c:846
 7 0x00000000000053ad mca_spml_ikrit_get()  /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/oshmem/mca/spml/ikrit/spml_ikrit.c:859
 8 0x000000000002f22f pshmem_int_get()  /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/oshmem/shmem/c/profile/pshmem_get.c:66
 9 0x000000000040080e main()  /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/oshmem_circular_shift.c:27
10 0x0000003d6901ed1d __libc_start_main()  ??:0
11 0x00000000004006e9 _start()  ??:0
===================
[jenkins01:11725:0] Caught signal 11 (Segmentation fault)
Process 3 gets message from 4 (8 processes in ring)
[... identical backtrace repeated 2 more times ...]
--------------------------------------------------------------------------
oshrun noticed that process rank 0 with PID 0 on node jenkins01 exited on signal 13 (Broken pipe).
--------------------------------------------------------------------------
Build step 'Execute shell' marked build as failure
[htmlpublisher] Archiving HTML reports...
[htmlpublisher] Archiving at BUILD level /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/cov_build to /var/lib/jenkins/jobs/gh-ompi-master-pr/builds/556/htmlreports/Coverity_Report
Setting commit status on GitHub for https://github.com/open-mpi/ompi/commit/cb0ac29d74123864caa38829bed6492b425d3011
[BFA] Scanning build for known causes...
[BFA] No failure causes found
[BFA] Done. 0s
Setting status of 20c88e8530654a3aa2db7fed1b5785ca2c7f44ab to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/556/ and message: Build finished.

Test FAILed.

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/561/

Build Log (last 50 lines):

[...truncated 26222 lines...]
==== backtrace ====
 2 0x00000000000597fc mxm_handle_error()  /var/tmp/OFED_topdir/BUILD/mxm-3.3.3052/src/mxm/util/debug/debug.c:641
 3 0x000000000005996c mxm_error_signal_handler()  /var/tmp/OFED_topdir/BUILD/mxm-3.3.3052/src/mxm/util/debug/debug.c:616
 4 0x0000003d690329a0 killpg()  ??:0
 5 0x0000003d69089985 memcpy()  ??:0
 6 0x0000000000005338 mca_spml_ikrit_get_shm()  /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace-2/oshmem/mca/spml/ikrit/spml_ikrit.c:846
 7 0x00000000000053ad mca_spml_ikrit_get()  /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace-2/oshmem/mca/spml/ikrit/spml_ikrit.c:859
 8 0x000000000002f22f pshmem_int_get()  /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace-2/oshmem/shmem/c/profile/pshmem_get.c:66
 9 0x000000000040080e main()  /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace-2/ompi_install1/examples/oshmem_circular_shift.c:27
10 0x0000003d6901ed1d __libc_start_main()  ??:0
11 0x00000000004006e9 _start()  ??:0
===================
[... identical backtrace repeated ...]
[jenkins01:22192:0] Caught signal 11 (Segmentation fault)
Process 3 gets message from 4 (8 processes in ring)
[... identical backtrace repeated ...]
--------------------------------------------------------------------------
oshrun noticed that process rank 0 with PID 0 on node jenkins01 exited on signal 13 (Broken pipe).
--------------------------------------------------------------------------
Build step 'Execute shell' marked build as failure
[htmlpublisher] Archiving HTML reports...
[htmlpublisher] Archiving at BUILD level /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace-2/cov_build to /var/lib/jenkins/jobs/gh-ompi-master-pr/builds/561/htmlreports/Coverity_Report
Setting commit status on GitHub for https://github.com/open-mpi/ompi/commit/7bb48746f1db24c6ed5b1d39ccae5bfe8d25cce3
[BFA] Scanning build for known causes...
[BFA] No failure causes found
[BFA] Done. 0s
Setting status of c745854d9b518f7e93d8308b9b201c44db12b8b4 to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/561/ and message: Build finished.

Test FAILed.

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/565/
Test PASSed.

@jithinjosepkl
Contributor Author

@bosilca - I updated the patch so that the buffer address is calculated by adding datatype->true_lb.
Can you please take a look to see if this change is fine?
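
Roughly, the fix amounts to the following sketch (user_buf is an illustrative name; the exact field path depends on whether the ompi_datatype_t or the underlying opal_datatype_t is in hand):

```c
/* Start of the contiguous payload: the user buffer plus the leading gap. */
unsigned char *payload = (unsigned char *)user_buf + datatype->true_lb;
```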

@bosilca
Member

bosilca commented May 28, 2015

@jithinjosepkl thanks for fixing it. looks better now.

@mike-dubman
Member

@jithinjosepkl - could you please quantify these optimizations, with before/after numbers?
Thanks!

Oh, sorry - just saw the last three lines of the PR description.

@jithinjosepkl
Contributor Author

@miked-mellanox - not a problem.
Let me post the numbers here:
(These are with a static direct build, PML-CM and MTL-OFI.)

Latency: [chart]

B/w: [chart]

Message Rate: [chart]

(Plus, there is a good reduction in instruction count for the send path: ~113 vs. 244.)

@jithinjosepkl
Contributor Author

@bosilca - thanks.

@mike-dubman
Member

@jithinjosepkl - thanks!

Is the static build essential to get these numbers?
Some large-message numbers got worse - probably just noise?
How many iterations were used?

@jithinjosepkl
Contributor Author

Static build is not essential. I was just reporting the build that I used.

The large-message latency degradation could well be noise.
B/w and message rates are better for large messages.
These numbers are an average of five iterations.

@mike-dubman
Member

Thanks - 5 iterations is probably the reason for the noise. It would be interesting to see results with 1k iterations.

@jithinjosepkl
Contributor Author

Btw, I meant that the numbers are averaged over 5 runs of the benchmark, each running the default number of iterations (which depends on message size). I used the OSU benchmarks here.

For large messages, I doubt these changes would affect performance.

@jsquyres
Member

jsquyres commented Jun 2, 2015

@hjelmn is sitting next to me and he's cool with this PR now; I think you should feel free to merge it. Excessive inlining can sometimes be a problem, though, so this is something to keep an eye on.

@jithinjosepkl
Contributor Author

Thanks @jsquyres. Yes, I agree on the comment about inlining.

jsquyres added a commit that referenced this pull request Jun 2, 2015
@jsquyres jsquyres merged commit a55eb5e into open-mpi:master Jun 2, 2015
jsquyres pushed a commit to jsquyres/ompi that referenced this pull request Sep 19, 2016