WeeklyTelcon_20160412
        Jeff Squyres edited this page Nov 18, 2016 
        ·
        1 revision
      
    - Dialup Info: (Do not post to public mailing list or public wiki)
 
- Geoff Paulsen
 - Jeff Squyres
 - Edgar Gabriel
 - Howard
 - Josh Hursey
 - Joshua Ladd
 - Nathan Hjelm
 - Ralph Castain
 - Ryan Grant
 - Sylvain Jeaugey
 - Todd Kordenbrock
 
- 
Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
- Ralph will look at the ByNode thing today.
 - Allena reported a SLURM issue.
 - Issue 1530: MTT was hanging because a test's signal handler was segfaulting, and the cores were taking a long time to dump.
 - Next 1.10 release: need to fix these issues first, but looking like early May.
 
 - 
GitHub now DOES allow per-branch permissions, so will look at
 
- 
Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- 1 remaining blocker: the memory symbol patcher - Nathan / IBM / Mellanox.
- Got the original code to stack with UCX. munmap on Linux has an optional argument. This doesn't work well with any style of hooking. Loader ends up patching a random function address. Didn't understand. Assembly looked okay, but munmap was randomly
 - When UCX is involved, seeing both Open MPI and UCX memhooks, which is great.
 - SPARC still having issues, so will need a solution for 2.0.1.
 - Nathan will work to remove ptmalloc on master, and have build time
 - On 2.0.0, Nathan will add an explicit --enable-ptmalloc configure option, but ptmalloc won't be built by default.
- If users configure --enable-ptmalloc, then it would disable the internal memhook frameworks entirely.
 - when this happens, will have to add some early code to tickle ptmalloc
 - Need to document that if --enable-ptmalloc is used, munmap() calls may give wrong answers.
 
 - Nathan might do the SPARC assembly himself, since he is seeing weird dlsym issues on SPARC that might be related.
 - Now created the first time an rcache is created.
 - Might need to tweak openib BTL, because it created rcache too early.
 - When you create a thread, it has to expand the heap sometimes.
 - When they expand the heap, they protect the entire heap with PROT_NONE.
 - Calls that need hooks: munmap, mremap (if the new length is smaller than the old length), shmdt (only on Linux, not OS X), brk.
- Need sbrk also (for negative increment).
 
 - Nathan will look at README for memory hook stuff.
 - OpenFabrics should get its act together and put something in the kernel to alleviate all of this.
 
 - Question: do we want the new, prettier ompi_info output? It doesn't change the parsable output.
- Low risk, got contributor agreement (works for SuSE). Can pull 1515, 1516, 1518 into 2.0.
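
The memory-hook discussion above lists the calls that can hand address ranges back to the OS: munmap, a shrinking mremap, shmdt (Linux only), brk, and sbrk with a negative increment. As a rough illustration only, the sketch below models which address range each call would release, which is the range a registration cache would then need to invalidate. All function names here are hypothetical and are not taken from the Open MPI patcher code. The model covers only the cases named in the notes; for example, it ignores an mremap that moves a mapping.

```python
from typing import Optional, Tuple

# (start address, length in bytes) of memory handed back to the OS
Range = Tuple[int, int]


def released_by_munmap(addr: int, length: int) -> Optional[Range]:
    """munmap releases the whole [addr, addr + length) range."""
    return (addr, length) if length > 0 else None


def released_by_mremap(addr: int, old_len: int, new_len: int) -> Optional[Range]:
    """Only a shrinking mremap is modeled: the tail past new_len goes away."""
    if new_len < old_len:
        return (addr + new_len, old_len - new_len)
    return None


def released_by_shmdt(addr: int, seg_size: int, on_linux: bool = True) -> Optional[Range]:
    """shmdt detaches a whole segment; per the notes it is hooked only on Linux.

    shmdt itself takes no length, so seg_size is an assumption of this model:
    a real hook would have to look the segment size up.
    """
    if not on_linux:
        return None
    return (addr, seg_size) if seg_size > 0 else None


def released_by_brk(current_end: int, new_end: int) -> Optional[Range]:
    """brk releases memory only when it moves the program break down."""
    if new_end < current_end:
        return (new_end, current_end - new_end)
    return None


def released_by_sbrk(current_end: int, increment: int) -> Optional[Range]:
    """sbrk releases memory only for a negative increment."""
    if increment < 0:
        return (current_end + increment, -increment)
    return None


def overlaps(registered: Range, released: Range) -> bool:
    """True if a cached registration intersects a released range."""
    reg_start, reg_len = registered
    rel_start, rel_len = released
    return reg_start < rel_start + rel_len and rel_start < reg_start + reg_len
```

For instance, `released_by_mremap(0x1000, 8192, 4096)` yields the tail `(0x2000, 4096)`, and a cached registration covering `(0x2800, 256)` would overlap it and need to be dropped.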
 
 
 - 
Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
 
- Absoft failure. Need to fix configure stuff with atomics. On Master.
 - Nathan - this failure should go into 2.0.1. Absoft MTT shows that
 
- IBM has a client facing cluster
- Working on getting Jenkins set up on the IBM side to ensure Pull Requests get tested on Power also.
 - Hoping to have online this week?
 
 - Better upload interface, being used by both Ralph and Josh.
 - Plugins to support SLURM, Copy tree, Shell commands. Compiler version detection.
 - Looking at establishing MTT release .tarballs.
 - Timeframe: Sooner rather than later
 
- Cisco - a bunch of release engineering work for both libfabric and OMPI.
- assisting on a number of bugs.
 
 - NVIDIA - Sylvain - An issue on MTT - looking into.
 
- Cisco, ORNL, UTK, NVIDIA
 - Mellanox, Sandia, Intel
 - LANL, Houston, IBM