
Hanging with more than 64 hosts #4578

Closed
ceandrade opened this issue Dec 6, 2017 · 12 comments

Comments

@ceandrade

Problem

I cannot run jobs in parallel on a cluster with more than 64 machines.


System

Open MPI 3.0.0 (openmpi-3.0.0.tar.bz2, Sep 12, 2017), compiled from the distribution tarball.

Linux version 2.6.32-573.3.1.el6.x86_64 ([email protected]) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Thu Aug 13 22:55:16 UTC 2015


Details of the problem

I'm trying to use Open MPI on a cluster with 200 machines. Unfortunately, mpirun appears to freeze when I submit a job:

$ mpirun -np 1 -hostfile hosts.txt -pernode hostname

where hosts.txt has 200 machines.

If the number of machines is at most 64, I can get the answer pretty fast:

$ mpirun -np 64 -hostfile hosts_64.txt -pernode hostname
machine1
machine2
....
machine64

This, however, blocks forever:

$ mpirun -np 65 -hostfile hosts_65.txt -pernode hostname
<no response for 10min...>
@rhc54
Contributor

rhc54 commented Dec 6, 2017

If you add "-mca routed_radix 300" to your command line, does it make a difference?
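
For example, applied to the hostfile command from your original report (just a sketch reusing the hosts.txt name from above), that would look like:

$ mpirun -mca routed_radix 300 -np 1 -hostfile hosts.txt -pernode hostname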

@ceandrade
Author

It got better, but it still didn't scale beyond 128 nodes, even with "-mca routed_radix 600". I also tried "routed_debruijn" and "routed_binomial", but without success.
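
For the record, I assume the alternative routed components are selected through the routed MCA framework, along the lines of:

$ mpirun -mca routed binomial -np 1 -hostfile hosts.txt -pernode hostname
$ mpirun -mca routed debruijn -np 1 -hostfile hosts.txt -pernode hostname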

@ggouaillardet
Contributor

@ceandrade can you please confirm that the node on which you run mpirun is not part of your hosts_64.txt hostfile?

@ceandrade
Author

ceandrade commented Dec 7, 2017

@ggouaillardet I actually tried both: launching from a machine outside the cluster and from machines within the cluster. This is the command line:

$ mpirun -v -display-devel-map -display-allocation -mca routed_radix 600 --bind-to none -np 1 -hostfile hosts_266.txt -pernode hostname

but no output. Note that I now have 266 identical machines in my cluster. Is there a way to produce more detailed logging that I can share with you?

@ggouaillardet
Contributor

I can reproduce the issue with 65 tasks and with the default radix, but only when mpirun is invoked from a node not in the hostfile. I will resume investigations tomorrow. Stay tuned!

@rhc54
Contributor

rhc54 commented Dec 7, 2017

Another thing to try: add "-mca plm_rsh_num_concurrent 300" to your cmd line and see if that helps.
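
In other words, something roughly like this (a sketch based on the command you posted above):

$ mpirun -mca plm_rsh_num_concurrent 300 -mca routed_radix 600 -np 1 -hostfile hosts_266.txt -pernode hostname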

@rhc54
Contributor

rhc54 commented Dec 7, 2017

Alternatively, you can add "-mca plm_rsh_no_tree_spawn 1" to the command line; the effect in your case will be the same.
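
That is, roughly:

$ mpirun -mca plm_rsh_no_tree_spawn 1 -np 1 -hostfile hosts_266.txt -pernode hostname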

@ceandrade
Author

@rhc54 It worked both with the options "-mca plm_rsh_num_concurrent 300 -mca routed_radix 600" and with the option "-mca plm_rsh_no_tree_spawn 1" alone. The latter is much faster than the former.

I don't understand MPI very well (I just use it as another tool), but apparently the tree-based launch has some problem with more than 128 nodes.

So, this solution looks good enough for me. I leave it up to you whether to keep this ticket open for further investigation.

Thanks to @rhc54 and @ggouaillardet for your help and prompt answers!

@rhc54
Contributor

rhc54 commented Dec 7, 2017

OK, thanks for the report, and sorry for the problem. Since @ggouaillardet can reproduce it, I will defer to him for the eventual solution now that we have isolated the problem to the tree spawn code.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Dec 8, 2017
so any tree spawn operation properly gets the number of children underneath us.

This commit is a tiny subset of open-mpi/ompi@347ca41
that should have been back-ported into the v3.0.x branch

Fixes open-mpi#4578

Thanks to Carlos Eduardo de Andrade for reporting.

Signed-off-by: Gilles Gouaillardet <[email protected]>
@ggouaillardet
Contributor

@ceandrade this issue will be fixed in the upcoming 3.0.1 release.
Meanwhile, you can manually download and apply the patch available at https://github.com/ggouaillardet/ompi/commit/e445d47c3f8ff908d19eb18cbd6d953e1272f7ca.patch
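
Roughly, assuming you built from the 3.0.0 tarball and have an already-configured source tree (the exact steps may differ on your system):

$ wget https://github.com/ggouaillardet/ompi/commit/e445d47c3f8ff908d19eb18cbd6d953e1272f7ca.patch
$ cd openmpi-3.0.0
$ patch -p1 < ../e445d47c3f8ff908d19eb18cbd6d953e1272f7ca.patch
$ make all install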

Thanks for the report!

@ceandrade
Author

Thank you very much @ggouaillardet. I am amazed how fast you guys are!

@rhc54
Contributor

rhc54 commented Dec 12, 2017

Marking this as complete

@rhc54 rhc54 closed this as completed Dec 12, 2017
sam6258 pushed a commit to sam6258/ompi that referenced this issue Jan 24, 2018
so any tree spawn operation properly gets the number of children underneath us.

This commit is a tiny subset of open-mpi/ompi@347ca41
that should have been back-ported into the v3.0.x branch

Fixes open-mpi#4578

Thanks to Carlos Eduardo de Andrade for reporting.

Signed-off-by: Gilles Gouaillardet <[email protected]>
Signed-off-by: Scott Miller <[email protected]>
sam6258 pushed a commit to sam6258/ompi that referenced this issue Jan 25, 2018
so any tree spawn operation properly gets the number of children underneath us.

This commit is a tiny subset of open-mpi/ompi@347ca41
that should have been back-ported into the v3.0.x branch

Fixes open-mpi#4578

Thanks to Carlos Eduardo de Andrade for reporting.

Signed-off-by: Gilles Gouaillardet <[email protected]>
(cherry picked from commit e445d47)
sam6258 pushed a commit to sam6258/ompi that referenced this issue Feb 1, 2018
so any tree spawn operation properly gets the number of children underneath us.

This commit is a tiny subset of open-mpi/ompi@347ca41
that should have been back-ported into the v3.0.x branch

Fixes open-mpi#4578

Thanks to Carlos Eduardo de Andrade for reporting.

Signed-off-by: Gilles Gouaillardet <[email protected]>
(cherry picked from commit e445d47)