Description
I've been doing some experiments on CIFAR10 with ResNets and decided to give APEX AMP a try.
However, I ran into some performance issues:
- AMP with PyTorch's `torch.nn.parallel.DistributedDataParallel` was extremely slow.
- AMP with `apex.parallel.DistributedDataParallel` was slower than the default training with `torch.nn.parallel.DistributedDataParallel` (no apex involved). For reference, normal training took about 15 minutes, while apex AMP training took 21 minutes (90 epochs on CIFAR-10 with ResNet-20). See the sketch after this list for the setup being compared.
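For context, this is roughly how AMP and DDP are wired up in the two cases (a minimal sketch, not the exact benchmark code; it assumes the standard `apex.amp` API with `amp.initialize`/`amp.scale_loss` and `apex.parallel.DistributedDataParallel`):

```python
# Minimal sketch of the AMP + DDP setup being compared (assumptions noted above;
# the actual benchmark script is linked further down).
import torch
import torch.distributed as dist
from apex import amp
from apex.parallel import DistributedDataParallel as ApexDDP


def setup(local_rank, model, optimizer, use_amp):
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl", init_method="env://")
    model = model.cuda()
    if use_amp:
        # O1 patches common ops to run in FP16 while keeping FP32 master weights.
        model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
        model = ApexDDP(model)  # apex's DDP wrapper
    else:
        model = torch.nn.parallel.DistributedDataParallel(
            model, device_ids=[local_rank])
    return model, optimizer


def train_step(model, optimizer, criterion, images, targets, use_amp):
    optimizer.zero_grad()
    loss = criterion(model(images.cuda()), targets.cuda())
    if use_amp:
        # Scale the loss so FP16 gradients don't underflow.
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
    else:
        loss.backward()
    optimizer.step()
```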
I followed the installation instructions, but I couldn't install the C++ extensions because of my GCC/CUDA version. Could that alone explain the slowdown?
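In case it is relevant: one way to check whether the compiled extensions actually ended up in the environment is to try importing the modules that the `--cpp_ext`/`--cuda_ext` build produces. The module names used here (`apex_C`, `amp_C`) are assumptions based on the apex build options and may differ between versions:

```python
# Rough check for apex's compiled C++/CUDA extensions.
# apex_C / amp_C are assumed module names produced by the --cpp_ext / --cuda_ext
# build options; a Python-only install will not have them.
def apex_extensions_available():
    try:
        import apex_C  # noqa: F401  (C++ extension)
        import amp_C   # noqa: F401  (CUDA extension, fused kernels)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    print("apex compiled extensions available:", apex_extensions_available())
```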
You can see the code here:
https://github.com/braincreators/octconv/blob/34440209c4b37fb5198f75e4e8c052e92e80e85d/benchmarks/train.py#L1-L498
And to run it (2 GPUs):

Without APEX AMP:

```bash
python -m torch.distributed.launch --nproc_per_node 2 train.py -c configs/cifar10/resnet20_small.yml --batch-size 128 --lr 0.1
```

With APEX AMP:

```bash
python -m torch.distributed.launch --nproc_per_node 2 train.py -c configs/cifar10/resnet20_small.yml --batch-size 128 --lr 0.1 --mixed-precision
```