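Gradient accumulation (micro steps) can be very useful when we want a large batch size but only have a limited number of GPUs: gradients from several smaller micro-batches are accumulated before a single optimizer step, so the effective batch size can exceed what fits in GPU memory at once.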
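A minimal sketch of the idea, assuming a toy 1-D linear model `y = w * x` trained with mean squared error and written in plain Python so it runs without any framework. The helper names (`mse_grad`, `accumulated_grad`, `train`) and the `accum_steps` parameter are hypothetical and only for illustration. It splits one large batch into `accum_steps` equal micro-batches, scales each micro-batch gradient by `1 / accum_steps`, and sums them, which reproduces the full-batch gradient.

```python
# Hypothetical illustration of gradient accumulation on a toy model y = w * x.
# In a framework such as PyTorch, the same pattern is: call backward() on each
# micro-batch loss divided by accum_steps, and step the optimizer only once
# every accum_steps micro-batches.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, accum_steps):
    """Split the batch into accum_steps equal micro-batches and accumulate
    their gradients, each scaled by 1/accum_steps so the total equals the
    mean-reduced gradient over the full batch."""
    assert len(xs) % accum_steps == 0, "batch must split evenly into micro-batches"
    micro = len(xs) // accum_steps
    grad = 0.0
    for k in range(accum_steps):
        mx = xs[k * micro:(k + 1) * micro]
        my = ys[k * micro:(k + 1) * micro]
        grad += mse_grad(w, mx, my) / accum_steps  # one micro step
    return grad


def train(xs, ys, accum_steps, lr=0.01, steps=100):
    """Plain SGD: one optimizer update per accumulated (full) batch."""
    w = 0.0
    for _ in range(steps):
        w -= lr * accumulated_grad(w, xs, ys, accum_steps)
    return w


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 9)]  # one batch of 8 examples
    ys = [3.0 * x for x in xs]            # targets generated with true w = 3
    print(mse_grad(0.5, xs, ys))                          # full-batch gradient
    print(accumulated_grad(0.5, xs, ys, accum_steps=4))   # 4 micro-batches of 2
    print(train(xs, ys, accum_steps=4))
```

The equivalence holds because the loss is a per-example mean and the micro-batches are equal-sized. Layers that compute statistics across the batch (such as batch normalization) only see one micro-batch at a time, so for them accumulation is not exactly the same as training on the full batch.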