Train longer, generalize better: closing the generalization gap in large batch training of neural networks read more