Understanding weight initialization for neural networks