Vanishing / Exploding Gradients

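Gradients in a deep network are products of per-layer weight terms, so they shrink (vanish) or grow (explode) exponentially with depth unless the weights stay close to 1. This can't be fully avoided, but it can be mitigated by scaling the random initial weights of layer l by the Xavier initialization term: sqrt(1./layers_dims[l-1])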
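
A minimal sketch of how that scaling term might be applied, assuming NumPy and a list `layers_dims` that holds the layer sizes (input layer first). The function name `initialize_parameters_xavier`, the `seed` argument, and the `"W1"`/`"b1"` key naming are hypothetical choices for illustration, not taken from the note.

```python
import numpy as np

def initialize_parameters_xavier(layers_dims, seed=0):
    """Return weights scaled by sqrt(1 / fan_in) and zero biases.

    layers_dims: list of layer sizes, e.g. [n_x, n_h1, ..., n_y].
    The function name and dict layout are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layers_dims)):
        # Scaling term from the note: sqrt(1 / size of the previous layer)
        scale = np.sqrt(1.0 / layers_dims[l - 1])
        shape = (layers_dims[l], layers_dims[l - 1])
        parameters["W" + str(l)] = rng.standard_normal(shape) * scale
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters

params = initialize_parameters_xavier([1000, 500, 1])
```

With this scaling, each weight has variance of about 1 / layers_dims[l-1], so the sum over a layer's inputs keeps roughly the same scale from layer to layer instead of growing or shrinking with depth.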
