Vanishing / Exploding Gradients
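In a deep network, activations and gradients are multiplied by each layer's weights in turn, so they shrink or grow exponentially with depth unless the weights are close to 1. This cannot be avoided entirely, but it can be mitigated by scaling the random initial weights of layer l by the Xavier initialization term sqrt(1./layers_dims[l-1]), where layers_dims[l-1] is the number of units feeding into layer l.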
Initialize weight
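A minimal sketch of how the Xavier term could be applied when initializing weights, assuming NumPy and a list layers_dims that holds the number of units in each layer (input layer first). The function name initialize_parameters_xavier, the seed argument, and the "W1"/"b1" dictionary keys are illustrative assumptions, not names taken from these notes.

```python
import numpy as np


def initialize_parameters_xavier(layers_dims, seed=0):
    """Draw weights from N(0, 1) and scale them by sqrt(1 / fan_in).

    layers_dims: list of layer sizes, e.g. [n_x, n_h1, ..., n_y].
    Returns a dict with "W1", "b1", ..., "W{L-1}", "b{L-1}" (hypothetical keys).
    """
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layers_dims)):
        # Xavier term from the notes: sqrt(1. / layers_dims[l-1])
        scale = np.sqrt(1. / layers_dims[l - 1])
        # W has shape (units in layer l, units in layer l-1)
        parameters["W" + str(l)] = (
            rng.standard_normal((layers_dims[l], layers_dims[l - 1])) * scale
        )
        # Biases can start at zero; the random weights already break symmetry
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters


params = initialize_parameters_xavier([500, 400, 3])
print(params["W1"].shape, params["W1"].std())
```

With this scaling, the standard deviation of each weight matrix is roughly sqrt(1 / fan_in), which keeps the variance of each layer's pre-activations close to that of its inputs instead of letting it grow or shrink with layer width.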