L1 regularization
Add an L1 regularization term to penalize large weights. L1 drives some weights to exactly 0 (the equivalent of eliminating hidden nodes), which yields a sparser, effectively smaller network and reduces overfitting.
Try the weight_decay argument in torch.optim optimizers (note: weight_decay implements the L2 penalty, not L1; an L1 penalty has to be added to the loss by hand). Typical strengths: l1=1e-5, l2=1e-4. See the sketch below.
L2 and L1 penalize weights differently:
L2 penalizes weight^2.
L1 penalizes |weight|.
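Written out, with λ as the regularization strength:

$$\text{L2 penalty} = \lambda \sum_i w_i^2, \qquad \text{L1 penalty} = \lambda \sum_i \lvert w_i \rvert$$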
Consequently, L2 and L1 have different derivatives:
The derivative of L2 is 2 * weight.
The derivative of L1 is k (a constant whose magnitude is independent of the weight; its sign matches the sign of the weight).
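In symbols, with the same λ as above (the L1 derivative is undefined at exactly w = 0, where subgradients are used in practice):

$$\frac{\partial}{\partial w}\,\lambda w^2 = 2\lambda w, \qquad \frac{\partial}{\partial w}\,\lambda\lvert w\rvert = \lambda\,\operatorname{sign}(w)$$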
Weight decay: a regularization technique (such as L2 regularization) that results in gradient descent shrinking the weights on every iteration.
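Concretely, assuming plain gradient descent with learning rate η and the L2 penalty λ∑w² above, each update multiplies every weight by a factor slightly below 1, which is the per-iteration shrinking:

$$w \leftarrow w - \eta\left(\nabla_w L + 2\lambda w\right) = (1 - 2\eta\lambda)\,w - \eta\,\nabla_w L$$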