L₁ regularization
Add an L₁ regularization term to the loss to penalize large weights.
Try `weight_decay` in `torch.optim`:
```python
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
```
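Note that `weight_decay` in `torch.optim` applies an L2-style penalty (folded into the gradient for `Adam`; `torch.optim.AdamW` decouples it), so it does not give true L1 regularization. A minimal sketch of adding an explicit L1 term to the loss instead; the toy model, dummy batch, and `l1_lambda` value are assumptions, not from the original:

```python
import torch

model = torch.nn.Linear(10, 1)                    # assumed toy model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)    # assumed dummy batch
l1_lambda = 1e-5                                  # assumed L1 strength

optimizer.zero_grad()
loss = criterion(model(x), y)
# L1 term: sum of |weight| over all parameters, added to the loss
loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
loss.backward()
optimizer.step()
```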
L2 and L1 penalize weights differently:

- L2 penalizes weight^2.
- L1 penalizes |weight|.
Consequently, L2 and L1 have different derivatives:

- The derivative of L2 is 2 * weight.
- The derivative of L1 is k * sign(weight): its magnitude is a constant, independent of the weight's value.
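A quick autograd check of these two derivatives (a sketch; the specific weight values are made up):

```python
import torch

w = torch.tensor([2.0, -3.0], requires_grad=True)

# L2 penalty: d(weight^2)/dweight = 2 * weight -> grad scales with w
(w ** 2).sum().backward()
print(w.grad)            # tensor([ 4., -6.])

w.grad = None            # reset before the next check

# L1 penalty: d|weight|/dweight = sign(weight) -> constant magnitude
w.abs().sum().backward()
print(w.grad)            # tensor([ 1., -1.])
```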
Weight decay: a regularization technique (such as L₂ regularization) that results in gradient descent shrinking the weights on every iteration.
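To see the "shrinking on every iteration" concretely, here is a one-step sketch of the coupled L2 / weight-decay update; the `lr` and `wd` values are arbitrary assumptions:

```python
import torch

w = torch.tensor(1.0)
lr, wd = 0.1, 0.5
grad = 0.0                        # pretend the loss gradient is zero
# Update: w <- w - lr * (grad + wd * w); the wd term shrinks w toward 0
w = w - lr * (grad + wd * w)
print(w)                          # tensor(0.9500)
```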