L₁ regularization

Add an L₁ regularization term to the loss to penalize large weights

Try weight_decay in torch.optim. Note that in PyTorch optimizers, weight_decay implements an L₂ penalty, not L₁, so an L₁ term still has to be added to the loss manually (see the sketch below).

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)

Typical penalty coefficients: l1 = 1e-5, l2 = 1e-4
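
A minimal sketch of the full recipe in PyTorch: weight_decay supplies the L₂ term, and the L₁ term is summed into the loss by hand. The Linear model and the random batch are stand-ins; the lr, l1, and l2 values are the ones from this note.

import torch

l1, l2 = 1e-5, 1e-4
model = torch.nn.Linear(10, 1)                 # stand-in model for the sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=l2)

inputs, targets = torch.randn(8, 10), torch.randn(8, 1)   # dummy batch

loss = torch.nn.functional.mse_loss(model(inputs), targets)
# L1 term added manually: sum of |weight| over all parameters
# (for brevity this includes biases; often only weights are penalized)
loss = loss + l1 * sum(p.abs().sum() for p in model.parameters())

optimizer.zero_grad()
loss.backward()
optimizer.step()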

L2 and L1 penalize weights differently:

  • L2 penalizes weight^2.

  • L1 penalizes |weight|.

Consequently, L2 and L1 have different derivatives:

  • The derivative of L2 is 2 * weight.

  • The derivative of L1 is k (a constant whose magnitude is independent of the weight; its sign matches the sign of the weight).
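
Both derivatives can be verified with autograd; a minimal sketch, where w = 3.0 is an arbitrary test value:

import torch

w = torch.tensor(3.0, requires_grad=True)

(w ** 2).backward()      # L2 penalty: derivative is 2 * weight
print(w.grad)            # tensor(6.)

w.grad = None
w.abs().backward()       # L1 penalty: derivative is sign(weight), here +1
print(w.grad)            # tensor(1.)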

Weight decay: a regularization technique (such as L₂ regularization) that results in gradient descent shrinking the weights on every iteration.
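
Written as an update rule, that shrink is w ← w − lr · (grad + λ · w). A minimal sketch of one manual SGD step with decay, using illustrative lr and wd values and a stand-in loss:

import torch

lr, wd = 1e-2, 1e-4
w = torch.randn(5, requires_grad=True)

loss = (w ** 2).sum()    # stand-in loss for the sketch
loss.backward()

with torch.no_grad():
    w -= lr * (w.grad + wd * w)   # gradient step plus weight-decay shrink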

Source: Google Developers Machine Learning Glossary
