L₁ regularization

Add an L₁ regularization term to the loss to penalize large weights

Try weight_decay in torch.optim. Note that in PyTorch optimizers, weight_decay implements an L₂ penalty, not L₁, so an L₁ term still has to be added to the loss manually (see the sketch below).

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)

Typical penalty coefficients: l1 = 1e-5, l2 = 1e-4
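
A minimal sketch of the full recipe in PyTorch: weight_decay supplies the L₂ term, and the L₁ term is summed into the loss by hand. The Linear model and the random batch are stand-ins; the lr, l1, and l2 values are the ones from this note.

import torch

l1, l2 = 1e-5, 1e-4
model = torch.nn.Linear(10, 1)                 # stand-in model for the sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=l2)

inputs, targets = torch.randn(8, 10), torch.randn(8, 1)   # dummy batch

loss = torch.nn.functional.mse_loss(model(inputs), targets)
# L1 term added manually: sum of |weight| over all parameters
# (for brevity this includes biases; often only weights are penalized)
loss = loss + l1 * sum(p.abs().sum() for p in model.parameters())

optimizer.zero_grad()
loss.backward()
optimizer.step()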

L2 and L1 penalize weights differently:

  • L2 penalizes weight^2.

  • L1 penalizes |weight|.

Consequently, L2 and L1 have different derivatives:

  • The derivative of L2 is 2 * weight.

  • The derivative of L1 is k (a constant whose magnitude is independent of the weight; its sign matches the sign of the weight).
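
Both derivatives can be verified with autograd; a minimal sketch, where w = 3.0 is an arbitrary test value:

import torch

w = torch.tensor(3.0, requires_grad=True)

(w ** 2).backward()      # L2 penalty: derivative is 2 * weight
print(w.grad)            # tensor(6.)

w.grad = None
w.abs().backward()       # L1 penalty: derivative is sign(weight), here +1
print(w.grad)            # tensor(1.)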

Weight decay: a regularization technique (such as L₂ regularization) that results in gradient descent shrinking the weights on every iteration.
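
Written as an update rule, that shrink is w ← w − lr · (grad + λ · w). A minimal sketch of one manual SGD step with decay, using illustrative lr and wd values and a stand-in loss:

import torch

lr, wd = 1e-2, 1e-4
w = torch.randn(5, requires_grad=True)

loss = (w ** 2).sum()    # stand-in loss for the sketch
loss.backward()

with torch.no_grad():
    w -= lr * (w.grad + wd * w)   # gradient step plus weight-decay shrink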

Source: Google Developers Machine Learning Glossary
