Normalize input
Left: without normalization, the learning rate must be chosen carefully (kept small), or gradient descent will oscillate around the optimal point
Right: with normalized data, a bigger learning rate can be used and learning is faster
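The left/right contrast can be sketched on a toy problem. This is only an illustration under stated assumptions: the loss is a hypothetical quadratic bowl, the curvatures `[1.0, 100.0]` stand in for unnormalized features on very different scales, `[1.0, 1.0]` stands in for normalized features, and the helper name `gradient_descent_steps` is made up for this example.

```python
def gradient_descent_steps(curvatures, learning_rate, tol=1e-6, max_steps=10_000):
    """Minimize 0.5 * sum(c_i * w_i**2) starting from w = (1, ..., 1).

    Returns the number of steps needed to get every weight within `tol`
    of the optimum (0), or None if the run diverges or runs out of steps.
    """
    w = [1.0] * len(curvatures)
    for step in range(1, max_steps + 1):
        # The gradient of 0.5 * c * w**2 with respect to w is c * w.
        w = [wi - learning_rate * c * wi for wi, c in zip(w, curvatures)]
        largest = max(abs(wi) for wi in w)
        if largest < tol:
            return step
        if largest > 1e6:
            return None  # the weights blew up: learning rate too large
    return None


elongated = [1.0, 100.0]   # assumed stand-in for unnormalized inputs
round_bowl = [1.0, 1.0]    # assumed stand-in for normalized inputs

# Elongated bowl: the steep direction forces a small learning rate,
# and the steep weight flips sign on every step (oscillation).
steps_elongated = gradient_descent_steps(elongated, learning_rate=0.019)

# Round bowl: a much larger learning rate is stable.
steps_round = gradient_descent_steps(round_bowl, learning_rate=0.9)

# The same large learning rate on the elongated bowl diverges.
diverged = gradient_descent_steps(elongated, learning_rate=0.9)

print(steps_elongated, steps_round, diverged)
```

In this sketch the round bowl converges in far fewer steps than the elongated one, and reusing the large learning rate on the elongated bowl returns `None` because the weights diverge.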
Two steps:
Zero center:
subtract mean: all features around 0
Normalize variance: divide each feature by its standard deviation to make it round (variance around 1)
Use the same parameters (μ and σ), computed on the training set, for both train and test
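A minimal sketch of the two steps, using only the Python standard library. The function names `fit_normalizer` and `normalize`, the toy data, and the guard for constant features are assumptions made for this example, not part of the notes; in practice a vectorized array library would usually do the same arithmetic.

```python
from statistics import fmean, pstdev


def fit_normalizer(rows):
    """Compute per-feature mean (mu) and standard deviation (sigma)."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    mu = [fmean(col) for col in columns]
    # Assumption: a constant feature has sigma 0, so fall back to 1.0
    # to avoid dividing by zero.
    sigma = [pstdev(col) or 1.0 for col in columns]
    return mu, sigma


def normalize(rows, mu, sigma):
    """Zero-center each feature, then scale it to unit variance."""
    return [[(x - m) / s for x, m, s in zip(row, mu, sigma)] for row in rows]


# Hypothetical toy data: two features on very different scales.
train = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
test = [[2.0, 200.0]]

mu, sigma = fit_normalizer(train)        # statistics come from the training set only
train_norm = normalize(train, mu, sigma)
test_norm = normalize(test, mu, sigma)   # reuse the same mu and sigma on the test set

print(train_norm)
print(test_norm)
```

After this, each training feature has mean 0 and variance 1. The test set is shifted and scaled by the training statistics, so it goes through exactly the same transformation as the data the model was trained on.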