GRU and LSTM
Compared with a plain RNN, GRU and LSTM add gates (such as update and forget gates) that control how the hidden state is modified at each timestep.
LSTM:
Forget gate controls how much of the previous cell state c&lt;t-1&gt; gets passed on to the next cell state
Update gate controls how much of the candidate value c̃&lt;t&gt; gets added to the next cell state
Output gate controls how much of the cell state shows up in the prediction output (the hidden state a&lt;t&gt;)
LSTM is better at addressing vanishing gradients and at carrying information across many timesteps (see the sketch of one LSTM step below)
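A minimal NumPy sketch of one LSTM step following the gate description above; the Ng-style gate symbols (Γ_f, Γ_u, Γ_o, candidate c̃) and the parameter names (Wf, Wu, Wo, Wc and their biases) are assumptions for illustration, not a specific library's API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_forward(x_t, a_prev, c_prev, params):
    """One LSTM step: forget gate scales the previous cell state,
    update gate scales the candidate value, output gate scales the output."""
    concat = np.concatenate([a_prev, x_t], axis=0)  # stack hidden state and input

    gamma_f = sigmoid(params["Wf"] @ concat + params["bf"])  # forget gate
    gamma_u = sigmoid(params["Wu"] @ concat + params["bu"])  # update gate
    gamma_o = sigmoid(params["Wo"] @ concat + params["bo"])  # output gate
    c_tilde = np.tanh(params["Wc"] @ concat + params["bc"])  # candidate value

    c_t = gamma_f * c_prev + gamma_u * c_tilde  # next cell state
    a_t = gamma_o * np.tanh(c_t)                # next hidden state / prediction output
    return a_t, c_t

# Tiny usage example with random (illustrative) parameters
n_a, n_x = 4, 3
rng = np.random.default_rng(0)
params = {k: rng.standard_normal((n_a, n_a + n_x)) * 0.1 for k in ("Wf", "Wu", "Wo", "Wc")}
params.update({b: np.zeros(n_a) for b in ("bf", "bu", "bo", "bc")})
a_t, c_t = lstm_cell_forward(rng.standard_normal(n_x), np.zeros(n_a), np.zeros(n_a), params)
```

Because the forget gate can stay close to 1, the cell state can be copied almost unchanged across many steps, which is what lets gradients survive over long sequences.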
GRU focuses more on local input (near the current timestep); see the GRU sketch below
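For contrast, a minimal NumPy sketch of one GRU step, assuming the common formulation with an update gate and a relevance (reset) gate and no separate cell state or output gate; the parameter names Wr, Wu, Wh and their biases are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell_forward(x_t, h_prev, params):
    """One GRU step: the hidden state doubles as the cell state."""
    concat = np.concatenate([h_prev, x_t], axis=0)

    gamma_r = sigmoid(params["Wr"] @ concat + params["br"])  # relevance (reset) gate
    gamma_u = sigmoid(params["Wu"] @ concat + params["bu"])  # update gate

    # Candidate state, computed from the relevance-scaled previous state
    h_tilde = np.tanh(params["Wh"] @ np.concatenate([gamma_r * h_prev, x_t], axis=0)
                      + params["bh"])

    # Update gate interpolates between keeping the old state and adopting the candidate
    h_t = gamma_u * h_tilde + (1.0 - gamma_u) * h_prev
    return h_t
```

With fewer gates and no separate cell state, the GRU is cheaper per step than the LSTM, which is one reason it tends to track recent (local) input rather than very long-range context.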