GRU and LSTM
Compared with a plain RNN, GRU and LSTM add gates, such as an update gate and a forget gate, that control how the hidden state is modified at each timestep.
LSTM
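An LSTM cell uses three gates:
The forget gate controls how much of the previous cell state is passed on to the next cell state.
The update gate controls how much of the candidate value c is added to the next cell state.
The output gate controls how much of the cell state is exposed as the hidden state, which is used for the prediction output.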
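The three gates can be sketched as a single LSTM step. This is a minimal illustration, not a reference implementation: it assumes the standard LSTM formulation, uses a scalar input and scalar state to stay dependency-free, and the names lstm_step, params, a_prev and c_prev are hypothetical choices made for this example.

```python
import math


def sigmoid(z):
    """Logistic function, squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, a_prev, c_prev, params):
    """One LSTM step with scalar input x, hidden state a_prev, cell state c_prev.

    params maps each of "forget", "update", "output", "candidate" to a
    (w_a, w_x, b) tuple. This layout is an assumption made for the sketch.
    """
    def affine(name):
        w_a, w_x, b = params[name]
        return w_a * a_prev + w_x * x + b

    forget = sigmoid(affine("forget"))          # share of the old cell state to keep
    update = sigmoid(affine("update"))          # share of the candidate to add
    output = sigmoid(affine("output"))          # share of the cell state to expose
    candidate = math.tanh(affine("candidate"))  # proposed new content

    c_next = forget * c_prev + update * candidate
    a_next = output * math.tanh(c_next)
    return a_next, c_next


# Forget gate almost fully open, update gate almost closed:
# the cell state is carried over nearly unchanged.
params = {
    "forget": (0.0, 0.0, 10.0),
    "update": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (0.0, 0.0, 0.0),
}
a_next, c_next = lstm_step(x=1.0, a_prev=0.0, c_prev=0.5, params=params)
print(a_next, c_next)
```

With these hand-picked parameters the new cell state stays close to 0.5, which shows how an open forget gate lets the cell state carry information across a timestep.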
Difference between GRU and LSTM:
LSTM is generally better at addressing vanishing gradients and at carrying information across many timesteps.
GRU tends to focus on local input (near the current timestep).
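For contrast, a GRU step can be sketched in the same scalar style. This assumes the standard GRU formulation, which these notes do not spell out: a single hidden state, a reset gate, and an update gate whose complement (1 - update) plays the role of the forget gate. The names gru_step, params and h_prev are hypothetical, and some libraries swap the roles of update and (1 - update).

```python
import math


def sigmoid(z):
    """Logistic function, squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x, h_prev, params):
    """One GRU step with scalar input x and scalar hidden state h_prev.

    params maps each of "update", "reset", "candidate" to a (w_h, w_x, b)
    tuple. This layout is an assumption made for the sketch.
    """
    w_h, w_x, b = params["update"]
    update = sigmoid(w_h * h_prev + w_x * x + b)

    w_h, w_x, b = params["reset"]
    reset = sigmoid(w_h * h_prev + w_x * x + b)

    # The reset gate scales how much of the old state feeds the candidate.
    w_h, w_x, b = params["candidate"]
    candidate = math.tanh(w_h * (reset * h_prev) + w_x * x + b)

    # One gate does both jobs: add the candidate, keep (1 - update) of the old state.
    return update * candidate + (1.0 - update) * h_prev


# Update gate almost closed: the old hidden state is kept nearly unchanged.
keep = {"update": (0.0, 0.0, -10.0), "reset": (0.0, 0.0, 0.0), "candidate": (0.0, 0.0, 0.0)}
h_keep = gru_step(x=1.0, h_prev=0.5, params=keep)

# Update gate almost open: the state is overwritten by the candidate tanh(x).
overwrite = {"update": (0.0, 0.0, 10.0), "reset": (0.0, 0.0, 0.0), "candidate": (0.0, 1.0, 0.0)}
h_new = gru_step(x=1.0, h_prev=0.5, params=overwrite)
print(h_keep, h_new)
```

Because the kept and added shares are tied to one gate, the GRU has fewer parameters than the LSTM, whose forget and update gates are learned independently.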