GRU and LSTM

Compared with a plain RNN, GRU and LSTM add gates (such as an update gate and a forget gate) that control how the hidden state is modified at each timestep.

LSTM

An LSTM cell uses three gates:

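  1. The forget gate controls how much of the previous cell state (c at timestep t-1) is passed on to the next cell state

  2. The update gate controls how much of the candidate value (c~) is added to the next cell state

  3. The output gate controls how much of the cell state is exposed as the hidden state a, which is used for the prediction output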
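
The three gates above can be sketched as a single LSTM step. This is a minimal illustration, not a reference implementation: it assumes scalar input and state for readability, and the function name lstm_step and the weight names in the parameter dict (wf_a, wf_x, bf, and so on) are hypothetical.

```python
import math

def sigmoid(z):
    # Squashes z into (0, 1), so each gate acts as a soft switch.
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, a_prev, c_prev, p):
    """One LSTM step for scalar input x, hidden state a_prev, cell state c_prev.

    p is a dict of hypothetical scalar weights and biases.
    """
    # Every gate looks at the previous hidden state and the current input.
    f = sigmoid(p["wf_a"] * a_prev + p["wf_x"] * x + p["bf"])  # forget gate
    u = sigmoid(p["wu_a"] * a_prev + p["wu_x"] * x + p["bu"])  # update gate
    o = sigmoid(p["wo_a"] * a_prev + p["wo_x"] * x + p["bo"])  # output gate

    # Candidate value that may be written into the cell state.
    c_tilde = math.tanh(p["wc_a"] * a_prev + p["wc_x"] * x + p["bc"])

    # Forget gate scales the old cell state; update gate scales the candidate.
    c = f * c_prev + u * c_tilde

    # Output gate decides how much of the cell state becomes the hidden state.
    a = o * math.tanh(c)
    return a, c

# Example: all-zero parameters make every gate equal to 0.5.
names = ("wf_a", "wf_x", "bf", "wu_a", "wu_x", "bu",
         "wo_a", "wo_x", "bo", "wc_a", "wc_x", "bc")
zero_params = {name: 0.0 for name in names}
a, c = lstm_step(x=0.0, a_prev=0.0, c_prev=1.0, p=zero_params)
```

With a large positive forget-gate bias and a large negative update-gate bias, the cell state is copied forward almost unchanged, which is how the cell can carry information across many timesteps.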

Difference between GRU and LSTM:

  • LSTM is generally better at addressing vanishing gradients and at carrying information across many timesteps

  • GRU tends to focus on local input (near the current timestep)
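
To make the structural difference concrete, a GRU step can be sketched as follows. This is a minimal illustration assuming scalar input and state; the function name gru_step and the weight names are hypothetical. It uses the convention where the update gate weights the candidate and (1 - update gate) weights the previous state; some references swap these two roles. A GRU keeps a single state and has two gates (update and reset), so one update gate does the job that the forget and update gates share in an LSTM.

```python
import math

def sigmoid(z):
    # Squashes z into (0, 1), so each gate acts as a soft switch.
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One GRU step for scalar input x and previous state h_prev.

    p is a dict of hypothetical scalar weights and biases.
    """
    u = sigmoid(p["wu_h"] * h_prev + p["wu_x"] * x + p["bu"])  # update gate
    r = sigmoid(p["wr_h"] * h_prev + p["wr_x"] * x + p["br"])  # reset gate

    # The reset gate decides how much of the old state feeds the candidate.
    h_tilde = math.tanh(p["wh_h"] * (r * h_prev) + p["wh_x"] * x + p["bh"])

    # A single gate interpolates between the old state and the candidate.
    h = u * h_tilde + (1.0 - u) * h_prev
    return h

# Example: all-zero parameters give u = 0.5 and a candidate of 0.
gru_names = ("wu_h", "wu_x", "bu", "wr_h", "wr_x", "br",
             "wh_h", "wh_x", "bh")
gru_zero_params = {name: 0.0 for name in gru_names}
h = gru_step(x=0.0, h_prev=1.0, p=gru_zero_params)
```

Because the old state and the candidate share one gate, a GRU cannot keep the old state fully and add the full candidate at the same time, whereas an LSTM's separate forget and update gates can.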
