Attention
Query -> input embedding × W_Q (a learned projection matrix)
Key and Value -> input embedding × W_K and × W_V (separate learned projections)
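A minimal NumPy sketch of these projections. The sizes, the random weights, and the names W_q/W_k/W_v are illustrative assumptions, not from the note:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 3, 4                      # illustrative sizes (assumption)
x = rng.normal(size=(seq_len, d_model))      # input embeddings, one row per position

# Learned projection matrices (random stand-ins for trained weights)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q = x @ W_q   # queries: what each position is looking for
K = x @ W_k   # keys:    what each position advertises for matching
V = x @ W_v   # values:  the content that actually gets mixed into the output
```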
softmax(Q·Kᵀ) -> weights over the Values (each Query is compared against each Key: does this input have what the Query is looking for?)
Why softmax, not ReLU? ReLU would zero out every negative score (too many 0s) and leave the remaining scores unnormalized; softmax keeps all attention weights positive and normalizes each row to sum to 1.
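Continuing the sketch above: scores become weights via a row-wise softmax. The 1/√d_model scaling is the standard scaled dot-product convention, added here as an assumption:

```python
# Raw scores: how well each Query matches each Key (scaled dot product)
scores = Q @ K.T / np.sqrt(d_model)          # shape (seq_len, seq_len)

# Row-wise softmax: every weight is positive and each row sums to 1.
# ReLU would instead zero out all negative scores and leave the rest unnormalized.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

output = weights @ V                         # each row is a weighted mix of Values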
For self-attention knowledge tracing: attention plus a ResNet-style residual connection (output + x), as sketched below.
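A one-line sketch of that residual connection, reusing `output` and `x` from the snippets above:

```python
# ResNet-style residual: add the block's input back to its output, so the
# attention layer only has to learn a correction on top of x
h = output + x                               # both are (seq_len, d_model)
```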
Causal attention mask (rows = queries, columns = keys; 1 = Qi may attend Kj, i.e., each position sees only itself and earlier positions):

|    | K1 | K2 | K3 |
|----|----|----|----|
| Q1 | 1  | 0  | 0  |
| Q2 | 1  | 1  | 0  |
| Q3 | 1  | 1  | 1  |
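A sketch of how this mask could be built and applied, assuming NumPy and the variables from the earlier snippets; np.tril reproduces the lower-triangular pattern in the table:

```python
# Lower-triangular mask matching the table: mask[i, j] = 1 iff j <= i
mask = np.tril(np.ones((seq_len, seq_len)))

# Blocked pairs get -inf, which the softmax turns into exactly 0 weight
masked = np.where(mask == 1, scores, -np.inf)
w = np.exp(masked - masked.max(axis=-1, keepdims=True))
w /= w.sum(axis=-1, keepdims=True)
causal_output = w @ V
```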
Self-attention knowledge tracing paper: SAKT (Pandey & Karypis, "A Self-Attentive Model for Knowledge Tracing", EDM 2019)