Query -> input × W_Q (a learned projection matrix)
Key and Value -> input × W_K and input × W_V (each with its own learned projection matrix)
softmax -> weights over the Values (use the Query to interpret each Key(input): does this input have what the Query is looking for?)
Why softmax and not ReLU? ReLU would zero out every negative score, producing too many 0s; softmax normalizes the attention scores into positive weights that sum to 1.
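A minimal NumPy sketch of the steps above (scaled dot-product attention); the shapes, random inputs, and projection-matrix names `W_q`/`W_k`/`W_v` are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))      # 3 input tokens, embedding dim 4 (assumed)
W_q = rng.normal(size=(4, 4))    # learned projection matrices (random here)
W_k = rng.normal(size=(4, 4))
W_v = rng.normal(size=(4, 4))

# Query, Key, Value = input times their projection matrices
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Score: does this input (Key) have what the Query is looking for?
scores = Q @ K.T / np.sqrt(K.shape[-1])

# Softmax turns scores into positive weights over the Values, summing to 1
weights = softmax(scores)
output = weights @ V
```

Each row of `weights` sums to 1, so the output is a convex combination of the Value vectors.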
Attention paper
For self-attention knowledge tracing: attention + a ResNet-style residual connection (output + x)
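The residual connection can be sketched as follows; a toy illustration with random weights, not the SAKT implementation (sequence length, dimensions, and names are assumptions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(x, W_q, W_k, W_v):
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                 # 5 interactions, hidden dim 8 (assumed)
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

# ResNet-style skip connection: add the input back to the attention output
out = attention(x, W_q, W_k, W_v) + x
```

The skip connection lets gradients flow around the attention block and keeps each position's own embedding in the output.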
Illustration: attention matrix of queries Q1–Q3 against keys K1–K3 (entries 1 or 0 marking which Query–Key pairs attend).
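The 1/0 entries over Q1–Q3 and K1–K3 above appear to show an attention mask. In knowledge tracing a causal (lower-triangular) mask is the usual choice, so a query can only attend to keys at earlier or equal positions; a sketch of that assumption:

```python
import numpy as np

T = 3  # three queries Q1..Q3 against three keys K1..K3
# Lower-triangular mask: query i may attend key j only when j <= i (assumed causal mask)
mask = np.tril(np.ones((T, T)))

scores = np.random.default_rng(1).normal(size=(T, T))
# Masked positions get -inf so softmax assigns them exactly zero weight
scores = np.where(mask == 1, scores, -np.inf)

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
```

After the softmax, `weights[i, j]` is 0 wherever `mask[i, j]` is 0, so no query peeks at future keys.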
Self-attention knowledge tracing paper
Overview video