Attention
Query -> input embedding * learned projection (Q = X W_Q)
Key and Value -> input embedding * learned projections (K = X W_K, V = X W_V)
softmax over the Q·K^T scores -> weights on the Values (the Query is matched against each Key: does this input contain what the Query is looking for?)
Why softmax, not ReLU: ReLU would zero out every negative score (too many 0s) and the surviving scores would not sum to 1; softmax normalizes the attention scores into a distribution over positions.
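A minimal NumPy sketch of the above, assuming a single head and toy dimensions (the names `X`, `W_q`, `W_k`, `W_v` are illustrative, not from the original notes):

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row max for numerical stability, then normalize so rows sum to 1.
    e = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model = 3, 8
d_k = d_model  # single head; keeping d_k == d_model so the residual below type-checks

X = rng.normal(size=(seq_len, d_model))  # input embeddings, one row per step
W_q = rng.normal(size=(d_model, d_k))    # learned projections
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q = X @ W_q  # Query = input embedding * projection
K = X @ W_k  # Key
V = X @ W_v  # Value

scores = Q @ K.T / np.sqrt(d_k)  # how well each Query matches each Key
weights = softmax(scores)        # each row is a distribution over positions
output = weights @ V             # weighted sum of the Values
```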
For self-attentive knowledge tracing: attention plus a ResNet-style residual connection (output + x), sketched below.
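Continuing the sketch above (the residual only works here because d_k was chosen equal to d_model, so the shapes match):

```python
# ResNet-style residual: add the sublayer's input back to its output,
# so the attention block only has to learn a correction on top of X.
h = output + X
```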
Causal mask (1 = attend, 0 = masked); the query at step i attends only to keys at steps <= i:

   | K1 | K2 | K3 |
Q1 | 1  | 0  | 0  |
Q2 | 1  | 1  | 0  |
Q3 | 1  | 1  | 1  |
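A sketch of building and applying this mask, carrying over the variables from the NumPy example above:

```python
# Lower-triangular causal mask: query i may attend to keys j <= i.
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

masked_scores = np.where(mask, scores, -np.inf)  # blocked pairs get -inf ...
causal_weights = softmax(masked_scores)          # ... so softmax drives them to 0
```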
Paper: "A Self-Attentive Model for Knowledge Tracing" (SAKT), Pandey & Karypis, 2019.