Re-attention
To handle attention collapse (the learning plateau that appears when attention maps in deeper transformer layers grow increasingly similar): re-attention.

Standard attention for reference:
$$Attention(Q, K, V) = softmax(\frac{{Q}\cdot{K}^{T}}{\sqrt{d}})\cdot{V}$$
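As a reference point, a minimal PyTorch sketch of this formula. The `(batch, heads, seq_len, d)` tensor layout is an assumption, not something the formula prescribes:

```python
import torch
import torch.nn.functional as F

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, heads, seq_len, d)
    d = q.size(-1)
    # Similarity scores, scaled by sqrt(d) to keep the softmax well-behaved.
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    # Row-wise softmax yields one attention map per head.
    attn = F.softmax(scores, dim=-1)
    return attn @ v
```

For example, with `q = k = v = torch.randn(2, 8, 16, 64)`, `attention(q, k, v)` returns a `(2, 8, 16, 64)` tensor.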
Re-attention, from the DeepViT paper (Zhou et al., 2021):
$$Re\text{-}attention(Q, K, V) = Norm(\theta^{T}(softmax(\frac{{Q}\cdot{K}^{T}}{\sqrt{d}})))\cdot{V}$$
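A minimal PyTorch sketch of re-attention, again assuming a `(batch, heads, seq_len, d)` layout. The formula leaves the choice of Norm open; LayerNorm over the head dimension and the identity initialisation of θ are assumptions made here for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReAttention(nn.Module):
    """Sketch of re-attention: a learnable matrix theta mixes the
    softmax attention maps across heads before they weight V."""

    def __init__(self, num_heads: int):
        super().__init__()
        # theta in R^{H x H}, learned end-to-end.
        # Identity init is an assumption, not prescribed by the formula.
        self.theta = nn.Parameter(torch.eye(num_heads))
        # Normalisation over the head dimension (one plausible choice
        # of Norm; also an assumption).
        self.norm = nn.LayerNorm(num_heads)

    def forward(self, q, k, v):
        # q, k, v: (batch, heads, seq_len, d)
        d = q.size(-1)
        attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
        # theta^T mixes attention maps across the head dimension:
        # (batch, h, i, j) x (h, g) -> (batch, g, i, j)
        attn = torch.einsum('bhij,hg->bgij', attn, self.theta)
        # Normalise across heads, then weight the values as usual.
        attn = self.norm(attn.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        return attn @ v
```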
Difference
Re-attention leaves the scaled dot-product scores unchanged; it only inserts a learnable head-mixing matrix θ (applied as θᵀ across the head dimension) and a normalisation between the softmax and the multiplication by V. Mixing the attention maps of different heads regenerates their diversity in deeper layers, which is what counteracts the collapse.