Continuous Backpropagation

Define the adjoint state as the gradient of the loss with respect to the state $z(t)$:

$$a(t) = \frac{\partial \mathrm{Loss}}{\partial z(t)} \tag{1}$$
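The following minimal sketch makes the definition concrete. It assumes toy dynamics that are not part of the derivation: a scalar state with $f(z, t, \theta) = \lambda z$ and loss $\tfrac12 z(T)^2$. The sketch estimates $a(t)$ by perturbing $z(t)$ and measuring the resulting change in the loss, then compares that estimate with the closed form $z(T)e^{\lambda(T - t)}$. The names `flow`, `adjoint_fd`, and `adjoint_exact` are hypothetical.

```python
import math

# Assumed toy problem (not from the derivation): dz/dt = LAM * z on [t, T],
# loss = 0.5 * z(T)**2. All names here are hypothetical.
LAM, T = -0.7, 2.0

def flow(z, t_from, t_to):
    """Exact solution map of dz/dt = LAM * z from t_from to t_to."""
    return z * math.exp(LAM * (t_to - t_from))

def loss(z_T):
    return 0.5 * z_T ** 2

def adjoint_fd(z_t, t, h=1e-6):
    """a(t) = dLoss/dz(t), estimated with a central finite difference."""
    return (loss(flow(z_t + h, t, T)) - loss(flow(z_t - h, t, T))) / (2 * h)

def adjoint_exact(z_t, t):
    """Chain rule: dLoss/dz(T) = z(T), times dz(T)/dz(t) = exp(LAM * (T - t))."""
    return flow(z_t, t, T) * math.exp(LAM * (T - t))

print(adjoint_fd(1.3, 0.5), adjoint_exact(1.3, 0.5))
```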

Over a small time step $\epsilon$, from $t$ to $t + \epsilon$, the state evolves as

$$z(t + \epsilon) = z(t) + \int_{t}^{t + \epsilon} f(z(s), s, \theta)\,ds = T_\epsilon(z(t), t) \tag{2}$$

where $T_\epsilon$ is the map that advances the state from time $t$ to time $t + \epsilon$.
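The map $T_\epsilon$ in (2) can be approximated numerically by integrating $f$ over $[t, t + \epsilon]$. The sketch below uses classical RK4 on a few sub-steps; this solver is a choice made here and is not prescribed by the derivation. To provide a check, it assumes $f(z, t, \theta) = \theta z$, whose exact flow is $z\,e^{\theta\epsilon}$. The function names `T_eps` and `f_lin` are hypothetical.

```python
import math

def T_eps(f, z, t, eps, theta, substeps=100):
    """Approximate T_eps(z(t), t) = z(t) + integral_t^{t+eps} f(z(s), s, theta) ds
    with classical RK4 on `substeps` equal sub-intervals."""
    h = eps / substeps
    s = t
    for _ in range(substeps):
        k1 = f(z, s, theta)
        k2 = f(z + 0.5 * h * k1, s + 0.5 * h, theta)
        k3 = f(z + 0.5 * h * k2, s + 0.5 * h, theta)
        k4 = f(z + h * k3, s + h, theta)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        s += h
    return z

def f_lin(z, t, theta):
    """Assumed dynamics for illustration: dz/dt = theta * z."""
    return theta * z

z0, t0, eps, theta = 1.5, 0.0, 0.3, -0.8
approx = T_eps(f_lin, z0, t0, eps, theta)
exact = z0 * math.exp(theta * eps)
print(approx, exact)
```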

Since the loss depends on $z(t)$ only through $z(t + \epsilon)$, the chain rule ($\frac{\partial y}{\partial x} = \frac{\partial y}{\partial u} \frac{\partial u}{\partial x}$) gives

$$a(t) = a(t + \epsilon)\frac{\partial T_\epsilon(z(t), t)}{\partial z(t)} \tag{3}$$
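Relation (3) can be checked numerically with finite differences. This sketch rests on several assumptions not in the derivation: a scalar state, nonlinear dynamics $f(z, t, \theta) = \theta \sin z$ with $\theta = 1.2$, loss $\tfrac12 z(T)^2$, and RK4 for the flow. It computes each factor separately and compares $a(t)$ with $a(t+\epsilon)\,\partial T_\epsilon / \partial z(t)$. The helper names `flow`, `loss_from`, and `d_dz` are hypothetical.

```python
import math

# Assumed for illustration only: f(z, t, theta) = theta * sin(z), loss = 0.5 * z(T)**2.
THETA, T = 1.2, 2.0

def f(z, t, theta=THETA):
    return theta * math.sin(z)

def flow(z, t0, t1, steps=1000):
    """RK4 approximation of the solution map from t0 to t1."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(z, t)
        k2 = f(z + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(z + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(z + h * k3, t + h)
        z += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return z

def loss_from(z, t):
    """Loss as a function of the state z at time t."""
    return 0.5 * flow(z, t, T) ** 2

def d_dz(g, z, h=1e-5):
    """Central finite-difference derivative of a scalar function g at z."""
    return (g(z + h) - g(z - h)) / (2 * h)

t, eps, z_t = 0.4, 0.05, 0.9
z_next = flow(z_t, t, t + eps)                          # z(t + eps) = T_eps(z(t), t)

a_t = d_dz(lambda z: loss_from(z, t), z_t)              # left side of (3)
a_next = d_dz(lambda z: loss_from(z, t + eps), z_next)  # a(t + eps)
dT_dz = d_dz(lambda z: flow(z, t, t + eps), z_t)        # dT_eps / dz(t)
print(a_t, a_next * dT_dz)
```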

Start from the definition of the derivative:

$$\frac{\partial a(t)}{\partial t} = \lim_{\epsilon \to 0}\frac{a(t + \epsilon) - a(t)}{\epsilon} \tag{4}$$

Substitute (3) into (4):

$$\frac{\partial a(t)}{\partial t} = \lim_{\epsilon \to 0}\frac{a(t + \epsilon) - a(t + \epsilon)\frac{\partial T_\epsilon(z(t), t)}{\partial z(t)}}{\epsilon} \tag{5}$$
$$\frac{\partial a(t)}{\partial t} = \lim_{\epsilon \to 0}\frac{a(t + \epsilon) - a(t + \epsilon)\frac{\partial}{\partial z(t)} T_\epsilon(z(t), t)}{\epsilon} \tag{6}$$

Expand $T_\epsilon(z(t), t)$ in (6) as a Taylor series in $\epsilon$ about $\epsilon = 0$, that is, around the current state $z(t)$:

$$\frac{\partial a(t)}{\partial t} = \lim_{\epsilon \to 0}\frac{a(t + \epsilon) - a(t + \epsilon)\frac{\partial}{\partial z(t)}\left(z(t) + \epsilon f(z(t), t, \theta) + \mathcal{O}(\epsilon^2)\right)}{\epsilon} \tag{7}$$

That is, as $\epsilon \to 0$, $T_\epsilon(z(t), t)$ is replaced by $z(t) + \epsilon f(z(t), t, \theta) + \mathcal{O}(\epsilon^2)$.

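In words: when the time step $\epsilon$ is small, the integral in (2) runs over an interval of length $(t + \epsilon) - t = \epsilon$. It is therefore approximately $\epsilon$ times the integrand at the start of the interval, which is $\epsilon f(z(t), t, \theta)$. The $\mathcal{O}(\epsilon^2)$ term added at the end accounts for the error of this approximation. Note that this error depends on $\epsilon$ and shrinks faster than $\epsilon$ itself.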
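One way to see the $\mathcal{O}(\epsilon^2)$ behaviour is to measure it numerically. The sketch below assumes $f(z, t, \theta) = \theta \sin z$, an illustration that does not come from the derivation. It compares an accurate RK4 evaluation of $T_\epsilon$ with the first-order approximation $z + \epsilon f$. If the remainder is $\mathcal{O}(\epsilon^2)$, halving $\epsilon$ should roughly quarter the gap. The names `T_eps` and `taylor_error` are hypothetical.

```python
import math

# Assumed dynamics for illustration: f(z, t, theta) = theta * sin(z).
THETA = 1.2

def f(z, t, theta=THETA):
    return theta * math.sin(z)

def T_eps(z, t, eps, substeps=200):
    """Accurate RK4 approximation of the flow map over [t, t + eps]."""
    h = eps / substeps
    for i in range(substeps):
        s = t + i * h
        k1 = f(z, s)
        k2 = f(z + 0.5 * h * k1, s + 0.5 * h)
        k3 = f(z + 0.5 * h * k2, s + 0.5 * h)
        k4 = f(z + h * k3, s + h)
        z += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return z

def taylor_error(z, t, eps):
    """|T_eps(z, t) - (z + eps * f(z, t))|: the O(eps^2) remainder in (7)."""
    return abs(T_eps(z, t, eps) - (z + eps * f(z, t)))

z, t = 0.9, 0.0
for eps in (0.08, 0.04, 0.02, 0.01):
    print(eps, taylor_error(z, t, eps))  # roughly quarters each time eps halves
```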

Expand (7) using the linearity of $\frac{\partial}{\partial z(t)}$:

$$\frac{\partial a(t)}{\partial t} = \lim_{\epsilon \to 0}\frac{a(t + \epsilon) - a(t + \epsilon)\left(\frac{\partial}{\partial z(t)} z(t) + \frac{\partial}{\partial z(t)}\left(\epsilon f(z(t), t, \theta) + \mathcal{O}(\epsilon^2)\right)\right)}{\epsilon} \tag{8}$$

Using $\frac{\partial z(t)}{\partial z(t)} = I$ (the identity matrix):

$$\frac{\partial a(t)}{\partial t} = \lim_{\epsilon \to 0}\frac{a(t + \epsilon) - a(t + \epsilon)\left(I + \frac{\partial}{\partial z(t)}\left(\epsilon f(z(t), t, \theta) + \mathcal{O}(\epsilon^2)\right)\right)}{\epsilon} \tag{9}$$
$$\frac{\partial a(t)}{\partial t} = \lim_{\epsilon \to 0}\frac{a(t + \epsilon) - a(t + \epsilon)\left(I + \epsilon\frac{\partial f(z(t), t, \theta)}{\partial z(t)} + \mathcal{O}(\epsilon^2)\right)}{\epsilon} \tag{10}$$
$$\frac{\partial a(t)}{\partial t} = \lim_{\epsilon \to 0}\frac{-a(t + \epsilon)\,\epsilon\frac{\partial f(z(t), t, \theta)}{\partial z(t)} + \mathcal{O}(\epsilon^2)}{\epsilon} \tag{11}$$

Step (11) uses $a(t + \epsilon) - a(t + \epsilon)I = 0$. Dividing the numerator by $\epsilon$ gives

$$\frac{\partial a(t)}{\partial t} = \lim_{\epsilon \to 0}\left(-a(t + \epsilon)\frac{\partial f(z(t), t, \theta)}{\partial z(t)} + \mathcal{O}(\epsilon)\right) \tag{12}$$
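The $\mathcal{O}(\epsilon)$ term in (12) can be observed directly. This sketch assumes linear dynamics $f = \lambda z$, so that $\partial f/\partial z = \lambda$ and $\partial T_\epsilon/\partial z = e^{\lambda\epsilon}$. It also assumes the loss $\tfrac12 z(T)^2$ with a fixed final state, which gives $a(t) = z(T)e^{\lambda(T - t)}$. The sketch evaluates the exact difference quotient from (5) and subtracts the leading term $-a(t+\epsilon)\,\partial f/\partial z$. The remaining residual should shrink roughly in proportion to $\epsilon$. The names `quotient` and `residual` are hypothetical.

```python
import math

# Assumed example: f(z, t, theta) = LAM * z, so df/dz = LAM and dT_eps/dz = exp(LAM * eps).
# With loss 0.5 * z(T)**2 the adjoint is a(t) = z(T) * exp(LAM * (T - t)).
LAM, T, Z_T = -0.7, 2.0, 0.8   # Z_T: assumed final state z(T)

def a(t):
    return Z_T * math.exp(LAM * (T - t))

def quotient(t, eps):
    """Difference quotient from (5): [a(t+eps) - a(t+eps) * dT_eps/dz] / eps."""
    dT_dz = math.exp(LAM * eps)
    return (a(t + eps) - a(t + eps) * dT_dz) / eps

def residual(t, eps):
    """Gap between (5) and the leading term -a(t+eps) * df/dz of (12)."""
    return quotient(t, eps) - (-a(t + eps) * LAM)

t = 0.5
for eps in (0.1, 0.05, 0.025, 0.0125):
    print(eps, residual(t, eps))   # shrinks roughly in proportion to eps
```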

Finally, taking the limit $\epsilon \to 0$, $a(t + \epsilon) \to a(t)$ and the $\mathcal{O}(\epsilon)$ term vanishes:

$$\frac{\partial a(t)}{\partial t} = -a(t)\frac{\partial f(z(t), t, \theta)}{\partial z(t)} \tag{13}$$
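Equation (13) is an ODE for the adjoint, which can be solved backward in time. The minimal sketch below rests on assumptions that are not in the derivation: a scalar state, $f(z, t, \theta) = \theta \sin z$ with $\theta = 1.2$, loss $\tfrac12 z(T)^2$, and fixed-step RK4. It starts from $a(T) = \partial \mathrm{Loss}/\partial z(T) = z(T)$ and integrates $z$ and $a$ backward together. Recomputing $z(t)$ on the way back is what allows $\partial f/\partial z$ to be evaluated. The result $a(t_0)$ is then compared with a finite-difference gradient. All function names are hypothetical.

```python
import math

# Assumed example: f(z, t, theta) = THETA * sin(z), loss = 0.5 * z(T1)**2.
THETA, T0, T1, STEPS = 1.2, 0.0, 2.0, 2000

def f(z):
    return THETA * math.sin(z)

def df_dz(z):
    return THETA * math.cos(z)

def rk4(rhs, y, t0, t1, steps):
    """Classical RK4 for a tuple-valued autonomous ODE; t1 < t0 integrates backward."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k1)))
        k3 = rhs(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k2)))
        k4 = rhs(tuple(yi + h * ki for yi, ki in zip(y, k3)))
        y = tuple(yi + (h / 6.0) * (p + 2 * q + 2 * r + s)
                  for yi, p, q, r, s in zip(y, k1, k2, k3, k4))
    return y

def forward(z0):
    """Solve dz/dt = f(z) from T0 to T1 and return z(T1)."""
    (zT,) = rk4(lambda y: (f(y[0]),), (z0,), T0, T1, STEPS)
    return zT

def adjoint_at_start(z0):
    """Solve (13) backward from T1 to T0, carrying z along so df/dz can be evaluated."""
    zT = forward(z0)
    aT = zT  # a(T1) = dLoss/dz(T1) for loss 0.5 * z**2

    def rhs(y):
        z, a = y
        return (f(z), -a * df_dz(z))  # dz/dt = f, da/dt = -a * df/dz

    _, a0 = rk4(rhs, (zT, aT), T1, T0, STEPS)
    return a0

z0, h = 0.9, 1e-5
fd = (0.5 * forward(z0 + h) ** 2 - 0.5 * forward(z0 - h) ** 2) / (2 * h)
print(adjoint_at_start(z0), fd)
```

Recomputing $z$ backward, rather than storing the forward trajectory, keeps memory independent of the number of steps. For dynamics that are unstable in reverse time, storing checkpoints of the forward pass is a common alternative.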
