General Rules of Matrix Differentiation

Consider two vectors $x \in \mathbb{R}^{d_I}$ and $y \in \mathbb{R}^{d_O}$, related by $y = Wx$ with $W \in \mathbb{R}^{d_O \times d_I}$.

$$\frac{dy}{dx} \in \mathbb{R}^{d_O \times d_I}, \qquad \frac{dy}{dx} = W \tag{1}$$

$$\frac{dy}{dW} \in \mathbb{R}^{d_O \times (d_O \times d_I)}, \qquad \frac{dy}{dW} = I_{d_O \times d_O} \otimes x^T \tag{2}$$
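As a sanity check, eqs. (1)–(2) can be verified numerically with finite differences. This is only a sketch with hypothetical small dimensions; it assumes $W$ is flattened row-major, under which the $(d_O) \times (d_O \cdot d_I)$ Jacobian of $y$ with respect to $\mathrm{vec}(W)$ equals $I_{d_O} \otimes x^T$:

```python
import numpy as np

# Hypothetical small dimensions for a numerical check of eqs. (1)-(2).
dI, dO = 3, 2
eps = 1e-6
rng = np.random.default_rng(0)
W = rng.standard_normal((dO, dI))
x = rng.standard_normal(dI)

# (1) dy/dx = W: perturb each coordinate of x and difference y = Wx.
J = np.zeros((dO, dI))
for j in range(dI):
    dx = np.zeros(dI); dx[j] = eps
    J[:, j] = (W @ (x + dx) - W @ x) / eps
assert np.allclose(J, W, atol=1e-4)

# (2) dy/dW = I_{dO} ⊗ x^T, with W flattened row-major into a dO*dI vector.
JW = np.kron(np.eye(dO), x.reshape(1, -1))        # shape (dO, dO*dI)
JW_num = np.zeros((dO, dO * dI))
for k in range(dO * dI):
    dW = np.zeros(dO * dI); dW[k] = eps
    JW_num[:, k] = ((W + dW.reshape(dO, dI)) @ x - W @ x) / eps
assert np.allclose(JW, JW_num, atol=1e-4)
print("eqs. (1)-(2) verified")
```

The Kronecker form simply says that output component $y_i$ depends only on row $i$ of $W$, with coefficient $x_j$ on entry $W_{ij}$.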

If we define $W = MN$, where $M \in \mathbb{R}^{d_O \times r}$ and $N \in \mathbb{R}^{r \times d_I}$, we have

$$\frac{dy}{dM} \in \mathbb{R}^{d_O \times (d_O \times r)}, \qquad \frac{dy}{dM} = I_{d_O} \otimes (Nx)^T = I_{d_O} \otimes \left(x^T N^T\right) \tag{3}$$

and

$$\frac{dy}{dN} \in \mathbb{R}^{d_O \times (r \times d_I)}, \qquad \frac{dy}{dN} = \frac{dy}{d(Nx)}\,\frac{d(Nx)}{dN} = M\left(I_r \otimes x^T\right) \tag{4}$$
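Eqs. (3)–(4) can be checked the same way. Again a sketch with hypothetical dimensions, flattening $M$ and $N$ row-major; eq. (3) is just eq. (2) applied to the input $Nx$, and eq. (4) is the chain rule through the intermediate $Nx$:

```python
import numpy as np

# Numerical check of eqs. (3)-(4) for y = MNx; dimensions are arbitrary.
dI, dO, r = 3, 2, 2
eps = 1e-6
rng = np.random.default_rng(1)
M = rng.standard_normal((dO, r))
N = rng.standard_normal((r, dI))
x = rng.standard_normal(dI)

# (3) dy/dM = I_{dO} ⊗ (Nx)^T  (M flattened row-major)
JM = np.kron(np.eye(dO), (N @ x).reshape(1, -1))
JM_num = np.zeros((dO, dO * r))
for k in range(dO * r):
    dM = np.zeros(dO * r); dM[k] = eps
    JM_num[:, k] = ((M + dM.reshape(dO, r)) @ N @ x - M @ N @ x) / eps
assert np.allclose(JM, JM_num, atol=1e-4)

# (4) dy/dN = M (I_r ⊗ x^T): the (dO × r) factor M chains onto the
# (r × r*dI) Jacobian of Nx w.r.t. N (N flattened row-major).
JN = M @ np.kron(np.eye(r), x.reshape(1, -1))
JN_num = np.zeros((dO, r * dI))
for k in range(r * dI):
    dN = np.zeros(r * dI); dN[k] = eps
    JN_num[:, k] = (M @ (N + dN.reshape(r, dI)) @ x - M @ N @ x) / eps
assert np.allclose(JN, JN_num, atol=1e-4)
print("eqs. (3)-(4) verified")
```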

The gradient differs from the derivative: it can be seen as the transpose of the derivative,

$$\nabla_x y = \left(\frac{dy}{dx}\right)^T. \tag{5}$$

In the derivative convention, the chain rule reads

$$\frac{dL}{d\theta} = \frac{dL}{df}\,\frac{df}{d\theta}, \tag{6}$$

with shape $\frac{dL}{d\theta} \in \mathbb{R}^{1 \times N_\theta}$, while in the gradient convention it is

$$\nabla_\theta L \in \mathbb{R}^{N_\theta \times 1}, \qquad \nabla_\theta L = \nabla_\theta f \; \nabla_f L \tag{7}$$
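The two conventions can be compared on a simple linear model. The functions $f(\theta) = A\theta$ and $L(f) = w^T f$ below are hypothetical stand-ins chosen so the Jacobians are constant; the point is only the shape bookkeeping and the transpose relation of eq. (5):

```python
import numpy as np

# Check eqs. (5)-(7) on f(θ) = Aθ, L(f) = w^T f (illustrative choices).
N_theta, N_f = 4, 3
rng = np.random.default_rng(2)
A = rng.standard_normal((N_f, N_theta))   # df/dθ = A   (N_f × N_θ)
w = rng.standard_normal(N_f)              # dL/df = w^T (1 × N_f)

# (6) derivative convention: dL/dθ = dL/df · df/dθ is a 1 × N_θ row vector.
dL_dtheta = w.reshape(1, -1) @ A

# (7) gradient convention: ∇_θ L = ∇_θ f · ∇_f L = A^T w is an N_θ × 1 column.
grad_theta = A.T @ w.reshape(-1, 1)

assert dL_dtheta.shape == (1, N_theta)
assert grad_theta.shape == (N_theta, 1)
# (5) the gradient is the transpose of the derivative.
assert np.allclose(grad_theta, dL_dtheta.T)
print("shapes consistent with eqs. (5)-(7)")
```

Note that the factor order flips between the two conventions: derivatives compose left-to-right from the loss outward, gradients right-to-left, which is exactly what transposing a product does.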

Author: Zi Liang (zi1415926.liang@connect.polyu.hk)
Create Date: Sun Oct 13 20:03:54 2024
Last modified: 2025-02-01 Sat 17:21
Creator: Emacs 29.2 (Org mode 9.6.15)