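There are nine possible derivatives among scalars, vectors, and matrices. Of these, the scalar-by-scalar derivative $\frac{\partial u}{\partial x}$ is ordinary single-variable calculus, while the vector-by-matrix, matrix-by-vector, and matrix-by-matrix derivatives $\frac{\partial \uv}{\partial \Xv}$, $\frac{\partial \Uv}{\partial \xv}$, $\frac{\partial \Uv}{\partial \Xv}$ involve higher-order tensors and are more cumbersome to handle. This article therefore considers only the remaining five cases.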

  Let $\uv \in \mathbb{R}^l$ and $\Uv \in \mathbb{R}^{m \times n}$ depend on a scalar $x$. The derivatives of a vector and of a matrix with respect to a scalar are defined as \begin{align*} \frac{\partial \uv}{\partial x} \triangleq \begin{bmatrix} \frac{\partial u_1}{\partial x} \\ \frac{\partial u_2}{\partial x} \\ \vdots \\ \frac{\partial u_l}{\partial x} \end{bmatrix}, \quad \frac{\partial \Uv}{\partial x} \triangleq \begin{bmatrix} \frac{\partial u_{11}}{\partial x} & \frac{\partial u_{12}}{\partial x} & \ldots & \frac{\partial u_{1n}}{\partial x} \\ \frac{\partial u_{21}}{\partial x} & \frac{\partial u_{22}}{\partial x} & \ldots & \frac{\partial u_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u_{m1}}{\partial x} & \frac{\partial u_{m2}}{\partial x} & \ldots & \frac{\partial u_{mn}}{\partial x} \end{bmatrix} \end{align*} Let $\xv \in \mathbb{R}^l$ and $\Xv \in \mathbb{R}^{m \times n}$. The derivatives of a scalar $u$ with respect to a vector and to a matrix are defined as \begin{align*} \frac{\partial u}{\partial \xv} \triangleq \begin{bmatrix} \frac{\partial u}{\partial x_1} & \frac{\partial u}{\partial x_2} & \ldots & \frac{\partial u}{\partial x_l} \end{bmatrix}, \quad \frac{\partial u}{\partial \Xv} \triangleq \begin{bmatrix} \frac{\partial u}{\partial x_{11}} & \frac{\partial u}{\partial x_{21}} & \ldots & \frac{\partial u}{\partial x_{m1}} \\ \frac{\partial u}{\partial x_{12}} & \frac{\partial u}{\partial x_{22}} & \ldots & \frac{\partial u}{\partial x_{m2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u}{\partial x_{1n}} & \frac{\partial u}{\partial x_{2n}} & \ldots & \frac{\partial u}{\partial x_{mn}} \end{bmatrix} \end{align*} That is, the derivative of a vector or matrix with respect to a scalar has the same shape as the numerator, while the derivative of a scalar with respect to a vector or matrix has the shape of the transposed denominator. The derivative of a vector with respect to a vector is defined as the Jacobian matrix: \begin{align*} \frac{\partial \uv}{\partial \xv} \triangleq \begin{bmatrix} \frac{\partial u_1}{\partial x_1} & \frac{\partial u_1}{\partial x_2} & \ldots & \frac{\partial u_1}{\partial x_l} \\ \frac{\partial u_2}{\partial x_1} & \frac{\partial u_2}{\partial x_2} & \ldots & \frac{\partial u_2}{\partial x_l} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u_l}{\partial x_1} & \frac{\partial u_l}{\partial x_2} & \ldots & \frac{\partial u_l}{\partial x_l} \end{bmatrix} \end{align*} That is, the number of rows equals the size of the numerator and the number of columns equals the size of the denominator.

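  The conventions above constitute the numerator layout. Its advantage is that the chain rule keeps the same order as in single-variable calculus. Its drawback is that the gradient of a scalar-valued function with respect to a vector variable needs an extra transpose, since otherwise the variable and the gradient cannot be subtracted directly in gradient descent. Every result in the denominator layout is the transpose of its numerator-layout counterpart: gradients need no transpose, but the order of the chain rule is completely reversed.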
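The layout trade-off can be made concrete numerically. The following is a minimal NumPy sketch, not part of the original text: the helper `jacobian_numerator_layout` and the test function `loss` are illustrative names, and the derivative is approximated by central differences.

```python
import numpy as np

def jacobian_numerator_layout(f, x, eps=1e-6):
    """Central-difference Jacobian: entry (i, j) approximates d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    f0 = np.atleast_1d(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))) / (2 * eps)
    return J

x = np.array([1.0, -2.0, 0.5])
loss = lambda z: np.sum(z ** 2)      # scalar-valued function of a vector

# Numerator layout: the gradient of a scalar is a 1 x 3 row vector.
row_grad = jacobian_numerator_layout(loss, x)

# A gradient-descent step therefore needs a transpose before subtracting.
x_new = x - 0.1 * row_grad.T.ravel()
```

In the denominator layout the gradient would already be a column, so the `.T` would disappear, at the cost of reversing the order of factors in the chain rule.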

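Basic results

  The following results follow immediately from the definitions and the differentiation rules of single-variable calculus.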

  In single-variable calculus the derivative of a constant is zero: \begin{align*} \frac{\partial a}{\partial x} = 0 \end{align*} Analogously, here we have \begin{align*} \frac{\partial \av}{\partial x} = \zerov, \quad \frac{\partial a}{\partial \xv} = \zerov^\top, \quad \frac{\partial \av}{\partial \xv} = \zerov, \quad \frac{\partial \Av}{\partial x} = \zerov, \quad \frac{\partial a}{\partial \Xv} = \zerov^\top \end{align*}

  In single-variable calculus the rule for multiplication by a constant scalar is \begin{align*} \frac{\partial (a u)}{\partial x} = a \frac{\partial u}{\partial x} \end{align*} Analogously, here we have \begin{align*} \frac{\partial (a \uv)}{\partial x} = a \frac{\partial \uv}{\partial x}, \quad \frac{\partial (a u)}{\partial \xv} = a \frac{\partial u}{\partial \xv}, \quad \frac{\partial (a \uv)}{\partial \xv} = a \frac{\partial \uv}{\partial \xv}, \quad \frac{\partial (a \Uv)}{\partial x} = a \frac{\partial \Uv}{\partial x}, \quad \frac{\partial (a u)}{\partial \Xv} = a \frac{\partial u}{\partial \Xv} \end{align*}

  In single-variable calculus the sum rule is \begin{align*} \frac{\partial (u+v)}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x} \end{align*} Analogously, here we have \begin{align*} & \frac{\partial (\uv + \vv)}{\partial x} = \frac{\partial \uv}{\partial x} + \frac{\partial \vv}{\partial x}, \quad \frac{\partial (u+v)}{\partial \xv} = \frac{\partial u}{\partial \xv} + \frac{\partial v}{\partial \xv}, \quad \frac{\partial (\uv + \vv)}{\partial \xv} = \frac{\partial \uv}{\partial \xv} + \frac{\partial \vv}{\partial \xv} \\ & \frac{\partial (\Uv + \Vv)}{\partial x} = \frac{\partial \Uv}{\partial x} + \frac{\partial \Vv}{\partial x}, \quad \frac{\partial (u + v)}{\partial \Xv} = \frac{\partial u}{\partial \Xv} + \frac{\partial v}{\partial \Xv} \end{align*}

  In single-variable calculus the product rule is \begin{align*} \frac{\partial (uv)}{\partial x} = \frac{\partial u}{\partial x} v + u \frac{\partial v}{\partial x} \end{align*} Analogously, here we have \begin{align*} & \frac{\partial (\uv \vv)}{\partial x} = \frac{\partial \uv}{\partial x} \vv + \uv \frac{\partial \vv}{\partial x}, \quad \frac{\partial (uv)}{\partial \xv} = \frac{\partial u}{\partial \xv} v + u \frac{\partial v}{\partial \xv} \\ & \frac{\partial (\Uv \Vv)}{\partial x} = \frac{\partial \Uv}{\partial x} \Vv + \Uv \frac{\partial \Vv}{\partial x}, \quad \frac{\partial (uv)}{\partial \Xv} = \frac{\partial u}{\partial \Xv} v + u \frac{\partial v}{\partial \Xv} \end{align*} The second row holds because \begin{align*} \left[ \frac{\partial (\Uv \Vv)}{\partial x} \right]_{ij} & = \frac{\partial (\sum_k u_{ik} v_{kj})}{\partial x} = \sum_k \frac{\partial u_{ik}}{\partial x} v_{kj} + \sum_k u_{ik} \frac{\partial v_{kj}}{\partial x} = \left[ \frac{\partial \Uv}{\partial x} \Vv \right]_{ij} + \left[ \Uv \frac{\partial \Vv}{\partial x} \right]_{ij} \\ \left[ \frac{\partial (uv)}{\partial \Xv} \right]_{ij} & = \frac{\partial (uv)}{\partial x_{ji}} = \frac{\partial u}{\partial x_{ji}} v + u \frac{\partial v}{\partial x_{ji}} = \left[ \frac{\partial u}{\partial \Xv} \right]_{ij} v + u \left[ \frac{\partial v}{\partial \Xv} \right]_{ij} \end{align*} and the first row can be viewed as a special case of the second. The derivative $\frac{\partial (\uv \vv)}{\partial \xv}$ is not listed because there are two possibilities: either $\uv \vv$ is a scalar, namely the inner product of the two vectors, which is treated later; or $\uv \vv$ is a matrix, which belongs to the cases we do not consider.

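  In single-variable calculus we have $\frac{\partial x}{\partial x} = 1$. Analogously, here we have \begin{align*} \frac{\partial x_i}{\partial \xv} = \ev_i^\top, \quad \frac{\partial \xv}{\partial x_i} = \ev_i, \quad \frac{\partial \xv}{\partial \xv} = \Iv, \quad \frac{\partial x_{ij}}{\partial \Xv} = \Ev_{ji}, \quad \frac{\partial \Xv}{\partial x_{ij}} = \Ev_{ij} \end{align*} where $\ev_i$ is the vector whose $i$-th entry is $1$ and whose other entries are $0$, and $\Ev_{ij}$ is the matrix whose $(i,j)$ entry is $1$ and whose other entries are $0$.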

  In single-variable calculus the chain rule is \begin{align*} \frac{\partial g(u)}{\partial x} = \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial x} \end{align*} Analogously,

  • Vectors only: let $\xv \in \mathbb{R}^n$, $\uv = \uv(\xv) \in \mathbb{R}^m$, and $\gv(\uv) \in \mathbb{R}^l$; then \begin{align*} \underbrace{\frac{\partial \gv(\uv)}{\partial \xv}}_{l \times n} = \underbrace{\frac{\partial \gv(\uv)}{\partial \uv}}_{l \times m} \underbrace{\frac{\partial \uv}{\partial \xv}}_{m \times n} \end{align*} This is because \begin{align*} \left[ \frac{\partial \gv(\uv)}{\partial \xv} \right]_{ij} & = \frac{\partial [\gv(\uv)]_i}{\partial x_j} = \sum_{k \in [m]} \frac{\partial [\gv(\uv)]_i}{\partial u_k} \frac{\partial u_k}{\partial x_j} = \frac{\partial [\gv(\uv)]_i}{\partial \uv} \frac{\partial \uv}{\partial x_j} \\ & = \left[ \frac{\partial \gv(\uv)}{\partial \uv} \right]_{i,:} \left[ \frac{\partial \uv}{\partial \xv} \right]_{:,j} = \left[ \frac{\partial \gv(\uv)}{\partial \uv} \frac{\partial \uv}{\partial \xv} \right]_{i,j} \end{align*} Note that when $l = m = n = 1$ this reduces to the single-variable chain rule.
  • The independent variable is a matrix: let $u = u(\Xv)$ be a scalar function of the matrix $\Xv$ and $g$ a scalar function of $u$; then \begin{align*} \frac{\partial g(u)}{\partial \Xv} = \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \Xv} \end{align*} This is because \begin{align*} \left[ \frac{\partial g(u)}{\partial \Xv} \right]_{ij} & = \frac{\partial g(u)}{\partial x_{ji}} = \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial x_{ji}} = \frac{\partial g(u)}{\partial u} \left[ \frac{\partial u}{\partial \Xv} \right]_{ij} \end{align*}
  • The intermediate variable is a matrix: let $\Uv = \Uv(x)$ be a matrix function of the scalar $x$ and $g$ a scalar function of $\Uv$; then \begin{align} \label{eq: chain-matrix} \class{blue}{\frac{\partial g(\Uv)}{\partial x}} = \sum_p \sum_q \frac{\partial g(\Uv)}{\partial u_{pq}} \frac{\partial u_{pq}}{\partial x} = \sum_q \sum_p \left[ \frac{\partial g(\Uv)}{\partial \Uv} \right]_{qp} \left[ \frac{\partial \Uv}{\partial x} \right]_{pq} = \class{blue}{\tr \left( \frac{\partial g(\Uv)}{\partial \Uv} \frac{\partial \Uv}{\partial x} \right)} \end{align}
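The trace form of the chain rule for a matrix-valued intermediate variable can be checked on a small example. This is a hedged NumPy sketch: the functions `U`, `dU_dx`, `g`, and `dg_dU` are made up for illustration and do not appear in the original text.

```python
import numpy as np

def U(x):
    """A hypothetical 2 x 2 matrix-valued function of the scalar x."""
    return np.array([[x, x ** 2], [np.sin(x), 1.0]])

def dU_dx(x):
    """Entrywise derivative of U (same shape as U)."""
    return np.array([[1.0, 2 * x], [np.cos(x), 0.0]])

def g(M):
    """Scalar function of a matrix: the sum of squared entries."""
    return np.sum(M ** 2)

def dg_dU(M):
    # Numerator layout: entry (i, j) is dg / du_ji, hence the transpose.
    return (2 * M).T

x0, eps = 0.7, 1e-6
analytic = np.trace(dg_dU(U(x0)) @ dU_dx(x0))
numeric = (g(U(x0 + eps)) - g(U(x0 - eps))) / (2 * eps)
```

The transpose in `dg_dU` is exactly what turns the double sum over $p, q$ into a trace of an ordinary matrix product.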

Vector-by-scalar derivatives

  The product of a matrix and a vector is a vector. If $\Av$ does not depend on $x$, it is easy to see that \begin{align*} & \left[ \frac{\partial (\Av \uv)}{\partial x} \right]_{i} = \frac{\partial [\Av \uv]_i}{\partial x} = \frac{\partial (\sum_k a_{ik} u_k)}{\partial x} = \sum_k a_{ik} \frac{\partial u_k}{\partial x} = \left[ \Av \frac{\partial \uv}{\partial x} \right]_i \Longrightarrow \class{blue}{\frac{\partial (\Av \uv)}{\partial x} = \Av \frac{\partial \uv}{\partial x}} \\ & \class{blue}{\frac{\partial (\uv^\top \Av)}{\partial x}} = \left[ \frac{\partial (\Av^\top \uv)}{\partial x} \right]^\top = \left[ \Av^\top \frac{\partial \uv}{\partial x} \right]^\top = \class{blue}{\frac{\partial \uv^\top}{\partial x} \Av} \end{align*}

  The cross product of two vectors is also a vector. For $\uv, \vv \in \mathbb{R}^3$ we have \begin{align*} \uv \times \vv = \begin{bmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{bmatrix} \end{align*} and therefore \begin{align*} \class{blue}{\frac{\partial (\uv \times \vv)}{\partial x}} & = \begin{bmatrix} \frac{\partial u_2}{\partial x} v_3 - \frac{\partial u_3}{\partial x} v_2 + u_2 \frac{\partial v_3}{\partial x} - u_3 \frac{\partial v_2}{\partial x} \\ \frac{\partial u_3}{\partial x} v_1 - \frac{\partial u_1}{\partial x} v_3 + u_3 \frac{\partial v_1}{\partial x} - u_1 \frac{\partial v_3}{\partial x} \\ \frac{\partial u_1}{\partial x} v_2 - \frac{\partial u_2}{\partial x} v_1 + u_1 \frac{\partial v_2}{\partial x} - u_2 \frac{\partial v_1}{\partial x} \\ \end{bmatrix} = \class{blue}{\left( \frac{\partial \uv}{\partial x} \right) \times \vv + \uv \times \frac{\partial \vv}{\partial x}} \end{align*}
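The product rule for the cross product can be verified by finite differences. The sketch below assumes two arbitrary, illustrative curves `u(x)` and `v(x)` in $\mathbb{R}^3$ (not from the original text) and uses `np.cross`.

```python
import numpy as np

def u(x):
    return np.array([x, x ** 2, 1.0])

def v(x):
    return np.array([np.sin(x), np.cos(x), x])

def du(x):
    return np.array([1.0, 2 * x, 0.0])

def dv(x):
    return np.array([np.cos(x), -np.sin(x), 1.0])

x0, eps = 0.3, 1e-6

# Right-hand side of the rule: (du/dx) x v + u x (dv/dx)
analytic = np.cross(du(x0), v(x0)) + np.cross(u(x0), dv(x0))

# Central difference of the cross product itself
numeric = (np.cross(u(x0 + eps), v(x0 + eps))
           - np.cross(u(x0 - eps), v(x0 - eps))) / (2 * eps)
```

Because the cross product is not commutative, the order of the factors in each term must be kept as written.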

Scalar-by-vector derivatives

  The quadratic form $\uv^\top \Av \vv$ is a scalar. Suppose $\Av$ does not depend on $\xv$; it is easy to see that \begin{align*} \left[ \frac{\partial (\uv^\top \Av \vv)}{\partial \xv} \right]_i & = \frac{\partial (\uv^\top \Av \vv)}{\partial x_i} = \frac{\partial (\sum_j \sum_k u_j a_{jk} v_k)}{\partial x_i} = \sum_j \sum_k u_j a_{jk} \frac{\partial v_k}{\partial x_i} + \sum_j \sum_k \frac{\partial u_j}{\partial x_i} a_{jk} v_k \\ & = \uv^\top \Av \frac{\partial \vv}{\partial x_i} + \vv^\top \Av^\top \frac{\partial \uv}{\partial x_i} = \left[ \uv^\top \Av \frac{\partial \vv}{\partial \xv} \right]_i + \left[ \vv^\top \Av^\top \frac{\partial \uv}{\partial \xv} \right]_i \\ & \Longrightarrow \class{blue}{\frac{\partial (\uv^\top \Av \vv)}{\partial \xv} = \uv^\top \Av \frac{\partial \vv}{\partial \xv} + \vv^\top \Av^\top \frac{\partial \uv}{\partial \xv}} \end{align*}

  In particular,

  • if $\Av = \Iv$, then \begin{align*} \frac{\partial (\uv^\top \vv)}{\partial \xv} = \uv^\top \frac{\partial \vv}{\partial \xv} + \vv^\top \frac{\partial \uv}{\partial \xv} \end{align*} If, furthermore, $\uv = \av$ does not depend on $\xv$ (and likewise $\bv$ and $\Av$ below), then \begin{align*} \frac{\partial (\av^\top \vv)}{\partial \xv} = \av^\top \frac{\partial \vv}{\partial \xv}, \quad \frac{\partial (\av^\top \xv)}{\partial \xv} = \av^\top \frac{\partial \xv}{\partial \xv} = \av^\top, \quad \frac{\partial (\bv^\top \Av \xv)}{\partial \xv} = \bv^\top \Av \end{align*}
  • if $\uv = \vv = \xv$, then \begin{align*} \frac{\partial (\xv^\top \Av \xv)}{\partial \xv} = \xv^\top \Av \frac{\partial \xv}{\partial \xv} + \xv^\top \Av^\top \frac{\partial \xv}{\partial \xv} = \xv^\top (\Av + \Av^\top) \end{align*} If, furthermore, $\Av = \Iv$, then \begin{align*} \frac{\partial (\xv^\top \xv)}{\partial \xv} = \frac{\partial \|\xv\|^2}{\partial \xv} = 2 \xv^\top \end{align*}
  • if $\uv = \vv = \xv$ and $\Av = \bv \av^\top$, where $\av$ and $\bv$ do not depend on $\xv$, then \begin{align*} \frac{\partial (\xv^\top \bv \av^\top \xv)}{\partial \xv} = \frac{\partial (\av^\top \xv \xv^\top \bv)}{\partial \xv} = \xv^\top (\av \bv^\top + \bv \av^\top) \end{align*}
  • more generally, \begin{align*} \frac{\partial [(\Av \xv + \bv)^\top \Cv (\Dv \xv + \ev)]}{\partial \xv} & = \frac{\partial (\xv^\top \Av^\top \Cv \Dv \xv + \bv^\top \Cv \Dv \xv + \xv^\top \Av^\top \Cv \ev + \bv^\top \Cv \ev)}{\partial \xv} \\ & = \xv^\top (\Av^\top \Cv \Dv + \Dv^\top \Cv^\top \Av) + \bv^\top \Cv \Dv + \ev^\top \Cv^\top \Av \\ & = (\Dv \xv + \ev)^\top \Cv^\top \Av + (\Av \xv + \bv)^\top \Cv \Dv \end{align*}
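The quadratic-form gradients above can be sanity-checked numerically. The following NumPy sketch is illustrative only: `row_gradient` is a hypothetical helper, the matrices are random, and `Q` stands in for the matrix of the quadratic form so that `A` can be reused for the general case.

```python
import numpy as np

def row_gradient(f, x, eps=1e-6):
    """Central-difference gradient as a 1 x n row vector (numerator layout)."""
    g = np.zeros((1, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[0, i] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
n, p, q = 4, 3, 2
x = rng.standard_normal(n)

# Special case: d(x^T Q x)/dx = x^T (Q + Q^T)
Q = rng.standard_normal((n, n))
quad_numeric = row_gradient(lambda z: z @ Q @ z, x)
quad_analytic = (x @ (Q + Q.T))[None, :]

# General case: d[(A x + b)^T C (D x + e)]/dx
A, b = rng.standard_normal((p, n)), rng.standard_normal(p)
C = rng.standard_normal((p, q))
D, e = rng.standard_normal((q, n)), rng.standard_normal(q)
gen_numeric = row_gradient(lambda z: (A @ z + b) @ C @ (D @ z + e), x)
gen_analytic = ((D @ x + e) @ C.T @ A + (A @ x + b) @ C @ D)[None, :]
```

Both analytic results are row vectors of length $n$, as the numerator layout requires.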

  A norm is also a scalar. If $\av$ does not depend on $\xv$, then for $\xv \neq \av$ \begin{align} \label{eq: norm} \left[ \frac{\partial \| \xv - \av \|}{\partial \xv} \right]_i & = \frac{\partial \| \xv - \av \|}{\partial x_i} = \frac{\partial \sqrt{\sum_j (x_j - a_j)^2}}{\partial x_i} = \frac{1}{2} \frac{2 (x_i - a_i)}{\sqrt{\sum_j (x_j - a_j)^2}} = \frac{x_i - a_i}{\| \xv - \av \|} \\ & \Longrightarrow \class{blue}{\frac{\partial \| \xv - \av \|}{\partial \xv} = \frac{(\xv - \av)^\top}{\| \xv - \av \|}} \nonumber \end{align}

Vector-by-vector derivatives

  If $\Av$ does not depend on $\xv$, it is easy to see that \begin{align*} & \left[ \frac{\partial (\Av \uv)}{\partial \xv} \right]_{ij} = \frac{\partial [\Av \uv]_i}{\partial x_j} = \frac{\partial (\sum_k a_{ik} u_k)}{\partial x_j} = \sum_k a_{ik} \frac{\partial u_k}{\partial x_j} = \left[ \Av \frac{\partial \uv}{\partial \xv} \right]_{ij} \Longrightarrow \class{blue}{\frac{\partial (\Av \uv)}{\partial \xv} = \Av \frac{\partial \uv}{\partial \xv}} \end{align*} In particular, if $\uv = \xv$, then \begin{align*} \frac{\partial (\Av \xv)}{\partial \xv} = \Av \frac{\partial \xv}{\partial \xv} = \Av \end{align*}

  If $v = v(\xv)$ is a scalar and $\uv = \uv(\xv)$ is a vector, then \begin{align*} \left[ \frac{\partial (v \uv)}{\partial \xv} \right]_{ij} = \frac{\partial (v u_i)}{\partial x_j} = v \frac{\partial u_i}{\partial x_j} + u_i \frac{\partial v}{\partial x_j} = v \left[ \frac{\partial \uv}{\partial \xv} \right]_{ij} + \left[ \uv \frac{\partial v}{\partial \xv} \right]_{ij} \Longrightarrow \class{blue}{\frac{\partial (v \uv)}{\partial \xv} = v \frac{\partial \uv}{\partial \xv} + \uv \frac{\partial v}{\partial \xv}} \end{align*} Note that the first term is a scalar times a Jacobian matrix, while the second term is a column vector times a row vector.

  If $\av$ does not depend on $\xv$, combining with Eq. (\ref{eq: norm}) gives \begin{align*} \left[ \frac{\partial}{\partial \xv} \frac{\xv - \av}{\| \xv - \av \|} \right]_{ij} & = \frac{\partial}{\partial x_j} \frac{x_i - a_i}{\| \xv - \av \|} = \frac{\delta_{ij} \|\xv - \av\|}{\| \xv - \av \|^2} - \frac{x_i - a_i}{\| \xv - \av \|^2} \frac{\partial \| \xv - \av \|}{\partial x_j} \\ & = \frac{\delta_{ij}}{\| \xv - \av \|} - \frac{x_i - a_i}{\| \xv - \av \|^2} \frac{x_j - a_j}{\| \xv - \av \|} \\ & \Longrightarrow \class{blue}{\frac{\partial}{\partial \xv} \frac{\xv - \av}{\| \xv - \av \|} = \frac{\Iv}{\| \xv - \av \|} - \frac{(\xv - \av)(\xv - \av)^\top}{\| \xv - \av \|^3}} \end{align*}
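The Jacobian of the normalized vector can be compared with a finite-difference Jacobian. This is a minimal sketch with illustrative names (`unit`, `unit_jacobian`) and arbitrary test points, not code from the original text.

```python
import numpy as np

def unit(x, a):
    """The normalized difference (x - a) / ||x - a||."""
    d = x - a
    return d / np.linalg.norm(d)

def unit_jacobian(x, a):
    """Closed form: I / r - d d^T / r^3 with d = x - a and r = ||d||."""
    d = x - a
    r = np.linalg.norm(d)
    return np.eye(d.size) / r - np.outer(d, d) / r ** 3

x = np.array([1.0, 2.0, -1.0])
a = np.array([0.5, -1.0, 2.0])
eps = 1e-6

# Column j holds the central difference with respect to x_j.
numeric = np.column_stack([
    (unit(x + eps * e, a) - unit(x - eps * e, a)) / (2 * eps)
    for e in np.eye(3)
])
analytic = unit_jacobian(x, a)
```

A useful property to keep in mind: the Jacobian annihilates the direction $\xv - \av$ itself, because moving along that direction does not change the unit vector.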

Matrix-by-scalar derivatives

  If $u = u(x)$ is a scalar and $\Vv = \Vv(x)$ is a matrix, then \begin{align*} \left[ \frac{\partial (u \Vv)}{\partial x} \right]_{ij} = \frac{\partial (u v_{ij})}{\partial x} = \frac{\partial u}{\partial x} v_{ij} + u \frac{\partial v_{ij}}{\partial x} = \frac{\partial u}{\partial x} \left[ \Vv \right]_{ij} + u \left[ \frac{\partial \Vv}{\partial x} \right]_{ij} \Longrightarrow \class{blue}{\frac{\partial (u \Vv)}{\partial x} = \frac{\partial u}{\partial x} \Vv + u \frac{\partial \Vv}{\partial x}} \end{align*}

  If the factor $\Vv$ in the product rule can itself be decomposed into a product of terms that depend on $x$, for example $\Vv \Wv$, then \begin{align} \label{eq: product} \class{blue}{\frac{\partial (\Uv \Vv \Wv)}{\partial x}} = \frac{\partial \Uv}{\partial x} \Vv \Wv + \Uv \frac{\partial (\Vv \Wv)}{\partial x} = \frac{\partial \Uv}{\partial x} \Vv \Wv + \Uv \left( \frac{\partial \Vv}{\partial x} \Wv + \Vv \frac{\partial \Wv}{\partial x} \right) = \class{blue}{\frac{\partial \Uv}{\partial x} \Vv \Wv + \Uv \frac{\partial \Vv}{\partial x} \Wv + \Uv \Vv \frac{\partial \Wv}{\partial x}} \end{align} It follows that if $\Av$ and $\Bv$ do not depend on $x$, then \begin{align*} \frac{\partial (\Av \Uv \Bv)}{\partial x} = \Av \frac{\partial \Uv}{\partial x} \Bv \end{align*} When $\Uv$ is a square matrix and $n$ is a positive integer, we have \begin{align} \label{eq: power} \class{blue}{\frac{\partial \Uv^n}{\partial x}} = \Uv^{n-1} \frac{\partial \Uv}{\partial x} + \Uv^{n-2} \frac{\partial \Uv}{\partial x} \Uv + \cdots + \Uv \frac{\partial \Uv}{\partial x} \Uv^{n-2} + \frac{\partial \Uv}{\partial x} \Uv^{n-1} = \class{blue}{\sum_{i \in [n]} \Uv^{i-1} \frac{\partial \Uv}{\partial x} \Uv^{n-i}} \end{align}

  Setting $\Vv = \Uv^{-1}$ in the product rule gives \begin{align} \label{eq: inverse} \zerov = \frac{\partial \Iv}{\partial x} = \frac{\partial (\Uv \Uv^{-1})}{\partial x} = \Uv \frac{\partial \Uv^{-1}}{\partial x} + \frac{\partial \Uv}{\partial x} \Uv^{-1} \Longrightarrow \class{blue}{\frac{\partial \Uv^{-1}}{\partial x} = - \Uv^{-1} \frac{\partial \Uv}{\partial x} \Uv^{-1}} \end{align} It follows that \begin{align*} \class{blue}{\frac{\partial [\Xv^{-1}]_{kl}}{\partial x_{ij}}} & = \tr \left( \frac{\partial [\Xv^{-1}]_{kl}}{\partial \Xv^{-1}} \frac{\partial \Xv^{-1}}{\partial x_{ij}} \right) = - \tr \left( \Ev_{lk} \Xv^{-1} \frac{\partial \Xv}{\partial x_{ij}} \Xv^{-1} \right) = - \tr ( \Xv^{-1} \Ev_{lk} \Xv^{-1} \Ev_{ij} ) \\ & = - [\Xv^{-1} \Ev_{lk} \Xv^{-1}]_{ji} = - \sum_p \sum_q [\Xv^{-1}]_{jp} [\Ev_{lk}]_{pq} [\Xv^{-1}]_{qi} = \class{blue}{- [\Xv^{-1}]_{jl} [\Xv^{-1}]_{ki}} \end{align*} Combining with Eq. (\ref{eq: product}) also yields the second-order derivative \begin{align*} \class{blue}{\frac{\partial^2 \Uv^{-1}}{\partial x \partial y}} & = \frac{\partial}{\partial y} \left( - \Uv^{-1} \frac{\partial \Uv}{\partial x} \Uv^{-1} \right) = - \frac{\partial \Uv^{-1}}{\partial y} \frac{\partial \Uv}{\partial x} \Uv^{-1} - \Uv^{-1} \frac{\partial^2 \Uv}{\partial x \partial y} \Uv^{-1} - \Uv^{-1} \frac{\partial \Uv}{\partial x} \frac{\partial \Uv^{-1}}{\partial y} \\ & = \Uv^{-1} \frac{\partial \Uv}{\partial y} \Uv^{-1} \frac{\partial \Uv}{\partial x} \Uv^{-1} - \Uv^{-1} \frac{\partial^2 \Uv}{\partial x \partial y} \Uv^{-1} + \Uv^{-1} \frac{\partial \Uv}{\partial x} \Uv^{-1} \frac{\partial \Uv}{\partial y} \Uv^{-1} \\ & = \class{blue}{\Uv^{-1} \left( \frac{\partial \Uv}{\partial y} \Uv^{-1} \frac{\partial \Uv}{\partial x} - \frac{\partial^2 \Uv}{\partial x \partial y} + \frac{\partial \Uv}{\partial x} \Uv^{-1} \frac{\partial \Uv}{\partial y} \right) \Uv^{-1}} \end{align*}
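The derivative of the inverse can be checked on a one-parameter family of matrices. In this hedged NumPy sketch the family `U(x) = U0 + x * U1` is an arbitrary illustrative choice; `U0` is made strictly diagonally dominant so that the inverse exists near `x0`.

```python
import numpy as np

rng = np.random.default_rng(1)
# Strictly diagonally dominant base matrix, so U(x) is invertible near x0.
U0 = 4.0 * np.eye(3) + 0.5 * rng.uniform(-1, 1, (3, 3))
U1 = 0.5 * rng.uniform(-1, 1, (3, 3))

def U(x):
    return U0 + x * U1          # hence dU/dx = U1

x0, eps = 0.2, 1e-6
U_inv = np.linalg.inv(U(x0))

# Formula: d(U^{-1})/dx = -U^{-1} (dU/dx) U^{-1}
analytic = -U_inv @ U1 @ U_inv
numeric = (np.linalg.inv(U(x0 + eps)) - np.linalg.inv(U(x0 - eps))) / (2 * eps)
```

Since matrices do not commute in general, the two factors of $\Uv^{-1}$ must stay on either side of $\frac{\partial \Uv}{\partial x}$ rather than being merged into $\Uv^{-2}$.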

  Besides the ordinary product, matrices also have the Kronecker product and the Hadamard product. Let $\Uv \in \mathbb{R}^{m \times n}$ and $\Vv \in \mathbb{R}^{p \times q}$; then \begin{align*} \class{blue}{\frac{\partial (\Uv \otimes \Vv)}{\partial x}} & = \begin{bmatrix} \frac{\partial u_{11} \Vv}{\partial x} & \frac{\partial u_{12} \Vv}{\partial x} & \cdots & \frac{\partial u_{1n} \Vv}{\partial x} \\ \frac{\partial u_{21} \Vv}{\partial x} & \frac{\partial u_{22} \Vv}{\partial x} & \cdots & \frac{\partial u_{2n} \Vv}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u_{m1} \Vv}{\partial x} & \frac{\partial u_{m2} \Vv}{\partial x} & \cdots & \frac{\partial u_{mn} \Vv}{\partial x} \\ \end{bmatrix} \\ & = \begin{bmatrix} \frac{\partial u_{11}}{\partial x} \Vv + u_{11} \frac{\partial \Vv}{\partial x} & \frac{\partial u_{12}}{\partial x} \Vv + u_{12} \frac{\partial \Vv}{\partial x} & \cdots & \frac{\partial u_{1n}}{\partial x} \Vv + u_{1n} \frac{\partial \Vv}{\partial x} \\ \frac{\partial u_{21}}{\partial x} \Vv + u_{21} \frac{\partial \Vv}{\partial x} & \frac{\partial u_{22}}{\partial x} \Vv + u_{22} \frac{\partial \Vv}{\partial x} & \cdots & \frac{\partial u_{2n}}{\partial x} \Vv + u_{2n} \frac{\partial \Vv}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u_{m1}}{\partial x} \Vv + u_{m1} \frac{\partial \Vv}{\partial x} & \frac{\partial u_{m2}}{\partial x} \Vv + u_{m2} \frac{\partial \Vv}{\partial x} & \cdots & \frac{\partial u_{mn}}{\partial x} \Vv + u_{mn} \frac{\partial \Vv}{\partial x} \\ \end{bmatrix} \\ & = \begin{bmatrix} \frac{\partial u_{11}}{\partial x} \Vv & \frac{\partial u_{12}}{\partial x} \Vv & \cdots & \frac{\partial u_{1n}}{\partial x} \Vv \\ \frac{\partial u_{21}}{\partial x} \Vv & \frac{\partial u_{22}}{\partial x} \Vv & \cdots & \frac{\partial u_{2n}}{\partial x} \Vv \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u_{m1}}{\partial x} \Vv & \frac{\partial u_{m2}}{\partial x} \Vv & \cdots & \frac{\partial u_{mn}}{\partial x} \Vv \\ \end{bmatrix} + \begin{bmatrix} u_{11} \frac{\partial \Vv}{\partial x} & u_{12} \frac{\partial \Vv}{\partial x} & \cdots & u_{1n} \frac{\partial \Vv}{\partial x} \\ u_{21} \frac{\partial \Vv}{\partial x} & u_{22} \frac{\partial \Vv}{\partial x} & \cdots & u_{2n} \frac{\partial \Vv}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ u_{m1} \frac{\partial \Vv}{\partial x} & u_{m2} \frac{\partial \Vv}{\partial x} & \cdots & u_{mn} \frac{\partial \Vv}{\partial x} \\ \end{bmatrix} \\ & = \class{blue}{\frac{\partial \Uv}{\partial x} \otimes \Vv + \Uv \otimes \frac{\partial \Vv}{\partial x}} \end{align*} Let $\Uv, \Vv \in \mathbb{R}^{m \times n}$; then \begin{align*} \class{blue}{\frac{\partial (\Uv \odot \Vv)}{\partial x}} & = \begin{bmatrix} \frac{\partial u_{11} v_{11}}{\partial x} & \frac{\partial u_{12} v_{12}}{\partial x} & \cdots & \frac{\partial u_{1n} v_{1n}}{\partial x} \\ \frac{\partial u_{21} v_{21}}{\partial x} & \frac{\partial u_{22} v_{22}}{\partial x} & \cdots & \frac{\partial u_{2n} v_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u_{m1} v_{m1}}{\partial x} & \frac{\partial u_{m2} v_{m2}}{\partial x} & \cdots & \frac{\partial u_{mn} v_{mn}}{\partial x} \\ \end{bmatrix} \\ & = \begin{bmatrix} \frac{\partial u_{11}}{\partial x} v_{11} & \frac{\partial u_{12}}{\partial x} v_{12} & \cdots & \frac{\partial u_{1n}}{\partial x} v_{1n} \\ \frac{\partial u_{21}}{\partial x} v_{21} & \frac{\partial u_{22}}{\partial x} v_{22} & \cdots & \frac{\partial u_{2n}}{\partial x} v_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial u_{m1}}{\partial x} v_{m1} & \frac{\partial u_{m2}}{\partial x} v_{m2} & \cdots & \frac{\partial u_{mn}}{\partial x} v_{mn} \\ \end{bmatrix} + \begin{bmatrix} u_{11} \frac{\partial v_{11}}{\partial x} & u_{12} \frac{\partial v_{12}}{\partial x} & \cdots & u_{1n} \frac{\partial v_{1n}}{\partial x} \\ u_{21} \frac{\partial v_{21}}{\partial x} & u_{22} \frac{\partial v_{22}}{\partial x} & \cdots & u_{2n} \frac{\partial v_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ u_{m1} \frac{\partial v_{m1}}{\partial x} & u_{m2} \frac{\partial v_{m2}}{\partial x} & \cdots & u_{mn} \frac{\partial v_{mn}}{\partial x} \\ \end{bmatrix} \\ & = \class{blue}{\frac{\partial \Uv}{\partial x} \odot \Vv + \Uv \odot \frac{\partial \Vv}{\partial x}} \end{align*}
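Both product rules can be checked with `np.kron` and elementwise multiplication. The sketch below is illustrative: the matrix functions `U(x)` and `V(x)` are arbitrary choices, given the same shape only so that the Hadamard product is defined as well.

```python
import numpy as np

rng = np.random.default_rng(2)
U0, U1 = rng.standard_normal((2, 3)), rng.standard_normal((2, 3))
V0, V1 = rng.standard_normal((2, 3)), rng.standard_normal((2, 3))

def U(x):
    return U0 + x * U1

def dU(x):
    return U1

def V(x):
    return V0 + np.sin(x) * V1

def dV(x):
    return np.cos(x) * V1

def central_diff(F, x, eps=1e-6):
    """Entrywise central difference of a matrix-valued function F."""
    return (F(x + eps) - F(x - eps)) / (2 * eps)

x0 = 0.4

# Kronecker product rule
kron_numeric = central_diff(lambda x: np.kron(U(x), V(x)), x0)
kron_analytic = np.kron(dU(x0), V(x0)) + np.kron(U(x0), dV(x0))

# Hadamard (elementwise) product rule
had_numeric = central_diff(lambda x: U(x) * V(x), x0)
had_analytic = dU(x0) * V(x0) + U(x0) * dV(x0)
```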

  Let $g$ be the polynomial function $g(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots$, so that $g'(x) = a_1 + 2 a_2 x + 3 a_3 x^2 + \cdots$. If $\Av$ is a square matrix that does not depend on $x$, write \begin{align*} g (x \Av) & = a_0 \Iv + a_1 x \Av + a_2 x^2 \Av^2 + a_3 x^3 \Av^3 + \cdots \\ g' (x \Av) & = a_1 \Iv + 2 a_2 x \Av + 3 a_3 x^2 \Av^2 + \cdots \end{align*} It is easy to see that \begin{align*} \class{blue}{\frac{\partial g(x \Av)}{\partial x}} & = a_1 \Av + 2 a_2 x \Av^2 + 3 a_3 x^2 \Av^3 + \cdots \\ & = \Av (a_1 \Iv + 2 a_2 x \Av + 3 a_3 x^2 \Av^2 + \cdots) = \class{blue}{\Av g' (x \Av)} \\ & = (a_1 \Iv + 2 a_2 x \Av + 3 a_3 x^2 \Av^2 + \cdots) \Av = \class{blue}{g' (x \Av) \Av} \end{align*} The same formula applies to functions defined by power series such as $e^x$, for example \begin{align*} \frac{\partial e^{x \Av}}{\partial x} = \Av e^{x \Av} = e^{x \Av} \Av \end{align*}
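The matrix-exponential example can be reproduced with a truncated power series. This is a rough sketch under stated assumptions: `expm_series` is a hypothetical helper that simply sums the first 30 terms, which is adequate only for matrices of small norm, and `A` is an arbitrary test matrix.

```python
import numpy as np

def expm_series(M, terms=30):
    """Truncated series I + M + M^2/2! + ... (adequate for small ||M||)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k      # term now equals M^k / k!
        result = result + term
    return result

A = np.array([[0.0, 1.0], [-2.0, -0.5]])
x0, eps = 0.8, 1e-6

numeric = (expm_series((x0 + eps) * A) - expm_series((x0 - eps) * A)) / (2 * eps)
left = A @ expm_series(x0 * A)     # A e^{xA}
right = expm_series(x0 * A) @ A    # e^{xA} A
```

The two closed forms agree because $\Av$ commutes with every power of itself, and hence with $e^{x \Av}$.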

Scalar-by-matrix derivatives

  Common scalar-valued functions of a matrix are the trace and the determinant; quadratic forms can be reduced to traces, since $\av^\top \Xv \bv = \tr(\Xv \bv \av^\top)$.

Derivative of the trace with respect to a matrix

  If $a$ does not depend on $\Xv$ and $\Uv = \Uv(\Xv)$, $\Vv = \Vv(\Xv)$, then the following are immediate: \begin{align*} \frac{\partial \tr(\Xv)}{\partial \Xv} = \Iv, \quad \frac{\partial \tr(\Uv+\Vv)}{\partial \Xv} = \frac{\partial \tr(\Uv)}{\partial \Xv} + \frac{\partial \tr(\Vv)}{\partial \Xv}, \quad \frac{\partial \tr(a \Uv)}{\partial \Xv} = a \frac{\partial \tr(\Uv)}{\partial \Xv} \end{align*} For a product we have \begin{align*} \left[ \frac{\partial \tr(\Uv \Vv)}{\partial \Xv} \right]_{ij} & = \class{blue}{\frac{\partial \tr(\Uv \Vv)}{\partial x_{ji}}} = \frac{\partial (\sum_p \sum_q u_{pq} v_{qp})}{\partial x_{ji}} = \sum_p \sum_q \left( \frac{\partial u_{pq}}{\partial x_{ji}} v_{qp} + u_{pq} \frac{\partial v_{qp}}{\partial x_{ji}} \right) \\ & = \class{blue}{\tr \left( \frac{\partial \Uv}{\partial x_{ji}} \Vv \right) + \tr \left( \Uv \frac{\partial \Vv}{\partial x_{ji}} \right)} = \tr \left( \frac{\partial (\Uv \Vv)}{\partial x_{ji}} \right) \end{align*} This shows that the trace and differentiation can be interchanged. In particular,

  • for $\tr(\Bv \Av \Xv)$ with $\Av$, $\Bv$ independent of $\Xv$: \begin{align*} \left[ \frac{\partial \tr(\Bv \Av \Xv)}{\partial \Xv} \right]_{ij} = \tr \left( \Bv \Av \frac{\partial \Xv}{\partial x_{ji}} \right) = \tr ( \Bv \Av \Ev_{ji} ) = [\Bv \Av]_{ij} \Longrightarrow \frac{\partial \tr(\Bv \Av \Xv)}{\partial \Xv} = \frac{\partial \tr(\Av \Xv \Bv)}{\partial \Xv} = \Bv \Av \end{align*}
  • for $\tr(\Bv \Av \Xv^\top)$ with $\Av$, $\Bv$ independent of $\Xv$: \begin{align*} \frac{\partial \tr(\Bv \Av \Xv^\top)}{\partial \Xv} = \frac{\partial \tr(\Xv \Av^\top \Bv^\top)}{\partial \Xv} = \frac{\partial \tr(\Av^\top \Bv^\top \Xv)}{\partial \Xv} = \Av^\top \Bv^\top \end{align*}
  • for $\tr(\Av \Xv \Xv^\top)$ with $\Av$ independent of $\Xv$: \begin{align*} \left[ \frac{\partial \tr(\Av \Xv \Xv^\top)}{\partial \Xv} \right]_{ij} & = \tr \left( \Av \frac{\partial \Xv \Xv^\top}{\partial x_{ji}} \right) = \tr \left( \Av \frac{\partial \Xv}{\partial x_{ji}} \Xv^\top \right) + \tr \left( \Av \Xv \frac{\partial \Xv^\top}{\partial x_{ji}} \right) \\ & = \tr(\Av \Ev_{ji} \Xv^\top) + \tr(\Av \Xv \Ev_{ij}) \\ & = [\Xv^\top \Av]_{ij} + [\Av \Xv]_{ji} \end{align*} and hence \begin{align*} \frac{\partial \tr(\Av \Xv \Xv^\top)}{\partial \Xv} = \frac{\partial \tr(\Xv^\top \Av \Xv)}{\partial \Xv} = \frac{\partial \tr(\Xv \Xv^\top \Av)}{\partial \Xv} = \Xv^\top \Av + \Xv^\top \Av^\top = \Xv^\top (\Av + \Av^\top) \end{align*}
  • for $\tr(\Av \Xv^\top \Xv)$ with $\Av$ independent of $\Xv$: \begin{align*} \left[ \frac{\partial \tr(\Av \Xv^\top \Xv)}{\partial \Xv} \right]_{ij} & = \tr \left( \Av \frac{\partial \Xv^\top \Xv}{\partial x_{ji}} \right) = \tr \left( \Av \frac{\partial \Xv^\top}{\partial x_{ji}} \Xv \right) + \tr \left( \Av \Xv^\top \frac{\partial \Xv}{\partial x_{ji}} \right) \\ & = \tr(\Av \Ev_{ij} \Xv) + \tr(\Av \Xv^\top \Ev_{ji}) \\ & = [\Xv \Av]_{ji} + [\Av \Xv^\top]_{ij} \end{align*} and hence \begin{align*} \frac{\partial \tr(\Av \Xv^\top \Xv)}{\partial \Xv} = \frac{\partial \tr(\Xv \Av \Xv^\top)}{\partial \Xv} = \frac{\partial \tr(\Xv^\top \Xv \Av)}{\partial \Xv} = (\Av + \Av^\top) \Xv^\top \end{align*}
  • for $\tr(\Bv \Av \Xv^{-1})$ with $\Av$, $\Bv$ independent of $\Xv$, combining with Eq. (\ref{eq: inverse}) gives \begin{align*} \left[ \frac{\partial \tr(\Bv \Av \Xv^{-1})}{\partial \Xv} \right]_{ij} & = \tr \left( \Bv \Av \frac{\partial \Xv^{-1}}{\partial x_{ji}} \right) = \tr \left( - \Bv \Av \Xv^{-1} \frac{\partial \Xv}{\partial x_{ji}} \Xv^{-1} \right) \\ & = - \tr \left( \Xv^{-1} \Bv \Av \Xv^{-1} \Ev_{ji} \right) = - [\Xv^{-1} \Bv \Av \Xv^{-1}]_{ij} \\ & \Longrightarrow \frac{\partial \tr(\Bv \Av \Xv^{-1})}{\partial \Xv} = \frac{\partial \tr(\Av \Xv^{-1} \Bv)}{\partial \Xv} = - \Xv^{-1} \Bv \Av \Xv^{-1} \end{align*}
  • for $\tr((\Xv + \Av)^{-1})$ with $\Av$ independent of $\Xv$, combining with Eq. (\ref{eq: inverse}) gives \begin{align*} \left[ \frac{\partial \tr((\Xv + \Av)^{-1})}{\partial \Xv} \right]_{ij} & = \tr \left( \frac{\partial (\Xv + \Av)^{-1}}{\partial x_{ji}} \right) = - \tr \left( (\Xv + \Av)^{-1} \frac{\partial (\Xv + \Av)}{\partial x_{ji}} (\Xv + \Av)^{-1} \right) \\ & = - \tr \left( (\Xv + \Av)^{-1} (\Xv + \Av)^{-1} \Ev_{ji} \right) = - [(\Xv + \Av)^{-1} (\Xv + \Av)^{-1}]_{ij} \\ & \Longrightarrow \frac{\partial \tr((\Xv + \Av)^{-1})}{\partial \Xv} = - (\Xv + \Av)^{-1} (\Xv + \Av)^{-1} \end{align*}
  • for $\tr(\Av \Xv \Bv \Xv^\top \Cv)$, where $\Av$, $\Bv$, $\Cv$ do not depend on $\Xv$: \begin{align*} \left[ \frac{\partial \tr(\Av \Xv \Bv \Xv^\top \Cv)}{\partial \Xv} \right]_{ij} & = \tr \left( \frac{\partial (\Av \Xv \Bv)}{\partial x_{ji}} \Xv^\top \Cv \right) + \tr \left( \Av \Xv \Bv \frac{\partial (\Xv^\top \Cv)}{\partial x_{ji}} \right) \\ & = \tr \left( \Av \Ev_{ji} \Bv \Xv^\top \Cv \right) + \tr \left( \Av \Xv \Bv \Ev_{ij} \Cv \right) \\ & = [\Bv \Xv^\top \Cv \Av]_{ij} + [\Cv \Av \Xv \Bv]_{ji} \\ & \Longrightarrow \frac{\partial \tr(\Av \Xv \Bv \Xv^\top \Cv)}{\partial \Xv} = \Bv \Xv^\top \Cv \Av + \Bv^\top \Xv^\top \Av^\top \Cv^\top \end{align*}
  • for $\tr(\Av \Xv^\top \Bv \Xv \Cv)$, where $\Av$, $\Bv$, $\Cv$ do not depend on $\Xv$: \begin{align*} \left[ \frac{\partial \tr(\Av \Xv^\top \Bv \Xv \Cv)}{\partial \Xv} \right]_{ij} & = \tr \left( \frac{\partial (\Av \Xv^\top \Bv)}{\partial x_{ji}} \Xv \Cv \right) + \tr \left( \Av \Xv^\top \Bv \frac{\partial (\Xv \Cv)}{\partial x_{ji}} \right) \\ & = \tr \left( \Av \Ev_{ij} \Bv \Xv \Cv \right) + \tr \left( \Av \Xv^\top \Bv \Ev_{ji} \Cv \right) \\ & = [\Bv \Xv \Cv \Av]_{ji} + [\Cv \Av \Xv^\top \Bv]_{ij} \\ & \Longrightarrow \frac{\partial \tr(\Av \Xv^\top \Bv \Xv \Cv)}{\partial \Xv} = \Cv \Av \Xv^\top \Bv + \Av^\top \Cv^\top \Xv^\top \Bv^\top \end{align*}
  • for $\tr(\Bv \Av \Xv^n)$ with $\Av$, $\Bv$ independent of $\Xv$, where $n$ is a positive integer, combining with Eq. (\ref{eq: power}) gives \begin{align*} \left[ \frac{\partial \tr(\Bv \Av \Xv^n)}{\partial \Xv} \right]_{ij} & = \tr \left( \Bv \Av \frac{\partial \Xv^n}{\partial x_{ji}} \right) = \tr \left( \Bv \Av \sum_{k \in [n]} \Xv^{k-1} \frac{\partial \Xv}{\partial x_{ji}} \Xv^{n-k} \right) = \sum_{k \in [n]} \tr \left( \Bv \Av \Xv^{k-1} \frac{\partial \Xv}{\partial x_{ji}} \Xv^{n-k} \right) \\ & = \sum_{k \in [n]} \tr ( \Xv^{n-k} \Bv \Av \Xv^{k-1} \Ev_{ji} ) = \sum_{k \in [n]} [\Xv^{n-k} \Bv \Av \Xv^{k-1}]_{ij} \\ & \Longrightarrow \frac{\partial \tr(\Bv \Av \Xv^n)}{\partial \Xv} = \frac{\partial \tr(\Av \Xv^n \Bv)}{\partial \Xv} = \sum_{k \in [n]} \Xv^{n-k} \Bv \Av \Xv^{k-1} \end{align*} If, furthermore, $\Av = \Bv = \Iv$, then \begin{align*} \frac{\partial \tr(\Xv^n)}{\partial \Xv} = \sum_{k \in [n]} \Xv^{n-k} \Xv^{k-1} = \sum_{k \in [n]} \Xv^{n-1} = n \Xv^{n-1} \end{align*} which has the same form as the single-variable power rule. Similarly, define \begin{align*} e^{\Xv} & = \Iv + \Xv + \frac{\Xv^2}{2!} + \frac{\Xv^3}{3!} + \cdots \\ \sin \Xv & = \Xv - \frac{\Xv^3}{3!} + \frac{\Xv^5}{5!} - \cdots \\ \cos \Xv & = \Iv - \frac{\Xv^2}{2!} + \frac{\Xv^4}{4!} - \frac{\Xv^6}{6!} + \cdots \end{align*} Applying the result for $\tr(\Xv^n)$ term by term gives \begin{align*} \frac{\partial \tr(e^{\Xv})}{\partial \Xv} & = \frac{\partial }{\partial \Xv} \tr \left( \Iv + \Xv + \frac{\Xv^2}{2!} + \frac{\Xv^3}{3!} + \cdots \right) \\ & = \frac{\partial \tr (\Iv)}{\partial \Xv} + \frac{\partial \tr (\Xv)}{\partial \Xv} + \frac{1}{2!} \frac{\partial \tr (\Xv^2)}{\partial \Xv} + \frac{1}{3!} \frac{\partial \tr (\Xv^3)}{\partial \Xv} + \cdots \\ & = \Iv + \Xv + \frac{\Xv^2}{2!} + \cdots = e^{\Xv} \end{align*} as well as \begin{align*} \frac{\partial \tr(\sin \Xv)}{\partial \Xv} & = \frac{\partial }{\partial \Xv} \tr \left( \Xv - \frac{\Xv^3}{3!} + \frac{\Xv^5}{5!} - \cdots \right) \\ & = \frac{1}{1!} \frac{\partial \tr (\Xv)}{\partial \Xv} - \frac{1}{3!} \frac{\partial \tr (\Xv^3)}{\partial \Xv} + \frac{1}{5!} \frac{\partial \tr (\Xv^5)}{\partial \Xv} - \cdots \\ & = \Iv - \frac{\Xv^2}{2!} + \frac{\Xv^4}{4!} - \cdots = \cos \Xv \\ \frac{\partial \tr(\cos \Xv)}{\partial \Xv} & = \frac{\partial }{\partial \Xv} \tr \left( \Iv - \frac{\Xv^2}{2!} + \frac{\Xv^4}{4!} - \frac{\Xv^6}{6!} + \cdots \right) \\ & = \frac{\partial \tr (\Iv)}{\partial \Xv} - \frac{1}{2!} \frac{\partial \tr (\Xv^2)}{\partial \Xv} + \frac{1}{4!} \frac{\partial \tr (\Xv^4)}{\partial \Xv} - \frac{1}{6!} \frac{\partial \tr (\Xv^6)}{\partial \Xv} + \cdots \\ & = - \Xv + \frac{\Xv^3}{3!} - \frac{\Xv^5}{5!} + \cdots = - \sin \Xv \end{align*} all of which match the single-variable differentiation formulas.

  • for $\tr(\Av \otimes \Xv)$ with $\Av$ independent of $\Xv$: \begin{align*} \left[ \frac{\partial \tr(\Av \otimes \Xv)}{\partial \Xv} \right]_{ij} & = \tr \left( \frac{\partial \Av \otimes \Xv}{\partial x_{ji}} \right) = \tr \left( \Av \otimes \frac{\partial \Xv}{\partial x_{ji}} \right) = \tr ( \Av \otimes \Ev_{ji} ) = \tr(\Av) \delta_{ij} \\ & \Longrightarrow \frac{\partial \tr(\Av \otimes \Xv)}{\partial \Xv} = \tr(\Av) \Iv \end{align*}

  • for $\tr(\Xv \otimes \Xv)$: \begin{align*} \left[ \frac{\partial \tr(\Xv \otimes \Xv)}{\partial \Xv} \right]_{ij} & = \tr \left( \frac{\partial \Xv \otimes \Xv}{\partial x_{ji}} \right) = \tr \left( \frac{\partial \Xv}{\partial x_{ji}} \otimes \Xv + \Xv \otimes \frac{\partial \Xv}{\partial x_{ji}} \right) \\ & = \tr ( \Ev_{ji} \otimes \Xv ) + \tr ( \Xv \otimes \Ev_{ji} ) = 2 \tr(\Xv) \delta_{ij} \\ & \Longrightarrow \frac{\partial \tr(\Xv \otimes \Xv)}{\partial \Xv} = 2 \tr(\Xv) \Iv \end{align*}
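Two of the trace formulas in this list can be checked numerically as follows. This is a minimal NumPy sketch with random test matrices; the helper `grad_numerator_layout` is a hypothetical name, and it returns a matrix with the shape of the transposed argument, matching the numerator layout used throughout.

```python
import numpy as np

def grad_numerator_layout(f, X, eps=1e-6):
    """Entry (i, j) approximates d f / d x_ji, so the result is shaped like X.T."""
    G = np.zeros(X.T.shape)
    for j in range(X.shape[0]):
        for i in range(X.shape[1]):
            step = np.zeros_like(X)
            step[j, i] = eps
            G[i, j] = (f(X + step) - f(X - step)) / (2 * eps)
    return G

rng = np.random.default_rng(3)
m, n, p = 3, 2, 4
A = rng.standard_normal((p, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, p))
X = rng.standard_normal((m, n))

# d tr(A X B X^T C) / dX = B X^T C A + B^T X^T A^T C^T
numeric = grad_numerator_layout(lambda Z: np.trace(A @ Z @ B @ Z.T @ C), X)
analytic = B @ X.T @ C @ A + B.T @ X.T @ A.T @ C.T

# d tr(S^3) / dS = 3 S^2 for a square matrix S
S = rng.standard_normal((3, 3))
power_numeric = grad_numerator_layout(
    lambda Z: np.trace(np.linalg.matrix_power(Z, 3)), S)
power_analytic = 3 * S @ S
```

Transposing either result gives the denominator-layout gradient, which has the same shape as the variable itself.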

Derivative of the determinant with respect to a matrix

  Let $\Av$ and $\Bv$ be independent of $\Xv$ and write $\Yv = \Av \Xv \Bv$, assumed to be square. Combining with Eq. (\ref{eq: chain-matrix}), it is easy to see that \begin{align*} \left[ \frac{\partial |\Av \Xv \Bv|}{\partial \Xv} \right]_{ij} = \frac{\partial |\Yv|}{\partial x_{ji}} = \sum_p \sum_q \frac{\partial |\Yv|}{\partial y_{pq}}\frac{\partial y_{pq}}{\partial x_{ji}} = \tr \left( \frac{\partial |\Yv|}{\partial \Yv} \frac{\partial \Yv}{\partial x_{ji}} \right) \end{align*} Here the second factor is \begin{align*} \frac{\partial \Yv}{\partial x_{ji}} = \frac{\partial (\Av \Xv \Bv)}{\partial x_{ji}} = \Av \frac{\partial \Xv}{\partial x_{ji}} \Bv = \Av \Ev_{ji} \Bv \end{align*} Let $\Yv(y_{ji} + \epsilon)$ denote the matrix obtained from $\Yv$ by adding a small increment $\epsilon$ to the entry $y_{ji}$. The Laplace expansion along the $j$-th row gives \begin{align*} |\Yv(y_{ji} + \epsilon)| - |\Yv| = \epsilon C_{ji} \end{align*} where $C_{ji}$ is the cofactor of $y_{ji}$, and therefore \begin{align*} \left[ \frac{\partial |\Yv|}{\partial \Yv} \right]_{ij} = \frac{\partial |\Yv|}{\partial y_{ji}} = \lim_{\epsilon \rightarrow 0} \frac{|\Yv(y_{ji} + \epsilon)| - |\Yv|}{\epsilon} = C_{ji} \end{align*} Hence the first factor is the adjugate matrix of $\Yv$: \begin{align*} \frac{\partial |\Yv|}{\partial \Yv} = \begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix} = \Yv^* \end{align*} Substituting gives \begin{align*} \left[ \frac{\partial |\Av \Xv \Bv|}{\partial \Xv} \right]_{ij} & = \tr \left( \frac{\partial |\Yv|}{\partial \Yv} \frac{\partial \Yv}{\partial x_{ji}} \right) = \tr (\Yv^* \Av \Ev_{ji} \Bv) = [\Bv \Yv^* \Av]_{ij} \\ & \Longrightarrow \class{blue}{\frac{\partial |\Av \Xv \Bv|}{\partial \Xv} = \Bv (\Av \Xv \Bv)^* \Av} \end{align*} If $\Av$, $\Xv$, and $\Bv$ are all invertible square matrices, then so is $\Yv = \Av \Xv \Bv$, and hence \begin{align} \label{eq: determinant} \frac{\partial |\Av \Xv \Bv|}{\partial \Xv} = \Bv (\Av \Xv \Bv)^* \Av = \Bv |\Av \Xv \Bv| (\Av \Xv \Bv)^{-1} \Av = |\Av \Xv \Bv| \Xv^{-1} \end{align} If, furthermore, $\Av = \Bv = \Iv$, then \begin{align*} \frac{\partial |\Xv|}{\partial \Xv} = \Xv^* = |\Xv| \Xv^{-1} \end{align*} It follows that \begin{align*} \frac{\partial |\Xv^n|}{\partial \Xv} = \frac{\partial |\Xv|^n}{\partial \Xv} = n |\Xv|^{n-1} \Xv^* = n |\Xv|^n \Xv^{-1} = n |\Xv^n| \Xv^{-1} \end{align*} If the scalar $a$ does not depend on $\Xv \in \mathbb{R}^{m \times m}$ (and the logarithms are defined), then \begin{align*} \frac{\partial \ln |a \Xv|}{\partial \Xv} = \frac{\partial \ln a^m |\Xv|}{\partial \Xv} = \frac{\partial \ln a^m}{\partial \Xv} + \frac{\partial \ln |\Xv|}{\partial \Xv} = \frac{1}{|\Xv|} \frac{\partial |\Xv|}{\partial \Xv} = \frac{\Xv^*}{|\Xv|} = \Xv^{-1} \end{align*}
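The determinant formulas for invertible square matrices can be checked by finite differences. In this sketch the helper names are illustrative; the test matrices are chosen strictly diagonally dominant with positive diagonal, an assumption made here only so that all inverses exist and all determinants are positive.

```python
import numpy as np

def grad_numerator_layout(f, X, eps=1e-6):
    """Entry (i, j) approximates d f / d x_ji, so the result is shaped like X.T."""
    G = np.zeros(X.T.shape)
    for j in range(X.shape[0]):
        for i in range(X.shape[1]):
            step = np.zeros_like(X)
            step[j, i] = eps
            G[i, j] = (f(X + step) - f(X - step)) / (2 * eps)
    return G

rng = np.random.default_rng(4)

def near_identity():
    # Strictly diagonally dominant with positive diagonal:
    # invertible, and the determinant is positive.
    return 2.0 * np.eye(3) + 0.3 * rng.uniform(-1, 1, (3, 3))

A, B, X = near_identity(), near_identity(), near_identity()

# d|A X B| / dX = |A X B| X^{-1}
det_numeric = grad_numerator_layout(lambda Z: np.linalg.det(A @ Z @ B), X)
det_analytic = np.linalg.det(A @ X @ B) * np.linalg.inv(X)

# d ln|a X| / dX = X^{-1}, here with the arbitrary choice a = 5
logdet_numeric = grad_numerator_layout(
    lambda Z: np.log(np.linalg.det(5.0 * Z)), X)
logdet_analytic = np.linalg.inv(X)
```

Note that the result is $\Xv^{-1}$ rather than $\Xv^{-\top}$: the more familiar transposed form belongs to the denominator layout.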

  Let $\Yv = \Xv^\top \Av \Xv$ be invertible, with $\Av$ independent of $\Xv$. It is easy to see that \begin{align*} \left[ \frac{\partial |\Xv^\top \Av \Xv|}{\partial \Xv} \right]_{ij} & = \tr \left( \Yv^* \frac{\partial \Xv^\top \Av \Xv}{\partial x_{ji}} \right) = \tr \left( \Yv^* \frac{\partial \Xv^\top}{\partial x_{ji}} \Av \Xv \right) + \tr \left( \Yv^* \Xv^\top \Av \frac{\partial \Xv}{\partial x_{ji}} \right) \\ & = \tr ( \Yv^* \Ev_{ij} \Av \Xv ) + \tr ( \Yv^* \Xv^\top \Av \Ev_{ji} ) = [\Av \Xv \Yv^*]_{ji} + [\Yv^* \Xv^\top \Av]_{ij} \end{align*} Therefore \begin{align*} \frac{\partial |\Xv^\top \Av \Xv|}{\partial \Xv} & = (\Av \Xv \Yv^*)^\top + \Yv^* \Xv^\top \Av = (\Av \Xv |\Xv^\top \Av \Xv| (\Xv^\top \Av \Xv)^{-1})^\top + |\Xv^\top \Av \Xv| (\Xv^\top \Av \Xv)^{-1} \Xv^\top \Av \\ & = |\Xv^\top \Av \Xv| (\Xv^\top \Av^\top \Xv)^{-1} \Xv^\top \Av^\top + |\Xv^\top \Av \Xv| (\Xv^\top \Av \Xv)^{-1} \Xv^\top \Av \\ & = |\Xv^\top \Av \Xv| ((\Xv^\top \Av^\top \Xv)^{-1} \Xv^\top \Av^\top + (\Xv^\top \Av \Xv)^{-1} \Xv^\top \Av) \end{align*} If $\Av$ is symmetric, then \begin{align*} \frac{\partial |\Xv^\top \Av \Xv|}{\partial \Xv} = 2 |\Xv^\top \Av \Xv| (\Xv^\top \Av \Xv)^{-1} \Xv^\top \Av \end{align*}

  • if $\Xv$ is square, then $\Xv$ and $\Av$ are both invertible, so \begin{align*} \frac{\partial |\Xv^\top \Av \Xv|}{\partial \Xv} = 2 |\Xv^\top| |\Av| |\Xv| \Xv^{-1} \Av^{-1} \Xv^{-\top} \Xv^\top \Av = 2 |\Xv|^2 |\Av| \Xv^{-1} \end{align*}
  • if $\Av = \Iv$, then, writing $\Xv^\dagger = (\Xv^\top \Xv)^{-1} \Xv^\top$ for the pseudoinverse, \begin{align*} \frac{\partial |\Xv^\top \Xv|}{\partial \Xv} = 2 |\Xv^\top \Xv| (\Xv^\top \Xv)^{-1} \Xv^\top = 2 |\Xv^\top \Xv| \Xv^\dagger \end{align*} as well as \begin{align*} \frac{\partial \ln |\Xv^\top \Xv|}{\partial \Xv} = \frac{1}{|\Xv^\top \Xv|} \frac{\partial |\Xv^\top \Xv|}{\partial \Xv} = 2 \Xv^\dagger \end{align*}
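The log-determinant result for a tall matrix can be compared against `np.linalg.pinv`. This is a hedged sketch: `X` is an arbitrary 4 x 2 matrix with full column rank (here $\Xv^\top \Xv = \operatorname{diag}(11, 7)$), and `grad_numerator_layout` is a hypothetical finite-difference helper.

```python
import numpy as np

def grad_numerator_layout(f, X, eps=1e-6):
    """Entry (i, j) approximates d f / d x_ji, so the result is shaped like X.T."""
    G = np.zeros(X.T.shape)
    for j in range(X.shape[0]):
        for i in range(X.shape[1]):
            step = np.zeros_like(X)
            step[j, i] = eps
            G[i, j] = (f(X + step) - f(X - step)) / (2 * eps)
    return G

# Full column rank, so X^T X is invertible and has a positive determinant.
X = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0],
              [1.0, 1.0]])

numeric = grad_numerator_layout(lambda Z: np.log(np.linalg.det(Z.T @ Z)), X)
analytic = 2 * np.linalg.pinv(X)     # 2 X^+, a 2 x 4 matrix
```

The shapes line up because both the numerator-layout derivative and the pseudoinverse of an $m \times n$ matrix are $n \times m$.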
Copyright © Avanti 2020, all rights reserved. Powered by Gitbook. File last modified: 2021-11-14 16:38:31
