Adding Gradient with respect to T

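The new T gradient can be sanity-checked outside PyTensor with a small finite-difference test. The sketch below is not part of the notebook: it assumes a prediction step of the form `x_pred = T @ x`, `P_pred = T @ P @ T.T + Q`, and uses a made-up linear loss so that the upstream gradients `g` and `G` are known exactly. All names in it (`g`, `G`, `loss`, `analytic`, `numeric`) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3

# Hypothetical filtered state and covariance feeding the prediction step
T = rng.normal(size=(k, k))
x = rng.normal(size=k)
A = rng.normal(size=(k, k))
P = A @ A.T                      # symmetric positive semi-definite
Q = np.eye(k)

# Made-up upstream gradients: dL/dx_pred = g, dL/dP_pred = G (not symmetric)
g = rng.normal(size=k)
G = rng.normal(size=(k, k))

def loss(T_mat):
    """Linear loss in (x_pred, P_pred), so its gradients are exactly g and G."""
    x_pred = T_mat @ x
    P_pred = T_mat @ P @ T_mat.T + Q
    return g @ x_pred + np.sum(G * P_pred)

# Closed-form gradient, same structure as the return value of grad_T
analytic = np.outer(g, x) + G @ T @ P.T + G.T @ T @ P

# Central finite differences, one entry of T at a time
eps = 1e-6
numeric = np.zeros_like(T)
for i in range(k):
    for j in range(k):
        dT = np.zeros_like(T)
        dT[i, j] = eps
        numeric[i, j] = (loss(T + dT) - loss(T - dT)) / (2 * eps)

max_abs_err = np.max(np.abs(analytic - numeric))
print("max abs error:", max_abs_err)
```

Because this toy loss is quadratic in `T`, central differences agree with the closed form up to rounding error, which makes it a cheap check before wiring the expression into `lop_overrides`.
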
JeanVanDyk · JeanVanDyk · commit fc48f8728741 · 2025-08-07T17:54:07.000+02:00
diff --git a/notebooks/Kalman_Filter_Gradient.ipynb b/notebooks/Kalman_Filter_Gradient.ipynb
@@ -10,7 +10,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 214,
+   "execution_count": 1,
    "id": "90979a41",
    "metadata": {},
    "outputs": [],
@@ -40,18 +40,10 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 2,
    "id": "fdb156d6",
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "<function compile_statespace.<locals>.f at 0x000001820DC942C0>\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "mod = (\n",
     "    sts.LevelTrendComponent(order=2, innovations_order=[0, 1], name='level') +\n",
@@ -87,7 +79,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 218,
+   "execution_count": 3,
    "id": "3661408d",
    "metadata": {},
    "outputs": [],
@@ -140,7 +132,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 219,
+   "execution_count": 4,
    "id": "35351096",
    "metadata": {},
    "outputs": [],
@@ -268,7 +260,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 132,
+   "execution_count": 5,
    "id": "ee21ef4e",
    "metadata": {},
    "outputs": [],
@@ -330,7 +322,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 133,
+   "execution_count": 6,
    "id": "8c89b018",
    "metadata": {},
    "outputs": [],
@@ -387,7 +379,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 134,
+   "execution_count": 7,
    "id": "bba53a26",
    "metadata": {},
    "outputs": [],
@@ -426,7 +418,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 8,
    "id": "c17949b7",
    "metadata": {},
    "outputs": [],
@@ -441,7 +433,7 @@
    "id": "f0bc0287",
    "metadata": {},
    "source": [
-    "## Gradient with respect to **H**\n",
+    "### Gradient with respect to **H**\n",
     "\n",
     "From the article we have :\n",
     "\n",
@@ -472,7 +464,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 135,
+   "execution_count": 9,
    "id": "84cb6867",
    "metadata": {},
    "outputs": [],
@@ -499,6 +491,68 @@
     "    return K.T @ P_filtered_grad @ K - 0.5 * K.T @ a_filtered_grad @ v.T @ F_inv - 0.5 *  F_inv @ v @ a_filtered_grad.T @ K + F_inv - F_inv @ v @ v.T @ F_inv"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "4fa2ffc0",
+   "metadata": {},
+   "source": [
+    "### Gradient with respect to **T**\n",
+    "\n",
+    "This gradient is not given in the article. Here are the steps that led me to the expression below:\n",
+    "\n",
+    "1 - Only $x_{n|n-1}$ and $P_{n|n-1}$ depend on $T_n$. Hence, by the chain rule:\n",
+    "$$\n",
+    "\\frac{\\partial L}{\\partial T} = \\frac{\\partial L}{\\partial x_{n|n-1}} \\frac{\\partial x_{n|n-1}}{\\partial T} + \\frac{\\partial L}{\\partial P_{n|n-1}} \\frac{\\partial P_{n|n-1}}{\\partial T}\n",
+    "$$\n",
+    "2 - Using equations (11) and (12) of the article in step 1, we directly get:\n",
+    "$$\n",
+    "\\frac{\\partial L}{\\partial x_{n|n-1}} \\frac{\\partial x_{n|n-1}}{\\partial T} = \\frac{\\partial L}{\\partial x_{n|n-1}} x_{n-1|n-1}^T\n",
+    "$$\n",
+    "3 - Recognizing the first quadratic form in equation (2) and using equation (11), we get:\n",
+    "$$\n",
+    "\\frac{\\partial L}{\\partial P_{n|n-1}} \\frac{\\partial P_{n|n-1}}{\\partial T^T} = P_{n-1|n-1} T_n^T \\left(\\frac{\\partial L}{\\partial P_{n|n-1}}\\right)^T + P_{n-1|n-1}^T T_n^T \\frac{\\partial L}{\\partial P_{n|n-1}}\n",
+    "$$\n",
+    "4 - Transposing to obtain the dependency on T:\n",
+    "$$\n",
+    "\\frac{\\partial L}{\\partial P_{n|n-1}} \\frac{\\partial P_{n|n-1}}{\\partial T} = \\frac{\\partial L}{\\partial P_{n|n-1}} T_n P_{n-1|n-1}^T + \\left(\\frac{\\partial L}{\\partial P_{n|n-1}}\\right)^T T_n P_{n-1|n-1}\n",
+    "$$\n",
+    "5 - Finally, we have:\n",
+    "$$\n",
+    "\\frac{\\partial L}{\\partial T} = \\frac{\\partial L}{\\partial x_{n|n-1}} x_{n-1|n-1}^T + \\frac{\\partial L}{\\partial P_{n|n-1}} T_n P_{n-1|n-1}^T + \\left(\\frac{\\partial L}{\\partial P_{n|n-1}}\\right)^T T_n P_{n-1|n-1}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "9a560ed9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def grad_T(inp, out, out_grad):\n",
+    "    y, a, P, T, Z, H, Q = inp\n",
+    "    a_hat_grad, P_h_grad, y_grad = out_grad\n",
+    "\n",
+    "    y_hat = Z.dot(a)\n",
+    "    v = y - y_hat\n",
+    "\n",
+    "    PZT = P.dot(Z.T)\n",
+    "    F = Z.dot(PZT) + H\n",
+    "    F_inv = pt.linalg.inv(F)\n",
+    "\n",
+    "    K = PZT.dot(F_inv)\n",
+    "    I_KZ = pt.eye(a.shape[0]) - K.dot(Z)\n",
+    "\n",
+    "    v = v.dimshuffle(0, 'x')\n",
+    "    a = a.dimshuffle(0, 'x')\n",
+    "    a_hat_grad = a_hat_grad.dimshuffle(0, 'x')\n",
+    "\n",
+    "    a_filtered = a + K.dot(v)\n",
+    "    P_filtered = I_KZ @ P\n",
+    "\n",
+    "    return a_hat_grad @ a_filtered.T + P_h_grad @ T @ P_filtered.T + P_h_grad.T @ T @ P_filtered"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "bd458dee",
@@ -509,7 +563,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 11,
    "id": "afb362e5",
    "metadata": {},
    "outputs": [],
@@ -566,7 +620,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 259,
+   "execution_count": 12,
    "id": "7cead2c1",
    "metadata": {},
    "outputs": [],
@@ -576,7 +630,7 @@
     "kalman_step_op = OpFromGraph(\n",
     "    inputs=[y_sym, a0_sym, P0_sym, T_sym, Z_sym, H_sym, Q_sym],\n",
     "    outputs=kalman_step(y_sym, a0_sym, P0_sym, T_sym, Z_sym, H_sym, Q_sym),\n",
-    "    lop_overrides=[grad_y, grad_a_hat, grad_P_hat, None, None, grad_H, grad_Q],\n",
+    "    lop_overrides=[grad_y, grad_a_hat, grad_P_hat, grad_T, None, grad_H, grad_Q],\n",
     "    inline=True\n",
     ")\n",
     "\n",
@@ -604,7 +658,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 264,
+   "execution_count": 13,
    "id": "b6eb5d48",
    "metadata": {},
    "outputs": [],
@@ -665,7 +719,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 253,
+   "execution_count": 14,
    "id": "908946b0",
    "metadata": {},
    "outputs": [],
@@ -712,7 +766,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 15,
    "id": "a85fe92e",
    "metadata": {},
    "outputs": [],
@@ -769,7 +823,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 254,
+   "execution_count": 16,
    "id": "27a60fb3",
    "metadata": {},
    "outputs": [],
@@ -779,15 +833,15 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 257,
+   "execution_count": 17,
    "id": "a413c8e9",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "defaultdict(<class 'dict'>, {'exec_time': 0.016296579997288063})\n"
+      "defaultdict(<class 'dict'>, {'exec_time': 0.017576184973586352})\n"
      ]
     }
    ],
@@ -797,7 +851,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 260,
+   "execution_count": 18,
    "id": "d35b98d6",
    "metadata": {},
    "outputs": [],
@@ -807,15 +861,15 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 261,
+   "execution_count": 19,
    "id": "539c18c2",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "defaultdict(<class 'dict'>, {'exec_time': 0.015451419999590144})\n"
+      "defaultdict(<class 'dict'>, {'exec_time': 0.021262520016171044})\n"
      ]
     }
    ],
@@ -825,7 +879,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 267,
+   "execution_count": 20,
    "id": "1e633e75",
    "metadata": {},
    "outputs": [],
@@ -835,15 +889,15 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 268,
+   "execution_count": 21,
    "id": "7118dfec",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "defaultdict(<class 'dict'>, {'Forward pass': 0.00269013000652194, 'Backprop': 0.002321220003068447})\n"
+      "defaultdict(<class 'dict'>, {'Forward pass': 0.002400995010975749, 'Backprop': 0.0018996200058609247})\n"
      ]
     }
    ],
@@ -869,7 +923,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 274,
+   "execution_count": 22,
    "id": "fbae0189",
    "metadata": {},
    "outputs": [],
@@ -905,7 +959,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 278,
+   "execution_count": 23,
    "id": "c3a114b2",
    "metadata": {},
    "outputs": [
@@ -914,25 +968,25 @@
      "output_type": "stream",
      "text": [
       "Comparison between classic a0 gradient and our custom OpFromGraph : True\n",
-      "Comparison between classic a0 gradient and our handmaid NumPy backprop : True\n"
+      "Comparison between classic a0 gradient and our handmade NumPy backprop : True\n"
      ]
     }
    ],
    "source": [
     "print(\"Comparison between classic a0 gradient and our custom OpFromGraph :\", np.allclose(grad_a0,  grad_a0_op))\n",
-    "print(\"Comparison between classic a0 gradient and our handmaid NumPy backprop :\", np.allclose(grad_a0,  grad_a0_np))"
+    "print(\"Comparison between classic a0 gradient and our handmade NumPy backprop :\", np.allclose(grad_a0,  grad_a0_np))"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 279,
+   "execution_count": 24,
    "id": "867d5e2f",
    "metadata": {},
    "outputs": [],
    "source": [
     "# First the classic way with autodiff\n",
     "\n",
-    "grad_list = pt.grad(loss, [data_sym, a0_sym, P0_sym, H_sym, Q_sym])\n",
+    "grad_list = pt.grad(loss, [data_sym, a0_sym, P0_sym, T_sym, H_sym, Q_sym])\n",
     "f_grad = pytensor.function(\n",
     "    inputs=[data_sym, a0_sym, P0_sym, T_sym, Z_sym, H_sym, Q_sym],\n",
     "    outputs=grad_list,\n",
@@ -942,7 +996,7 @@
     "\n",
     "# Now using our OpFromGraph custom gradient\n",
     "\n",
-    "grad_list_op = pt.grad(loss_op, [data_sym, a0_sym, P0_sym, H_sym, Q_sym])\n",
+    "grad_list_op = pt.grad(loss_op, [data_sym, a0_sym, P0_sym, T_sym, H_sym, Q_sym])\n",
     "f_grad = pytensor.function(\n",
     "    inputs=[data_sym, a0_sym, P0_sym, T_sym, Z_sym, H_sym, Q_sym],\n",
     "    outputs=grad_list_op,\n",
@@ -953,7 +1007,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 289,
+   "execution_count": 25,
    "id": "25f0a57b",
    "metadata": {},
    "outputs": [
@@ -964,6 +1018,7 @@
       "Comparison between classic y gradient and our custom OpFromGraph : True\n",
       "Comparison between classic a0 gradient and our custom OpFromGraph : True\n",
       "Comparison between classic P0 gradient and our custom OpFromGraph : True\n",
+      "Comparison between classic T gradient and our custom OpFromGraph : True\n",
       "Comparison between classic H gradient and our custom OpFromGraph : True\n",
       "Comparison between classic Q gradient and our custom OpFromGraph : True\n"
      ]
@@ -973,8 +1028,9 @@
     "print(\"Comparison between classic y gradient and our custom OpFromGraph :\", np.allclose(grad_a0[0],  grad_a0_op[0]))\n",
     "print(\"Comparison between classic a0 gradient and our custom OpFromGraph :\", np.allclose(grad_a0[1],  grad_a0_op[1]))\n",
     "print(\"Comparison between classic P0 gradient and our custom OpFromGraph :\", np.allclose((grad_a0[2] + grad_a0[2].T)/2,  grad_a0_op[2]))\n",
-    "print(\"Comparison between classic H gradient and our custom OpFromGraph :\", np.allclose(grad_a0[3],  grad_a0_op[3]))\n",
-    "print(\"Comparison between classic Q gradient and our custom OpFromGraph :\", np.allclose((grad_a0[4] + grad_a0[4].T)/2,  grad_a0_op[4]))"
+    "print(\"Comparison between classic T gradient and our custom OpFromGraph :\", np.allclose(grad_a0[3],  grad_a0_op[3]))\n",
+    "print(\"Comparison between classic H gradient and our custom OpFromGraph :\", np.allclose(grad_a0[4],  grad_a0_op[4]))\n",
+    "print(\"Comparison between classic Q gradient and our custom OpFromGraph :\", np.allclose((grad_a0[5] + grad_a0[5].T)/2,  grad_a0_op[5]))"
    ]
   }
  ],