Files
2025-12-16 09:23:53 +08:00

1224 lines
45 KiB
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
{
"cells": [
{
"cell_type": "markdown",
"id": "7f91a115",
"metadata": {
"origin_pos": 0
},
"source": [
"# 微积分\n",
":label:`sec_calculus`\n",
"\n",
"在2500年前,古希腊人把一个多边形分成三角形,并把它们的面积相加,才找到计算多边形面积的方法。\n",
"为了求出曲线形状(比如圆)的面积,古希腊人在这样的形状上刻内接多边形。\n",
"如 :numref:`fig_circle_area`所示,内接多边形的等长边越多,就越接近圆。\n",
"这个过程也被称为*逼近法*method of exhaustion)。\n",
"\n",
"![用逼近法求圆的面积](../img/polygon-circle.svg)\n",
":label:`fig_circle_area`\n",
"\n",
"事实上,逼近法就是*积分*integral calculus)的起源。\n",
"2000多年后,微积分的另一支,*微分*differential calculus)被发明出来。\n",
"在微分学最重要的应用是优化问题,即考虑如何把事情做到最好。\n",
"正如在 :numref:`subsec_norms_and_objectives`中讨论的那样,\n",
"这种问题在深度学习中是无处不在的。\n",
"\n",
"在深度学习中,我们“训练”模型,不断更新它们,使它们在看到越来越多的数据时变得越来越好。\n",
"通常情况下,变得更好意味着最小化一个*损失函数*(loss function),\n",
"即一个衡量“模型有多糟糕”这个问题的分数。\n",
"最终,我们真正关心的是生成一个模型,它能够在从未见过的数据上表现良好。\n",
"但“训练”模型只能将模型与我们实际能看到的数据相拟合。\n",
"因此,我们可以将拟合模型的任务分解为两个关键问题:\n",
"\n",
"* *优化*optimization):用模型拟合观测数据的过程;\n",
"* *泛化*generalization):数学原理和实践者的智慧,能够指导我们生成出有效性超出用于训练的数据集本身的模型。\n",
"\n",
"为了帮助读者在后面的章节中更好地理解优化问题和方法,\n",
"本节提供了一个非常简短的入门教程,帮助读者快速掌握深度学习中常用的微分知识。\n",
"\n",
"## 导数和微分\n",
"\n",
"我们首先讨论导数的计算,这是几乎所有深度学习优化算法的关键步骤。\n",
"在深度学习中,我们通常选择对于模型参数可微的损失函数。\n",
"简而言之,对于每个参数,\n",
"如果我们把这个参数*增加*或*减少*一个无穷小的量,可以知道损失会以多快的速度增加或减少,\n",
"\n",
"假设我们有一个函数$f: \\mathbb{R} \\rightarrow \\mathbb{R}$,其输入和输出都是标量。\n",
"(**如果$f$的*导数*存在,这个极限被定义为**)\n",
"\n",
"(**$$f'(x) = \\lim_{h \\rightarrow 0} \\frac{f(x+h) - f(x)}{h}.$$**)\n",
":eqlabel:`eq_derivative`\n",
"\n",
"如果$f'(a)$存在,则称$f$在$a$处是*可微*differentiable)的。\n",
"如果$f$在一个区间内的每个数上都是可微的,则此函数在此区间中是可微的。\n",
"我们可以将 :eqref:`eq_derivative`中的导数$f'(x)$解释为$f(x)$相对于$x$的*瞬时*instantaneous)变化率。\n",
"所谓的瞬时变化率是基于$x$中的变化$h$,且$h$接近$0$。\n",
"\n",
"为了更好地解释导数,让我们做一个实验。\n",
"(**定义$u=f(x)=3x^2-4x$**)如下:\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "02d617cb",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:25.065994Z",
"iopub.status.busy": "2023-08-18T07:01:25.065245Z",
"iopub.status.idle": "2023-08-18T07:01:27.381378Z",
"shell.execute_reply": "2023-08-18T07:01:27.380233Z"
},
"origin_pos": 2,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import numpy as np\n",
"from matplotlib_inline import backend_inline\n",
"from d2l import torch as d2l\n",
"\n",
"\n",
"def f(x):\n",
" return 3 * x ** 2 - 4 * x"
]
},
{
"cell_type": "markdown",
"id": "60a0915b",
"metadata": {
"origin_pos": 5
},
"source": [
"[**通过令$x=1$并让$h$接近$0$**] :eqref:`eq_derivative`中(**$\\frac{f(x+h)-f(x)}{h}$的数值结果接近$2$**)。\n",
"虽然这个实验不是一个数学证明,但稍后会看到,当$x=1$时,导数$u'$是$2$。\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "39cf9942",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.387542Z",
"iopub.status.busy": "2023-08-18T07:01:27.386582Z",
"iopub.status.idle": "2023-08-18T07:01:27.394057Z",
"shell.execute_reply": "2023-08-18T07:01:27.393090Z"
},
"origin_pos": 6,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"h=0.10000, numerical limit=2.30000\n",
"h=0.01000, numerical limit=2.03000\n",
"h=0.00100, numerical limit=2.00300\n",
"h=0.00010, numerical limit=2.00030\n",
"h=0.00001, numerical limit=2.00003\n"
]
}
],
"source": [
"def numerical_lim(f, x, h):\n",
" return (f(x + h) - f(x)) / h\n",
"\n",
"h = 0.1\n",
"for i in range(5):\n",
" print(f'h={h:.5f}, numerical limit={numerical_lim(f, 1, h):.5f}')\n",
" h *= 0.1"
]
},
{
"cell_type": "markdown",
"id": "ea011f86",
"metadata": {
"origin_pos": 7
},
"source": [
"让我们熟悉一下导数的几个等价符号。\n",
"给定$y=f(x)$,其中$x$和$y$分别是函数$f$的自变量和因变量。以下表达式是等价的:\n",
"\n",
"$$f'(x) = y' = \\frac{dy}{dx} = \\frac{df}{dx} = \\frac{d}{dx} f(x) = Df(x) = D_x f(x),$$\n",
"\n",
"其中符号$\\frac{d}{dx}$和$D$是*微分运算符*,表示*微分*操作。\n",
"我们可以使用以下规则来对常见函数求微分:\n",
"\n",
"* $DC = 0$$C$是一个常数)\n",
"* $Dx^n = nx^{n-1}$*幂律*power rule),$n$是任意实数)\n",
"* $De^x = e^x$\n",
"* $D\\ln(x) = 1/x$\n",
"\n",
"为了微分一个由一些常见函数组成的函数,下面的一些法则方便使用。\n",
"假设函数$f$和$g$都是可微的,$C$是一个常数,则:\n",
"\n",
"*常数相乘法则*\n",
"$$\\frac{d}{dx} [Cf(x)] = C \\frac{d}{dx} f(x),$$\n",
"\n",
"*加法法则*\n",
"\n",
"$$\\frac{d}{dx} [f(x) + g(x)] = \\frac{d}{dx} f(x) + \\frac{d}{dx} g(x),$$\n",
"\n",
"*乘法法则*\n",
"\n",
"$$\\frac{d}{dx} [f(x)g(x)] = f(x) \\frac{d}{dx} [g(x)] + g(x) \\frac{d}{dx} [f(x)],$$\n",
"\n",
"*除法法则*\n",
"\n",
"$$\\frac{d}{dx} \\left[\\frac{f(x)}{g(x)}\\right] = \\frac{g(x) \\frac{d}{dx} [f(x)] - f(x) \\frac{d}{dx} [g(x)]}{[g(x)]^2}.$$\n",
"\n",
"现在我们可以应用上述几个法则来计算$u'=f'(x)=3\\frac{d}{dx}x^2-4\\frac{d}{dx}x=6x-4$。\n",
"令$x=1$,我们有$u'=2$:在这个实验中,数值结果接近$2$,\n",
"这一点得到了在本节前面的实验的支持。\n",
"当$x=1$时,此导数也是曲线$u=f(x)$切线的斜率。\n",
"\n",
"[**为了对导数的这种解释进行可视化,我们将使用`matplotlib`**]\n",
"这是一个Python中流行的绘图库。\n",
"要配置`matplotlib`生成图形的属性,我们需要(**定义几个函数**)。\n",
"在下面,`use_svg_display`函数指定`matplotlib`软件包输出svg图表以获得更清晰的图像。\n",
"\n",
"注意,注释`#@save`是一个特殊的标记,会将对应的函数、类或语句保存在`d2l`包中。\n",
"因此,以后无须重新定义就可以直接调用它们(例如,`d2l.use_svg_display()`)。\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a0efe8c9",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.399278Z",
"iopub.status.busy": "2023-08-18T07:01:27.398487Z",
"iopub.status.idle": "2023-08-18T07:01:27.403514Z",
"shell.execute_reply": "2023-08-18T07:01:27.402414Z"
},
"origin_pos": 8,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def use_svg_display(): #@save\n",
" \"\"\"使用svg格式在Jupyter中显示绘图\"\"\"\n",
" backend_inline.set_matplotlib_formats('svg')"
]
},
{
"cell_type": "markdown",
"id": "8b1650c6",
"metadata": {
"origin_pos": 9
},
"source": [
"我们定义`set_figsize`函数来设置图表大小。\n",
"注意,这里可以直接使用`d2l.plt`,因为导入语句\n",
"`from matplotlib import pyplot as plt`已标记为保存到`d2l`包中。\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "acef7e22",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.408462Z",
"iopub.status.busy": "2023-08-18T07:01:27.407659Z",
"iopub.status.idle": "2023-08-18T07:01:27.414090Z",
"shell.execute_reply": "2023-08-18T07:01:27.412718Z"
},
"origin_pos": 10,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def set_figsize(figsize=(3.5, 2.5)): #@save\n",
" \"\"\"设置matplotlib的图表大小\"\"\"\n",
" use_svg_display()\n",
" d2l.plt.rcParams['figure.figsize'] = figsize"
]
},
{
"cell_type": "markdown",
"id": "71a62720",
"metadata": {
"origin_pos": 11
},
"source": [
"下面的`set_axes`函数用于设置由`matplotlib`生成图表的轴的属性。\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0ad890f8",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.419140Z",
"iopub.status.busy": "2023-08-18T07:01:27.418455Z",
"iopub.status.idle": "2023-08-18T07:01:27.426061Z",
"shell.execute_reply": "2023-08-18T07:01:27.424739Z"
},
"origin_pos": 12,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"#@save\n",
"def set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend):\n",
" \"\"\"设置matplotlib的轴\"\"\"\n",
" axes.set_xlabel(xlabel)\n",
" axes.set_ylabel(ylabel)\n",
" axes.set_xscale(xscale)\n",
" axes.set_yscale(yscale)\n",
" axes.set_xlim(xlim)\n",
" axes.set_ylim(ylim)\n",
" if legend:\n",
" axes.legend(legend)\n",
" axes.grid()"
]
},
{
"cell_type": "markdown",
"id": "30e5a1f9",
"metadata": {
"origin_pos": 13
},
"source": [
"通过这三个用于图形配置的函数,定义一个`plot`函数来简洁地绘制多条曲线,\n",
"因为我们需要在整个书中可视化许多曲线。\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "00c43fac",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.431229Z",
"iopub.status.busy": "2023-08-18T07:01:27.430462Z",
"iopub.status.idle": "2023-08-18T07:01:27.441418Z",
"shell.execute_reply": "2023-08-18T07:01:27.440390Z"
},
"origin_pos": 14,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"#@save\n",
"def plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None,\n",
" ylim=None, xscale='linear', yscale='linear',\n",
" fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None):\n",
" \"\"\"绘制数据点\"\"\"\n",
" if legend is None:\n",
" legend = []\n",
"\n",
" set_figsize(figsize)\n",
" axes = axes if axes else d2l.plt.gca()\n",
"\n",
" # 如果X有一个轴,输出True\n",
" def has_one_axis(X):\n",
" return (hasattr(X, \"ndim\") and X.ndim == 1 or isinstance(X, list)\n",
" and not hasattr(X[0], \"__len__\"))\n",
"\n",
" if has_one_axis(X):\n",
" X = [X]\n",
" if Y is None:\n",
" X, Y = [[]] * len(X), X\n",
" elif has_one_axis(Y):\n",
" Y = [Y]\n",
" if len(X) != len(Y):\n",
" X = X * len(Y)\n",
" axes.cla()\n",
" for x, y, fmt in zip(X, Y, fmts):\n",
" if len(x):\n",
" axes.plot(x, y, fmt)\n",
" else:\n",
" axes.plot(y, fmt)\n",
" set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)"
]
},
{
"cell_type": "markdown",
"id": "d9dbfa1f",
"metadata": {
"origin_pos": 15
},
"source": [
"现在我们可以[**绘制函数$u=f(x)$及其在$x=1$处的切线$y=2x-3$**]\n",
"其中系数$2$是切线的斜率。\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f09a2c12",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.445931Z",
"iopub.status.busy": "2023-08-18T07:01:27.445122Z",
"iopub.status.idle": "2023-08-18T07:01:27.699931Z",
"shell.execute_reply": "2023-08-18T07:01:27.698662Z"
},
"origin_pos": 16,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"243.529359pt\" height=\"180.65625pt\" viewBox=\"0 0 243.529359 180.65625\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n",
" <metadata>\n",
" <rdf:RDF xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n",
" <cc:Work>\n",
" <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\n",
" <dc:date>2023-08-18T07:01:27.649475</dc:date>\n",
" <dc:format>image/svg+xml</dc:format>\n",
" <dc:creator>\n",
" <cc:Agent>\n",
" <dc:title>Matplotlib v3.5.1, https://matplotlib.org/</dc:title>\n",
" </cc:Agent>\n",
" </dc:creator>\n",
" </cc:Work>\n",
" </rdf:RDF>\n",
" </metadata>\n",
" <defs>\n",
" <style type=\"text/css\">*{stroke-linejoin: round; stroke-linecap: butt}</style>\n",
" </defs>\n",
" <g id=\"figure_1\">\n",
" <g id=\"patch_1\">\n",
" <path d=\"M 0 180.65625 \n",
"L 243.529359 180.65625 \n",
"L 243.529359 0 \n",
"L 0 0 \n",
"L 0 180.65625 \n",
"z\n",
"\" style=\"fill: none\"/>\n",
" </g>\n",
" <g id=\"axes_1\">\n",
" <g id=\"patch_2\">\n",
" <path d=\"M 40.603125 143.1 \n",
"L 235.903125 143.1 \n",
"L 235.903125 7.2 \n",
"L 40.603125 7.2 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"matplotlib.axis_1\">\n",
" <g id=\"xtick_1\">\n",
" <g id=\"line2d_1\">\n",
" <path d=\"M 49.480398 143.1 \n",
"L 49.480398 7.2 \n",
"\" clip-path=\"url(#pb34a262701)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_2\">\n",
" <defs>\n",
" <path id=\"meab27cbf96\" d=\"M 0 0 \n",
"L 0 3.5 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#meab27cbf96\" x=\"49.480398\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_1\">\n",
" <!-- 0 -->\n",
" <g transform=\"translate(46.299148 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-30\" d=\"M 2034 4250 \n",
"Q 1547 4250 1301 3770 \n",
"Q 1056 3291 1056 2328 \n",
"Q 1056 1369 1301 889 \n",
"Q 1547 409 2034 409 \n",
"Q 2525 409 2770 889 \n",
"Q 3016 1369 3016 2328 \n",
"Q 3016 3291 2770 3770 \n",
"Q 2525 4250 2034 4250 \n",
"z\n",
"M 2034 4750 \n",
"Q 2819 4750 3233 4129 \n",
"Q 3647 3509 3647 2328 \n",
"Q 3647 1150 3233 529 \n",
"Q 2819 -91 2034 -91 \n",
"Q 1250 -91 836 529 \n",
"Q 422 1150 422 2328 \n",
"Q 422 3509 836 4129 \n",
"Q 1250 4750 2034 4750 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_2\">\n",
" <g id=\"line2d_3\">\n",
" <path d=\"M 110.702968 143.1 \n",
"L 110.702968 7.2 \n",
"\" clip-path=\"url(#pb34a262701)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_4\">\n",
" <g>\n",
" <use xlink:href=\"#meab27cbf96\" x=\"110.702968\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_2\">\n",
" <!-- 1 -->\n",
" <g transform=\"translate(107.521718 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-31\" d=\"M 794 531 \n",
"L 1825 531 \n",
"L 1825 4091 \n",
"L 703 3866 \n",
"L 703 4441 \n",
"L 1819 4666 \n",
"L 2450 4666 \n",
"L 2450 531 \n",
"L 3481 531 \n",
"L 3481 0 \n",
"L 794 0 \n",
"L 794 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-31\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_3\">\n",
" <g id=\"line2d_5\">\n",
" <path d=\"M 171.925539 143.1 \n",
"L 171.925539 7.2 \n",
"\" clip-path=\"url(#pb34a262701)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_6\">\n",
" <g>\n",
" <use xlink:href=\"#meab27cbf96\" x=\"171.925539\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_3\">\n",
" <!-- 2 -->\n",
" <g transform=\"translate(168.744289 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-32\" d=\"M 1228 531 \n",
"L 3431 531 \n",
"L 3431 0 \n",
"L 469 0 \n",
"L 469 531 \n",
"Q 828 903 1448 1529 \n",
"Q 2069 2156 2228 2338 \n",
"Q 2531 2678 2651 2914 \n",
"Q 2772 3150 2772 3378 \n",
"Q 2772 3750 2511 3984 \n",
"Q 2250 4219 1831 4219 \n",
"Q 1534 4219 1204 4116 \n",
"Q 875 4013 500 3803 \n",
"L 500 4441 \n",
"Q 881 4594 1212 4672 \n",
"Q 1544 4750 1819 4750 \n",
"Q 2544 4750 2975 4387 \n",
"Q 3406 4025 3406 3419 \n",
"Q 3406 3131 3298 2873 \n",
"Q 3191 2616 2906 2266 \n",
"Q 2828 2175 2409 1742 \n",
"Q 1991 1309 1228 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-32\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_4\">\n",
" <g id=\"line2d_7\">\n",
" <path d=\"M 233.148109 143.1 \n",
"L 233.148109 7.2 \n",
"\" clip-path=\"url(#pb34a262701)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_8\">\n",
" <g>\n",
" <use xlink:href=\"#meab27cbf96\" x=\"233.148109\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_4\">\n",
" <!-- 3 -->\n",
" <g transform=\"translate(229.966859 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-33\" d=\"M 2597 2516 \n",
"Q 3050 2419 3304 2112 \n",
"Q 3559 1806 3559 1356 \n",
"Q 3559 666 3084 287 \n",
"Q 2609 -91 1734 -91 \n",
"Q 1441 -91 1130 -33 \n",
"Q 819 25 488 141 \n",
"L 488 750 \n",
"Q 750 597 1062 519 \n",
"Q 1375 441 1716 441 \n",
"Q 2309 441 2620 675 \n",
"Q 2931 909 2931 1356 \n",
"Q 2931 1769 2642 2001 \n",
"Q 2353 2234 1838 2234 \n",
"L 1294 2234 \n",
"L 1294 2753 \n",
"L 1863 2753 \n",
"Q 2328 2753 2575 2939 \n",
"Q 2822 3125 2822 3475 \n",
"Q 2822 3834 2567 4026 \n",
"Q 2313 4219 1838 4219 \n",
"Q 1578 4219 1281 4162 \n",
"Q 984 4106 628 3988 \n",
"L 628 4550 \n",
"Q 988 4650 1302 4700 \n",
"Q 1616 4750 1894 4750 \n",
"Q 2613 4750 3031 4423 \n",
"Q 3450 4097 3450 3541 \n",
"Q 3450 3153 3228 2886 \n",
"Q 3006 2619 2597 2516 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-33\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_5\">\n",
" <!-- x -->\n",
" <g transform=\"translate(135.29375 171.376563)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-78\" d=\"M 3513 3500 \n",
"L 2247 1797 \n",
"L 3578 0 \n",
"L 2900 0 \n",
"L 1881 1375 \n",
"L 863 0 \n",
"L 184 0 \n",
"L 1544 1831 \n",
"L 300 3500 \n",
"L 978 3500 \n",
"L 1906 2253 \n",
"L 2834 3500 \n",
"L 3513 3500 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-78\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"matplotlib.axis_2\">\n",
" <g id=\"ytick_1\">\n",
" <g id=\"line2d_9\">\n",
" <path d=\"M 40.603125 114.635514 \n",
"L 235.903125 114.635514 \n",
"\" clip-path=\"url(#pb34a262701)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_10\">\n",
" <defs>\n",
" <path id=\"m629c214856\" d=\"M 0 0 \n",
"L -3.5 0 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#m629c214856\" x=\"40.603125\" y=\"114.635514\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_6\">\n",
" <!-- 0 -->\n",
" <g transform=\"translate(27.240625 118.434732)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_2\">\n",
" <g id=\"line2d_11\">\n",
" <path d=\"M 40.603125 77.490157 \n",
"L 235.903125 77.490157 \n",
"\" clip-path=\"url(#pb34a262701)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_12\">\n",
" <g>\n",
" <use xlink:href=\"#m629c214856\" x=\"40.603125\" y=\"77.490157\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_7\">\n",
" <!-- 5 -->\n",
" <g transform=\"translate(27.240625 81.289376)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-35\" d=\"M 691 4666 \n",
"L 3169 4666 \n",
"L 3169 4134 \n",
"L 1269 4134 \n",
"L 1269 2991 \n",
"Q 1406 3038 1543 3061 \n",
"Q 1681 3084 1819 3084 \n",
"Q 2600 3084 3056 2656 \n",
"Q 3513 2228 3513 1497 \n",
"Q 3513 744 3044 326 \n",
"Q 2575 -91 1722 -91 \n",
"Q 1428 -91 1123 -41 \n",
"Q 819 9 494 109 \n",
"L 494 744 \n",
"Q 775 591 1075 516 \n",
"Q 1375 441 1709 441 \n",
"Q 2250 441 2565 725 \n",
"Q 2881 1009 2881 1497 \n",
"Q 2881 1984 2565 2268 \n",
"Q 2250 2553 1709 2553 \n",
"Q 1456 2553 1204 2497 \n",
"Q 953 2441 691 2322 \n",
"L 691 4666 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-35\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_3\">\n",
" <g id=\"line2d_13\">\n",
" <path d=\"M 40.603125 40.344801 \n",
"L 235.903125 40.344801 \n",
"\" clip-path=\"url(#pb34a262701)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_14\">\n",
" <g>\n",
" <use xlink:href=\"#m629c214856\" x=\"40.603125\" y=\"40.344801\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_8\">\n",
" <!-- 10 -->\n",
" <g transform=\"translate(20.878125 44.14402)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"63.623047\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_9\">\n",
" <!-- f(x) -->\n",
" <g transform=\"translate(14.798437 83.771094)rotate(-90)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-66\" d=\"M 2375 4863 \n",
"L 2375 4384 \n",
"L 1825 4384 \n",
"Q 1516 4384 1395 4259 \n",
"Q 1275 4134 1275 3809 \n",
"L 1275 3500 \n",
"L 2222 3500 \n",
"L 2222 3053 \n",
"L 1275 3053 \n",
"L 1275 0 \n",
"L 697 0 \n",
"L 697 3053 \n",
"L 147 3053 \n",
"L 147 3500 \n",
"L 697 3500 \n",
"L 697 3744 \n",
"Q 697 4328 969 4595 \n",
"Q 1241 4863 1831 4863 \n",
"L 2375 4863 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-28\" d=\"M 1984 4856 \n",
"Q 1566 4138 1362 3434 \n",
"Q 1159 2731 1159 2009 \n",
"Q 1159 1288 1364 580 \n",
"Q 1569 -128 1984 -844 \n",
"L 1484 -844 \n",
"Q 1016 -109 783 600 \n",
"Q 550 1309 550 2009 \n",
"Q 550 2706 781 3412 \n",
"Q 1013 4119 1484 4856 \n",
"L 1984 4856 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-29\" d=\"M 513 4856 \n",
"L 1013 4856 \n",
"Q 1481 4119 1714 3412 \n",
"Q 1947 2706 1947 2009 \n",
"Q 1947 1309 1714 600 \n",
"Q 1481 -109 1013 -844 \n",
"L 513 -844 \n",
"Q 928 -128 1133 580 \n",
"Q 1338 1288 1338 2009 \n",
"Q 1338 2731 1133 3434 \n",
"Q 928 4138 513 4856 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-66\"/>\n",
" <use xlink:href=\"#DejaVuSans-28\" x=\"35.205078\"/>\n",
" <use xlink:href=\"#DejaVuSans-78\" x=\"74.21875\"/>\n",
" <use xlink:href=\"#DejaVuSans-29\" x=\"133.398438\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_15\">\n",
" <path d=\"M 49.480398 114.635514 \n",
"L 55.602655 117.38427 \n",
"L 61.724912 119.687282 \n",
"L 67.847169 121.54455 \n",
"L 73.969426 122.956073 \n",
"L 80.091683 123.921853 \n",
"L 86.21394 124.441888 \n",
"L 92.336197 124.516178 \n",
"L 98.458454 124.144725 \n",
"L 104.580711 123.327527 \n",
"L 110.702968 122.064585 \n",
"L 116.825225 120.355898 \n",
"L 122.947482 118.201468 \n",
"L 129.069739 115.601293 \n",
"L 135.191996 112.555374 \n",
"L 141.314254 109.06371 \n",
"L 147.436511 105.126302 \n",
"L 153.558768 100.74315 \n",
"L 159.681025 95.914254 \n",
"L 165.803282 90.639614 \n",
"L 171.925539 84.919229 \n",
"L 178.047796 78.7531 \n",
"L 184.170053 72.141226 \n",
"L 190.29231 65.083608 \n",
"L 196.414567 57.580247 \n",
"L 202.536824 49.63114 \n",
"L 208.659081 41.23629 \n",
"L 214.781338 32.395695 \n",
"L 220.903595 23.109356 \n",
"L 227.025852 13.377273 \n",
"\" clip-path=\"url(#pb34a262701)\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_16\">\n",
" <path d=\"M 49.480398 136.922727 \n",
"L 55.602655 135.436913 \n",
"L 61.724912 133.951099 \n",
"L 67.847169 132.465285 \n",
"L 73.969426 130.97947 \n",
"L 80.091683 129.493656 \n",
"L 86.21394 128.007842 \n",
"L 92.336197 126.522028 \n",
"L 98.458454 125.036213 \n",
"L 104.580711 123.550399 \n",
"L 110.702968 122.064585 \n",
"L 116.825225 120.578771 \n",
"L 122.947482 119.092956 \n",
"L 129.069739 117.607142 \n",
"L 135.191996 116.121328 \n",
"L 141.314254 114.635514 \n",
"L 147.436511 113.149699 \n",
"L 153.558768 111.663885 \n",
"L 159.681025 110.178071 \n",
"L 165.803282 108.692257 \n",
"L 171.925539 107.206442 \n",
"L 178.047796 105.720628 \n",
"L 184.170053 104.234814 \n",
"L 190.29231 102.749 \n",
"L 196.414567 101.263185 \n",
"L 202.536824 99.777371 \n",
"L 208.659081 98.291557 \n",
"L 214.781338 96.805743 \n",
"L 220.903595 95.319928 \n",
"L 227.025852 93.834114 \n",
"\" clip-path=\"url(#pb34a262701)\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"patch_3\">\n",
" <path d=\"M 40.603125 143.1 \n",
"L 40.603125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_4\">\n",
" <path d=\"M 235.903125 143.1 \n",
"L 235.903125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_5\">\n",
" <path d=\"M 40.603125 143.1 \n",
"L 235.903125 143.1 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_6\">\n",
" <path d=\"M 40.603125 7.2 \n",
"L 235.903125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"legend_1\">\n",
" <g id=\"patch_7\">\n",
" <path d=\"M 47.603125 44.55625 \n",
"L 172.153125 44.55625 \n",
"Q 174.153125 44.55625 174.153125 42.55625 \n",
"L 174.153125 14.2 \n",
"Q 174.153125 12.2 172.153125 12.2 \n",
"L 47.603125 12.2 \n",
"Q 45.603125 12.2 45.603125 14.2 \n",
"L 45.603125 42.55625 \n",
"Q 45.603125 44.55625 47.603125 44.55625 \n",
"z\n",
"\" style=\"fill: #ffffff; opacity: 0.8; stroke: #cccccc; stroke-linejoin: miter\"/>\n",
" </g>\n",
" <g id=\"line2d_17\">\n",
" <path d=\"M 49.603125 20.298437 \n",
"L 59.603125 20.298437 \n",
"L 69.603125 20.298437 \n",
"\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"text_10\">\n",
" <!-- f(x) -->\n",
" <g transform=\"translate(77.603125 23.798437)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-66\"/>\n",
" <use xlink:href=\"#DejaVuSans-28\" x=\"35.205078\"/>\n",
" <use xlink:href=\"#DejaVuSans-78\" x=\"74.21875\"/>\n",
" <use xlink:href=\"#DejaVuSans-29\" x=\"133.398438\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_18\">\n",
" <path d=\"M 49.603125 34.976562 \n",
"L 59.603125 34.976562 \n",
"L 69.603125 34.976562 \n",
"\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"text_11\">\n",
" <!-- Tangent line (x=1) -->\n",
" <g transform=\"translate(77.603125 38.476562)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-54\" d=\"M -19 4666 \n",
"L 3928 4666 \n",
"L 3928 4134 \n",
"L 2272 4134 \n",
"L 2272 0 \n",
"L 1638 0 \n",
"L 1638 4134 \n",
"L -19 4134 \n",
"L -19 4666 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-61\" d=\"M 2194 1759 \n",
"Q 1497 1759 1228 1600 \n",
"Q 959 1441 959 1056 \n",
"Q 959 750 1161 570 \n",
"Q 1363 391 1709 391 \n",
"Q 2188 391 2477 730 \n",
"Q 2766 1069 2766 1631 \n",
"L 2766 1759 \n",
"L 2194 1759 \n",
"z\n",
"M 3341 1997 \n",
"L 3341 0 \n",
"L 2766 0 \n",
"L 2766 531 \n",
"Q 2569 213 2275 61 \n",
"Q 1981 -91 1556 -91 \n",
"Q 1019 -91 701 211 \n",
"Q 384 513 384 1019 \n",
"Q 384 1609 779 1909 \n",
"Q 1175 2209 1959 2209 \n",
"L 2766 2209 \n",
"L 2766 2266 \n",
"Q 2766 2663 2505 2880 \n",
"Q 2244 3097 1772 3097 \n",
"Q 1472 3097 1187 3025 \n",
"Q 903 2953 641 2809 \n",
"L 641 3341 \n",
"Q 956 3463 1253 3523 \n",
"Q 1550 3584 1831 3584 \n",
"Q 2591 3584 2966 3190 \n",
"Q 3341 2797 3341 1997 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6e\" d=\"M 3513 2113 \n",
"L 3513 0 \n",
"L 2938 0 \n",
"L 2938 2094 \n",
"Q 2938 2591 2744 2837 \n",
"Q 2550 3084 2163 3084 \n",
"Q 1697 3084 1428 2787 \n",
"Q 1159 2491 1159 1978 \n",
"L 1159 0 \n",
"L 581 0 \n",
"L 581 3500 \n",
"L 1159 3500 \n",
"L 1159 2956 \n",
"Q 1366 3272 1645 3428 \n",
"Q 1925 3584 2291 3584 \n",
"Q 2894 3584 3203 3211 \n",
"Q 3513 2838 3513 2113 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-67\" d=\"M 2906 1791 \n",
"Q 2906 2416 2648 2759 \n",
"Q 2391 3103 1925 3103 \n",
"Q 1463 3103 1205 2759 \n",
"Q 947 2416 947 1791 \n",
"Q 947 1169 1205 825 \n",
"Q 1463 481 1925 481 \n",
"Q 2391 481 2648 825 \n",
"Q 2906 1169 2906 1791 \n",
"z\n",
"M 3481 434 \n",
"Q 3481 -459 3084 -895 \n",
"Q 2688 -1331 1869 -1331 \n",
"Q 1566 -1331 1297 -1286 \n",
"Q 1028 -1241 775 -1147 \n",
"L 775 -588 \n",
"Q 1028 -725 1275 -790 \n",
"Q 1522 -856 1778 -856 \n",
"Q 2344 -856 2625 -561 \n",
"Q 2906 -266 2906 331 \n",
"L 2906 616 \n",
"Q 2728 306 2450 153 \n",
"Q 2172 0 1784 0 \n",
"Q 1141 0 747 490 \n",
"Q 353 981 353 1791 \n",
"Q 353 2603 747 3093 \n",
"Q 1141 3584 1784 3584 \n",
"Q 2172 3584 2450 3431 \n",
"Q 2728 3278 2906 2969 \n",
"L 2906 3500 \n",
"L 3481 3500 \n",
"L 3481 434 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-65\" d=\"M 3597 1894 \n",
"L 3597 1613 \n",
"L 953 1613 \n",
"Q 991 1019 1311 708 \n",
"Q 1631 397 2203 397 \n",
"Q 2534 397 2845 478 \n",
"Q 3156 559 3463 722 \n",
"L 3463 178 \n",
"Q 3153 47 2828 -22 \n",
"Q 2503 -91 2169 -91 \n",
"Q 1331 -91 842 396 \n",
"Q 353 884 353 1716 \n",
"Q 353 2575 817 3079 \n",
"Q 1281 3584 2069 3584 \n",
"Q 2775 3584 3186 3129 \n",
"Q 3597 2675 3597 1894 \n",
"z\n",
"M 3022 2063 \n",
"Q 3016 2534 2758 2815 \n",
"Q 2500 3097 2075 3097 \n",
"Q 1594 3097 1305 2825 \n",
"Q 1016 2553 972 2059 \n",
"L 3022 2063 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-74\" d=\"M 1172 4494 \n",
"L 1172 3500 \n",
"L 2356 3500 \n",
"L 2356 3053 \n",
"L 1172 3053 \n",
"L 1172 1153 \n",
"Q 1172 725 1289 603 \n",
"Q 1406 481 1766 481 \n",
"L 2356 481 \n",
"L 2356 0 \n",
"L 1766 0 \n",
"Q 1100 0 847 248 \n",
"Q 594 497 594 1153 \n",
"L 594 3053 \n",
"L 172 3053 \n",
"L 172 3500 \n",
"L 594 3500 \n",
"L 594 4494 \n",
"L 1172 4494 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-20\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6c\" d=\"M 603 4863 \n",
"L 1178 4863 \n",
"L 1178 0 \n",
"L 603 0 \n",
"L 603 4863 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-69\" d=\"M 603 3500 \n",
"L 1178 3500 \n",
"L 1178 0 \n",
"L 603 0 \n",
"L 603 3500 \n",
"z\n",
"M 603 4863 \n",
"L 1178 4863 \n",
"L 1178 4134 \n",
"L 603 4134 \n",
"L 603 4863 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-3d\" d=\"M 678 2906 \n",
"L 4684 2906 \n",
"L 4684 2381 \n",
"L 678 2381 \n",
"L 678 2906 \n",
"z\n",
"M 678 1631 \n",
"L 4684 1631 \n",
"L 4684 1100 \n",
"L 678 1100 \n",
"L 678 1631 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-54\"/>\n",
" <use xlink:href=\"#DejaVuSans-61\" x=\"44.583984\"/>\n",
" <use xlink:href=\"#DejaVuSans-6e\" x=\"105.863281\"/>\n",
" <use xlink:href=\"#DejaVuSans-67\" x=\"169.242188\"/>\n",
" <use xlink:href=\"#DejaVuSans-65\" x=\"232.71875\"/>\n",
" <use xlink:href=\"#DejaVuSans-6e\" x=\"294.242188\"/>\n",
" <use xlink:href=\"#DejaVuSans-74\" x=\"357.621094\"/>\n",
" <use xlink:href=\"#DejaVuSans-20\" x=\"396.830078\"/>\n",
" <use xlink:href=\"#DejaVuSans-6c\" x=\"428.617188\"/>\n",
" <use xlink:href=\"#DejaVuSans-69\" x=\"456.400391\"/>\n",
" <use xlink:href=\"#DejaVuSans-6e\" x=\"484.183594\"/>\n",
" <use xlink:href=\"#DejaVuSans-65\" x=\"547.5625\"/>\n",
" <use xlink:href=\"#DejaVuSans-20\" x=\"609.085938\"/>\n",
" <use xlink:href=\"#DejaVuSans-28\" x=\"640.873047\"/>\n",
" <use xlink:href=\"#DejaVuSans-78\" x=\"679.886719\"/>\n",
" <use xlink:href=\"#DejaVuSans-3d\" x=\"739.066406\"/>\n",
" <use xlink:href=\"#DejaVuSans-31\" x=\"822.855469\"/>\n",
" <use xlink:href=\"#DejaVuSans-29\" x=\"886.478516\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <defs>\n",
" <clipPath id=\"pb34a262701\">\n",
" <rect x=\"40.603125\" y=\"7.2\" width=\"195.3\" height=\"135.9\"/>\n",
" </clipPath>\n",
" </defs>\n",
"</svg>\n"
],
"text/plain": [
"<Figure size 252x180 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"x = np.arange(0, 3, 0.1)\n",
"plot(x, [f(x), 2 * x - 3], 'x', 'f(x)', legend=['f(x)', 'Tangent line (x=1)'])"
]
},
{
"cell_type": "markdown",
"id": "c292a783",
"metadata": {
"origin_pos": 17
},
"source": [
"## 偏导数\n",
"\n",
"到目前为止,我们只讨论了仅含一个变量的函数的微分。\n",
"在深度学习中,函数通常依赖于许多变量。\n",
"因此,我们需要将微分的思想推广到*多元函数*multivariate function)上。\n",
"\n",
"设$y = f(x_1, x_2, \\ldots, x_n)$是一个具有$n$个变量的函数。\n",
"$y$关于第$i$个参数$x_i$的*偏导数*partial derivative)为:\n",
"\n",
"$$ \\frac{\\partial y}{\\partial x_i} = \\lim_{h \\rightarrow 0} \\frac{f(x_1, \\ldots, x_{i-1}, x_i+h, x_{i+1}, \\ldots, x_n) - f(x_1, \\ldots, x_i, \\ldots, x_n)}{h}.$$\n",
"\n",
"为了计算$\\frac{\\partial y}{\\partial x_i}$\n",
"我们可以简单地将$x_1, \\ldots, x_{i-1}, x_{i+1}, \\ldots, x_n$看作常数,\n",
"并计算$y$关于$x_i$的导数。\n",
"对于偏导数的表示,以下是等价的:\n",
"\n",
"$$\\frac{\\partial y}{\\partial x_i} = \\frac{\\partial f}{\\partial x_i} = f_{x_i} = f_i = D_i f = D_{x_i} f.$$\n",
"\n",
"## 梯度\n",
":label:`subsec_calculus-grad`\n",
"\n",
"我们可以连结一个多元函数对其所有变量的偏导数,以得到该函数的*梯度*(gradient)向量。\n",
"具体而言,设函数$f:\\mathbb{R}^n\\rightarrow\\mathbb{R}$的输入是\n",
"一个$n$维向量$\\mathbf{x}=[x_1,x_2,\\ldots,x_n]^\\top$,并且输出是一个标量。\n",
"函数$f(\\mathbf{x})$相对于$\\mathbf{x}$的梯度是一个包含$n$个偏导数的向量:\n",
"\n",
"$$\\nabla_{\\mathbf{x}} f(\\mathbf{x}) = \\bigg[\\frac{\\partial f(\\mathbf{x})}{\\partial x_1}, \\frac{\\partial f(\\mathbf{x})}{\\partial x_2}, \\ldots, \\frac{\\partial f(\\mathbf{x})}{\\partial x_n}\\bigg]^\\top,$$\n",
"\n",
"其中$\\nabla_{\\mathbf{x}} f(\\mathbf{x})$通常在没有歧义时被$\\nabla f(\\mathbf{x})$取代。\n",
"\n",
"假设$\\mathbf{x}$为$n$维向量,在微分多元函数时经常使用以下规则:\n",
"\n",
"* 对于所有$\\mathbf{A} \\in \\mathbb{R}^{m \\times n}$,都有$\\nabla_{\\mathbf{x}} \\mathbf{A} \\mathbf{x} = \\mathbf{A}^\\top$\n",
"* 对于所有$\\mathbf{A} \\in \\mathbb{R}^{n \\times m}$,都有$\\nabla_{\\mathbf{x}} \\mathbf{x}^\\top \\mathbf{A} = \\mathbf{A}$\n",
"* 对于所有$\\mathbf{A} \\in \\mathbb{R}^{n \\times n}$,都有$\\nabla_{\\mathbf{x}} \\mathbf{x}^\\top \\mathbf{A} \\mathbf{x} = (\\mathbf{A} + \\mathbf{A}^\\top)\\mathbf{x}$\n",
"* $\\nabla_{\\mathbf{x}} \\|\\mathbf{x} \\|^2 = \\nabla_{\\mathbf{x}} \\mathbf{x}^\\top \\mathbf{x} = 2\\mathbf{x}$\n",
"\n",
"同样,对于任何矩阵$\\mathbf{X}$,都有$\\nabla_{\\mathbf{X}} \\|\\mathbf{X} \\|_F^2 = 2\\mathbf{X}$。\n",
"正如我们之后将看到的,梯度对于设计深度学习中的优化算法有很大用处。\n",
"\n",
"## 链式法则\n",
"\n",
"然而,上面方法可能很难找到梯度。\n",
"这是因为在深度学习中,多元函数通常是*复合*(composite)的,\n",
"所以难以应用上述任何规则来微分这些函数。\n",
"幸运的是,链式法则可以被用来微分复合函数。\n",
"\n",
"让我们先考虑单变量函数。假设函数$y=f(u)$和$u=g(x)$都是可微的,根据链式法则:\n",
"\n",
"$$\\frac{dy}{dx} = \\frac{dy}{du} \\frac{du}{dx}.$$\n",
"\n",
"现在考虑一个更一般的场景,即函数具有任意数量的变量的情况。\n",
"假设可微分函数$y$有变量$u_1, u_2, \\ldots, u_m$,其中每个可微分函数$u_i$都有变量$x_1, x_2, \\ldots, x_n$。\n",
"注意,$y$是$x_1, x_2 \\ldots, x_n$的函数。\n",
"对于任意$i = 1, 2, \\ldots, n$,链式法则给出:\n",
"\n",
"$$\\frac{\\partial y}{\\partial x_i} = \\frac{\\partial y}{\\partial u_1} \\frac{\\partial u_1}{\\partial x_i} + \\frac{\\partial y}{\\partial u_2} \\frac{\\partial u_2}{\\partial x_i} + \\cdots + \\frac{\\partial y}{\\partial u_m} \\frac{\\partial u_m}{\\partial x_i}$$\n",
"\n",
"## 小结\n",
"\n",
"* 微分和积分是微积分的两个分支,前者可以应用于深度学习中的优化问题。\n",
"* 导数可以被解释为函数相对于其变量的瞬时变化率,它也是函数曲线的切线的斜率。\n",
"* 梯度是一个向量,其分量是多变量函数相对于其所有变量的偏导数。\n",
"* 链式法则可以用来微分复合函数。\n",
"\n",
"## 练习\n",
"\n",
"1. 绘制函数$y = f(x) = x^3 - \\frac{1}{x}$和其在$x = 1$处切线的图像。\n",
"1. 求函数$f(\\mathbf{x}) = 3x_1^2 + 5e^{x_2}$的梯度。\n",
"1. 函数$f(\\mathbf{x}) = \\|\\mathbf{x}\\|_2$的梯度是什么?\n",
"1. 尝试写出函数$u = f(x, y, z)$,其中$x = x(a, b)$$y = y(a, b)$$z = z(a, b)$的链式法则。\n"
]
},
{
"cell_type": "markdown",
"id": "fed29a76",
"metadata": {
"origin_pos": 19,
"tab": [
"pytorch"
]
},
"source": [
"[Discussions](https://discuss.d2l.ai/t/1756)\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"required_libs": []
},
"nbformat": 4,
"nbformat_minor": 5
}