{
"cells": [
{
"cell_type": "markdown",
"id": "7f91a115",
"metadata": {
"origin_pos": 0
},
"source": [
"# 微积分\n",
":label:`sec_calculus`\n",
"\n",
"在2500年前,古希腊人把一个多边形分成三角形,并把它们的面积相加,才找到计算多边形面积的方法。\n",
"为了求出曲线形状(比如圆)的面积,古希腊人在这样的形状上刻内接多边形。\n",
"如 :numref:`fig_circle_area`所示,内接多边形的等长边越多,就越接近圆。\n",
"这个过程也被称为*逼近法*(method of exhaustion)。\n",
"\n",
"\n",
":label:`fig_circle_area`\n",
"\n",
"事实上,逼近法就是*积分*(integral calculus)的起源。\n",
"2000多年后,微积分的另一支,*微分*(differential calculus)被发明出来。\n",
"在微分学最重要的应用是优化问题,即考虑如何把事情做到最好。\n",
"正如在 :numref:`subsec_norms_and_objectives`中讨论的那样,\n",
"这种问题在深度学习中是无处不在的。\n",
"\n",
"在深度学习中,我们“训练”模型,不断更新它们,使它们在看到越来越多的数据时变得越来越好。\n",
"通常情况下,变得更好意味着最小化一个*损失函数*(loss function),\n",
"即一个衡量“模型有多糟糕”这个问题的分数。\n",
"最终,我们真正关心的是生成一个模型,它能够在从未见过的数据上表现良好。\n",
"但“训练”模型只能将模型与我们实际能看到的数据相拟合。\n",
"因此,我们可以将拟合模型的任务分解为两个关键问题:\n",
"\n",
"* *优化*(optimization):用模型拟合观测数据的过程;\n",
"* *泛化*(generalization):数学原理和实践者的智慧,能够指导我们生成出有效性超出用于训练的数据集本身的模型。\n",
"\n",
"为了帮助读者在后面的章节中更好地理解优化问题和方法,\n",
"本节提供了一个非常简短的入门教程,帮助读者快速掌握深度学习中常用的微分知识。\n",
"\n",
"## 导数和微分\n",
"\n",
"我们首先讨论导数的计算,这是几乎所有深度学习优化算法的关键步骤。\n",
"在深度学习中,我们通常选择对于模型参数可微的损失函数。\n",
"简而言之,对于每个参数,\n",
"如果我们把这个参数*增加*或*减少*一个无穷小的量,可以知道损失会以多快的速度增加或减少,\n",
"\n",
"假设我们有一个函数$f: \\mathbb{R} \\rightarrow \\mathbb{R}$,其输入和输出都是标量。\n",
"(**如果$f$的*导数*存在,这个极限被定义为**)\n",
"\n",
"(**$$f'(x) = \\lim_{h \\rightarrow 0} \\frac{f(x+h) - f(x)}{h}.$$**)\n",
":eqlabel:`eq_derivative`\n",
"\n",
"如果$f'(a)$存在,则称$f$在$a$处是*可微*(differentiable)的。\n",
"如果$f$在一个区间内的每个数上都是可微的,则此函数在此区间中是可微的。\n",
"我们可以将 :eqref:`eq_derivative`中的导数$f'(x)$解释为$f(x)$相对于$x$的*瞬时*(instantaneous)变化率。\n",
"所谓的瞬时变化率是基于$x$中的变化$h$,且$h$接近$0$。\n",
"\n",
"为了更好地解释导数,让我们做一个实验。\n",
"(**定义$u=f(x)=3x^2-4x$**)如下:\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "02d617cb",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:25.065994Z",
"iopub.status.busy": "2023-08-18T07:01:25.065245Z",
"iopub.status.idle": "2023-08-18T07:01:27.381378Z",
"shell.execute_reply": "2023-08-18T07:01:27.380233Z"
},
"origin_pos": 2,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import numpy as np\n",
"from matplotlib_inline import backend_inline\n",
"from d2l import torch as d2l\n",
"\n",
"\n",
"def f(x):\n",
" return 3 * x ** 2 - 4 * x"
]
},
{
"cell_type": "markdown",
"id": "60a0915b",
"metadata": {
"origin_pos": 5
},
"source": [
"[**通过令$x=1$并让$h$接近$0$,**] :eqref:`eq_derivative`中(**$\\frac{f(x+h)-f(x)}{h}$的数值结果接近$2$**)。\n",
"虽然这个实验不是一个数学证明,但稍后会看到,当$x=1$时,导数$u'$是$2$。\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "39cf9942",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.387542Z",
"iopub.status.busy": "2023-08-18T07:01:27.386582Z",
"iopub.status.idle": "2023-08-18T07:01:27.394057Z",
"shell.execute_reply": "2023-08-18T07:01:27.393090Z"
},
"origin_pos": 6,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"h=0.10000, numerical limit=2.30000\n",
"h=0.01000, numerical limit=2.03000\n",
"h=0.00100, numerical limit=2.00300\n",
"h=0.00010, numerical limit=2.00030\n",
"h=0.00001, numerical limit=2.00003\n"
]
}
],
"source": [
"def numerical_lim(f, x, h):\n",
" return (f(x + h) - f(x)) / h\n",
"\n",
"h = 0.1\n",
"for i in range(5):\n",
" print(f'h={h:.5f}, numerical limit={numerical_lim(f, 1, h):.5f}')\n",
" h *= 0.1"
]
},
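{
"cell_type": "markdown",
"id": "gr-fwd-diff-note",
"metadata": {},
"source": [
"A short worked computation, added here for illustration and using only the definition of $f$ above, explains the printed values: for this particular $f$, the forward difference quotient at $x=1$ equals $2+3h$ exactly, so each printed value exceeds $2$ by $3h$ (up to floating-point rounding):\n",
"\n",
"$$\\frac{f(1+h) - f(1)}{h} = \\frac{\\left[3(1+h)^2 - 4(1+h)\\right] - (-1)}{h} = \\frac{2h + 3h^2}{h} = 2 + 3h.$$\n"
]
},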
{
"cell_type": "markdown",
"id": "ea011f86",
"metadata": {
"origin_pos": 7
},
"source": [
"让我们熟悉一下导数的几个等价符号。\n",
"给定$y=f(x)$,其中$x$和$y$分别是函数$f$的自变量和因变量。以下表达式是等价的:\n",
"\n",
"$$f'(x) = y' = \\frac{dy}{dx} = \\frac{df}{dx} = \\frac{d}{dx} f(x) = Df(x) = D_x f(x),$$\n",
"\n",
"其中符号$\\frac{d}{dx}$和$D$是*微分运算符*,表示*微分*操作。\n",
"我们可以使用以下规则来对常见函数求微分:\n",
"\n",
"* $DC = 0$($C$是一个常数)\n",
"* $Dx^n = nx^{n-1}$(*幂律*(power rule),$n$是任意实数)\n",
"* $De^x = e^x$\n",
"* $D\\ln(x) = 1/x$\n",
"\n",
"为了微分一个由一些常见函数组成的函数,下面的一些法则方便使用。\n",
"假设函数$f$和$g$都是可微的,$C$是一个常数,则:\n",
"\n",
"*常数相乘法则*\n",
"$$\\frac{d}{dx} [Cf(x)] = C \\frac{d}{dx} f(x),$$\n",
"\n",
"*加法法则*\n",
"\n",
"$$\\frac{d}{dx} [f(x) + g(x)] = \\frac{d}{dx} f(x) + \\frac{d}{dx} g(x),$$\n",
"\n",
"*乘法法则*\n",
"\n",
"$$\\frac{d}{dx} [f(x)g(x)] = f(x) \\frac{d}{dx} [g(x)] + g(x) \\frac{d}{dx} [f(x)],$$\n",
"\n",
"*除法法则*\n",
"\n",
"$$\\frac{d}{dx} \\left[\\frac{f(x)}{g(x)}\\right] = \\frac{g(x) \\frac{d}{dx} [f(x)] - f(x) \\frac{d}{dx} [g(x)]}{[g(x)]^2}.$$\n",
"\n",
"现在我们可以应用上述几个法则来计算$u'=f'(x)=3\\frac{d}{dx}x^2-4\\frac{d}{dx}x=6x-4$。\n",
"令$x=1$,我们有$u'=2$:在这个实验中,数值结果接近$2$,\n",
"这一点得到了在本节前面的实验的支持。\n",
"当$x=1$时,此导数也是曲线$u=f(x)$切线的斜率。\n",
"\n",
"[**为了对导数的这种解释进行可视化,我们将使用`matplotlib`**],\n",
"这是一个Python中流行的绘图库。\n",
"要配置`matplotlib`生成图形的属性,我们需要(**定义几个函数**)。\n",
"在下面,`use_svg_display`函数指定`matplotlib`软件包输出svg图表以获得更清晰的图像。\n",
"\n",
"注意,注释`#@save`是一个特殊的标记,会将对应的函数、类或语句保存在`d2l`包中。\n",
"因此,以后无须重新定义就可以直接调用它们(例如,`d2l.use_svg_display()`)。\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a0efe8c9",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.399278Z",
"iopub.status.busy": "2023-08-18T07:01:27.398487Z",
"iopub.status.idle": "2023-08-18T07:01:27.403514Z",
"shell.execute_reply": "2023-08-18T07:01:27.402414Z"
},
"origin_pos": 8,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def use_svg_display(): #@save\n",
" \"\"\"使用svg格式在Jupyter中显示绘图\"\"\"\n",
" backend_inline.set_matplotlib_formats('svg')"
]
},
{
"cell_type": "markdown",
"id": "8b1650c6",
"metadata": {
"origin_pos": 9
},
"source": [
"我们定义`set_figsize`函数来设置图表大小。\n",
"注意,这里可以直接使用`d2l.plt`,因为导入语句\n",
"`from matplotlib import pyplot as plt`已标记为保存到`d2l`包中。\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "acef7e22",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.408462Z",
"iopub.status.busy": "2023-08-18T07:01:27.407659Z",
"iopub.status.idle": "2023-08-18T07:01:27.414090Z",
"shell.execute_reply": "2023-08-18T07:01:27.412718Z"
},
"origin_pos": 10,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def set_figsize(figsize=(3.5, 2.5)): #@save\n",
" \"\"\"设置matplotlib的图表大小\"\"\"\n",
" use_svg_display()\n",
" d2l.plt.rcParams['figure.figsize'] = figsize"
]
},
{
"cell_type": "markdown",
"id": "71a62720",
"metadata": {
"origin_pos": 11
},
"source": [
"下面的`set_axes`函数用于设置由`matplotlib`生成图表的轴的属性。\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0ad890f8",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.419140Z",
"iopub.status.busy": "2023-08-18T07:01:27.418455Z",
"iopub.status.idle": "2023-08-18T07:01:27.426061Z",
"shell.execute_reply": "2023-08-18T07:01:27.424739Z"
},
"origin_pos": 12,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"#@save\n",
"def set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend):\n",
" \"\"\"设置matplotlib的轴\"\"\"\n",
" axes.set_xlabel(xlabel)\n",
" axes.set_ylabel(ylabel)\n",
" axes.set_xscale(xscale)\n",
" axes.set_yscale(yscale)\n",
" axes.set_xlim(xlim)\n",
" axes.set_ylim(ylim)\n",
" if legend:\n",
" axes.legend(legend)\n",
" axes.grid()"
]
},
{
"cell_type": "markdown",
"id": "30e5a1f9",
"metadata": {
"origin_pos": 13
},
"source": [
"通过这三个用于图形配置的函数,定义一个`plot`函数来简洁地绘制多条曲线,\n",
"因为我们需要在整个书中可视化许多曲线。\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "00c43fac",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.431229Z",
"iopub.status.busy": "2023-08-18T07:01:27.430462Z",
"iopub.status.idle": "2023-08-18T07:01:27.441418Z",
"shell.execute_reply": "2023-08-18T07:01:27.440390Z"
},
"origin_pos": 14,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"#@save\n",
"def plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None,\n",
" ylim=None, xscale='linear', yscale='linear',\n",
" fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None):\n",
" \"\"\"绘制数据点\"\"\"\n",
" if legend is None:\n",
" legend = []\n",
"\n",
" set_figsize(figsize)\n",
" axes = axes if axes else d2l.plt.gca()\n",
"\n",
" # 如果X有一个轴,输出True\n",
" def has_one_axis(X):\n",
" return (hasattr(X, \"ndim\") and X.ndim == 1 or isinstance(X, list)\n",
" and not hasattr(X[0], \"__len__\"))\n",
"\n",
" if has_one_axis(X):\n",
" X = [X]\n",
" if Y is None:\n",
" X, Y = [[]] * len(X), X\n",
" elif has_one_axis(Y):\n",
" Y = [Y]\n",
" if len(X) != len(Y):\n",
" X = X * len(Y)\n",
" axes.cla()\n",
" for x, y, fmt in zip(X, Y, fmts):\n",
" if len(x):\n",
" axes.plot(x, y, fmt)\n",
" else:\n",
" axes.plot(y, fmt)\n",
" set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)"
]
},
{
"cell_type": "markdown",
"id": "d9dbfa1f",
"metadata": {
"origin_pos": 15
},
"source": [
"现在我们可以[**绘制函数$u=f(x)$及其在$x=1$处的切线$y=2x-3$**],\n",
"其中系数$2$是切线的斜率。\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f09a2c12",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:01:27.445931Z",
"iopub.status.busy": "2023-08-18T07:01:27.445122Z",
"iopub.status.idle": "2023-08-18T07:01:27.699931Z",
"shell.execute_reply": "2023-08-18T07:01:27.698662Z"
},
"origin_pos": 16,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"x = np.arange(0, 3, 0.1)\n",
"plot(x, [f(x), 2 * x - 3], 'x', 'f(x)', legend=['f(x)', 'Tangent line (x=1)'])"
]
},
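{
"cell_type": "markdown",
"id": "gr-tangent-note",
"metadata": {},
"source": [
"Where does the tangent line $y=2x-3$ come from? As a brief worked step added for illustration, the tangent line to $u=f(x)$ at $x=a$ passes through the point $(a, f(a))$ with slope $f'(a)$, so its equation is $y = f(a) + f'(a)(x-a)$. With $a=1$, $f(1)=3-4=-1$ and $f'(1)=6 \\cdot 1-4=2$, which gives\n",
"\n",
"$$y = f(1) + f'(1)(x-1) = -1 + 2(x-1) = 2x - 3.$$\n"
]
},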
{
"cell_type": "markdown",
"id": "c292a783",
"metadata": {
"origin_pos": 17
},
"source": [
"## 偏导数\n",
"\n",
"到目前为止,我们只讨论了仅含一个变量的函数的微分。\n",
"在深度学习中,函数通常依赖于许多变量。\n",
"因此,我们需要将微分的思想推广到*多元函数*(multivariate function)上。\n",
"\n",
"设$y = f(x_1, x_2, \\ldots, x_n)$是一个具有$n$个变量的函数。\n",
"$y$关于第$i$个参数$x_i$的*偏导数*(partial derivative)为:\n",
"\n",
"$$ \\frac{\\partial y}{\\partial x_i} = \\lim_{h \\rightarrow 0} \\frac{f(x_1, \\ldots, x_{i-1}, x_i+h, x_{i+1}, \\ldots, x_n) - f(x_1, \\ldots, x_i, \\ldots, x_n)}{h}.$$\n",
"\n",
"为了计算$\\frac{\\partial y}{\\partial x_i}$,\n",
"我们可以简单地将$x_1, \\ldots, x_{i-1}, x_{i+1}, \\ldots, x_n$看作常数,\n",
"并计算$y$关于$x_i$的导数。\n",
"对于偏导数的表示,以下是等价的:\n",
"\n",
"$$\\frac{\\partial y}{\\partial x_i} = \\frac{\\partial f}{\\partial x_i} = f_{x_i} = f_i = D_i f = D_{x_i} f.$$\n",
"\n",
"## 梯度\n",
":label:`subsec_calculus-grad`\n",
"\n",
"我们可以连结一个多元函数对其所有变量的偏导数,以得到该函数的*梯度*(gradient)向量。\n",
"具体而言,设函数$f:\\mathbb{R}^n\\rightarrow\\mathbb{R}$的输入是\n",
"一个$n$维向量$\\mathbf{x}=[x_1,x_2,\\ldots,x_n]^\\top$,并且输出是一个标量。\n",
"函数$f(\\mathbf{x})$相对于$\\mathbf{x}$的梯度是一个包含$n$个偏导数的向量:\n",
"\n",
"$$\\nabla_{\\mathbf{x}} f(\\mathbf{x}) = \\bigg[\\frac{\\partial f(\\mathbf{x})}{\\partial x_1}, \\frac{\\partial f(\\mathbf{x})}{\\partial x_2}, \\ldots, \\frac{\\partial f(\\mathbf{x})}{\\partial x_n}\\bigg]^\\top,$$\n",
"\n",
"其中$\\nabla_{\\mathbf{x}} f(\\mathbf{x})$通常在没有歧义时被$\\nabla f(\\mathbf{x})$取代。\n",
"\n",
"假设$\\mathbf{x}$为$n$维向量,在微分多元函数时经常使用以下规则:\n",
"\n",
"* 对于所有$\\mathbf{A} \\in \\mathbb{R}^{m \\times n}$,都有$\\nabla_{\\mathbf{x}} \\mathbf{A} \\mathbf{x} = \\mathbf{A}^\\top$\n",
"* 对于所有$\\mathbf{A} \\in \\mathbb{R}^{n \\times m}$,都有$\\nabla_{\\mathbf{x}} \\mathbf{x}^\\top \\mathbf{A} = \\mathbf{A}$\n",
"* 对于所有$\\mathbf{A} \\in \\mathbb{R}^{n \\times n}$,都有$\\nabla_{\\mathbf{x}} \\mathbf{x}^\\top \\mathbf{A} \\mathbf{x} = (\\mathbf{A} + \\mathbf{A}^\\top)\\mathbf{x}$\n",
"* $\\nabla_{\\mathbf{x}} \\|\\mathbf{x} \\|^2 = \\nabla_{\\mathbf{x}} \\mathbf{x}^\\top \\mathbf{x} = 2\\mathbf{x}$\n",
"\n",
"同样,对于任何矩阵$\\mathbf{X}$,都有$\\nabla_{\\mathbf{X}} \\|\\mathbf{X} \\|_F^2 = 2\\mathbf{X}$。\n",
"正如我们之后将看到的,梯度对于设计深度学习中的优化算法有很大用处。\n",
"\n",
"## 链式法则\n",
"\n",
"然而,上面方法可能很难找到梯度。\n",
"这是因为在深度学习中,多元函数通常是*复合*(composite)的,\n",
"所以难以应用上述任何规则来微分这些函数。\n",
"幸运的是,链式法则可以被用来微分复合函数。\n",
"\n",
"让我们先考虑单变量函数。假设函数$y=f(u)$和$u=g(x)$都是可微的,根据链式法则:\n",
"\n",
"$$\\frac{dy}{dx} = \\frac{dy}{du} \\frac{du}{dx}.$$\n",
"\n",
"现在考虑一个更一般的场景,即函数具有任意数量的变量的情况。\n",
"假设可微分函数$y$有变量$u_1, u_2, \\ldots, u_m$,其中每个可微分函数$u_i$都有变量$x_1, x_2, \\ldots, x_n$。\n",
"注意,$y$是$x_1, x_2, \\ldots, x_n$的函数。\n",
"对于任意$i = 1, 2, \\ldots, n$,链式法则给出:\n",
"\n",
"$$\\frac{\\partial y}{\\partial x_i} = \\frac{\\partial y}{\\partial u_1} \\frac{\\partial u_1}{\\partial x_i} + \\frac{\\partial y}{\\partial u_2} \\frac{\\partial u_2}{\\partial x_i} + \\cdots + \\frac{\\partial y}{\\partial u_m} \\frac{\\partial u_m}{\\partial x_i}$$\n",
"\n",
"## 小结\n",
"\n",
"* 微分和积分是微积分的两个分支,前者可以应用于深度学习中的优化问题。\n",
"* 导数可以被解释为函数相对于其变量的瞬时变化率,它也是函数曲线的切线的斜率。\n",
"* 梯度是一个向量,其分量是多变量函数相对于其所有变量的偏导数。\n",
"* 链式法则可以用来微分复合函数。\n",
"\n",
"## 练习\n",
"\n",
"1. 绘制函数$y = f(x) = x^3 - \\frac{1}{x}$和其在$x = 1$处切线的图像。\n",
"1. 求函数$f(\\mathbf{x}) = 3x_1^2 + 5e^{x_2}$的梯度。\n",
"1. 函数$f(\\mathbf{x}) = \\|\\mathbf{x}\\|_2$的梯度是什么?\n",
"1. 尝试写出函数$u = f(x, y, z)$,其中$x = x(a, b)$,$y = y(a, b)$,$z = z(a, b)$的链式法则。\n"
]
},
{
"cell_type": "markdown",
"id": "fed29a76",
"metadata": {
"origin_pos": 19,
"tab": [
"pytorch"
]
},
"source": [
"[Discussions](https://discuss.d2l.ai/t/1756)\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"required_libs": []
},
"nbformat": 4,
"nbformat_minor": 5
}