{
"cells": [
{
"cell_type": "markdown",
"id": "c9c4a4e3",
"metadata": {
"origin_pos": 0
},
"source": [
"# 凸性\n",
":label:`sec_convexity`\n",
"\n",
"*凸性*convexity)在优化算法的设计中起到至关重要的作用,\n",
"这主要是由于在这种情况下对算法进行分析和测试要容易。\n",
"换言之,如果算法在凸性条件设定下的效果很差,\n",
"那通常我们很难在其他条件下看到好的结果。\n",
"此外,即使深度学习中的优化问题通常是非凸的,\n",
"它们也经常在局部极小值附近表现出一些凸性。\n",
"这可能会产生一些像 :cite:`Izmailov.Podoprikhin.Garipov.ea.2018`这样比较有意思的新优化变体。\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3afc8359",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:06:35.207505Z",
"iopub.status.busy": "2023-08-18T07:06:35.207236Z",
"iopub.status.idle": "2023-08-18T07:06:37.175391Z",
"shell.execute_reply": "2023-08-18T07:06:37.174479Z"
},
"origin_pos": 2,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import numpy as np\n",
"import torch\n",
"from mpl_toolkits import mplot3d\n",
"from d2l import torch as d2l"
]
},
{
"cell_type": "markdown",
"id": "a89373e1",
"metadata": {
"origin_pos": 5
},
"source": [
"## 定义\n",
"\n",
"在进行凸分析之前,我们需要定义*凸集*(convex sets)和*凸函数*convex functions)。\n",
"\n",
"### 凸集\n",
"\n",
"*凸集*convex set)是凸性的基础。\n",
"简单地说,如果对于任何$a, b \\in \\mathcal{X}$,连接$a$和$b$的线段也位于$\\mathcal{X}$中,则向量空间中的一个集合$\\mathcal{X}$是*凸*convex)的。\n",
"在数学术语上,这意味着对于所有$\\lambda \\in [0, 1]$,我们得到\n",
"\n",
"$$\\lambda a + (1-\\lambda) b \\in \\mathcal{X} \\text{ 当 } a, b \\in \\mathcal{X}.$$\n",
"\n",
"这听起来有点抽象,那我们来看一下 :numref:`fig_pacman`里的例子。\n",
"第一组存在不包含在集合内部的线段,所以该集合是非凸的,而另外两组则没有这样的问题。\n",
"\n",
"![第一组是非凸的,另外两组是凸的。](../img/pacman.svg)\n",
":label:`fig_pacman`\n",
"\n",
"接下来来看一下交集 :numref:`fig_convex_intersect`。\n",
"假设$\\mathcal{X}$和$\\mathcal{Y}$是凸集,那么$\\mathcal {X} \\cap \\mathcal{Y}$也是凸集的。\n",
"现在考虑任意$a, b \\in \\mathcal{X} \\cap \\mathcal{Y}$\n",
"因为$\\mathcal{X}$和$\\mathcal{Y}$是凸集,\n",
"所以连接$a$和$b$的线段包含在$\\mathcal{X}$和$\\mathcal{Y}$中。\n",
"鉴于此,它们也需要包含在$\\mathcal {X} \\cap \\mathcal{Y}$中,从而证明我们的定理。\n",
"\n",
"![两个凸集的交集是凸的。](../img/convex-intersect.svg)\n",
":label:`fig_convex_intersect`\n",
"\n",
"我们可以毫不费力地进一步得到这样的结果:\n",
"给定凸集$\\mathcal{X}_i$,它们的交集$\\cap_{i} \\mathcal{X}_i$是凸的。\n",
"但是反向是不正确的,考虑两个不相交的集合$\\mathcal{X} \\cap \\mathcal{Y} = \\emptyset$\n",
"取$a \\in \\mathcal{X}$和$b \\in \\mathcal{Y}$。\n",
"因为我们假设$\\mathcal{X} \\cap \\mathcal{Y} = \\emptyset$\n",
"在 :numref:`fig_nonconvex`中连接$a$和$b$的线段需要包含一部分既不在$\\mathcal{X}$也不在$\\mathcal{Y}$中。\n",
"因此线段也不在$\\mathcal{X} \\cup \\mathcal{Y}$中,因此证明了凸集的并集不一定是凸的,即*非凸*(nonconvex)的。\n",
"\n",
"![两个凸集的并集不一定是凸的。](../img/nonconvex.svg)\n",
":label:`fig_nonconvex`\n",
"\n",
"通常,深度学习中的问题是在凸集上定义的。\n",
"例如,$\\mathbb{R}^d$,即实数的$d$-维向量的集合是凸集(毕竟$\\mathbb{R}^d$中任意两点之间的线存在$\\mathbb{R}^d$)中。\n",
"在某些情况下,我们使用有界长度的变量,例如球的半径定义为$\\{\\mathbf{x} | \\mathbf{x} \\in \\mathbb{R}^d \\text{ 且 } \\| \\mathbf{x} \\| \\leq r\\}$。\n",
"\n",
"### 凸函数\n",
"\n",
"现在我们有了凸集,我们可以引入*凸函数*(convex function$f$。\n",
"给定一个凸集$\\mathcal{X}$,如果对于所有$x, x' \\in \\mathcal{X}$和所有$\\lambda \\in [0, 1]$,函数$f: \\mathcal{X} \\to \\mathbb{R}$是凸的,我们可以得到\n",
"\n",
"$$\\lambda f(x) + (1-\\lambda) f(x') \\geq f(\\lambda x + (1-\\lambda) x').$$\n",
"\n",
"为了说明这一点,让我们绘制一些函数并检查哪些函数满足要求。\n",
"下面我们定义一些函数,包括凸函数和非凸函数。\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "eda6db54",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:06:37.180140Z",
"iopub.status.busy": "2023-08-18T07:06:37.179376Z",
"iopub.status.idle": "2023-08-18T07:06:37.639388Z",
"shell.execute_reply": "2023-08-18T07:06:37.638499Z"
},
"origin_pos": 6,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"539.503125pt\" height=\"194.158125pt\" viewBox=\"0 0 539.503125 194.158125\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n",
" <metadata>\n",
" <rdf:RDF xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n",
" <cc:Work>\n",
" <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\n",
" <dc:date>2023-08-18T07:06:37.570318</dc:date>\n",
" <dc:format>image/svg+xml</dc:format>\n",
" <dc:creator>\n",
" <cc:Agent>\n",
" <dc:title>Matplotlib v3.5.1, https://matplotlib.org/</dc:title>\n",
" </cc:Agent>\n",
" </dc:creator>\n",
" </cc:Work>\n",
" </rdf:RDF>\n",
" </metadata>\n",
" <defs>\n",
" <style type=\"text/css\">*{stroke-linejoin: round; stroke-linecap: butt}</style>\n",
" </defs>\n",
" <g id=\"figure_1\">\n",
" <g id=\"patch_1\">\n",
" <path d=\"M 0 194.158125 \n",
"L 539.503125 194.158125 \n",
"L 539.503125 0 \n",
"L 0 0 \n",
"L 0 194.158125 \n",
"z\n",
"\" style=\"fill: none\"/>\n",
" </g>\n",
" <g id=\"axes_1\">\n",
" <g id=\"patch_2\">\n",
" <path d=\"M 30.103125 170.28 \n",
"L 177.809007 170.28 \n",
"L 177.809007 7.2 \n",
"L 30.103125 7.2 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"matplotlib.axis_1\">\n",
" <g id=\"xtick_1\">\n",
" <g id=\"line2d_1\">\n",
" <path d=\"M 36.817029 170.28 \n",
"L 36.817029 7.2 \n",
"\" clip-path=\"url(#p5f014a9bbe)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_2\">\n",
" <defs>\n",
" <path id=\"m2ab13f32ce\" d=\"M 0 0 \n",
"L 0 3.5 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#m2ab13f32ce\" x=\"36.817029\" y=\"170.28\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_1\">\n",
" <!-- 2 -->\n",
" <g transform=\"translate(29.445935 184.878438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-2212\" d=\"M 678 2272 \n",
"L 4684 2272 \n",
"L 4684 1741 \n",
"L 678 1741 \n",
"L 678 2272 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-32\" d=\"M 1228 531 \n",
"L 3431 531 \n",
"L 3431 0 \n",
"L 469 0 \n",
"L 469 531 \n",
"Q 828 903 1448 1529 \n",
"Q 2069 2156 2228 2338 \n",
"Q 2531 2678 2651 2914 \n",
"Q 2772 3150 2772 3378 \n",
"Q 2772 3750 2511 3984 \n",
"Q 2250 4219 1831 4219 \n",
"Q 1534 4219 1204 4116 \n",
"Q 875 4013 500 3803 \n",
"L 500 4441 \n",
"Q 881 4594 1212 4672 \n",
"Q 1544 4750 1819 4750 \n",
"Q 2544 4750 2975 4387 \n",
"Q 3406 4025 3406 3419 \n",
"Q 3406 3131 3298 2873 \n",
"Q 3191 2616 2906 2266 \n",
"Q 2828 2175 2409 1742 \n",
"Q 1991 1309 1228 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-2212\"/>\n",
" <use xlink:href=\"#DejaVuSans-32\" x=\"83.789062\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_2\">\n",
" <g id=\"line2d_3\">\n",
" <path d=\"M 104.124334 170.28 \n",
"L 104.124334 7.2 \n",
"\" clip-path=\"url(#p5f014a9bbe)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_4\">\n",
" <g>\n",
" <use xlink:href=\"#m2ab13f32ce\" x=\"104.124334\" y=\"170.28\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_2\">\n",
" <!-- 0 -->\n",
" <g transform=\"translate(100.943084 184.878438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-30\" d=\"M 2034 4250 \n",
"Q 1547 4250 1301 3770 \n",
"Q 1056 3291 1056 2328 \n",
"Q 1056 1369 1301 889 \n",
"Q 1547 409 2034 409 \n",
"Q 2525 409 2770 889 \n",
"Q 3016 1369 3016 2328 \n",
"Q 3016 3291 2770 3770 \n",
"Q 2525 4250 2034 4250 \n",
"z\n",
"M 2034 4750 \n",
"Q 2819 4750 3233 4129 \n",
"Q 3647 3509 3647 2328 \n",
"Q 3647 1150 3233 529 \n",
"Q 2819 -91 2034 -91 \n",
"Q 1250 -91 836 529 \n",
"Q 422 1150 422 2328 \n",
"Q 422 3509 836 4129 \n",
"Q 1250 4750 2034 4750 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_3\">\n",
" <g id=\"line2d_5\">\n",
" <path d=\"M 171.43164 170.28 \n",
"L 171.43164 7.2 \n",
"\" clip-path=\"url(#p5f014a9bbe)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_6\">\n",
" <g>\n",
" <use xlink:href=\"#m2ab13f32ce\" x=\"171.43164\" y=\"170.28\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_3\">\n",
" <!-- 2 -->\n",
" <g transform=\"translate(168.25039 184.878438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-32\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"matplotlib.axis_2\">\n",
" <g id=\"ytick_1\">\n",
" <g id=\"line2d_7\">\n",
" <path d=\"M 30.103125 162.867273 \n",
"L 177.809007 162.867273 \n",
"\" clip-path=\"url(#p5f014a9bbe)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_8\">\n",
" <defs>\n",
" <path id=\"m8ed05b8f4d\" d=\"M 0 0 \n",
"L -3.5 0 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"30.103125\" y=\"162.867273\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_4\">\n",
" <!-- 0.0 -->\n",
" <g transform=\"translate(7.2 166.666491)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-2e\" d=\"M 684 794 \n",
"L 1344 794 \n",
"L 1344 0 \n",
"L 684 0 \n",
"L 684 794 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_2\">\n",
" <g id=\"line2d_9\">\n",
" <path d=\"M 30.103125 125.803636 \n",
"L 177.809007 125.803636 \n",
"\" clip-path=\"url(#p5f014a9bbe)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_10\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"30.103125\" y=\"125.803636\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_5\">\n",
" <!-- 0.5 -->\n",
" <g transform=\"translate(7.2 129.602855)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-35\" d=\"M 691 4666 \n",
"L 3169 4666 \n",
"L 3169 4134 \n",
"L 1269 4134 \n",
"L 1269 2991 \n",
"Q 1406 3038 1543 3061 \n",
"Q 1681 3084 1819 3084 \n",
"Q 2600 3084 3056 2656 \n",
"Q 3513 2228 3513 1497 \n",
"Q 3513 744 3044 326 \n",
"Q 2575 -91 1722 -91 \n",
"Q 1428 -91 1123 -41 \n",
"Q 819 9 494 109 \n",
"L 494 744 \n",
"Q 775 591 1075 516 \n",
"Q 1375 441 1709 441 \n",
"Q 2250 441 2565 725 \n",
"Q 2881 1009 2881 1497 \n",
"Q 2881 1984 2565 2268 \n",
"Q 2250 2553 1709 2553 \n",
"Q 1456 2553 1204 2497 \n",
"Q 953 2441 691 2322 \n",
"L 691 4666 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_3\">\n",
" <g id=\"line2d_11\">\n",
" <path d=\"M 30.103125 88.74 \n",
"L 177.809007 88.74 \n",
"\" clip-path=\"url(#p5f014a9bbe)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_12\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"30.103125\" y=\"88.74\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_6\">\n",
" <!-- 1.0 -->\n",
" <g transform=\"translate(7.2 92.539219)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-31\" d=\"M 794 531 \n",
"L 1825 531 \n",
"L 1825 4091 \n",
"L 703 3866 \n",
"L 703 4441 \n",
"L 1819 4666 \n",
"L 2450 4666 \n",
"L 2450 531 \n",
"L 3481 531 \n",
"L 3481 0 \n",
"L 794 0 \n",
"L 794 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-31\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_4\">\n",
" <g id=\"line2d_13\">\n",
" <path d=\"M 30.103125 51.676364 \n",
"L 177.809007 51.676364 \n",
"\" clip-path=\"url(#p5f014a9bbe)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_14\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"30.103125\" y=\"51.676364\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_7\">\n",
" <!-- 1.5 -->\n",
" <g transform=\"translate(7.2 55.475582)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_5\">\n",
" <g id=\"line2d_15\">\n",
" <path d=\"M 30.103125 14.612727 \n",
"L 177.809007 14.612727 \n",
"\" clip-path=\"url(#p5f014a9bbe)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_16\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"30.103125\" y=\"14.612727\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_8\">\n",
" <!-- 2.0 -->\n",
" <g transform=\"translate(7.2 18.411946)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-32\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_17\">\n",
" <path d=\"M 36.817029 14.612727 \n",
"L 40.855467 31.869557 \n",
"L 44.55737 46.750606 \n",
"L 48.259272 60.734717 \n",
"L 51.96117 73.821881 \n",
"L 55.32654 84.940988 \n",
"L 58.691902 95.318791 \n",
"L 62.057268 104.955341 \n",
"L 65.086094 112.994434 \n",
"L 68.114924 120.433112 \n",
"L 71.143756 127.271359 \n",
"L 73.836048 132.845729 \n",
"L 76.528339 137.945684 \n",
"L 79.220631 142.571224 \n",
"L 81.912923 146.722351 \n",
"L 84.26868 149.965422 \n",
"L 86.624435 152.845266 \n",
"L 88.980191 155.361887 \n",
"L 90.999409 157.229893 \n",
"L 93.018628 158.831042 \n",
"L 95.037849 160.165334 \n",
"L 97.057067 161.232766 \n",
"L 99.076287 162.033341 \n",
"L 101.095506 162.567057 \n",
"L 103.114725 162.833915 \n",
"L 105.133944 162.833915 \n",
"L 107.153163 162.567057 \n",
"L 109.172382 162.033341 \n",
"L 111.191601 161.232766 \n",
"L 113.21082 160.165334 \n",
"L 115.230039 158.831043 \n",
"L 117.249258 157.229894 \n",
"L 119.268479 155.361886 \n",
"L 121.624233 152.845266 \n",
"L 123.979989 149.965422 \n",
"L 126.335744 146.722354 \n",
"L 128.691501 143.11606 \n",
"L 131.383793 138.549822 \n",
"L 134.076085 133.509168 \n",
"L 136.768376 127.994098 \n",
"L 139.460668 122.004619 \n",
"L 142.489498 114.699371 \n",
"L 145.518328 106.793697 \n",
"L 148.547154 98.2876 \n",
"L 151.91252 88.132162 \n",
"L 155.277886 77.235447 \n",
"L 158.643252 65.597469 \n",
"L 162.34515 51.939528 \n",
"L 166.047056 37.384622 \n",
"L 169.748955 21.932804 \n",
"L 171.095104 16.091569 \n",
"L 171.095104 16.091569 \n",
"\" clip-path=\"url(#p5f014a9bbe)\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_18\">\n",
" <path d=\"M 53.643855 79.474091 \n",
"L 137.777987 125.803636 \n",
"\" clip-path=\"url(#p5f014a9bbe)\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"patch_3\">\n",
" <path d=\"M 30.103125 170.28 \n",
"L 30.103125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_4\">\n",
" <path d=\"M 177.809007 170.28 \n",
"L 177.809007 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_5\">\n",
" <path d=\"M 30.103125 170.28 \n",
"L 177.809007 170.28 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_6\">\n",
" <path d=\"M 30.103125 7.2 \n",
"L 177.809007 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"axes_2\">\n",
" <g id=\"patch_7\">\n",
" <path d=\"M 207.350184 170.28 \n",
"L 355.056066 170.28 \n",
"L 355.056066 7.2 \n",
"L 207.350184 7.2 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"matplotlib.axis_3\">\n",
" <g id=\"xtick_4\">\n",
" <g id=\"line2d_19\">\n",
" <path d=\"M 214.064088 170.28 \n",
"L 214.064088 7.2 \n",
"\" clip-path=\"url(#pe19d02a630)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_20\">\n",
" <g>\n",
" <use xlink:href=\"#m2ab13f32ce\" x=\"214.064088\" y=\"170.28\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_9\">\n",
" <!-- 2 -->\n",
" <g transform=\"translate(206.692994 184.878438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-2212\"/>\n",
" <use xlink:href=\"#DejaVuSans-32\" x=\"83.789062\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_5\">\n",
" <g id=\"line2d_21\">\n",
" <path d=\"M 281.371393 170.28 \n",
"L 281.371393 7.2 \n",
"\" clip-path=\"url(#pe19d02a630)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_22\">\n",
" <g>\n",
" <use xlink:href=\"#m2ab13f32ce\" x=\"281.371393\" y=\"170.28\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_10\">\n",
" <!-- 0 -->\n",
" <g transform=\"translate(278.190143 184.878438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_6\">\n",
" <g id=\"line2d_23\">\n",
" <path d=\"M 348.678699 170.28 \n",
"L 348.678699 7.2 \n",
"\" clip-path=\"url(#pe19d02a630)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_24\">\n",
" <g>\n",
" <use xlink:href=\"#m2ab13f32ce\" x=\"348.678699\" y=\"170.28\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_11\">\n",
" <!-- 2 -->\n",
" <g transform=\"translate(345.497449 184.878438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-32\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"matplotlib.axis_4\">\n",
" <g id=\"ytick_6\">\n",
" <g id=\"line2d_25\">\n",
" <path d=\"M 207.350184 162.867273 \n",
"L 355.056066 162.867273 \n",
"\" clip-path=\"url(#pe19d02a630)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_26\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"207.350184\" y=\"162.867273\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_12\">\n",
" <!-- 1.0 -->\n",
" <g transform=\"translate(176.067371 166.666491)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-2212\"/>\n",
" <use xlink:href=\"#DejaVuSans-31\" x=\"83.789062\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"147.412109\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"179.199219\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_7\">\n",
" <g id=\"line2d_27\">\n",
" <path d=\"M 207.350184 125.803636 \n",
"L 355.056066 125.803636 \n",
"\" clip-path=\"url(#pe19d02a630)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_28\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"207.350184\" y=\"125.803636\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_13\">\n",
" <!-- 0.5 -->\n",
" <g transform=\"translate(176.067371 129.602855)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-2212\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"83.789062\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"147.412109\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"179.199219\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_8\">\n",
" <g id=\"line2d_29\">\n",
" <path d=\"M 207.350184 88.74 \n",
"L 355.056066 88.74 \n",
"\" clip-path=\"url(#pe19d02a630)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_30\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"207.350184\" y=\"88.74\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_14\">\n",
" <!-- 0.0 -->\n",
" <g transform=\"translate(184.447059 92.539219)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_9\">\n",
" <g id=\"line2d_31\">\n",
" <path d=\"M 207.350184 51.676364 \n",
"L 355.056066 51.676364 \n",
"\" clip-path=\"url(#pe19d02a630)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_32\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"207.350184\" y=\"51.676364\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_15\">\n",
" <!-- 0.5 -->\n",
" <g transform=\"translate(184.447059 55.475582)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_10\">\n",
" <g id=\"line2d_33\">\n",
" <path d=\"M 207.350184 14.612727 \n",
"L 355.056066 14.612727 \n",
"\" clip-path=\"url(#pe19d02a630)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_34\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"207.350184\" y=\"14.612727\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_16\">\n",
" <!-- 1.0 -->\n",
" <g transform=\"translate(184.447059 18.411946)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_35\">\n",
" <path d=\"M 214.064088 14.612727 \n",
"L 214.73716 14.759 \n",
"L 215.410232 15.197241 \n",
"L 216.083305 15.925718 \n",
"L 216.756381 16.941571 \n",
"L 217.76599 18.995074 \n",
"L 218.775598 21.667638 \n",
"L 219.785207 24.935523 \n",
"L 221.131352 30.167948 \n",
"L 222.477501 36.324086 \n",
"L 224.160182 45.169065 \n",
"L 226.179403 57.17813 \n",
"L 228.871693 74.849893 \n",
"L 235.938961 122.393059 \n",
"L 237.958182 134.173115 \n",
"L 239.640863 142.776447 \n",
"L 240.987008 148.710206 \n",
"L 242.333153 153.698213 \n",
"L 243.342766 156.770649 \n",
"L 244.352374 159.239223 \n",
"L 245.361983 161.082047 \n",
"L 246.035059 161.954645 \n",
"L 246.708132 162.538293 \n",
"L 247.381204 162.830693 \n",
"L 248.054279 162.830693 \n",
"L 248.727351 162.538293 \n",
"L 249.400423 161.954645 \n",
"L 250.073496 161.082047 \n",
"L 250.74657 159.923954 \n",
"L 251.756179 157.6618 \n",
"L 252.765787 154.787888 \n",
"L 253.775398 151.327731 \n",
"L 255.121543 145.856058 \n",
"L 256.46769 139.483612 \n",
"L 258.150371 130.40572 \n",
"L 260.169592 118.179497 \n",
"L 262.861884 100.336068 \n",
"L 269.256078 57.17815 \n",
"L 271.275298 45.169078 \n",
"L 272.95798 36.324104 \n",
"L 274.304126 30.167961 \n",
"L 275.650272 24.935541 \n",
"L 276.996418 20.709351 \n",
"L 278.006028 18.240772 \n",
"L 279.015637 16.397953 \n",
"L 279.688711 15.525355 \n",
"L 280.361784 14.941707 \n",
"L 281.034857 14.649307 \n",
"L 281.70793 14.649307 \n",
"L 282.381003 14.941707 \n",
"L 283.054076 15.525355 \n",
"L 283.727149 16.397953 \n",
"L 284.400222 17.556046 \n",
"L 285.409831 19.818205 \n",
"L 286.419441 22.692116 \n",
"L 287.42905 26.152274 \n",
"L 288.775197 31.623955 \n",
"L 290.121343 37.996388 \n",
"L 291.804026 47.074293 \n",
"L 293.823245 59.30051 \n",
"L 296.515537 77.143947 \n",
"L 302.90973 120.301866 \n",
"L 304.92895 132.310913 \n",
"L 306.611633 141.155896 \n",
"L 307.95778 147.312039 \n",
"L 309.303924 152.544464 \n",
"L 310.650071 156.770654 \n",
"L 311.65968 159.239223 \n",
"L 312.66929 161.082047 \n",
"L 313.342363 161.954645 \n",
"L 314.015435 162.538293 \n",
"L 314.688508 162.830693 \n",
"L 315.361582 162.830693 \n",
"L 316.034654 162.538293 \n",
"L 316.707727 161.954645 \n",
"L 317.380799 161.082051 \n",
"L 318.053872 159.923958 \n",
"L 319.063484 157.661795 \n",
"L 320.073093 154.787884 \n",
"L 321.082702 151.327731 \n",
"L 322.42885 145.85604 \n",
"L 323.774995 139.483598 \n",
"L 325.457676 130.40572 \n",
"L 327.476898 118.179468 \n",
"L 330.169191 100.336029 \n",
"L 336.563383 57.17813 \n",
"L 338.5826 45.169096 \n",
"L 340.265285 36.324086 \n",
"L 341.61143 30.16797 \n",
"L 342.957579 24.935523 \n",
"L 344.303724 20.709351 \n",
"L 345.313333 18.240772 \n",
"L 346.322941 16.397949 \n",
"L 346.996014 15.525355 \n",
"L 347.669086 14.941712 \n",
"L 348.342162 14.649307 \n",
"L 348.342162 14.649307 \n",
"\" clip-path=\"url(#pe19d02a630)\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_36\">\n",
" <path d=\"M 230.890914 88.739999 \n",
"L 315.025046 162.867273 \n",
"\" clip-path=\"url(#pe19d02a630)\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"patch_8\">\n",
" <path d=\"M 207.350184 170.28 \n",
"L 207.350184 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_9\">\n",
" <path d=\"M 355.056066 170.28 \n",
"L 355.056066 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_10\">\n",
" <path d=\"M 207.350184 170.28 \n",
"L 355.056066 170.28 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_11\">\n",
" <path d=\"M 207.350184 7.2 \n",
"L 355.056066 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"axes_3\">\n",
" <g id=\"patch_12\">\n",
" <path d=\"M 384.597243 170.28 \n",
"L 532.303125 170.28 \n",
"L 532.303125 7.2 \n",
"L 384.597243 7.2 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"matplotlib.axis_5\">\n",
" <g id=\"xtick_7\">\n",
" <g id=\"line2d_37\">\n",
" <path d=\"M 391.311146 170.28 \n",
"L 391.311146 7.2 \n",
"\" clip-path=\"url(#pd322d41aab)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_38\">\n",
" <g>\n",
" <use xlink:href=\"#m2ab13f32ce\" x=\"391.311146\" y=\"170.28\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_17\">\n",
" <!-- 2 -->\n",
" <g transform=\"translate(383.940053 184.878438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-2212\"/>\n",
" <use xlink:href=\"#DejaVuSans-32\" x=\"83.789062\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_8\">\n",
" <g id=\"line2d_39\">\n",
" <path d=\"M 458.618452 170.28 \n",
"L 458.618452 7.2 \n",
"\" clip-path=\"url(#pd322d41aab)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_40\">\n",
" <g>\n",
" <use xlink:href=\"#m2ab13f32ce\" x=\"458.618452\" y=\"170.28\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_18\">\n",
" <!-- 0 -->\n",
" <g transform=\"translate(455.437202 184.878438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_9\">\n",
" <g id=\"line2d_41\">\n",
" <path d=\"M 525.925757 170.28 \n",
"L 525.925757 7.2 \n",
"\" clip-path=\"url(#pd322d41aab)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_42\">\n",
" <g>\n",
" <use xlink:href=\"#m2ab13f32ce\" x=\"525.925757\" y=\"170.28\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_19\">\n",
" <!-- 2 -->\n",
" <g transform=\"translate(522.744507 184.878438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-32\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"matplotlib.axis_6\">\n",
" <g id=\"ytick_11\">\n",
" <g id=\"line2d_43\">\n",
" <path d=\"M 384.597243 154.485256 \n",
"L 532.303125 154.485256 \n",
"\" clip-path=\"url(#pd322d41aab)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_44\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"384.597243\" y=\"154.485256\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_20\">\n",
" <!-- 0.5 -->\n",
" <g transform=\"translate(361.694118 158.284475)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_12\">\n",
" <g id=\"line2d_45\">\n",
" <path d=\"M 384.597243 122.764163 \n",
"L 532.303125 122.764163 \n",
"\" clip-path=\"url(#pd322d41aab)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_46\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"384.597243\" y=\"122.764163\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_21\">\n",
" <!-- 1.0 -->\n",
" <g transform=\"translate(361.694118 126.563382)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_13\">\n",
" <g id=\"line2d_47\">\n",
" <path d=\"M 384.597243 91.043071 \n",
"L 532.303125 91.043071 \n",
"\" clip-path=\"url(#pd322d41aab)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_48\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"384.597243\" y=\"91.043071\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_22\">\n",
" <!-- 1.5 -->\n",
" <g transform=\"translate(361.694118 94.842289)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_14\">\n",
" <g id=\"line2d_49\">\n",
" <path d=\"M 384.597243 59.321978 \n",
"L 532.303125 59.321978 \n",
"\" clip-path=\"url(#pd322d41aab)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_50\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"384.597243\" y=\"59.321978\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_23\">\n",
" <!-- 2.0 -->\n",
" <g transform=\"translate(361.694118 63.121197)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-32\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_15\">\n",
" <g id=\"line2d_51\">\n",
" <path d=\"M 384.597243 27.600885 \n",
"L 532.303125 27.600885 \n",
"\" clip-path=\"url(#pd322d41aab)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_52\">\n",
" <g>\n",
" <use xlink:href=\"#m8ed05b8f4d\" x=\"384.597243\" y=\"27.600885\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_24\">\n",
" <!-- 2.5 -->\n",
" <g transform=\"translate(361.694118 31.400104)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-32\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_53\">\n",
" <path d=\"M 391.311146 162.867273 \n",
"L 398.041875 160.412683 \n",
"L 404.436075 157.842113 \n",
"L 410.493726 155.170933 \n",
"L 416.21485 152.41756 \n",
"L 421.935969 149.419918 \n",
"L 427.320555 146.356086 \n",
"L 432.705138 143.037075 \n",
"L 437.753187 139.674871 \n",
"L 442.801235 136.050806 \n",
"L 447.849283 132.144486 \n",
"L 452.560795 128.224557 \n",
"L 457.272306 124.020402 \n",
"L 461.983817 119.511414 \n",
"L 466.695328 114.675485 \n",
"L 471.070304 109.87154 \n",
"L 475.445278 104.744969 \n",
"L 479.820253 99.274105 \n",
"L 484.19523 93.435813 \n",
"L 488.570202 87.205445 \n",
"L 492.945177 80.556636 \n",
"L 496.983616 74.023624 \n",
"L 501.022054 67.086631 \n",
"L 505.060493 59.720679 \n",
"L 509.098931 51.899236 \n",
"L 513.13737 43.59416 \n",
"L 517.175804 34.775519 \n",
"L 521.214247 25.411547 \n",
"L 525.252681 15.46856 \n",
"L 525.589221 14.612727 \n",
"L 525.589221 14.612727 \n",
"\" clip-path=\"url(#pd322d41aab)\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_54\">\n",
" <path d=\"M 408.137973 156.238383 \n",
"L 492.272105 81.607872 \n",
"\" clip-path=\"url(#pd322d41aab)\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"patch_13\">\n",
" <path d=\"M 384.597243 170.28 \n",
"L 384.597243 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_14\">\n",
" <path d=\"M 532.303125 170.28 \n",
"L 532.303125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_15\">\n",
" <path d=\"M 384.597243 170.28 \n",
"L 532.303125 170.28 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_16\">\n",
" <path d=\"M 384.597243 7.2 \n",
"L 532.303125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <defs>\n",
" <clipPath id=\"p5f014a9bbe\">\n",
" <rect x=\"30.103125\" y=\"7.2\" width=\"147.705882\" height=\"163.08\"/>\n",
" </clipPath>\n",
" <clipPath id=\"pe19d02a630\">\n",
" <rect x=\"207.350184\" y=\"7.2\" width=\"147.705882\" height=\"163.08\"/>\n",
" </clipPath>\n",
" <clipPath id=\"pd322d41aab\">\n",
" <rect x=\"384.597243\" y=\"7.2\" width=\"147.705882\" height=\"163.08\"/>\n",
" </clipPath>\n",
" </defs>\n",
"</svg>\n"
],
"text/plain": [
"<Figure size 648x216 with 3 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"f = lambda x: 0.5 * x**2 # 凸函数\n",
"g = lambda x: torch.cos(np.pi * x) # 非凸函数\n",
"h = lambda x: torch.exp(0.5 * x) # 凸函数\n",
"\n",
"x, segment = torch.arange(-2, 2, 0.01), torch.tensor([-1.5, 1])\n",
"d2l.use_svg_display()\n",
"_, axes = d2l.plt.subplots(1, 3, figsize=(9, 3))\n",
"for ax, func in zip(axes, [f, g, h]):\n",
" d2l.plot([x, segment], [func(x), func(segment)], axes=ax)"
]
},
{
"cell_type": "markdown",
"id": "c175a6b6",
"metadata": {
"origin_pos": 8
},
"source": [
"不出所料,余弦函数为非凸的,而抛物线函数和指数函数为凸的。\n",
"请注意,为使该条件有意义,$\\mathcal{X}$是凸集的要求是必要的。\n",
"否则可能无法很好地界定$f(\\lambda x + (1-\\lambda) x')$的结果。\n",
"\n",
"### 詹森不等式\n",
"\n",
"给定一个凸函数$f$,最有用的数学工具之一就是*詹森不等式*Jensen's inequality)。\n",
"它是凸性定义的一种推广:\n",
"\n",
"$$\\sum_i \\alpha_i f(x_i) \\geq f\\left(\\sum_i \\alpha_i x_i\\right) \\text{ and } E_X[f(X)] \\geq f\\left(E_X[X]\\right),$$\n",
":eqlabel:`eq_jensens-inequality`\n",
"\n",
"其中$\\alpha_i$是满足$\\sum_i \\alpha_i = 1$的非负实数,$X$是随机变量。\n",
"换句话说,凸函数的期望不小于期望的凸函数,其中后者通常是一个更简单的表达式。\n",
"为了证明第一个不等式,我们多次将凸性的定义应用于一次求和中的一项。\n",
"\n",
"詹森不等式的一个常见应用:用一个较简单的表达式约束一个较复杂的表达式。\n",
"例如,它可以应用于部分观察到的随机变量的对数似然。\n",
"具体地说,由于$\\int P(Y) P(X \\mid Y) dY = P(X)$,所以\n",
"\n",
"$$E_{Y \\sim P(Y)}[-\\log P(X \\mid Y)] \\geq -\\log P(X),$$\n",
"\n",
"这里,$Y$是典型的未观察到的随机变量,$P(Y)$是它可能如何分布的最佳猜测,$P(X)$是将$Y$积分后的分布。\n",
"例如,在聚类中$Y$可能是簇标签,而在应用簇标签时,$P(X \\mid Y)$是生成模型。\n",
"\n",
"## 性质\n",
"\n",
"下面我们来看一下凸函数一些有趣的性质。\n",
"\n",
"### 局部极小值是全局极小值\n",
"\n",
"首先凸函数的局部极小值也是全局极小值。\n",
"下面我们用反证法给出证明。\n",
"\n",
"假设$x^{\\ast} \\in \\mathcal{X}$是一个局部最小值,则存在一个很小的正值$p$,使得当$x \\in \\mathcal{X}$满足$0 < |x - x^{\\ast}| \\leq p$时,有$f(x^{\\ast}) < f(x)$。\n",
"\n",
"现在假设局部极小值$x^{\\ast}$不是$f$的全局极小值:存在$x' \\in \\mathcal{X}$使得$f(x') < f(x^{\\ast})$。\n",
"则存在\n",
"$\\lambda \\in [0, 1)$,比如$\\lambda = 1 - \\frac{p}{|x^{\\ast} - x'|}$,使得\n",
"$0 < |\\lambda x^{\\ast} + (1-\\lambda) x' - x^{\\ast}| \\leq p$。\n",
"\n",
"然而,根据凸性的性质,有\n",
"\n",
"$$\\begin{aligned}\n",
" f(\\lambda x^{\\ast} + (1-\\lambda) x') &\\leq \\lambda f(x^{\\ast}) + (1-\\lambda) f(x') \\\\\n",
" &< \\lambda f(x^{\\ast}) + (1-\\lambda) f(x^{\\ast}) \\\\\n",
" &= f(x^{\\ast}), \\\\\n",
"\\end{aligned}$$\n",
"\n",
"这与$x^{\\ast}$是局部最小值相矛盾。\n",
"因此,不存在$x' \\in \\mathcal{X}$满足$f(x') < f(x^{\\ast})$。\n",
"综上所述,局部最小值$x^{\\ast}$也是全局最小值。\n",
"\n",
"例如,对于凸函数$f(x) = (x-1)^2$,有一个局部最小值$x=1$,它也是全局最小值。\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5a177b38",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:06:37.643725Z",
"iopub.status.busy": "2023-08-18T07:06:37.643028Z",
"iopub.status.idle": "2023-08-18T07:06:37.807002Z",
"shell.execute_reply": "2023-08-18T07:06:37.806156Z"
},
"origin_pos": 9,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"236.740625pt\" height=\"180.65625pt\" viewBox=\"0 0 236.740625 180.65625\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n",
" <metadata>\n",
" <rdf:RDF xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n",
" <cc:Work>\n",
" <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\n",
" <dc:date>2023-08-18T07:06:37.774366</dc:date>\n",
" <dc:format>image/svg+xml</dc:format>\n",
" <dc:creator>\n",
" <cc:Agent>\n",
" <dc:title>Matplotlib v3.5.1, https://matplotlib.org/</dc:title>\n",
" </cc:Agent>\n",
" </dc:creator>\n",
" </cc:Work>\n",
" </rdf:RDF>\n",
" </metadata>\n",
" <defs>\n",
" <style type=\"text/css\">*{stroke-linejoin: round; stroke-linecap: butt}</style>\n",
" </defs>\n",
" <g id=\"figure_1\">\n",
" <g id=\"patch_1\">\n",
" <path d=\"M 0 180.65625 \n",
"L 236.740625 180.65625 \n",
"L 236.740625 0 \n",
"L 0 0 \n",
"L 0 180.65625 \n",
"z\n",
"\" style=\"fill: none\"/>\n",
" </g>\n",
" <g id=\"axes_1\">\n",
" <g id=\"patch_2\">\n",
" <path d=\"M 34.240625 143.1 \n",
"L 229.540625 143.1 \n",
"L 229.540625 7.2 \n",
"L 34.240625 7.2 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"matplotlib.axis_1\">\n",
" <g id=\"xtick_1\">\n",
" <g id=\"line2d_1\">\n",
" <path d=\"M 43.117898 143.1 \n",
"L 43.117898 7.2 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_2\">\n",
" <defs>\n",
" <path id=\"m8b06a93cd7\" d=\"M 0 0 \n",
"L 0 3.5 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#m8b06a93cd7\" x=\"43.117898\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_1\">\n",
" <!-- 2 -->\n",
" <g transform=\"translate(35.746804 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-2212\" d=\"M 678 2272 \n",
"L 4684 2272 \n",
"L 4684 1741 \n",
"L 678 1741 \n",
"L 678 2272 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-32\" d=\"M 1228 531 \n",
"L 3431 531 \n",
"L 3431 0 \n",
"L 469 0 \n",
"L 469 531 \n",
"Q 828 903 1448 1529 \n",
"Q 2069 2156 2228 2338 \n",
"Q 2531 2678 2651 2914 \n",
"Q 2772 3150 2772 3378 \n",
"Q 2772 3750 2511 3984 \n",
"Q 2250 4219 1831 4219 \n",
"Q 1534 4219 1204 4116 \n",
"Q 875 4013 500 3803 \n",
"L 500 4441 \n",
"Q 881 4594 1212 4672 \n",
"Q 1544 4750 1819 4750 \n",
"Q 2544 4750 2975 4387 \n",
"Q 3406 4025 3406 3419 \n",
"Q 3406 3131 3298 2873 \n",
"Q 3191 2616 2906 2266 \n",
"Q 2828 2175 2409 1742 \n",
"Q 1991 1309 1228 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-2212\"/>\n",
" <use xlink:href=\"#DejaVuSans-32\" x=\"83.789062\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_2\">\n",
" <g id=\"line2d_3\">\n",
" <path d=\"M 87.615505 143.1 \n",
"L 87.615505 7.2 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_4\">\n",
" <g>\n",
" <use xlink:href=\"#m8b06a93cd7\" x=\"87.615505\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_2\">\n",
" <!-- 1 -->\n",
" <g transform=\"translate(80.244412 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-31\" d=\"M 794 531 \n",
"L 1825 531 \n",
"L 1825 4091 \n",
"L 703 3866 \n",
"L 703 4441 \n",
"L 1819 4666 \n",
"L 2450 4666 \n",
"L 2450 531 \n",
"L 3481 531 \n",
"L 3481 0 \n",
"L 794 0 \n",
"L 794 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-2212\"/>\n",
" <use xlink:href=\"#DejaVuSans-31\" x=\"83.789062\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_3\">\n",
" <g id=\"line2d_5\">\n",
" <path d=\"M 132.113113 143.1 \n",
"L 132.113113 7.2 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_6\">\n",
" <g>\n",
" <use xlink:href=\"#m8b06a93cd7\" x=\"132.113113\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_3\">\n",
" <!-- 0 -->\n",
" <g transform=\"translate(128.931863 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-30\" d=\"M 2034 4250 \n",
"Q 1547 4250 1301 3770 \n",
"Q 1056 3291 1056 2328 \n",
"Q 1056 1369 1301 889 \n",
"Q 1547 409 2034 409 \n",
"Q 2525 409 2770 889 \n",
"Q 3016 1369 3016 2328 \n",
"Q 3016 3291 2770 3770 \n",
"Q 2525 4250 2034 4250 \n",
"z\n",
"M 2034 4750 \n",
"Q 2819 4750 3233 4129 \n",
"Q 3647 3509 3647 2328 \n",
"Q 3647 1150 3233 529 \n",
"Q 2819 -91 2034 -91 \n",
"Q 1250 -91 836 529 \n",
"Q 422 1150 422 2328 \n",
"Q 422 3509 836 4129 \n",
"Q 1250 4750 2034 4750 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_4\">\n",
" <g id=\"line2d_7\">\n",
" <path d=\"M 176.61072 143.1 \n",
"L 176.61072 7.2 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_8\">\n",
" <g>\n",
" <use xlink:href=\"#m8b06a93cd7\" x=\"176.61072\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_4\">\n",
" <!-- 1 -->\n",
" <g transform=\"translate(173.42947 157.698438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_5\">\n",
" <g id=\"line2d_9\">\n",
" <path d=\"M 221.108328 143.1 \n",
"L 221.108328 7.2 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_10\">\n",
" <g>\n",
" <use xlink:href=\"#m8b06a93cd7\" x=\"221.108328\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_5\">\n",
" <!-- 2 -->\n",
" <g transform=\"translate(217.927078 157.698438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-32\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_6\">\n",
" <!-- x -->\n",
" <g transform=\"translate(128.93125 171.376563)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-78\" d=\"M 3513 3500 \n",
"L 2247 1797 \n",
"L 3578 0 \n",
"L 2900 0 \n",
"L 1881 1375 \n",
"L 863 0 \n",
"L 184 0 \n",
"L 1544 1831 \n",
"L 300 3500 \n",
"L 978 3500 \n",
"L 1906 2253 \n",
"L 2834 3500 \n",
"L 3513 3500 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-78\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"matplotlib.axis_2\">\n",
" <g id=\"ytick_1\">\n",
" <g id=\"line2d_11\">\n",
" <path d=\"M 34.240625 136.922727 \n",
"L 229.540625 136.922727 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_12\">\n",
" <defs>\n",
" <path id=\"m15d685032b\" d=\"M 0 0 \n",
"L -3.5 0 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#m15d685032b\" x=\"34.240625\" y=\"136.922727\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_7\">\n",
" <!-- 0 -->\n",
" <g transform=\"translate(20.878125 140.721946)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_2\">\n",
" <g id=\"line2d_13\">\n",
" <path d=\"M 34.240625 109.468182 \n",
"L 229.540625 109.468182 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_14\">\n",
" <g>\n",
" <use xlink:href=\"#m15d685032b\" x=\"34.240625\" y=\"109.468182\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_8\">\n",
" <!-- 2 -->\n",
" <g transform=\"translate(20.878125 113.267401)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-32\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_3\">\n",
" <g id=\"line2d_15\">\n",
" <path d=\"M 34.240625 82.013636 \n",
"L 229.540625 82.013636 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_16\">\n",
" <g>\n",
" <use xlink:href=\"#m15d685032b\" x=\"34.240625\" y=\"82.013636\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_9\">\n",
" <!-- 4 -->\n",
" <g transform=\"translate(20.878125 85.812855)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-34\" d=\"M 2419 4116 \n",
"L 825 1625 \n",
"L 2419 1625 \n",
"L 2419 4116 \n",
"z\n",
"M 2253 4666 \n",
"L 3047 4666 \n",
"L 3047 1625 \n",
"L 3713 1625 \n",
"L 3713 1100 \n",
"L 3047 1100 \n",
"L 3047 0 \n",
"L 2419 0 \n",
"L 2419 1100 \n",
"L 313 1100 \n",
"L 313 1709 \n",
"L 2253 4666 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-34\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_4\">\n",
" <g id=\"line2d_17\">\n",
" <path d=\"M 34.240625 54.559091 \n",
"L 229.540625 54.559091 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_18\">\n",
" <g>\n",
" <use xlink:href=\"#m15d685032b\" x=\"34.240625\" y=\"54.559091\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_10\">\n",
" <!-- 6 -->\n",
" <g transform=\"translate(20.878125 58.35831)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-36\" d=\"M 2113 2584 \n",
"Q 1688 2584 1439 2293 \n",
"Q 1191 2003 1191 1497 \n",
"Q 1191 994 1439 701 \n",
"Q 1688 409 2113 409 \n",
"Q 2538 409 2786 701 \n",
"Q 3034 994 3034 1497 \n",
"Q 3034 2003 2786 2293 \n",
"Q 2538 2584 2113 2584 \n",
"z\n",
"M 3366 4563 \n",
"L 3366 3988 \n",
"Q 3128 4100 2886 4159 \n",
"Q 2644 4219 2406 4219 \n",
"Q 1781 4219 1451 3797 \n",
"Q 1122 3375 1075 2522 \n",
"Q 1259 2794 1537 2939 \n",
"Q 1816 3084 2150 3084 \n",
"Q 2853 3084 3261 2657 \n",
"Q 3669 2231 3669 1497 \n",
"Q 3669 778 3244 343 \n",
"Q 2819 -91 2113 -91 \n",
"Q 1303 -91 875 529 \n",
"Q 447 1150 447 2328 \n",
"Q 447 3434 972 4092 \n",
"Q 1497 4750 2381 4750 \n",
"Q 2619 4750 2861 4703 \n",
"Q 3103 4656 3366 4563 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-36\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_5\">\n",
" <g id=\"line2d_19\">\n",
" <path d=\"M 34.240625 27.104545 \n",
"L 229.540625 27.104545 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_20\">\n",
" <g>\n",
" <use xlink:href=\"#m15d685032b\" x=\"34.240625\" y=\"27.104545\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_11\">\n",
" <!-- 8 -->\n",
" <g transform=\"translate(20.878125 30.903764)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-38\" d=\"M 2034 2216 \n",
"Q 1584 2216 1326 1975 \n",
"Q 1069 1734 1069 1313 \n",
"Q 1069 891 1326 650 \n",
"Q 1584 409 2034 409 \n",
"Q 2484 409 2743 651 \n",
"Q 3003 894 3003 1313 \n",
"Q 3003 1734 2745 1975 \n",
"Q 2488 2216 2034 2216 \n",
"z\n",
"M 1403 2484 \n",
"Q 997 2584 770 2862 \n",
"Q 544 3141 544 3541 \n",
"Q 544 4100 942 4425 \n",
"Q 1341 4750 2034 4750 \n",
"Q 2731 4750 3128 4425 \n",
"Q 3525 4100 3525 3541 \n",
"Q 3525 3141 3298 2862 \n",
"Q 3072 2584 2669 2484 \n",
"Q 3125 2378 3379 2068 \n",
"Q 3634 1759 3634 1313 \n",
"Q 3634 634 3220 271 \n",
"Q 2806 -91 2034 -91 \n",
"Q 1263 -91 848 271 \n",
"Q 434 634 434 1313 \n",
"Q 434 1759 690 2068 \n",
"Q 947 2378 1403 2484 \n",
"z\n",
"M 1172 3481 \n",
"Q 1172 3119 1398 2916 \n",
"Q 1625 2713 2034 2713 \n",
"Q 2441 2713 2670 2916 \n",
"Q 2900 3119 2900 3481 \n",
"Q 2900 3844 2670 4047 \n",
"Q 2441 4250 2034 4250 \n",
"Q 1625 4250 1398 4047 \n",
"Q 1172 3844 1172 3481 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-38\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_12\">\n",
" <!-- f(x) -->\n",
" <g transform=\"translate(14.798438 83.771094)rotate(-90)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-66\" d=\"M 2375 4863 \n",
"L 2375 4384 \n",
"L 1825 4384 \n",
"Q 1516 4384 1395 4259 \n",
"Q 1275 4134 1275 3809 \n",
"L 1275 3500 \n",
"L 2222 3500 \n",
"L 2222 3053 \n",
"L 1275 3053 \n",
"L 1275 0 \n",
"L 697 0 \n",
"L 697 3053 \n",
"L 147 3053 \n",
"L 147 3500 \n",
"L 697 3500 \n",
"L 697 3744 \n",
"Q 697 4328 969 4595 \n",
"Q 1241 4863 1831 4863 \n",
"L 2375 4863 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-28\" d=\"M 1984 4856 \n",
"Q 1566 4138 1362 3434 \n",
"Q 1159 2731 1159 2009 \n",
"Q 1159 1288 1364 580 \n",
"Q 1569 -128 1984 -844 \n",
"L 1484 -844 \n",
"Q 1016 -109 783 600 \n",
"Q 550 1309 550 2009 \n",
"Q 550 2706 781 3412 \n",
"Q 1013 4119 1484 4856 \n",
"L 1984 4856 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-29\" d=\"M 513 4856 \n",
"L 1013 4856 \n",
"Q 1481 4119 1714 3412 \n",
"Q 1947 2706 1947 2009 \n",
"Q 1947 1309 1714 600 \n",
"Q 1481 -109 1013 -844 \n",
"L 513 -844 \n",
"Q 928 -128 1133 580 \n",
"Q 1338 1288 1338 2009 \n",
"Q 1338 2731 1133 3434 \n",
"Q 928 4138 513 4856 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-66\"/>\n",
" <use xlink:href=\"#DejaVuSans-28\" x=\"35.205078\"/>\n",
" <use xlink:href=\"#DejaVuSans-78\" x=\"74.21875\"/>\n",
" <use xlink:href=\"#DejaVuSans-29\" x=\"133.398438\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_21\">\n",
" <path d=\"M 43.117898 13.377273 \n",
"L 48.902586 23.852563 \n",
"L 54.687275 33.863857 \n",
"L 60.471969 43.41118 \n",
"L 65.811677 51.812264 \n",
"L 71.15139 59.818016 \n",
"L 76.491103 67.428409 \n",
"L 81.830817 74.643456 \n",
"L 87.17053 81.463171 \n",
"L 92.065267 87.367274 \n",
"L 96.960002 92.939174 \n",
"L 101.854739 98.178868 \n",
"L 106.749477 103.086376 \n",
"L 111.644214 107.661671 \n",
"L 116.093973 111.532763 \n",
"L 120.543735 115.129309 \n",
"L 124.993496 118.451311 \n",
"L 129.443256 121.498765 \n",
"L 133.893017 124.271673 \n",
"L 138.342778 126.770036 \n",
"L 142.792538 128.993855 \n",
"L 146.797323 130.760554 \n",
"L 150.802109 132.304873 \n",
"L 154.806892 133.626809 \n",
"L 158.811678 134.726364 \n",
"L 162.816462 135.603536 \n",
"L 166.821248 136.258327 \n",
"L 170.826032 136.690736 \n",
"L 174.830815 136.900764 \n",
"L 178.835599 136.888409 \n",
"L 182.840385 136.653673 \n",
"L 186.845171 136.196554 \n",
"L 190.849952 135.517055 \n",
"L 194.854738 134.615173 \n",
"L 198.859524 133.490909 \n",
"L 202.86431 132.144263 \n",
"L 206.869091 130.575237 \n",
"L 210.873877 128.783828 \n",
"L 214.878664 126.770036 \n",
"L 219.32842 124.271675 \n",
"L 220.663352 123.468627 \n",
"L 220.663352 123.468627 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_22\">\n",
" <path d=\"M 65.366702 51.127273 \n",
"L 176.61072 136.922727 \n",
"\" clip-path=\"url(#p794e87d229)\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"patch_3\">\n",
" <path d=\"M 34.240625 143.1 \n",
"L 34.240625 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_4\">\n",
" <path d=\"M 229.540625 143.1 \n",
"L 229.540625 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_5\">\n",
" <path d=\"M 34.240625 143.1 \n",
"L 229.540625 143.1 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_6\">\n",
" <path d=\"M 34.240625 7.2 \n",
"L 229.540625 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <defs>\n",
" <clipPath id=\"p794e87d229\">\n",
" <rect x=\"34.240625\" y=\"7.2\" width=\"195.3\" height=\"135.9\"/>\n",
" </clipPath>\n",
" </defs>\n",
"</svg>\n"
],
"text/plain": [
"<Figure size 252x180 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"f = lambda x: (x - 1) ** 2\n",
"d2l.set_figsize()\n",
"d2l.plot([x, segment], [f(x), f(segment)], 'x', 'f(x)')"
]
},
{
"cell_type": "markdown",
"id": "1bf2b6ed",
"metadata": {
"origin_pos": 10
},
"source": [
"凸函数的局部极小值同时也是全局极小值这一性质是很方便的。\n",
"这意味着如果我们最小化函数,我们就不会“卡住”。\n",
"但是请注意,这并不意味着不能有多个全局最小值,或者可能不存在一个全局最小值。\n",
"例如,函数$f(x) = \\mathrm{max}(|x|-1, 0)$在$[-1,1]$区间上都是最小值。\n",
"相反,函数$f(x) = \\exp(x)$在$\\mathbb{R}$上没有取得最小值。对于$x \\to -\\infty$,它趋近于$0$,但是没有$f(x) = 0$的$x$。\n",
"\n",
"### 凸函数的下水平集是凸的\n",
"\n",
"我们可以方便地通过凸函数的*下水平集*(below sets)定义凸集。\n",
"具体来说,给定一个定义在凸集$\\mathcal{X}$上的凸函数$f$,其任意一个下水平集\n",
"\n",
"$$\\mathcal{S}_b := \\{x | x \\in \\mathcal{X} \\text{ and } f(x) \\leq b\\}$$\n",
"\n",
"是凸的。\n",
"\n",
"让我们快速证明一下。\n",
"对于任何$x, x' \\in \\mathcal{S}_b$,我们需要证明:当$\\lambda \\in [0, 1]$时,$\\lambda x + (1-\\lambda) x' \\in \\mathcal{S}_b$。\n",
"因为$f(x) \\leq b$且$f(x') \\leq b$,所以\n",
"\n",
"$$f(\\lambda x + (1-\\lambda) x') \\leq \\lambda f(x) + (1-\\lambda) f(x') \\leq b.$$\n",
"\n",
"### 凸性和二阶导数\n",
"\n",
"当一个函数的二阶导数$f: \\mathbb{R}^n \\rightarrow \\mathbb{R}$存在时,我们很容易检查这个函数的凸性。\n",
"我们需要做的就是检查$\\nabla^2f \\succeq 0$\n",
"即对于所有$\\mathbf{x} \\in \\mathbb{R}^n$$\\mathbf{x}^\\top \\mathbf{H} \\mathbf{x} \\geq 0$.\n",
"例如,函数$f(\\mathbf{x}) = \\frac{1}{2} \\|\\mathbf{x}\\|^2$是凸的,因为$\\nabla^2 f = \\mathbf{1}$,即其导数是单位矩阵。\n",
"\n",
"更正式地讲,$f$为凸函数,当且仅当任意二次可微一维函数$f: \\mathbb{R}^n \\rightarrow \\mathbb{R}$是凸的。\n",
"对于任意二次可微多维函数$f: \\mathbb{R}^{n} \\rightarrow \\mathbb{R}$\n",
"它是凸的当且仅当它的Hessian$\\nabla^2f\\succeq 0$。\n",
"\n",
"首先,我们来证明一下一维情况。\n",
"为了证明凸函数的$f''(x) \\geq 0$,我们使用:\n",
"\n",
"$$\\frac{1}{2} f(x + \\epsilon) + \\frac{1}{2} f(x - \\epsilon) \\geq f\\left(\\frac{x + \\epsilon}{2} + \\frac{x - \\epsilon}{2}\\right) = f(x).$$\n",
"\n",
"因为二阶导数是由有限差分的极限给出的,所以遵循\n",
"\n",
"$$f''(x) = \\lim_{\\epsilon \\to 0} \\frac{f(x+\\epsilon) + f(x - \\epsilon) - 2f(x)}{\\epsilon^2} \\geq 0.$$\n",
"\n",
"为了证明$f'' \\geq 0$可以推导$f$是凸的,\n",
"我们使用这样一个事实:$f'' \\geq 0$意味着$f'$是一个单调的非递减函数。\n",
"假设$a < x < b$是$\\mathbb{R}$中的三个点,\n",
"其中,$x = (1-\\lambda)a + \\lambda b$且$\\lambda \\in (0, 1)$.\n",
"根据中值定理,存在$\\alpha \\in [a, x]$$\\beta \\in [x, b]$,使得\n",
"$$f'(\\alpha) = \\frac{f(x) - f(a)}{x-a} \\text{ 且 } f'(\\beta) = \\frac{f(b) - f(x)}{b-x}.$$\n",
"\n",
"通过单调性$f'(\\beta) \\geq f'(\\alpha)$,因此\n",
"\n",
"$$\\frac{x-a}{b-a}f(b) + \\frac{b-x}{b-a}f(a) \\geq f(x).$$\n",
"\n",
"由于$x = (1-\\lambda)a + \\lambda b$,所以\n",
"\n",
"$$\\lambda f(b) + (1-\\lambda)f(a) \\geq f((1-\\lambda)a + \\lambda b),$$\n",
"\n",
"从而证明了凸性。\n",
"\n",
"第二,我们需要一个引理证明多维情况:\n",
"$f: \\mathbb{R}^n \\rightarrow \\mathbb{R}$\n",
"是凸的当且仅当对于所有$\\mathbf{x}, \\mathbf{y} \\in \\mathbb{R}^n$\n",
"\n",
"$$g(z) \\stackrel{\\mathrm{def}}{=} f(z \\mathbf{x} + (1-z) \\mathbf{y}) \\text{ where } z \\in [0,1]$$ \n",
"\n",
"是凸的。\n",
"\n",
"为了证明$f$的凸性意味着$g$是凸的,\n",
"我们可以证明,对于所有的$ab\\lambda \\in[01]$(这样有$0 \\leq \\lambda a + (1-\\lambda) b \\leq 1$),\n",
"\n",
"$$\\begin{aligned} &g(\\lambda a + (1-\\lambda) b)\\\\\n",
"=&f\\left(\\left(\\lambda a + (1-\\lambda) b\\right)\\mathbf{x} + \\left(1-\\lambda a - (1-\\lambda) b\\right)\\mathbf{y} \\right)\\\\\n",
"=&f\\left(\\lambda \\left(a \\mathbf{x} + (1-a) \\mathbf{y}\\right) + (1-\\lambda) \\left(b \\mathbf{x} + (1-b) \\mathbf{y}\\right) \\right)\\\\\n",
"\\leq& \\lambda f\\left(a \\mathbf{x} + (1-a) \\mathbf{y}\\right) + (1-\\lambda) f\\left(b \\mathbf{x} + (1-b) \\mathbf{y}\\right) \\\\\n",
"=& \\lambda g(a) + (1-\\lambda) g(b).\n",
"\\end{aligned}$$\n",
"\n",
"为了证明这一点,我们可以证明对\n",
"$[01]$中所有的$\\lambda$\n",
"\n",
"$$\\begin{aligned} &f(\\lambda \\mathbf{x} + (1-\\lambda) \\mathbf{y})\\\\\n",
"=&g(\\lambda \\cdot 1 + (1-\\lambda) \\cdot 0)\\\\\n",
"\\leq& \\lambda g(1) + (1-\\lambda) g(0) \\\\\n",
"=& \\lambda f(\\mathbf{x}) + (1-\\lambda) f(\\mathbf{y}).\n",
"\\end{aligned}$$\n",
"\n",
"最后,利用上面的引理和一维情况的结果,我们可以证明多维情况:\n",
"多维函数$f:\\mathbb{R}^n\\rightarrow\\mathbb{R}$是凸函数,当且仅当$g(z) \\stackrel{\\mathrm{def}}{=} f(z \\mathbf{x} + (1-z) \\mathbf{y})$是凸的,这里$z \\in [0,1]$$\\mathbf{x}, \\mathbf{y} \\in \\mathbb{R}^n$。\n",
"根据一维情况,\n",
"此条成立的条件为,当且仅当对于所有$\\mathbf{x}, \\mathbf{y} \\in \\mathbb{R}^n$\n",
"$g'' = (\\mathbf{x} - \\mathbf{y})^\\top \\mathbf{H}(\\mathbf{x} - \\mathbf{y}) \\geq 0$$\\mathbf{H} \\stackrel{\\mathrm{def}}{=} \\nabla^2f$)。\n",
"这相当于根据半正定矩阵的定义,$\\mathbf{H} \\succeq 0$。\n",
"\n",
"## 约束\n",
"\n",
"凸优化的一个很好的特性是能够让我们有效地处理*约束*(constraints)。\n",
"即它使我们能够解决以下形式的*约束优化*constrained optimization)问题:\n",
"\n",
"$$\\begin{aligned} \\mathop{\\mathrm{minimize~}}_{\\mathbf{x}} & f(\\mathbf{x}) \\\\\n",
" \\text{ subject to } & c_i(\\mathbf{x}) \\leq 0 \\text{ for all } i \\in \\{1, \\ldots, N\\}.\n",
"\\end{aligned}$$\n",
"\n",
"这里$f$是目标函数,$c_i$是约束函数。\n",
"例如第一个约束$c_1(\\mathbf{x}) = \\|\\mathbf{x}\\|_2 - 1$,则参数$\\mathbf{x}$被限制为单位球。\n",
"如果第二个约束$c_2(\\mathbf{x}) = \\mathbf{v}^\\top \\mathbf{x} + b$,那么这对应于半空间上所有的$\\mathbf{x}$。\n",
"同时满足这两个约束等于选择一个球的切片作为约束集。\n",
"\n",
"### 拉格朗日函数\n",
"\n",
"通常,求解一个有约束的优化问题是困难的,解决这个问题的一种方法来自物理中相当简单的直觉。\n",
"想象一个球在一个盒子里,球会滚到最低的地方,重力将与盒子两侧对球施加的力平衡。\n",
"简而言之,目标函数(即重力)的梯度将被约束函数的梯度所抵消(由于墙壁的“推回”作用,需要保持在盒子内)。\n",
"请注意,任何不起作用的约束(即球不接触壁)都将无法对球施加任何力。\n",
"\n",
"这里我们简略拉格朗日函数$L$的推导,上述推理可以通过以下鞍点优化问题来表示:\n",
"\n",
"$$L(\\mathbf{x}, \\alpha_1, \\ldots, \\alpha_n) = f(\\mathbf{x}) + \\sum_{i=1}^n \\alpha_i c_i(\\mathbf{x}) \\text{ where } \\alpha_i \\geq 0.$$\n",
"\n",
"这里的变量$\\alpha_i$$i=1,\\ldots,n$)是所谓的*拉格朗日乘数*Lagrange multipliers),它确保约束被正确地执行。\n",
"选择它们的大小足以确保所有$i$的$c_i(\\mathbf{x}) \\leq 0$。\n",
"例如,对于$c_i(\\mathbf{x}) < 0$中任意$\\mathbf{x}$,我们最终会选择$\\alpha_i = 0$。\n",
"此外,这是一个*鞍点*saddlepoint)优化问题。\n",
"在这个问题中,我们想要使$L$相对于$\\alpha_i$*最大化*maximize),同时使它相对于$\\mathbf{x}$*最小化*minimize)。\n",
"有大量的文献解释如何得出函数$L(\\mathbf{x}, \\alpha_1, \\ldots, \\alpha_n)$。\n",
"我们这里只需要知道$L$的鞍点是原始约束优化问题的最优解就足够了。\n",
"\n",
"### 惩罚\n",
"\n",
"一种至少近似地满足约束优化问题的方法是采用拉格朗日函数$L$。除了满足$c_i(\\mathbf{x}) \\leq 0$之外,我们只需将$\\alpha_i c_i(\\mathbf{x})$添加到目标函数$f(x)$。\n",
"这样可以确保不会严重违反约束。\n",
"\n",
"事实上,我们一直在使用这个技巧。\n",
"比如权重衰减 :numref:`sec_weight_decay`,在目标函数中加入$\\frac{\\lambda}{2} |\\mathbf{w}|^2$,以确保$\\mathbf{w}$不会增长太大。\n",
"使用约束优化的观点,我们可以看到,对于若干半径$r$,这将确保$|\\mathbf{w}|^2 - r^2 \\leq 0$。\n",
"通过调整$\\lambda$的值,我们可以改变$\\mathbf{w}$的大小。\n",
"\n",
"通常,添加惩罚是确保近似满足约束的一种好方法。\n",
"在实践中,这被证明比精确的满意度更可靠。\n",
"此外,对于非凸问题,许多使精确方法在凸情况下的性质(例如,可求最优解)不再成立。\n",
"\n",
"### 投影\n",
"\n",
"满足约束条件的另一种策略是*投影*projections)。\n",
"同样,我们之前也遇到过,例如在 :numref:`sec_rnn_scratch`中处理梯度截断时,我们通过\n",
"\n",
"$$\\mathbf{g} \\leftarrow \\mathbf{g} \\cdot \\mathrm{min}(1, \\theta/\\|\\mathbf{g}\\|),$$\n",
"\n",
"确保梯度的长度以$\\theta$为界限。\n",
"\n",
"这就是$\\mathbf{g}$在半径为$\\theta$的球上的*投影*projection)。\n",
"更泛化地说,在凸集$\\mathcal{X}$上的投影被定义为\n",
"\n",
"$$\\mathrm{Proj}_\\mathcal{X}(\\mathbf{x}) = \\mathop{\\mathrm{argmin}}_{\\mathbf{x}' \\in \\mathcal{X}} \\|\\mathbf{x} - \\mathbf{x}'\\|.$$\n",
"\n",
"它是$\\mathcal{X}$中离$\\mathbf{X}$最近的点。\n",
"\n",
"![Convex Projections.](../img/projections.svg)\n",
":label:`fig_projections`\n",
"\n",
"投影的数学定义听起来可能有点抽象,为了解释得更清楚一些,请看 :numref:`fig_projections`。\n",
"图中有两个凸集,一个圆和一个菱形。\n",
"两个集合内的点(黄色)在投影期间保持不变。\n",
"两个集合(黑色)之外的点投影到集合中接近原始点(黑色)的点(红色)。\n",
"虽然对$L_2$的球面来说,方向保持不变,但一般情况下不需要这样。\n",
"\n",
"凸投影的一个用途是计算稀疏权重向量。\n",
"在本例中,我们将权重向量投影到一个$L_1$的球上,\n",
"这是 :numref:`fig_projections`中菱形例子的一个广义版本。\n",
"\n",
"## 小结\n",
"\n",
"在深度学习的背景下,凸函数的主要目的是帮助我们详细了解优化算法。\n",
"我们由此得出梯度下降法和随机梯度下降法是如何相应推导出来的。\n",
"\n",
"* 凸集的交点是凸的,并集不是。\n",
"* 根据詹森不等式,“一个多变量凸函数的总期望值”大于或等于“用每个变量的期望值计算这个函数的总值“。\n",
"* 一个二次可微函数是凸函数,当且仅当其Hessian(二阶导数矩阵)是半正定的。\n",
"* 凸约束可以通过拉格朗日函数来添加。在实践中,只需在目标函数中加上一个惩罚就可以了。\n",
"* 投影映射到凸集中最接近原始点的点。\n",
"\n",
"## 练习 \n",
"\n",
"1. 假设我们想要通过绘制集合内点之间的所有直线并检查这些直线是否包含来验证集合的凸性。i.证明只检查边界上的点是充分的。ii.证明只检查集合的顶点是充分的。\n",
"\n",
"2. 用$p$-范数表示半径为$r$的球,证明$\\mathcal{B}_p[r] := \\{\\mathbf{x} | \\mathbf{x} \\in \\mathbb{R}^d \\text{ and } \\|\\mathbf{x}\\|_p \\leq r\\}$$\\mathcal{B}_p[r]$对于所有$p \\geq 1$是凸的。\n",
"\n",
"3. 已知凸函数$f$和$g$表明$\\mathrm{max}(f, g)$也是凸函数。证明$\\mathrm{min}(f, g)$是非凸的。\n",
"\n",
"4. 证明Softmax函数的规范化是凸的,即$f(x) = \\log \\sum_i \\exp(x_i)$的凸性。\n",
"\n",
"5. 证明线性子空间$\\mathcal{X} = \\{\\mathbf{x} | \\mathbf{W} \\mathbf{x} = \\mathbf{b}\\}$是凸集。\n",
"\n",
"6. 证明在线性子空间$\\mathbf{b} = \\mathbf{0}$的情况下,对于矩阵$\\mathbf{M}$的投影$\\mathrm {Proj} \\mathcal{X}$可以写成$\\mathbf{M} \\mathbf{X}$。\n",
"\n",
"7. 证明对于凸二次可微函数$f$,对于$\\xi \\in [0, \\epsilon]$,我们可以写成$f(x + \\epsilon) = f(x) + \\epsilon f'(x) + \\frac{1}{2} \\epsilon^2 f''(x + \\xi)$。\n",
"\n",
"8. 给定一个凸集$\\mathcal{X}$和两个向量$\\mathbf{x}$和$\\mathbf{y}$证明了投影不会增加距离,即$\\|\\mathbf{x} - \\mathbf{y}\\| \\geq \\|\\mathrm{Proj}_\\mathcal{X}(\\mathbf{x}) - \\mathrm{Proj}_\\mathcal{X}(\\mathbf{y})\\|$。\n"
]
},
{
"cell_type": "markdown",
"id": "e5fb0bfb",
"metadata": {
"origin_pos": 12,
"tab": [
"pytorch"
]
},
"source": [
"[Discussions](https://discuss.d2l.ai/t/3815)\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"required_libs": []
},
"nbformat": 4,
"nbformat_minor": 5
}