1304 lines
52 KiB
Plaintext
{
"cells": [
{
"cell_type": "markdown",
"id": "b0211d46",
"metadata": {
"origin_pos": 0
},
"source": [
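"# Numerical Stability and Initialization\n",
":label:`sec_numerical_stability`\n",
"\n",
"Thus far, every model that we have implemented required us to initialize its parameters according to some pre-specified distribution.\n",
"It is tempting to take the initialization scheme for granted and gloss over the details of how these choices are made. One might even get the impression that the choice of initialization scheme is not especially important.\n",
"On the contrary, the choice of initialization scheme plays a significant role in neural network learning,\n",
"and it can be crucial for maintaining numerical stability.\n",
"Moreover, these choices can be tied up in interesting ways with the choice of the nonlinear activation function.\n",
"Which function we choose and how we initialize parameters can determine how quickly our optimization algorithm converges.\n",
"Poor choices can cause us to encounter exploding or vanishing gradients while training.\n",
"In this section, we delve into these topics in greater detail and discuss some helpful heuristics\n",
"that will serve you well throughout your career in deep learning.\n",
"\n",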
"## Vanishing and Exploding Gradients\n",
"\n",
"Consider a deep network with $L$ layers, input $\\mathbf{x}$ and output $\\mathbf{o}$.\n",
"Each layer $l$ is defined by a transformation $f_l$\n",
"parameterized by weights $\\mathbf{W}^{(l)}$,\n",
"whose hidden variable is $\\mathbf{h}^{(l)}$ (let $\\mathbf{h}^{(0)} = \\mathbf{x}$).\n",
"Our network can be expressed as:\n",
"\n",
"$$\\mathbf{h}^{(l)} = f_l (\\mathbf{h}^{(l-1)}) \\text{ and thus } \\mathbf{o} = f_L \\circ \\ldots \\circ f_1(\\mathbf{x}).$$\n",
"\n",
"If all the hidden variables and the input are vectors,\n",
"we can write the gradient of $\\mathbf{o}$ with respect to any set of parameters $\\mathbf{W}^{(l)}$ as follows:\n",
"\n",
"$$\\partial_{\\mathbf{W}^{(l)}} \\mathbf{o} = \\underbrace{\\partial_{\\mathbf{h}^{(L-1)}} \\mathbf{h}^{(L)}}_{ \\mathbf{M}^{(L)} \\stackrel{\\mathrm{def}}{=}} \\cdot \\ldots \\cdot \\underbrace{\\partial_{\\mathbf{h}^{(l)}} \\mathbf{h}^{(l+1)}}_{ \\mathbf{M}^{(l+1)} \\stackrel{\\mathrm{def}}{=}} \\underbrace{\\partial_{\\mathbf{W}^{(l)}} \\mathbf{h}^{(l)}}_{ \\mathbf{v}^{(l)} \\stackrel{\\mathrm{def}}{=}}.$$\n",
"\n",
"In other words, this gradient is the product of $L-l$ matrices\n",
"$\\mathbf{M}^{(L)} \\cdot \\ldots \\cdot \\mathbf{M}^{(l+1)}$\n",
"and the gradient vector $\\mathbf{v}^{(l)}$.\n",
"Thus we are susceptible to numerical underflow,\n",
"the same problem that often crops up when multiplying together too many probabilities.\n",
"When dealing with probabilities, a common trick is to switch into log-space,\n",
"i.e., shifting the pressure of the numerical representation from the mantissa to the exponent.\n",
"Unfortunately, our problem above is more serious:\n",
"initially the matrices $\\mathbf{M}^{(l)}$ may have a wide variety of eigenvalues.\n",
"They might be small or large,\n",
"and their product might be very large or very small.\n",
"\n",
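"The log-space trick can be illustrated with a short sketch. The NumPy snippet below is our own assumption (this section's code uses PyTorch). It multiplies 200 probabilities of 0.01 each in `float32`, and the product underflows to zero. It then computes the same quantity as a sum of logarithms, which stays comfortably representable.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# 200 illustrative probabilities, each equal to 0.01\n",
"p = np.full(200, 0.01, dtype=np.float32)\n",
"\n",
"# The exact product is 1e-400, far below the smallest positive float32 value, so it underflows to 0.0\n",
"print(np.prod(p))\n",
"\n",
"# In log space the product becomes a sum: 200 * log(0.01), roughly -921\n",
"print(np.sum(np.log(p)))\n",
"```\n",
"\n",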
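"The risks posed by unstable gradients go beyond numerical representation;\n",
"unstable gradients also threaten the stability of our optimization algorithms.\n",
"We may face one of two problems.\n",
"Either the *exploding gradient* problem:\n",
"parameter updates are excessively large, destroying the model's ability to converge stably;\n",
"or the *vanishing gradient* problem:\n",
"parameter updates are so small that the parameters barely move with each update, leaving the model unable to learn.\n",
"\n",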
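"The product $\\mathbf{M}^{(L)} \\cdot \\ldots \\cdot \\mathbf{M}^{(l+1)}$ can be made concrete with a minimal NumPy sketch. It is our own illustration, not this section's PyTorch code, and it multiplies 100 random $4 \\times 4$ Gaussian matrices. The helper `product_norm` is hypothetical. With unit-variance entries, the norm of the product typically explodes. Scaling every entry by 0.1 (an arbitrary choice) typically makes it vanish. The exact values depend on the random seed.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)  # fixed seed, only for reproducibility\n",
"\n",
"def product_norm(scale, n_layers=100, width=4):\n",
"    # Hypothetical helper: multiply n_layers random width x width matrices\n",
"    # whose entries are drawn from N(0, scale**2), then return the Frobenius norm\n",
"    M = np.eye(width)\n",
"    for _ in range(n_layers):\n",
"        M = M @ (scale * rng.normal(size=(width, width)))\n",
"    return np.linalg.norm(M)\n",
"\n",
"print(product_norm(1.0))  # typically astronomically large: exploding\n",
"print(product_norm(0.1))  # typically vanishingly small: vanishing\n",
"```\n",
"\n",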
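"### (**Vanishing Gradients**)\n",
"\n",
"The sigmoid function $1/(1 + \\exp(-x))$ (introduced in :numref:`sec_mlp`) was once popular\n",
"because it resembles a thresholding function.\n",
"Since early artificial neural networks were inspired by biological neural networks,\n",
"the idea of neurons that either fire fully or not at all (like biological neurons) seemed appealing.\n",
"However, the sigmoid turns out to be a common cause of vanishing gradients.\n",
"Let us take a closer look at why the sigmoid function causes this problem.\n"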
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4b473c67",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:03:31.292874Z",
"iopub.status.busy": "2023-08-18T07:03:31.292041Z",
"iopub.status.idle": "2023-08-18T07:03:34.470038Z",
"shell.execute_reply": "2023-08-18T07:03:34.469058Z"
},
"origin_pos": 2,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"288.403125pt\" height=\"166.978125pt\" viewBox=\"0 0 288.403125 166.978125\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n",
" <metadata>\n",
" <rdf:RDF xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n",
" <cc:Work>\n",
" <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\n",
" <dc:date>2023-08-18T07:03:34.416770</dc:date>\n",
" <dc:format>image/svg+xml</dc:format>\n",
" <dc:creator>\n",
" <cc:Agent>\n",
" <dc:title>Matplotlib v3.5.1, https://matplotlib.org/</dc:title>\n",
" </cc:Agent>\n",
" </dc:creator>\n",
" </cc:Work>\n",
" </rdf:RDF>\n",
" </metadata>\n",
" <defs>\n",
" <style type=\"text/css\">*{stroke-linejoin: round; stroke-linecap: butt}</style>\n",
" </defs>\n",
" <g id=\"figure_1\">\n",
" <g id=\"patch_1\">\n",
" <path d=\"M 0 166.978125 \n",
"L 288.403125 166.978125 \n",
"L 288.403125 0 \n",
"L 0 0 \n",
"L 0 166.978125 \n",
"z\n",
"\" style=\"fill: none\"/>\n",
" </g>\n",
" <g id=\"axes_1\">\n",
" <g id=\"patch_2\">\n",
" <path d=\"M 30.103125 143.1 \n",
"L 281.203125 143.1 \n",
"L 281.203125 7.2 \n",
"L 30.103125 7.2 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"matplotlib.axis_1\">\n",
" <g id=\"xtick_1\">\n",
" <g id=\"line2d_1\">\n",
" <path d=\"M 48.695149 143.1 \n",
"L 48.695149 7.2 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_2\">\n",
" <defs>\n",
" <path id=\"md1c6d888c7\" d=\"M 0 0 \n",
"L 0 3.5 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#md1c6d888c7\" x=\"48.695149\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_1\">\n",
" <!-- −7.5 -->\n",
" <g transform=\"translate(36.553743 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-2212\" d=\"M 678 2272 \n",
"L 4684 2272 \n",
"L 4684 1741 \n",
"L 678 1741 \n",
"L 678 2272 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-37\" d=\"M 525 4666 \n",
"L 3525 4666 \n",
"L 3525 4397 \n",
"L 1831 0 \n",
"L 1172 0 \n",
"L 2766 4134 \n",
"L 525 4134 \n",
"L 525 4666 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-2e\" d=\"M 684 794 \n",
"L 1344 794 \n",
"L 1344 0 \n",
"L 684 0 \n",
"L 684 794 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-35\" d=\"M 691 4666 \n",
"L 3169 4666 \n",
"L 3169 4134 \n",
"L 1269 4134 \n",
"L 1269 2991 \n",
"Q 1406 3038 1543 3061 \n",
"Q 1681 3084 1819 3084 \n",
"Q 2600 3084 3056 2656 \n",
"Q 3513 2228 3513 1497 \n",
"Q 3513 744 3044 326 \n",
"Q 2575 -91 1722 -91 \n",
"Q 1428 -91 1123 -41 \n",
"Q 819 9 494 109 \n",
"L 494 744 \n",
"Q 775 591 1075 516 \n",
"Q 1375 441 1709 441 \n",
"Q 2250 441 2565 725 \n",
"Q 2881 1009 2881 1497 \n",
"Q 2881 1984 2565 2268 \n",
"Q 2250 2553 1709 2553 \n",
"Q 1456 2553 1204 2497 \n",
"Q 953 2441 691 2322 \n",
"L 691 4666 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-2212\"/>\n",
" <use xlink:href=\"#DejaVuSans-37\" x=\"83.789062\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"147.412109\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"179.199219\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_2\">\n",
" <g id=\"line2d_3\">\n",
" <path d=\"M 84.587088 143.1 \n",
"L 84.587088 7.2 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_4\">\n",
" <g>\n",
" <use xlink:href=\"#md1c6d888c7\" x=\"84.587088\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_2\">\n",
" <!-- −5.0 -->\n",
" <g transform=\"translate(72.445682 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-30\" d=\"M 2034 4250 \n",
"Q 1547 4250 1301 3770 \n",
"Q 1056 3291 1056 2328 \n",
"Q 1056 1369 1301 889 \n",
"Q 1547 409 2034 409 \n",
"Q 2525 409 2770 889 \n",
"Q 3016 1369 3016 2328 \n",
"Q 3016 3291 2770 3770 \n",
"Q 2525 4250 2034 4250 \n",
"z\n",
"M 2034 4750 \n",
"Q 2819 4750 3233 4129 \n",
"Q 3647 3509 3647 2328 \n",
"Q 3647 1150 3233 529 \n",
"Q 2819 -91 2034 -91 \n",
"Q 1250 -91 836 529 \n",
"Q 422 1150 422 2328 \n",
"Q 422 3509 836 4129 \n",
"Q 1250 4750 2034 4750 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-2212\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"83.789062\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"147.412109\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"179.199219\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_3\">\n",
" <g id=\"line2d_5\">\n",
" <path d=\"M 120.479027 143.1 \n",
"L 120.479027 7.2 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_6\">\n",
" <g>\n",
" <use xlink:href=\"#md1c6d888c7\" x=\"120.479027\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_3\">\n",
" <!-- −2.5 -->\n",
" <g transform=\"translate(108.337621 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-32\" d=\"M 1228 531 \n",
"L 3431 531 \n",
"L 3431 0 \n",
"L 469 0 \n",
"L 469 531 \n",
"Q 828 903 1448 1529 \n",
"Q 2069 2156 2228 2338 \n",
"Q 2531 2678 2651 2914 \n",
"Q 2772 3150 2772 3378 \n",
"Q 2772 3750 2511 3984 \n",
"Q 2250 4219 1831 4219 \n",
"Q 1534 4219 1204 4116 \n",
"Q 875 4013 500 3803 \n",
"L 500 4441 \n",
"Q 881 4594 1212 4672 \n",
"Q 1544 4750 1819 4750 \n",
"Q 2544 4750 2975 4387 \n",
"Q 3406 4025 3406 3419 \n",
"Q 3406 3131 3298 2873 \n",
"Q 3191 2616 2906 2266 \n",
"Q 2828 2175 2409 1742 \n",
"Q 1991 1309 1228 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-2212\"/>\n",
" <use xlink:href=\"#DejaVuSans-32\" x=\"83.789062\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"147.412109\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"179.199219\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_4\">\n",
" <g id=\"line2d_7\">\n",
" <path d=\"M 156.370967 143.1 \n",
"L 156.370967 7.2 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_8\">\n",
" <g>\n",
" <use xlink:href=\"#md1c6d888c7\" x=\"156.370967\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_4\">\n",
" <!-- 0.0 -->\n",
" <g transform=\"translate(148.419404 157.698438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_5\">\n",
" <g id=\"line2d_9\">\n",
" <path d=\"M 192.262906 143.1 \n",
"L 192.262906 7.2 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_10\">\n",
" <g>\n",
" <use xlink:href=\"#md1c6d888c7\" x=\"192.262906\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_5\">\n",
" <!-- 2.5 -->\n",
" <g transform=\"translate(184.311343 157.698438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-32\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_6\">\n",
" <g id=\"line2d_11\">\n",
" <path d=\"M 228.154845 143.1 \n",
"L 228.154845 7.2 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_12\">\n",
" <g>\n",
" <use xlink:href=\"#md1c6d888c7\" x=\"228.154845\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_6\">\n",
" <!-- 5.0 -->\n",
" <g transform=\"translate(220.203282 157.698438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-35\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_7\">\n",
" <g id=\"line2d_13\">\n",
" <path d=\"M 264.046784 143.1 \n",
"L 264.046784 7.2 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_14\">\n",
" <g>\n",
" <use xlink:href=\"#md1c6d888c7\" x=\"264.046784\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_7\">\n",
" <!-- 7.5 -->\n",
" <g transform=\"translate(256.095221 157.698438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-37\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-35\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"matplotlib.axis_2\">\n",
" <g id=\"ytick_1\">\n",
" <g id=\"line2d_15\">\n",
" <path d=\"M 30.103125 136.964174 \n",
"L 281.203125 136.964174 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_16\">\n",
" <defs>\n",
" <path id=\"m60922d617f\" d=\"M 0 0 \n",
"L -3.5 0 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#m60922d617f\" x=\"30.103125\" y=\"136.964174\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_8\">\n",
" <!-- 0.0 -->\n",
" <g transform=\"translate(7.2 140.763392)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_2\">\n",
" <g id=\"line2d_17\">\n",
" <path d=\"M 30.103125 112.237629 \n",
"L 281.203125 112.237629 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_18\">\n",
" <g>\n",
" <use xlink:href=\"#m60922d617f\" x=\"30.103125\" y=\"112.237629\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_9\">\n",
" <!-- 0.2 -->\n",
" <g transform=\"translate(7.2 116.036848)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-32\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_3\">\n",
" <g id=\"line2d_19\">\n",
" <path d=\"M 30.103125 87.511085 \n",
"L 281.203125 87.511085 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_20\">\n",
" <g>\n",
" <use xlink:href=\"#m60922d617f\" x=\"30.103125\" y=\"87.511085\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_10\">\n",
" <!-- 0.4 -->\n",
" <g transform=\"translate(7.2 91.310304)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-34\" d=\"M 2419 4116 \n",
"L 825 1625 \n",
"L 2419 1625 \n",
"L 2419 4116 \n",
"z\n",
"M 2253 4666 \n",
"L 3047 4666 \n",
"L 3047 1625 \n",
"L 3713 1625 \n",
"L 3713 1100 \n",
"L 3047 1100 \n",
"L 3047 0 \n",
"L 2419 0 \n",
"L 2419 1100 \n",
"L 313 1100 \n",
"L 313 1709 \n",
"L 2253 4666 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-34\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_4\">\n",
" <g id=\"line2d_21\">\n",
" <path d=\"M 30.103125 62.784541 \n",
"L 281.203125 62.784541 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_22\">\n",
" <g>\n",
" <use xlink:href=\"#m60922d617f\" x=\"30.103125\" y=\"62.784541\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_11\">\n",
" <!-- 0.6 -->\n",
" <g transform=\"translate(7.2 66.583759)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-36\" d=\"M 2113 2584 \n",
"Q 1688 2584 1439 2293 \n",
"Q 1191 2003 1191 1497 \n",
"Q 1191 994 1439 701 \n",
"Q 1688 409 2113 409 \n",
"Q 2538 409 2786 701 \n",
"Q 3034 994 3034 1497 \n",
"Q 3034 2003 2786 2293 \n",
"Q 2538 2584 2113 2584 \n",
"z\n",
"M 3366 4563 \n",
"L 3366 3988 \n",
"Q 3128 4100 2886 4159 \n",
"Q 2644 4219 2406 4219 \n",
"Q 1781 4219 1451 3797 \n",
"Q 1122 3375 1075 2522 \n",
"Q 1259 2794 1537 2939 \n",
"Q 1816 3084 2150 3084 \n",
"Q 2853 3084 3261 2657 \n",
"Q 3669 2231 3669 1497 \n",
"Q 3669 778 3244 343 \n",
"Q 2819 -91 2113 -91 \n",
"Q 1303 -91 875 529 \n",
"Q 447 1150 447 2328 \n",
"Q 447 3434 972 4092 \n",
"Q 1497 4750 2381 4750 \n",
"Q 2619 4750 2861 4703 \n",
"Q 3103 4656 3366 4563 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-36\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_5\">\n",
" <g id=\"line2d_23\">\n",
" <path d=\"M 30.103125 38.057996 \n",
"L 281.203125 38.057996 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_24\">\n",
" <g>\n",
" <use xlink:href=\"#m60922d617f\" x=\"30.103125\" y=\"38.057996\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_12\">\n",
" <!-- 0.8 -->\n",
" <g transform=\"translate(7.2 41.857215)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-38\" d=\"M 2034 2216 \n",
"Q 1584 2216 1326 1975 \n",
"Q 1069 1734 1069 1313 \n",
"Q 1069 891 1326 650 \n",
"Q 1584 409 2034 409 \n",
"Q 2484 409 2743 651 \n",
"Q 3003 894 3003 1313 \n",
"Q 3003 1734 2745 1975 \n",
"Q 2488 2216 2034 2216 \n",
"z\n",
"M 1403 2484 \n",
"Q 997 2584 770 2862 \n",
"Q 544 3141 544 3541 \n",
"Q 544 4100 942 4425 \n",
"Q 1341 4750 2034 4750 \n",
"Q 2731 4750 3128 4425 \n",
"Q 3525 4100 3525 3541 \n",
"Q 3525 3141 3298 2862 \n",
"Q 3072 2584 2669 2484 \n",
"Q 3125 2378 3379 2068 \n",
"Q 3634 1759 3634 1313 \n",
"Q 3634 634 3220 271 \n",
"Q 2806 -91 2034 -91 \n",
"Q 1263 -91 848 271 \n",
"Q 434 634 434 1313 \n",
"Q 434 1759 690 2068 \n",
"Q 947 2378 1403 2484 \n",
"z\n",
"M 1172 3481 \n",
"Q 1172 3119 1398 2916 \n",
"Q 1625 2713 2034 2713 \n",
"Q 2441 2713 2670 2916 \n",
"Q 2900 3119 2900 3481 \n",
"Q 2900 3844 2670 4047 \n",
"Q 2441 4250 2034 4250 \n",
"Q 1625 4250 1398 4047 \n",
"Q 1172 3844 1172 3481 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-30\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-38\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_6\">\n",
" <g id=\"line2d_25\">\n",
" <path d=\"M 30.103125 13.331452 \n",
"L 281.203125 13.331452 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_26\">\n",
" <g>\n",
" <use xlink:href=\"#m60922d617f\" x=\"30.103125\" y=\"13.331452\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_13\">\n",
" <!-- 1.0 -->\n",
" <g transform=\"translate(7.2 17.130671)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-31\" d=\"M 794 531 \n",
"L 1825 531 \n",
"L 1825 4091 \n",
"L 703 3866 \n",
"L 703 4441 \n",
"L 1819 4666 \n",
"L 2450 4666 \n",
"L 2450 531 \n",
"L 3481 531 \n",
"L 3481 0 \n",
"L 794 0 \n",
"L 794 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-31\"/>\n",
" <use xlink:href=\"#DejaVuSans-2e\" x=\"63.623047\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" x=\"95.410156\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_27\">\n",
" <path d=\"M 41.516761 136.922713 \n",
"L 65.923277 136.737562 \n",
"L 77.4087 136.460971 \n",
"L 86.022764 136.050337 \n",
"L 93.201152 135.464702 \n",
"L 98.943864 134.74049 \n",
"L 103.250896 133.981285 \n",
"L 107.557928 132.971398 \n",
"L 110.429284 132.122009 \n",
"L 113.30064 131.100784 \n",
"L 116.171995 129.87703 \n",
"L 119.043348 128.416404 \n",
"L 121.914704 126.681307 \n",
"L 124.786059 124.63175 \n",
"L 127.657415 122.226792 \n",
"L 130.528769 119.426742 \n",
"L 133.400125 116.19615 \n",
"L 136.271481 112.50763 \n",
"L 139.142835 108.346262 \n",
"L 142.014191 103.714212 \n",
"L 144.885546 98.634876 \n",
"L 147.756901 93.155696 \n",
"L 152.063934 84.351342 \n",
"L 164.985032 57.139929 \n",
"L 167.856387 51.66075 \n",
"L 170.727742 46.581409 \n",
"L 173.599098 41.949358 \n",
"L 176.470452 37.787991 \n",
"L 179.341808 34.099477 \n",
"L 182.213164 30.868893 \n",
"L 185.084518 28.06884 \n",
"L 187.955874 25.663873 \n",
"L 190.827229 23.614316 \n",
"L 193.698585 21.879221 \n",
"L 196.569941 20.418595 \n",
"L 199.441293 19.194841 \n",
"L 202.312649 18.173618 \n",
"L 206.619681 16.955405 \n",
"L 210.926713 16.036702 \n",
"L 215.233745 15.346977 \n",
"L 220.976457 14.689795 \n",
"L 228.154845 14.158904 \n",
"L 236.768909 13.786942 \n",
"L 248.254332 13.536533 \n",
"L 266.918136 13.38742 \n",
"L 269.789489 13.377273 \n",
"L 269.789489 13.377273 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_28\">\n",
" <path d=\"M 41.516761 136.922727 \n",
"L 65.923277 136.737978 \n",
"L 77.4087 136.46302 \n",
"L 86.022764 136.057092 \n",
"L 93.201152 135.482889 \n",
"L 98.943864 134.780485 \n",
"L 103.250896 134.053253 \n",
"L 107.557928 133.100346 \n",
"L 111.86496 131.864446 \n",
"L 114.736316 130.852567 \n",
"L 117.607672 129.66889 \n",
"L 120.479027 128.297061 \n",
"L 123.35038 126.724966 \n",
"L 126.221736 124.947729 \n",
"L 129.093091 122.971391 \n",
"L 133.400125 119.684796 \n",
"L 144.885546 110.517935 \n",
"L 147.756901 108.678954 \n",
"L 150.628256 107.260083 \n",
"L 152.063934 106.741126 \n",
"L 153.499611 106.363024 \n",
"L 154.935289 106.133133 \n",
"L 156.370967 106.055993 \n",
"L 157.806644 106.133135 \n",
"L 159.242322 106.363026 \n",
"L 160.677999 106.741126 \n",
"L 162.113677 107.260083 \n",
"L 164.985032 108.678954 \n",
"L 167.856387 110.517935 \n",
"L 170.727742 112.656506 \n",
"L 176.470452 117.34553 \n",
"L 180.777486 120.817036 \n",
"L 185.084518 123.983526 \n",
"L 187.955874 125.861917 \n",
"L 190.827229 127.536562 \n",
"L 193.698585 129.007383 \n",
"L 196.569941 130.283295 \n",
"L 199.441293 131.378861 \n",
"L 203.748325 132.722848 \n",
"L 208.055361 133.763369 \n",
"L 212.362393 134.559937 \n",
"L 218.105105 135.33127 \n",
"L 223.847809 135.859872 \n",
"L 231.026204 136.289615 \n",
"L 241.075944 136.627342 \n",
"L 255.43272 136.839828 \n",
"L 269.789489 136.91837 \n",
"L 269.789489 136.91837 \n",
"\" clip-path=\"url(#p676f3d42a3)\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"patch_3\">\n",
" <path d=\"M 30.103125 143.1 \n",
"L 30.103125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_4\">\n",
" <path d=\"M 281.203125 143.1 \n",
"L 281.203125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_5\">\n",
" <path d=\"M 30.103125 143.1 \n",
"L 281.203125 143.1 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_6\">\n",
" <path d=\"M 30.103125 7.2 \n",
"L 281.203125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"legend_1\">\n",
" <g id=\"patch_7\">\n",
" <path d=\"M 37.103125 44.55625 \n",
"L 111.228125 44.55625 \n",
"Q 113.228125 44.55625 113.228125 42.55625 \n",
"L 113.228125 14.2 \n",
"Q 113.228125 12.2 111.228125 12.2 \n",
"L 37.103125 12.2 \n",
"Q 35.103125 12.2 35.103125 14.2 \n",
"L 35.103125 42.55625 \n",
"Q 35.103125 44.55625 37.103125 44.55625 \n",
"z\n",
"\" style=\"fill: #ffffff; opacity: 0.8; stroke: #cccccc; stroke-linejoin: miter\"/>\n",
" </g>\n",
" <g id=\"line2d_29\">\n",
" <path d=\"M 39.103125 20.298438 \n",
"L 49.103125 20.298438 \n",
"L 59.103125 20.298438 \n",
"\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"text_14\">\n",
" <!-- sigmoid -->\n",
" <g transform=\"translate(67.103125 23.798438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-73\" d=\"M 2834 3397 \n",
"L 2834 2853 \n",
"Q 2591 2978 2328 3040 \n",
"Q 2066 3103 1784 3103 \n",
"Q 1356 3103 1142 2972 \n",
"Q 928 2841 928 2578 \n",
"Q 928 2378 1081 2264 \n",
"Q 1234 2150 1697 2047 \n",
"L 1894 2003 \n",
"Q 2506 1872 2764 1633 \n",
"Q 3022 1394 3022 966 \n",
"Q 3022 478 2636 193 \n",
"Q 2250 -91 1575 -91 \n",
"Q 1294 -91 989 -36 \n",
"Q 684 19 347 128 \n",
"L 347 722 \n",
"Q 666 556 975 473 \n",
"Q 1284 391 1588 391 \n",
"Q 1994 391 2212 530 \n",
"Q 2431 669 2431 922 \n",
"Q 2431 1156 2273 1281 \n",
"Q 2116 1406 1581 1522 \n",
"L 1381 1569 \n",
"Q 847 1681 609 1914 \n",
"Q 372 2147 372 2553 \n",
"Q 372 3047 722 3315 \n",
"Q 1072 3584 1716 3584 \n",
"Q 2034 3584 2315 3537 \n",
"Q 2597 3491 2834 3397 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-69\" d=\"M 603 3500 \n",
"L 1178 3500 \n",
"L 1178 0 \n",
"L 603 0 \n",
"L 603 3500 \n",
"z\n",
"M 603 4863 \n",
"L 1178 4863 \n",
"L 1178 4134 \n",
"L 603 4134 \n",
"L 603 4863 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-67\" d=\"M 2906 1791 \n",
"Q 2906 2416 2648 2759 \n",
"Q 2391 3103 1925 3103 \n",
"Q 1463 3103 1205 2759 \n",
"Q 947 2416 947 1791 \n",
"Q 947 1169 1205 825 \n",
"Q 1463 481 1925 481 \n",
"Q 2391 481 2648 825 \n",
"Q 2906 1169 2906 1791 \n",
"z\n",
"M 3481 434 \n",
"Q 3481 -459 3084 -895 \n",
"Q 2688 -1331 1869 -1331 \n",
"Q 1566 -1331 1297 -1286 \n",
"Q 1028 -1241 775 -1147 \n",
"L 775 -588 \n",
"Q 1028 -725 1275 -790 \n",
"Q 1522 -856 1778 -856 \n",
"Q 2344 -856 2625 -561 \n",
"Q 2906 -266 2906 331 \n",
"L 2906 616 \n",
"Q 2728 306 2450 153 \n",
"Q 2172 0 1784 0 \n",
"Q 1141 0 747 490 \n",
"Q 353 981 353 1791 \n",
"Q 353 2603 747 3093 \n",
"Q 1141 3584 1784 3584 \n",
"Q 2172 3584 2450 3431 \n",
"Q 2728 3278 2906 2969 \n",
"L 2906 3500 \n",
"L 3481 3500 \n",
"L 3481 434 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6d\" d=\"M 3328 2828 \n",
"Q 3544 3216 3844 3400 \n",
"Q 4144 3584 4550 3584 \n",
"Q 5097 3584 5394 3201 \n",
"Q 5691 2819 5691 2113 \n",
"L 5691 0 \n",
"L 5113 0 \n",
"L 5113 2094 \n",
"Q 5113 2597 4934 2840 \n",
"Q 4756 3084 4391 3084 \n",
"Q 3944 3084 3684 2787 \n",
"Q 3425 2491 3425 1978 \n",
"L 3425 0 \n",
"L 2847 0 \n",
"L 2847 2094 \n",
"Q 2847 2600 2669 2842 \n",
"Q 2491 3084 2119 3084 \n",
"Q 1678 3084 1418 2786 \n",
"Q 1159 2488 1159 1978 \n",
"L 1159 0 \n",
"L 581 0 \n",
"L 581 3500 \n",
"L 1159 3500 \n",
"L 1159 2956 \n",
"Q 1356 3278 1631 3431 \n",
"Q 1906 3584 2284 3584 \n",
"Q 2666 3584 2933 3390 \n",
"Q 3200 3197 3328 2828 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6f\" d=\"M 1959 3097 \n",
"Q 1497 3097 1228 2736 \n",
"Q 959 2375 959 1747 \n",
"Q 959 1119 1226 758 \n",
"Q 1494 397 1959 397 \n",
"Q 2419 397 2687 759 \n",
"Q 2956 1122 2956 1747 \n",
"Q 2956 2369 2687 2733 \n",
"Q 2419 3097 1959 3097 \n",
"z\n",
"M 1959 3584 \n",
"Q 2709 3584 3137 3096 \n",
"Q 3566 2609 3566 1747 \n",
"Q 3566 888 3137 398 \n",
"Q 2709 -91 1959 -91 \n",
"Q 1206 -91 779 398 \n",
"Q 353 888 353 1747 \n",
"Q 353 2609 779 3096 \n",
"Q 1206 3584 1959 3584 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-64\" d=\"M 2906 2969 \n",
"L 2906 4863 \n",
"L 3481 4863 \n",
"L 3481 0 \n",
"L 2906 0 \n",
"L 2906 525 \n",
"Q 2725 213 2448 61 \n",
"Q 2172 -91 1784 -91 \n",
"Q 1150 -91 751 415 \n",
"Q 353 922 353 1747 \n",
"Q 353 2572 751 3078 \n",
"Q 1150 3584 1784 3584 \n",
"Q 2172 3584 2448 3432 \n",
"Q 2725 3281 2906 2969 \n",
"z\n",
"M 947 1747 \n",
"Q 947 1113 1208 752 \n",
"Q 1469 391 1925 391 \n",
"Q 2381 391 2643 752 \n",
"Q 2906 1113 2906 1747 \n",
"Q 2906 2381 2643 2742 \n",
"Q 2381 3103 1925 3103 \n",
"Q 1469 3103 1208 2742 \n",
"Q 947 2381 947 1747 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-73\"/>\n",
" <use xlink:href=\"#DejaVuSans-69\" x=\"52.099609\"/>\n",
" <use xlink:href=\"#DejaVuSans-67\" x=\"79.882812\"/>\n",
" <use xlink:href=\"#DejaVuSans-6d\" x=\"143.359375\"/>\n",
" <use xlink:href=\"#DejaVuSans-6f\" x=\"240.771484\"/>\n",
" <use xlink:href=\"#DejaVuSans-69\" x=\"301.953125\"/>\n",
" <use xlink:href=\"#DejaVuSans-64\" x=\"329.736328\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_30\">\n",
|
||
" <path d=\"M 39.103125 34.976562 \n",
|
||
"L 49.103125 34.976562 \n",
|
||
"L 59.103125 34.976562 \n",
|
||
"\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
|
||
" </g>\n",
|
||
" <g id=\"text_15\">\n",
|
||
" <!-- gradient -->\n",
|
||
" <g transform=\"translate(67.103125 38.476562)scale(0.1 -0.1)\">\n",
|
||
" <defs>\n",
|
||
" <path id=\"DejaVuSans-72\" d=\"M 2631 2963 \n",
|
||
"Q 2534 3019 2420 3045 \n",
|
||
"Q 2306 3072 2169 3072 \n",
|
||
"Q 1681 3072 1420 2755 \n",
|
||
"Q 1159 2438 1159 1844 \n",
|
||
"L 1159 0 \n",
|
||
"L 581 0 \n",
|
||
"L 581 3500 \n",
|
||
"L 1159 3500 \n",
|
||
"L 1159 2956 \n",
|
||
"Q 1341 3275 1631 3429 \n",
|
||
"Q 1922 3584 2338 3584 \n",
|
||
"Q 2397 3584 2469 3576 \n",
|
||
"Q 2541 3569 2628 3553 \n",
|
||
"L 2631 2963 \n",
|
||
"z\n",
|
||
"\" transform=\"scale(0.015625)\"/>\n",
|
||
" <path id=\"DejaVuSans-61\" d=\"M 2194 1759 \n",
|
||
"Q 1497 1759 1228 1600 \n",
|
||
"Q 959 1441 959 1056 \n",
|
||
"Q 959 750 1161 570 \n",
|
||
"Q 1363 391 1709 391 \n",
|
||
"Q 2188 391 2477 730 \n",
|
||
"Q 2766 1069 2766 1631 \n",
|
||
"L 2766 1759 \n",
|
||
"L 2194 1759 \n",
|
||
"z\n",
|
||
"M 3341 1997 \n",
|
||
"L 3341 0 \n",
|
||
"L 2766 0 \n",
|
||
"L 2766 531 \n",
|
||
"Q 2569 213 2275 61 \n",
|
||
"Q 1981 -91 1556 -91 \n",
|
||
"Q 1019 -91 701 211 \n",
|
||
"Q 384 513 384 1019 \n",
|
||
"Q 384 1609 779 1909 \n",
|
||
"Q 1175 2209 1959 2209 \n",
|
||
"L 2766 2209 \n",
|
||
"L 2766 2266 \n",
|
||
"Q 2766 2663 2505 2880 \n",
|
||
"Q 2244 3097 1772 3097 \n",
|
||
"Q 1472 3097 1187 3025 \n",
|
||
"Q 903 2953 641 2809 \n",
|
||
"L 641 3341 \n",
|
||
"Q 956 3463 1253 3523 \n",
|
||
"Q 1550 3584 1831 3584 \n",
|
||
"Q 2591 3584 2966 3190 \n",
|
||
"Q 3341 2797 3341 1997 \n",
|
||
"z\n",
|
||
"\" transform=\"scale(0.015625)\"/>\n",
|
||
" <path id=\"DejaVuSans-65\" d=\"M 3597 1894 \n",
|
||
"L 3597 1613 \n",
|
||
"L 953 1613 \n",
|
||
"Q 991 1019 1311 708 \n",
|
||
"Q 1631 397 2203 397 \n",
|
||
"Q 2534 397 2845 478 \n",
|
||
"Q 3156 559 3463 722 \n",
|
||
"L 3463 178 \n",
|
||
"Q 3153 47 2828 -22 \n",
|
||
"Q 2503 -91 2169 -91 \n",
|
||
"Q 1331 -91 842 396 \n",
|
||
"Q 353 884 353 1716 \n",
|
||
"Q 353 2575 817 3079 \n",
|
||
"Q 1281 3584 2069 3584 \n",
|
||
"Q 2775 3584 3186 3129 \n",
|
||
"Q 3597 2675 3597 1894 \n",
|
||
"z\n",
|
||
"M 3022 2063 \n",
|
||
"Q 3016 2534 2758 2815 \n",
|
||
"Q 2500 3097 2075 3097 \n",
|
||
"Q 1594 3097 1305 2825 \n",
|
||
"Q 1016 2553 972 2059 \n",
|
||
"L 3022 2063 \n",
|
||
"z\n",
|
||
"\" transform=\"scale(0.015625)\"/>\n",
|
||
" <path id=\"DejaVuSans-6e\" d=\"M 3513 2113 \n",
|
||
"L 3513 0 \n",
|
||
"L 2938 0 \n",
|
||
"L 2938 2094 \n",
|
||
"Q 2938 2591 2744 2837 \n",
|
||
"Q 2550 3084 2163 3084 \n",
|
||
"Q 1697 3084 1428 2787 \n",
|
||
"Q 1159 2491 1159 1978 \n",
|
||
"L 1159 0 \n",
|
||
"L 581 0 \n",
|
||
"L 581 3500 \n",
|
||
"L 1159 3500 \n",
|
||
"L 1159 2956 \n",
|
||
"Q 1366 3272 1645 3428 \n",
|
||
"Q 1925 3584 2291 3584 \n",
|
||
"Q 2894 3584 3203 3211 \n",
|
||
"Q 3513 2838 3513 2113 \n",
|
||
"z\n",
|
||
"\" transform=\"scale(0.015625)\"/>\n",
|
||
" <path id=\"DejaVuSans-74\" d=\"M 1172 4494 \n",
|
||
"L 1172 3500 \n",
|
||
"L 2356 3500 \n",
|
||
"L 2356 3053 \n",
|
||
"L 1172 3053 \n",
|
||
"L 1172 1153 \n",
|
||
"Q 1172 725 1289 603 \n",
|
||
"Q 1406 481 1766 481 \n",
|
||
"L 2356 481 \n",
|
||
"L 2356 0 \n",
|
||
"L 1766 0 \n",
|
||
"Q 1100 0 847 248 \n",
|
||
"Q 594 497 594 1153 \n",
|
||
"L 594 3053 \n",
|
||
"L 172 3053 \n",
|
||
"L 172 3500 \n",
|
||
"L 594 3500 \n",
|
||
"L 594 4494 \n",
|
||
"L 1172 4494 \n",
|
||
"z\n",
|
||
"\" transform=\"scale(0.015625)\"/>\n",
|
||
" </defs>\n",
|
||
" <use xlink:href=\"#DejaVuSans-67\"/>\n",
|
||
" <use xlink:href=\"#DejaVuSans-72\" x=\"63.476562\"/>\n",
|
||
" <use xlink:href=\"#DejaVuSans-61\" x=\"104.589844\"/>\n",
|
||
" <use xlink:href=\"#DejaVuSans-64\" x=\"165.869141\"/>\n",
|
||
" <use xlink:href=\"#DejaVuSans-69\" x=\"229.345703\"/>\n",
|
||
" <use xlink:href=\"#DejaVuSans-65\" x=\"257.128906\"/>\n",
|
||
" <use xlink:href=\"#DejaVuSans-6e\" x=\"318.652344\"/>\n",
|
||
" <use xlink:href=\"#DejaVuSans-74\" x=\"382.03125\"/>\n",
|
||
" </g>\n",
|
||
" </g>\n",
|
||
" </g>\n",
|
||
" </g>\n",
|
||
" </g>\n",
|
||
" <defs>\n",
|
||
" <clipPath id=\"p676f3d42a3\">\n",
|
||
" <rect x=\"30.103125\" y=\"7.2\" width=\"251.1\" height=\"135.9\"/>\n",
|
||
" </clipPath>\n",
|
||
" </defs>\n",
|
||
"</svg>\n"
|
||
],
|
||
"text/plain": [
|
||
"<Figure size 324x180 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"%matplotlib inline\n",
|
||
"import torch\n",
|
||
"from d2l import torch as d2l\n",
|
||
"\n",
|
||
"x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)\n",
|
||
"y = torch.sigmoid(x)\n",
|
||
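"# Passing ones as the upstream gradient stores the elementwise derivative sigmoid'(x) in x.grad\n",
|
||
"y.backward(torch.ones_like(x))\n",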
|
||
"\n",
|
||
"d2l.plot(x.detach().numpy(), [y.detach().numpy(), x.grad.numpy()],\n",
|
||
" legend=['sigmoid', 'gradient'], figsize=(4.5, 2.5))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "121269aa",
|
||
"metadata": {
|
||
"origin_pos": 5
|
||
},
|
||
"source": [
|
||
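"As the figure above shows, the gradient of the sigmoid vanishes both when its inputs are large and when they are small.\n",
|
||
"Moreover, when backpropagating through many layers, unless we happen to be in the Goldilocks zone\n",
|
||
"where the inputs to many of the sigmoids are close to zero, the gradient of the overall product may vanish.\n",
|
||
"When our network has many layers, unless we are careful, the gradient will likely be cut off at some layer.\n",
|
||
"Indeed, this problem used to plague the training of deep networks.\n",
|
||
"Consequently, the more stable ReLU family of functions has become the default choice for practitioners (even though it looks less plausible from a neuroscience perspective).\n",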
|
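||
"\n",
|
||
"To see this compounding effect numerically, consider the following minimal sketch, which is not part of the original section. It stacks a hypothetical 20 pairs of `nn.Linear(64, 64)` and `nn.Sigmoid()` layers with PyTorch's default initialization, runs one backward pass, and compares the gradient norms of the first and the last linear layer. Because the derivative of the sigmoid never exceeds $1/4$, we would expect the first layer's gradient to be many orders of magnitude smaller. The exact numbers depend on the random seed.\n",
|
||
"\n",
|
||
"```python\n",
|
||
"import torch\n",
|
||
"from torch import nn\n",
|
||
"\n",
|
||
"torch.manual_seed(0)\n",
|
||
"depth, width = 20, 64  # hypothetical depth and width\n",
|
||
"layers = []\n",
|
||
"for _ in range(depth):\n",
|
||
"    layers += [nn.Linear(width, width), nn.Sigmoid()]\n",
|
||
"net = nn.Sequential(*layers)\n",
|
||
"\n",
|
||
"X = torch.randn(128, width)\n",
|
||
"net(X).sum().backward()\n",
|
||
"\n",
|
||
"# net[0] is the first Linear layer and net[-2] the last one\n",
|
||
"first_grad = net[0].weight.grad.norm().item()\n",
|
||
"last_grad = net[-2].weight.grad.norm().item()\n",
|
||
"print(f'first layer: {first_grad:.2e}, last layer: {last_grad:.2e}')\n",
|
||
"```\n",
|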
||
"\n",
|
||
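"### [**Exploding Gradients**]\n",
|
||
"\n",
|
||
"The opposite problem, when gradients explode, can be similarly vexing.\n",
|
||
"To illustrate this, we draw 100 Gaussian random matrices and multiply them with some initial matrix.\n",
|
||
"For the scale that we picked (variance $\\sigma^2=1$), the matrix product explodes.\n",
|
||
"When this happens because of how a deep network was initialized, we have no chance of getting a gradient descent optimizer to converge.\n"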
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"id": "1bcc7a40",
|
||
"metadata": {
|
||
"execution": {
|
||
"iopub.execute_input": "2023-08-18T07:03:34.476936Z",
|
||
"iopub.status.busy": "2023-08-18T07:03:34.476128Z",
|
||
"iopub.status.idle": "2023-08-18T07:03:34.491176Z",
|
||
"shell.execute_reply": "2023-08-18T07:03:34.490214Z"
|
||
},
|
||
"origin_pos": 7,
|
||
"tab": [
|
||
"pytorch"
|
||
]
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"A matrix \n",
|
||
" tensor([[-0.7872, 2.7090, 0.5996, -1.3191],\n",
|
||
" [-1.8260, -0.7130, -0.5521, 0.1051],\n",
|
||
" [ 1.1213, 1.0472, -0.3991, -0.3802],\n",
|
||
" [ 0.5552, 0.4517, -0.3218, 0.5214]])\n",
|
||
"After multiplying 100 matrices\n",
|
||
" tensor([[-2.1897e+26, 8.8308e+26, 1.9813e+26, 1.7019e+26],\n",
|
||
" [ 1.3110e+26, -5.2870e+26, -1.1862e+26, -1.0189e+26],\n",
|
||
" [-1.6008e+26, 6.4559e+26, 1.4485e+26, 1.2442e+26],\n",
|
||
" [ 3.0943e+25, -1.2479e+26, -2.7998e+25, -2.4050e+25]])\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"M = torch.normal(0, 1, size=(4, 4))\n",
|
||
"print('A matrix \\n', M)\n",
|
||
"# Multiply by 100 further Gaussian random matrices with variance 1\n",
|
||
"for i in range(100):\n",
|
||
"    M = torch.mm(M, torch.normal(0, 1, size=(4, 4)))\n",
|
||
"\n",
|
||
"print('After multiplying 100 matrices\\n', M)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "dc114830",
|
||
"metadata": {
|
||
"origin_pos": 10
|
||
},
|
||
"source": [
|
||
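"### Breaking the Symmetry\n",
|
||
"\n",
|
||
"Another problem in neural network design is the symmetry inherent in their parametrization.\n",
|
||
"Assume that we have a simple MLP with one hidden layer and two hidden units.\n",
|
||
"In this case, we could permute the weights $\\mathbf{W}^{(1)}$ of the first layer\n",
|
||
"and likewise permute the weights of the output layer to obtain the same function.\n",
|
||
"There is nothing special differentiating the first hidden unit from the second one.\n",
|
||
"In other words, we have permutation symmetry among the hidden units of each layer.\n",
|
||
"\n",
|
||
"Assume that the output layer maps the two hidden units to a single output unit.\n",
|
||
"Imagine what would happen if we initialized all the parameters of the hidden layer as $\\mathbf{W}^{(1)} = c$,\n",
|
||
"where $c$ is some constant.\n",
|
||
"In this case, during forward propagation each hidden unit takes the same inputs and parameters,\n",
|
||
"producing the same activation, which is fed to the output unit.\n",
|
||
"During backpropagation, differentiating the output unit with respect to the parameters $\\mathbf{W}^{(1)}$\n",
|
||
"gives a gradient in which the entries belonging to the two hidden units are identical.\n",
|
||
"Thus, after gradient-based iteration (e.g., minibatch stochastic gradient descent),\n",
|
||
"the weights of the two hidden units in $\\mathbf{W}^{(1)}$ still remain identical to each other.\n",
|
||
"Such iterations would never break the symmetry on their own, and we might never be able to realize the network's expressive power.\n",
|
||
"The hidden layer would behave as if it had only a single unit.\n",
|
||
"Note that while minibatch stochastic gradient descent would not break this symmetry, dropout regularization can.\n",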
|
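||
"\n",
|
||
"The following minimal sketch, not taken from the original text, illustrates this effect with a hypothetical MLP that has 4 inputs, two hidden units, and one output. It uses PyTorch's `nn.init.constant_` to set every weight to the same value $c = 0.5$ and `nn.init.zeros_` to set the biases to zero. After one backward pass, the gradient rows of the two hidden units coincide, and so do their weights after a plain SGD step.\n",
|
||
"\n",
|
||
"```python\n",
|
||
"import torch\n",
|
||
"from torch import nn\n",
|
||
"\n",
|
||
"torch.manual_seed(0)\n",
|
||
"# Hypothetical MLP: 4 inputs -> 2 hidden units -> 1 output\n",
|
||
"net = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.Linear(2, 1))\n",
|
||
"for m in net:\n",
|
||
"    if isinstance(m, nn.Linear):\n",
|
||
"        nn.init.constant_(m.weight, 0.5)  # every weight equals c = 0.5\n",
|
||
"        nn.init.zeros_(m.bias)\n",
|
||
"\n",
|
||
"X, y = torch.randn(8, 4), torch.randn(8, 1)\n",
|
||
"loss = ((net(X) - y) ** 2).mean()\n",
|
||
"loss.backward()\n",
|
||
"\n",
|
||
"# Row i of the first-layer weight (and of its gradient) belongs to hidden unit i\n",
|
||
"grad_W1 = net[0].weight.grad\n",
|
||
"print(torch.allclose(grad_W1[0], grad_W1[1]))  # True\n",
|
||
"\n",
|
||
"# One plain SGD step keeps the two hidden units identical\n",
|
||
"with torch.no_grad():\n",
|
||
"    net[0].weight -= 0.1 * grad_W1\n",
|
||
"print(torch.allclose(net[0].weight[0], net[0].weight[1]))  # True\n",
|
||
"```\n",
|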
||
"\n",
|
||
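"## Parameter Initialization\n",
|
||
"\n",
|
||
"One way of addressing (or at least mitigating) the issues raised above is careful parameter initialization.\n",
|
||
"Additional care during optimization and suitable regularization can further enhance stability.\n",
|
||
"\n",
|
||
"### Default Initialization\n",
|
||
"\n",
|
||
"In the previous sections, e.g., in :numref:`sec_linear_concise`,\n",
|
||
"we used a normal distribution to initialize the values of our weights. If we do not specify an initialization method,\n",
|
||
"the framework will use a default random initialization method, which often works well in practice for moderately sized problems.\n",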
|
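||
"\n",
|
||
"Which default a framework uses depends on the framework and on its version. The check below is a minimal sketch that assumes a recent PyTorch release. In such releases, to our knowledge, `nn.Linear` resets its weight with `nn.init.kaiming_uniform_(weight, a=math.sqrt(5))`, so the weights end up uniformly distributed on $[-1/\\sqrt{n_\\mathrm{in}}, 1/\\sqrt{n_\\mathrm{in}}]$. The code inspects one such layer with arbitrary sizes.\n",
|
||
"\n",
|
||
"```python\n",
|
||
"import math\n",
|
||
"import torch\n",
|
||
"from torch import nn\n",
|
||
"\n",
|
||
"torch.manual_seed(0)\n",
|
||
"layer = nn.Linear(256, 128)  # arbitrary sizes: n_in = 256, n_out = 128\n",
|
||
"bound = 1 / math.sqrt(layer.in_features)  # assumed default bound 1/sqrt(n_in)\n",
|
||
"W = layer.weight.detach()\n",
|
||
"\n",
|
||
"# All weights should fall inside [-bound, bound]\n",
|
||
"print(f'max |w| = {W.abs().max().item():.4f}, bound = {bound:.4f}')\n",
|
||
"# U(-bound, bound) has variance bound**2 / 3\n",
|
||
"print(f'var = {W.var().item():.6f}, expected about {bound**2 / 3:.6f}')\n",
|
||
"```\n",
|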
||
"\n",
|
||
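"### Xavier Initialization\n",
|
||
":label:`subsec_xavier`\n",
|
||
"\n",
|
||
"Let us look at the scale distribution of an output $o_{i}$ (e.g., a hidden variable) of some fully connected layer *without nonlinearities*.\n",
|
||
"With $n_\\mathrm{in}$ inputs $x_j$ and their associated weights $w_{ij}$ for this layer, an output is given by\n",
|
||
"\n",
|
||
"$$o_{i} = \\sum_{j=1}^{n_\\mathrm{in}} w_{ij} x_j.$$\n",
|
||
"\n",
|
||
"The weights $w_{ij}$ are all drawn independently from the same distribution.\n",
|
||
"Furthermore, let us assume that this distribution has zero mean and variance $\\sigma^2$.\n",
|
||
"Note that this does not mean that the distribution has to be Gaussian, just that the mean and variance need to exist.\n",
|
||
"For now, let us assume that the inputs $x_j$ to the layer also have zero mean and variance $\\gamma^2$,\n",
|
||
"and that they are independent of $w_{ij}$ and independent of each other.\n",
|
||
"In this case, we can compute the mean and variance of $o_i$ as follows, where the cross terms $E[w_{ij} x_j w_{ik} x_k]$ with $j \\neq k$ vanish because all factors are independent with zero mean:\n",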
|
||
"\n",
|
||
"$$\n",
|
||
"\\begin{aligned}\n",
|
||
" E[o_i] & = \\sum_{j=1}^{n_\\mathrm{in}} E[w_{ij} x_j] \\\\&= \\sum_{j=1}^{n_\\mathrm{in}} E[w_{ij}] E[x_j] \\\\&= 0, \\\\\n",
|
||
" \\mathrm{Var}[o_i] & = E[o_i^2] - (E[o_i])^2 \\\\\n",
|
||
" & = \\sum_{j=1}^{n_\\mathrm{in}} E[w^2_{ij} x^2_j] - 0 \\\\\n",
|
||
" & = \\sum_{j=1}^{n_\\mathrm{in}} E[w^2_{ij}] E[x^2_j] \\\\\n",
|
||
" & = n_\\mathrm{in} \\sigma^2 \\gamma^2.\n",
|
||
"\\end{aligned}\n",
|
||
"$$\n",
|
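||
"\n",
|
||
"A quick Monte Carlo check of this derivation follows as a minimal sketch. The values $n_\\mathrm{in} = 100$, $\\sigma = 0.1$, and $\\gamma = 2$ are assumed for illustration and are not from the original text. With Gaussian weights and inputs, the empirical variance of $o_i$ should be close to $n_\\mathrm{in} \\sigma^2 \\gamma^2 = 4$ and its mean close to zero. The argument itself does not require Gaussianity.\n",
|
||
"\n",
|
||
"```python\n",
|
||
"import torch\n",
|
||
"\n",
|
||
"torch.manual_seed(0)\n",
|
||
"n_in, n_samples = 100, 10000  # assumed fan-in and number of Monte Carlo draws\n",
|
||
"sigma, gamma = 0.1, 2.0  # assumed std of the weights and of the inputs\n",
|
||
"\n",
|
||
"# Each row is one draw of the n_in weights w_ij and inputs x_j\n",
|
||
"w = torch.normal(0, sigma, size=(n_samples, n_in))\n",
|
||
"x = torch.normal(0, gamma, size=(n_samples, n_in))\n",
|
||
"o = (w * x).sum(dim=1)  # o_i = sum_j w_ij x_j for every draw\n",
|
||
"\n",
|
||
"predicted = n_in * sigma**2 * gamma**2\n",
|
||
"print(f'empirical mean {o.mean().item():.3f}, empirical var {o.var().item():.3f}')\n",
|
||
"print(f'predicted var n_in * sigma^2 * gamma^2 = {predicted:.3f}')\n",
|
||
"```\n",
|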
||
"\n",
|
||
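"One way to keep the variance fixed is to set $n_\\mathrm{in} \\sigma^2 = 1$.\n",
|
||
"Now consider backpropagation. There we face a similar problem, albeit with gradients being propagated from the layers closer to the output.\n",
|
||
"Using the same reasoning as for forward propagation, we see that the gradients' variance can blow up unless $n_\\mathrm{out} \\sigma^2 = 1$,\n",
|
||
"where $n_\\mathrm{out}$ is the number of outputs of this layer.\n",
|
||
"This leaves us in a dilemma: unless $n_\\mathrm{in} = n_\\mathrm{out}$, we cannot satisfy both conditions simultaneously.\n",
|
||
"Instead, we simply try to satisfy:\n",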
|
||
"\n",
|
||
"$$\n",
|
||
"\\begin{aligned}\n",
|
||
"\\frac{1}{2} (n_\\mathrm{in} + n_\\mathrm{out}) \\sigma^2 = 1 \\text{ or equivalently }\n",
|
||
"\\sigma = \\sqrt{\\frac{2}{n_\\mathrm{in} + n_\\mathrm{out}}}.\n",
|
||
"\\end{aligned}\n",
|
||
"$$\n",
|
||
"\n",
|
||
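"This is the reasoning underlying the now-standard and practically beneficial *Xavier initialization*,\n",
|
||
"named after the first author of its creators :cite:`Glorot.Bengio.2010`.\n",
|
||
"Typically, Xavier initialization samples weights from a Gaussian distribution with zero mean and variance\n",
|
||
"$\\sigma^2 = \\frac{2}{n_\\mathrm{in} + n_\\mathrm{out}}$.\n",
|
||
"We can also use this to choose the variance when sampling weights from a uniform distribution.\n",
|
||
"Note that the uniform distribution $U(-a, a)$ has variance $\\frac{a^2}{3}$.\n",
|
||
"Plugging $\\frac{a^2}{3}$ into our condition on $\\sigma^2$ yields the following range for initialization:\n",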
|
||
"\n",
|
||
"$$U\\left(-\\sqrt{\\frac{6}{n_\\mathrm{in} + n_\\mathrm{out}}}, \\sqrt{\\frac{6}{n_\\mathrm{in} + n_\\mathrm{out}}}\\right).$$\n",
|
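||
"\n",
|
||
"Both variants can be sketched in a few lines. The snippet below is a minimal illustration with arbitrary layer sizes. It samples $U(-a, a)$ by hand and compares the result against PyTorch's `nn.init.xavier_uniform_` and `nn.init.xavier_normal_`, both with the default `gain=1`. For a 2-D weight of shape `(n_out, n_in)`, these functions take $n_\\mathrm{in}$ from the second dimension and $n_\\mathrm{out}$ from the first. All three tensors should have a variance close to $2 / (n_\\mathrm{in} + n_\\mathrm{out})$.\n",
|
||
"\n",
|
||
"```python\n",
|
||
"import math\n",
|
||
"import torch\n",
|
||
"from torch import nn\n",
|
||
"\n",
|
||
"torch.manual_seed(0)\n",
|
||
"n_in, n_out = 256, 64  # arbitrary layer sizes\n",
|
||
"a = math.sqrt(6 / (n_in + n_out))  # half-width of U(-a, a)\n",
|
||
"target = 2 / (n_in + n_out)  # Xavier variance\n",
|
||
"\n",
|
||
"# Manual uniform version: sample directly from U(-a, a)\n",
|
||
"W_manual = torch.empty(n_out, n_in).uniform_(-a, a)\n",
|
||
"\n",
|
||
"# Built-in versions; weights are laid out as (n_out, n_in) like nn.Linear\n",
|
||
"W_uniform = nn.init.xavier_uniform_(torch.empty(n_out, n_in))\n",
|
||
"W_normal = nn.init.xavier_normal_(torch.empty(n_out, n_in))\n",
|
||
"\n",
|
||
"for name, W in [('manual', W_manual), ('uniform', W_uniform),\n",
|
||
"                ('normal', W_normal)]:\n",
|
||
"    print(f'{name}: var = {W.var().item():.6f} (target {target:.6f})')\n",
|
||
"print(f'max |w| (uniform) = {W_uniform.abs().max().item():.4f}, a = {a:.4f}')\n",
|
||
"```\n",
|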
||
"\n",
|
||
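"Although the assumption of *no nonlinearities* in the mathematical reasoning above is easily violated in neural networks,\n",
|
||
"the Xavier initialization method turns out to work well in practice.\n",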
|
||
"\n",
|
||
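"### Further Reading\n",
|
||
"\n",
|
||
"The reasoning above barely scratches the surface of modern approaches to parameter initialization.\n",
|
||
"Deep learning frameworks often implement over a dozen different heuristics.\n",
|
||
"Moreover, parameter initialization continues to be a hot area of fundamental research in deep learning.\n",
|
||
"This research includes heuristics specialized for tied (shared) parameters, super-resolution, sequence models, and other situations.\n",
|
||
"For instance, Xiao et al. demonstrated that, by using a carefully designed initialization method\n",
|
||
" :cite:`Xiao.Bahri.Sohl-Dickstein.ea.2018`,\n",
|
||
"it is possible to train 10000-layer neural networks without architectural tricks.\n",
|
||
"\n",
|
||
"If the topic interests you, we suggest a deep dive into this module's offerings,\n",
|
||
"reading the papers that proposed and analyzed each heuristic, and then exploring the latest publications on the topic.\n",
|
||
"Perhaps you will stumble across or even invent a clever idea and contribute an implementation to deep learning frameworks.\n",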
|
||
"\n",
|
||
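"## Summary\n",
|
||
"\n",
|
||
"* Vanishing and exploding gradients are common issues in deep networks. Great care in parameter initialization is required to keep gradients and parameters well controlled.\n",
|
||
"* Initialization heuristics are needed to ensure that the initial gradients are neither too large nor too small.\n",
|
||
"* ReLU activation functions mitigate the vanishing gradient problem, which can accelerate convergence.\n",
|
||
"* Random initialization is key to ensuring that symmetry is broken before optimization.\n",
|
||
"* Xavier initialization chooses the weight variance as a compromise between two goals for each layer: keeping the variance of the outputs unaffected by the number of inputs, and keeping the variance of the gradients unaffected by the number of outputs.\n",
|
||
"\n",
|
||
"## Exercises\n",
|
||
"\n",
|
||
"1. Besides the permutation symmetry in an MLP's layers, can you design other cases where a neural network might exhibit symmetry that needs breaking?\n",
|
||
"2. Can we initialize all weight parameters in linear regression or in softmax regression to the same value?\n",
|
||
"3. Look up analytic bounds on the eigenvalues of the product of two matrices. What does this tell you about ensuring that gradients are well conditioned?\n",
|
||
"4. If we know that some terms diverge, can we fix this after the fact? Look at the paper on layerwise adaptive rate scaling for inspiration :cite:`You.Gitman.Ginsburg.2017`.\n"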
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "bf1eb65c",
|
||
"metadata": {
|
||
"origin_pos": 12,
|
||
"tab": [
|
||
"pytorch"
|
||
]
|
||
},
|
||
"source": [
|
||
"[Discussions](https://discuss.d2l.ai/t/1818)\n"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"language_info": {
|
||
"name": "python"
|
||
},
|
||
"required_libs": []
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
} |