{
"cells": [
{
"cell_type": "markdown",
"id": "7dc79f4c",
"metadata": {
"origin_pos": 0
},
"source": [
"# 语言模型和数据集\n",
":label:`sec_language_model`\n",
"\n",
"在 :numref:`sec_text_preprocessing`中,\n",
"我们了解了如何将文本数据映射为词元,\n",
"以及将这些词元可以视为一系列离散的观测,例如单词或字符。\n",
"假设长度为$T$的文本序列中的词元依次为$x_1, x_2, \\ldots, x_T$。\n",
"于是,$x_t$$1 \\leq t \\leq T$\n",
"可以被认为是文本序列在时间步$t$处的观测或标签。\n",
"在给定这样的文本序列时,*语言模型*(language model)的目标是估计序列的联合概率\n",
"\n",
"$$P(x_1, x_2, \\ldots, x_T).$$\n",
"\n",
"例如,只需要一次抽取一个词元$x_t \\sim P(x_t \\mid x_{t-1}, \\ldots, x_1)$\n",
"一个理想的语言模型就能够基于模型本身生成自然文本。\n",
"与猴子使用打字机完全不同的是,从这样的模型中提取的文本\n",
"都将作为自然语言(例如,英语文本)来传递。\n",
"只需要基于前面的对话片断中的文本,\n",
"就足以生成一个有意义的对话。\n",
"显然,我们离设计出这样的系统还很遥远,\n",
"因为它需要“理解”文本,而不仅仅是生成语法合理的内容。\n",
"\n",
"尽管如此,语言模型依然是非常有用的。\n",
"例如,短语“to recognize speech”和“to wreck a nice beach”读音上听起来非常相似。\n",
"这种相似性会导致语音识别中的歧义,但是这很容易通过语言模型来解决,\n",
"因为第二句的语义很奇怪。\n",
"同样,在文档摘要生成算法中,\n",
"“狗咬人”比“人咬狗”出现的频率要高得多,\n",
"或者“我想吃奶奶”是一个相当匪夷所思的语句,\n",
"而“我想吃,奶奶”则要正常得多。\n",
"\n",
"## 学习语言模型\n",
"\n",
"显而易见,我们面对的问题是如何对一个文档,\n",
"甚至是一个词元序列进行建模。\n",
"假设在单词级别对文本数据进行词元化,\n",
"我们可以依靠在 :numref:`sec_sequence`中对序列模型的分析。\n",
"让我们从基本概率规则开始:\n",
"\n",
"$$P(x_1, x_2, \\ldots, x_T) = \\prod_{t=1}^T P(x_t \\mid x_1, \\ldots, x_{t-1}).$$\n",
"\n",
"例如,包含了四个单词的一个文本序列的概率是:\n",
"\n",
"$$P(\\text{deep}, \\text{learning}, \\text{is}, \\text{fun}) = P(\\text{deep}) P(\\text{learning} \\mid \\text{deep}) P(\\text{is} \\mid \\text{deep}, \\text{learning}) P(\\text{fun} \\mid \\text{deep}, \\text{learning}, \\text{is}).$$\n",
"\n",
"为了训练语言模型,我们需要计算单词的概率,\n",
"以及给定前面几个单词后出现某个单词的条件概率。\n",
"这些概率本质上就是语言模型的参数。\n",
"\n",
"这里,我们假设训练数据集是一个大型的文本语料库。\n",
"比如,维基百科的所有条目、\n",
"[古登堡计划](https://en.wikipedia.org/wiki/Project_Gutenberg)\n",
"或者所有发布在网络上的文本。\n",
"训练数据集中词的概率可以根据给定词的相对词频来计算。\n",
"例如,可以将估计值$\\hat{P}(\\text{deep})$\n",
"计算为任何以单词“deep”开头的句子的概率。\n",
"一种(稍稍不太精确的)方法是统计单词“deep”在数据集中的出现次数,\n",
"然后将其除以整个语料库中的单词总数。\n",
"这种方法效果不错,特别是对于频繁出现的单词。\n",
"接下来,我们可以尝试估计\n",
"\n",
"$$\\hat{P}(\\text{learning} \\mid \\text{deep}) = \\frac{n(\\text{deep, learning})}{n(\\text{deep})},$$\n",
"\n",
"其中$n(x)$和$n(x, x')$分别是单个单词和连续单词对的出现次数。\n",
"不幸的是,由于连续单词对“deep learning”的出现频率要低得多,\n",
"所以估计这类单词正确的概率要困难得多。\n",
"特别是对于一些不常见的单词组合,要想找到足够的出现次数来获得准确的估计可能都不容易。\n",
"而对于三个或者更多的单词组合,情况会变得更糟。\n",
"许多合理的三个单词组合可能是存在的,但是在数据集中却找不到。\n",
"除非我们提供某种解决方案,来将这些单词组合指定为非零计数,\n",
"否则将无法在语言模型中使用它们。\n",
"如果数据集很小,或者单词非常罕见,那么这类单词出现一次的机会可能都找不到。\n",
"\n",
"一种常见的策略是执行某种形式的*拉普拉斯平滑*(Laplace smoothing),\n",
"具体方法是在所有计数中添加一个小常量。\n",
"用$n$表示训练集中的单词总数,用$m$表示唯一单词的数量。\n",
"此解决方案有助于处理单元素问题,例如通过:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
" \\hat{P}(x) & = \\frac{n(x) + \\epsilon_1/m}{n + \\epsilon_1}, \\\\\n",
" \\hat{P}(x' \\mid x) & = \\frac{n(x, x') + \\epsilon_2 \\hat{P}(x')}{n(x) + \\epsilon_2}, \\\\\n",
" \\hat{P}(x'' \\mid x,x') & = \\frac{n(x, x',x'') + \\epsilon_3 \\hat{P}(x'')}{n(x, x') + \\epsilon_3}.\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"其中,$\\epsilon_1,\\epsilon_2$和$\\epsilon_3$是超参数。\n",
"以$\\epsilon_1$为例:当$\\epsilon_1 = 0$时,不应用平滑;\n",
"当$\\epsilon_1$接近正无穷大时,$\\hat{P}(x)$接近均匀概率分布$1/m$。\n",
"上面的公式是 :cite:`Wood.Gasthaus.Archambeau.ea.2011`\n",
"的一个相当原始的变形。\n",
"\n",
"然而,这样的模型很容易变得无效,原因如下:\n",
"首先,我们需要存储所有的计数;\n",
"其次,这完全忽略了单词的意思。\n",
"例如,“猫”(cat)和“猫科动物”(feline)可能出现在相关的上下文中,\n",
"但是想根据上下文调整这类模型其实是相当困难的。\n",
"最后,长单词序列大部分是没出现过的,\n",
"因此一个模型如果只是简单地统计先前“看到”的单词序列频率,\n",
"那么模型面对这种问题肯定是表现不佳的。\n",
"\n",
"## 马尔可夫模型与$n$元语法\n",
"\n",
"在讨论包含深度学习的解决方案之前,我们需要了解更多的概念和术语。\n",
"回想一下我们在 :numref:`sec_sequence`中对马尔可夫模型的讨论,\n",
"并且将其应用于语言建模。\n",
"如果$P(x_{t+1} \\mid x_t, \\ldots, x_1) = P(x_{t+1} \\mid x_t)$\n",
"则序列上的分布满足一阶马尔可夫性质。\n",
"阶数越高,对应的依赖关系就越长。\n",
"这种性质推导出了许多可以应用于序列建模的近似公式:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"P(x_1, x_2, x_3, x_4) &= P(x_1) P(x_2) P(x_3) P(x_4),\\\\\n",
"P(x_1, x_2, x_3, x_4) &= P(x_1) P(x_2 \\mid x_1) P(x_3 \\mid x_2) P(x_4 \\mid x_3),\\\\\n",
"P(x_1, x_2, x_3, x_4) &= P(x_1) P(x_2 \\mid x_1) P(x_3 \\mid x_1, x_2) P(x_4 \\mid x_2, x_3).\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"通常,涉及一个、两个和三个变量的概率公式分别被称为\n",
"*一元语法*unigram)、*二元语法*bigram)和*三元语法*trigram)模型。\n",
"下面,我们将学习如何去设计更好的模型。\n",
"\n",
"## 自然语言统计\n",
"\n",
"我们看看在真实数据上如果进行自然语言统计。\n",
"根据 :numref:`sec_text_preprocessing`中介绍的时光机器数据集构建词表,\n",
"并打印前$10$个最常用的(频率最高的)单词。\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "51959a91",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:02.314362Z",
"iopub.status.busy": "2023-08-18T07:04:02.313944Z",
"iopub.status.idle": "2023-08-18T07:04:04.933369Z",
"shell.execute_reply": "2023-08-18T07:04:04.932195Z"
},
"origin_pos": 2,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"import random\n",
"import torch\n",
"from d2l import torch as d2l"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f3d7db5f",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:04.938113Z",
"iopub.status.busy": "2023-08-18T07:04:04.937317Z",
"iopub.status.idle": "2023-08-18T07:04:05.000861Z",
"shell.execute_reply": "2023-08-18T07:04:04.999790Z"
},
"origin_pos": 5,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"[('the', 2261),\n",
" ('i', 1267),\n",
" ('and', 1245),\n",
" ('of', 1155),\n",
" ('a', 816),\n",
" ('to', 695),\n",
" ('was', 552),\n",
" ('in', 541),\n",
" ('that', 443),\n",
" ('my', 440)]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tokens = d2l.tokenize(d2l.read_time_machine())\n",
"# 因为每个文本行不一定是一个句子或一个段落,因此我们把所有文本行拼接到一起\n",
"corpus = [token for line in tokens for token in line]\n",
"vocab = d2l.Vocab(corpus)\n",
"vocab.token_freqs[:10]"
]
},
{
"cell_type": "markdown",
"id": "c86b0507",
"metadata": {
"origin_pos": 6
},
"source": [
"正如我们所看到的,(**最流行的词**)看起来很无聊,\n",
"这些词通常(**被称为*停用词***)(stop words),因此可以被过滤掉。\n",
"尽管如此,它们本身仍然是有意义的,我们仍然会在模型中使用它们。\n",
"此外,还有个明显的问题是词频衰减的速度相当地快。\n",
"例如,最常用单词的词频对比,第$10$个还不到第$1$个的$1/5$。\n",
"为了更好地理解,我们可以[**画出的词频图**]:\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d4636458",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:05.004637Z",
"iopub.status.busy": "2023-08-18T07:04:05.004114Z",
"iopub.status.idle": "2023-08-18T07:04:05.732423Z",
"shell.execute_reply": "2023-08-18T07:04:05.731618Z"
},
"origin_pos": 7,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"247.978125pt\" height=\"180.65625pt\" viewBox=\"0 0 247.978125 180.65625\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n",
" <metadata>\n",
" <rdf:RDF xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n",
" <cc:Work>\n",
" <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\n",
" <dc:date>2023-08-18T07:04:05.596078</dc:date>\n",
" <dc:format>image/svg+xml</dc:format>\n",
" <dc:creator>\n",
" <cc:Agent>\n",
" <dc:title>Matplotlib v3.5.1, https://matplotlib.org/</dc:title>\n",
" </cc:Agent>\n",
" </dc:creator>\n",
" </cc:Work>\n",
" </rdf:RDF>\n",
" </metadata>\n",
" <defs>\n",
" <style type=\"text/css\">*{stroke-linejoin: round; stroke-linecap: butt}</style>\n",
" </defs>\n",
" <g id=\"figure_1\">\n",
" <g id=\"patch_1\">\n",
" <path d=\"M 0 180.65625 \n",
"L 247.978125 180.65625 \n",
"L 247.978125 0 \n",
"L 0 0 \n",
"L 0 180.65625 \n",
"z\n",
"\" style=\"fill: none\"/>\n",
" </g>\n",
" <g id=\"axes_1\">\n",
" <g id=\"patch_2\">\n",
" <path d=\"M 45.478125 143.1 \n",
"L 240.778125 143.1 \n",
"L 240.778125 7.2 \n",
"L 45.478125 7.2 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"matplotlib.axis_1\">\n",
" <g id=\"xtick_1\">\n",
" <g id=\"line2d_1\">\n",
" <path d=\"M 54.355398 143.1 \n",
"L 54.355398 7.2 \n",
"\" clip-path=\"url(#pfcec344b5c)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_2\">\n",
" <defs>\n",
" <path id=\"mc1d221fe5c\" d=\"M 0 0 \n",
"L 0 3.5 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#mc1d221fe5c\" x=\"54.355398\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_1\">\n",
" <!-- $\\mathdefault{10^{0}}$ -->\n",
" <g transform=\"translate(45.555398 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-31\" d=\"M 794 531 \n",
"L 1825 531 \n",
"L 1825 4091 \n",
"L 703 3866 \n",
"L 703 4441 \n",
"L 1819 4666 \n",
"L 2450 4666 \n",
"L 2450 531 \n",
"L 3481 531 \n",
"L 3481 0 \n",
"L 794 0 \n",
"L 794 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-30\" d=\"M 2034 4250 \n",
"Q 1547 4250 1301 3770 \n",
"Q 1056 3291 1056 2328 \n",
"Q 1056 1369 1301 889 \n",
"Q 1547 409 2034 409 \n",
"Q 2525 409 2770 889 \n",
"Q 3016 1369 3016 2328 \n",
"Q 3016 3291 2770 3770 \n",
"Q 2525 4250 2034 4250 \n",
"z\n",
"M 2034 4750 \n",
"Q 2819 4750 3233 4129 \n",
"Q 3647 3509 3647 2328 \n",
"Q 3647 1150 3233 529 \n",
"Q 2819 -91 2034 -91 \n",
"Q 1250 -91 836 529 \n",
"Q 422 1150 422 2328 \n",
"Q 422 3509 836 4129 \n",
"Q 1250 4750 2034 4750 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_2\">\n",
" <g id=\"line2d_3\">\n",
" <path d=\"M 102.85613 143.1 \n",
"L 102.85613 7.2 \n",
"\" clip-path=\"url(#pfcec344b5c)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_4\">\n",
" <g>\n",
" <use xlink:href=\"#mc1d221fe5c\" x=\"102.85613\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_2\">\n",
" <!-- $\\mathdefault{10^{1}}$ -->\n",
" <g transform=\"translate(94.05613 157.698438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.684375)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.684375)\"/>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(128.203125 38.965625)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_3\">\n",
" <g id=\"line2d_5\">\n",
" <path d=\"M 151.356861 143.1 \n",
"L 151.356861 7.2 \n",
"\" clip-path=\"url(#pfcec344b5c)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_6\">\n",
" <g>\n",
" <use xlink:href=\"#mc1d221fe5c\" x=\"151.356861\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_3\">\n",
" <!-- $\\mathdefault{10^{2}}$ -->\n",
" <g transform=\"translate(142.556861 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-32\" d=\"M 1228 531 \n",
"L 3431 531 \n",
"L 3431 0 \n",
"L 469 0 \n",
"L 469 531 \n",
"Q 828 903 1448 1529 \n",
"Q 2069 2156 2228 2338 \n",
"Q 2531 2678 2651 2914 \n",
"Q 2772 3150 2772 3378 \n",
"Q 2772 3750 2511 3984 \n",
"Q 2250 4219 1831 4219 \n",
"Q 1534 4219 1204 4116 \n",
"Q 875 4013 500 3803 \n",
"L 500 4441 \n",
"Q 881 4594 1212 4672 \n",
"Q 1544 4750 1819 4750 \n",
"Q 2544 4750 2975 4387 \n",
"Q 3406 4025 3406 3419 \n",
"Q 3406 3131 3298 2873 \n",
"Q 3191 2616 2906 2266 \n",
"Q 2828 2175 2409 1742 \n",
"Q 1991 1309 1228 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-32\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_4\">\n",
" <g id=\"line2d_7\">\n",
" <path d=\"M 199.857593 143.1 \n",
"L 199.857593 7.2 \n",
"\" clip-path=\"url(#pfcec344b5c)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_8\">\n",
" <g>\n",
" <use xlink:href=\"#mc1d221fe5c\" x=\"199.857593\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_4\">\n",
" <!-- $\\mathdefault{10^{3}}$ -->\n",
" <g transform=\"translate(191.057593 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-33\" d=\"M 2597 2516 \n",
"Q 3050 2419 3304 2112 \n",
"Q 3559 1806 3559 1356 \n",
"Q 3559 666 3084 287 \n",
"Q 2609 -91 1734 -91 \n",
"Q 1441 -91 1130 -33 \n",
"Q 819 25 488 141 \n",
"L 488 750 \n",
"Q 750 597 1062 519 \n",
"Q 1375 441 1716 441 \n",
"Q 2309 441 2620 675 \n",
"Q 2931 909 2931 1356 \n",
"Q 2931 1769 2642 2001 \n",
"Q 2353 2234 1838 2234 \n",
"L 1294 2234 \n",
"L 1294 2753 \n",
"L 1863 2753 \n",
"Q 2328 2753 2575 2939 \n",
"Q 2822 3125 2822 3475 \n",
"Q 2822 3834 2567 4026 \n",
"Q 2313 4219 1838 4219 \n",
"Q 1578 4219 1281 4162 \n",
"Q 984 4106 628 3988 \n",
"L 628 4550 \n",
"Q 988 4650 1302 4700 \n",
"Q 1616 4750 1894 4750 \n",
"Q 2613 4750 3031 4423 \n",
"Q 3450 4097 3450 3541 \n",
"Q 3450 3153 3228 2886 \n",
"Q 3006 2619 2597 2516 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-33\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_5\">\n",
" <g id=\"line2d_9\">\n",
" <defs>\n",
" <path id=\"mc2e3ff39b3\" d=\"M 0 0 \n",
"L 0 2 \n",
"\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"46.842539\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_6\">\n",
" <g id=\"line2d_10\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"49.655191\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_7\">\n",
" <g id=\"line2d_11\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"52.136126\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_8\">\n",
" <g id=\"line2d_12\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"68.955573\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_9\">\n",
" <g id=\"line2d_13\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"77.496128\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_10\">\n",
" <g id=\"line2d_14\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"83.555748\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_11\">\n",
" <g id=\"line2d_15\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"88.255954\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_12\">\n",
" <g id=\"line2d_16\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"92.096303\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_13\">\n",
" <g id=\"line2d_17\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"95.343271\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_14\">\n",
" <g id=\"line2d_18\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"98.155923\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_15\">\n",
" <g id=\"line2d_19\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"100.636858\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_16\">\n",
" <g id=\"line2d_20\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"117.456305\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_17\">\n",
" <g id=\"line2d_21\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"125.99686\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_18\">\n",
" <g id=\"line2d_22\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"132.05648\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_19\">\n",
" <g id=\"line2d_23\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"136.756686\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_20\">\n",
" <g id=\"line2d_24\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"140.597035\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_21\">\n",
" <g id=\"line2d_25\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"143.844003\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_22\">\n",
" <g id=\"line2d_26\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"146.656655\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_23\">\n",
" <g id=\"line2d_27\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"149.13759\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_24\">\n",
" <g id=\"line2d_28\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"165.957036\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_25\">\n",
" <g id=\"line2d_29\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"174.497591\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_26\">\n",
" <g id=\"line2d_30\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"180.557211\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_27\">\n",
" <g id=\"line2d_31\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"185.257418\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_28\">\n",
" <g id=\"line2d_32\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"189.097766\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_29\">\n",
" <g id=\"line2d_33\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"192.344735\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_30\">\n",
" <g id=\"line2d_34\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"195.157387\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_31\">\n",
" <g id=\"line2d_35\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"197.638321\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_32\">\n",
" <g id=\"line2d_36\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"214.457768\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_33\">\n",
" <g id=\"line2d_37\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"222.998323\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_34\">\n",
" <g id=\"line2d_38\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"229.057943\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_35\">\n",
" <g id=\"line2d_39\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"233.75815\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_36\">\n",
" <g id=\"line2d_40\">\n",
" <g>\n",
" <use xlink:href=\"#mc2e3ff39b3\" x=\"237.598498\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_5\">\n",
" <!-- token: x -->\n",
" <g transform=\"translate(122.916406 171.376563)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-74\" d=\"M 1172 4494 \n",
"L 1172 3500 \n",
"L 2356 3500 \n",
"L 2356 3053 \n",
"L 1172 3053 \n",
"L 1172 1153 \n",
"Q 1172 725 1289 603 \n",
"Q 1406 481 1766 481 \n",
"L 2356 481 \n",
"L 2356 0 \n",
"L 1766 0 \n",
"Q 1100 0 847 248 \n",
"Q 594 497 594 1153 \n",
"L 594 3053 \n",
"L 172 3053 \n",
"L 172 3500 \n",
"L 594 3500 \n",
"L 594 4494 \n",
"L 1172 4494 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6f\" d=\"M 1959 3097 \n",
"Q 1497 3097 1228 2736 \n",
"Q 959 2375 959 1747 \n",
"Q 959 1119 1226 758 \n",
"Q 1494 397 1959 397 \n",
"Q 2419 397 2687 759 \n",
"Q 2956 1122 2956 1747 \n",
"Q 2956 2369 2687 2733 \n",
"Q 2419 3097 1959 3097 \n",
"z\n",
"M 1959 3584 \n",
"Q 2709 3584 3137 3096 \n",
"Q 3566 2609 3566 1747 \n",
"Q 3566 888 3137 398 \n",
"Q 2709 -91 1959 -91 \n",
"Q 1206 -91 779 398 \n",
"Q 353 888 353 1747 \n",
"Q 353 2609 779 3096 \n",
"Q 1206 3584 1959 3584 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6b\" d=\"M 581 4863 \n",
"L 1159 4863 \n",
"L 1159 1991 \n",
"L 2875 3500 \n",
"L 3609 3500 \n",
"L 1753 1863 \n",
"L 3688 0 \n",
"L 2938 0 \n",
"L 1159 1709 \n",
"L 1159 0 \n",
"L 581 0 \n",
"L 581 4863 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-65\" d=\"M 3597 1894 \n",
"L 3597 1613 \n",
"L 953 1613 \n",
"Q 991 1019 1311 708 \n",
"Q 1631 397 2203 397 \n",
"Q 2534 397 2845 478 \n",
"Q 3156 559 3463 722 \n",
"L 3463 178 \n",
"Q 3153 47 2828 -22 \n",
"Q 2503 -91 2169 -91 \n",
"Q 1331 -91 842 396 \n",
"Q 353 884 353 1716 \n",
"Q 353 2575 817 3079 \n",
"Q 1281 3584 2069 3584 \n",
"Q 2775 3584 3186 3129 \n",
"Q 3597 2675 3597 1894 \n",
"z\n",
"M 3022 2063 \n",
"Q 3016 2534 2758 2815 \n",
"Q 2500 3097 2075 3097 \n",
"Q 1594 3097 1305 2825 \n",
"Q 1016 2553 972 2059 \n",
"L 3022 2063 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6e\" d=\"M 3513 2113 \n",
"L 3513 0 \n",
"L 2938 0 \n",
"L 2938 2094 \n",
"Q 2938 2591 2744 2837 \n",
"Q 2550 3084 2163 3084 \n",
"Q 1697 3084 1428 2787 \n",
"Q 1159 2491 1159 1978 \n",
"L 1159 0 \n",
"L 581 0 \n",
"L 581 3500 \n",
"L 1159 3500 \n",
"L 1159 2956 \n",
"Q 1366 3272 1645 3428 \n",
"Q 1925 3584 2291 3584 \n",
"Q 2894 3584 3203 3211 \n",
"Q 3513 2838 3513 2113 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-3a\" d=\"M 750 794 \n",
"L 1409 794 \n",
"L 1409 0 \n",
"L 750 0 \n",
"L 750 794 \n",
"z\n",
"M 750 3309 \n",
"L 1409 3309 \n",
"L 1409 2516 \n",
"L 750 2516 \n",
"L 750 3309 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-20\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-78\" d=\"M 3513 3500 \n",
"L 2247 1797 \n",
"L 3578 0 \n",
"L 2900 0 \n",
"L 1881 1375 \n",
"L 863 0 \n",
"L 184 0 \n",
"L 1544 1831 \n",
"L 300 3500 \n",
"L 978 3500 \n",
"L 1906 2253 \n",
"L 2834 3500 \n",
"L 3513 3500 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-74\"/>\n",
" <use xlink:href=\"#DejaVuSans-6f\" x=\"39.208984\"/>\n",
" <use xlink:href=\"#DejaVuSans-6b\" x=\"100.390625\"/>\n",
" <use xlink:href=\"#DejaVuSans-65\" x=\"154.675781\"/>\n",
" <use xlink:href=\"#DejaVuSans-6e\" x=\"216.199219\"/>\n",
" <use xlink:href=\"#DejaVuSans-3a\" x=\"279.578125\"/>\n",
" <use xlink:href=\"#DejaVuSans-20\" x=\"313.269531\"/>\n",
" <use xlink:href=\"#DejaVuSans-78\" x=\"345.056641\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"matplotlib.axis_2\">\n",
" <g id=\"ytick_1\">\n",
" <g id=\"line2d_41\">\n",
" <path d=\"M 45.478125 136.922727 \n",
"L 240.778125 136.922727 \n",
"\" clip-path=\"url(#pfcec344b5c)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_42\">\n",
" <defs>\n",
" <path id=\"macb7257190\" d=\"M 0 0 \n",
"L -3.5 0 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#macb7257190\" x=\"45.478125\" y=\"136.922727\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_6\">\n",
" <!-- $\\mathdefault{10^{0}}$ -->\n",
" <g transform=\"translate(20.878125 140.721946)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_2\">\n",
" <g id=\"line2d_43\">\n",
" <path d=\"M 45.478125 100.09077 \n",
"L 240.778125 100.09077 \n",
"\" clip-path=\"url(#pfcec344b5c)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_44\">\n",
" <g>\n",
" <use xlink:href=\"#macb7257190\" x=\"45.478125\" y=\"100.09077\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_7\">\n",
" <!-- $\\mathdefault{10^{1}}$ -->\n",
" <g transform=\"translate(20.878125 103.889989)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.684375)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.684375)\"/>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(128.203125 38.965625)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_3\">\n",
" <g id=\"line2d_45\">\n",
" <path d=\"M 45.478125 63.258813 \n",
"L 240.778125 63.258813 \n",
"\" clip-path=\"url(#pfcec344b5c)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_46\">\n",
" <g>\n",
" <use xlink:href=\"#macb7257190\" x=\"45.478125\" y=\"63.258813\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_8\">\n",
" <!-- $\\mathdefault{10^{2}}$ -->\n",
" <g transform=\"translate(20.878125 67.058032)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-32\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_4\">\n",
" <g id=\"line2d_47\">\n",
" <path d=\"M 45.478125 26.426856 \n",
"L 240.778125 26.426856 \n",
"\" clip-path=\"url(#pfcec344b5c)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_48\">\n",
" <g>\n",
" <use xlink:href=\"#macb7257190\" x=\"45.478125\" y=\"26.426856\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_9\">\n",
" <!-- $\\mathdefault{10^{3}}$ -->\n",
" <g transform=\"translate(20.878125 30.226075)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-33\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_5\">\n",
" <g id=\"line2d_49\">\n",
" <defs>\n",
" <path id=\"mcda051cd2a\" d=\"M 0 0 \n",
"L -2 0 \n",
"\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"142.62807\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_6\">\n",
" <g id=\"line2d_50\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"140.492113\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_7\">\n",
" <g id=\"line2d_51\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"138.608065\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_8\">\n",
" <g id=\"line2d_52\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"125.835203\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_9\">\n",
" <g id=\"line2d_53\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"119.349418\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_10\">\n",
" <g id=\"line2d_54\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"114.747679\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_11\">\n",
" <g id=\"line2d_55\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"111.178294\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_12\">\n",
" <g id=\"line2d_56\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"108.261894\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_13\">\n",
" <g id=\"line2d_57\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"105.796112\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_14\">\n",
" <g id=\"line2d_58\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"103.660156\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_15\">\n",
" <g id=\"line2d_59\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"101.776108\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_16\">\n",
" <g id=\"line2d_60\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"89.003246\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_17\">\n",
" <g id=\"line2d_61\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"82.517461\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_18\">\n",
" <g id=\"line2d_62\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"77.915722\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_19\">\n",
" <g id=\"line2d_63\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"74.346337\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_20\">\n",
" <g id=\"line2d_64\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"71.429937\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_21\">\n",
" <g id=\"line2d_65\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"68.964155\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_22\">\n",
" <g id=\"line2d_66\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"66.828198\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_23\">\n",
" <g id=\"line2d_67\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"64.944151\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_24\">\n",
" <g id=\"line2d_68\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"52.171289\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_25\">\n",
" <g id=\"line2d_69\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"45.685503\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_26\">\n",
" <g id=\"line2d_70\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"41.083765\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_27\">\n",
" <g id=\"line2d_71\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"37.51438\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_28\">\n",
" <g id=\"line2d_72\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"34.597979\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_29\">\n",
" <g id=\"line2d_73\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"32.132198\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_30\">\n",
" <g id=\"line2d_74\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"29.996241\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_31\">\n",
" <g id=\"line2d_75\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"28.112194\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_32\">\n",
" <g id=\"line2d_76\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"15.339332\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_33\">\n",
" <g id=\"line2d_77\">\n",
" <g>\n",
" <use xlink:href=\"#mcda051cd2a\" x=\"45.478125\" y=\"8.853546\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_10\">\n",
" <!-- frequency: n(x) -->\n",
" <g transform=\"translate(14.798438 113.167187)rotate(-90)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-66\" d=\"M 2375 4863 \n",
"L 2375 4384 \n",
"L 1825 4384 \n",
"Q 1516 4384 1395 4259 \n",
"Q 1275 4134 1275 3809 \n",
"L 1275 3500 \n",
"L 2222 3500 \n",
"L 2222 3053 \n",
"L 1275 3053 \n",
"L 1275 0 \n",
"L 697 0 \n",
"L 697 3053 \n",
"L 147 3053 \n",
"L 147 3500 \n",
"L 697 3500 \n",
"L 697 3744 \n",
"Q 697 4328 969 4595 \n",
"Q 1241 4863 1831 4863 \n",
"L 2375 4863 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-72\" d=\"M 2631 2963 \n",
"Q 2534 3019 2420 3045 \n",
"Q 2306 3072 2169 3072 \n",
"Q 1681 3072 1420 2755 \n",
"Q 1159 2438 1159 1844 \n",
"L 1159 0 \n",
"L 581 0 \n",
"L 581 3500 \n",
"L 1159 3500 \n",
"L 1159 2956 \n",
"Q 1341 3275 1631 3429 \n",
"Q 1922 3584 2338 3584 \n",
"Q 2397 3584 2469 3576 \n",
"Q 2541 3569 2628 3553 \n",
"L 2631 2963 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-71\" d=\"M 947 1747 \n",
"Q 947 1113 1208 752 \n",
"Q 1469 391 1925 391 \n",
"Q 2381 391 2643 752 \n",
"Q 2906 1113 2906 1747 \n",
"Q 2906 2381 2643 2742 \n",
"Q 2381 3103 1925 3103 \n",
"Q 1469 3103 1208 2742 \n",
"Q 947 2381 947 1747 \n",
"z\n",
"M 2906 525 \n",
"Q 2725 213 2448 61 \n",
"Q 2172 -91 1784 -91 \n",
"Q 1150 -91 751 415 \n",
"Q 353 922 353 1747 \n",
"Q 353 2572 751 3078 \n",
"Q 1150 3584 1784 3584 \n",
"Q 2172 3584 2448 3432 \n",
"Q 2725 3281 2906 2969 \n",
"L 2906 3500 \n",
"L 3481 3500 \n",
"L 3481 -1331 \n",
"L 2906 -1331 \n",
"L 2906 525 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-75\" d=\"M 544 1381 \n",
"L 544 3500 \n",
"L 1119 3500 \n",
"L 1119 1403 \n",
"Q 1119 906 1312 657 \n",
"Q 1506 409 1894 409 \n",
"Q 2359 409 2629 706 \n",
"Q 2900 1003 2900 1516 \n",
"L 2900 3500 \n",
"L 3475 3500 \n",
"L 3475 0 \n",
"L 2900 0 \n",
"L 2900 538 \n",
"Q 2691 219 2414 64 \n",
"Q 2138 -91 1772 -91 \n",
"Q 1169 -91 856 284 \n",
"Q 544 659 544 1381 \n",
"z\n",
"M 1991 3584 \n",
"L 1991 3584 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-63\" d=\"M 3122 3366 \n",
"L 3122 2828 \n",
"Q 2878 2963 2633 3030 \n",
"Q 2388 3097 2138 3097 \n",
"Q 1578 3097 1268 2742 \n",
"Q 959 2388 959 1747 \n",
"Q 959 1106 1268 751 \n",
"Q 1578 397 2138 397 \n",
"Q 2388 397 2633 464 \n",
"Q 2878 531 3122 666 \n",
"L 3122 134 \n",
"Q 2881 22 2623 -34 \n",
"Q 2366 -91 2075 -91 \n",
"Q 1284 -91 818 406 \n",
"Q 353 903 353 1747 \n",
"Q 353 2603 823 3093 \n",
"Q 1294 3584 2113 3584 \n",
"Q 2378 3584 2631 3529 \n",
"Q 2884 3475 3122 3366 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-79\" d=\"M 2059 -325 \n",
"Q 1816 -950 1584 -1140 \n",
"Q 1353 -1331 966 -1331 \n",
"L 506 -1331 \n",
"L 506 -850 \n",
"L 844 -850 \n",
"Q 1081 -850 1212 -737 \n",
"Q 1344 -625 1503 -206 \n",
"L 1606 56 \n",
"L 191 3500 \n",
"L 800 3500 \n",
"L 1894 763 \n",
"L 2988 3500 \n",
"L 3597 3500 \n",
"L 2059 -325 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-28\" d=\"M 1984 4856 \n",
"Q 1566 4138 1362 3434 \n",
"Q 1159 2731 1159 2009 \n",
"Q 1159 1288 1364 580 \n",
"Q 1569 -128 1984 -844 \n",
"L 1484 -844 \n",
"Q 1016 -109 783 600 \n",
"Q 550 1309 550 2009 \n",
"Q 550 2706 781 3412 \n",
"Q 1013 4119 1484 4856 \n",
"L 1984 4856 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-29\" d=\"M 513 4856 \n",
"L 1013 4856 \n",
"Q 1481 4119 1714 3412 \n",
"Q 1947 2706 1947 2009 \n",
"Q 1947 1309 1714 600 \n",
"Q 1481 -109 1013 -844 \n",
"L 513 -844 \n",
"Q 928 -128 1133 580 \n",
"Q 1338 1288 1338 2009 \n",
"Q 1338 2731 1133 3434 \n",
"Q 928 4138 513 4856 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-66\"/>\n",
" <use xlink:href=\"#DejaVuSans-72\" x=\"35.205078\"/>\n",
" <use xlink:href=\"#DejaVuSans-65\" x=\"74.068359\"/>\n",
" <use xlink:href=\"#DejaVuSans-71\" x=\"135.591797\"/>\n",
" <use xlink:href=\"#DejaVuSans-75\" x=\"199.068359\"/>\n",
" <use xlink:href=\"#DejaVuSans-65\" x=\"262.447266\"/>\n",
" <use xlink:href=\"#DejaVuSans-6e\" x=\"323.970703\"/>\n",
" <use xlink:href=\"#DejaVuSans-63\" x=\"387.349609\"/>\n",
" <use xlink:href=\"#DejaVuSans-79\" x=\"442.330078\"/>\n",
" <use xlink:href=\"#DejaVuSans-3a\" x=\"494.259766\"/>\n",
" <use xlink:href=\"#DejaVuSans-20\" x=\"527.951172\"/>\n",
" <use xlink:href=\"#DejaVuSans-6e\" x=\"559.738281\"/>\n",
" <use xlink:href=\"#DejaVuSans-28\" x=\"623.117188\"/>\n",
" <use xlink:href=\"#DejaVuSans-78\" x=\"662.130859\"/>\n",
" <use xlink:href=\"#DejaVuSans-29\" x=\"721.310547\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_78\">\n",
" <path d=\"M -1 22.630819 \n",
"L 54.355398 22.641392 \n",
"L 68.955573 22.921582 \n",
"L 77.496128 24.121839 \n",
"L 83.555748 29.67948 \n",
"L 88.255954 32.246865 \n",
"L 92.096303 35.931745 \n",
"L 95.343271 36.253723 \n",
"L 98.155923 39.450499 \n",
"L 102.85613 39.668628 \n",
"L 104.863705 43.037948 \n",
"L 106.696478 46.732081 \n",
"L 108.382465 47.370841 \n",
"L 111.396684 50.574172 \n",
"L 112.756098 50.940227 \n",
"L 114.033071 51.854528 \n",
"L 115.237033 52.171289 \n",
"L 116.375883 55.941884 \n",
"L 117.456305 56.561158 \n",
"L 118.484001 58.223127 \n",
"L 119.46388 58.223127 \n",
"L 120.400194 58.577295 \n",
"L 121.296653 59.185578 \n",
"L 122.156511 59.435519 \n",
"L 122.982641 59.947432 \n",
"L 123.777588 60.078011 \n",
"L 125.282771 60.078011 \n",
"L 126.687531 60.611258 \n",
"L 127.356273 61.162896 \n",
"L 128.004435 61.162896 \n",
"L 128.633246 61.30383 \n",
"L 129.243828 61.30383 \n",
"L 129.837208 61.446017 \n",
"L 130.414329 62.326748 \n",
"L 130.976058 62.47837 \n",
"L 131.523195 62.785993 \n",
"L 132.05648 64.248567 \n",
"L 133.084176 64.592578 \n",
"L 133.579813 64.592578 \n",
"L 134.064055 64.767399 \n",
"L 134.537414 65.122878 \n",
"L 136.331144 65.858452 \n",
"L 137.173801 65.858452 \n",
"L 137.582816 66.047755 \n",
"L 137.984039 66.047755 \n",
"L 138.377763 66.433217 \n",
"L 139.143796 67.439582 \n",
"L 139.516613 67.860551 \n",
"L 139.882946 68.075265 \n",
"L 140.597035 68.075265 \n",
"L 141.956448 68.964155 \n",
"L 142.60461 68.964155 \n",
"L 142.921362 69.194316 \n",
"L 143.233421 69.905363 \n",
"L 143.540924 70.905433 \n",
"L 144.437383 71.698782 \n",
"L 144.72792 71.698782 \n",
"L 145.014504 72.25042 \n",
"L 145.576233 72.25042 \n",
"L 145.851578 72.821763 \n",
"L 146.123371 73.115275 \n",
"L 146.3917 73.115275 \n",
"L 147.684351 74.669498 \n",
"L 148.90224 74.669498 \n",
"L 149.13759 74.999322 \n",
"L 149.370339 74.999322 \n",
"L 149.600544 75.336091 \n",
"L 149.828261 75.336091 \n",
"L 150.053543 75.680102 \n",
"L 150.715281 75.680102 \n",
"L 150.93132 76.031675 \n",
"L 151.145165 76.031675 \n",
"L 151.566451 76.758887 \n",
"L 151.979476 76.758887 \n",
"L 152.182991 77.135279 \n",
"L 152.384558 77.135279 \n",
"L 152.781996 77.915722 \n",
"L 153.364437 77.915722 \n",
"L 153.743971 78.736206 \n",
"L 153.931205 78.736206 \n",
"L 154.116788 79.162789 \n",
"L 154.843191 79.162789 \n",
"L 155.020944 79.60106 \n",
"L 155.887881 79.60106 \n",
"L 156.057068 80.051679 \n",
"L 156.720543 80.051679 \n",
"L 156.883197 80.515361 \n",
"L 157.678144 80.515361 \n",
"L 157.833596 80.992887 \n",
"L 158.141099 80.992887 \n",
"L 158.293184 81.485108 \n",
"L 158.890773 81.485108 \n",
"L 159.037558 81.992957 \n",
"L 159.328095 81.992957 \n",
"L 159.471874 82.517461 \n",
"L 159.756522 82.517461 \n",
"L 159.897416 83.059747 \n",
"L 160.037374 83.059747 \n",
"L 160.176408 83.621065 \n",
"L 160.858138 83.621065 \n",
"L 160.991875 84.202798 \n",
"L 161.518493 84.202798 \n",
"L 161.648115 84.80649 \n",
"L 162.409533 84.80649 \n",
"L 162.533803 85.433861 \n",
"L 162.902271 85.433861 \n",
"L 163.023676 86.086846 \n",
"L 163.502415 86.086846 \n",
"L 163.620418 86.767626 \n",
"L 164.200719 86.767626 \n",
"L 164.314886 87.478673 \n",
"L 165.423752 87.478673 \n",
"L 165.638689 88.222803 \n",
"L 166.374151 88.222803 \n",
"L 166.579651 89.003246 \n",
"L 167.675405 89.003246 \n",
"L 167.86865 89.823729 \n",
"L 168.716963 89.823729 \n",
"L 168.900926 90.688584 \n",
"L 169.173892 90.688584 \n",
"L 169.353924 91.602885 \n",
"L 169.972187 91.602885 \n",
"L 170.145551 92.572632 \n",
"L 171.483372 92.572632 \n",
"L 171.64478 93.604984 \n",
"L 172.433771 93.604984 \n",
"L 172.588084 94.708589 \n",
"L 173.490948 94.708589 \n",
"L 173.637733 95.894013 \n",
"L 175.256101 95.894013 \n",
"L 175.391124 97.17437 \n",
"L 176.37712 97.17437 \n",
"L 176.505167 98.566197 \n",
"L 178.33794 98.566197 \n",
"L 178.454636 100.09077 \n",
"L 180.451629 100.09077 \n",
"L 180.609805 101.776108 \n",
"L 182.42068 101.776108 \n",
"L 182.564787 103.660156 \n",
"L 185.257418 103.660156 \n",
"L 185.383422 105.796112 \n",
"L 188.528445 105.796112 \n",
"L 188.672225 108.261894 \n",
"L 191.67211 108.261894 \n",
"L 191.796014 111.178294 \n",
"L 195.754423 111.178294 \n",
"L 195.882004 114.747679 \n",
"L 200.724191 114.747679 \n",
"L 200.84513 119.349418 \n",
"L 207.214001 119.349418 \n",
"L 207.332503 125.835203 \n",
"L 216.282641 125.835203 \n",
"L 216.398216 136.922727 \n",
"L 231.900852 136.922727 \n",
"L 231.900852 136.922727 \n",
"\" clip-path=\"url(#pfcec344b5c)\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_3\">\n",
" <path d=\"M 45.478125 143.1 \n",
"L 45.478125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_4\">\n",
" <path d=\"M 240.778125 143.1 \n",
"L 240.778125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_5\">\n",
" <path d=\"M 45.478125 143.1 \n",
"L 240.778125 143.1 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_6\">\n",
" <path d=\"M 45.478125 7.2 \n",
"L 240.778125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <defs>\n",
" <clipPath id=\"pfcec344b5c\">\n",
" <rect x=\"45.478125\" y=\"7.2\" width=\"195.3\" height=\"135.9\"/>\n",
" </clipPath>\n",
" </defs>\n",
"</svg>\n"
],
"text/plain": [
"<Figure size 252x180 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"freqs = [freq for token, freq in vocab.token_freqs]\n",
"d2l.plot(freqs, xlabel='token: x', ylabel='frequency: n(x)',\n",
" xscale='log', yscale='log')"
]
},
{
"cell_type": "markdown",
"id": "e1503be3",
"metadata": {
"origin_pos": 8
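   },
   "source": [
    "This plot shows that word frequency decays rapidly in a well-defined way.\n",
    "After setting aside the first few words as exceptions,\n",
    "all the remaining words roughly follow a straight line on a log-log plot.\n",
    "This means that word frequencies satisfy *Zipf's law*,\n",
    "which states that the frequency $n_i$ of the $i$-th most frequent word is:\n",
    "\n",
    "$$n_i \\propto \\frac{1}{i^\\alpha},$$\n",
    ":eqlabel:`eq_zipf_law`\n",
    "\n",
    "which is equivalent to\n",
    "\n",
    "$$\\log n_i = -\\alpha \\log i + c,$$\n",
    "\n",
    "where $\\alpha$ is the exponent that characterizes the distribution and $c$ is a constant.\n",
    "This tells us that modeling words by count statistics and smoothing is not viable,\n",
    "because such a model would significantly overestimate the frequency of the tail words, that is, the infrequent words.\n",
    "But [**what about other token combinations, such as bigrams, trigrams, and beyond?**]\n",
    "Let us see whether the bigram frequencies behave in the same manner as the unigram frequencies.\n"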
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "218b6d71",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:05.736827Z",
"iopub.status.busy": "2023-08-18T07:04:05.736074Z",
"iopub.status.idle": "2023-08-18T07:04:05.768860Z",
"shell.execute_reply": "2023-08-18T07:04:05.768064Z"
},
"origin_pos": 9,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"[(('of', 'the'), 309),\n",
" (('in', 'the'), 169),\n",
" (('i', 'had'), 130),\n",
" (('i', 'was'), 112),\n",
" (('and', 'the'), 109),\n",
" (('the', 'time'), 102),\n",
" (('it', 'was'), 99),\n",
" (('to', 'the'), 85),\n",
" (('as', 'i'), 78),\n",
" (('of', 'a'), 73)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
   ],
   "source": [
    "# Pair each token with its successor to form bigrams\n",
    "bigram_tokens = list(zip(corpus[:-1], corpus[1:]))\n",
    "bigram_vocab = d2l.Vocab(bigram_tokens)\n",
    "bigram_vocab.token_freqs[:10]"
]
},
{
"cell_type": "markdown",
"id": "b5051c57",
"metadata": {
"origin_pos": 10
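   },
   "source": [
    "One thing is notable here: of the ten most frequent word pairs, nine consist of two stop words,\n",
    "and only one, “the time”, contains a content word.\n",
    "Next, let us see whether the trigram frequencies behave in the same manner.\n"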
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "45c49a80",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:05.772749Z",
"iopub.status.busy": "2023-08-18T07:04:05.772186Z",
"iopub.status.idle": "2023-08-18T07:04:05.811776Z",
"shell.execute_reply": "2023-08-18T07:04:05.810980Z"
},
"origin_pos": 11,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"[(('the', 'time', 'traveller'), 59),\n",
" (('the', 'time', 'machine'), 30),\n",
" (('the', 'medical', 'man'), 24),\n",
" (('it', 'seemed', 'to'), 16),\n",
" (('it', 'was', 'a'), 15),\n",
" (('here', 'and', 'there'), 15),\n",
" (('seemed', 'to', 'me'), 14),\n",
" (('i', 'did', 'not'), 14),\n",
" (('i', 'saw', 'the'), 13),\n",
" (('i', 'began', 'to'), 13)]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
   ],
   "source": [
    "# Group each token with its two successors to form trigrams\n",
    "trigram_tokens = list(zip(\n",
    "    corpus[:-2], corpus[1:-1], corpus[2:]))\n",
    "trigram_vocab = d2l.Vocab(trigram_tokens)\n",
    "trigram_vocab.token_freqs[:10]"
]
},
{
"cell_type": "markdown",
"id": "fc548bc4",
"metadata": {
"origin_pos": 12
   },
   "source": [
    "Finally, let us [**visualize the token frequencies of the three models**]: unigrams, bigrams, and trigrams.\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3fc5212e",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:05.815498Z",
"iopub.status.busy": "2023-08-18T07:04:05.814980Z",
"iopub.status.idle": "2023-08-18T07:04:06.587207Z",
"shell.execute_reply": "2023-08-18T07:04:06.586018Z"
},
"origin_pos": 13,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"247.978125pt\" height=\"180.65625pt\" viewBox=\"0 0 247.978125 180.65625\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n",
" <metadata>\n",
" <rdf:RDF xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n",
" <cc:Work>\n",
" <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\n",
" <dc:date>2023-08-18T07:04:06.332539</dc:date>\n",
" <dc:format>image/svg+xml</dc:format>\n",
" <dc:creator>\n",
" <cc:Agent>\n",
" <dc:title>Matplotlib v3.5.1, https://matplotlib.org/</dc:title>\n",
" </cc:Agent>\n",
" </dc:creator>\n",
" </cc:Work>\n",
" </rdf:RDF>\n",
" </metadata>\n",
" <defs>\n",
" <style type=\"text/css\">*{stroke-linejoin: round; stroke-linecap: butt}</style>\n",
" </defs>\n",
" <g id=\"figure_1\">\n",
" <g id=\"patch_1\">\n",
" <path d=\"M 0 180.65625 \n",
"L 247.978125 180.65625 \n",
"L 247.978125 0 \n",
"L 0 0 \n",
"L 0 180.65625 \n",
"z\n",
"\" style=\"fill: none\"/>\n",
" </g>\n",
" <g id=\"axes_1\">\n",
" <g id=\"patch_2\">\n",
" <path d=\"M 45.478125 143.1 \n",
"L 240.778125 143.1 \n",
"L 240.778125 7.2 \n",
"L 45.478125 7.2 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"matplotlib.axis_1\">\n",
" <g id=\"xtick_1\">\n",
" <g id=\"line2d_1\">\n",
" <path d=\"M 54.355398 143.1 \n",
"L 54.355398 7.2 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_2\">\n",
" <defs>\n",
" <path id=\"mbdef78f9d7\" d=\"M 0 0 \n",
"L 0 3.5 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#mbdef78f9d7\" x=\"54.355398\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_1\">\n",
" <!-- $\\mathdefault{10^{0}}$ -->\n",
" <g transform=\"translate(45.555398 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-31\" d=\"M 794 531 \n",
"L 1825 531 \n",
"L 1825 4091 \n",
"L 703 3866 \n",
"L 703 4441 \n",
"L 1819 4666 \n",
"L 2450 4666 \n",
"L 2450 531 \n",
"L 3481 531 \n",
"L 3481 0 \n",
"L 794 0 \n",
"L 794 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-30\" d=\"M 2034 4250 \n",
"Q 1547 4250 1301 3770 \n",
"Q 1056 3291 1056 2328 \n",
"Q 1056 1369 1301 889 \n",
"Q 1547 409 2034 409 \n",
"Q 2525 409 2770 889 \n",
"Q 3016 1369 3016 2328 \n",
"Q 3016 3291 2770 3770 \n",
"Q 2525 4250 2034 4250 \n",
"z\n",
"M 2034 4750 \n",
"Q 2819 4750 3233 4129 \n",
"Q 3647 3509 3647 2328 \n",
"Q 3647 1150 3233 529 \n",
"Q 2819 -91 2034 -91 \n",
"Q 1250 -91 836 529 \n",
"Q 422 1150 422 2328 \n",
"Q 422 3509 836 4129 \n",
"Q 1250 4750 2034 4750 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_2\">\n",
" <g id=\"line2d_3\">\n",
" <path d=\"M 94.026857 143.1 \n",
"L 94.026857 7.2 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_4\">\n",
" <g>\n",
" <use xlink:href=\"#mbdef78f9d7\" x=\"94.026857\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_2\">\n",
" <!-- $\\mathdefault{10^{1}}$ -->\n",
" <g transform=\"translate(85.226857 157.698438)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.684375)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.684375)\"/>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(128.203125 38.965625)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_3\">\n",
" <g id=\"line2d_5\">\n",
" <path d=\"M 133.698316 143.1 \n",
"L 133.698316 7.2 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_6\">\n",
" <g>\n",
" <use xlink:href=\"#mbdef78f9d7\" x=\"133.698316\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_3\">\n",
" <!-- $\\mathdefault{10^{2}}$ -->\n",
" <g transform=\"translate(124.898316 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-32\" d=\"M 1228 531 \n",
"L 3431 531 \n",
"L 3431 0 \n",
"L 469 0 \n",
"L 469 531 \n",
"Q 828 903 1448 1529 \n",
"Q 2069 2156 2228 2338 \n",
"Q 2531 2678 2651 2914 \n",
"Q 2772 3150 2772 3378 \n",
"Q 2772 3750 2511 3984 \n",
"Q 2250 4219 1831 4219 \n",
"Q 1534 4219 1204 4116 \n",
"Q 875 4013 500 3803 \n",
"L 500 4441 \n",
"Q 881 4594 1212 4672 \n",
"Q 1544 4750 1819 4750 \n",
"Q 2544 4750 2975 4387 \n",
"Q 3406 4025 3406 3419 \n",
"Q 3406 3131 3298 2873 \n",
"Q 3191 2616 2906 2266 \n",
"Q 2828 2175 2409 1742 \n",
"Q 1991 1309 1228 531 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-32\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_4\">\n",
" <g id=\"line2d_7\">\n",
" <path d=\"M 173.369775 143.1 \n",
"L 173.369775 7.2 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_8\">\n",
" <g>\n",
" <use xlink:href=\"#mbdef78f9d7\" x=\"173.369775\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_4\">\n",
" <!-- $\\mathdefault{10^{3}}$ -->\n",
" <g transform=\"translate(164.569775 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-33\" d=\"M 2597 2516 \n",
"Q 3050 2419 3304 2112 \n",
"Q 3559 1806 3559 1356 \n",
"Q 3559 666 3084 287 \n",
"Q 2609 -91 1734 -91 \n",
"Q 1441 -91 1130 -33 \n",
"Q 819 25 488 141 \n",
"L 488 750 \n",
"Q 750 597 1062 519 \n",
"Q 1375 441 1716 441 \n",
"Q 2309 441 2620 675 \n",
"Q 2931 909 2931 1356 \n",
"Q 2931 1769 2642 2001 \n",
"Q 2353 2234 1838 2234 \n",
"L 1294 2234 \n",
"L 1294 2753 \n",
"L 1863 2753 \n",
"Q 2328 2753 2575 2939 \n",
"Q 2822 3125 2822 3475 \n",
"Q 2822 3834 2567 4026 \n",
"Q 2313 4219 1838 4219 \n",
"Q 1578 4219 1281 4162 \n",
"Q 984 4106 628 3988 \n",
"L 628 4550 \n",
"Q 988 4650 1302 4700 \n",
"Q 1616 4750 1894 4750 \n",
"Q 2613 4750 3031 4423 \n",
"Q 3450 4097 3450 3541 \n",
"Q 3450 3153 3228 2886 \n",
"Q 3006 2619 2597 2516 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-33\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_5\">\n",
" <g id=\"line2d_9\">\n",
" <path d=\"M 213.041234 143.1 \n",
"L 213.041234 7.2 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_10\">\n",
" <g>\n",
" <use xlink:href=\"#mbdef78f9d7\" x=\"213.041234\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_5\">\n",
" <!-- $\\mathdefault{10^{4}}$ -->\n",
" <g transform=\"translate(204.241234 157.698438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-34\" d=\"M 2419 4116 \n",
"L 825 1625 \n",
"L 2419 1625 \n",
"L 2419 4116 \n",
"z\n",
"M 2253 4666 \n",
"L 3047 4666 \n",
"L 3047 1625 \n",
"L 3713 1625 \n",
"L 3713 1100 \n",
"L 3047 1100 \n",
"L 3047 0 \n",
"L 2419 0 \n",
"L 2419 1100 \n",
"L 313 1100 \n",
"L 313 1709 \n",
"L 2253 4666 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.684375)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.684375)\"/>\n",
" <use xlink:href=\"#DejaVuSans-34\" transform=\"translate(128.203125 38.965625)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_6\">\n",
" <g id=\"line2d_11\">\n",
" <defs>\n",
" <path id=\"m8b375c7ac5\" d=\"M 0 0 \n",
"L 0 2 \n",
"\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"45.554334\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_7\">\n",
" <g id=\"line2d_12\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"48.210211\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_8\">\n",
" <g id=\"line2d_13\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"50.510836\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_9\">\n",
" <g id=\"line2d_14\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"52.540131\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_10\">\n",
" <g id=\"line2d_15\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"66.297697\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_11\">\n",
" <g id=\"line2d_16\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"73.283494\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_12\">\n",
" <g id=\"line2d_17\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"78.239996\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_13\">\n",
" <g id=\"line2d_18\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"82.084558\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_14\">\n",
" <g id=\"line2d_19\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"85.225793\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_15\">\n",
" <g id=\"line2d_20\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"87.88167\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_16\">\n",
" <g id=\"line2d_21\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"90.182295\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_17\">\n",
" <g id=\"line2d_22\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"92.21159\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_18\">\n",
" <g id=\"line2d_23\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"105.969156\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_19\">\n",
" <g id=\"line2d_24\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"112.954953\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_20\">\n",
" <g id=\"line2d_25\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"117.911455\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_21\">\n",
" <g id=\"line2d_26\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"121.756017\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_22\">\n",
" <g id=\"line2d_27\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"124.897252\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_23\">\n",
" <g id=\"line2d_28\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"127.553129\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_24\">\n",
" <g id=\"line2d_29\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"129.853754\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_25\">\n",
" <g id=\"line2d_30\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"131.883049\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_26\">\n",
" <g id=\"line2d_31\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"145.640615\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_27\">\n",
" <g id=\"line2d_32\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"152.626412\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_28\">\n",
" <g id=\"line2d_33\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"157.582914\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_29\">\n",
" <g id=\"line2d_34\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"161.427476\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_30\">\n",
" <g id=\"line2d_35\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"164.568711\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_31\">\n",
" <g id=\"line2d_36\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"167.224588\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_32\">\n",
" <g id=\"line2d_37\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"169.525213\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_33\">\n",
" <g id=\"line2d_38\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"171.554508\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_34\">\n",
" <g id=\"line2d_39\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"185.312074\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_35\">\n",
" <g id=\"line2d_40\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"192.297871\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_36\">\n",
" <g id=\"line2d_41\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"197.254373\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_37\">\n",
" <g id=\"line2d_42\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"201.098935\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_38\">\n",
" <g id=\"line2d_43\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"204.24017\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_39\">\n",
" <g id=\"line2d_44\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"206.896047\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_40\">\n",
" <g id=\"line2d_45\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"209.196672\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_41\">\n",
" <g id=\"line2d_46\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"211.225968\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_42\">\n",
" <g id=\"line2d_47\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"224.983533\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_43\">\n",
" <g id=\"line2d_48\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"231.96933\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_44\">\n",
" <g id=\"line2d_49\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"236.925832\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_45\">\n",
" <g id=\"line2d_50\">\n",
" <g>\n",
" <use xlink:href=\"#m8b375c7ac5\" x=\"240.770394\" y=\"143.1\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_6\">\n",
" <!-- token: x -->\n",
" <g transform=\"translate(122.916406 171.376563)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-74\" d=\"M 1172 4494 \n",
"L 1172 3500 \n",
"L 2356 3500 \n",
"L 2356 3053 \n",
"L 1172 3053 \n",
"L 1172 1153 \n",
"Q 1172 725 1289 603 \n",
"Q 1406 481 1766 481 \n",
"L 2356 481 \n",
"L 2356 0 \n",
"L 1766 0 \n",
"Q 1100 0 847 248 \n",
"Q 594 497 594 1153 \n",
"L 594 3053 \n",
"L 172 3053 \n",
"L 172 3500 \n",
"L 594 3500 \n",
"L 594 4494 \n",
"L 1172 4494 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6f\" d=\"M 1959 3097 \n",
"Q 1497 3097 1228 2736 \n",
"Q 959 2375 959 1747 \n",
"Q 959 1119 1226 758 \n",
"Q 1494 397 1959 397 \n",
"Q 2419 397 2687 759 \n",
"Q 2956 1122 2956 1747 \n",
"Q 2956 2369 2687 2733 \n",
"Q 2419 3097 1959 3097 \n",
"z\n",
"M 1959 3584 \n",
"Q 2709 3584 3137 3096 \n",
"Q 3566 2609 3566 1747 \n",
"Q 3566 888 3137 398 \n",
"Q 2709 -91 1959 -91 \n",
"Q 1206 -91 779 398 \n",
"Q 353 888 353 1747 \n",
"Q 353 2609 779 3096 \n",
"Q 1206 3584 1959 3584 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6b\" d=\"M 581 4863 \n",
"L 1159 4863 \n",
"L 1159 1991 \n",
"L 2875 3500 \n",
"L 3609 3500 \n",
"L 1753 1863 \n",
"L 3688 0 \n",
"L 2938 0 \n",
"L 1159 1709 \n",
"L 1159 0 \n",
"L 581 0 \n",
"L 581 4863 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-65\" d=\"M 3597 1894 \n",
"L 3597 1613 \n",
"L 953 1613 \n",
"Q 991 1019 1311 708 \n",
"Q 1631 397 2203 397 \n",
"Q 2534 397 2845 478 \n",
"Q 3156 559 3463 722 \n",
"L 3463 178 \n",
"Q 3153 47 2828 -22 \n",
"Q 2503 -91 2169 -91 \n",
"Q 1331 -91 842 396 \n",
"Q 353 884 353 1716 \n",
"Q 353 2575 817 3079 \n",
"Q 1281 3584 2069 3584 \n",
"Q 2775 3584 3186 3129 \n",
"Q 3597 2675 3597 1894 \n",
"z\n",
"M 3022 2063 \n",
"Q 3016 2534 2758 2815 \n",
"Q 2500 3097 2075 3097 \n",
"Q 1594 3097 1305 2825 \n",
"Q 1016 2553 972 2059 \n",
"L 3022 2063 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6e\" d=\"M 3513 2113 \n",
"L 3513 0 \n",
"L 2938 0 \n",
"L 2938 2094 \n",
"Q 2938 2591 2744 2837 \n",
"Q 2550 3084 2163 3084 \n",
"Q 1697 3084 1428 2787 \n",
"Q 1159 2491 1159 1978 \n",
"L 1159 0 \n",
"L 581 0 \n",
"L 581 3500 \n",
"L 1159 3500 \n",
"L 1159 2956 \n",
"Q 1366 3272 1645 3428 \n",
"Q 1925 3584 2291 3584 \n",
"Q 2894 3584 3203 3211 \n",
"Q 3513 2838 3513 2113 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-3a\" d=\"M 750 794 \n",
"L 1409 794 \n",
"L 1409 0 \n",
"L 750 0 \n",
"L 750 794 \n",
"z\n",
"M 750 3309 \n",
"L 1409 3309 \n",
"L 1409 2516 \n",
"L 750 2516 \n",
"L 750 3309 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-20\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-78\" d=\"M 3513 3500 \n",
"L 2247 1797 \n",
"L 3578 0 \n",
"L 2900 0 \n",
"L 1881 1375 \n",
"L 863 0 \n",
"L 184 0 \n",
"L 1544 1831 \n",
"L 300 3500 \n",
"L 978 3500 \n",
"L 1906 2253 \n",
"L 2834 3500 \n",
"L 3513 3500 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-74\"/>\n",
" <use xlink:href=\"#DejaVuSans-6f\" x=\"39.208984\"/>\n",
" <use xlink:href=\"#DejaVuSans-6b\" x=\"100.390625\"/>\n",
" <use xlink:href=\"#DejaVuSans-65\" x=\"154.675781\"/>\n",
" <use xlink:href=\"#DejaVuSans-6e\" x=\"216.199219\"/>\n",
" <use xlink:href=\"#DejaVuSans-3a\" x=\"279.578125\"/>\n",
" <use xlink:href=\"#DejaVuSans-20\" x=\"313.269531\"/>\n",
" <use xlink:href=\"#DejaVuSans-78\" x=\"345.056641\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"matplotlib.axis_2\">\n",
" <g id=\"ytick_1\">\n",
" <g id=\"line2d_51\">\n",
" <path d=\"M 45.478125 136.922727 \n",
"L 240.778125 136.922727 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_52\">\n",
" <defs>\n",
" <path id=\"m46989f122a\" d=\"M 0 0 \n",
"L -3.5 0 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#m46989f122a\" x=\"45.478125\" y=\"136.922727\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_7\">\n",
" <!-- $\\mathdefault{10^{0}}$ -->\n",
" <g transform=\"translate(20.878125 140.721946)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_2\">\n",
" <g id=\"line2d_53\">\n",
" <path d=\"M 45.478125 100.09077 \n",
"L 240.778125 100.09077 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_54\">\n",
" <g>\n",
" <use xlink:href=\"#m46989f122a\" x=\"45.478125\" y=\"100.09077\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_8\">\n",
" <!-- $\\mathdefault{10^{1}}$ -->\n",
" <g transform=\"translate(20.878125 103.889989)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.684375)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.684375)\"/>\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(128.203125 38.965625)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_3\">\n",
" <g id=\"line2d_55\">\n",
" <path d=\"M 45.478125 63.258813 \n",
"L 240.778125 63.258813 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_56\">\n",
" <g>\n",
" <use xlink:href=\"#m46989f122a\" x=\"45.478125\" y=\"63.258813\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_9\">\n",
" <!-- $\\mathdefault{10^{2}}$ -->\n",
" <g transform=\"translate(20.878125 67.058032)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-32\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_4\">\n",
" <g id=\"line2d_57\">\n",
" <path d=\"M 45.478125 26.426856 \n",
"L 240.778125 26.426856 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke: #b0b0b0; stroke-width: 0.8; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_58\">\n",
" <g>\n",
" <use xlink:href=\"#m46989f122a\" x=\"45.478125\" y=\"26.426856\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_10\">\n",
" <!-- $\\mathdefault{10^{3}}$ -->\n",
" <g transform=\"translate(20.878125 30.226075)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-31\" transform=\"translate(0 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-30\" transform=\"translate(63.623047 0.765625)\"/>\n",
" <use xlink:href=\"#DejaVuSans-33\" transform=\"translate(128.203125 39.046875)scale(0.7)\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_5\">\n",
" <g id=\"line2d_59\">\n",
" <defs>\n",
" <path id=\"m0205faf438\" d=\"M 0 0 \n",
"L -2 0 \n",
"\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"142.62807\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_6\">\n",
" <g id=\"line2d_60\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"140.492113\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_7\">\n",
" <g id=\"line2d_61\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"138.608065\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_8\">\n",
" <g id=\"line2d_62\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"125.835203\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_9\">\n",
" <g id=\"line2d_63\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"119.349418\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_10\">\n",
" <g id=\"line2d_64\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"114.747679\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_11\">\n",
" <g id=\"line2d_65\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"111.178294\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_12\">\n",
" <g id=\"line2d_66\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"108.261894\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_13\">\n",
" <g id=\"line2d_67\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"105.796112\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_14\">\n",
" <g id=\"line2d_68\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"103.660156\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_15\">\n",
" <g id=\"line2d_69\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"101.776108\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_16\">\n",
" <g id=\"line2d_70\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"89.003246\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_17\">\n",
" <g id=\"line2d_71\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"82.517461\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_18\">\n",
" <g id=\"line2d_72\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"77.915722\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_19\">\n",
" <g id=\"line2d_73\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"74.346337\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_20\">\n",
" <g id=\"line2d_74\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"71.429937\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_21\">\n",
" <g id=\"line2d_75\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"68.964155\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_22\">\n",
" <g id=\"line2d_76\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"66.828198\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_23\">\n",
" <g id=\"line2d_77\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"64.944151\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_24\">\n",
" <g id=\"line2d_78\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"52.171289\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_25\">\n",
" <g id=\"line2d_79\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"45.685503\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_26\">\n",
" <g id=\"line2d_80\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"41.083765\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_27\">\n",
" <g id=\"line2d_81\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"37.51438\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_28\">\n",
" <g id=\"line2d_82\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"34.597979\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_29\">\n",
" <g id=\"line2d_83\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"32.132198\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_30\">\n",
" <g id=\"line2d_84\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"29.996241\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_31\">\n",
" <g id=\"line2d_85\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"28.112194\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_32\">\n",
" <g id=\"line2d_86\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"15.339332\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_33\">\n",
" <g id=\"line2d_87\">\n",
" <g>\n",
" <use xlink:href=\"#m0205faf438\" x=\"45.478125\" y=\"8.853546\" style=\"stroke: #000000; stroke-width: 0.6\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_11\">\n",
" <!-- frequency: n(x) -->\n",
" <g transform=\"translate(14.798438 113.167187)rotate(-90)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-66\" d=\"M 2375 4863 \n",
"L 2375 4384 \n",
"L 1825 4384 \n",
"Q 1516 4384 1395 4259 \n",
"Q 1275 4134 1275 3809 \n",
"L 1275 3500 \n",
"L 2222 3500 \n",
"L 2222 3053 \n",
"L 1275 3053 \n",
"L 1275 0 \n",
"L 697 0 \n",
"L 697 3053 \n",
"L 147 3053 \n",
"L 147 3500 \n",
"L 697 3500 \n",
"L 697 3744 \n",
"Q 697 4328 969 4595 \n",
"Q 1241 4863 1831 4863 \n",
"L 2375 4863 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-72\" d=\"M 2631 2963 \n",
"Q 2534 3019 2420 3045 \n",
"Q 2306 3072 2169 3072 \n",
"Q 1681 3072 1420 2755 \n",
"Q 1159 2438 1159 1844 \n",
"L 1159 0 \n",
"L 581 0 \n",
"L 581 3500 \n",
"L 1159 3500 \n",
"L 1159 2956 \n",
"Q 1341 3275 1631 3429 \n",
"Q 1922 3584 2338 3584 \n",
"Q 2397 3584 2469 3576 \n",
"Q 2541 3569 2628 3553 \n",
"L 2631 2963 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-71\" d=\"M 947 1747 \n",
"Q 947 1113 1208 752 \n",
"Q 1469 391 1925 391 \n",
"Q 2381 391 2643 752 \n",
"Q 2906 1113 2906 1747 \n",
"Q 2906 2381 2643 2742 \n",
"Q 2381 3103 1925 3103 \n",
"Q 1469 3103 1208 2742 \n",
"Q 947 2381 947 1747 \n",
"z\n",
"M 2906 525 \n",
"Q 2725 213 2448 61 \n",
"Q 2172 -91 1784 -91 \n",
"Q 1150 -91 751 415 \n",
"Q 353 922 353 1747 \n",
"Q 353 2572 751 3078 \n",
"Q 1150 3584 1784 3584 \n",
"Q 2172 3584 2448 3432 \n",
"Q 2725 3281 2906 2969 \n",
"L 2906 3500 \n",
"L 3481 3500 \n",
"L 3481 -1331 \n",
"L 2906 -1331 \n",
"L 2906 525 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-75\" d=\"M 544 1381 \n",
"L 544 3500 \n",
"L 1119 3500 \n",
"L 1119 1403 \n",
"Q 1119 906 1312 657 \n",
"Q 1506 409 1894 409 \n",
"Q 2359 409 2629 706 \n",
"Q 2900 1003 2900 1516 \n",
"L 2900 3500 \n",
"L 3475 3500 \n",
"L 3475 0 \n",
"L 2900 0 \n",
"L 2900 538 \n",
"Q 2691 219 2414 64 \n",
"Q 2138 -91 1772 -91 \n",
"Q 1169 -91 856 284 \n",
"Q 544 659 544 1381 \n",
"z\n",
"M 1991 3584 \n",
"L 1991 3584 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-63\" d=\"M 3122 3366 \n",
"L 3122 2828 \n",
"Q 2878 2963 2633 3030 \n",
"Q 2388 3097 2138 3097 \n",
"Q 1578 3097 1268 2742 \n",
"Q 959 2388 959 1747 \n",
"Q 959 1106 1268 751 \n",
"Q 1578 397 2138 397 \n",
"Q 2388 397 2633 464 \n",
"Q 2878 531 3122 666 \n",
"L 3122 134 \n",
"Q 2881 22 2623 -34 \n",
"Q 2366 -91 2075 -91 \n",
"Q 1284 -91 818 406 \n",
"Q 353 903 353 1747 \n",
"Q 353 2603 823 3093 \n",
"Q 1294 3584 2113 3584 \n",
"Q 2378 3584 2631 3529 \n",
"Q 2884 3475 3122 3366 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-79\" d=\"M 2059 -325 \n",
"Q 1816 -950 1584 -1140 \n",
"Q 1353 -1331 966 -1331 \n",
"L 506 -1331 \n",
"L 506 -850 \n",
"L 844 -850 \n",
"Q 1081 -850 1212 -737 \n",
"Q 1344 -625 1503 -206 \n",
"L 1606 56 \n",
"L 191 3500 \n",
"L 800 3500 \n",
"L 1894 763 \n",
"L 2988 3500 \n",
"L 3597 3500 \n",
"L 2059 -325 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-28\" d=\"M 1984 4856 \n",
"Q 1566 4138 1362 3434 \n",
"Q 1159 2731 1159 2009 \n",
"Q 1159 1288 1364 580 \n",
"Q 1569 -128 1984 -844 \n",
"L 1484 -844 \n",
"Q 1016 -109 783 600 \n",
"Q 550 1309 550 2009 \n",
"Q 550 2706 781 3412 \n",
"Q 1013 4119 1484 4856 \n",
"L 1984 4856 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-29\" d=\"M 513 4856 \n",
"L 1013 4856 \n",
"Q 1481 4119 1714 3412 \n",
"Q 1947 2706 1947 2009 \n",
"Q 1947 1309 1714 600 \n",
"Q 1481 -109 1013 -844 \n",
"L 513 -844 \n",
"Q 928 -128 1133 580 \n",
"Q 1338 1288 1338 2009 \n",
"Q 1338 2731 1133 3434 \n",
"Q 928 4138 513 4856 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-66\"/>\n",
" <use xlink:href=\"#DejaVuSans-72\" x=\"35.205078\"/>\n",
" <use xlink:href=\"#DejaVuSans-65\" x=\"74.068359\"/>\n",
" <use xlink:href=\"#DejaVuSans-71\" x=\"135.591797\"/>\n",
" <use xlink:href=\"#DejaVuSans-75\" x=\"199.068359\"/>\n",
" <use xlink:href=\"#DejaVuSans-65\" x=\"262.447266\"/>\n",
" <use xlink:href=\"#DejaVuSans-6e\" x=\"323.970703\"/>\n",
" <use xlink:href=\"#DejaVuSans-63\" x=\"387.349609\"/>\n",
" <use xlink:href=\"#DejaVuSans-79\" x=\"442.330078\"/>\n",
" <use xlink:href=\"#DejaVuSans-3a\" x=\"494.259766\"/>\n",
" <use xlink:href=\"#DejaVuSans-20\" x=\"527.951172\"/>\n",
" <use xlink:href=\"#DejaVuSans-6e\" x=\"559.738281\"/>\n",
" <use xlink:href=\"#DejaVuSans-28\" x=\"623.117188\"/>\n",
" <use xlink:href=\"#DejaVuSans-78\" x=\"662.130859\"/>\n",
" <use xlink:href=\"#DejaVuSans-29\" x=\"721.310547\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_88\">\n",
" <path d=\"M -1 22.628465 \n",
"L 54.355398 22.641392 \n",
"L 66.297697 22.921582 \n",
"L 73.283494 24.121839 \n",
"L 78.239996 29.67948 \n",
"L 82.084558 32.246865 \n",
"L 85.225793 35.931745 \n",
"L 87.88167 36.253723 \n",
"L 90.182295 39.450499 \n",
"L 94.026857 39.668628 \n",
"L 95.668965 43.037948 \n",
"L 97.168092 46.732081 \n",
"L 98.547156 47.370841 \n",
"L 101.012654 50.574172 \n",
"L 102.124594 50.940227 \n",
"L 103.169102 51.854528 \n",
"L 104.15389 52.171289 \n",
"L 105.085419 55.941884 \n",
"L 105.969156 56.561158 \n",
"L 106.809766 58.223127 \n",
"L 107.611264 58.223127 \n",
"L 108.377128 58.577295 \n",
"L 109.110391 59.185578 \n",
"L 109.813718 59.435519 \n",
"L 110.489455 59.947432 \n",
"L 111.139687 60.078011 \n",
"L 112.37086 60.078011 \n",
"L 113.519892 60.611258 \n",
"L 114.066893 61.162896 \n",
"L 114.597061 61.162896 \n",
"L 115.111401 61.30383 \n",
"L 115.61083 61.30383 \n",
"L 116.096189 61.446017 \n",
"L 116.568248 62.326748 \n",
"L 117.027718 62.47837 \n",
"L 117.475252 62.785993 \n",
"L 117.911455 64.248567 \n",
"L 118.752065 64.592578 \n",
"L 119.157475 64.592578 \n",
"L 119.553563 64.767399 \n",
"L 119.94075 65.122878 \n",
"L 121.407942 65.858452 \n",
"L 122.097198 65.858452 \n",
"L 122.431754 66.047755 \n",
"L 122.759937 66.047755 \n",
"L 123.081986 66.433217 \n",
"L 123.708567 67.439582 \n",
"L 124.013515 67.860551 \n",
"L 124.313159 68.075265 \n",
"L 124.897252 68.075265 \n",
"L 126.009193 68.964155 \n",
"L 126.53936 68.964155 \n",
"L 126.798449 69.194316 \n",
"L 127.553129 71.165535 \n",
"L 128.038488 71.698782 \n",
"L 128.276134 71.698782 \n",
"L 128.510547 72.25042 \n",
"L 128.970017 72.25042 \n",
"L 129.417551 73.115275 \n",
"L 129.637033 73.115275 \n",
"L 130.898262 74.669498 \n",
"L 131.690544 74.669498 \n",
"L 131.883049 74.999322 \n",
"L 132.073428 74.999322 \n",
"L 132.261726 75.336091 \n",
"L 132.447988 75.336091 \n",
"L 132.632259 75.680102 \n",
"L 133.173531 75.680102 \n",
"L 133.350241 76.031675 \n",
"L 133.525158 76.031675 \n",
"L 133.869751 76.758887 \n",
"L 134.207587 76.758887 \n",
"L 134.374053 77.135279 \n",
"L 134.538926 77.135279 \n",
"L 134.864013 77.915722 \n",
"L 135.340424 77.915722 \n",
"L 135.650867 78.736206 \n",
"L 135.804015 78.736206 \n",
"L 135.955814 79.162789 \n",
"L 136.54998 79.162789 \n",
"L 136.695374 79.60106 \n",
"L 137.40449 79.60106 \n",
"L 137.542877 80.051679 \n",
"L 138.085571 80.051679 \n",
"L 138.218615 80.515361 \n",
"L 138.868847 80.515361 \n",
"L 138.995999 80.992887 \n",
"L 139.247523 80.992887 \n",
"L 139.371922 81.485108 \n",
"L 139.860723 81.485108 \n",
"L 139.980787 81.992957 \n",
"L 140.218433 81.992957 \n",
"L 140.336039 82.517461 \n",
"L 140.568868 82.517461 \n",
"L 140.684113 83.059747 \n",
"L 140.798592 83.059747 \n",
"L 141.025294 83.621065 \n",
"L 141.469941 83.621065 \n",
"L 141.688034 84.202798 \n",
"L 142.010082 84.202798 \n",
"L 142.221485 84.80649 \n",
"L 142.738914 84.80649 \n",
"L 142.941611 85.433861 \n",
"L 143.141952 85.433861 \n",
"L 143.33999 86.086846 \n",
"L 143.632843 86.086846 \n",
"L 143.825349 86.767626 \n",
"L 144.204025 86.767626 \n",
"L 144.390288 87.478673 \n",
"L 145.204412 87.478673 \n",
"L 145.380221 88.222803 \n",
"L 145.981796 88.222803 \n",
"L 146.149886 89.003246 \n",
"L 147.046164 89.003246 \n",
"L 147.204231 89.823729 \n",
"L 147.898113 89.823729 \n",
"L 148.048587 90.688584 \n",
"L 148.271861 90.688584 \n",
"L 148.419119 91.602885 \n",
"L 148.924831 91.602885 \n",
"L 149.066635 92.572632 \n",
"L 150.160914 92.572632 \n",
"L 150.292938 93.604984 \n",
"L 150.938298 93.604984 \n",
"L 151.064519 94.708589 \n",
"L 151.803022 94.708589 \n",
"L 151.923086 95.894013 \n",
"L 153.246839 95.894013 \n",
"L 153.41224 97.17437 \n",
"L 154.163784 97.17437 \n",
"L 154.320651 98.566197 \n",
"L 155.767648 98.566197 \n",
"L 155.910629 100.09077 \n",
"L 157.496553 100.09077 \n",
"L 157.625933 101.776108 \n",
"L 159.107149 101.776108 \n",
"L 159.225022 103.660156 \n",
"L 161.427476 103.660156 \n",
"L 161.56476 105.796112 \n",
"L 164.103032 105.796112 \n",
"L 164.220637 108.261894 \n",
"L 166.674411 108.261894 \n",
"L 166.801003 111.178294 \n",
"L 170.013563 111.178294 \n",
"L 170.138714 114.747679 \n",
"L 174.078613 114.747679 \n",
"L 174.193969 119.349418 \n",
"L 179.386992 119.349418 \n",
"L 179.508069 125.835203 \n",
"L 186.804739 125.835203 \n",
"L 186.922828 136.922727 \n",
"L 199.579747 136.922727 \n",
"L 199.579747 136.922727 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"line2d_89\">\n",
" <path d=\"M -1 54.851831 \n",
"L 54.355398 54.8653 \n",
"L 73.283494 61.446017 \n",
"L 78.239996 61.880322 \n",
"L 82.084558 62.942052 \n",
"L 85.225793 63.419577 \n",
"L 87.88167 65.858452 \n",
"L 90.182295 67.23318 \n",
"L 92.21159 68.2929 \n",
"L 94.026857 69.427838 \n",
"L 95.668965 69.664818 \n",
"L 97.168092 70.905433 \n",
"L 98.547156 71.165535 \n",
"L 99.823969 71.165535 \n",
"L 101.012654 71.429937 \n",
"L 102.124594 74.029576 \n",
"L 103.169102 74.029576 \n",
"L 104.15389 74.669498 \n",
"L 105.085419 74.669498 \n",
"L 105.969156 75.336091 \n",
"L 106.809766 75.336091 \n",
"L 108.377128 76.031675 \n",
"L 109.813718 76.031675 \n",
"L 110.489455 76.391149 \n",
"L 111.139687 76.391149 \n",
"L 111.766268 77.520741 \n",
"L 112.37086 77.915722 \n",
"L 112.954953 77.915722 \n",
"L 113.519892 78.320704 \n",
"L 115.111401 78.320704 \n",
"L 115.61083 79.162789 \n",
"L 116.096189 79.60106 \n",
"L 117.027718 79.60106 \n",
"L 117.475252 80.515361 \n",
"L 118.336886 80.515361 \n",
"L 118.752065 81.485108 \n",
"L 119.157475 81.485108 \n",
"L 119.94075 82.517461 \n",
"L 121.052691 82.517461 \n",
"L 121.756017 83.621065 \n",
"L 122.431754 83.621065 \n",
"L 122.759937 84.202798 \n",
"L 123.708567 84.202798 \n",
"L 124.013515 84.80649 \n",
"L 124.607681 84.80649 \n",
"L 124.897252 85.433861 \n",
"L 125.462191 85.433861 \n",
"L 125.737863 86.086846 \n",
"L 126.53936 86.086846 \n",
"L 126.798449 86.767626 \n",
"L 127.553129 86.767626 \n",
"L 127.797518 87.478673 \n",
"L 129.637033 87.478673 \n",
"L 129.853754 88.222803 \n",
"L 130.898262 88.222803 \n",
"L 131.099774 89.003246 \n",
"L 131.495862 89.003246 \n",
"L 131.690544 89.823729 \n",
"L 133.173531 89.823729 \n",
"L 133.350241 90.688584 \n",
"L 134.538926 90.688584 \n",
"L 134.702236 91.602885 \n",
"L 135.18308 91.602885 \n",
"L 135.340424 92.572632 \n",
"L 137.542877 92.572632 \n",
"L 137.680162 93.604984 \n",
"L 139.980787 93.604984 \n",
"L 140.10002 94.708589 \n",
"L 141.796053 94.708589 \n",
"L 142.010082 95.894013 \n",
"L 143.535777 95.894013 \n",
"L 143.729364 97.17437 \n",
"L 145.292541 97.17437 \n",
"L 145.467457 98.566197 \n",
"L 147.438644 98.566197 \n",
"L 147.593166 100.09077 \n",
"L 150.489097 100.09077 \n",
"L 150.61864 101.776108 \n",
"L 152.854615 101.776108 \n",
"L 152.967594 103.660156 \n",
"L 156.099442 103.660156 \n",
"L 156.239707 105.796112 \n",
"L 159.458377 105.796112 \n",
"L 159.57388 108.261894 \n",
"L 163.225504 108.261894 \n",
"L 163.349233 111.178294 \n",
"L 167.82918 111.178294 \n",
"L 167.947593 114.747679 \n",
"L 174.128146 114.747679 \n",
"L 174.243171 119.349418 \n",
"L 182.205262 119.349418 \n",
"L 182.318375 125.835203 \n",
"L 195.477351 125.835203 \n",
"L 195.591578 136.922727 \n",
"L 225.183092 136.922727 \n",
"L 225.183092 136.922727 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"line2d_90\">\n",
" <path d=\"M -1 82.502365 \n",
"L 54.355398 82.517461 \n",
"L 66.297697 86.086846 \n",
"L 73.283494 92.572632 \n",
"L 78.239996 93.604984 \n",
"L 82.084558 93.604984 \n",
"L 85.225793 94.708589 \n",
"L 87.88167 94.708589 \n",
"L 90.182295 95.894013 \n",
"L 92.21159 95.894013 \n",
"L 94.026857 97.17437 \n",
"L 98.547156 97.17437 \n",
"L 99.823969 98.566197 \n",
"L 105.969156 98.566197 \n",
"L 106.809766 100.09077 \n",
"L 109.813718 100.09077 \n",
"L 110.489455 101.776108 \n",
"L 114.066893 101.776108 \n",
"L 114.597061 103.660156 \n",
"L 121.052691 103.660156 \n",
"L 121.407942 105.796112 \n",
"L 126.276316 105.796112 \n",
"L 126.53936 108.261894 \n",
"L 132.632259 108.261894 \n",
"L 132.814579 111.178294 \n",
"L 140.568868 111.178294 \n",
"L 140.684113 114.747679 \n",
"L 148.271861 114.747679 \n",
"L 148.419119 119.349418 \n",
"L 160.652212 119.349418 \n",
"L 160.795788 125.835203 \n",
"L 181.787567 125.835203 \n",
"L 181.903446 136.922727 \n",
"L 231.900852 136.922727 \n",
"L 231.900852 136.922727 \n",
"\" clip-path=\"url(#p0ebf69a405)\" style=\"fill: none; stroke-dasharray: 9.6,2.4,1.5,2.4; stroke-dashoffset: 0; stroke: #008000; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"patch_3\">\n",
" <path d=\"M 45.478125 143.1 \n",
"L 45.478125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_4\">\n",
" <path d=\"M 240.778125 143.1 \n",
"L 240.778125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_5\">\n",
" <path d=\"M 45.478125 143.1 \n",
"L 240.778125 143.1 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_6\">\n",
" <path d=\"M 45.478125 7.2 \n",
"L 240.778125 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"legend_1\">\n",
" <g id=\"patch_7\">\n",
" <path d=\"M 159.996875 59.234375 \n",
"L 233.778125 59.234375 \n",
"Q 235.778125 59.234375 235.778125 57.234375 \n",
"L 235.778125 14.2 \n",
"Q 235.778125 12.2 233.778125 12.2 \n",
"L 159.996875 12.2 \n",
"Q 157.996875 12.2 157.996875 14.2 \n",
"L 157.996875 57.234375 \n",
"Q 157.996875 59.234375 159.996875 59.234375 \n",
"z\n",
"\" style=\"fill: #ffffff; opacity: 0.8; stroke: #cccccc; stroke-linejoin: miter\"/>\n",
" </g>\n",
" <g id=\"line2d_91\">\n",
" <path d=\"M 161.996875 20.298438 \n",
"L 171.996875 20.298438 \n",
"L 181.996875 20.298438 \n",
"\" style=\"fill: none; stroke: #1f77b4; stroke-width: 1.5; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"text_12\">\n",
" <!-- unigram -->\n",
" <g transform=\"translate(189.996875 23.798438)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-69\" d=\"M 603 3500 \n",
"L 1178 3500 \n",
"L 1178 0 \n",
"L 603 0 \n",
"L 603 3500 \n",
"z\n",
"M 603 4863 \n",
"L 1178 4863 \n",
"L 1178 4134 \n",
"L 603 4134 \n",
"L 603 4863 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-67\" d=\"M 2906 1791 \n",
"Q 2906 2416 2648 2759 \n",
"Q 2391 3103 1925 3103 \n",
"Q 1463 3103 1205 2759 \n",
"Q 947 2416 947 1791 \n",
"Q 947 1169 1205 825 \n",
"Q 1463 481 1925 481 \n",
"Q 2391 481 2648 825 \n",
"Q 2906 1169 2906 1791 \n",
"z\n",
"M 3481 434 \n",
"Q 3481 -459 3084 -895 \n",
"Q 2688 -1331 1869 -1331 \n",
"Q 1566 -1331 1297 -1286 \n",
"Q 1028 -1241 775 -1147 \n",
"L 775 -588 \n",
"Q 1028 -725 1275 -790 \n",
"Q 1522 -856 1778 -856 \n",
"Q 2344 -856 2625 -561 \n",
"Q 2906 -266 2906 331 \n",
"L 2906 616 \n",
"Q 2728 306 2450 153 \n",
"Q 2172 0 1784 0 \n",
"Q 1141 0 747 490 \n",
"Q 353 981 353 1791 \n",
"Q 353 2603 747 3093 \n",
"Q 1141 3584 1784 3584 \n",
"Q 2172 3584 2450 3431 \n",
"Q 2728 3278 2906 2969 \n",
"L 2906 3500 \n",
"L 3481 3500 \n",
"L 3481 434 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-61\" d=\"M 2194 1759 \n",
"Q 1497 1759 1228 1600 \n",
"Q 959 1441 959 1056 \n",
"Q 959 750 1161 570 \n",
"Q 1363 391 1709 391 \n",
"Q 2188 391 2477 730 \n",
"Q 2766 1069 2766 1631 \n",
"L 2766 1759 \n",
"L 2194 1759 \n",
"z\n",
"M 3341 1997 \n",
"L 3341 0 \n",
"L 2766 0 \n",
"L 2766 531 \n",
"Q 2569 213 2275 61 \n",
"Q 1981 -91 1556 -91 \n",
"Q 1019 -91 701 211 \n",
"Q 384 513 384 1019 \n",
"Q 384 1609 779 1909 \n",
"Q 1175 2209 1959 2209 \n",
"L 2766 2209 \n",
"L 2766 2266 \n",
"Q 2766 2663 2505 2880 \n",
"Q 2244 3097 1772 3097 \n",
"Q 1472 3097 1187 3025 \n",
"Q 903 2953 641 2809 \n",
"L 641 3341 \n",
"Q 956 3463 1253 3523 \n",
"Q 1550 3584 1831 3584 \n",
"Q 2591 3584 2966 3190 \n",
"Q 3341 2797 3341 1997 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"DejaVuSans-6d\" d=\"M 3328 2828 \n",
"Q 3544 3216 3844 3400 \n",
"Q 4144 3584 4550 3584 \n",
"Q 5097 3584 5394 3201 \n",
"Q 5691 2819 5691 2113 \n",
"L 5691 0 \n",
"L 5113 0 \n",
"L 5113 2094 \n",
"Q 5113 2597 4934 2840 \n",
"Q 4756 3084 4391 3084 \n",
"Q 3944 3084 3684 2787 \n",
"Q 3425 2491 3425 1978 \n",
"L 3425 0 \n",
"L 2847 0 \n",
"L 2847 2094 \n",
"Q 2847 2600 2669 2842 \n",
"Q 2491 3084 2119 3084 \n",
"Q 1678 3084 1418 2786 \n",
"Q 1159 2488 1159 1978 \n",
"L 1159 0 \n",
"L 581 0 \n",
"L 581 3500 \n",
"L 1159 3500 \n",
"L 1159 2956 \n",
"Q 1356 3278 1631 3431 \n",
"Q 1906 3584 2284 3584 \n",
"Q 2666 3584 2933 3390 \n",
"Q 3200 3197 3328 2828 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-75\"/>\n",
" <use xlink:href=\"#DejaVuSans-6e\" x=\"63.378906\"/>\n",
" <use xlink:href=\"#DejaVuSans-69\" x=\"126.757812\"/>\n",
" <use xlink:href=\"#DejaVuSans-67\" x=\"154.541016\"/>\n",
" <use xlink:href=\"#DejaVuSans-72\" x=\"218.017578\"/>\n",
" <use xlink:href=\"#DejaVuSans-61\" x=\"259.130859\"/>\n",
" <use xlink:href=\"#DejaVuSans-6d\" x=\"320.410156\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_92\">\n",
" <path d=\"M 161.996875 34.976562 \n",
"L 171.996875 34.976562 \n",
"L 181.996875 34.976562 \n",
"\" style=\"fill: none; stroke-dasharray: 5.55,2.4; stroke-dashoffset: 0; stroke: #bf00bf; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"text_13\">\n",
" <!-- bigram -->\n",
" <g transform=\"translate(189.996875 38.476562)scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"DejaVuSans-62\" d=\"M 3116 1747 \n",
"Q 3116 2381 2855 2742 \n",
"Q 2594 3103 2138 3103 \n",
"Q 1681 3103 1420 2742 \n",
"Q 1159 2381 1159 1747 \n",
"Q 1159 1113 1420 752 \n",
"Q 1681 391 2138 391 \n",
"Q 2594 391 2855 752 \n",
"Q 3116 1113 3116 1747 \n",
"z\n",
"M 1159 2969 \n",
"Q 1341 3281 1617 3432 \n",
"Q 1894 3584 2278 3584 \n",
"Q 2916 3584 3314 3078 \n",
"Q 3713 2572 3713 1747 \n",
"Q 3713 922 3314 415 \n",
"Q 2916 -91 2278 -91 \n",
"Q 1894 -91 1617 61 \n",
"Q 1341 213 1159 525 \n",
"L 1159 0 \n",
"L 581 0 \n",
"L 581 4863 \n",
"L 1159 4863 \n",
"L 1159 2969 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#DejaVuSans-62\"/>\n",
" <use xlink:href=\"#DejaVuSans-69\" x=\"63.476562\"/>\n",
" <use xlink:href=\"#DejaVuSans-67\" x=\"91.259766\"/>\n",
" <use xlink:href=\"#DejaVuSans-72\" x=\"154.736328\"/>\n",
" <use xlink:href=\"#DejaVuSans-61\" x=\"195.849609\"/>\n",
" <use xlink:href=\"#DejaVuSans-6d\" x=\"257.128906\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"line2d_93\">\n",
" <path d=\"M 161.996875 49.654688 \n",
"L 171.996875 49.654688 \n",
"L 181.996875 49.654688 \n",
"\" style=\"fill: none; stroke-dasharray: 9.6,2.4,1.5,2.4; stroke-dashoffset: 0; stroke: #008000; stroke-width: 1.5\"/>\n",
" </g>\n",
" <g id=\"text_14\">\n",
" <!-- trigram -->\n",
" <g transform=\"translate(189.996875 53.154688)scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#DejaVuSans-74\"/>\n",
" <use xlink:href=\"#DejaVuSans-72\" x=\"39.208984\"/>\n",
" <use xlink:href=\"#DejaVuSans-69\" x=\"80.322266\"/>\n",
" <use xlink:href=\"#DejaVuSans-67\" x=\"108.105469\"/>\n",
" <use xlink:href=\"#DejaVuSans-72\" x=\"171.582031\"/>\n",
" <use xlink:href=\"#DejaVuSans-61\" x=\"212.695312\"/>\n",
" <use xlink:href=\"#DejaVuSans-6d\" x=\"273.974609\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <defs>\n",
" <clipPath id=\"p0ebf69a405\">\n",
" <rect x=\"45.478125\" y=\"7.2\" width=\"195.3\" height=\"135.9\"/>\n",
" </clipPath>\n",
" </defs>\n",
"</svg>\n"
],
"text/plain": [
"<Figure size 252x180 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"bigram_freqs = [freq for token, freq in bigram_vocab.token_freqs]\n",
"trigram_freqs = [freq for token, freq in trigram_vocab.token_freqs]\n",
"d2l.plot([freqs, bigram_freqs, trigram_freqs], xlabel='token: x',\n",
" ylabel='frequency: n(x)', xscale='log', yscale='log',\n",
" legend=['unigram', 'bigram', 'trigram'])"
]
},
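  {
   "cell_type": "markdown",
   "id": "zipf-loglog-slope",
   "metadata": {},
   "source": [
    "On log-log axes such as those in the plot above, a power law $n_i \\propto 1/i^\\alpha$ shows up as a straight line,\n",
    "so the exponent can be read off as the negative slope of each curve.\n",
    "As a rough sketch (the two ranks $i < j$ and their frequencies $n_i$, $n_j$ are illustrative symbols,\n",
    "not values taken from the data), taking logarithms gives\n",
    "\n",
    "$$\\log n_i = -\\alpha \\log i + c \\quad\\Longrightarrow\\quad \\alpha \\approx \\frac{\\log n_i - \\log n_j}{\\log j - \\log i}.$$\n",
    "\n",
    "A curve that falls more gently therefore corresponds to a smaller $\\alpha$.\n",
    "A least-squares fit over a range of ranks would likely be more robust than a two-point estimate.\n"
   ]
  },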
{
"cell_type": "markdown",
"id": "c86d84a0",
"metadata": {
"origin_pos": 14
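   },
   "source": [
    "This plot is quite encouraging, for several reasons:\n",
    "\n",
    "1. Beyond unigrams, sequences of words also appear to follow Zipf's law,\n",
    "albeit with a smaller exponent $\\alpha$ in :eqref:`eq_zipf_law`\n",
    "(the exponent depends on the sequence length);\n",
    "2. The number of distinct $n$-grams is not that large, which suggests that language has quite a lot of structure,\n",
    "and this structure gives us hope that it can be modeled;\n",
    "3. Many $n$-grams occur very rarely, which makes Laplace smoothing rather unsuitable for language modeling.\n",
    "Instead, we will use deep learning based models.\n",
    "\n",
    "## Reading Long Sequence Data\n",
    "\n",
    "Since sequence data are by their very nature sequential, we need to decide how to split them up for processing.\n",
    "We did so in a rather ad hoc manner in :numref:`sec_sequence`.\n",
    "When sequences get too long to be processed by a model all at once,\n",
    "we may wish to split them into pieces that the model can read.\n",
    "\n",
    "Before introducing the model, let us look at the general strategy.\n",
    "Assume that we will use a neural network to train a language model,\n",
    "where the network processes a minibatch of sequences with a predefined length\n",
    "(say $n$ time steps) at a time.\n",
    "Now the question is how to [**read minibatches of features and labels at random.**]\n",
    "\n",
    "To begin with, a text sequence can be arbitrarily long,\n",
    "such as the entire book *The Time Machine*,\n",
    "so we can partition such a long sequence into subsequences with the same number of time steps.\n",
    "When training our neural network, a minibatch of such subsequences is fed into the model.\n",
    "Suppose that the network processes a subsequence of $n$ time steps at a time.\n",
    ":numref:`fig_timemachine_5gram` shows\n",
    "all the different ways to obtain subsequences from an original text sequence,\n",
    "where $n=5$ and the token at each time step corresponds to a character.\n",
    "Note that we have quite some freedom, since we can pick an arbitrary offset that indicates the initial position.\n",
    "\n",
    "![Different offsets lead to different subsequences when splitting up text.](../img/timemachine-5gram.svg)\n",
    ":label:`fig_timemachine_5gram`\n",
    "\n",
    "Hence, which one should we pick from :numref:`fig_timemachine_5gram`?\n",
    "In fact, all of them are equally good.\n",
    "However, if we pick just one offset,\n",
    "the subsequences used to train the network cover only a limited part of all possible subsequences.\n",
    "Therefore, we can start partitioning a sequence from a random offset\n",
    "to get both *coverage* and *randomness*.\n",
    "In the following, we describe how to implement the *random sampling* and\n",
    "*sequential partitioning* strategies.\n",
    "\n",
    "### Random Sampling\n",
    "\n",
    "(**In random sampling, each example is a subsequence arbitrarily captured on the original long sequence.**)\n",
    "During iteration, the subsequences from two adjacent random minibatches are not necessarily adjacent on the original sequence.\n",
    "For language modeling, the target is to predict the next token based on the tokens we have seen so far,\n",
    "hence the labels are the original sequence shifted by one token.\n",
    "\n",
    "The following code randomly generates one minibatch from the data at a time.\n",
    "Here, the argument `batch_size` specifies the number of subsequence examples in each minibatch,\n",
    "and `num_steps` is the predefined number of time steps in each subsequence.\n"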
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6763bdfb",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:06.590933Z",
"iopub.status.busy": "2023-08-18T07:04:06.590651Z",
"iopub.status.idle": "2023-08-18T07:04:06.597838Z",
"shell.execute_reply": "2023-08-18T07:04:06.597066Z"
},
"origin_pos": 15,
"tab": [
"pytorch"
]
   },
   "outputs": [],
   "source": [
    "def seq_data_iter_random(corpus, batch_size, num_steps):  #@save\n",
    "    \"\"\"Generate a minibatch of subsequences using random sampling.\"\"\"\n",
    "    # Start partitioning from a random offset in [0, num_steps - 1]\n",
    "    corpus = corpus[random.randint(0, num_steps - 1):]\n",
    "    # Subtract 1 because the labels need one extra token\n",
    "    num_subseqs = (len(corpus) - 1) // num_steps\n",
    "    # Starting indices of the subsequences of length num_steps\n",
    "    initial_indices = list(range(0, num_subseqs * num_steps, num_steps))\n",
    "    # In random sampling, the subsequences from two adjacent random\n",
    "    # minibatches are not necessarily adjacent on the original sequence\n",
    "    random.shuffle(initial_indices)\n",
    "\n",
    "    def data(pos):\n",
    "        # Return a sequence of length num_steps starting at pos\n",
    "        return corpus[pos: pos + num_steps]\n",
    "\n",
    "    num_batches = num_subseqs // batch_size\n",
    "    for i in range(0, batch_size * num_batches, batch_size):\n",
    "        # Here, initial_indices holds the shuffled starting indices\n",
    "        initial_indices_per_batch = initial_indices[i: i + batch_size]\n",
    "        X = [data(j) for j in initial_indices_per_batch]\n",
    "        Y = [data(j + 1) for j in initial_indices_per_batch]\n",
    "        yield torch.tensor(X), torch.tensor(Y)"
]
},
{
"cell_type": "markdown",
"id": "8045d2e2",
"metadata": {
"origin_pos": 16
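   },
   "source": [
    "Let us [**generate a sequence from $0$ to $34$**].\n",
    "We assume a batch size of $2$ and $5$ time steps, which means that we can generate\n",
    "$\\lfloor (35 - 1) / 5 \\rfloor= 6$ feature-label subsequence pairs.\n",
    "With a minibatch size of $2$, we get only $3$ minibatches.\n"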
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5b286eb2",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:06.600992Z",
"iopub.status.busy": "2023-08-18T07:04:06.600725Z",
"iopub.status.idle": "2023-08-18T07:04:06.607406Z",
"shell.execute_reply": "2023-08-18T07:04:06.606616Z"
},
"origin_pos": 17,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X: tensor([[13, 14, 15, 16, 17],\n",
" [28, 29, 30, 31, 32]]) \n",
"Y: tensor([[14, 15, 16, 17, 18],\n",
" [29, 30, 31, 32, 33]])\n",
"X: tensor([[ 3, 4, 5, 6, 7],\n",
" [18, 19, 20, 21, 22]]) \n",
"Y: tensor([[ 4, 5, 6, 7, 8],\n",
" [19, 20, 21, 22, 23]])\n",
"X: tensor([[ 8, 9, 10, 11, 12],\n",
" [23, 24, 25, 26, 27]]) \n",
"Y: tensor([[ 9, 10, 11, 12, 13],\n",
" [24, 25, 26, 27, 28]])\n"
]
}
],
"source": [
"my_seq = list(range(35))\n",
"for X, Y in seq_data_iter_random(my_seq, batch_size=2, num_steps=5):\n",
" print('X: ', X, '\\nY:', Y)"
]
},
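  {
   "cell_type": "markdown",
   "id": "random-sampling-arithmetic",
   "metadata": {},
   "source": [
    "To see where the six subsequences come from, the arithmetic can be traced for one offset.\n",
    "This is only a sketch: the offset $k = 3$ is inferred from the output above\n",
    "(the smallest feature value is $3$), and another run would draw a different $k \\in \\{0, \\ldots, 4\\}$.\n",
    "After dropping the first $k$ tokens, the number of remaining tokens, subsequences, and minibatches is\n",
    "\n",
    "$$35 - k = 32, \\qquad \\lfloor (32 - 1) / 5 \\rfloor = 6, \\qquad \\lfloor 6 / 2 \\rfloor = 3.$$\n",
    "\n",
    "The six start positions $0, 5, \\ldots, 25$ in the truncated sequence correspond to the tokens\n",
    "$3, 8, 13, 18, 23, 28$, which are the first entries of the rows of `X` above, in shuffled order.\n",
    "For every $k$ between $0$ and $4$ the count stays at $\\lfloor (34 - k) / 5 \\rfloor = 6$.\n"
   ]
  },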
{
"cell_type": "markdown",
"id": "74e73616",
"metadata": {
"origin_pos": 18
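   },
   "source": [
    "### Sequential Partitioning\n",
    "\n",
    "In addition to random sampling of the original sequence,\n",
    "[**we can also ensure that the subsequences from two adjacent minibatches during iteration are adjacent on the original sequence.**]\n",
    "This strategy preserves the order of the split subsequences when iterating over minibatches, hence it is called sequential partitioning.\n"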
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8f65cffa",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:06.610652Z",
"iopub.status.busy": "2023-08-18T07:04:06.610385Z",
"iopub.status.idle": "2023-08-18T07:04:06.616587Z",
"shell.execute_reply": "2023-08-18T07:04:06.615794Z"
},
"origin_pos": 19,
"tab": [
"pytorch"
]
   },
   "outputs": [],
   "source": [
    "def seq_data_iter_sequential(corpus, batch_size, num_steps):  #@save\n",
    "    \"\"\"Generate a minibatch of subsequences using sequential partitioning.\"\"\"\n",
    "    # Start partitioning from a random offset\n",
    "    offset = random.randint(0, num_steps)\n",
    "    # Keep the largest number of tokens divisible by batch_size,\n",
    "    # leaving one token at the end for the shifted labels\n",
    "    num_tokens = ((len(corpus) - offset - 1) // batch_size) * batch_size\n",
    "    Xs = torch.tensor(corpus[offset: offset + num_tokens])\n",
    "    Ys = torch.tensor(corpus[offset + 1: offset + 1 + num_tokens])\n",
    "    # Each row is one contiguous stretch of the corpus\n",
    "    Xs, Ys = Xs.reshape(batch_size, -1), Ys.reshape(batch_size, -1)\n",
    "    num_batches = Xs.shape[1] // num_steps\n",
    "    for i in range(0, num_steps * num_batches, num_steps):\n",
    "        # Consecutive minibatches take consecutive column blocks\n",
    "        X = Xs[:, i: i + num_steps]\n",
    "        Y = Ys[:, i: i + num_steps]\n",
    "        yield X, Y"
]
},
{
"cell_type": "markdown",
"id": "2d94e66a",
"metadata": {
"origin_pos": 22
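   },
   "source": [
    "Using the same settings, let us [**print features `X` and labels `Y` for each minibatch**] of subsequences read by sequential partitioning.\n",
    "Printing them shows that\n",
    "the subsequences from two adjacent minibatches during iteration are indeed adjacent on the original sequence.\n"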
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "9f482c61",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:06.619796Z",
"iopub.status.busy": "2023-08-18T07:04:06.619524Z",
"iopub.status.idle": "2023-08-18T07:04:06.625853Z",
"shell.execute_reply": "2023-08-18T07:04:06.625069Z"
},
"origin_pos": 23,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X: tensor([[ 0, 1, 2, 3, 4],\n",
" [17, 18, 19, 20, 21]]) \n",
"Y: tensor([[ 1, 2, 3, 4, 5],\n",
" [18, 19, 20, 21, 22]])\n",
"X: tensor([[ 5, 6, 7, 8, 9],\n",
" [22, 23, 24, 25, 26]]) \n",
"Y: tensor([[ 6, 7, 8, 9, 10],\n",
" [23, 24, 25, 26, 27]])\n",
"X: tensor([[10, 11, 12, 13, 14],\n",
" [27, 28, 29, 30, 31]]) \n",
"Y: tensor([[11, 12, 13, 14, 15],\n",
" [28, 29, 30, 31, 32]])\n"
]
}
],
"source": [
"for X, Y in seq_data_iter_sequential(my_seq, batch_size=2, num_steps=5):\n",
" print('X: ', X, '\\nY:', Y)"
]
},
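  {
   "cell_type": "markdown",
   "id": "sequential-partition-arithmetic",
   "metadata": {},
   "source": [
    "The output above is consistent with an offset of $0$. This is a sketch: the value is inferred from\n",
    "the first feature being $0$, and other runs would draw a different offset.\n",
    "In that case the function keeps $\\lfloor (35 - 0 - 1) / 2 \\rfloor \\times 2 = 34$ tokens,\n",
    "reshapes them into $2$ rows of $17$ tokens each, and yields $\\lfloor 17 / 5 \\rfloor = 3$ minibatches:\n",
    "\n",
    "$$\\text{row 1 of } X: 0, 1, \\ldots, 16, \\qquad \\text{row 2 of } X: 17, 18, \\ldots, 33.$$\n",
    "\n",
    "Each minibatch takes the next $5$ columns of both rows, which is why consecutive minibatches\n",
    "continue where the previous one stopped ($0$ to $4$, then $5$ to $9$, then $10$ to $14$)\n",
    "and why the last $2$ columns of each row are dropped.\n"
   ]
  },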
{
"cell_type": "markdown",
"id": "420857d0",
"metadata": {
"origin_pos": 24
   },
   "source": [
    "Now we [**wrap the above two sampling functions into a class**]\n",
    "so that we can use it as a data iterator later.\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "5ca1a50c",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:06.629066Z",
"iopub.status.busy": "2023-08-18T07:04:06.628766Z",
"iopub.status.idle": "2023-08-18T07:04:06.634136Z",
"shell.execute_reply": "2023-08-18T07:04:06.633361Z"
},
"origin_pos": 25,
"tab": [
"pytorch"
]
   },
   "outputs": [],
   "source": [
    "class SeqDataLoader:  #@save\n",
    "    \"\"\"An iterator to load sequence data.\"\"\"\n",
    "    def __init__(self, batch_size, num_steps, use_random_iter, max_tokens):\n",
    "        # Choose between random sampling and sequential partitioning\n",
    "        if use_random_iter:\n",
    "            self.data_iter_fn = d2l.seq_data_iter_random\n",
    "        else:\n",
    "            self.data_iter_fn = d2l.seq_data_iter_sequential\n",
    "        self.corpus, self.vocab = d2l.load_corpus_time_machine(max_tokens)\n",
    "        self.batch_size, self.num_steps = batch_size, num_steps\n",
    "\n",
    "    def __iter__(self):\n",
    "        # Each call creates a fresh generator, so every pass over the data\n",
    "        # draws a new random offset\n",
    "        return self.data_iter_fn(self.corpus, self.batch_size, self.num_steps)"
]
},
{
"cell_type": "markdown",
"id": "b403ba59",
"metadata": {
"origin_pos": 26
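   },
   "source": [
    "[**Last, we define a function `load_data_time_machine`\n",
    "that returns both the data iterator and the vocabulary**],\n",
    "so we can use it similarly to other functions with the `load_data` prefix,\n",
    "such as `d2l.load_data_fashion_mnist`\n",
    "defined in :numref:`sec_fashion_mnist`.\n"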
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "5f93e736",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:04:06.637403Z",
"iopub.status.busy": "2023-08-18T07:04:06.637137Z",
"iopub.status.idle": "2023-08-18T07:04:06.641522Z",
"shell.execute_reply": "2023-08-18T07:04:06.640666Z"
},
"origin_pos": 27,
"tab": [
"pytorch"
]
   },
   "outputs": [],
   "source": [
    "def load_data_time_machine(batch_size, num_steps,  #@save\n",
    "                           use_random_iter=False, max_tokens=10000):\n",
    "    \"\"\"Return the iterator and the vocabulary of the time machine dataset.\"\"\"\n",
    "    data_iter = SeqDataLoader(\n",
    "        batch_size, num_steps, use_random_iter, max_tokens)\n",
    "    return data_iter, data_iter.vocab"
]
},
{
"cell_type": "markdown",
"id": "ba8d1926",
"metadata": {
"origin_pos": 28
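   },
   "source": [
    "## Summary\n",
    "\n",
    "* Language models are key to natural language processing.\n",
    "* $n$-grams provide a practical model for dealing with long sequences by truncating the dependence.\n",
    "* Long sequences suffer from the problem that they occur very rarely or never.\n",
    "* Zipf's law governs the word distribution, not only for unigrams but also for the other $n$-grams.\n",
    "* Language has a lot of structure, but infrequent word combinations occur too rarely to be handled efficiently by Laplace smoothing.\n",
    "* The main choices for reading long sequences are random sampling and sequential partitioning. The latter ensures that the subsequences from two adjacent minibatches during iteration are adjacent on the original sequence.\n",
    "\n",
    "## Exercises\n",
    "\n",
    "1. Suppose there are $100,000$ words in the training dataset. How many word frequencies and multi-word adjacent frequencies does a four-gram need to store?\n",
    "1. How would you model a dialogue?\n",
    "1. The exponents of Zipf's law differ for unigrams, bigrams, and trigrams. Can you find a way to estimate them?\n",
    "1. What other methods can you think of for reading long sequence data?\n",
    "1. Consider the random offset that we use for reading long sequences.\n",
    "    1. Why is it a good idea to have a random offset?\n",
    "    1. Does it really lead to a perfectly uniform distribution over the sequences in the document?\n",
    "    1. What would you have to do to make the distribution more uniform?\n",
    "1. If we want a sequence example to be a complete sentence, what kind of problem does this introduce in minibatch sampling? How can we fix the problem?\n"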
]
},
{
"cell_type": "markdown",
"id": "be419edd",
"metadata": {
"origin_pos": 30,
"tab": [
"pytorch"
]
},
"source": [
"[Discussions](https://discuss.d2l.ai/t/2097)\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"required_libs": []
},
"nbformat": 4,
"nbformat_minor": 5
}