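{ "cells": [ { "cell_type": "markdown", "id": "324028bf", "metadata": { "origin_pos": 0 }, "source": [ "# Network in Network (NiN)\n", ":label:`sec_nin`\n", "\n", "LeNet, AlexNet, and VGG share a common design pattern: extract features that exploit spatial structure through a sequence of convolutional and pooling layers, then process the resulting representations with fully connected layers.\n", "AlexNet and VGG improve on LeNet mainly by widening and deepening these two modules.\n", "Alternatively, one could imagine using fully connected layers earlier in the process. However, doing so could give up the spatial structure of the representation entirely.\n", "*Network in network* (*NiN*) offers a very simple solution: apply a multilayer perceptron to the channels of each pixel separately :cite:`Lin.Chen.Yan.2013`.\n", "\n", "## (**NiN Blocks**)\n", "\n", "Recall that the inputs and outputs of convolutional layers are four-dimensional tensors whose axes correspond to the example, channel, height, and width.\n", "The inputs and outputs of fully connected layers, in contrast, are typically two-dimensional tensors whose axes correspond to the example and the feature.\n", "The idea behind NiN is to apply a fully connected layer at each pixel location (for each height and width).\n", "If we share the weights across all spatial locations, we can view this either as a $1\\times 1$ convolutional layer (as described in :numref:`sec_channels`) or as a fully connected layer that acts independently on each pixel location.\n", "Put another way, each pixel in the spatial dimensions is treated as a single example, and the channel dimension holds its features.\n", "\n", " :numref:`fig_nin` illustrates the main architectural differences between VGG and NiN, and between their blocks.\n", "A NiN block starts with an ordinary convolutional layer, followed by two $1 \\times 1$ convolutional layers. These two $1 \\times 1$ convolutional layers act as per-pixel fully connected layers with ReLU activations.\n", "The convolution window shape of the first layer is typically set by the user.\n", "The window shapes of the subsequent layers are fixed at $1 \\times 1$.\n", "\n", "![Comparing the architectures of VGG and NiN, and of their blocks.](../img/nin.svg)\n", ":width:`600px`\n", ":label:`fig_nin`\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "9b116832", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T07:19:49.388461Z", "iopub.status.busy": "2023-08-18T07:19:49.387694Z", "iopub.status.idle": "2023-08-18T07:19:52.241405Z", "shell.execute_reply": "2023-08-18T07:19:52.240502Z" }, "origin_pos": 2, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "from d2l import torch as d2l\n", "\n", "\n", "def nin_block(in_channels, out_channels, kernel_size, strides, padding):\n", "    return nn.Sequential(\n", "        # An ordinary convolution whose window shape is chosen by the caller\n", "        nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),\n", "        nn.ReLU(),\n", "        # Two 1x1 convolutions act as per-pixel fully connected layers\n", "        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),\n", "        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU())" ] }, { "cell_type": "markdown", "id": "fc11a427", "metadata": { "origin_pos": 5 }, "source": [ "## [**NiN Model**]\n", "\n", "The original NiN network was proposed shortly after AlexNet and clearly drew some inspiration from it.\n", "NiN uses convolutional layers with window shapes of $11\\times 11$, $5\\times 5$, and $3\\times 3$, with the same numbers of output channels as in AlexNet.\n", "Each of these three NiN blocks is followed by a max-pooling layer with a $3\\times 3$ window and a stride of 2.\n", "\n", "One significant difference between NiN and AlexNet is that NiN avoids fully connected layers altogether.\n", "Instead, NiN ends with a NiN block whose number of output channels equals the number of label classes, followed by a *global average pooling layer* that yields a vector of logits. One advantage of this design is that it significantly reduces the number of model parameters, because the large fully connected layers that account for most of AlexNet's parameters are gone. In practice, however, this design sometimes increases the time needed to train the model.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "8ba4ca30", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T07:19:52.246912Z", "iopub.status.busy": "2023-08-18T07:19:52.246237Z", "iopub.status.idle": "2023-08-18T07:19:52.294688Z", "shell.execute_reply": "2023-08-18T07:19:52.293840Z" }, "origin_pos": 7, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "net = nn.Sequential(\n", "    nin_block(1, 96, kernel_size=11, strides=4, padding=0),\n", "    nn.MaxPool2d(3, stride=2),\n", "    nin_block(96, 256, kernel_size=5, strides=1, padding=2),\n", "    nn.MaxPool2d(3, stride=2),\n", "    nin_block(256, 384, kernel_size=3, strides=1, padding=1),\n", "    nn.MaxPool2d(3, stride=2),\n", "    nn.Dropout(0.5),\n", "    # There are 10 label classes\n", "    nin_block(384, 10, kernel_size=3, strides=1, padding=1),\n", "    nn.AdaptiveAvgPool2d((1, 1)),\n", "    # Flatten the 4D output into a 2D output of shape (batch size, 10)\n", "    nn.Flatten())" ] }, { "cell_type": "markdown", "id": "76c9e9ac", "metadata": { "origin_pos": 10 }, "source": [ "We create a data example to [**see the output shape of each block**].\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "2917eb46", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T07:19:52.298946Z", "iopub.status.busy": "2023-08-18T07:19:52.298371Z", "iopub.status.idle": "2023-08-18T07:19:52.325792Z", "shell.execute_reply": "2023-08-18T07:19:52.324591Z" }, "origin_pos": 12, "tab": [ "pytorch" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequential output shape:\t torch.Size([1, 96, 54, 54])\n", "MaxPool2d output shape:\t torch.Size([1, 96, 26, 26])\n", "Sequential output shape:\t torch.Size([1, 256, 26, 26])\n", "MaxPool2d output shape:\t torch.Size([1, 256, 12, 12])\n", "Sequential output 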
shape:\t torch.Size([1, 384, 12, 12])\n", "MaxPool2d output shape:\t torch.Size([1, 384, 5, 5])\n", "Dropout output shape:\t torch.Size([1, 384, 5, 5])\n", "Sequential output shape:\t torch.Size([1, 10, 5, 5])\n", "AdaptiveAvgPool2d output shape:\t torch.Size([1, 10, 1, 1])\n", "Flatten output shape:\t torch.Size([1, 10])\n" ] } ], "source": [ "X = torch.rand(size=(1, 1, 224, 224))\n", "for layer in net:\n", "    X = layer(X)\n", "    print(layer.__class__.__name__, 'output shape:\\t', X.shape)" ] }, { "cell_type": "markdown", "id": "95a4c245", "metadata": { "origin_pos": 15 }, "source": [ "## [**Training**]\n", "\n", "As before, we use Fashion-MNIST (with images resized to $224 \\times 224$) to train the model. Training NiN is similar to training AlexNet and VGG.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "3c362e21", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T07:19:52.329626Z", "iopub.status.busy": "2023-08-18T07:19:52.329083Z", "iopub.status.idle": "2023-08-18T07:24:13.568006Z", "shell.execute_reply": "2023-08-18T07:24:13.566948Z" }, "origin_pos": 16, "tab": [ "pytorch" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loss 0.563, train acc 0.786, test acc 0.790\n", "3087.6 examples/sec on cuda:0\n" ] }, { "data": { "image/svg+xml": [ "
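" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "lr, num_epochs, batch_size = 0.1, 10, 128\n", "train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)\n", "d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())" ] }, { "cell_type": "markdown", "id": "847261ce", "metadata": { "origin_pos": 17 }, "source": [ "## Summary\n", "\n", "* NiN uses blocks consisting of a convolutional layer and multiple $1\\times 1$ convolutional layers. These blocks can be used within a convolutional neural network to add more per-pixel nonlinearity.\n", "* NiN removes the fully connected layers, which are prone to overfitting, and replaces them with global average pooling (i.e., averaging over all spatial locations). The number of channels entering this pooling layer equals the number of desired outputs (e.g., 10 for Fashion-MNIST).\n", "* Removing the fully connected layers reduces overfitting and significantly reduces the number of NiN's parameters.\n", "* NiN's design influenced the design of many subsequent convolutional neural networks.\n", "\n", "## Exercises\n", "\n", "1. Tune the hyperparameters of NiN to improve the classification accuracy.\n", "1. Why are there two $1\\times 1$ convolutional layers in the NiN block? Remove one of them, then observe and analyze the experimental results.\n", "1. Calculate the resource usage of NiN.\n", "    1. What is the number of parameters?\n", "    1. What is the amount of computation?\n", "    1. How much GPU memory is needed during training?\n", "    1. How much GPU memory is needed during prediction?\n", "1. What problems arise if the $384 \\times 5 \\times 5$ representation is reduced directly to a $10 \\times 5 \\times 5$ representation in a single step?\n" ] }, { "cell_type": "markdown", "id": "288616e8", "metadata": { "origin_pos": 19, "tab": [ "pytorch" ] }, "source": [ "[Discussions](https://discuss.d2l.ai/t/1869)\n" ] } ], "metadata": { "language_info": { "name": "python" }, "required_libs": [] }, "nbformat": 4, "nbformat_minor": 5 }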