{
"cells": [
{
"cell_type": "markdown",
"id": "98ee705e",
"metadata": {
"origin_pos": 0
},
"source": [
"# 全卷积网络\n",
":label:`sec_fcn`\n",
"\n",
"如 :numref:`sec_semantic_segmentation`中所介绍的那样,语义分割是对图像中的每个像素分类。\n",
"*全卷积网络*(fully convolutional network,FCN)采用卷积神经网络实现了从图像像素到像素类别的变换 :cite:`Long.Shelhamer.Darrell.2015`。\n",
"与我们之前在图像分类或目标检测部分介绍的卷积神经网络不同,全卷积网络将中间层特征图的高和宽变换回输入图像的尺寸:这是通过在 :numref:`sec_transposed_conv`中引入的*转置卷积*(transposed convolution)实现的。\n",
"因此,输出的类别预测与输入图像在像素级别上具有一一对应关系:通道维的输出即该位置对应像素的类别预测。\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9ba53b71",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:20.570706Z",
"iopub.status.busy": "2023-08-18T07:07:20.570035Z",
"iopub.status.idle": "2023-08-18T07:07:22.638674Z",
"shell.execute_reply": "2023-08-18T07:07:22.637517Z"
},
"origin_pos": 2,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import torch\n",
"import torchvision\n",
"from torch import nn\n",
"from torch.nn import functional as F\n",
"from d2l import torch as d2l"
]
},
{
"cell_type": "markdown",
"id": "a6b35251",
"metadata": {
"origin_pos": 4
},
"source": [
"## 构造模型\n",
"\n",
"下面我们了解一下全卷积网络模型最基本的设计。\n",
"如 :numref:`fig_fcn`所示,全卷积网络先使用卷积神经网络抽取图像特征,然后通过$1\\times 1$卷积层将通道数变换为类别个数,最后在 :numref:`sec_transposed_conv`中通过转置卷积层将特征图的高和宽变换为输入图像的尺寸。\n",
"因此,模型输出与输入图像的高和宽相同,且最终输出通道包含了该空间位置像素的类别预测。\n",
"\n",
"\n",
":label:`fig_fcn`\n",
"\n",
"下面,我们[**使用在ImageNet数据集上预训练的ResNet-18模型来提取图像特征**],并将该网络记为`pretrained_net`。\n",
"ResNet-18模型的最后几层包括全局平均汇聚层和全连接层,然而全卷积网络中不需要它们。\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "37e86099",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:22.642884Z",
"iopub.status.busy": "2023-08-18T07:07:22.642480Z",
"iopub.status.idle": "2023-08-18T07:07:23.298176Z",
"shell.execute_reply": "2023-08-18T07:07:23.297190Z"
},
"origin_pos": 6,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: \"https://download.pytorch.org/models/resnet18-f37072fd.pth\" to /home/ci/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth\n"
]
},
{
"data": {
"application/json": {
"ascii": false,
"bar_format": null,
"colour": null,
"elapsed": 0.00848388671875,
"initial": 0,
"n": 0,
"ncols": null,
"nrows": null,
"postfix": null,
"prefix": "",
"rate": null,
"total": 46830571,
"unit": "B",
"unit_divisor": 1024,
"unit_scale": true
},
"application/vnd.jupyter.widget-view+json": {
"model_id": "5251df7f557143a6adf3d2225231761b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0.00/44.7M [00:00, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"[Sequential(\n",
" (0): BasicBlock(\n",
" (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (downsample): Sequential(\n",
" (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)\n",
" (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): BasicBlock(\n",
" (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" ),\n",
" AdaptiveAvgPool2d(output_size=(1, 1)),\n",
" Linear(in_features=512, out_features=1000, bias=True)]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pretrained_net = torchvision.models.resnet18(pretrained=True)\n",
"list(pretrained_net.children())[-3:]"
]
},
{
"cell_type": "markdown",
"id": "c7a6c6ca",
"metadata": {
"origin_pos": 8
},
"source": [
"接下来,我们[**创建一个全卷积网络`net`**]。\n",
"它复制了ResNet-18中大部分的预训练层,除了最后的全局平均汇聚层和最接近输出的全连接层。\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "92397bcf",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:23.303038Z",
"iopub.status.busy": "2023-08-18T07:07:23.302447Z",
"iopub.status.idle": "2023-08-18T07:07:23.307017Z",
"shell.execute_reply": "2023-08-18T07:07:23.306110Z"
},
"origin_pos": 10,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"net = nn.Sequential(*list(pretrained_net.children())[:-2])"
]
},
{
"cell_type": "markdown",
"id": "41361fe4",
"metadata": {
"origin_pos": 11
},
"source": [
"给定高度为320和宽度为480的输入,`net`的前向传播将输入的高和宽减小至原来的$1/32$,即10和15。\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6cbe7c99",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:23.311746Z",
"iopub.status.busy": "2023-08-18T07:07:23.310972Z",
"iopub.status.idle": "2023-08-18T07:07:23.369499Z",
"shell.execute_reply": "2023-08-18T07:07:23.368494Z"
},
"origin_pos": 13,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([1, 512, 10, 15])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.rand(size=(1, 3, 320, 480))\n",
"net(X).shape"
]
},
{
"cell_type": "markdown",
"id": "b2aa79ff",
"metadata": {
"origin_pos": 15
},
"source": [
"接下来[**使用$1\\times1$卷积层将输出通道数转换为Pascal VOC2012数据集的类数(21类)。**]\n",
"最后需要(**将特征图的高度和宽度增加32倍**),从而将其变回输入图像的高和宽。\n",
"回想一下 :numref:`sec_padding`中卷积层输出形状的计算方法:\n",
"由于$(320-64+16\\times2+32)/32=10$且$(480-64+16\\times2+32)/32=15$,我们构造一个步幅为$32$的转置卷积层,并将卷积核的高和宽设为$64$,填充为$16$。\n",
"我们可以看到如果步幅为$s$,填充为$s/2$(假设$s/2$是整数)且卷积核的高和宽为$2s$,转置卷积核会将输入的高和宽分别放大$s$倍。\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1e32ef24",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:23.374842Z",
"iopub.status.busy": "2023-08-18T07:07:23.373922Z",
"iopub.status.idle": "2023-08-18T07:07:23.405937Z",
"shell.execute_reply": "2023-08-18T07:07:23.404771Z"
},
"origin_pos": 17,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"num_classes = 21\n",
"net.add_module('final_conv', nn.Conv2d(512, num_classes, kernel_size=1))\n",
"net.add_module('transpose_conv', nn.ConvTranspose2d(num_classes, num_classes,\n",
" kernel_size=64, padding=16, stride=32))"
]
},
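{
"cell_type": "markdown",
"id": "added-shape-check",
"metadata": {},
"source": [
"As a quick sanity check (a minimal sketch that reuses the `net` and the random input `X` defined above), the network should now map a $320\\times 480$ input to a 21-channel prediction map with the same height and width:\n",
"\n",
"```python\n",
"# With kernel_size=64, padding=16, stride=32, the transposed convolution\n",
"# maps the 10x15 feature map back to 320x480.\n",
"print(net(X).shape)  # expected: torch.Size([1, 21, 320, 480])\n",
"```\n"
]
},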
{
"cell_type": "markdown",
"id": "fe867380",
"metadata": {
"origin_pos": 19
},
"source": [
"## [**初始化转置卷积层**]\n",
"\n",
"在图像处理中,我们有时需要将图像放大,即*上采样*(upsampling)。\n",
"*双线性插值*(bilinear interpolation)\n",
"是常用的上采样方法之一,它也经常用于初始化转置卷积层。\n",
"\n",
"为了解释双线性插值,假设给定输入图像,我们想要计算上采样输出图像上的每个像素。\n",
"\n",
"1. 将输出图像的坐标$(x,y)$映射到输入图像的坐标$(x',y')$上。\n",
"例如,根据输入与输出的尺寸之比来映射。\n",
"请注意,映射后的$x′$和$y′$是实数。\n",
"2. 在输入图像上找到离坐标$(x',y')$最近的4个像素。\n",
"3. 输出图像在坐标$(x,y)$上的像素依据输入图像上这4个像素及其与$(x',y')$的相对距离来计算。\n",
"\n",
"双线性插值的上采样可以通过转置卷积层实现,内核由以下`bilinear_kernel`函数构造。\n",
"限于篇幅,我们只给出`bilinear_kernel`函数的实现,不讨论算法的原理。\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "81e0e496",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:23.410931Z",
"iopub.status.busy": "2023-08-18T07:07:23.410049Z",
"iopub.status.idle": "2023-08-18T07:07:23.418870Z",
"shell.execute_reply": "2023-08-18T07:07:23.417816Z"
},
"origin_pos": 21,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def bilinear_kernel(in_channels, out_channels, kernel_size):\n",
" factor = (kernel_size + 1) // 2\n",
" if kernel_size % 2 == 1:\n",
" center = factor - 1\n",
" else:\n",
" center = factor - 0.5\n",
" og = (torch.arange(kernel_size).reshape(-1, 1),\n",
" torch.arange(kernel_size).reshape(1, -1))\n",
" filt = (1 - torch.abs(og[0] - center) / factor) * \\\n",
" (1 - torch.abs(og[1] - center) / factor)\n",
" weight = torch.zeros((in_channels, out_channels,\n",
" kernel_size, kernel_size))\n",
" weight[range(in_channels), range(out_channels), :, :] = filt\n",
" return weight"
]
},
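{
"cell_type": "markdown",
"id": "added-bilinear-peek",
"metadata": {},
"source": [
"To build some intuition, the minimal sketch below (it only assumes the `bilinear_kernel` function just defined) prints the $4\\times 4$ kernel: each 1D profile is $[0.25, 0.75, 0.75, 0.25]$, and the 2D kernel is their outer product.\n",
"\n",
"```python\n",
"# Peek at the 4x4 bilinear kernel for a single channel pair\n",
"print(bilinear_kernel(1, 1, 4)[0, 0])\n",
"```\n"
]
},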
{
"cell_type": "markdown",
"id": "6e5b2c78",
"metadata": {
"origin_pos": 23
},
"source": [
"让我们用[**双线性插值的上采样实验**]它由转置卷积层实现。\n",
"我们构造一个将输入的高和宽放大2倍的转置卷积层,并将其卷积核用`bilinear_kernel`函数初始化。\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c181ae97",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:23.423829Z",
"iopub.status.busy": "2023-08-18T07:07:23.422974Z",
"iopub.status.idle": "2023-08-18T07:07:23.431177Z",
"shell.execute_reply": "2023-08-18T07:07:23.430098Z"
},
"origin_pos": 25,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"conv_trans = nn.ConvTranspose2d(3, 3, kernel_size=4, padding=1, stride=2,\n",
" bias=False)\n",
"conv_trans.weight.data.copy_(bilinear_kernel(3, 3, 4));"
]
},
{
"cell_type": "markdown",
"id": "75884a8b",
"metadata": {
"origin_pos": 27
},
"source": [
"读取图像`X`,将上采样的结果记作`Y`。为了打印图像,我们需要调整通道维的位置。\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cdbf1f0e",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:23.435665Z",
"iopub.status.busy": "2023-08-18T07:07:23.435278Z",
"iopub.status.idle": "2023-08-18T07:07:23.521627Z",
"shell.execute_reply": "2023-08-18T07:07:23.520407Z"
},
"origin_pos": 29,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"img = torchvision.transforms.ToTensor()(d2l.Image.open('../img/catdog.jpg'))\n",
"X = img.unsqueeze(0)\n",
"Y = conv_trans(X)\n",
"out_img = Y[0].permute(1, 2, 0).detach()"
]
},
{
"cell_type": "markdown",
"id": "13f8e306",
"metadata": {
"origin_pos": 31
},
"source": [
"可以看到,转置卷积层将图像的高和宽分别放大了2倍。\n",
"除了坐标刻度不同,双线性插值放大的图像和在 :numref:`sec_bbox`中打印出的原图看上去没什么两样。\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "9bafc470",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:23.527421Z",
"iopub.status.busy": "2023-08-18T07:07:23.526512Z",
"iopub.status.idle": "2023-08-18T07:07:24.199909Z",
"shell.execute_reply": "2023-08-18T07:07:24.199093Z"
},
"origin_pos": 33,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"input image shape: torch.Size([561, 728, 3])\n",
"output image shape: torch.Size([1122, 1456, 3])\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"d2l.set_figsize()\n",
"print('input image shape:', img.permute(1, 2, 0).shape)\n",
"d2l.plt.imshow(img.permute(1, 2, 0));\n",
"print('output image shape:', out_img.shape)\n",
"d2l.plt.imshow(out_img);"
]
},
{
"cell_type": "markdown",
"id": "e28a121f",
"metadata": {
"origin_pos": 35
},
"source": [
"全卷积网络[**用双线性插值的上采样初始化转置卷积层。对于$1\\times 1$卷积层,我们使用Xavier初始化参数。**]\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "3607f0c9",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:24.203681Z",
"iopub.status.busy": "2023-08-18T07:07:24.203097Z",
"iopub.status.idle": "2023-08-18T07:07:24.209142Z",
"shell.execute_reply": "2023-08-18T07:07:24.208048Z"
},
"origin_pos": 37,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"W = bilinear_kernel(num_classes, num_classes, 64)\n",
"net.transpose_conv.weight.data.copy_(W);"
]
},
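{
"cell_type": "markdown",
"id": "added-xavier-sketch",
"metadata": {},
"source": [
"The code above only initializes the transposed convolutional layer. A minimal sketch of the Xavier initialization mentioned for the $1\\times 1$ convolutional layer might look as follows (an assumption about where this step belongs; the training utilities used below may handle initialization differently):\n",
"\n",
"```python\n",
"# Hypothetical placement of the Xavier initialization for the 1x1 layer\n",
"nn.init.xavier_uniform_(net.final_conv.weight)\n",
"```\n"
]
},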
{
"cell_type": "markdown",
"id": "ff2a5afd",
"metadata": {
"origin_pos": 39
},
"source": [
"## [**读取数据集**]\n",
"\n",
"我们用 :numref:`sec_semantic_segmentation`中介绍的语义分割读取数据集。\n",
"指定随机裁剪的输出图像的形状为$320\\times 480$:高和宽都可以被$32$整除。\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "ff06cc24",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:24.213905Z",
"iopub.status.busy": "2023-08-18T07:07:24.213186Z",
"iopub.status.idle": "2023-08-18T07:07:55.535066Z",
"shell.execute_reply": "2023-08-18T07:07:55.534048Z"
},
"origin_pos": 40,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"read 1114 examples\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"read 1078 examples\n"
]
}
],
"source": [
"batch_size, crop_size = 32, (320, 480)\n",
"train_iter, test_iter = d2l.load_data_voc(batch_size, crop_size)"
]
},
{
"cell_type": "markdown",
"id": "79c83844",
"metadata": {
"origin_pos": 42
},
"source": [
"## [**训练**]\n",
"\n",
"现在我们可以训练全卷积网络了。\n",
"这里的损失函数和准确率计算与图像分类中的并没有本质上的不同,因为我们使用转置卷积层的通道来预测像素的类别,所以需要在损失计算中指定通道维。\n",
"此外,模型基于每个像素的预测类别是否正确来计算准确率。\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "244b4702",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:07:55.540275Z",
"iopub.status.busy": "2023-08-18T07:07:55.539598Z",
"iopub.status.idle": "2023-08-18T07:08:45.398121Z",
"shell.execute_reply": "2023-08-18T07:08:45.397216Z"
},
"origin_pos": 44,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"loss 0.443, train acc 0.863, test acc 0.848\n",
"254.0 examples/sec on [device(type='cuda', index=0), device(type='cuda', index=1)]\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"def loss(inputs, targets):\n",
" return F.cross_entropy(inputs, targets, reduction='none').mean(1).mean(1)\n",
"\n",
"num_epochs, lr, wd, devices = 5, 0.001, 1e-3, d2l.try_all_gpus()\n",
"trainer = torch.optim.SGD(net.parameters(), lr=lr, weight_decay=wd)\n",
"d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)"
]
},
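{
"cell_type": "markdown",
"id": "added-pixel-accuracy",
"metadata": {},
"source": [
"For reference, the per-pixel accuracy mentioned above amounts to the following minimal sketch (`pixel_accuracy` is a hypothetical helper; `d2l.train_ch13` computes this internally with its own utilities):\n",
"\n",
"```python\n",
"def pixel_accuracy(logits, labels):\n",
"    # logits: (batch, num_classes, h, w); labels: (batch, h, w).\n",
"    # A pixel counts as correct when its most likely class matches the label\n",
"    return (logits.argmax(dim=1) == labels).float().mean()\n",
"```\n"
]
},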
{
"cell_type": "markdown",
"id": "8bcb8df5",
"metadata": {
"origin_pos": 46
},
"source": [
"## [**预测**]\n",
"\n",
"在预测时,我们需要将输入图像在各个通道做标准化,并转成卷积神经网络所需要的四维输入格式。\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "bdb803a3",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:08:45.402153Z",
"iopub.status.busy": "2023-08-18T07:08:45.401873Z",
"iopub.status.idle": "2023-08-18T07:08:45.406358Z",
"shell.execute_reply": "2023-08-18T07:08:45.405611Z"
},
"origin_pos": 48,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def predict(img):\n",
" X = test_iter.dataset.normalize_image(img).unsqueeze(0)\n",
" pred = net(X.to(devices[0])).argmax(dim=1)\n",
" return pred.reshape(pred.shape[1], pred.shape[2])"
]
},
{
"cell_type": "markdown",
"id": "54d2aa8a",
"metadata": {
"origin_pos": 50
},
"source": [
"为了[**可视化预测的类别**]给每个像素,我们将预测类别映射回它们在数据集中的标注颜色。\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "27e3aa15",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:08:45.409772Z",
"iopub.status.busy": "2023-08-18T07:08:45.409264Z",
"iopub.status.idle": "2023-08-18T07:08:45.413358Z",
"shell.execute_reply": "2023-08-18T07:08:45.412563Z"
},
"origin_pos": 52,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def label2image(pred):\n",
" colormap = torch.tensor(d2l.VOC_COLORMAP, device=devices[0])\n",
" X = pred.long()\n",
" return colormap[X, :]"
]
},
{
"cell_type": "markdown",
"id": "e3a9d039",
"metadata": {
"origin_pos": 54
},
"source": [
"测试数据集中的图像大小和形状各异。\n",
"由于模型使用了步幅为32的转置卷积层,因此当输入图像的高或宽无法被32整除时,转置卷积层输出的高或宽会与输入图像的尺寸有偏差。\n",
"为了解决这个问题,我们可以在图像中截取多块高和宽为32的整数倍的矩形区域,并分别对这些区域中的像素做前向传播。\n",
"请注意,这些区域的并集需要完整覆盖输入图像。\n",
"当一个像素被多个区域所覆盖时,它在不同区域前向传播中转置卷积层输出的平均值可以作为`softmax`运算的输入,从而预测类别。\n",
"\n",
"为简单起见,我们只读取几张较大的测试图像,并从图像的左上角开始截取形状为$320\\times480$的区域用于预测。\n",
"对于这些测试图像,我们逐一打印它们截取的区域,再打印预测结果,最后打印标注的类别。\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "f0f8cff8",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T07:08:45.416847Z",
"iopub.status.busy": "2023-08-18T07:08:45.416234Z",
"iopub.status.idle": "2023-08-18T07:09:10.704851Z",
"shell.execute_reply": "2023-08-18T07:09:10.704050Z"
},
"origin_pos": 56,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"voc_dir = d2l.download_extract('voc2012', 'VOCdevkit/VOC2012')\n",
"test_images, test_labels = d2l.read_voc_images(voc_dir, False)\n",
"n, imgs = 4, []\n",
"for i in range(n):\n",
" crop_rect = (0, 0, 320, 480)\n",
" X = torchvision.transforms.functional.crop(test_images[i], *crop_rect)\n",
" pred = label2image(predict(X))\n",
" imgs += [X.permute(1,2,0), pred.cpu(),\n",
" torchvision.transforms.functional.crop(\n",
" test_labels[i], *crop_rect).permute(1,2,0)]\n",
"d2l.show_images(imgs[::3] + imgs[1::3] + imgs[2::3], 3, n, scale=2);"
]
},
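{
"cell_type": "markdown",
"id": "added-tiled-predict",
"metadata": {},
"source": [
"Below is a minimal sketch of the tiling-and-averaging strategy described above. `predict_tiled` is a hypothetical helper (it assumes the `net`, `num_classes`, `devices`, and `test_iter` from this section, and images at least $320\\times480$) that averages the transposed convolution outputs (the logits) over overlapping windows before taking the per-pixel argmax:\n",
"\n",
"```python\n",
"def predict_tiled(img, crop_h=320, crop_w=480):\n",
"    # img: a (3, H, W) image tensor. Cover the image with crop-sized\n",
"    # windows whose union is the whole image, averaging logits on overlaps\n",
"    X = test_iter.dataset.normalize_image(img).unsqueeze(0).to(devices[0])\n",
"    _, _, H, W = X.shape\n",
"    logits = torch.zeros((1, num_classes, H, W), device=devices[0])\n",
"    counts = torch.zeros((1, 1, H, W), device=devices[0])\n",
"    # Window origins; the last window is shifted back to stay inside the image\n",
"    tops = sorted({min(t, H - crop_h) for t in range(0, H, crop_h)})\n",
"    lefts = sorted({min(l, W - crop_w) for l in range(0, W, crop_w)})\n",
"    for t in tops:\n",
"        for l in lefts:\n",
"            window = X[:, :, t:t + crop_h, l:l + crop_w]\n",
"            logits[:, :, t:t + crop_h, l:l + crop_w] += net(window)\n",
"            counts[:, :, t:t + crop_h, l:l + crop_w] += 1\n",
"    return (logits / counts).argmax(dim=1)[0]\n",
"```\n"
]
},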
{
"cell_type": "markdown",
"id": "b82349b4",
"metadata": {
"origin_pos": 58
},
"source": [
"## 小结\n",
"\n",
"* 全卷积网络先使用卷积神经网络抽取图像特征,然后通过$1\\times 1$卷积层将通道数变换为类别个数,最后通过转置卷积层将特征图的高和宽变换为输入图像的尺寸。\n",
"* 在全卷积网络中,我们可以将转置卷积层初始化为双线性插值的上采样。\n",
"\n",
"## 练习\n",
"\n",
"1. 如果将转置卷积层改用Xavier随机初始化,结果有什么变化?\n",
"1. 调节超参数,能进一步提升模型的精度吗?\n",
"1. 预测测试图像中所有像素的类别。\n",
"1. 最初的全卷积网络的论文中 :cite:`Long.Shelhamer.Darrell.2015`还使用了某些卷积神经网络中间层的输出。试着实现这个想法。\n"
]
},
{
"cell_type": "markdown",
"id": "314d9c7f",
"metadata": {
"origin_pos": 60,
"tab": [
"pytorch"
]
},
"source": [
"[Discussions](https://discuss.d2l.ai/t/3297)\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"required_libs": [],
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {
"1a2196f22729431f83d039862392acc0": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "2.0.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_allow_html": false,
"layout": "IPY_MODEL_d19858e45b944e8c8f9059db20975b35",
"max": 46830571.0,
"min": 0.0,
"orientation": "horizontal",
"style": "IPY_MODEL_8734c89a8ec0418fb79ebae7cb29e024",
"tabbable": null,
"tooltip": null,
"value": 46830571.0
}
},
"5251df7f557143a6adf3d2225231761b": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "2.0.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_7cb4d17593b14ecbb1faa48dd9cfa341",
"IPY_MODEL_1a2196f22729431f83d039862392acc0",
"IPY_MODEL_a3afec78bd61439f8ac2cb8cd21ec2a7"
],
"layout": "IPY_MODEL_eadd40b78048487882acca1c18a48abb",
"tabbable": null,
"tooltip": null
}
},
"56a96b11f6874202b0c125339b47a25f": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HTMLStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "HTMLStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "StyleView",
"background": null,
"description_width": "",
"font_size": null,
"text_color": null
}
},
"7cb4d17593b14ecbb1faa48dd9cfa341": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "2.0.0",
"_view_name": "HTMLView",
"description": "",
"description_allow_html": false,
"layout": "IPY_MODEL_9741cfd24c0c4bdb9f733551ad298ce4",
"placeholder": "",
"style": "IPY_MODEL_7e79ccfb796b4045abce79262f800cda",
"tabbable": null,
"tooltip": null,
"value": "100%"
}
},
"7e79ccfb796b4045abce79262f800cda": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HTMLStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "HTMLStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "StyleView",
"background": null,
"description_width": "",
"font_size": null,
"text_color": null
}
},
"8734c89a8ec0418fb79ebae7cb29e024": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"886e463378e84ac29c4788e4908e8a70": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "2.0.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border_bottom": null,
"border_left": null,
"border_right": null,
"border_top": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"9741cfd24c0c4bdb9f733551ad298ce4": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "2.0.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border_bottom": null,
"border_left": null,
"border_right": null,
"border_top": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"a3afec78bd61439f8ac2cb8cd21ec2a7": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "2.0.0",
"_view_name": "HTMLView",
"description": "",
"description_allow_html": false,
"layout": "IPY_MODEL_886e463378e84ac29c4788e4908e8a70",
"placeholder": "",
"style": "IPY_MODEL_56a96b11f6874202b0c125339b47a25f",
"tabbable": null,
"tooltip": null,
"value": " 44.7M/44.7M [00:00<00:00, 163MB/s]"
}
},
"d19858e45b944e8c8f9059db20975b35": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "2.0.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border_bottom": null,
"border_left": null,
"border_right": null,
"border_top": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"eadd40b78048487882acca1c18a48abb": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "2.0.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border_bottom": null,
"border_left": null,
"border_right": null,
"border_top": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
}
},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}