{ "cells": [ { "cell_type": "markdown", "id": "273bffe9", "metadata": { "origin_pos": 0 }, "source": [ "# 风格迁移\n", "\n", "摄影爱好者也许接触过滤波器。它能改变照片的颜色风格，从而使风景照更加锐利或者令人像更加美白。但一个滤波器通常只能改变照片的某个方面。如果要照片达到理想中的风格，可能需要尝试大量不同的组合。这个过程的复杂程度不亚于模型调参。\n", "\n", "本节将介绍如何使用卷积神经网络，自动将一个图像中的风格应用在另一图像之上，即*风格迁移*（style transfer） :cite:`Gatys.Ecker.Bethge.2016`。\n", "这里我们需要两张输入图像：一张是*内容图像*，另一张是*风格图像*。\n", "我们将使用神经网络修改内容图像，使其在风格上接近风格图像。\n", "例如， :numref:`fig_style_transfer`中的内容图像为本书作者在西雅图郊区的雷尼尔山国家公园拍摄的风景照，而风格图像则是一幅主题为秋天橡树的油画。\n", "最终输出的合成图像应用了风格图像的油画笔触让整体颜色更加鲜艳，同时保留了内容图像中物体主体的形状。\n", "\n", "![输入内容图像和风格图像，输出风格迁移后的合成图像](../img/style-transfer.svg)\n", ":label:`fig_style_transfer`\n", "\n", "## 方法\n", "\n", " :numref:`fig_style_transfer_model`用简单的例子阐述了基于卷积神经网络的风格迁移方法。\n", "首先，我们初始化合成图像，例如将其初始化为内容图像。\n", "该合成图像是风格迁移过程中唯一需要更新的变量，即风格迁移所需迭代的模型参数。\n", "然后，我们选择一个预训练的卷积神经网络来抽取图像的特征，其中的模型参数在训练中无须更新。\n", "这个深度卷积神经网络凭借多个层逐级抽取图像的特征，我们可以选择其中某些层的输出作为内容特征或风格特征。\n", "以 :numref:`fig_style_transfer_model`为例，这里选取的预训练的神经网络含有3个卷积层，其中第二层输出内容特征，第一层和第三层输出风格特征。\n", "\n", "![基于卷积神经网络的风格迁移。实线箭头和虚线箭头分别表示前向传播和反向传播](../img/neural-style.svg)\n", ":label:`fig_style_transfer_model`\n", "\n", "接下来，我们通过前向传播（实线箭头方向）计算风格迁移的损失函数，并通过反向传播（虚线箭头方向）迭代模型参数，即不断更新合成图像。\n", "风格迁移常用的损失函数由3部分组成：\n", "\n", "1. *内容损失*使合成图像与内容图像在内容特征上接近；\n", "1. *风格损失*使合成图像与风格图像在风格特征上接近；\n", "1. *全变分损失*则有助于减少合成图像中的噪点。\n", "\n", "最后，当模型训练结束时，我们输出风格迁移的模型参数，即得到最终的合成图像。\n", "\n", "在下面，我们将通过代码来进一步了解风格迁移的技术细节。\n", "\n", "## [**阅读内容和风格图像**]\n", "\n", "首先，我们读取内容和风格图像。\n", "从打印出的图像坐标轴可以看出，它们的尺寸并不一样。\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "a0d90f51", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T07:23:26.021505Z", "iopub.status.busy": "2023-08-18T07:23:26.020759Z", "iopub.status.idle": "2023-08-18T07:23:29.597245Z", "shell.execute_reply": "2023-08-18T07:23:29.595990Z" }, "origin_pos": 2, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n" ], "text/plain": [ "

" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "import torch\n", "import torchvision\n", "from torch import nn\n", "from d2l import torch as d2l\n", "\n", "d2l.set_figsize()\n", "content_img = d2l.Image.open('../img/rainier.jpg')\n", "d2l.plt.imshow(content_img);" ] }, { "cell_type": "code", "execution_count": 2, "id": "ec590a65", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T07:23:29.601550Z", "iopub.status.busy": "2023-08-18T07:23:29.600514Z", "iopub.status.idle": "2023-08-18T07:23:30.096132Z", "shell.execute_reply": "2023-08-18T07:23:30.095315Z" }, "origin_pos": 5, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n" ], "text/plain": [ "

" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "style_img = d2l.Image.open('../img/autumn-oak.jpg')\n", "d2l.plt.imshow(style_img);" ] }, { "cell_type": "markdown", "id": "ddc886a6", "metadata": { "origin_pos": 6 }, "source": [ "## [**预处理和后处理**]\n", "\n", "下面，定义图像的预处理函数和后处理函数。\n", "预处理函数`preprocess`对输入图像在RGB三个通道分别做标准化，并将结果变换成卷积神经网络接受的输入格式。\n", "后处理函数`postprocess`则将输出图像中的像素值还原回标准化之前的值。\n", "由于图像打印函数要求每个像素的浮点数值在0～1之间，我们对小于0和大于1的值分别取0和1。\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "6f351192", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T07:23:30.103272Z", "iopub.status.busy": "2023-08-18T07:23:30.102388Z", "iopub.status.idle": "2023-08-18T07:23:30.112076Z", "shell.execute_reply": "2023-08-18T07:23:30.111052Z" }, "origin_pos": 8, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "rgb_mean = torch.tensor([0.485, 0.456, 0.406])\n", "rgb_std = torch.tensor([0.229, 0.224, 0.225])\n", "\n", "def preprocess(img, image_shape):\n", " transforms = torchvision.transforms.Compose([\n", " torchvision.transforms.Resize(image_shape),\n", " torchvision.transforms.ToTensor(),\n", " torchvision.transforms.Normalize(mean=rgb_mean, std=rgb_std)])\n", " return transforms(img).unsqueeze(0)\n", "\n", "def postprocess(img):\n", " img = img[0].to(rgb_std.device)\n", " img = torch.clamp(img.permute(1, 2, 0) * rgb_std + rgb_mean, 0, 1)\n", " return torchvision.transforms.ToPILImage()(img.permute(2, 0, 1))" ] }, { "cell_type": "markdown", "id": "54e9f5d8", "metadata": { "origin_pos": 10 }, "source": [ "## [**抽取图像特征**]\n", "\n", "我们使用基于ImageNet数据集预训练的VGG-19模型来抽取图像特征 :cite:`Gatys.Ecker.Bethge.2016`。\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "f562ee81", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T07:23:30.117700Z", "iopub.status.busy": "2023-08-18T07:23:30.116834Z", "iopub.status.idle": "2023-08-18T07:23:42.885822Z", "shell.execute_reply": "2023-08-18T07:23:42.884582Z" }, "origin_pos": 12, "tab": [ "pytorch" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading: \"https://download.pytorch.org/models/vgg19-dcbb9e9d.pth\" to /home/ci/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth\n" ] }, { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.007704734802246094, "initial": 0, "n": 0, "ncols": null, "nrows": null, "postfix": null, "prefix": "", "rate": null, "total": 574673361, "unit": "B", "unit_divisor": 1024, "unit_scale": true }, "application/vnd.jupyter.widget-view+json": { "model_id": "0a21ebb04f6e4afe9df09a7d7c6a0fe0", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0.00/548M [00:00\n", "\n", "\n" ], "text/plain": [ "

" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "device, image_shape = d2l.try_gpu(), (300, 450)\n", "net = net.to(device)\n", "content_X, contents_Y = get_contents(image_shape, device)\n", "_, styles_Y = get_styles(image_shape, device)\n", "output = train(content_X, contents_Y, styles_Y, device, 0.3, 500, 50)" ] }, { "cell_type": "markdown", "id": "7f5c2480", "metadata": { "origin_pos": 55 }, "source": [ "我们可以看到，合成图像保留了内容图像的风景和物体，并同时迁移了风格图像的色彩。例如，合成图像具有与风格图像中一样的色彩块，其中一些甚至具有画笔笔触的细微纹理。\n", "\n", "## 小结\n", "\n", "* 风格迁移常用的损失函数由3部分组成：（1）内容损失使合成图像与内容图像在内容特征上接近；（2）风格损失令合成图像与风格图像在风格特征上接近；（3）全变分损失则有助于减少合成图像中的噪点。\n", "* 我们可以通过预训练的卷积神经网络来抽取图像的特征，并通过最小化损失函数来不断更新合成图像来作为模型参数。\n", "* 我们使用格拉姆矩阵表达风格层输出的风格。\n", "\n", "## 练习\n", "\n", "1. 选择不同的内容和风格层，输出有什么变化？\n", "1. 调整损失函数中的权重超参数。输出是否保留更多内容或减少更多噪点？\n", "1. 替换实验中的内容图像和风格图像，能创作出更有趣的合成图像吗？\n", "1. 我们可以对文本使用风格迁移吗？提示:可以参阅调查报告 :cite:`Hu.Lee.Aggarwal.ea.2020`。\n" ] }, { "cell_type": "markdown", "id": "8888edcd", "metadata": { "origin_pos": 57, "tab": [ "pytorch" ] }, "source": [ "[Discussions](https://discuss.d2l.ai/t/3300)\n" ] } ], "metadata": { "language_info": { "name": "python" }, "required_libs": [], "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "0a21ebb04f6e4afe9df09a7d7c6a0fe0": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_3386849c303b40d18895bb91db97e325", "IPY_MODEL_155d363cdf40442e8faf86c2f0def49d", "IPY_MODEL_6840bc285801445eafe45e9cfc4a3216" ], "layout": "IPY_MODEL_96dfbfe851a544c09ae13797ba4d4198", "tabbable": null, "tooltip": null } }, "155d363cdf40442e8faf86c2f0def49d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_6e8fb617074d452eb01dbfe715d3827f", "max": 574673361.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_a24b158989e04d0e90ba67a8b670ce52", "tabbable": null, "tooltip": null, "value": 574673361.0 } }, "2d45b8677d764d7f9e04f2ab4f38d40b": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "3386849c303b40d18895bb91db97e325": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_2d45b8677d764d7f9e04f2ab4f38d40b", "placeholder": "", "style": "IPY_MODEL_bc6bcbf06ef44f2585d8d82ac182ea56", "tabbable": null, "tooltip": null, "value": "100%" } }, "5223c9213fef443497e360398c21149f": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null } }, "57464fe9afd448b0a23eee081a9e085d": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "6840bc285801445eafe45e9cfc4a3216": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_57464fe9afd448b0a23eee081a9e085d", "placeholder": "", "style": "IPY_MODEL_5223c9213fef443497e360398c21149f", "tabbable": null, "tooltip": null, "value": " 548M/548M [00:10<00:00, 69.9MB/s]" } }, "6e8fb617074d452eb01dbfe715d3827f": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "96dfbfe851a544c09ae13797ba4d4198": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "a24b158989e04d0e90ba67a8b670ce52": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "bc6bcbf06ef44f2585d8d82ac182ea56": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null } } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }