Files
2025-12-16 09:23:53 +08:00

507 lines
15 KiB
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
{
"cells": [
{
"cell_type": "markdown",
"id": "b73c8f7f",
"metadata": {
"origin_pos": 0
},
"source": [
"# 编译器和解释器\n",
":label:`sec_hybridize`\n",
"\n",
"目前为止,本书主要关注的是*命令式编程*imperative programming)。\n",
"命令式编程使用诸如`print`、“`+`”和`if`之类的语句来更改程序的状态。\n",
"考虑下面这段简单的命令式程序:\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2f96dffd",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T06:58:16.571866Z",
"iopub.status.busy": "2023-08-18T06:58:16.571326Z",
"iopub.status.idle": "2023-08-18T06:58:16.580794Z",
"shell.execute_reply": "2023-08-18T06:58:16.579992Z"
},
"origin_pos": 1,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10\n"
]
}
],
"source": [
"def add(a, b):\n",
" return a + b\n",
"\n",
"def fancy_func(a, b, c, d):\n",
" e = add(a, b)\n",
" f = add(c, d)\n",
" g = add(e, f)\n",
" return g\n",
"\n",
"print(fancy_func(1, 2, 3, 4))"
]
},
{
"cell_type": "markdown",
"id": "fa1758bc",
"metadata": {
"origin_pos": 2
},
"source": [
"Python是一种*解释型语言*interpreted language)。因此,当对上面的`fancy_func`函数求值时,它按顺序执行函数体的操作。也就是说,它将通过对`e = add(a, b)`求值,并将结果存储为变量`e`,从而更改程序的状态。接下来的两个语句`f = add(c, d)`和`g = add(e, f)`也将执行类似地操作,即执行加法计算并将结果存储为变量。 :numref:`fig_compute_graph`说明了数据流。\n",
"\n",
"![命令式编程中的数据流](../img/computegraph.svg)\n",
":label:`fig_compute_graph`\n",
"\n",
"尽管命令式编程很方便,但可能效率不高。一方面原因,Python会单独执行这三个函数的调用,而没有考虑`add`函数在`fancy_func`中被重复调用。如果在一个GPU(甚至多个GPU)上执行这些命令,那么Python解释器产生的开销可能会非常大。此外,它需要保存`e`和`f`的变量值,直到`fancy_func`中的所有语句都执行完毕。这是因为程序不知道在执行语句`e = add(a, b)`和`f = add(c, d)`之后,其他部分是否会使用变量`e`和`f`。\n",
"\n",
"## 符号式编程\n",
"\n",
"考虑另一种选择*符号式编程*symbolic programming),即代码通常只在完全定义了过程之后才执行计算。这个策略被多个深度学习框架使用,包括Theano和TensorFlow(后者已经获得了命令式编程的扩展)。一般包括以下步骤:\n",
"\n",
"1. 定义计算流程;\n",
"1. 将流程编译成可执行的程序;\n",
"1. 给定输入,调用编译好的程序执行。\n",
"\n",
"这将允许进行大量的优化。首先,在大多数情况下,我们可以跳过Python解释器。从而消除因为多个更快的GPU与单个CPU上的单个Python线程搭配使用时产生的性能瓶颈。其次,编译器可以将上述代码优化和重写为`print((1 + 2) + (3 + 4))`甚至`print(10)`。因为编译器在将其转换为机器指令之前可以看到完整的代码,所以这种优化是可以实现的。例如,只要某个变量不再需要,编译器就可以释放内存(或者从不分配内存),或者将代码转换为一个完全等价的片段。下面,我们将通过模拟命令式编程来进一步了解符号式编程的概念。\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ccb650c9",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T06:58:16.584271Z",
"iopub.status.busy": "2023-08-18T06:58:16.583746Z",
"iopub.status.idle": "2023-08-18T06:58:16.589230Z",
"shell.execute_reply": "2023-08-18T06:58:16.588464Z"
},
"origin_pos": 3,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"def add(a, b):\n",
" return a + b\n",
"\n",
"def fancy_func(a, b, c, d):\n",
" e = add(a, b)\n",
" f = add(c, d)\n",
" g = add(e, f)\n",
" return g\n",
"print(fancy_func(1, 2, 3, 4))\n",
"10\n"
]
}
],
"source": [
"def add_():\n",
" return '''\n",
"def add(a, b):\n",
" return a + b\n",
"'''\n",
"\n",
"def fancy_func_():\n",
" return '''\n",
"def fancy_func(a, b, c, d):\n",
" e = add(a, b)\n",
" f = add(c, d)\n",
" g = add(e, f)\n",
" return g\n",
"'''\n",
"\n",
"def evoke_():\n",
" return add_() + fancy_func_() + 'print(fancy_func(1, 2, 3, 4))'\n",
"\n",
"prog = evoke_()\n",
"print(prog)\n",
"y = compile(prog, '', 'exec')\n",
"exec(y)"
]
},
{
"cell_type": "markdown",
"id": "2054d959",
"metadata": {
"origin_pos": 4
},
"source": [
"命令式(解释型)编程和符号式编程的区别如下:\n",
"\n",
"* 命令式编程更容易使用。在Python中,命令式编程的大部分代码都是简单易懂的。命令式编程也更容易调试,这是因为无论是获取和打印所有的中间变量值,或者使用Python的内置调试工具都更加简单;\n",
"* 符号式编程运行效率更高,更易于移植。符号式编程更容易在编译期间优化代码,同时还能够将程序移植到与Python无关的格式中,从而允许程序在非Python环境中运行,避免了任何潜在的与Python解释器相关的性能问题。\n",
"\n",
"## 混合式编程\n",
"\n",
"历史上,大部分深度学习框架都在命令式编程与符号式编程之间进行选择。例如,Theano、TensorFlow(灵感来自前者)、Keras和CNTK采用了符号式编程。相反地,Chainer和PyTorch采取了命令式编程。在后来的版本更新中,TensorFlow2.0和Keras增加了命令式编程。\n"
]
},
{
"cell_type": "markdown",
"id": "2bc27dfe",
"metadata": {
"origin_pos": 6,
"tab": [
"pytorch"
]
},
"source": [
"如上所述,PyTorch是基于命令式编程并且使用动态计算图。为了能够利用符号式编程的可移植性和效率,开发人员思考能否将这两种编程模型的优点结合起来,于是就产生了torchscript。torchscript允许用户使用纯命令式编程进行开发和调试,同时能够将大多数程序转换为符号式程序,以便在需要产品级计算性能和部署时使用。\n"
]
},
{
"cell_type": "markdown",
"id": "b88d0031",
"metadata": {
"origin_pos": 9
},
"source": [
"## `Sequential`的混合式编程\n",
"\n",
"要了解混合式编程的工作原理,最简单的方法是考虑具有多层的深层网络。按照惯例,Python解释器需要执行所有层的代码来生成一条指令,然后将该指令转发到CPU或GPU。对于单个的(快速的)计算设备,这不会导致任何重大问题。另一方面,如果我们使用先进的8-GPU服务器,比如AWS P3dn.24xlarge实例,Python将很难让所有的GPU都保持忙碌。在这里,瓶颈是单线程的Python解释器。让我们看看如何通过将`Sequential`替换为`HybridSequential`来解决代码中这个瓶颈。首先,我们定义一个简单的多层感知机。\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "65533e8b",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T06:58:16.592892Z",
"iopub.status.busy": "2023-08-18T06:58:16.592388Z",
"iopub.status.idle": "2023-08-18T06:58:18.663997Z",
"shell.execute_reply": "2023-08-18T06:58:18.662987Z"
},
"origin_pos": 11,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 0.0722, -0.0190]], grad_fn=<AddmmBackward0>)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"from d2l import torch as d2l\n",
"\n",
"\n",
"# 生产网络的工厂模式\n",
"def get_net():\n",
" net = nn.Sequential(nn.Linear(512, 256),\n",
" nn.ReLU(),\n",
" nn.Linear(256, 128),\n",
" nn.ReLU(),\n",
" nn.Linear(128, 2))\n",
" return net\n",
"\n",
"x = torch.randn(size=(1, 512))\n",
"net = get_net()\n",
"net(x)"
]
},
{
"cell_type": "markdown",
"id": "c4c394a8",
"metadata": {
"origin_pos": 15,
"tab": [
"pytorch"
]
},
"source": [
"通过使用`torch.jit.script`函数来转换模型,我们就有能力编译和优化多层感知机中的计算,而模型的计算结果保持不变。\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ac75ec68",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T06:58:18.669810Z",
"iopub.status.busy": "2023-08-18T06:58:18.668614Z",
"iopub.status.idle": "2023-08-18T06:58:18.805275Z",
"shell.execute_reply": "2023-08-18T06:58:18.804217Z"
},
"origin_pos": 19,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 0.0722, -0.0190]], grad_fn=<AddmmBackward0>)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"net = torch.jit.script(net)\n",
"net(x)"
]
},
{
"cell_type": "markdown",
"id": "a6620d01",
"metadata": {
"origin_pos": 23,
"tab": [
"pytorch"
]
},
"source": [
"我们编写与之前相同的代码,再使用`torch.jit.script`简单地转换模型,当完成这些任务后,网络就将得到优化(我们将在下面对性能进行基准测试)。\n"
]
},
{
"cell_type": "markdown",
"id": "49dd9081",
"metadata": {
"origin_pos": 26
},
"source": [
"### 通过混合式编程加速\n",
"\n",
"为了证明通过编译获得了性能改进,我们比较了混合编程前后执行`net(x)`所需的时间。让我们先定义一个度量时间的类,它在本章中在衡量(和改进)模型性能时将非常有用。\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "843b1333",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T06:58:18.809971Z",
"iopub.status.busy": "2023-08-18T06:58:18.809674Z",
"iopub.status.idle": "2023-08-18T06:58:18.815218Z",
"shell.execute_reply": "2023-08-18T06:58:18.814277Z"
},
"origin_pos": 27,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"#@save\n",
"class Benchmark:\n",
" \"\"\"用于测量运行时间\"\"\"\n",
" def __init__(self, description='Done'):\n",
" self.description = description\n",
"\n",
" def __enter__(self):\n",
" self.timer = d2l.Timer()\n",
" return self\n",
"\n",
" def __exit__(self, *args):\n",
" print(f'{self.description}: {self.timer.stop():.4f} sec')"
]
},
{
"cell_type": "markdown",
"id": "f007d153",
"metadata": {
"origin_pos": 29,
"tab": [
"pytorch"
]
},
"source": [
"现在我们可以调用网络两次,一次使用torchscript,一次不使用torchscript。\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "429dcf27",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T06:58:18.819415Z",
"iopub.status.busy": "2023-08-18T06:58:18.819129Z",
"iopub.status.idle": "2023-08-18T06:58:19.098924Z",
"shell.execute_reply": "2023-08-18T06:58:19.097877Z"
},
"origin_pos": 33,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"无torchscript: 0.1361 sec\n",
"有torchscript: 0.1204 sec\n"
]
}
],
"source": [
"net = get_net()\n",
"with Benchmark('无torchscript'):\n",
" for i in range(1000): net(x)\n",
"\n",
"net = torch.jit.script(net)\n",
"with Benchmark('有torchscript'):\n",
" for i in range(1000): net(x)"
]
},
{
"cell_type": "markdown",
"id": "67b1621c",
"metadata": {
"origin_pos": 37,
"tab": [
"pytorch"
]
},
"source": [
"如以上结果所示,在`nn.Sequential`的实例被函数`torch.jit.script`脚本化后,通过使用符号式编程提高了计算性能。\n"
]
},
{
"cell_type": "markdown",
"id": "55f995fe",
"metadata": {
"origin_pos": 40
},
"source": [
"### 序列化\n"
]
},
{
"cell_type": "markdown",
"id": "77ddf279",
"metadata": {
"origin_pos": 42,
"tab": [
"pytorch"
]
},
"source": [
"编译模型的好处之一是我们可以将模型及其参数序列化(保存)到磁盘。这允许这些训练好的模型部署到其他设备上,并且还能方便地使用其他前端编程语言。同时,通常编译模型的代码执行速度也比命令式编程更快。让我们看看`save`的实际功能。\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "5109f057",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T06:58:19.104418Z",
"iopub.status.busy": "2023-08-18T06:58:19.103582Z",
"iopub.status.idle": "2023-08-18T06:58:19.271595Z",
"shell.execute_reply": "2023-08-18T06:58:19.270264Z"
},
"origin_pos": 46,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-r--r-- 1 ci ci 651K Aug 18 06:58 my_mlp\r\n"
]
}
],
"source": [
"net.save('my_mlp')\n",
"!ls -lh my_mlp*"
]
},
{
"cell_type": "markdown",
"id": "7e1bcc1c",
"metadata": {
"origin_pos": 60
},
"source": [
"## 小结\n",
"\n",
"* 命令式编程使得新模型的设计变得容易,因为可以依据控制流编写代码,并拥有相对成熟的Python软件生态。\n",
"* 符号式编程要求我们先定义并且编译程序,然后再执行程序,其好处是提高了计算性能。\n"
]
},
{
"cell_type": "markdown",
"id": "b573ae3c",
"metadata": {
"origin_pos": 62
},
"source": [
"## 练习\n"
]
},
{
"cell_type": "markdown",
"id": "e6c4afe2",
"metadata": {
"origin_pos": 64,
"tab": [
"pytorch"
]
},
"source": [
"1. 回顾前几章中感兴趣的模型,能提高它们的计算性能吗?\n"
]
},
{
"cell_type": "markdown",
"id": "5db5892b",
"metadata": {
"origin_pos": 66,
"tab": [
"pytorch"
]
},
"source": [
"[Discussions](https://discuss.d2l.ai/t/2788)\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"required_libs": []
},
"nbformat": 4,
"nbformat_minor": 5
}