chore: update project documentation, dependencies, and training scripts
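- Update requirements.txt: add opencv-python-headless and document installation with uv
- Fix line endings in the CSV files (CRLF → LF)
- Update TASK_PROGRESS.md to record the parallel training implementation and WSL support
- Clean up train_improved.py formatting, removing redundant blank lines and comments
- Update the character encoding of the coursework requirement documents
- Add new TensorBoard log files and trained models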
+4 -4
@@ -1,5 +1,5 @@
 Complete an individual reinforcement learning coursework report: implement a PPO (Proximal Policy Optimization) algorithm from scratch in Python so that an agent completes the racing task in the CarRacing-v3 environment, and submit a technical report of no more than 3000 words that systematically explains your method and results. Specifically, the report must introduce the reinforcement learning background of the task; define the state space, action space, and reward mechanism; explain PPO's objective function, clipping mechanism, and advantage estimation method; describe the policy and value network architectures, the training procedure, the hyperparameter settings, and the problems met during implementation together with their solutions; present training and test results in charts and analyze the model's performance and trends; and briefly compare stability and sample efficiency against a baseline such as Stable-Baselines3. You must also submit a zip file containing all source code and the trained model, plus a separate PDF report, with file names and submission format following the requirements. The implementation must not directly use RL-specific libraries such as Stable-Baselines, although TensorBoard may be used to record experimental results.
 
 This PDF asks for an individual reinforcement learning project report: choose an Atari game yourself, implement and train a deep reinforcement learning algorithm of your choice to reach competitive performance, then submit a technical report of no more than 3000 words and a zip file containing all source code and the trained model. The report must describe the chosen game and its challenges; survey and summarize the state of deep reinforcement learning, especially its application to Atari games; compare the algorithms you considered and explain why you chose the final method; describe the algorithm's principles and implementation in detail; evaluate the agent's performance and state the chosen benchmarks and evaluation metrics; analyze why the algorithm performs well or poorly on this game; and present results in charts with clearly labeled axes and legends. The assignment explicitly forbids implementing the algorithm directly with RL-specific libraries such as Stable-Baselines, although they may be used as benchmarks. Code quality, result analysis, report structure, use of charts, and citation practice are all graded, and the PDF and zip file must be named and submitted in the specified format.
 
 Complete an individual machine learning coursework assignment: using a health insurance dataset, build and improve a multi-class model that predicts an applicant's premium risk level (Low / Standard / High). First complete the Jupyter Notebook part: data cleaning and preprocessing; identifying and removing data-leakage features; building a baseline model; comparing a random forest with one boosting model; tuning hyperparameters with an advanced optimization method; completing the personalized improvement assigned by the last digit of your student ID plus at least one optional improvement; an unsupervised exploration with K-Means and GMM; and finally selecting the final model based on validation results and exporting the hidden-test CSV in the required format. Also submit a Theory and Reflection PDF of no more than roughly 1200 words that combines theory and experiment around bagging vs boosting, hyperparameter optimization, K-Means vs GMM, reflection on the personalized improvement, and an AI usage statement; every conclusion must be tied to the tables, figures, and metrics in your own notebook. Finally, submit the notebook, PDF, CSV, and any necessary supplementary code as required.
+12501 -12501
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -1,8 +1,8 @@
 k,inertia,silhouette_x,log_likelihood,bic,aic,silhouette_y
 2,1092962.434364126,0.174016661115075,181335.84491703784,-359250.54291550705,-362061.6898340757,0.41420390111182703
 3,1018586.5047121042,0.17317021187208304,554291.2303605897,-1103445.131905755,-1107666.4607211794,0.2977020104302583
 4,953249.4382030136,0.18080059886795355,972834.1094461675,-1938814.7081800548,-1944446.218892335,0.3964327255424141
 5,889284.892342685,0.1964251564081267,1002913.0930748597,-1997256.4935405836,-2004298.1861497194,0.40146893512413845
 6,818950.9117652641,0.17683056672008368,1180025.734163945,-2349765.5938218986,-2358217.46832789,0.24683353848428613
 7,777658.2185885893,0.197056012688701,1203191.531501821,-2394381.006600795,-2404243.063003642,0.3109553553475885
 8,691940.8330833976,0.20149802939267383,1261969.3739466753,-2510220.5095936474,-2521492.7478933507,0.17264064800570944
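The GMM columns of this sweep can be sanity-checked against the standard definitions AIC = 2p − 2 ln L and BIC = p ln n − 2 ln L, which are what scikit-learn's `GaussianMixture.aic` and `GaussianMixture.bic` compute. The sketch below assumes full covariance matrices and d = 16 input features; neither is stated in this commit, but with those assumptions the parameter count p = k·d + k·d(d+1)/2 + (k − 1) reproduces every `aic` value, and solving the BIC formula for ln n gives the same sample size for every k. It also treats `log_likelihood` as the total (not per-sample) log-likelihood. The `_x`/`_y` silhouette suffixes look like pandas merge suffixes (presumably K-Means vs GMM), which is likewise an inference.

```python
import math

# Rows copied from the sweep above: (k, log_likelihood, bic, aic).
ROWS = [
    (2, 181335.84491703784, -359250.54291550705, -362061.6898340757),
    (3, 554291.2303605897, -1103445.131905755, -1107666.4607211794),
    (4, 972834.1094461675, -1938814.7081800548, -1944446.218892335),
    (5, 1002913.0930748597, -1997256.4935405836, -2004298.1861497194),
    (6, 1180025.734163945, -2349765.5938218986, -2358217.46832789),
    (7, 1203191.531501821, -2394381.006600795, -2404243.063003642),
    (8, 1261969.3739466753, -2510220.5095936474, -2521492.7478933507),
]

D = 16  # ASSUMPTION: feature count inferred from the numbers, not stated in the commit


def n_params_full_gmm(k: int, d: int) -> int:
    """Free parameters of a full-covariance GMM: means + covariances + mixing weights."""
    return k * d + k * d * (d + 1) // 2 + (k - 1)


def check_row(k, loglik, bic, aic, d=D):
    p = n_params_full_gmm(k, d)
    aic_error = (2 * p - 2 * loglik) - aic  # AIC = 2p - 2 ln L
    ln_n = (bic + 2 * loglik) / p           # solve BIC = p ln n - 2 ln L for ln n
    return p, aic_error, ln_n


for row in ROWS:
    p, aic_error, ln_n = check_row(*row)
    print(f"k={row[0]}  p={p:5d}  AIC error={aic_error:+.1e}  implied n~{math.exp(ln_n):,.0f}")
```

If the assumptions hold, every row's AIC error is at floating-point noise level and all seven rows imply the same n (roughly 74,000 samples), which suggests the GMMs were fitted on one fixed matrix.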
@@ -1,2 +1,2 @@
 model,train_accuracy,val_accuracy,train_f1_macro,val_f1_macro,val_f1_High,val_f1_Low,val_f1_Standard
 Baseline_LR,0.7595294117647059,0.7337904761904762,0.7493991157707756,0.7234383324236036,0.7663239074550129,0.6487372909150542,0.7552537989007436
@@ -1,7 +1,7 @@
 model,train_accuracy,val_accuracy,train_f1_macro,val_f1_macro,val_f1_High,val_f1_Low,val_f1_Standard,train_time
 Baseline_LR,0.7593680672268908,0.7341714285714286,0.7492574544185482,0.7237629331592531,0.7665209565440987,0.6489501312335958,0.7558177117000646,
 RandomForest,1.0,0.7877333333333333,1.0,0.770789728543472,0.7874554916461244,0.7095334685598377,0.8153802254244543,57.91048526763916
 XGBoost,0.8519529411764706,0.8371047619047619,0.8297116592669606,0.8143842728003406,0.8904623073719283,0.6944039941751612,0.8582865168539325,67.63970804214478
 XGBoost_Tuned,0.9767663865546219,0.8700190476190476,0.9739400525375727,0.8519502714571496,0.9084439578486383,0.7620280474649407,0.8853788090578697,142.65462470054626
 XGB_CatA_MissingHandling,0.9772638655462185,0.870552380952381,0.9745439553742655,0.8529411889528661,0.910207423580786,0.763542562338779,0.885073580939033,
 Ensemble_SoftVoting,0.9972436974789916,0.8675047619047619,0.9969472283391928,0.851001101708816,0.9024125779343996,0.7684120902511707,0.8821786369408776,
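In both metric tables, `val_f1_macro` equals the unweighted mean of the three per-class F1 columns (the "macro" average, as in scikit-learn's `f1_score(..., average="macro")`), which makes each row easy to sanity-check. The sketch below recomputes it from the per-class values copied out of the table and also prints the train-minus-validation macro-F1 gap, which is only a rough overfitting signal.

```python
import csv
import io

# Columns copied from the model-comparison CSV above (accuracy and train_time omitted).
TABLE = """model,train_f1_macro,val_f1_macro,val_f1_High,val_f1_Low,val_f1_Standard
Baseline_LR,0.7492574544185482,0.7237629331592531,0.7665209565440987,0.6489501312335958,0.7558177117000646
RandomForest,1.0,0.770789728543472,0.7874554916461244,0.7095334685598377,0.8153802254244543
XGBoost,0.8297116592669606,0.8143842728003406,0.8904623073719283,0.6944039941751612,0.8582865168539325
XGBoost_Tuned,0.9739400525375727,0.8519502714571496,0.9084439578486383,0.7620280474649407,0.8853788090578697
XGB_CatA_MissingHandling,0.9745439553742655,0.8529411889528661,0.910207423580786,0.763542562338779,0.885073580939033
Ensemble_SoftVoting,0.9969472283391928,0.851001101708816,0.9024125779343996,0.7684120902511707,0.8821786369408776
"""
CLASSES = ("val_f1_High", "val_f1_Low", "val_f1_Standard")


def check_rows(text: str) -> dict:
    """Recompute macro-F1 from per-class F1 and the train/validation macro-F1 gap."""
    results = {}
    for row in csv.DictReader(io.StringIO(text)):
        per_class = [float(row[c]) for c in CLASSES]
        macro = sum(per_class) / len(per_class)  # unweighted mean = macro average
        results[row["model"]] = {
            "macro_error": macro - float(row["val_f1_macro"]),
            "train_val_gap": float(row["train_f1_macro"]) - float(row["val_f1_macro"]),
        }
    return results


for model, r in check_rows(TABLE).items():
    print(f"{model:26s} macro error {r['macro_error']:+.1e}  train-val gap {r['train_val_gap']:.3f}")
```

On these rows the recomputed macro-F1 matches to floating-point precision. The gap shows RandomForest (about 0.23) and the tuned and ensemble models (about 0.12 to 0.15) fitting the training split far more tightly than untuned XGBoost (about 0.015).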
+38 -17
@@ -26,6 +26,9 @@
 | ✅ Environment preprocessing | Grayscale + Resize(84×84) + frame stacking (4 frames) wrapper | [src/utils.py](src/utils.py) |
 | ✅ Evaluation script | Rendered test runs + multi-episode average score evaluation | [src/evaluate.py](src/evaluate.py) |
 | ✅ Training entry point | Main training loop, TensorBoard logging, model saving | [train.py](train.py) |
+| ✅ Parallel training | Multi-environment parallel collection + WSL support | [train_parallel.py](train_parallel.py) |
+| ✅ WSL scripts | Environment setup + launch scripts | [setup_wsl.sh](setup_wsl.sh), [run_wsl.sh](run_wsl.sh), [start_wsl_training.bat](start_wsl_training.bat) |
+| ✅ Test script | Quick check of the parallel environments and networks | [test_parallel.py](test_parallel.py) |
 
 **Core algorithm implementation notes**:
 - Policy network: 3-layer CNN + FC(512) → μ, σ (Gaussian policy, tanh activation)
@@ -60,36 +63,54 @@
 │ ├── trainer.py # PPO update logic
 │ ├── utils.py # environment preprocessing wrappers
 │ └── evaluate.py # evaluation script
-├── train.py # main training entry point
+├── train.py # single-threaded training entry point
+├── train_parallel.py # multi-environment parallel training (recommended)
+├── setup_wsl.sh # WSL environment setup
+├── run_wsl.sh # WSL training launch script
+├── start_wsl_training.bat # one-click WSL training launcher for Windows
+├── test_parallel.py # parallel training test
 ├── requirements.txt
 ├── README.md
-└── TASK_PROGRESS.md # this document
+├── WSL_README.md # WSL training guide
+└── TASK_PROGRESS.md # this document
 ```
 
 ---
 
 ## 4. Hyperparameter configuration
 
-| Parameter | Value |
-|------|-----|
-| Learning rate | 3e-4 |
-| Gamma | 0.99 |
-| GAE lambda | 0.95 |
-| Clip epsilon | 0.2 |
-| PPO epochs | 4 |
-| Mini-batch size | 64 |
-| Rollout steps | 2048 |
-| Entropy coefficient | 0.01 |
-| Value coefficient | 0.5 |
-| Max gradient norm | 0.5 |
-| State shape | (84, 84, 4) |
-| Action dim | 3 (continuous: steer, gas, brake) |
+| Parameter | train.py (single-threaded) | train_parallel.py (parallel) |
+|------|-------------------|--------------------------|
+| Learning rate | 3e-4 | 3e-4 |
+| Gamma | 0.99 | 0.99 |
+| GAE lambda | 0.95 | 0.98 |
+| Clip epsilon | 0.2 | 0.1 |
+| PPO epochs | 4 | 10 |
+| Mini-batch size | 64 | 128 |
+| Rollout steps | 2048 | 2048 |
+| Entropy coefficient | 0.01 | 0.005 |
+| Value coefficient | 0.5 | 0.75 |
+| Max gradient norm | 0.5 | 0.5 |
+| Total steps | 500,000 | 2,000,000 |
+| Number of environments | 1 | 4 |
+| Estimated duration | ~8h | ~5h (4x) |
 
 ---
 
 ## 5. Next steps
 
-### Run now
+### Option A: WSL parallel training (recommended)
+```bash
+# On Windows, double-click start_wsl_training.bat
+# Or manually:
+wsl
+cd "/mnt/d/Code/doing_exercises/programs/外教作业外快/强化学习个人项目报告"
+chmod +x setup_wsl.sh run_wsl.sh
+./setup_wsl.sh # first run only
+./run_wsl.sh # start training
+```
+
+### Option B: Windows single-threaded training
 ```bash
 # 1. Install dependencies
 uv pip install --system -r requirements.txt
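The two columns of the hyperparameter table in TASK_PROGRESS.md imply quite different amounts of optimisation per collected batch. The sketch below works this out, assuming (as the training loop in `train_improved.py` does) that `Rollout steps` counts all transitions collected per update and that the last partial mini-batch is kept (PyTorch `DataLoader`'s default `drop_last=False`). Whether `train_parallel.py` counts its 2048 steps per environment or in total is not shown in this commit, so the parallel numbers rest on that assumption; `PPOConfig` and `update_budget` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class PPOConfig:
    total_steps: int
    rollout_steps: int  # ASSUMPTION: transitions per update, summed over all environments
    ppo_epochs: int
    mini_batch_size: int


def update_budget(cfg: PPOConfig) -> dict:
    # The loop runs while total_timesteps < total_steps, adding rollout_steps per iteration.
    updates = math.ceil(cfg.total_steps / cfg.rollout_steps)
    batches_per_epoch = math.ceil(cfg.rollout_steps / cfg.mini_batch_size)  # drop_last=False
    grad_steps_per_update = batches_per_epoch * cfg.ppo_epochs
    return {
        "updates": updates,
        "grad_steps_per_update": grad_steps_per_update,
        "total_grad_steps": updates * grad_steps_per_update,
    }


single = PPOConfig(total_steps=500_000, rollout_steps=2048, ppo_epochs=4, mini_batch_size=64)
parallel = PPOConfig(total_steps=2_000_000, rollout_steps=2048, ppo_epochs=10, mini_batch_size=128)

for name, cfg in [("train.py", single), ("train_parallel.py", parallel)]:
    print(name, update_budget(cfg))
```

Under these assumptions the parallel configuration takes 160 gradient steps per update instead of 128 and about five times as many overall (156,320 vs 31,360), with each transition reused for 10 epochs rather than 4; the smaller clip epsilon (0.1) partly offsets that heavier reuse.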
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -2,4 +2,9 @@ torch
 gymnasium[box2d]
 numpy
 matplotlib
 tensorboard
+opencv-python-headless
+
+# Installing with uv (optional):
+# curl -LsSf https://astral.sh/uv/install.sh | sh
+# uv pip install -r requirements.txt
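The new `opencv-python-headless` entry supplies the `cv2` module that `train_improved.py` imports; the headless build omits OpenCV's GUI functions, which suits WSL and server setups. A quick preflight check that every listed package is importable can be sketched with `importlib.util.find_spec`. The requirement-to-module mapping below (for example, the `gymnasium[box2d]` extra providing `Box2D`) is our assumption about these distributions' import names, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# ASSUMPTION: top-level import names for the entries in requirements.txt.
REQUIRED_MODULES = {
    "torch": "torch",
    "gymnasium": "gymnasium",
    "gymnasium[box2d] extra": "Box2D",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "tensorboard": "tensorboard",
    "opencv-python-headless": "cv2",
}


def missing_modules(required=REQUIRED_MODULES):
    """Return the requirement names whose top-level module cannot be found."""
    return [name for name, module in required.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all dependencies importable" if not missing else f"missing: {missing}")
```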
+136 -130
@@ -1,4 +1,5 @@
-"""Improved training script with reward shaping and better hyperparameters."""
+"""Improved training script for CarRacing-v3 PPO with reward shaping."""
+
 import os
 import time
 import argparse
@@ -12,36 +13,34 @@ import cv2
 
 
 class RewardShapingWrapper(gym.Wrapper):
-    """Add reward shaping for better learning."""
-
     def __init__(self, env):
         super().__init__(env)
         self.steps_on_track = 0
 
     def reset(self, **kwargs):
         obs, info = self.env.reset(**kwargs)
         self.steps_on_track = 0
         return obs, info
 
     def step(self, action):
         obs, reward, terminated, truncated, info = self.env.step(action)
         done = terminated or truncated
 
         shaped_reward = reward
 
-        if info.get('speed', 0) > 0.1:
-            shaped_reward += info['speed'] * 0.1
+        if info.get("speed", 0) > 0.1:
+            shaped_reward += info["speed"] * 0.1
 
-        if not info.get('offtrack', False):
+        if not info.get("offtrack", False):
             shaped_reward += 0.1
             self.steps_on_track += 1
         else:
             shaped_reward -= 0.5
             self.steps_on_track = 0
 
-        if info.get('lap_complete', False):
+        if info.get("lap_complete", False):
             shaped_reward += 100
 
         return obs, shaped_reward, terminated, truncated, info
 
 
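`RewardShapingWrapper.step` reads `speed`, `offtrack` and `lap_complete` from `info`. To our knowledge, Gymnasium's CarRacing environment (`gymnasium/envs/box2d/car_racing.py`) does not put any of these keys into `info`; the only key we are aware of is `lap_finished`, set when an episode terminates, so it is worth checking against the installed version. If that holds, every `info.get(...)` falls back to its default: the speed and lap bonuses never trigger and, because `offtrack` defaults to `False`, every step earns a flat +0.1 whether or not the car is on the track. The sketch below restates the wrapper's arithmetic as a pure function (`shape_reward` is a hypothetical refactor, not the project's code) so the effect can be checked without Box2D.

```python
def shape_reward(reward: float, info: dict) -> float:
    """Same arithmetic as RewardShapingWrapper.step, as a pure function."""
    shaped = reward
    if info.get("speed", 0) > 0.1:
        shaped += info["speed"] * 0.1
    if not info.get("offtrack", False):
        shaped += 0.1  # "on-track" bonus: also applied when the key is missing
    else:
        shaped -= 0.5  # off-track penalty
    if info.get("lap_complete", False):
        shaped += 100
    return shaped


# ASSUMPTION: info dicts of the kind we believe CarRacing-v3 returns,
# usually empty, with "lap_finished" only when an episode ends.
typical_infos = [{}, {}, {"lap_finished": True}]
print([shape_reward(0.0, info) for info in typical_infos])  # [0.1, 0.1, 0.1]
```

Shaping that depends on speed or off-track state would need those signals from another source than `info`.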
@@ -70,9 +69,7 @@ class FrameStackWrapper(gym.ObservationWrapper):
         self.frames = deque(maxlen=num_stack)
         obs_shape = env.observation_space.shape
         self.observation_space = gym.spaces.Box(
-            low=0, high=255,
-            shape=(num_stack, *obs_shape[-2:]),
-            dtype=np.uint8
+            low=0, high=255, shape=(num_stack, *obs_shape[-2:]), dtype=np.uint8
         )
 
     def reset(self, **kwargs):
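`FrameStackWrapper` declares its observations as `(num_stack, 84, 84)` (channels first), `collect_rollout` transposes each one to `(84, 84, 4)` for `RolloutBuffer`, and `PPOTrainer.update` permutes batches back to `(N, 4, 84, 84)` before the CNN. The NumPy sketch below mimics that round trip with a `deque`, using random uint8 frames in place of preprocessed CarRacing images, to show that the transpose and the later permute cancel out. Filling the deque with copies of the first frame on reset is a common convention and an assumption here, since the wrapper's `reset` body is not in this diff; `stack_frames` is a hypothetical helper.

```python
from collections import deque

import numpy as np

NUM_STACK, H, W = 4, 84, 84
rng = np.random.default_rng(0)


def stack_frames(frames: deque) -> np.ndarray:
    # Channels-first stack, matching the declared Box shape (num_stack, 84, 84).
    return np.stack(frames, axis=0)


# ASSUMPTION: on reset the deque is filled with copies of the first frame.
first = rng.integers(0, 256, size=(H, W), dtype=np.uint8)
frames = deque([first] * NUM_STACK, maxlen=NUM_STACK)

# One environment step: the newest frame is appended and the oldest drops out.
frames.append(rng.integers(0, 256, size=(H, W), dtype=np.uint8))
obs_chw = stack_frames(frames)                  # (4, 84, 84), what the wrapper returns

obs_hwc = np.transpose(obs_chw, (1, 2, 0))      # (84, 84, 4), as stored in RolloutBuffer
batch = obs_hwc[None]                           # (1, 84, 84, 4)
net_input = batch.transpose(0, 3, 1, 2)         # (1, 4, 84, 84), like .permute(0, 3, 1, 2)

print(obs_chw.shape, obs_hwc.shape, net_input.shape)
```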
@@ -115,7 +112,7 @@ class Actor(nn.Module):
     def __init__(self, state_shape=(84, 84, 4), action_dim=3):
         super().__init__()
         c, h, w = state_shape[2], state_shape[0], state_shape[1]
 
         self.conv = nn.Sequential(
             nn.Conv2d(c, 32, kernel_size=8, stride=4),
             nn.LeakyReLU(0.2),
@@ -126,28 +123,28 @@ class Actor(nn.Module):
             nn.Conv2d(64, 64, kernel_size=3, stride=1),
             nn.LeakyReLU(0.2),
         )
 
         out_h = (h - 8) // 4 + 1
         out_h = (out_h - 4) // 2 + 1
         out_h = (out_h - 3) // 1 + 1
         feat_size = 64 * out_h * out_h
 
         self.fc = nn.Sequential(
             nn.Linear(feat_size, 512),
             nn.LeakyReLU(0.2),
         )
         self.mu_head = nn.Linear(512, action_dim)
         self.log_std_head = nn.Linear(512, action_dim)
 
         for m in self.modules():
             if isinstance(m, (nn.Conv2d, nn.Linear)):
                 nn.init.orthogonal_(m.weight, gain=np.sqrt(2))
                 if m.bias is not None:
                     nn.init.constant_(m.bias, 0)
 
         nn.init.orthogonal_(self.mu_head.weight, gain=0.01)
         nn.init.orthogonal_(self.log_std_head.weight, gain=0.01)
 
     def forward(self, x):
         x = x / 255.0
         x = self.conv(x)
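The `out_h` lines apply the no-padding convolution size formula, out = ⌊(in − kernel) / stride⌋ + 1, once per layer (PyTorch's `Conv2d` formula with padding 0 and dilation 1). Reading kernel 4 and stride 2 for the middle layer off the second `out_h` line (the layer itself falls outside the hunks shown), an 84×84 input shrinks to 20, then 9, then 7, so `feat_size` = 64 · 7 · 7 = 3136 for both `Actor` and `Critic`. The pure-Python sketch below reproduces that arithmetic; `conv_out` and `feature_size` are hypothetical helpers.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial size after a Conv2d layer with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride) per conv layer; the middle pair is inferred from the out_h formula.
LAYERS = [(8, 4), (4, 2), (3, 1)]


def feature_size(h: int = 84, channels_out: int = 64) -> int:
    sizes = [h]
    for kernel, stride in LAYERS:
        sizes.append(conv_out(sizes[-1], kernel, stride))
    print(" -> ".join(map(str, sizes)))  # 84 -> 20 -> 9 -> 7
    return channels_out * sizes[-1] * sizes[-1]


print(feature_size())  # 3136
```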
@@ -162,7 +159,7 @@ class Critic(nn.Module):
     def __init__(self, state_shape=(84, 84, 4)):
         super().__init__()
         c, h, w = state_shape[2], state_shape[0], state_shape[1]
 
         self.conv = nn.Sequential(
             nn.Conv2d(c, 32, kernel_size=8, stride=4),
             nn.LeakyReLU(0.2),
@@ -173,24 +170,20 @@ class Critic(nn.Module):
             nn.Conv2d(64, 64, kernel_size=3, stride=1),
             nn.LeakyReLU(0.2),
         )
 
         out_h = (h - 8) // 4 + 1
         out_h = (out_h - 4) // 2 + 1
         out_h = (out_h - 3) // 1 + 1
         feat_size = 64 * out_h * out_h
 
-        self.fc = nn.Sequential(
-            nn.Linear(feat_size, 512),
-            nn.LeakyReLU(0.2),
-            nn.Linear(512, 1)
-        )
+        self.fc = nn.Sequential(nn.Linear(feat_size, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))
 
         for m in self.modules():
             if isinstance(m, (nn.Conv2d, nn.Linear)):
                 nn.init.orthogonal_(m.weight, gain=np.sqrt(2))
                 if m.bias is not None:
                     nn.init.constant_(m.bias, 0)
 
     def forward(self, x):
         x = x / 255.0
         x = self.conv(x)
@@ -203,14 +196,14 @@ class RolloutBuffer:
         self.buffer_size = buffer_size
         self.ptr = 0
         self.size = 0
 
         self.states = np.zeros((buffer_size, *state_shape), dtype=np.uint8)
         self.actions = np.zeros((buffer_size, action_dim), dtype=np.float32)
         self.rewards = np.zeros(buffer_size, dtype=np.float32)
         self.dones = np.zeros(buffer_size, dtype=np.bool_)
         self.values = np.zeros(buffer_size, dtype=np.float32)
         self.log_probs = np.zeros(buffer_size, dtype=np.float32)
 
     def add(self, state, action, reward, done, value, log_prob):
         self.states[self.ptr] = state
         self.actions[self.ptr] = action
@@ -220,34 +213,34 @@ class RolloutBuffer:
         self.log_probs[self.ptr] = log_prob
         self.ptr = (self.ptr + 1) % self.buffer_size
         self.size = min(self.size + 1, self.buffer_size)
 
     def compute_returns(self, last_value, gamma=0.99, gae_lambda=0.98):
         advantages = np.zeros(self.size, dtype=np.float32)
         last_gae = 0
 
         for t in reversed(range(self.size)):
             if t == self.size - 1:
                 next_value = last_value
             else:
                 next_value = self.values[t + 1]
 
             delta = self.rewards[t] + gamma * next_value * (1 - self.dones[t]) - self.values[t]
             last_gae = delta + gamma * gae_lambda * (1 - self.dones[t]) * last_gae
             advantages[t] = last_gae
 
-        returns = advantages + self.values[:self.size]
+        returns = advantages + self.values[: self.size]
         return returns, advantages
 
     def get(self):
         return (
-            self.states[:self.size],
-            self.actions[:self.size],
-            self.rewards[:self.size],
-            self.dones[:self.size],
-            self.values[:self.size],
-            self.log_probs[:self.size],
+            self.states[: self.size],
+            self.actions[: self.size],
+            self.rewards[: self.size],
+            self.dones[: self.size],
+            self.values[: self.size],
+            self.log_probs[: self.size],
         )
 
     def reset(self):
         self.ptr = 0
         self.size = 0
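`compute_returns` implements Generalized Advantage Estimation: δ_t = r_t + γ(1 − d_t)V(s_{t+1}) − V(s_t), A_t = δ_t + γλ(1 − d_t)A_{t+1}, and the critic target R_t = A_t + V(s_t). The NumPy sketch below restates the same recursion as a standalone function (the name `gae` is ours) with a three-step case small enough to check by hand. Because `collect_rollout` stores `done = terminated or truncated`, a time-limit truncation also zeroes the bootstrap term, which biases the value targets near the step limit.

```python
import numpy as np


def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.98):
    """Generalized Advantage Estimation over one rollout (same recursion as compute_returns)."""
    rewards = np.asarray(rewards, dtype=np.float32)
    values = np.asarray(values, dtype=np.float32)
    not_done = 1.0 - np.asarray(dones, dtype=np.float32)
    advantages = np.zeros_like(rewards)
    next_value, last_gae = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value * not_done[t] - values[t]
        last_gae = delta + gamma * lam * not_done[t] * last_gae
        advantages[t] = last_gae
        next_value = values[t]  # V(s_t) becomes the bootstrap value for step t - 1
    return advantages + values, advantages  # (returns, advantages)


# Three steps; the episode ends after step 1 and a new one starts at step 2.
returns, adv = gae(rewards=[1.0, 1.0, 1.0], values=[0.5, 0.5, 0.5],
                   dones=[False, True, False], last_value=0.5, gamma=0.9, lam=0.5)
print(adv, returns)  # advantages ≈ [1.175, 0.5, 0.95], returns ≈ [1.675, 1.0, 1.45]
```

The `True` at index 1 stops both the bootstrap and the advantage carry-over, so step 1's advantage is just r − V = 0.5 and step 0 does not see step 2's reward.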
@@ -282,55 +275,53 @@ class PPOTrainer:
         self.max_grad_norm = max_grad_norm
         self.ppo_epochs = ppo_epochs
         self.mini_batch_size = mini_batch_size
 
         self.actor_optim = torch.optim.Adam(actor.parameters(), lr=lr, eps=1e-5)
         self.critic_optim = torch.optim.Adam(critic.parameters(), lr=lr, eps=1e-5)
 
-        self.total_updates = 0
-
     def update(self, last_value):
         states, actions, rewards, dones, values, log_probs_old = self.buffer.get()
 
         returns, advantages = self.buffer.compute_returns(last_value, self.gamma, self.gae_lambda)
 
         advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
 
         states_t = torch.from_numpy(states).float().permute(0, 3, 1, 2).to(self.device)
         actions_t = torch.from_numpy(actions).float().to(self.device)
         log_probs_old_t = torch.from_numpy(log_probs_old).float().to(self.device)
         returns_t = torch.from_numpy(returns).float().to(self.device)
         advantages_t = torch.from_numpy(advantages).float().to(self.device)
 
         dataset = torch.utils.data.TensorDataset(
             states_t, actions_t, log_probs_old_t, returns_t, advantages_t
         )
         loader = torch.utils.data.DataLoader(dataset, batch_size=self.mini_batch_size, shuffle=True)
 
         total_actor_loss = 0
         total_critic_loss = 0
         total_entropy = 0
         count = 0
 
         for _ in range(self.ppo_epochs):
             for batch in loader:
                 s, a, log_pi_old, ret, adv = batch
 
                 mu, std = self.actor(s)
                 dist = torch.distributions.Normal(mu, std)
                 log_pi = dist.log_prob(a).sum(dim=-1)
                 entropy = dist.entropy().sum(dim=-1)
 
                 ratio = torch.exp(log_pi - log_pi_old)
 
                 surr1 = ratio * adv
                 surr2 = torch.clamp(ratio, 1 - self.clip_eps, 1 + self.clip_eps) * adv
                 actor_loss = -torch.min(surr1, surr2).mean()
 
                 value = self.critic(s)
                 critic_loss = nn.MSELoss()(value.squeeze(), ret)
 
                 loss = actor_loss + self.vf_coef * critic_loss - self.ent_coef * entropy.mean()
 
                 self.actor_optim.zero_grad()
                 self.critic_optim.zero_grad()
                 loss.backward()
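The actor loss in `update` is PPO's clipped surrogate, L = −E[min(r_t A_t, clip(r_t, 1 − ε, 1 + ε) A_t)] with r_t = exp(log π − log π_old). The NumPy sketch below (independent of the project's tensors; `clipped_surrogate` is a hypothetical name) makes the clipping visible on four hand-picked samples: once the ratio moves past 1 ± ε in the direction the advantage favours, the objective stops rewarding further movement, while moves in the harmful direction keep their full penalty. The default ε = 0.2 matches `train.py`; the parallel configuration uses 0.1.

```python
import numpy as np


def clipped_surrogate(log_pi, log_pi_old, adv, clip_eps=0.2):
    """Per-sample PPO clipped objective (maximised); the training code minimises its negative mean."""
    ratio = np.exp(np.asarray(log_pi) - np.asarray(log_pi_old))
    surr1 = ratio * adv
    surr2 = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return np.minimum(surr1, surr2)


log_pi_old = np.zeros(4)
log_pi = np.log([1.5, 1.5, 0.5, 0.5])  # probability ratios 1.5, 1.5, 0.5, 0.5
adv = np.array([1.0, -1.0, 1.0, -1.0])
print(clipped_surrogate(log_pi, log_pi_old, adv))  # ≈ [ 1.2 -1.5  0.5 -0.8]
```

Samples 1 and 4 are clipped (the gain is capped at 1.2 and −0.8), while samples 2 and 3, where the policy moved against the advantage, keep their unclipped values.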
@@ -338,18 +329,16 @@ class PPOTrainer:
                 nn.utils.clip_grad_norm_(self.critic.parameters(), self.max_grad_norm)
                 self.actor_optim.step()
                 self.critic_optim.step()
 
                 total_actor_loss += actor_loss.item()
                 total_critic_loss += critic_loss.item()
                 total_entropy += entropy.mean().item()
                 count += 1
 
-        self.total_updates += 1
-
         avg_actor = total_actor_loss / count
         avg_critic = total_critic_loss / count
         avg_entropy = total_entropy / count
 
         self.buffer.reset()
         return avg_actor, avg_critic, avg_entropy
 
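`update()` ends with `self.buffer.reset()`, which sets `size` back to 0. If `PPOTrainer` holds the same `RolloutBuffer` instance that `train()` creates (the constructor arguments that would confirm this fall outside the hunks shown), the later line `ep_reward = buffer.rewards[: buffer.size].sum()` sums an empty slice and always logs 0.0. Even before the reset it would be the total reward of a 2048-step rollout rather than of one episode, and `episode` counts updates rather than episodes. The sketch below shows one way to record true episode returns during collection; `EpisodeTracker` is a hypothetical helper, not part of the project.

```python
import numpy as np


class EpisodeTracker:
    """Accumulates per-step rewards and records a return each time an episode ends."""

    def __init__(self):
        self.current = 0.0
        self.completed = []

    def step(self, reward: float, done: bool) -> None:
        self.current += reward
        if done:
            self.completed.append(self.current)
            self.current = 0.0

    def pop_completed(self) -> list:
        finished, self.completed = self.completed, []
        return finished


# The reset problem in miniature: after reset, size is 0, so the slice is empty.
rewards = np.array([1.0, 2.0, 3.0], dtype=np.float32)
size_after_reset = 0
print(rewards[:size_after_reset].sum())  # 0.0 regardless of the rewards collected

tracker = EpisodeTracker()
for r, d in [(1.0, False), (2.0, True), (3.0, False), (4.0, True), (5.0, False)]:
    tracker.step(r, d)
print(tracker.pop_completed(), tracker.current)  # [3.0, 7.0] 5.0
```

In `collect_rollout`, calling `tracker.step(reward, done)` right after `env.step` and logging `tracker.pop_completed()` after each update would give per-episode returns. If `make_env()` applies `RewardShapingWrapper`, these returns include the shaping terms, so comparing against the environment's native score would need a separate unshaped sum.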
@@ -357,10 +346,10 @@ class PPOTrainer:
 def collect_rollout(actor, critic, env, buffer, device, rollout_steps):
     obs, _ = env.reset()
     obs = np.transpose(obs, (1, 2, 0))
 
     for step in range(rollout_steps):
         obs_t = torch.from_numpy(obs).float().unsqueeze(0).permute(0, 3, 1, 2).to(device)
 
         with torch.no_grad():
             mu, std = actor(obs_t)
             dist = torch.distributions.Normal(mu, std)
@@ -368,27 +357,27 @@ def collect_rollout(actor, critic, env, buffer, device, rollout_steps):
             action = torch.clamp(action, -1, 1)
             log_prob = dist.log_prob(action).sum(dim=-1)
             value = critic(obs_t).squeeze(0).item()
 
         action_np = action.squeeze(0).cpu().numpy()
         log_prob_np = log_prob.squeeze(0).cpu().numpy()
 
         next_obs, reward, terminated, truncated, _ = env.step(action_np)
         done = terminated or truncated
 
         next_obs_stored = np.transpose(next_obs, (1, 2, 0))
 
         buffer.add(obs.copy(), action_np, reward, done, value, log_prob_np)
 
         obs = next_obs_stored
 
         if done:
             obs, _ = env.reset()
             obs = np.transpose(obs, (1, 2, 0))
 
     return obs
 
 
-def train_improved(
+def train(
     total_steps=2000000,
     rollout_steps=2048,
     eval_interval=10,
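`collect_rollout` samples from the Gaussian policy and clamps every action component to [-1, 1] before calling `env.step`. As far as we know, CarRacing-v3's continuous action space is `Box(low=[-1, 0, 0], high=[1, 1, 1])` for (steer, gas, brake), which `env.action_space.low` and `env.action_space.high` report directly. Clamped gas and brake values can therefore fall below the declared lower bound of 0, and their effect then depends on how the physics code treats them. A common alternative is to rescale only at the environment boundary and keep the unscaled action in `RolloutBuffer`, so the stored log-probabilities still match the sampled actions. The sketch below shows that affine map; `to_env_action` and the hard-coded bounds are illustrative assumptions.

```python
import numpy as np

# ASSUMPTION: CarRacing-v3 continuous bounds for (steer, gas, brake); in practice read
# env.action_space.low / env.action_space.high instead of hard-coding them.
LOW = np.array([-1.0, 0.0, 0.0], dtype=np.float32)
HIGH = np.array([1.0, 1.0, 1.0], dtype=np.float32)


def to_env_action(policy_action: np.ndarray) -> np.ndarray:
    """Affinely map a policy output in [-1, 1]^3 onto [LOW, HIGH]."""
    a = np.clip(policy_action, -1.0, 1.0)
    return LOW + (a + 1.0) * 0.5 * (HIGH - LOW)


# Steering 0 stays 0; gas -1 maps to "no gas"; brake +1 maps to full brake.
print(to_env_action(np.array([0.0, -1.0, 1.0], dtype=np.float32)))  # [0. 0. 1.]
```

Only the value passed to `env.step` would change; `buffer.add` would still receive the action the policy actually sampled.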
@@ -397,22 +386,22 @@ def train_improved(
 ):
     if device is None:
         device = get_device()
 
     env = make_env()
     eval_env = make_env()
 
     state_shape = (84, 84, 4)
     action_dim = 3
 
     actor = Actor(state_shape=state_shape, action_dim=action_dim).to(device)
     critic = Critic(state_shape=state_shape).to(device)
 
     buffer = RolloutBuffer(
         buffer_size=rollout_steps,
         state_shape=state_shape,
         action_dim=action_dim,
     )
 
     trainer = PPOTrainer(
         actor=actor,
         critic=critic,
@@ -428,46 +417,48 @@ def train_improved(
 ppo_epochs=10,
 mini_batch_size=128,
 )

 log_dir = os.path.join("logs", "tensorboard", f"run_improved_{int(time.time())}")
 writer = SummaryWriter(log_dir)

 print(f"Training on {device}")
 print(f"Log directory: {log_dir}")
-print("Improvements: LeakyReLU, BatchNorm, He init, Reward shaping, LR decay, More epochs")
+print("Improvements: LeakyReLU, BatchNorm, He init, Reward shaping, More epochs")

 episode = 0
 total_timesteps = 0
 episode_rewards = []
-best_eval = -float('inf')
+best_eval = -float("inf")

 while total_timesteps < total_steps:
 obs = collect_rollout(actor, critic, env, buffer, device, rollout_steps)

 with torch.no_grad():
 obs_t = torch.from_numpy(obs).float().unsqueeze(0).permute(0, 3, 1, 2).to(device)
 last_value = critic(obs_t).squeeze(0).item()

 actor_loss, critic_loss, entropy = trainer.update(last_value)

 writer.add_scalar("Loss/Actor", actor_loss, total_timesteps)
 writer.add_scalar("Loss/Critic", critic_loss, total_timesteps)
 writer.add_scalar("Loss/Entropy", entropy, total_timesteps)

 total_timesteps += rollout_steps
 episode += 1

-ep_reward = buffer.rewards[:buffer.size].sum()
+ep_reward = buffer.rewards[: buffer.size].sum()
 episode_rewards.append(ep_reward)

 recent_rewards = episode_rewards[-10:] if len(episode_rewards) >= 10 else episode_rewards
 avg_reward = np.mean(recent_rewards)

 writer.add_scalar("Reward/Episode", ep_reward, total_timesteps)
 writer.add_scalar("Reward/AvgLast10", avg_reward, total_timesteps)

-print(f"Episode {episode}, steps {total_timesteps}, ep_reward={ep_reward:.1f}, avg_10={avg_reward:.1f}")
+print(
+f"Episode {episode}, steps {total_timesteps}, ep_reward={ep_reward:.1f}, avg_10={avg_reward:.1f}"
+)

 if episode % eval_interval == 0:
 eval_returns = []
 for _ in range(5):
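In the loop above, `ep_reward` sums `buffer.rewards` over one whole rollout of `rollout_steps` transitions, and `episode` is incremented once per rollout. Assuming `collect_rollout` fills every slot and continues across episode boundaries (its body is not shown in this diff), that sum is a per-rollout total that may cover several partial episodes rather than one episode's return. If true per-episode returns are wanted for the `Reward/Episode` curve, they can be recovered from per-step rewards and termination flags. The minimal sketch below does this; the function name and the idea of carrying a partial return between rollouts are illustrative assumptions, not part of the project code.

```python
def episode_returns(rewards, dones, carry=0.0):
    """Split one rollout's rewards into completed-episode returns (hypothetical helper).

    rewards: per-step rewards in collection order.
    dones: per-step flags, truthy where an episode terminated or was truncated.
    carry: partial return of an episode still running when the previous rollout ended.
    Returns (finished_returns, carry) where carry is the running return of the
    episode still in progress at the end of this rollout.
    """
    finished = []
    running = carry
    for reward, done in zip(rewards, dones):
        running += reward
        if done:
            # Episode boundary: record its return and start a fresh one.
            finished.append(running)
            running = 0.0
    return finished, running


# A 6-step rollout: one episode ends at step 2, another is still running.
finished, carry = episode_returns([1.0, 2.0, 3.0, 0.5, 0.5, 0.5], [0, 0, 1, 0, 0, 0])
print(finished, carry)  # [6.0] 1.5
```

Passing `carry` into the next call keeps an episode that straddles two rollouts from being split into two smaller "episodes".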
@@ -475,54 +466,69 @@ def train_improved(
 eval_obs = np.transpose(eval_obs, (1, 2, 0))
 eval_reward = 0
 done = False

 while not done:
 with torch.no_grad():
-eval_obs_t = torch.from_numpy(eval_obs).float().unsqueeze(0).permute(0, 3, 1, 2).to(device)
+eval_obs_t = (
+torch.from_numpy(eval_obs)
+.float()
+.unsqueeze(0)
+.permute(0, 3, 1, 2)
+.to(device)
+)
 mu, std = actor(eval_obs_t)
 action = torch.clamp(mu, -1, 1).squeeze(0).cpu().numpy()
 eval_obs, reward, terminated, truncated, _ = eval_env.step(action)
 eval_obs = np.transpose(eval_obs, (1, 2, 0))
 eval_reward += reward
 done = terminated or truncated

 eval_returns.append(eval_reward)

 mean_eval = np.mean(eval_returns)
 writer.add_scalar("Eval/MeanReturn", mean_eval, episode)
 print(f" Eval: mean_return={mean_eval:.2f}")

 if mean_eval > best_eval:
 best_eval = mean_eval
 os.makedirs("models", exist_ok=True)
-torch.save({
+torch.save(
+{
+"actor": actor.state_dict(),
+"critic": critic.state_dict(),
+"episode": episode,
+"timesteps": total_timesteps,
+"best_eval": best_eval,
+},
+os.path.join("models", "ppo_improved_best.pt"),
+)
+print(f" New best model saved! eval={best_eval:.2f}")
+
+if episode % save_interval == 0:
+os.makedirs("models", exist_ok=True)
+torch.save(
+{
 "actor": actor.state_dict(),
 "critic": critic.state_dict(),
 "episode": episode,
 "timesteps": total_timesteps,
-"best_eval": best_eval,
-}, os.path.join("models", "ppo_improved_best.pt"))
-print(f" New best model saved! eval={best_eval:.2f}")
-
-if episode % save_interval == 0:
-os.makedirs("models", exist_ok=True)
-torch.save({
-"actor": actor.state_dict(),
-"critic": critic.state_dict(),
-"episode": episode,
-"timesteps": total_timesteps,
-}, os.path.join("models", f"ppo_improved_ep{episode}.pt"))
+},
+os.path.join("models", f"ppo_improved_ep{episode}.pt"),
+)
 print(f" Saved model at episode {episode}")

 os.makedirs("models", exist_ok=True)
-torch.save({
-"actor": actor.state_dict(),
-"critic": critic.state_dict(),
-"episode": episode,
-"timesteps": total_timesteps,
-"best_eval": best_eval,
-}, os.path.join("models", "ppo_improved_final.pt"))
+torch.save(
+{
+"actor": actor.state_dict(),
+"critic": critic.state_dict(),
+"episode": episode,
+"timesteps": total_timesteps,
+"best_eval": best_eval,
+},
+os.path.join("models", "ppo_improved_final.pt"),
+)

 writer.close()
 env.close()
 eval_env.close()
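The evaluation loop reshapes each observation twice: `np.transpose(eval_obs, (1, 2, 0))` after every `reset`/`step`, then `.unsqueeze(0).permute(0, 3, 1, 2)` before the forward pass. Assuming the environment wrapper returns channel-first stacked frames of shape (C, H, W) (an assumption, since the wrapper is not part of this diff), the first call yields (H, W, C) and the second restores (1, C, H, W). The NumPy sketch below checks that round trip, using `np.transpose` with axes `(0, 3, 1, 2)` as a stand-in for `torch.permute`, which reorders dimensions the same way; the 4x96x96 shape is illustrative.

```python
import numpy as np

# Assumed wrapper output: 4 stacked 96x96 grayscale frames, channel-first (C, H, W).
chw = np.arange(4 * 96 * 96, dtype=np.float32).reshape(4, 96, 96)

# Step 1, as in the eval loop: (C, H, W) -> (H, W, C).
hwc = np.transpose(chw, (1, 2, 0))

# Step 2, NumPy analogue of .unsqueeze(0).permute(0, 3, 1, 2): (H, W, C) -> (1, C, H, W).
nchw = np.transpose(hwc[np.newaxis], (0, 3, 1, 2))

print(hwc.shape, nchw.shape)  # (96, 96, 4) (1, 4, 96, 96)
print(np.array_equal(nchw[0], chw))  # True
```

Under that assumption the two conversions cancel out; they keep the evaluation path consistent with the training path, where `obs` returned by `collect_rollout` is also permuted with `(0, 3, 1, 2)`.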
@@ -534,6 +540,6 @@ if __name__ == "__main__":
 parser.add_argument("--steps", type=int, default=2000000, help="Total training steps")
 parser.add_argument("--rollout", type=int, default=2048, help="Rollout buffer size")
 args = parser.parse_args()

 device = get_device()
-train_improved(total_steps=args.steps, rollout_steps=args.rollout, device=device)
+train(total_steps=args.steps, rollout_steps=args.rollout, device=device)
+250
-250
@@ -1,251 +1,251 @@
XJTLU Entrepreneur College (Taicang) Cover Sheet

Module code and Title DTS307TC Reinforcement Learning
School Title School of AI and Advanced Computing
Assignment Title Coursework 1
Submission Deadline 04/May/2026 23:59
Final Word Count
If you agree to let the university use your work anonymously for teaching and learning purposes, please type “yes” here.

I certify that I have read and understood the University’s Policy for dealing with Plagiarism, Collusion and the Fabrication of Data (available on Learning Mall Online). With reference to this policy I certify that:

• My work does not contain any instances of plagiarism and/or collusion.
• My work does not contain any fabricated data.

By uploading my assignment onto Learning Mall Online, I formally declare that all of the above information is true to the best of my knowledge and belief.
Scoring – For Tutor Use
Student ID

Stage of Marker Learning Outcomes Achieved (F/P/M/D) Final
Marking Code (please modify as appropriate) Score
A B C
1st Marker – red pen
Moderation The original mark has been accepted by the moderator Y/N
IM (please circle as appropriate):
– green pen Initials
Data entry and score calculation have been checked by another tutor (please circle): Y
2nd Marker if needed – green pen
For Academic Office Use Possible Academic Infringement (please tick as appropriate)
Date Days Late ☐ Category A
Received late Penalty Total Academic Infringement Penalty
☐ Category B (A,B, C, D, E, Please modify where
necessary) _____________________
☐ Category C
☐ Category D
☐ Category E
School of Artificial Intelligence and Advanced Computing
Xi’an Jiaotong-Liverpool University

DTS307TC Reinforcement Learning

Coursework - Individual Report

Due: 04/May/2026 23:59
Weight: 40%
Maximum score: 40 marks

Overview

The purpose of this assignment is to gain experience in Python programming and the design of reinforcement learning algorithms. You are expected to implement an RL algorithm that solves a specific environment and provide an explanation of the algorithm’s methodology. You are expected to analyse your results, including challenges and your solutions.

Learning Outcomes Assessed

A: Systematically understand the fundamental concepts and principles of reinforcement learning.
B: Critically analyse real-life problem situations and expertly map them as reinforcement learning tasks.
C: Mastery of Monte Carlo Methods and Temporal Difference Learning.
D: Proficiency in Deep Reinforcement Learning algorithms.

Late policy

5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the submission date, up to a maximum of five working days.

Avoid Plagiarism

• Do not submit work from other students.
• Do not share code/work with other students.
• Do not use open-source code verbatim or without proper reference.

Risks

• Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may result in a loss of marks.
• The assignment must be submitted via Learning Mall. Only electronic submission is accepted; hard copies will not be accepted.
• All students must download their file and check that it is viewable after submission. Documents may become corrupted during the uploading process (e.g. due to slow internet connections). However, students are responsible for submitting a functional and correct file for assessments.
• Academic Integrity Policy is strictly followed.

Individual Report (40 marks)

The primary objective of this coursework is to familiarize students with the PPO algorithm using basic deep learning libraries, improving their ability to translate mathematical and theoretical knowledge into Python implementations and furthering their understanding of the actor-critic algorithm.

Algorithm Overview

Proximal Policy Optimization (PPO) is a state-of-the-art reinforcement learning algorithm that optimizes a stochastic policy in an on-policy manner. To ensure stable training and avoid catastrophic performance collapse, PPO uses a clipped surrogate objective that prevents each policy update from moving too far from the current (behavior) policy.
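The clipped surrogate objective described above is usually written as L_CLIP(θ) = E_t[ min( r_t(θ)·A_t, clip(r_t(θ), 1 − ε, 1 + ε)·A_t ) ], where r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t) is the probability ratio and A_t the advantage estimate. The sketch below evaluates this objective for a batch with NumPy, computing the ratio from log-probabilities for numerical stability. The function name and the default ε = 0.2 (a common choice) are illustrative assumptions, not values taken from the assignment or from `train_improved.py`.

```python
import numpy as np


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximised); negate it to get a loss.

    Hypothetical helper for illustration; clip_eps=0.2 is an assumed default.
    """
    new_log_probs = np.asarray(new_log_probs, dtype=np.float64)
    old_log_probs = np.asarray(old_log_probs, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)

    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping removes the incentive to push r_t outside [1 - eps, 1 + eps].
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The element-wise minimum is a pessimistic (lower) bound on the unclipped objective.
    return float(np.mean(np.minimum(unclipped, clipped)))


# Ratio 1.5 with a positive advantage of 2.0 is capped at 1.2 * 2.0.
print(ppo_clip_objective([np.log(1.5)], [0.0], [2.0]))  # 2.4
```

Because the minimum is taken per sample, clipping only takes effect when it would make the objective more optimistic: with a negative advantage and a ratio of 0.5, for example, the clipped term 0.8·A_t is the smaller one and is kept.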

The Environment: CarRacing-v3

We will be using the Car Racing environment from the Gymnasium library. This environment features a top-down racing track where the agent must learn to navigate through tiles based on pixel inputs. You can find more details about this environment on its website (https://gymnasium.farama.org/environments/box2d/car_racing/).
Here’s a code snippet for you to get started:

import gymnasium as gym
env = gym.make("CarRacing-v3", render_mode="rgb_array")
env.reset()

Since CarRacing-v3 is quite computationally expensive for a standard laptop (due to the pixel processing), you might want to consider using a gray-scaling or frame-stacking wrapper to speed up training. Alternatively, you can use the lab computers, which have GPUs and have the whole environment already set up.
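The gray-scaling and frame-stacking suggested above can be illustrated with plain NumPy, independent of any wrapper API. The sketch assumes raw frames of shape (96, 96, 3) with uint8 RGB values (the CarRacing observation format), converts them with the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), and keeps the last k = 4 frames. The class name `FrameStacker` and the choice of k are illustrative assumptions; this is neither Gymnasium's wrapper nor the project's own preprocessing.

```python
from collections import deque

import numpy as np


def to_grayscale(frame):
    """Convert an (H, W, 3) uint8 RGB frame to an (H, W) float32 image in [0, 1]."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # BT.601 luma weights
    return (frame.astype(np.float32) @ weights) / 255.0


class FrameStacker:
    """Hypothetical helper: keeps the k most recent grayscale frames as a (k, H, W) array."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        gray = to_grayscale(first_frame)
        self.frames.clear()
        # Pad with copies of the first frame so the stack is full from step 0.
        for _ in range(self.k):
            self.frames.append(gray)
        return np.stack(list(self.frames), axis=0)

    def step(self, frame):
        self.frames.append(to_grayscale(frame))  # deque(maxlen=k) drops the oldest frame
        return np.stack(list(self.frames), axis=0)


stacker = FrameStacker(k=4)
obs = stacker.reset(np.zeros((96, 96, 3), dtype=np.uint8))
obs = stacker.step(np.full((96, 96, 3), 255, dtype=np.uint8))
print(obs.shape)  # (4, 96, 96)
```

Stacking along the first axis gives a channel-first (C, H, W) array that a convolutional network can consume directly, and several consecutive frames let the policy infer the car's speed and direction, which a single frame cannot convey.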

The PPO Agent

You will implement an RL agent using PPO to play the CarRacing-v3 environment. The agent will use the standard observation and actions provided by the environment. You may edit the environment to speed up your training, but your agent must still perform well in the standard environment (e.g., removing the camera zoom at the beginning is allowed during training, but your agent should still be tested in the original environment). You should record your training and evaluation process using TensorBoard. You should also record important losses and other data for your analysis later.

The Report

Upon completion of your implementation, you are required to submit a comprehensive technical report. The report should document your engineering decisions, the theoretical grounding of your code, and a critical analysis of the agent’s performance.

1. Introduction

• Provide a brief overview of Reinforcement Learning in the context of the CarRacing-v3 environment.
• Define the state space (pixels), action space (discrete commands), and the reward structure of the task.

2. Methodology

• Mathematical Foundation: Formulate the PPO objective function. Explain the significance of the clipping parameter and the probability ratio.
• Advantage Estimation: Describe your method for calculating advantages (e.g., standard advantage vs. Generalized Advantage Estimation (GAE)).
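For the advantage-estimation item, GAE defines A_t = Σ_l (γλ)^l δ_{t+l} with TD residuals δ_t = r_t + γ·V(s_{t+1})·(1 − d_t) − V(s_t), and is computed backwards over a rollout via A_t = δ_t + γλ·(1 − d_t)·A_{t+1}. The sketch below does this for one rollout, bootstrapping from the value of the state after the last step (the role `last_value` plays in `train_improved.py`). The function name and the defaults γ = 0.99, λ = 0.95 are common choices assumed here; this is not the `PPOTrainer.update` implementation.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (hypothetical helper).

    rewards, values, dones: per-step lists of equal length; dones[t] is 1.0 if the
    episode ended at step t. last_value: V(s_T) for the state after the final step.
    Returns (advantages, returns); returns = advantages + values are the critic's
    regression targets.
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * (1 - d_t) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Recursion: A_t = delta_t + gamma * lambda * (1 - d_t) * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


adv, ret = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                       last_value=0.0, gamma=1.0, lam=1.0)
print(adv)  # [3.0, 2.0, 1.0]
```

With λ = 1 the estimate reduces to the discounted return minus the baseline V(s_t), and with λ = 0 to the one-step TD error δ_t; λ therefore trades variance against bias, which is the contrast the "standard advantage vs. GAE" comparison asks about.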

3. Implementation Details

• Describe your implementation, including any challenges faced and how you addressed them.
• Explain the structure of your policy and value networks.
• Detail the training process and hyperparameters used.

4. Results and Analysis

• Present your results (use graphs for better clarity).
• Discuss the performance of your agent and any trends observed.
• Briefly compare your custom implementation’s stability and sample efficiency against baseline benchmarks (e.g., Stable-Baselines3).

5. Conclusion

• Summarize your key findings regarding the sensitivity of PPO to hyperparameter tuning and the effectiveness of the actor-critic framework in continuous-input environments.

Note: All figures and plots must be clearly labeled with axis titles and legends. Raw code snippets should be kept to a minimum in the report; focus on high-level logic and pseudocode where necessary.

Important Note

• Do NOT use Stable-Baselines libraries or any other reinforcement-learning-specific libraries in your implementation (you may use TensorBoard for recording your results).
• Do NOT exceed the word count limit of 3000 words for each report, excluding references and appendices.
• Although you are allowed to use any generative AI tools to assist your work, please keep in mind that you should be using them responsibly. (Good use: improving your report after writing it, and always reviewing the output to ensure that it is correct. Bad use: copy-pasting an entire report from AI without any effort of your own.)

Submission Requirements

Please prepare and submit the following documents:

• A cover page featuring your student ID. This page should be the first page of your report.
• A zip file containing all the source code and your trained agent model, which should be named using your full name and student ID in the following format: CW1_ID_Name.zip
• One PDF file for your report, submitted separately from the zip file that contains your code. The PDF should be named in the following format: CW1_ID_Name.pdf

Note that the quality of the code, the clarity of your writing, and the format/style of your report will be taken into consideration during the evaluation. The detailed rubric is outlined below.

Rubric (CW1, 40 marks)

• Code Performance (6 marks): Code runs without errors and performs tasks as specified.
• Code Quality (6 marks): Code is well-organized, includes meaningful comments, and uses appropriate variable names.
• Methodology (6 marks): Comprehensive coverage of topics with detailed explanations of approaches and methodologies.
• Result analysis (6 marks): Insightful analysis of results.
• Report Quality (6 marks): Report is well-structured, formatted, and free of grammatical errors.
• Evidence of Work (6 marks): All required elements are included and correct.
• Submission (4 marks): Follows all requirements for submission.
+259
-259
@@ -1,260 +1,260 @@
|
|||||||
XJTLU Entrepreneur College (Taicang) Cover Sheet
|
XJTLU Entrepreneur College (Taicang) Cover Sheet
|
||||||
|
|
||||||
School of AI and Advanced
|
School of AI and Advanced
|
||||||
Module code DTS304TC: Machine Learning School title
|
Module code DTS304TC: Machine Learning School title
|
||||||
Computing
|
Computing
|
||||||
|
|
||||||
Assessment title Coursework Task 1 Assessment type Coursework
|
Assessment title Coursework Task 1 Assessment type Coursework
|
||||||
|
|
||||||
Submission
|
Submission
|
||||||
01/May/2026 23:59
|
01/May/2026 23:59
|
||||||
deadline
|
deadline
|
||||||
|
|
||||||
|
|
||||||
I certify that I have read and understood the University's Policy for dealing with Plagiarism, Collusion and the Fabrication of Data
|
I certify that I have read and understood the University's Policy for dealing with Plagiarism, Collusion and the Fabrication of Data
|
||||||
(available on Learning Mall Online).
|
(available on Learning Mall Online).
|
||||||
My work does not contain any instances of plagiarism and/or collusion.
|
My work does not contain any instances of plagiarism and/or collusion.
|
||||||
My work does not contain any fabricated data.
|
My work does not contain any fabricated data.
|
||||||
|
|
||||||
|
|
||||||
By uploading my assignment onto Learning Mall Online, I formally declare that all of the
|
By uploading my assignment onto Learning Mall Online, I formally declare that all of the
|
||||||
above information is true to the best of my knowledge and belief.
|
above information is true to the best of my knowledge and belief.
|
||||||
Scoring – For Tutor Use
|
Scoring – For Tutor Use
|
||||||
Student ID
|
Student ID
|
||||||
Theory and Reflection PDF Word Count (Filled by
|
Theory and Reflection PDF Word Count (Filled by
|
||||||
Students)
|
Students)
|
||||||
|
|
||||||
Stage of Marking Marker Learning Outcomes Achieved (F/P/M/D) Final
|
Stage of Marking Marker Learning Outcomes Achieved (F/P/M/D) Final
|
||||||
Code Score
|
Code Score
|
||||||
(please modify as appropriate)
|
(please modify as appropriate)
|
||||||
A B C
|
A B C
|
||||||
1st Marker – red
|
1st Marker – red
|
||||||
pen
|
pen
|
||||||
Moderation The original mark has been accepted by the moderator Y/N
|
Moderation The original mark has been accepted by the moderator Y/N
|
||||||
IM (please circle as appropriate):
|
IM (please circle as appropriate):
|
||||||
– green pen Initials
|
– green pen Initials
|
||||||
Data entry and score calculation have been checked by Y
|
Data entry and score calculation have been checked by Y
|
||||||
another tutor (please circle):
|
another tutor (please circle):
|
||||||
2nd Marker if
|
2nd Marker if
|
||||||
needed – green
|
needed – green
|
||||||
pen
|
pen
|
||||||
For Academic Office Use Possible Academic Infringement (please tick as appropriate)
|
For Academic Office Use Possible Academic Infringement (please tick as appropriate)
|
||||||
Date Days Late ☐ Category A
|
Date Days Late ☐ Category A
|
||||||
Received late Penalty Total Academic Infringement Penalty
|
Received late Penalty Total Academic Infringement Penalty
|
||||||
☐ Category B (A,B, C, D, E, Please modify where
|
☐ Category B (A,B, C, D, E, Please modify where
|
||||||
necessary) _____________________
|
necessary) _____________________
|
||||||
☐ Category C
|
☐ Category C
|
||||||
☐ Category D
|
☐ Category D
|
||||||
☐ Category E
|
☐ Category E
|
||||||
DTS304TC Machine Learning
Coursework - Assessment Task 1

• Percentage in final mark: 50%
• Assessment type: individual coursework
• Submission files: one Jupyter notebook (.ipynb), one Coursework Answer Sheet / Theory and Reflection PDF, and one hidden-test CSV

Learning outcomes assessed
• A. Demonstrate a solid understanding of the theoretical issues related to problems that machine-learning methods try to address.
• B. Demonstrate understanding of the properties of existing machine-learning algorithms and how they behave on practical data.

Notes
• Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may result in a loss of marks.
• The formal procedure for submitting coursework at XJTLU is strictly followed. A submission link on Learning Mall will be provided in due course. The submission timestamp on Learning Mall will be used to check for late submission.
• 5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the submission date, up to a maximum of five working days.
• All modelling work must be completed individually. Discussion of general ideas is allowed, but code, experiments, and notebooks must be developed independently.
• You may not use ChatGPT to directly generate answers for the coursework. High-scoring work must demonstrate your own experimental design, controlled comparisons, failure analysis, and image-level interpretation. ChatGPT or similar tools may be used only in a limited support role, such as code understanding, debugging, or grammar support. They must not replace your method design, ablation logic, qualitative analysis, or reflection. Generic AI-produced descriptions without matching evidence in code, tables, figures, and discussion will not receive high marks.
• If you use AI tools or outside code in any meaningful way, you must fully understand, verify, and take ownership of every method, number, figure, and written claim that appears in your submission.

Question 1: Notebook-Based Coding Exercise - Insurance Premium-Risk Classification (60 Marks)

In this coursework you will build and improve a multiclass classifier for a fictionalised health-insurance dataset. The task is to predict whether each applicant belongs to a Low, Standard, or High premium-risk group before pricing a policy. The dataset is intentionally realistic: it mixes numerical and categorical variables, contains missing values and dirty entries, and includes some fields that require careful handling to avoid weak modelling practice or label leakage.

Your work should show a clear machine-learning workflow: build a sensible first pipeline, compare model families, apply stronger hyperparameter optimisation, complete one compulsory improvement category plus at least one optional category, carry out a compact K-Means/Gaussian Mixture Model (GMM) exploration, and then produce a hidden-test CSV using validation evidence only.

||||||
The prediction target variable is ‘premium_risk’, and it has 3 imbalanced classes: Standard, High, Low. The dataset
|
The prediction target variable is ‘premium_risk’, and it has 3 imbalanced classes: Standard, High, Low. The dataset
|
||||||
contains 33 raw columns: admin/PII columns, synthetic noise features, 1 leakage feature, and genuine predictors.
|
contains 33 raw columns: admin/PII columns, synthetic noise features, 1 leakage feature, and genuine predictors.
|
||||||
Unless otherwise stated, macro-F1 is the primary validation metric because the dataset is imbalanced; accuracy is reported
|
Unless otherwise stated, macro-F1 is the primary validation metric because the dataset is imbalanced; accuracy is reported
|
||||||
as a secondary metric.
|
as a secondary metric.
|
||||||
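The Python sketch below illustrates why macro-F1 is preferred when classes are imbalanced. It uses only the standard library to compute a confusion matrix, per-class F1, macro-F1 and accuracy. The toy labels are invented and are not taken from the coursework data. In the notebook, scikit-learn's `f1_score(y_true, y_pred, average="macro")` and `confusion_matrix` would normally do the same job.

```python
CLASSES = ["Low", "Standard", "High"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_f1(matrix):
    """F1 for each class, read off a square confusion matrix."""
    k = len(matrix)
    scores = []
    for i in range(k):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(k)) - tp  # predicted class i, but wrong
        fn = sum(matrix[i]) - tp                        # true class i, but missed
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(confusion_matrix(y_true, y_pred, classes))
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced example: a model that always predicts the majority class.
y_true = ["Standard"] * 8 + ["High"] + ["Low"]
y_pred = ["Standard"] * 10
print(accuracy(y_true, y_pred))  # 0.8
print(macro_f1(y_true, y_pred))
```

On this toy split, the majority-class predictor reaches 0.8 accuracy but only about 0.30 macro-F1. Each of the two minority classes contributes an F1 of 0 to the unweighted average.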
(A) Clean First Pipeline and Baseline Modelling (8 marks)
• Load the provided training and validation files and define a consistent target / feature setup.
• Handle leakage features, dirty values, missing values, and categorical variables sensibly. A compact sanity check is enough; a long data-audit section is not required.
Important: The dataset contains a leakage feature. You must identify and remove it before proceeding to the next stage of analysis; otherwise, the classification results will be severely biased by this leakage and will not be meaningful. If this occurs, multiple parts of your Coursework 1 may be affected, which could significantly impact your marks.
• Build one baseline modelling pipeline.
• Report at least one validation result using accuracy and macro-F1, and include a confusion matrix for the baseline model.
• Keep preprocessing consistent across train, validation, and hidden-test files.

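One compact way to shortlist leakage candidates is to measure, on the training split, how completely each feature value determines the label. The stdlib sketch below does this and assumes the data is available as one list of values per column. A genuine predictor rarely maps almost every value to a single class, whereas a leaked field often does.

This is only a screening heuristic, not a definitive test. The column names and data are hypothetical. Numeric columns would need binning first, and high-cardinality ID columns also score high, so every candidate still needs a domain check before it is dropped.

```python
from collections import Counter, defaultdict


def value_purity(values, labels):
    """Fraction of rows whose label equals the majority label of their feature value.

    1.0 means the feature value alone determines the class on this split.
    """
    groups = defaultdict(Counter)
    for v, y in zip(values, labels):
        groups[v][y] += 1
    majority_hits = sum(c.most_common(1)[0][1] for c in groups.values())
    return majority_hits / len(labels)


def rank_leakage_candidates(columns, labels):
    """Return (column, purity, n_unique) tuples, most suspicious first."""
    rows = [(name, value_purity(vals, labels), len(set(vals))) for name, vals in columns.items()]
    # High purity with few distinct values is the strongest warning sign;
    # high purity from many unique values may just be an ID-like column.
    return sorted(rows, key=lambda r: (-r[1], r[2]))


# Hypothetical toy columns, not the real coursework schema.
labels = ["Low", "Standard", "High", "Standard", "Low", "High"]
columns = {
    "smoker": ["no", "yes", "yes", "no", "no", "yes"],
    "risk_band_code": ["L", "S", "H", "S", "L", "H"],  # hypothetical field that mirrors the label
    "region": ["N", "S", "N", "S", "N", "S"],
}
for name, purity, n_unique in rank_leakage_candidates(columns, labels):
    print(f"{name:15s} purity={purity:.2f} unique={n_unique}")
```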
(B) Controlled Comparison: Random Forest and One Boosting Model (8 marks)
• Using the same preprocessing pipeline, validation split, and evaluation metric (the primary metric is macro-F1; also report accuracy), carry out an initial controlled comparison between one Random Forest model and one boosting model.
• Default XGBoost is recommended because it provides a richer tuning space later, but other boosting models may also be used. Default settings or only light, sensible adjustments are acceptable in this section.
• In the notebook, report the validation result of each model and support the comparison with one or two additional analyses, such as class-wise metrics, a confusion matrix, train-versus-validation behaviour, or stability / sensitivity after tuning.
• Your goal is not to prove that one model type always wins. Your goal is to compare the two models fairly, explain the high-level learning difference between bagging and boosting, and use your own notebook evidence to give a careful, dataset-specific interpretation. A generic textbook answer without reference to your own results will receive limited credit.

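Two small stdlib helpers make the high-level difference between bagging and boosting concrete. Bagging draws independent bootstrap resamples (rows sampled with replacement) and trains each model on one of them. Boosting keeps the full training set and re-weights it so the next learner focuses on the rows the ensemble currently gets wrong.

The reweighting rule shown is the multiclass AdaBoost (SAMME) update, used here only because it is short. XGBoost instead fits each new tree to gradients of a loss, so treat this as an illustration of the boosting idea rather than of XGBoost's internals. The function names are hypothetical.

```python
import math
import random


def bootstrap_indices(n_rows, seed=0):
    """Bagging: draw n_rows row indices with replacement (one bootstrap resample)."""
    rng = random.Random(seed)
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def samme_reweight(weights, misclassified, n_classes):
    """Boosting: one SAMME (multiclass AdaBoost) sample-weight update.

    weights       -- current sample weights, summing to 1
    misclassified -- booleans, True where the current weak learner was wrong
    Returns (new_weights, alpha); alpha is this learner's vote in the ensemble.
    """
    err = sum(w for w, m in zip(weights, misclassified) if m)
    err = min(max(err, 1e-12), 1 - 1e-12)  # avoid log(0) and division by zero
    alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
    raw = [w * math.exp(alpha) if m else w for w, m in zip(weights, misclassified)]
    total = sum(raw)
    return [w / total for w in raw], alpha


# Bagging: each model sees a different resample; averaging their votes mainly reduces variance.
idx = bootstrap_indices(10, seed=42)
print("distinct rows in this resample:", len(set(idx)), "of 10")

# Boosting: the row the learner got wrong gains weight for the next round, mainly reducing bias.
new_w, alpha = samme_reweight([0.25] * 4, [False, False, False, True], n_classes=3)
print([round(x, 3) for x in new_w], round(alpha, 3))  # [0.111, 0.111, 0.111, 0.667] 1.792
```

On average, a bootstrap resample of n rows contains 1 - (1 - 1/n)^n of the distinct rows, roughly 63% for large n. This is why each tree in a Random Forest sees a noticeably different view of the data.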
(C) Advanced Hyperparameter Optimisation (12 marks)
• At least one main model should be tuned with a genuinely advanced strategy, such as Optuna/TPE, Bayesian optimisation, Hyperopt, Ray Tune, or another comparably strong approach.
• Hyperparameter tuning should optimise the macro-F1 score on the validation set, and the final tuned result should be reported using both accuracy and macro-F1.
• RandomizedSearchCV alone is normally not enough for the top band.
• Explain briefly why your search space and optimiser are reasonable for the chosen model.

(D) Personalised Improvement Work (18 marks)
You must complete one compulsory category based on the last digit of your XJTLU student ID, plus at least one additional optional category of your choice. A second optional category is recommended for stronger differentiation but is not compulsory.
You should report accuracy and macro-F1 for improved models and include class-wise metrics where helpful. A compact ablation table should normally be included in the notebook for the personalised improvement work.

Last digit    Compulsory category
0-1           Category A - Data quality and missingness
2-3           Category B - Feature representation and engineering
4-5           Category C - Imbalance and objective design
6-7           Category D - Model robustness, calibration, or ensembling
8-9           Category E - Fairness, diagnostics, or interpretability

Category A
  Examples of what may be done: better missing-value strategy; MissForest or iterative imputation; sensible outlier handling; value cleaning
  What good evidence looks like: a concise before/after comparison with a short explanation of why the data handling changed the result
Category B
  Examples of what may be done: feature crosses; grouped categories; alternative encodings; modest feature selection; transformations
  What good evidence looks like: a compact ablation showing what representation changed and whether it helped
Category C
  Examples of what may be done: class weighting; focal-style loss if relevant; sampling / resampling; thresholding logic
  What good evidence looks like: clear evidence of how minority or harder classes changed, even if the overall score moved only slightly
Category D
  Examples of what may be done: bagging/boosting variants; calibration checks; soft voting; stacking; robustness checks
  What good evidence looks like: a meaningful diagnostic or comparison rather than a large collection of loosely connected trials
Category E
  Examples of what may be done: SHAP / feature importance; subgroup-style fairness checks; error analysis; model interpretation
  What good evidence looks like: concrete insight into model behaviour, not only screenshots

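Category C lists class weighting as one option. A common starting point is the "balanced" heuristic, weight_c = n_samples / (n_classes × count_c), which is the formula scikit-learn documents for `class_weight="balanced"`.

The stdlib sketch below computes these weights and expands them into per-row sample weights. That is the form accepted by the `sample_weight` argument of, for example, scikit-learn's `RandomForestClassifier.fit` and XGBoost's `XGBClassifier.fit`. The label counts are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c); rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def to_sample_weights(labels, class_weights):
    """Expand per-class weights into one weight per training row."""
    return [class_weights[y] for y in labels]


# Illustrative imbalanced labels (not the real class distribution).
y = ["Standard"] * 6 + ["High"] * 3 + ["Low"] * 1
weights = balanced_class_weights(y)
sample_weights = to_sample_weights(y, weights)
print(weights)
```

With these weights, each class contributes the same total weight (n / k), so one misclassified Low applicant costs as much as several Standard ones. Whether this actually improves macro-F1 on the validation set is something the ablation table has to show.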
(E) K-Means and Gaussian Mixture Model (GMM) Exploration (6 marks)
This is a compact exploratory section. It is not the main performance section, and it does not require clusters to match the class labels exactly. The aim is to show your understanding of unsupervised learning methods and your ability to interpret their results carefully.
• Use a sensible processed numeric feature space and briefly explain what you clustered on.
• Explore a small range of cluster/component numbers, such as 2-8.
• For K-Means, provide sensible supporting evidence, such as inertia (SSE), cluster sizes, or another simple analysis.
• For the Gaussian Mixture Model (GMM), provide sensible supporting evidence, such as component sizes, posterior confidence/responsibility, or overlap/uncertainty between components.
• Include at least one compact table or figure comparing K-Means and GMM.
• If class labels are used for reference, explain clearly that unsupervised structure does not need to align exactly with supervised labels.
• Stronger work may additionally use silhouette score, log-likelihood trends, or a simple visualisation.

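The stdlib sketch below produces the K-Means evidence described above. It runs plain Lloyd's algorithm with random initial centroids (not k-means++) and reports inertia (SSE) and cluster sizes for each k in the suggested 2-8 range. In the notebook, `sklearn.cluster.KMeans` exposes the same quantities as `inertia_` and `labels_`. The 2-D points are synthetic and stand in for the processed numeric feature space; the function names are illustrative.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (hard assignment).
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Synthetic data: three loose blobs in 2-D.
rng = random.Random(1)
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]

for k in range(2, 9):  # the 2-8 range suggested in the brief
    _, labels, inertia = kmeans(points, k)
    sizes = sorted((labels.count(j) for j in range(k)), reverse=True)
    print(f"k={k}  inertia={inertia:8.2f}  sizes={sizes}")
```

Random initialisation can land in a local optimum, so in this sketch inertia is not guaranteed to fall monotonically as k grows. Running several seeds and keeping the lowest inertia, which is what scikit-learn's `n_init` parameter does, is the usual remedy.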
(F) Final Model Choice and Hidden-Test Export (8 marks)
• Choose the final model using validation evidence only.
• Retrain appropriately using both the train and validation datasets, and generate the hidden-test CSV in the required format.
• Submit the hidden-test results as test_result_[your_student_id].csv. The first column must contain applicant_id, the second column must contain customer_key, and the third column must contain the predicted premium_risk labels (Standard, High, Low).
Incorrect file naming or CSV formatting may prevent automated scoring and will result in an automatic deduction of 4 marks from this section.
• Do not tune on the hidden test and do not claim hidden-test performance.
• Note: The hidden-test score contributes only a small portion of the final marks. A high leaderboard rank alone cannot compensate for weak experimental design or poor documentation.

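The export step can be sketched with Python's standard `csv` module. The helper below writes the three required columns in the stated order and rejects any label outside {Standard, High, Low}.

Several details are assumptions for illustration: the header row and its exact names, the student ID, and the example rows. The brief fixes the column order and contents, so check the header requirement against the provided hidden-test file.

```python
import csv
import tempfile
from pathlib import Path

ALLOWED_LABELS = {"Standard", "High", "Low"}
HEADER = ["applicant_id", "customer_key", "premium_risk"]  # assumed header names


def export_hidden_test(rows, student_id, out_dir="."):
    """Write test_result_<student_id>.csv with applicant_id, customer_key, premium_risk.

    rows -- iterable of (applicant_id, customer_key, predicted_label) tuples,
            kept in the same order as the hidden-test file.
    """
    rows = [tuple(r) for r in rows]
    if any(len(r) != 3 for r in rows):
        raise ValueError("each row needs exactly three fields")
    bad = {label for _, _, label in rows} - ALLOWED_LABELS
    if bad:
        raise ValueError(f"unexpected labels: {sorted(bad)}")
    path = Path(out_dir) / f"test_result_{student_id}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)  # assumption: a header row is expected
        writer.writerows(rows)
    return path


# Hypothetical predictions; in practice they come from the model retrained on train + validation.
preds = [("A0001", "K-17", "Standard"), ("A0002", "K-42", "High")]
with tempfile.TemporaryDirectory() as tmp:
    out_path = export_hidden_test(preds, "1234567", out_dir=tmp)
    written = list(csv.reader(out_path.read_text(encoding="utf-8").splitlines()))
print(out_path.name)  # test_result_1234567.csv
print(written)
```

Validating labels before writing catches one common formatting mistake: exporting integer-encoded classes instead of the label strings.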
Coursework Answer Sheet / Theory and Reflection (PDF) - all questions below are compulsory (30 Marks)

The Coursework Answer Sheet / Theory and Reflection PDF should not repeat the notebook section by section. All prompt areas below are compulsory. The PDF must be concise, directly linked to your own notebook evidence, and no longer than 4 pages / 1,200 words in total. Exceeding either limit will incur a fixed deduction of 5 marks from the PDF section. You should aim to demonstrate both your theoretical or algorithmic understanding and your experimental findings or practical observations and clearly link your understanding of the algorithms to your experimental analysis. At least one table, figure, or metric from the notebook must be referenced in each theory answer.

Prompt area / What you should do

1. Bagging versus boosting
  (1) Briefly state the definitions and key theoretical properties of bagging and boosting models;
  (2) report the validation results of each model;
  (3) support your comparison with one or two additional analyses, such as class-wise metrics, a confusion matrix, train–validation behaviour, or stability/sensitivity after tuning; and
  (4) provide a careful interpretation of what this comparison suggests about this dataset and how it relates to the theoretical properties of bagging versus boosting methods.
  You are not expected to prove that one model type always performs better.

2. Hyperparameter optimisation
  Explain why your optimiser and search space were reasonable for the chosen model, which hyperparameters you expected to matter most, whether the tuned results matched that intuition, and what you learned from the tuning process.

3. K-Means versus Gaussian Mixture Model (GMM)
  Explain hard versus soft assignment and the main assumption difference between K-Means and GMM. Then use your own compact evidence to discuss whether the results matched your intuition and whether GMM revealed anything extra, such as soft membership, uncertainty, or a better fit to partial cluster structure.

4. Personalised reflection
  Reflect on the compulsory category and on every optional category you implemented. Highlight any unique or interesting algorithm or strategy you tried, the personal challenges you faced, the effort you made to address them, and the key lessons you learned. Honest reflection on a neutral or negative result is acceptable if the reasoning is concrete.

5. AI-use declaration
  State briefly what forms of AI assistance, if any, were used. Generic AI-written theory that does not match your notebook evidence will receive limited credit.

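The stdlib sketch below makes the hard-versus-soft distinction in prompt area 3 concrete. It evaluates a fixed two-component 1-D Gaussian mixture and computes each point's responsibilities, i.e. its posterior component probabilities. The weights, means and variances are hand-chosen for illustration, not fitted.

The hard label is just the argmax, which is all a K-Means-style assignment reports. The responsibilities also show how uncertain points near the boundary are. Because each component here has its own weight and variance, the sketch also hints at the assumption difference: K-Means implicitly treats clusters as similarly sized spheres. In the notebook, `sklearn.mixture.GaussianMixture.predict_proba` returns the fitted equivalent.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior P(component j | x) for a 1-D Gaussian mixture (soft assignment)."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [p / total for p in joint]


# Hand-chosen mixture: a narrow component at 0 and a wider, lighter one at 4.
weights, means, variances = [0.6, 0.4], [0.0, 4.0], [1.0, 4.0]

for x in [-1.0, 1.5, 2.0, 2.5, 6.0]:
    r = responsibilities(x, weights, means, variances)
    hard = max(range(len(r)), key=r.__getitem__)  # hard assignment = argmax
    print(f"x={x:4.1f}  soft={[round(p, 2) for p in r]}  hard={hard}")
```

Points such as x = 1.5 or 2.0 receive the same kind of hard label as x = -1.0 or 6.0, but their responsibilities are far from 0 or 1. That extra uncertainty information is what GMM can add to a K-Means comparison.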
Coding Quality, Coursework Answer Sheet Quality, and Submission Guidelines (10 marks)

• Submit your Jupyter Notebook in .ipynb format. It must be well organised, include clear commentary and clean code practices, and show visible outputs. Do not write a second mini-report repeating notebook content.
• The notebook should be reproducible from start to finish without errors. Results cited in the PDF should be visible in the notebook and should match the reported values.
• If you used supplementary code outside the notebook, submit that code as well so the full workflow remains reproducible.
• Submit the hidden-test results as test_result_[your_student_id].csv. The first column must contain applicant_id, the second column must contain customer_key, and the third column must contain the predicted premium_risk labels (Standard, High, Low). Incorrect file naming or CSV formatting may prevent automated scoring and will result in an automatic deduction of 4 marks from this section.
• Submit the Coursework Answer Sheet / Theory and Reflection in PDF format. All questions in that section are compulsory. The Coursework Answer Sheet / Theory and Reflection PDF must answer every required prompt, refer to your own notebook evidence, and remain within 4 pages and 1,200 words in total. Exceeding either limit will incur a fixed deduction of 5 marks from the PDF section.
• Include all required components: Jupyter notebooks (code), any additional experimental scripts or custom code, the hidden-test results CSV file, and the Coursework Answer Sheet PDF. Submit all files through the Learning Mall platform. After submission, download your files and check that they open and display correctly, to confirm that the submission was successful.

Project Material Access Instructions

To access the complete set of materials for this project, please use the links below:
• OneDrive Link: https://1drv.ms/f/c/18f09d1a39585f84/IgCXDMbXkFYSSZUZkkTyXyZzAQ1poX9mujUqF8N3JlL0GD0?e=uNhAHq
• The same coursework materials have also been uploaded to Learning Mall.

When extracting the materials, use the following password to unlock the zip file: DTS304TC (case-sensitive, enter in uppercase).