chore: update project docs, dependencies, and training scripts

- Update requirements.txt: add opencv-python-headless and document the optional uv install path
- Normalize line endings in CSV files (CRLF → LF)
- Update TASK_PROGRESS.md to record the parallel-training implementation and WSL support
- Tidy train_improved.py formatting; remove redundant blank lines and comments
- Fix the character encoding of the coursework-brief documents
- Add new TensorBoard log files and trained models
2026-05-01 09:26:23 +08:00
parent 6b929e9790
commit d6860f1f15
16 changed files with 25712 additions and 25680 deletions
+4 -4
@@ -1,5 +1,5 @@
Complete an individual reinforcement learning coursework report: implement the PPO (Proximal Policy Optimization) algorithm from scratch in Python so that an agent completes the racing task in the CarRacing-v3 environment, then submit a technical report of no more than 3,000 words that systematically presents your method and results. Concretely: introduce the RL background of the task; define the state space, action space, and reward mechanism; explain PPO's objective function, clipping mechanism, and advantage-estimation method; describe the policy- and value-network architectures, the training procedure, the hyperparameter settings, and the problems met during implementation and how they were solved; present training and test results with figures, analysing model performance and trends; and briefly compare stability and sample efficiency against a baseline such as Stable-Baselines3. Also submit a zip file containing all source code and the trained model, plus a separate PDF report; file naming and submission format must follow the requirements, and the implementation must not use RL-specific libraries such as Stable-Baselines directly, though TensorBoard may reasonably be used to record experimental results.
This PDF asks for an individual reinforcement learning project report: pick an Atari game yourself, implement and train a deep RL algorithm of your choosing to reach competitive performance, then submit a technical report of no more than 3,000 words and a zip file with all source code and the trained model. The report should describe the chosen game and its challenges; survey the state of deep RL, especially in Atari games; compare the algorithms you considered and justify the final choice; detail the algorithm's principles and concrete implementation; evaluate the agent's performance, stating the chosen benchmarks and evaluation metrics; and analyse why the algorithm does or does not perform well on this game, presenting results in figures with clearly labelled axes and legends. The brief explicitly forbids implementing the algorithm directly with RL-specific libraries such as Stable-Baselines, though they may be used as benchmarks; code quality, result analysis, report structure, use of figures, and citation practice are all graded, and the PDF and zip must be named and submitted in the specified format.
Complete an individual machine learning coursework: around a health-insurance dataset, build and improve a multiclass model that predicts an applicant's premium-risk level (Low / Standard / High). First complete the Jupyter Notebook part: data cleaning and preprocessing; identifying and removing the data-leakage feature; building a baseline model; comparing a random forest against one boosting model; tuning with an advanced hyperparameter-optimisation method; completing the personalised improvement assigned by the last digit of your student ID plus at least one optional improvement; running a compact K-Means vs. GMM unsupervised exploration; and finally choosing the final model on validation evidence and exporting the hidden-test CSV in the required format. Also submit a Theory and Reflection PDF of at most about 1,200 words covering bagging vs. boosting, hyperparameter optimisation, K-Means vs. GMM, reflection on the personalised improvements, and an AI-use declaration, combining theory with experiment, with every conclusion tied to the tables, figures, and metrics in your own notebook. Finally submit the notebook, PDF, CSV, and any necessary supplementary code as required.
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -1,8 +1,8 @@
k,inertia,silhouette_x,log_likelihood,bic,aic,silhouette_y
2,1092962.434364126,0.174016661115075,181335.84491703784,-359250.54291550705,-362061.6898340757,0.41420390111182703
3,1018586.5047121042,0.17317021187208304,554291.2303605897,-1103445.131905755,-1107666.4607211794,0.2977020104302583
4,953249.4382030136,0.18080059886795355,972834.1094461675,-1938814.7081800548,-1944446.218892335,0.3964327255424141
5,889284.892342685,0.1964251564081267,1002913.0930748597,-1997256.4935405836,-2004298.1861497194,0.40146893512413845
6,818950.9117652641,0.17683056672008368,1180025.734163945,-2349765.5938218986,-2358217.46832789,0.24683353848428613
7,777658.2185885893,0.197056012688701,1203191.531501821,-2394381.006600795,-2404243.063003642,0.3109553553475885
8,691940.8330833976,0.20149802939267383,1261969.3739466753,-2510220.5095936474,-2521492.7478933507,0.17264064800570944
@@ -1,2 +1,2 @@
model,train_accuracy,val_accuracy,train_f1_macro,val_f1_macro,val_f1_High,val_f1_Low,val_f1_Standard
Baseline_LR,0.7595294117647059,0.7337904761904762,0.7493991157707756,0.7234383324236036,0.7663239074550129,0.6487372909150542,0.7552537989007436
@@ -1,7 +1,7 @@
model,train_accuracy,val_accuracy,train_f1_macro,val_f1_macro,val_f1_High,val_f1_Low,val_f1_Standard,train_time
Baseline_LR,0.7593680672268908,0.7341714285714286,0.7492574544185482,0.7237629331592531,0.7665209565440987,0.6489501312335958,0.7558177117000646,
RandomForest,1.0,0.7877333333333333,1.0,0.770789728543472,0.7874554916461244,0.7095334685598377,0.8153802254244543,57.91048526763916
XGBoost,0.8519529411764706,0.8371047619047619,0.8297116592669606,0.8143842728003406,0.8904623073719283,0.6944039941751612,0.8582865168539325,67.63970804214478
XGBoost_Tuned,0.9767663865546219,0.8700190476190476,0.9739400525375727,0.8519502714571496,0.9084439578486383,0.7620280474649407,0.8853788090578697,142.65462470054626
XGB_CatA_MissingHandling,0.9772638655462185,0.870552380952381,0.9745439553742655,0.8529411889528661,0.910207423580786,0.763542562338779,0.885073580939033,
Ensemble_SoftVoting,0.9972436974789916,0.8675047619047619,0.9969472283391928,0.851001101708816,0.9024125779343996,0.7684120902511707,0.8821786369408776,
+38 -17
@@ -26,6 +26,9 @@
| ✅ Environment preprocessing | Grayscale + resize (84×84) + frame-stacking (4 frames) wrappers | [src/utils.py](src/utils.py) |
| ✅ Evaluation script | Rendered test runs + multi-episode average-score evaluation | [src/evaluate.py](src/evaluate.py) |
| ✅ Training entry point | Main training loop, TensorBoard logging, model saving | [train.py](train.py) |
| ✅ Parallel training | Multi-env parallel rollout collection + WSL support | [train_parallel.py](train_parallel.py) |
| ✅ WSL scripts | Environment setup + launch scripts | [setup_wsl.sh](setup_wsl.sh), [run_wsl.sh](run_wsl.sh), [start_wsl_training.bat](start_wsl_training.bat) |
| ✅ Test script | Quick sanity check of the parallel envs and networks | [test_parallel.py](test_parallel.py) |
**Core algorithm implementation notes**
- Policy network: 3-layer CNN + FC(512) → μ, σ (Gaussian policy, tanh activation)
@@ -60,36 +63,54 @@
│ ├── trainer.py            # PPO update logic
│ ├── utils.py              # environment preprocessing wrappers
│ └── evaluate.py           # evaluation script
├── train.py                # training entry point
├── train.py                # single-process training entry point
├── train_parallel.py       # multi-env parallel training (recommended)
├── setup_wsl.sh            # WSL environment setup
├── run_wsl.sh              # WSL training launch script
├── start_wsl_training.bat  # one-click WSL training launch from Windows
├── test_parallel.py        # parallel-training smoke test
├── requirements.txt
├── README.md
├── WSL_README.md           # WSL training guide
└── TASK_PROGRESS.md        # this document
```
---
## 4. Hyperparameter Configuration
| Parameter | Value |
|------|-----|
| Learning rate | 3e-4 |
| Gamma | 0.99 |
| GAE lambda | 0.95 |
| Clip epsilon | 0.2 |
| PPO epochs | 4 |
| Mini-batch size | 64 |
| Rollout steps | 2048 |
| Entropy coefficient | 0.01 |
| Value coefficient | 0.5 |
| Max gradient norm | 0.5 |
| State shape | (84, 84, 4) |
| Action dim | 3 (continuous: steer, gas, brake) |
| Parameter | train.py (single-process) | train_parallel.py (parallel) |
|------|-------------------|--------------------------|
| Learning rate | 3e-4 | 3e-4 |
| Gamma | 0.99 | 0.99 |
| GAE lambda | 0.95 | 0.98 |
| Clip epsilon | 0.2 | 0.1 |
| PPO epochs | 4 | 10 |
| Mini-batch size | 64 | 128 |
| Rollout steps | 2048 | 2048 |
| Entropy coefficient | 0.01 | 0.005 |
| Value coefficient | 0.5 | 0.75 |
| Total steps | 500,000 | 2,000,000 |
| Max gradient norm | 0.5 | 0.5 |
| Num environments | 1 | 4 |
| Estimated duration | ~8 h | ~5 h (4× throughput) |
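
Read as configuration, the two columns above map to trainer settings roughly as follows (a sketch only; the argument names are illustrative, and the authoritative values live in train.py and train_parallel.py):

```python
# Sketch: keys mirror the table above, not necessarily the scripts' exact kwargs.
single = dict(
    lr=3e-4, gamma=0.99, gae_lambda=0.95, clip_eps=0.2,
    ppo_epochs=4, mini_batch_size=64, rollout_steps=2048,
    ent_coef=0.01, vf_coef=0.5, max_grad_norm=0.5,
    total_steps=500_000, num_envs=1,
)
# The parallel run overrides only what the table changes.
parallel = dict(
    single, gae_lambda=0.98, clip_eps=0.1, ppo_epochs=10,
    mini_batch_size=128, ent_coef=0.005, vf_coef=0.75,
    total_steps=2_000_000, num_envs=4,
)
```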
---
## 5. Next Steps
### Execute immediately
### Option A: WSL parallel training (recommended)
```bash
# On Windows, double-click start_wsl_training.bat
# or manually:
wsl
cd "/mnt/d/Code/doing_exercises/programs/外教作业外快/强化学习个人项目报告"
chmod +x setup_wsl.sh run_wsl.sh
./setup_wsl.sh  # first-time setup only
./run_wsl.sh    # start training
```
### Option B: Windows single-process training
```bash
# 1. Install dependencies
uv pip install --system -r requirements.txt
@@ -2,4 +2,9 @@ torch
gymnasium[box2d]
numpy
matplotlib
tensorboard
opencv-python-headless
# Optional: install via uv:
# curl -LsSf https://astral.sh/uv/install.sh | sh
# uv pip install -r requirements.txt
+136 -130
@@ -1,4 +1,5 @@
"""Improved training script with reward shaping and better hyperparameters."""
"""Improved training script for CarRacing-v3 PPO with reward shaping."""
import os
import time
import argparse
@@ -12,36 +13,34 @@ import cv2
class RewardShapingWrapper(gym.Wrapper):
    """Add reward shaping for better learning."""

    def __init__(self, env):
        super().__init__(env)
        self.steps_on_track = 0

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.steps_on_track = 0
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        done = terminated or truncated
        shaped_reward = reward
        if info.get('speed', 0) > 0.1:
            shaped_reward += info['speed'] * 0.1
        if not info.get('offtrack', False):
        if info.get("speed", 0) > 0.1:
            shaped_reward += info["speed"] * 0.1
        if not info.get("offtrack", False):
            shaped_reward += 0.1
            self.steps_on_track += 1
        else:
            shaped_reward -= 0.5
            self.steps_on_track = 0
        if info.get('lap_complete', False):
        if info.get("lap_complete", False):
            shaped_reward += 100
        return obs, shaped_reward, terminated, truncated, info
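
A hypothetical usage sketch for the wrapper above (not part of this diff). One caveat worth verifying: the stock CarRacing-v3 `info` dict may not actually carry `speed`, `offtrack`, or `lap_complete` keys, in which case every shaping term above silently becomes a no-op.

```python
# Hypothetical usage; the info keys the wrapper reads must be supplied by the
# environment (or an upstream wrapper) for any shaping to take effect.
import gymnasium as gym

env = RewardShapingWrapper(gym.make("CarRacing-v3", render_mode="rgb_array"))
obs, info = env.reset(seed=0)
obs, shaped_reward, terminated, truncated, info = env.step(env.action_space.sample())
```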
@@ -70,9 +69,7 @@ class FrameStackWrapper(gym.ObservationWrapper):
        self.frames = deque(maxlen=num_stack)
        obs_shape = env.observation_space.shape
        self.observation_space = gym.spaces.Box(
            low=0, high=255,
            shape=(num_stack, *obs_shape[-2:]),
            dtype=np.uint8
            low=0, high=255, shape=(num_stack, *obs_shape[-2:]), dtype=np.uint8
        )

    def reset(self, **kwargs):
@@ -115,7 +112,7 @@ class Actor(nn.Module):
    def __init__(self, state_shape=(84, 84, 4), action_dim=3):
        super().__init__()
        c, h, w = state_shape[2], state_shape[0], state_shape[1]

        self.conv = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4),
            nn.LeakyReLU(0.2),
@@ -126,28 +123,28 @@
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.LeakyReLU(0.2),
        )

        out_h = (h - 8) // 4 + 1
        out_h = (out_h - 4) // 2 + 1
        out_h = (out_h - 3) // 1 + 1
        feat_size = 64 * out_h * out_h

        self.fc = nn.Sequential(
            nn.Linear(feat_size, 512),
            nn.LeakyReLU(0.2),
        )
        self.mu_head = nn.Linear(512, action_dim)
        self.log_std_head = nn.Linear(512, action_dim)

        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.orthogonal_(m.weight, gain=np.sqrt(2))
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
        nn.init.orthogonal_(self.mu_head.weight, gain=0.01)
        nn.init.orthogonal_(self.log_std_head.weight, gain=0.01)

    def forward(self, x):
        x = x / 255.0
        x = self.conv(x)
@@ -162,7 +159,7 @@ class Critic(nn.Module):
    def __init__(self, state_shape=(84, 84, 4)):
        super().__init__()
        c, h, w = state_shape[2], state_shape[0], state_shape[1]

        self.conv = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4),
            nn.LeakyReLU(0.2),
@@ -173,24 +170,20 @@ class Critic(nn.Module):
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.LeakyReLU(0.2),
        )

        out_h = (h - 8) // 4 + 1
        out_h = (out_h - 4) // 2 + 1
        out_h = (out_h - 3) // 1 + 1
        feat_size = 64 * out_h * out_h

        self.fc = nn.Sequential(
            nn.Linear(feat_size, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1)
        )
        self.fc = nn.Sequential(nn.Linear(feat_size, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))

        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.orthogonal_(m.weight, gain=np.sqrt(2))
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = x / 255.0
        x = self.conv(x)
@@ -203,14 +196,14 @@ class RolloutBuffer:
        self.buffer_size = buffer_size
        self.ptr = 0
        self.size = 0

        self.states = np.zeros((buffer_size, *state_shape), dtype=np.uint8)
        self.actions = np.zeros((buffer_size, action_dim), dtype=np.float32)
        self.rewards = np.zeros(buffer_size, dtype=np.float32)
        self.dones = np.zeros(buffer_size, dtype=np.bool_)
        self.values = np.zeros(buffer_size, dtype=np.float32)
        self.log_probs = np.zeros(buffer_size, dtype=np.float32)

    def add(self, state, action, reward, done, value, log_prob):
        self.states[self.ptr] = state
        self.actions[self.ptr] = action
@@ -220,34 +213,34 @@ class RolloutBuffer:
        self.log_probs[self.ptr] = log_prob
        self.ptr = (self.ptr + 1) % self.buffer_size
        self.size = min(self.size + 1, self.buffer_size)

    def compute_returns(self, last_value, gamma=0.99, gae_lambda=0.98):
        advantages = np.zeros(self.size, dtype=np.float32)
        last_gae = 0
        for t in reversed(range(self.size)):
            if t == self.size - 1:
                next_value = last_value
            else:
                next_value = self.values[t + 1]
            delta = self.rewards[t] + gamma * next_value * (1 - self.dones[t]) - self.values[t]
            last_gae = delta + gamma * gae_lambda * (1 - self.dones[t]) * last_gae
            advantages[t] = last_gae
        returns = advantages + self.values[:self.size]
        returns = advantages + self.values[: self.size]
        return returns, advantages

    def get(self):
        return (
            self.states[:self.size],
            self.actions[:self.size],
            self.rewards[:self.size],
            self.dones[:self.size],
            self.values[:self.size],
            self.log_probs[:self.size],
            self.states[: self.size],
            self.actions[: self.size],
            self.rewards[: self.size],
            self.dones[: self.size],
            self.values[: self.size],
            self.log_probs[: self.size],
        )

    def reset(self):
        self.ptr = 0
        self.size = 0
@@ -282,55 +275,53 @@ class PPOTrainer:
        self.max_grad_norm = max_grad_norm
        self.ppo_epochs = ppo_epochs
        self.mini_batch_size = mini_batch_size

        self.actor_optim = torch.optim.Adam(actor.parameters(), lr=lr, eps=1e-5)
        self.critic_optim = torch.optim.Adam(critic.parameters(), lr=lr, eps=1e-5)
        self.total_updates = 0

    def update(self, last_value):
        states, actions, rewards, dones, values, log_probs_old = self.buffer.get()
        returns, advantages = self.buffer.compute_returns(last_value, self.gamma, self.gae_lambda)
        advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

        states_t = torch.from_numpy(states).float().permute(0, 3, 1, 2).to(self.device)
        actions_t = torch.from_numpy(actions).float().to(self.device)
        log_probs_old_t = torch.from_numpy(log_probs_old).float().to(self.device)
        returns_t = torch.from_numpy(returns).float().to(self.device)
        advantages_t = torch.from_numpy(advantages).float().to(self.device)

        dataset = torch.utils.data.TensorDataset(
            states_t, actions_t, log_probs_old_t, returns_t, advantages_t
        )
        loader = torch.utils.data.DataLoader(dataset, batch_size=self.mini_batch_size, shuffle=True)

        total_actor_loss = 0
        total_critic_loss = 0
        total_entropy = 0
        count = 0

        for _ in range(self.ppo_epochs):
            for batch in loader:
                s, a, log_pi_old, ret, adv = batch

                mu, std = self.actor(s)
                dist = torch.distributions.Normal(mu, std)
                log_pi = dist.log_prob(a).sum(dim=-1)
                entropy = dist.entropy().sum(dim=-1)

                ratio = torch.exp(log_pi - log_pi_old)
                surr1 = ratio * adv
                surr2 = torch.clamp(ratio, 1 - self.clip_eps, 1 + self.clip_eps) * adv
                actor_loss = -torch.min(surr1, surr2).mean()

                value = self.critic(s)
                critic_loss = nn.MSELoss()(value.squeeze(), ret)

                loss = actor_loss + self.vf_coef * critic_loss - self.ent_coef * entropy.mean()

                self.actor_optim.zero_grad()
                self.critic_optim.zero_grad()
                loss.backward()
@@ -338,18 +329,16 @@ class PPOTrainer:
                nn.utils.clip_grad_norm_(self.critic.parameters(), self.max_grad_norm)
                self.actor_optim.step()
                self.critic_optim.step()

                total_actor_loss += actor_loss.item()
                total_critic_loss += critic_loss.item()
                total_entropy += entropy.mean().item()
                count += 1

        self.total_updates += 1
        avg_actor = total_actor_loss / count
        avg_critic = total_critic_loss / count
        avg_entropy = total_entropy / count
        self.buffer.reset()
        return avg_actor, avg_critic, avg_entropy
@@ -357,10 +346,10 @@ class PPOTrainer:
def collect_rollout(actor, critic, env, buffer, device, rollout_steps):
    obs, _ = env.reset()
    obs = np.transpose(obs, (1, 2, 0))

    for step in range(rollout_steps):
        obs_t = torch.from_numpy(obs).float().unsqueeze(0).permute(0, 3, 1, 2).to(device)
        with torch.no_grad():
            mu, std = actor(obs_t)
            dist = torch.distributions.Normal(mu, std)
@@ -368,27 +357,27 @@ def collect_rollout(actor, critic, env, buffer, device, rollout_steps):
            action = torch.clamp(action, -1, 1)
            log_prob = dist.log_prob(action).sum(dim=-1)
            value = critic(obs_t).squeeze(0).item()

        action_np = action.squeeze(0).cpu().numpy()
        log_prob_np = log_prob.squeeze(0).cpu().numpy()

        next_obs, reward, terminated, truncated, _ = env.step(action_np)
        done = terminated or truncated
        next_obs_stored = np.transpose(next_obs, (1, 2, 0))

        buffer.add(obs.copy(), action_np, reward, done, value, log_prob_np)
        obs = next_obs_stored

        if done:
            obs, _ = env.reset()
            obs = np.transpose(obs, (1, 2, 0))

    return obs


def train_improved(
def train(
    total_steps=2000000,
    rollout_steps=2048,
    eval_interval=10,
@@ -397,22 +386,22 @@ def train_improved(
):
    if device is None:
        device = get_device()

    env = make_env()
    eval_env = make_env()
    state_shape = (84, 84, 4)
    action_dim = 3

    actor = Actor(state_shape=state_shape, action_dim=action_dim).to(device)
    critic = Critic(state_shape=state_shape).to(device)
    buffer = RolloutBuffer(
        buffer_size=rollout_steps,
        state_shape=state_shape,
        action_dim=action_dim,
    )
    trainer = PPOTrainer(
        actor=actor,
        critic=critic,
@@ -428,46 +417,48 @@
        ppo_epochs=10,
        mini_batch_size=128,
    )

    log_dir = os.path.join("logs", "tensorboard", f"run_improved_{int(time.time())}")
    writer = SummaryWriter(log_dir)
    print(f"Training on {device}")
    print(f"Log directory: {log_dir}")
    print("Improvements: LeakyReLU, BatchNorm, He init, Reward shaping, LR decay, More epochs")
    print("Improvements: LeakyReLU, BatchNorm, He init, Reward shaping, More epochs")

    episode = 0
    total_timesteps = 0
    episode_rewards = []
    best_eval = -float('inf')
    best_eval = -float("inf")

    while total_timesteps < total_steps:
        obs = collect_rollout(actor, critic, env, buffer, device, rollout_steps)

        with torch.no_grad():
            obs_t = torch.from_numpy(obs).float().unsqueeze(0).permute(0, 3, 1, 2).to(device)
            last_value = critic(obs_t).squeeze(0).item()

        actor_loss, critic_loss, entropy = trainer.update(last_value)

        writer.add_scalar("Loss/Actor", actor_loss, total_timesteps)
        writer.add_scalar("Loss/Critic", critic_loss, total_timesteps)
        writer.add_scalar("Loss/Entropy", entropy, total_timesteps)

        total_timesteps += rollout_steps
        episode += 1
        ep_reward = buffer.rewards[:buffer.size].sum()
        ep_reward = buffer.rewards[: buffer.size].sum()
        # NOTE: trainer.update() has already called buffer.reset(), so buffer.size
        # is 0 here and this sum is always 0.0; compute the rollout reward before
        # calling update() to log a meaningful value.
        episode_rewards.append(ep_reward)
        recent_rewards = episode_rewards[-10:] if len(episode_rewards) >= 10 else episode_rewards
        avg_reward = np.mean(recent_rewards)

        writer.add_scalar("Reward/Episode", ep_reward, total_timesteps)
        writer.add_scalar("Reward/AvgLast10", avg_reward, total_timesteps)
        print(f"Episode {episode}, steps {total_timesteps}, ep_reward={ep_reward:.1f}, avg_10={avg_reward:.1f}")
        print(
            f"Episode {episode}, steps {total_timesteps}, ep_reward={ep_reward:.1f}, avg_10={avg_reward:.1f}"
        )

        if episode % eval_interval == 0:
            eval_returns = []
            for _ in range(5):
if episode % eval_interval == 0:
eval_returns = []
for _ in range(5):
@@ -475,54 +466,69 @@ def train_improved(
                eval_obs = np.transpose(eval_obs, (1, 2, 0))
                eval_reward = 0
                done = False
                while not done:
                    with torch.no_grad():
                        eval_obs_t = torch.from_numpy(eval_obs).float().unsqueeze(0).permute(0, 3, 1, 2).to(device)
                        eval_obs_t = (
                            torch.from_numpy(eval_obs)
                            .float()
                            .unsqueeze(0)
                            .permute(0, 3, 1, 2)
                            .to(device)
                        )
                        mu, std = actor(eval_obs_t)
                    action = torch.clamp(mu, -1, 1).squeeze(0).cpu().numpy()
                    eval_obs, reward, terminated, truncated, _ = eval_env.step(action)
                    eval_obs = np.transpose(eval_obs, (1, 2, 0))
                    eval_reward += reward
                    done = terminated or truncated
                eval_returns.append(eval_reward)

            mean_eval = np.mean(eval_returns)
            writer.add_scalar("Eval/MeanReturn", mean_eval, episode)
            print(f"  Eval: mean_return={mean_eval:.2f}")

            if mean_eval > best_eval:
                best_eval = mean_eval
                os.makedirs("models", exist_ok=True)
                torch.save(
                    {
                        "actor": actor.state_dict(),
                        "critic": critic.state_dict(),
                        "episode": episode,
                        "timesteps": total_timesteps,
                        "best_eval": best_eval,
                    },
                    os.path.join("models", "ppo_improved_best.pt"),
                )
                print(f"  New best model saved! eval={best_eval:.2f}")

        if episode % save_interval == 0:
            os.makedirs("models", exist_ok=True)
            torch.save(
                {
                    "actor": actor.state_dict(),
                    "critic": critic.state_dict(),
                    "episode": episode,
                    "timesteps": total_timesteps,
                },
                os.path.join("models", f"ppo_improved_ep{episode}.pt"),
            )
            print(f"  Saved model at episode {episode}")
os.makedirs("models", exist_ok=True)
torch.save({
"actor": actor.state_dict(),
"critic": critic.state_dict(),
"episode": episode,
"timesteps": total_timesteps,
"best_eval": best_eval,
}, os.path.join("models", "ppo_improved_final.pt"))
torch.save(
{
"actor": actor.state_dict(),
"critic": critic.state_dict(),
"episode": episode,
"timesteps": total_timesteps,
"best_eval": best_eval,
},
os.path.join("models", "ppo_improved_final.pt"),
)
writer.close()
env.close()
eval_env.close()
@@ -534,6 +540,6 @@ if __name__ == "__main__":
parser.add_argument("--steps", type=int, default=2000000, help="Total training steps")
parser.add_argument("--rollout", type=int, default=2048, help="Rollout buffer size")
args = parser.parse_args()
device = get_device()
train_improved(total_steps=args.steps, rollout_steps=args.rollout, device=device)
train(total_steps=args.steps, rollout_steps=args.rollout, device=device)
+250 -250
@@ -1,251 +1,251 @@
XJTLU Entrepreneur College (Taicang) Cover Sheet
Module code and Title: DTS307TC Reinforcement Learning
School Title: School of AI and Advanced Computing
Assignment Title: Coursework 1
Submission Deadline: 04/May/2026 23:59
Final Word Count:
If you agree to let the university use your work anonymously for teaching
and learning purposes, please type “yes” here.
I certify that I have read and understood the University's Policy for dealing with Plagiarism,
Collusion and the Fabrication of Data (available on Learning Mall Online). With reference to this
policy I certify that:
• My work does not contain any instances of plagiarism and/or collusion.
• My work does not contain any fabricated data.
By uploading my assignment onto Learning Mall Online, I formally declare
that all of the above information is true to the best of my knowledge and
belief.
[Scoring grid for tutor use: Student ID; Learning Outcomes Achieved (A/B/C); F/P/M/D marking code and final score; 1st marker (red pen); moderation (green pen, initials, whether the original mark was accepted and data entry checked by another tutor); 2nd marker if needed (green pen). Academic Office fields: date received, days late, late penalty, and possible academic-infringement categories A–E.]
School of Artificial Intelligence and Advanced Computing
Xi'an Jiaotong-Liverpool University
DTS307TC Reinforcement Learning
Coursework - Individual Report
Due: 04/May/2026 23:59
Weight: 40%
Maximum score: 40 marks
Overview
The purpose of this assignment is to gain experience in Python programming and the design of
reinforcement learning algorithms. You are expected to implement an RL algorithm that solves a
specific environment and provide an explanation of the algorithm's methodology. You are expected
to analyse your results, including challenges and your solutions.
Learning Outcomes Assessed
A: Systematically understand the fundamental concepts and principles of reinforcement learning
B: Critically analyse real-life problem situations and expertly map them as reinforcement learning
tasks.
C: Mastery of Monte Carlo Methods and Temporal Difference Learning
D: Proficiency in Deep Reinforcement Learning algorithms
Late policy
5% of the total marks available for the assessment shall be deducted from the assessment mark for
each working day after the submission date, up to a maximum of five working days
Avoid Plagiarism
• Do not submit work from other students.
• Do not share code/work with other students.
• Do not use open-source code as-is or without proper reference.
Risks
• Please read the coursework instructions and requirements carefully. Not following these instructions
and requirements may result in a loss of marks.
• The assignment must be submitted via Learning Mall. Only electronic submission is accepted
and no hard copy submission.
• All students must download their file and check that it is viewable after submission. Documents
may become corrupted during the uploading process (e.g. due to slow internet connections).
However, students are responsible for submitting a functional and correct file for assessments.
• Academic Integrity Policy is strictly followed.
Individual Report (40 marks)
The primary objective of this coursework is to familiarize students with the PPO algorithm using
basic deep learning libraries, enabling them to improve their capability in transferring mathematical
and theoretical knowledge into Python implementation, and further their understanding of the
actor-critic algorithm.
Algorithm Overview
Proximal Policy Optimization (PPO) is a state-of-the-art reinforcement learning algorithm that optimizes
a stochastic policy in an on-policy manner. To ensure stable training and avoid catastrophic performance
collapse, PPO utilizes a clipped surrogate objective to prevent the policy update from stepping too
far from the current behavior.
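For reference, the clipped surrogate objective described here is standardly written as follows, where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$ is the probability ratio and $\hat{A}_t$ the estimated advantage:

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \right) \right]
$$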
The Environment: CarRacing-v3
We will be using the Car Racing environment from the OpenAI Gymnasium. This environment
features a top-down racing track where the agent must learn to navigate through tiles based on
pixel inputs. You can find more details about this environment on their website
(https://gymnasium.farama.org/environments/box2d/car_racing/).
Here's a code snippet for you to get started:
import gymnasium as gym
env = gym.make("CarRacing-v3", render_mode="rgb_array")
env.reset()
Since CarRacing-v3 is quite computationally expensive for a standard laptop (due to the pixel processing),
you might want to consider using a gray-scaling or frame-stacking wrapper to speed up training.
Alternatively, you can also use the lab computers, which have GPUs and already have the
environment set up.
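
A minimal sketch of such a preprocessing wrapper, assuming OpenCV is available (gymnasium also ships grayscale/resize/frame-stack wrappers, but their names vary slightly across versions, so this hand-rolled version avoids API drift):

```python
import cv2
import gymnasium as gym
import numpy as np

class GrayResizeWrapper(gym.ObservationWrapper):
    """Grayscale + downsample RGB frames to cut the pixel-processing cost."""

    def __init__(self, env, size=84):
        super().__init__(env)
        self.size = size
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(size, size), dtype=np.uint8
        )

    def observation(self, obs):
        gray = cv2.cvtColor(obs, cv2.COLOR_RGB2GRAY)
        return cv2.resize(gray, (self.size, self.size), interpolation=cv2.INTER_AREA)

env = GrayResizeWrapper(gym.make("CarRacing-v3", render_mode="rgb_array"))
```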
The PPO Agent
You will implement an RL agent using PPO to play the CarRacing-v3 environment. The agent
will use the standard observation and actions provided by the environment. You may edit the
3
environment to speed up your training, but your agent must still perform well in the standard
environment. (i.e., removing the camera zoom at the beginning is allowed during training, but
your agent should still be tested in the original environment.) You should record your training and
evaluation process using TensorBoard. You should also record important losses and other data for
your analysis later.
The Report
Upon completion of your implementation, you are required to submit a comprehensive technical
report. The report should document your engineering decisions, the theoretical grounding of your
code, and a critical analysis of the agent's performance.
1. Introduction
• Provide a brief overview of Reinforcement Learning in the context of the CarRacing-v3
environment.
• Define the state space (pixels), action space (discrete commands), and the reward structure
of the task.
2. Methodology
• Mathematical Foundation: Formulate the PPO objective function. Explain the significance
of the clipping parameter and the probability ratio.
• Advantage Estimation: Describe your method for calculating advantages (e.g., standard
advantage vs. Generalized Advantage Estimation (GAE)).
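For reference, GAE forms the advantage as an exponentially weighted sum of TD residuals; with value function $V$, discount $\gamma$, and mixing parameter $\lambda$:

$$
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad \hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l \ge 0} (\gamma\lambda)^{l}\, \delta_{t+l}
$$

Setting $\lambda = 0$ recovers the one-step TD advantage, while $\lambda = 1$ recovers the Monte Carlo return minus the value baseline.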
3. Implementation Details
• Describe your implementation, including any challenges faced and how you addressed
them.
• Explain the structure of your policy and value networks.
• Detail the training process and hyperparameters used.
4. Results and Analysis
• Present your results (use graphs for better clarity).
• Discuss the performance of your agent and any trends observed.
• Briefly compare your custom implementation's stability and sample efficiency against baseline
benchmarks (e.g., Stable-Baselines3).
5. Conclusion
• Summarize your key findings regarding the sensitivity of PPO to hyperparameter tuning
and the effectiveness of the actor-critic framework in continuous-input environments.
Note: All figures and plots must be clearly labeled with axes titles and legends. Raw code
snippets should be kept to a minimum in the report; focus on high-level logic and pseudo-code
where necessary.
Important Note
• Do NOT use Stable-baselines libraries or any other reinforcement learning specific libraries in
your implementation (you may use TensorBoard for recording your results).
• Do NOT exceed the word count limit of 3000 words for each report, reference and appendix
excluded.
• Although you are allowed to use any generative AI tools to assist your work, please keep in mind
that you should be using them responsibly. (Good use: Improve your report after writing it
and always review its output to ensure that it is correct. Bad use: Copy-pasting an entire report
from AI without any effort of your own. )
Submission Requirements
Please prepare and submit the following documents:
• A cover page featuring your student ID. This page should be the first page of your report.
• A zip file containing all the source codes and your trained agent model, which should be named
using your full name and student ID in the following format: CW1_ID_Name.zip
• One PDF file for your report. The file should be separated from the zip file, which contains your
code. The files should be named in the following format: CW1_ID_Name.pdf
Note that the quality of the code, the clarity of your writing, and the format/style of your report will
be taken into consideration during the evaluation. The detailed rubric is outlined below.
Rubric
CW1 (40 marks) Criteria Marks
Code Performance Code runs without errors and performs tasks as specified. 6
Code Quality Code is well-organized, includes meaningful comments, and uses appropriate variable names. 6
Methodology Comprehensive coverage of topics with detailed explanations of approaches and methodologies. 6
Result analysis Insightful analysis of results. 6
Report Quality Report is well-structured, formatted, and free of grammatical errors. 6
Evidence of Work All required elements are included and correct. 6
Submission Follows all requirements for submission 4
+259 -259
@@ -1,260 +1,260 @@
XJTLU Entrepreneur College (Taicang) Cover Sheet
Module code: DTS304TC Machine Learning
School title: School of AI and Advanced Computing
Assessment title: Coursework Task 1
Assessment type: Coursework
Submission deadline: 01/May/2026 23:59
I certify that I have read and understood the University's Policy for dealing with Plagiarism, Collusion and the Fabrication of Data
(available on Learning Mall Online).
My work does not contain any instances of plagiarism and/or collusion.
My work does not contain any fabricated data.
By uploading my assignment onto Learning Mall Online, I formally declare that all of the
above information is true to the best of my knowledge and belief.
[Scoring grid for tutor use: Student ID; Theory and Reflection PDF word count (filled by students); Learning Outcomes Achieved (A/B/C); F/P/M/D marking code and final score; 1st marker (red pen); moderation (green pen); 2nd marker if needed (green pen). Academic Office fields: date received, days late, late penalty, and possible academic-infringement categories A–E.]
DTS304TC Machine Learning
Coursework - Assessment Task 1
• Percentage in final mark: 50%
• Assessment type: individual coursework
• Submission files: one Jupyter notebook (.ipynb), one Coursework Answer Sheet / Theory and Reflection PDF, and one
hidden-test CSV
Learning outcomes assessed
• A. Demonstrate a solid understanding of the theoretical issues related to problems that machine-learning methods try to
address.
• B. Demonstrate understanding of the properties of existing machine-learning algorithms and how they behave on practical data.
Notes
• Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may
result in a loss of marks.
• The formal procedure for submitting coursework at XJTLU is strictly followed. Submission link on Learning Mall will be provided
in due course. The submission timestamp on Learning Mall will be used to check late submission.
• 5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the
submission date, up to a maximum of five working days.
• All modelling work must be completed individually. Discussion of general ideas is allowed, but code, experiments, and
notebooks must be independently developed.
• You may not use ChatGPT to directly generate answers for the coursework. High-scoring work must demonstrate your own
experimental design, controlled comparisons, failure analysis, and image-level interpretation. ChatGPT or similar tools may be
used only in a limited support role such as code understanding, debugging, or grammar support. They must not replace your
method design, ablation logic, qualitative analysis, or reflection. Generic AI-produced descriptions without matching evidence in
code, tables, figures, and discussion will not receive high marks.
• If you use AI tools or outside code in any meaningful way, you must fully understand, verify, and take ownership of every
method, number, figure, and written claim that appears in your submission.
Question 1: Notebook-Based Coding Exercise - Insurance Premium-Risk Classification (60
Marks)
In this coursework you will build and improve a multiclass classifier for a fictionalised health-insurance dataset. The task is
to predict whether each applicant belongs to a Low, Standard, or High premium-risk group before pricing a policy. The
dataset is intentionally realistic: it mixes numerical and categorical variables, contains missing values and dirty entries, and
includes some fields that require careful handling to avoid weak modelling practice or label leakage.
Your work should show a clear machine-learning workflow: build a sensible first pipeline, compare model families, apply
stronger hyperparameter optimisation, complete one compulsory improvement category plus at least one optional category,
carry out a compact K-Means/Gaussian Mixture Model (GMM) exploration, and then produce a hidden-test CSV using
validation evidence only.
The prediction target variable is premium_risk, and it has 3 imbalanced classes: Standard, High, Low. The dataset
contains 33 raw columns: admin/PII columns, synthetic noise features, 1 leakage feature, and genuine predictors.
Unless otherwise stated, macro-F1 is the primary validation metric because the dataset is imbalanced; accuracy is reported
as a secondary metric.
(A) Clean First Pipeline and Baseline Modelling (8 marks)
• Load the provided training and validation files and define a consistent target / feature setup.
• Handle leakage features, dirty values, missing values, and categorical variables sensibly. A compact sanity check is enough; a
long data-audit section is not required.
Important: The dataset contains a leakage feature. You must identify and remove it before proceeding to the next stage
of analysis; otherwise, the classification results will be severely biased by this leakage and will not be meaningful. If
this occurs, multiple parts of your Coursework 1 may be affected, which could significantly impact your marks.
• Build one baseline modelling pipeline.
• Report at least one validation result using accuracy and macro-F1 score and include a confusion matrix for the baseline model.
• Keep preprocessing consistent across train, validation, and hidden-test files.
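
A minimal baseline sketch consistent with these bullets (file names, ID columns, and the leakage column name are assumptions to be replaced with the real ones from your audit):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

train = pd.read_csv("train.csv")   # hypothetical file names
valid = pd.read_csv("valid.csv")

target = "premium_risk"
# applicant_id / customer_key / leaky_feature are placeholders for the real
# admin, PII, and leakage columns you identify.
drop_cols = [target, "applicant_id", "customer_key", "leaky_feature"]
X_tr, y_tr = train.drop(columns=drop_cols, errors="ignore"), train[target]
X_va, y_va = valid.drop(columns=drop_cols, errors="ignore"), valid[target]

num_cols = X_tr.select_dtypes("number").columns
cat_cols = X_tr.columns.difference(num_cols)

pre = ColumnTransformer([
    ("num", Pipeline([("imp", SimpleImputer(strategy="median")),
                      ("sc", StandardScaler())]), num_cols),
    ("cat", Pipeline([("imp", SimpleImputer(strategy="most_frequent")),
                      ("oh", OneHotEncoder(handle_unknown="ignore"))]), cat_cols),
])

baseline = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=2000))])
baseline.fit(X_tr, y_tr)
pred = baseline.predict(X_va)
print("val accuracy:", accuracy_score(y_va, pred))
print("val macro-F1:", f1_score(y_va, pred, average="macro"))
```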
(B) Controlled Comparison: Random Forest and One Boosting Model (8 marks)
• Using the same preprocessing pipeline, validation split, and evaluation metric (the primary metric is macro-F1; also report accuracy),
carry out an initial controlled comparison between one Random Forest model and one boosting model.
• Default XGBoost is recommended because it provides a richer tuning space later, but others may also be used. Default settings
or only light sensible adjustments are acceptable in this section.
• In the notebook, report the validation result of each model and support the comparison with one or two additional analyses, such
as class-wise metrics, a confusion matrix, train-versus-validation behaviour, or stability / sensitivity after tuning.
• Your goal is not to prove that one model type always wins. Your goal is to compare the two models fairly, explain the high-level
learning difference between bagging and boosting, and use your own notebook evidence to give a careful, dataset-specific
interpretation. A generic textbook answer without reference to your own results will receive limited credit.
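
One hedged way the controlled comparison might look, reusing `pre`, the splits, and the metric from the baseline sketch above (XGBoost shown because the brief recommends it; the `xgboost` package is an extra dependency):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier  # assumes xgboost is installed

le = LabelEncoder().fit(y_tr)  # XGBoost expects integer class labels
y_tr_enc, y_va_enc = le.transform(y_tr), le.transform(y_va)

for name, clf in {
    "RandomForest": RandomForestClassifier(n_estimators=300, random_state=0),
    "XGBoost": XGBClassifier(eval_metric="mlogloss", random_state=0),
}.items():
    pipe = Pipeline([("pre", pre), ("clf", clf)]).fit(X_tr, y_tr_enc)
    pred = pipe.predict(X_va)
    print(f"{name}: acc={accuracy_score(y_va_enc, pred):.4f} "
          f"macro-F1={f1_score(y_va_enc, pred, average='macro'):.4f}")
```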
(C) Advanced Hyperparameter Optimisation (12 marks)
• At least one main model should be tuned with a genuinely advanced strategy such as Optuna/TPE, Bayesian optimisation,
Hyperopt, Ray Tune, or another comparably strong approach.
• Hyperparameter tuning should optimise macro-F1 score on the validation set, and the final tuned result should be reported
using both accuracy and macro-F1.
• RandomizedSearchCV alone is normally not enough for the top band.
• Explain briefly why your search space and optimiser are reasonable for the chosen model.
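
A sketch of what a genuinely advanced search could look like with Optuna's default TPE sampler, continuing from the objects above (the search space is illustrative, not prescribed by the brief):

```python
import optuna
from sklearn.metrics import f1_score
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

def objective(trial):
    params = dict(
        n_estimators=trial.suggest_int("n_estimators", 200, 1200),
        max_depth=trial.suggest_int("max_depth", 3, 10),
        learning_rate=trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        subsample=trial.suggest_float("subsample", 0.5, 1.0),
        colsample_bytree=trial.suggest_float("colsample_bytree", 0.5, 1.0),
    )
    pipe = Pipeline([("pre", pre),
                     ("clf", XGBClassifier(eval_metric="mlogloss",
                                           random_state=0, **params))])
    pipe.fit(X_tr, y_tr_enc)
    # Optimise validation macro-F1, as the brief requires.
    return f1_score(y_va_enc, pipe.predict(X_va), average="macro")

study = optuna.create_study(direction="maximize")  # TPE sampler by default
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```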
(D) Personalised Improvement Work (18 marks)
You must complete one compulsory category based on the last digit of your XJTLU student ID, plus at least one additional
optional category of your choice. A second optional category is recommended for stronger differentiation but is not compulsory.
You should report accuracy and macro-F1 for improved models and include class-wise metrics where helpful. A compact ablation
table should normally be included in the notebook for the personalised improvement work.
| Last digit | Compulsory category |
|---|---|
| 0-1 | Category A - Data quality and missingness |
| 2-3 | Category B - Feature representation and engineering |
| 4-5 | Category C - Imbalance and objective design |
| 6-7 | Category D - Model robustness, calibration, or ensembling |
| 8-9 | Category E - Fairness, diagnostics, or interpretability |
| Category | Examples of what may be done | What good evidence looks like |
|---|---|---|
| A | Better missing-value strategy; MissForest or iterative imputation; sensible outlier handling; value cleaning | A concise before/after comparison with a short explanation of why the data handling changed the result |
| B | Feature crosses; grouped categories; alternative encodings; modest feature selection; transformations | A compact ablation showing what representation changed and whether it helped |
| C | Class weighting; focal-style loss if relevant; sampling / resampling; thresholding logic | Clear evidence of how minority or harder classes changed, even if overall score moved only slightly |
| D | Bagging/boosting variants; calibration checks; soft voting; stacking; robustness checks | A meaningful diagnostic or comparison rather than a large collection of loosely connected trials |
| E | SHAP / feature importance; subgroup-style fairness checks; error analysis; model interpretation | Concrete insight into model behaviour, not only screenshots |
(E) K-Means and Gaussian Mixture Model (GMM) Exploration (6 marks)
This is a compact exploratory section. It is not the main performance section, and it does not require clusters to match the class
labels exactly. The aim is to show your understanding of unsupervised learning methods and your ability to interpret their results
carefully.
• Use a sensible processed numeric feature space and briefly explain what you clustered on.
• Explore a small range of cluster/component numbers, such as 2-8.
• For K-Means, provide sensible supporting evidence, such as inertia (SSE), cluster sizes, or another simple analysis.
• For Gaussian Mixture Model (GMM), provide sensible supporting evidence, such as component sizes, posterior
confidence/responsibility, or overlap/uncertainty between components.
• Include at least one compact table or figure comparing K-Means and GMM.
• If class labels are used for reference, explain clearly that unsupervised structure does not need to align exactly with supervised
labels.
• Stronger work may additionally use silhouette score, log-likelihood trends, or a simple visualization.
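
For concreteness, a k-sweep like the one behind the `k, inertia, silhouette, log_likelihood, bic, aic` CSV earlier in this commit might be produced roughly as follows (assuming `X_num` is your processed numeric feature matrix):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

rows = []
for k in range(2, 9):  # the brief suggests exploring roughly 2-8
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_num)
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X_num)
    rows.append({
        "k": k,
        "inertia": km.inertia_,  # K-Means SSE
        "silhouette_kmeans": silhouette_score(X_num, km.labels_),
        "log_likelihood": gmm.score(X_num) * len(X_num),  # total, not per-sample
        "bic": gmm.bic(X_num),
        "aic": gmm.aic(X_num),
        "silhouette_gmm": silhouette_score(X_num, gmm.predict(X_num)),
    })
print(pd.DataFrame(rows))
```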
(F) Final Model Choice and Hidden-Test Export (8 marks)
• Choose the final model using validation evidence only.
• Retrain appropriately using both train and validation dataset and generate the hidden-test CSV in the required format.
• Submit the hidden-test results as test_result_[your_student_id].csv. The first column must contain applicant_id, the second
column must contain customer_key, and the third column must contain the predicted premium_risk labels (Standard, High,
Low).
Incorrect file naming or CSV formatting may prevent automated scoring and will result in an automatic deduction of 4 marks
from this section.
• Do not tune on the hidden test and do not claim hidden test performance.
• Note: Hidden test score contributes only a small portion of the final marks. High leaderboard rank alone cannot compensate for
weak experimental design or poor documentation.
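
A sketch of the retrain-and-export step under the column order the brief specifies (`final_model`, the hidden-test file name, and the dropped ID columns are assumptions):

```python
import pandas as pd

STUDENT_ID = "1234567"  # placeholder: use your real student ID
hidden = pd.read_csv("hidden_test.csv")  # hypothetical file name

# Retrain the chosen pipeline on train + validation, per the brief.
full_X = pd.concat([X_tr, X_va])
full_y = pd.concat([y_tr, y_va])
final_model.fit(full_X, full_y)  # final_model = the pipeline chosen on validation evidence

out = pd.DataFrame({
    "applicant_id": hidden["applicant_id"],
    "customer_key": hidden["customer_key"],
    "premium_risk": final_model.predict(
        hidden.drop(columns=["applicant_id", "customer_key"], errors="ignore")
    ),
})
out.to_csv(f"test_result_{STUDENT_ID}.csv", index=False)
```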
Coursework Answer Sheet / Theory and Reflection (PDF) - all questions below are compulsory
(30 Marks)
The Coursework Answer Sheet / Theory and Reflection PDF should not repeat the notebook section by section. All prompt areas
below are compulsory. The PDF must be concise, directly linked to your own notebook evidence, and no longer than 4 pages /
1,200 words in total. Exceeding either limit will incur a fixed deduction of 5 marks from the PDF section. You should aim to
demonstrate both your theoretical or algorithmic understanding and your experimental findings or practical observations and
clearly link your understanding of the algorithms to your experimental analysis. At least one table, figure, or metric from the
notebook must be referenced in each theory answer.
| Prompt area | What you should do |
|---|---|
| 1. Bagging versus boosting | (1) Briefly state the definitions and key theoretical properties of bagging and boosting models; (2) report the validation results of each model; (3) support your comparison with one or two additional analyses, such as class-wise metrics, a confusion matrix, train–validation behaviour, or stability/sensitivity after tuning; and (4) provide a careful interpretation of what this comparison suggests about this dataset and how it relates to the theoretical properties of bagging versus boosting methods. You are not expected to prove that one model type always performs better. |
| 2. Hyperparameter optimisation | Explain why your optimiser and search space were reasonable for the chosen model, which hyperparameters you expected to matter most, whether the tuned results matched that intuition, and what you learned from the tuning process. |
| 3. K-Means versus Gaussian Mixture Model (GMM) | Explain hard versus soft assignment and the main assumption difference between K-Means and GMM. Then use your own compact evidence to discuss whether the results matched your intuition and whether GMM revealed anything extra, such as soft membership, uncertainty, or a better fit to partial cluster structure. |
| 4. Personalised reflection | Reflect on the compulsory category and on every optional category you implemented. Highlight any unique or interesting algorithm or strategy you tried, the personal challenges you faced, the effort you made to address them, and the key lessons you learned. Honest reflection on a neutral or negative result is acceptable if the reasoning is concrete. |
| 5. AI-use declaration | State briefly what forms of AI assistance, if any, were used. Generic AI-written theory that does not match your notebook evidence will receive limited credit. |
Coding Quality, Coursework Answer Sheet Quality, and Submission Guidelines (10 marks)
• Submit your Jupyter Notebook in .ipynb format. It must be well organised, include clear commentary and clean code practices,
and show visible outputs. Do not write a second mini-report repeating notebook content.
• The notebook should be reproducible from start to finish without errors. Results cited in the PDF should be visible in the
notebook and should match the reported values.
• If you used supplementary code outside the notebook, submit that code as well so the full workflow remains reproducible.
• Submit the hidden-test results as test_result_[your_student_id].csv. The first column must contain applicant_id, the second
column must contain customer_key, and the third column must contain the predicted premium_risk labels (Standard, High,
Low). Incorrect file naming or CSV formatting may prevent automated scoring and will result in an automatic deduction of 4
marks from this section.
• Submit the Coursework Answer Sheet / Theory and Reflection in PDF format. All questions in that section are compulsory. The
Coursework Answer Sheet / Theory and Reflection PDF must answer every required prompt, refer to your own notebook
evidence, and remain within 4 pages and 1,200 words in total. Exceeding either limit will incur a fixed deduction of 5 marks from
the PDF section.
• Include all required components: Jupyter notebooks (code), any additional experimental scripts or custom code, the hidden
test-results CSV file, and the Coursework Answer Sheet PDF. Submit all files through the Learning Mall platform. After
submission, download your files to verify that they can be opened and viewed correctly to ensure the submission was
successful.
Project Material Access Instructions
To access the complete set of materials for this project, please use the links below:
• OneDrive Link:
https://1drv.ms/f/c/18f09d1a39585f84/IgCXDMbXkFYSSZUZkkTyXyZzAQ1poX9mujUqF8N3JlL0GD0?e=uNhAHq
• The same coursework materials have also been uploaded to Learning Mall.
When extracting the materials, use the following password to unlock the zip file: DTS304TC (case-sensitive, enter in
uppercase).
XJTLU Entrepreneur College (Taicang) Cover Sheet
Module code: DTS304TC Machine Learning
School title: School of AI and Advanced Computing
Assessment title: Coursework Task 1
Assessment type: Coursework
Submission deadline: 01/May/2026 23:59
I certify that I have read and understood the University's Policy for dealing with Plagiarism, Collusion and the Fabrication of Data
(available on Learning Mall Online).
My work does not contain any instances of plagiarism and/or collusion.
My work does not contain any fabricated data.
By uploading my assignment onto Learning Mall Online, I formally declare that all of the
above information is true to the best of my knowledge and belief.
Scoring - For Tutor Use
Student ID: ____________
Theory and Reflection PDF Word Count (filled by students): ____________
Stage of marking (marker code; learning outcomes achieved A / B / C, graded F/P/M/D, please modify as appropriate; final score):
• 1st Marker (red pen)
• Moderation (IM, green pen, initials): the original mark has been accepted by the moderator Y/N (please circle as appropriate); data entry and score calculation have been checked by another tutor Y (please circle)
• 2nd Marker, if needed (green pen)
For Academic Office Use - Possible Academic Infringement (please tick as appropriate)
Date Received: ____  Days Late: ____  Late Penalty: ____
Total Academic Infringement Penalty (Category A, B, C, D, E, please modify where necessary): ____________
☐ Category A
☐ Category B
☐ Category C
☐ Category D
☐ Category E
DTS304TC Machine Learning
Coursework - Assessment Task 1
• Percentage in final mark: 50%
• Assessment type: individual coursework
• Submission files: one Jupyter notebook (.ipynb), one Coursework Answer Sheet / Theory and Reflection PDF, and one
hidden-test CSV
Learning outcomes assessed
• A. Demonstrate a solid understanding of the theoretical issues related to problems that machine-learning methods try to
address.
• B. Demonstrate understanding of the properties of existing machine-learning algorithms and how they behave on practical data.
Notes
• Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may
result in a loss of marks.
• The formal procedure for submitting coursework at XJTLU is strictly followed. Submission link on Learning Mall will be provided
in due course. The submission timestamp on Learning Mall will be used to check late submission.
• 5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the
submission date, up to a maximum of five working days.
• All modelling work must be completed individually. Discussion of general ideas is allowed, but code, experiments, and
notebooks must be independently developed.
• You may not use ChatGPT to directly generate answers for the coursework. High-scoring work must demonstrate your own
experimental design, controlled comparisons, failure analysis, and detailed interpretation of your results. ChatGPT or similar tools may be
used only in a limited support role such as code understanding, debugging, or grammar support. They must not replace your
method design, ablation logic, qualitative analysis, or reflection. Generic AI-produced descriptions without matching evidence in
code, tables, figures, and discussion will not receive high marks.
• If you use AI tools or outside code in any meaningful way, you must fully understand, verify, and take ownership of every
method, number, figure, and written claim that appears in your submission.
Question 1: Notebook-Based Coding Exercise - Insurance Premium-Risk Classification (60
Marks)
In this coursework you will build and improve a multiclass classifier for a fictionalised health-insurance dataset. The task is
to predict whether each applicant belongs to a Low, Standard, or High premium-risk group before pricing a policy. The
dataset is intentionally realistic: it mixes numerical and categorical variables, contains missing values and dirty entries, and
includes some fields that require careful handling to avoid weak modelling practice or label leakage.
Your work should show a clear machine-learning workflow: build a sensible first pipeline, compare model families, apply
stronger hyperparameter optimisation, complete one compulsory improvement category plus at least one optional category,
carry out a compact K-Means/Gaussian Mixture Model (GMM) exploration, and then produce a hidden-test CSV using
validation evidence only.
The prediction target variable is premium_risk, and it has 3 imbalanced classes: Standard, High, Low. The dataset
contains 33 raw columns: admin/PII columns, synthetic noise features, 1 leakage feature, and genuine predictors.
Unless otherwise stated, macro-F1 is the primary validation metric because the dataset is imbalanced; accuracy is reported
as a secondary metric.
(A) Clean First Pipeline and Baseline Modelling (8 marks)
• Load the provided training and validation files and define a consistent target / feature setup.
• Handle leakage features, dirty values, missing values, and categorical variables sensibly. A compact sanity check is enough; a
long data-audit section is not required.
Important: The dataset contains a leakage feature. You must identify and remove it before proceeding to the next stage
of analysis; otherwise, the classification results will be severely biased by this leakage and will not be meaningful. If
this occurs, multiple parts of your Coursework 1 may be affected, which could significantly impact your marks.
• Build one baseline modelling pipeline.
• Report at least one validation result using accuracy and macro-F1 score and include a confusion matrix for the baseline model.
• Keep preprocessing consistent across train, validation, and hidden-test files.
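As a minimal illustration of how section (A) might fit together, the sketch below builds one consistent preprocessing-plus-baseline pipeline with pandas and scikit-learn. The file names ("train.csv", "validation.csv"), the ID columns, and the leakage column name ("leaky_feature") are placeholders; substitute the actual names from the provided dataset once you have identified the leakage feature yourself.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

train = pd.read_csv("train.csv")          # placeholder file names
valid = pd.read_csv("validation.csv")

TARGET = "premium_risk"
# Admin/PII columns plus the leakage feature (hypothetical name) must not enter the model.
DROP = ["applicant_id", "customer_key", "leaky_feature"]

X_train = train.drop(columns=[TARGET] + DROP, errors="ignore")
y_train = train[TARGET]
X_valid = valid.drop(columns=[TARGET] + DROP, errors="ignore")
y_valid = valid[TARGET]

num_cols = X_train.select_dtypes(include="number").columns
cat_cols = X_train.columns.difference(num_cols)

# One shared preprocessing object keeps train/validation/hidden-test handling consistent.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), num_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), cat_cols),
])

baseline = Pipeline([("prep", preprocess),
                     ("clf", LogisticRegression(max_iter=1000))])
baseline.fit(X_train, y_train)
pred = baseline.predict(X_valid)
print("accuracy:", accuracy_score(y_valid, pred))
print("macro-F1:", f1_score(y_valid, pred, average="macro"))
print(confusion_matrix(y_valid, pred, labels=["Low", "Standard", "High"]))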
(B) Controlled Comparison: Random Forest and One Boosting Model (8 marks)
• Using the same preprocessing pipeline, validation split, and evaluation metric (the primary metric is macro-F1; also report accuracy), carry out an initial controlled comparison between one Random Forest model and one boosting model.
• Default XGBoost is recommended because it provides a richer tuning space later, but others may also be used. Default settings
or only light sensible adjustments are acceptable in this section.
• In the notebook, report the validation result of each model and support the comparison with one or two additional analyses, such
as class-wise metrics, a confusion matrix, train-versus-validation behaviour, or stability / sensitivity after tuning.
• Your goal is not to prove that one model type always wins. Your goal is to compare the two models fairly, explain the high-level
learning difference between bagging and boosting, and use your own notebook evidence to give a careful, dataset-specific
interpretation. A generic textbook answer without reference to your own results will receive limited credit.
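One possible shape for the section (B) comparison is sketched below, continuing the baseline sketch above (it reuses preprocess, X_train, and the other variables defined there). It assumes xgboost is installed; note that recent xgboost releases require integer-encoded class labels.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier  # assumption: xgboost is installed

# Recent xgboost versions expect integer class labels, so encode once for both models.
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)
y_valid_enc = le.transform(y_valid)

models = {
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "xgboost": XGBClassifier(eval_metric="mlogloss", random_state=42),
}
for name, clf in models.items():
    # Identical preprocessing and split keep the comparison controlled.
    pipe = Pipeline([("prep", preprocess), ("clf", clf)])
    pipe.fit(X_train, y_train_enc)
    pred = pipe.predict(X_valid)
    print(name, "macro-F1:", f1_score(y_valid_enc, pred, average="macro"),
          "| accuracy:", accuracy_score(y_valid_enc, pred))
    # Class-wise metrics double as one of the required supporting analyses.
    print(classification_report(y_valid_enc, pred, target_names=le.classes_))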
(C) Advanced Hyperparameter Optimisation (12 marks)
• At least one main model should be tuned with a genuinely advanced strategy such as Optuna/TPE, Bayesian optimisation,
Hyperopt, Ray Tune, or another comparably strong approach.
• Hyperparameter tuning should optimise macro-F1 score on the validation set, and the final tuned result should be reported
using both accuracy and macro-F1.
• RandomizedSearchCV alone is normally not enough for the top band.
• Explain briefly why your search space and optimiser are reasonable for the chosen model.
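As an illustration only, an Optuna/TPE loop over the XGBoost pipeline might look as follows. It assumes optuna is installed and reuses objects from the earlier sketches; the search ranges are illustrative starting points, not recommended values, and you should justify your own search space in the notebook.

import optuna

def objective(trial):
    # Illustrative search space; adapt and justify the ranges yourself.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 800),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
    }
    pipe = Pipeline([("prep", preprocess),
                     ("clf", XGBClassifier(eval_metric="mlogloss",
                                           random_state=42, **params))])
    pipe.fit(X_train, y_train_enc)
    # Tuning optimises macro-F1 on the validation set, as required.
    return f1_score(y_valid_enc, pipe.predict(X_valid), average="macro")

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=50)
print("best validation macro-F1:", study.best_value)
print("best params:", study.best_params)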
(D) Personalised Improvement Work (18 marks)
You must complete one compulsory category based on the last digit of your XJTLU student ID, plus at least one additional
optional category of your choice. A second optional category is recommended for stronger differentiation but is not compulsory.
You should report accuracy and macro-F1 for improved models and include class-wise metrics where helpful. A compact ablation table should normally be included in the notebook for the personalised improvement work.
Last digit Compulsory category
0-1 Category A - Data quality and missingness
2-3 Category B - Feature representation and engineering
4-5 Category C - Imbalance and objective design
6-7 Category D - Model robustness, calibration, or ensembling
8-9 Category E - Fairness, diagnostics, or interpretability
Category A - Data quality and missingness
• Examples of what may be done: better missing-value strategy; MissForest or iterative imputation; sensible outlier handling; value cleaning.
• What good evidence looks like: a concise before/after comparison with a short explanation of why the data handling changed the result.
Category B - Feature representation and engineering
• Examples of what may be done: feature crosses; grouped categories; alternative encodings; modest feature selection; transformations.
• What good evidence looks like: a compact ablation showing what representation changed and whether it helped.
Category C - Imbalance and objective design
• Examples of what may be done: class weighting; focal-style loss if relevant; sampling / resampling; thresholding logic.
• What good evidence looks like: clear evidence of how minority or harder classes changed, even if the overall score moved only slightly.
Category D - Model robustness, calibration, or ensembling
• Examples of what may be done: bagging/boosting variants; calibration checks; soft voting; stacking; robustness checks.
• What good evidence looks like: a meaningful diagnostic or comparison rather than a large collection of loosely connected trials.
Category E - Fairness, diagnostics, or interpretability
• Examples of what may be done: SHAP / feature importance; subgroup-style fairness checks; error analysis; model interpretation.
• What good evidence looks like: concrete insight into model behaviour, not only screenshots.
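By way of example only, a Category C improvement (class weighting) could start from the sketch below, again reusing the earlier pipeline objects; the same before/after pattern generalises to the other categories.

from sklearn.utils.class_weight import compute_sample_weight

# Balanced per-sample weights: minority classes receive proportionally larger weights.
weights = compute_sample_weight(class_weight="balanced", y=y_train_enc)

weighted = Pipeline([("prep", preprocess),
                     ("clf", XGBClassifier(eval_metric="mlogloss", random_state=42))])
# Fit parameters are routed to the final pipeline step via the step-name prefix.
weighted.fit(X_train, y_train_enc, clf__sample_weight=weights)

pred = weighted.predict(X_valid)
print("weighted macro-F1:", f1_score(y_valid_enc, pred, average="macro"))
# Pair this with the unweighted result in a compact ablation row (same split, same metric).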
(E) K-Means and Gaussian Mixture Model (GMM) Exploration (6 marks)
This is a compact exploratory section. It is not the main performance section, and it does not require clusters to match the class
labels exactly. The aim is to show your understanding of unsupervised learning methods and your ability to interpret their results
carefully.
• Use a sensible processed numeric feature space and briefly explain what you clustered on.
• Explore a small range of cluster/component numbers, such as 2-8.
• For K-Means, provide sensible supporting evidence, such as inertia (SSE), cluster sizes, or another simple analysis.
• For Gaussian Mixture Model (GMM), provide sensible supporting evidence, such as component sizes, posterior
confidence/responsibility, or overlap/uncertainty between components.
• Include at least one compact table or figure comparing K-Means and GMM.
• If class labels are used for reference, explain clearly that unsupervised structure does not need to align exactly with supervised labels.
• Stronger work may additionally use silhouette score, log-likelihood trends, or a simple visualization.
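A compact sweep over 2-8 clusters/components might be organised as below. It assumes X_cluster is a fully numeric, scaled matrix you have prepared from the processed features (a placeholder name), and that scikit-learn is available.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

rows = []
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_cluster)
    gmm = GaussianMixture(n_components=k, random_state=42).fit(X_cluster)
    rows.append({
        "k": k,
        "kmeans_inertia": km.inertia_,  # SSE, for an elbow-style reading
        "kmeans_silhouette": silhouette_score(X_cluster, km.labels_),
        "gmm_log_likelihood": gmm.score(X_cluster) * len(X_cluster),  # total log-likelihood
        "gmm_bic": gmm.bic(X_cluster),
        "gmm_aic": gmm.aic(X_cluster),
        "gmm_silhouette": silhouette_score(X_cluster, gmm.predict(X_cluster)),
    })

comparison = pd.DataFrame(rows)  # doubles as the required K-Means vs GMM table
print(comparison)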
(F) Final Model Choice and Hidden-Test Export (8 marks)
• Choose the final model using validation evidence only.
• Retrain appropriately using both the train and validation datasets and generate the hidden-test CSV in the required format.
• Submit the hidden-test results as test_result_[your_student_id].csv. The first column must contain applicant_id, the second
column must contain customer_key, and the third column must contain the predicted premium_risk labels (Standard, High,
Low).
Incorrect file naming or CSV formatting may prevent automated scoring and will result in an automatic deduction of 4 marks
from this section.
• Do not tune on the hidden test and do not claim hidden test performance.
• Note: Hidden test score contributes only a small portion of the final marks. High leaderboard rank alone cannot compensate for
weak experimental design or poor documentation.
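A hedged export sketch is shown below. It assumes final_model is your chosen pipeline, "hidden_test.csv" is the provided file name, and the placeholder names from the earlier sketches still apply; the student ID in the output file name is a placeholder.

import pandas as pd

# Retrain the chosen model on train + validation before the final export.
full_X = pd.concat([X_train, X_valid])
full_y = pd.concat([y_train, y_valid])
final_model.fit(full_X, full_y)

hidden = pd.read_csv("hidden_test.csv")                # placeholder file name
hidden_X = hidden.drop(columns=DROP, errors="ignore")
preds = final_model.predict(hidden_X)
# If the final model was trained on integer-encoded labels, decode first:
# preds = le.inverse_transform(preds)

out = pd.DataFrame({
    "applicant_id": hidden["applicant_id"],            # required column order
    "customer_key": hidden["customer_key"],
    "premium_risk": preds,
})
out.to_csv("test_result_1234567.csv", index=False)     # replace 1234567 with your student ID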
Coursework Answer Sheet / Theory and Reflection (PDF) - all questions below are compulsory
(30 Marks)
The Coursework Answer Sheet / Theory and Reflection PDF should not repeat the notebook section by section. All prompt areas
below are compulsory. The PDF must be concise, directly linked to your own notebook evidence, and no longer than 4 pages /
1,200 words in total. Exceeding either limit will incur a fixed deduction of 5 marks from the PDF section. You should aim to
demonstrate both your theoretical or algorithmic understanding and your experimental findings or practical observations and
clearly link your understanding of the algorithms to your experimental analysis. At least one table, figure, or metric from the
notebook must be referenced in each theory answer.
Prompt area and what you should do:
1. Bagging versus boosting
(1) Briefly state the definitions and key theoretical properties of bagging and boosting models;
(2) report the validation results of each model;
(3) support your comparison with one or two additional analyses, such as class-wise metrics, a confusion matrix, train-validation behaviour, or stability/sensitivity after tuning; and
(4) provide a careful interpretation of what this comparison suggests about this dataset and how it relates to the theoretical properties of bagging versus boosting methods.
You are not expected to prove that one model type always performs better.
2. Hyperparameter optimisation
Explain why your optimiser and search space were reasonable for the chosen model, which hyperparameters you expected to matter most, whether the tuned results matched that intuition, and what you learned from the tuning process.
3. K-Means versus Gaussian Mixture Model (GMM)
Explain hard versus soft assignment and the main assumption difference between K-Means and GMM. Then use your own compact evidence to discuss whether the results matched your intuition and whether GMM revealed anything extra, such as soft membership, uncertainty, or a better fit to partial cluster structure.
4. Personalised reflection
Reflect on the compulsory category and on every optional category you implemented. Highlight any unique or interesting algorithm or strategy you tried, the personal challenges you faced, the effort you made to address them, and the key lessons you learned. Honest reflection on a neutral or negative result is acceptable if the reasoning is concrete.
5. AI-use declaration
State briefly what forms of AI assistance, if any, were used. Generic AI-written theory that does not match your notebook evidence will receive limited credit.
Coding Quality, Coursework Answer Sheet Quality, and Submission Guidelines (10 marks)
• Submit your Jupyter Notebook in .ipynb format. It must be well organised, include clear commentary and clean code practices,
and show visible outputs. Do not write a second mini-report repeating notebook content.
• The notebook should be reproducible from start to finish without errors. Results cited in the PDF should be visible in the
notebook and should match the reported values.
• If you used supplementary code outside the notebook, submit that code as well so the full workflow remains reproducible.
• Submit the hidden-test results as test_result_[your_student_id].csv. The first column must contain applicant_id, the second
column must contain customer_key, and the third column must contain the predicted premium_risk labels (Standard, High,
Low). Incorrect file naming or CSV formatting may prevent automated scoring and will result in an automatic deduction of 4
marks from this section.
• Submit the Coursework Answer Sheet / Theory and Reflection in PDF format. All questions in that section are compulsory. The
Coursework Answer Sheet / Theory and Reflection PDF must answer every required prompt, refer to your own notebook
evidence, and remain within 4 pages and 1,200 words in total. Exceeding either limit will incur a fixed deduction of 5 marks from
the PDF section.
• Include all required components: Jupyter notebooks (code), any additional experimental scripts or custom code, the hidden
test-results CSV file, and the Coursework Answer Sheet PDF. Submit all files through the Learning Mall platform. After
submission, download your files to verify that they can be opened and viewed correctly to ensure the submission was
successful.
Project Material Access Instructions
To access the complete set of materials for this project, please use the links below:
• OneDrive Link:
https://1drv.ms/f/c/18f09d1a39585f84/IgCXDMbXkFYSSZUZkkTyXyZzAQ1poX9mujUqF8N3JlL0GD0?e=uNhAHq
• The same coursework materials have also been uploaded to Learning Mall.
When extracting the materials, use the following password to unlock the zip file: DTS304TC (case-sensitive, enter in
uppercase).