
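Serendipity fb09e66d09 feat: restructure the project and add vectorized PPO training and evaluation scripts

- Refactor the original single-environment training code into a modular structure and add vectorized-environment support to improve data-collection efficiency
- Implement a complete PPO training pipeline, including a shared-CNN Actor-Critic network, a vectorized experience buffer, and GAE advantage estimation
- Add a training script (train_vec.py), an evaluation script (evaluate.py), and an SB3 baseline comparison script (train_sb3_baseline.py)
- Provide detailed documentation and a development log, including problem-resolution records and experiment analysis
- Remove legacy project files and consolidate the project structure under the CW1_id_name directory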

2026-05-02 13:44:08 +08:00


docs/ Index

Documentation and report artefacts for the DTS307TC PPO coursework.

Final deliverables

File	Purpose
`CW1_REPORT_TEMPLATE.docx`	Pre-formatted Word source. IEEE style (11pt Times New Roman, 1.15 spacing, 2.5cm margins). All numbers, figures, and native equations embedded. The student fills in cover-page details and exports to PDF.
`generate_report_template.py`	Source script that produces the template.

Word count (excluding References and Appendix): 2972 / 3000.

Figures referenced in the report

File	Used in	Description
`fig_architecture.png`	Fig. 1	Shared-CNN actor-critic architecture (1.69M params)
`fig_training_curves.png`	Fig. 2	6-panel training curves over 1.5M steps
`fig_eval_bar.png`	Fig. 3	Per-episode evaluation returns on 20 unseen seeds
`fig_sb3_comparison.png`	Fig. 4	Diagnostics overlay comparing our agent with the SB3 baseline
`demo.mp4`	Submitted alongside the zip	25-second video of the trained agent on seed 117 (return 925.40, completed at wrapped step 187)

Numerical evidence

File	Content
`eval_summary.json`	20-episode evaluation of `models/ppo_final.pt`. Mean 830.17 ± 104.79; min 436.81; max 914.90
`eval_summary_sb3.json`	20-episode evaluation of the SB3 baseline. Mean 664.32 ± 173.93; min 309.40; max 857.14
`checkpoint_scan_vec_main_v3.json`	Per-checkpoint evaluation table; basis for selecting `iter_0700.pt` as the submitted model
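
The summary statistics and the checkpoint choice described above can be sketched roughly as follows. This is a minimal illustration, not the project's evaluation code: the function names `summarize_returns` and `select_checkpoint`, the dictionary layout of the scan, and the use of the population standard deviation are all assumptions, and the numbers are made up for the example rather than taken from the JSON files.

```python
import statistics


def summarize_returns(returns):
    """Return mean, standard deviation, min and max of per-episode returns."""
    return {
        "mean": statistics.fmean(returns),
        # Assumption: population std; the project may use the sample std instead.
        "std": statistics.pstdev(returns),
        "min": min(returns),
        "max": max(returns),
    }


def select_checkpoint(scan):
    """Pick the checkpoint name with the highest mean return.

    `scan` maps a checkpoint name to its list of per-episode returns.
    This layout is hypothetical, not the actual JSON schema.
    """
    return max(scan, key=lambda name: statistics.fmean(scan[name]))


# Made-up numbers purely for illustration; not results from this project.
example_scan = {
    "iter_a.pt": [10.0, 20.0, 30.0],
    "iter_b.pt": [40.0, 50.0, 60.0],
    "iter_c.pt": [25.0, 35.0, 45.0],
}
best = select_checkpoint(example_scan)
summary = summarize_returns(example_scan[best])
print(best, summary)
```

Selecting on mean return alone is the simplest rule; a scan that also weighs the minimum or the spread would only need a different `key` function.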

Cross-cutting documents

File	Content
`development_log.md`	Step-by-step development timeline (Days 1-9)
`issues_and_fixes.md`	Three substantive engineering challenges and their fixes, plus three documented negative-result ablations (raw material for Sections 3.4 and 4.4)
`submission_checklist.md`	Pre-submission verification checklist
`INDEX.md`	This file

Project state at submission

runs/      vec_main_v3/         main 1.5M-step training
           sb3_baseline/run_1/  SB3 baseline, 500K-step reference run

models/    ppo_final.pt          submitted agent (= iter_0700.pt selected
                                 by held-out checkpoint scanning)
           vec_main_v3/final.pt  training-end backup
           sb3_baseline/final.zip SB3 reference

src/       eight Python modules, no SB3 imports
notebooks/ three development notebooks (env exploration, network sanity,
           evaluation)
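
Since `ppo_final.pt` is stated to be a copy of `iter_0700.pt`, one way to confirm that before submission is to compare file digests. The sketch below assumes nothing about the checkpoint format; the helper names `sha256_of` and `same_file_contents` are illustrative, and the location of `iter_0700.pt` is not given in this index, so any path passed in is the reader's to supply.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """Return True if both files have identical bytes (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A call such as `same_file_contents("models/ppo_final.pt", path_to_iter_0700)` would then return `True` only when the submitted model is byte-identical to the selected checkpoint.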
