# docs/ Index

Documentation and report artefacts for the DTS307TC PPO coursework.

## Final deliverables

| File | Purpose |
|------|---------|
| `CW1_REPORT_TEMPLATE.docx` | Pre-formatted Word source. IEEE style (11pt Times New Roman, 1.15 spacing, 2.5cm margins). All numbers, figures, and native equations embedded. The student fills in cover-page details and exports to PDF. |
| `generate_report_template.py` | Source script that produces the template. |

**Word count** (excluding References and Appendix): 2972 / 3000.

## Figures referenced in the report

| File | Used in | Description |
|------|---------|-------------|
| `fig_architecture.png` | Fig. 1 | Shared-CNN actor-critic architecture (1.69M params) |
| `fig_training_curves.png` | Fig. 2 | 6-panel training curves over 1.5M steps |
| `fig_eval_bar.png` | Fig. 3 | Per-episode evaluation returns on 20 unseen seeds |
| `fig_sb3_comparison.png` | Fig. 4 | Ours vs SB3 baseline diagnostics overlay |
| `demo.mp4` | Submitted alongside the zip | 25-second video of the trained agent on seed 117 (return 925.40, completed at wrapped step 187) |

## Numerical evidence

| File | Content |
|------|---------|
| `eval_summary.json` | 20-episode evaluation of `models/ppo_final.pt`. Mean 830.17 ± 104.79; min 436.81; max 914.90 |
| `eval_summary_sb3.json` | 20-episode evaluation of the SB3 baseline. Mean 664.32 ± 173.93; min 309.40; max 857.14 |
| `checkpoint_scan_vec_main_v3.json` | Per-checkpoint evaluation table; basis for selecting `iter_0700.pt` as the submitted model |
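
As a rough illustration of how summary figures of the kind stored in `eval_summary.json` (mean ± spread, min, max) can be derived from per-episode returns, here is a minimal sketch. The function name `summarise_returns`, the output keys, and the use of the population standard deviation are assumptions made for illustration; the schema and estimator used by the project's own evaluation script are not stated in this index. The input values are placeholders, not project results.

```python
import json
import statistics


def summarise_returns(returns):
    """Reduce a list of per-episode returns to mean, spread, min and max."""
    if not returns:
        raise ValueError("need at least one episode return")
    return {
        "n_episodes": len(returns),
        "mean": statistics.fmean(returns),
        # Assumption: population standard deviation. The project may
        # report the sample standard deviation instead.
        "std": statistics.pstdev(returns),
        "min": min(returns),
        "max": max(returns),
    }


# Placeholder returns for illustration only (not taken from the project).
example_returns = [800.0, 900.0, 850.0, 750.0]
summary = summarise_returns(example_returns)
print(json.dumps(summary, indent=2))
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample estimate, which is slightly larger for a small number of episodes.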
## Cross-cutting documents

| File | Content |
|------|---------|
| `development_log.md` | Step-by-step development timeline (Days 1-9) |
| `issues_and_fixes.md` | Three substantive engineering challenges and their fixes, plus three documented negative-result ablations (raw material for Sections 3.4 and 4.4) |
| `submission_checklist.md` | Pre-submission verification checklist |
| `INDEX.md` | This file |

## Project state at submission

```
runs/       vec_main_v3/             main 1.5M-step training
            sb3_baseline/run_1/      SB3 baseline 500K reference

models/     ppo_final.pt             submitted agent (= iter_0700.pt selected
                                     by held-out checkpoint scanning)
            vec_main_v3/final.pt     training-end backup
            sb3_baseline/final.zip   SB3 reference

src/        eight Python modules, no SB3 imports
notebooks/  three development notebooks (env exploration, network sanity,
            evaluation)
```
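
Checkpoint scanning, as mentioned in the listing above, amounts to evaluating each saved checkpoint on held-out seeds and keeping the one with the best score. The sketch below assumes a hypothetical scan table that maps checkpoint names to a `mean_return` field, and assumes that the highest mean return is the selection criterion. The actual layout of `checkpoint_scan_vec_main_v3.json` and the criterion used for the submission may differ. Checkpoint names and values are placeholders.

```python
def select_checkpoint(scan):
    """Return the checkpoint name with the highest mean held-out return.

    `scan` maps checkpoint name -> {"mean_return": float} (hypothetical schema).
    """
    if not scan:
        raise ValueError("scan table is empty")
    return max(scan, key=lambda name: scan[name]["mean_return"])


# Placeholder scan table for illustration only (not the project's data).
example_scan = {
    "ckpt_a.pt": {"mean_return": 10.0},
    "ckpt_b.pt": {"mean_return": 30.0},
    "ckpt_c.pt": {"mean_return": 20.0},
}
best = select_checkpoint(example_scan)
print(best)
```

Selecting on seeds that are disjoint from the final evaluation seeds keeps the reported evaluation from being biased by the selection step.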