fb09e66d09
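- Refactor the original single-environment training code into a modular structure and add vectorized environment support to improve data collection efficiency
- Implement a complete PPO training pipeline, including an Actor-Critic network with a shared CNN, a vectorized experience buffer, and GAE advantage estimation
- Add a training script (train_vec.py), an evaluation script (evaluate.py), and an SB3 baseline comparison script (train_sb3_baseline.py)
- Provide detailed documentation and a development log, including troubleshooting records and experiment analysis
- Remove legacy project files and consolidate the project structure under the CW1_id_name directory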
29 lines
639 B
JSON
{
  "model": "SB3 PPO (CnnPolicy) 500K steps",
  "mean": 664.3150926449418,
  "std": 173.92591000802872,
  "min": 309.3959731543487,
  "max": 857.1428571428397,
  "returns": [
    801.0238907849651,
    489.743589743578,
    849.0909090908918,
    769.9999999999883,
    309.3959731543487,
    660.73619631901,
    857.1428571428397,
    734.9514563106644,
    808.2278481012556,
    818.5185185185022,
    596.4285714285587,
    837.0860927152211,
    768.243243243225,
    560.3773584905526,
    714.1891891891725,
    367.32026143789557,
    670.2265372168171,
    432.42320819111006,
    404.37317784255947,
    836.8029739776804
  ]
}
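The summary fields can be cross-checked against the per-episode returns. The sketch below is not taken from evaluate.py or train_sb3_baseline.py; the `summarize` helper is a hypothetical name, and it assumes that `std` is a population standard deviation (dividing by N, the default behaviour of `numpy.std`) rather than a sample standard deviation.

```python
import json
import statistics

# Per-episode returns copied from the "returns" array above (20 episodes).
returns = [
    801.0238907849651, 489.743589743578, 849.0909090908918,
    769.9999999999883, 309.3959731543487, 660.73619631901,
    857.1428571428397, 734.9514563106644, 808.2278481012556,
    818.5185185185022, 596.4285714285587, 837.0860927152211,
    768.243243243225, 560.3773584905526, 714.1891891891725,
    367.32026143789557, 670.2265372168171, 432.42320819111006,
    404.37317784255947, 836.8029739776804,
]


def summarize(values):
    """Recompute the summary fields from a list of episode returns.

    Hypothetical helper for illustration only.
    """
    return {
        "mean": statistics.fmean(values),
        # Assumption: population standard deviation (divide by N).
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize(returns)
print(json.dumps(summary, indent=2))
```

Under that assumption the recomputed values should agree with the stored `mean`, `std`, `min`, and `max` up to floating-point rounding. With only 20 episodes and a spread of this size, the mean is a fairly coarse estimate, which is worth keeping in mind when comparing against the custom PPO implementation.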