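feat: restructure the project and add vectorized PPO training and evaluation scripts

- Refactor the original single-environment training code into a modular structure, and add vectorized environment support to improve data-collection efficiency
- Implement a complete PPO training pipeline, including an Actor-Critic network with a shared CNN, a vectorized experience replay buffer, and GAE advantage estimation
- Add a training script (train_vec.py), an evaluation script (evaluate.py), and an SB3 baseline comparison script (train_sb3_baseline.py)
- Provide detailed documentation and a development log, including troubleshooting records and experiment analysis
- Remove legacy project files and unify the project structure under the CW1_id_name directory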
@@ -0,0 +1,32 @@
{
  "checkpoint": "D:\\projects\\CW1_xxx\\models\\vec_main_v3\\iter_0700.pt",
  "n_episodes": 20,
  "seed_start": 1000,
  "deterministic": false,
  "mean": 830.1724279409364,
  "std": 104.79337276485252,
  "min": 436.8098159509071,
  "max": 914.8999999999849,
  "returns": [
    859.0443686006632,
    839.1025641025492,
    707.2727272727101,
    873.3333333333223,
    914.8999999999849,
    436.8098159509071,
    874.9999999999827,
    874.1100323624435,
    871.5189873417628,
    888.8888888888717,
    891.0714285714159,
    863.5761589403863,
    852.7027027026837,
    776.0107816711404,
    859.4594594594402,
    883.6601307189337,
    890.2912621359064,
    724.101706484623,
    830.0291545189361,
    892.5650557620664
  ]
}
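The summary fields in the JSON above can be cross-checked against the per-episode returns. The sketch below is only an illustration: `summarize` is a hypothetical helper, not code taken from evaluate.py, and the use of the population standard deviation (dividing by N rather than N - 1) is an assumption inferred from the recorded `std` value.

```python
import statistics

# Per-episode returns copied from the "returns" array in the JSON above.
returns = [
    859.0443686006632, 839.1025641025492, 707.2727272727101,
    873.3333333333223, 914.8999999999849, 436.8098159509071,
    874.9999999999827, 874.1100323624435, 871.5189873417628,
    888.8888888888717, 891.0714285714159, 863.5761589403863,
    852.7027027026837, 776.0107816711404, 859.4594594594402,
    883.6601307189337, 890.2912621359064, 724.101706484623,
    830.0291545189361, 892.5650557620664,
]


def summarize(episode_returns):
    """Hypothetical helper: recompute the summary fields from raw returns."""
    return {
        "n_episodes": len(episode_returns),
        "mean": statistics.fmean(episode_returns),
        # Assumption: population standard deviation (divides by N, not N - 1).
        "std": statistics.pstdev(episode_returns),
        "min": min(episode_returns),
        "max": max(episode_returns),
    }


summary = summarize(returns)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The recomputed values agree with the recorded fields to within floating-point rounding. The single low episode (436.81) accounts for most of the spread, so the median may be a more robust summary for this checkpoint than the mean.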