fix(ppo): correct log-probability shape and state tensor layout

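Fix the shape of log_probs in the replay buffer: change it from (buffer_size, action_dim) to (buffer_size,)
Fix the state tensor layout during training by converting from (N, H, W, C) to (N, C, H, W)
Update collect_rollout to return observations and fix the log_prob computation
Add project configuration files and a script for generating training curves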
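
The two shape fixes in this message can be illustrated with a minimal sketch. It is not the code from this commit: NumPy stands in for whatever tensor library the project uses, all variable names and sizes are hypothetical, and it assumes the per-transition log-probability is the sum of per-dimension log-probabilities, which holds for a factorized policy.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
buffer_size, action_dim = 8, 2
height, width, channels = 4, 5, 3

# Log-probabilities: store one scalar per transition.
# Assumption: the policy factorizes over action dimensions, so the joint
# log-probability is the sum of the per-dimension log-probabilities.
per_dim_log_probs = np.full((buffer_size, action_dim), -0.5, dtype=np.float32)
log_probs = per_dim_log_probs.sum(axis=-1)  # shape (buffer_size,)

# States: convert channels-last (N, H, W, C) to channels-first (N, C, H, W).
n_values = buffer_size * height * width * channels
states_nhwc = np.arange(n_values, dtype=np.float32).reshape(
    buffer_size, height, width, channels
)
states_nchw = np.transpose(states_nhwc, (0, 3, 1, 2))

print(log_probs.shape)    # shape of the stored log-probabilities
print(states_nchw.shape)  # shape of the channels-first state batch
```

Storing one log-probability per transition keeps the PPO probability ratio a per-sample scalar, so it lines up element-wise with a per-sample advantage of shape (buffer_size,) instead of broadcasting against it.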
2026-04-30 20:30:40 +08:00
parent d353133b31
commit b32490ae03
19 changed files with 185 additions and 22 deletions
Binary file not shown (image, 127 KiB).