PPO for CarRacing-v3

A from-scratch implementation of Proximal Policy Optimization (PPO) for Gymnasium's CarRacing-v3 environment. No Stable-Baselines3 or other RL libraries are used.
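CarRacing-v3 has a three-dimensional continuous action space (steering, gas, brake), so the actor parameterizes a diagonal Gaussian over actions. A minimal NumPy sketch of sampling and log-probability computation (illustrative only; the function name and interface here are hypothetical, not the repo's API):

```python
import numpy as np

def sample_action(mean, log_std, rng):
    """Sample from a diagonal Gaussian policy; return action and log-probability."""
    std = np.exp(log_std)
    action = mean + std * rng.standard_normal(mean.shape)
    # Diagonal Gaussian log-density, summed over action dimensions
    log_prob = -0.5 * np.sum(
        ((action - mean) / std) ** 2 + 2 * log_std + np.log(2 * np.pi)
    )
    return action, log_prob
```

The log-probability is stored alongside each action during rollout collection so the PPO update can later form the new/old probability ratio.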

Setup

conda activate my_env
uv pip install -r requirements.txt

Train

python train.py --steps 500000

Evaluate

python src/evaluate.py --model models/ppo_carracing_final.pt --episodes 10

TensorBoard

tensorboard --logdir logs/tensorboard

Project Structure

src/
├── network.py        # Actor (Gaussian policy) and Critic (value) networks
├── replay_buffer.py  # Rollout buffer with GAE computation
├── trainer.py        # PPO update with clipped surrogate objective
├── utils.py          # Environment wrappers (grayscale, resize, frame stack)
└── evaluate.py       # Evaluation script
train.py              # Main training entry point
models/               # Saved checkpoints
logs/tensorboard/     # TensorBoard logs
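The rollout buffer computes advantages with Generalized Advantage Estimation (GAE). A minimal NumPy sketch of that backward recursion (illustrative, not the repo's exact code), using the gamma and lambda values from the hyperparameter table:

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """GAE over one rollout: advantages and value targets (returns)."""
    advantages = np.zeros_like(rewards)
    gae = 0.0
    next_value = last_value  # bootstrap value for the state after the rollout
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]  # zero out bootstrap across episode ends
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    returns = advantages + values  # targets for the value-function loss
    return advantages, returns
```

In practice the advantages are also normalized per mini-batch before the policy update, which is a common PPO stabilization trick.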

Hyperparameters

Parameter            Value
-------------------  -----
Learning rate        3e-4
Gamma                0.99
GAE lambda           0.95
Clip epsilon         0.2
PPO epochs           4
Mini-batch size      64
Rollout steps        2048
Entropy coefficient  0.01
Value coefficient    0.5
Max gradient norm    0.5
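The clip epsilon, entropy coefficient, and value coefficient above combine into the standard PPO loss. A rough NumPy sketch of that loss (for illustration; the repo's trainer uses autograd, and these array arguments are assumptions):

```python
import numpy as np

def ppo_losses(log_probs, old_log_probs, advantages, values, returns,
               entropy, clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Clipped surrogate policy loss + value loss - entropy bonus."""
    ratio = np.exp(log_probs - old_log_probs)  # new/old action probability
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -np.mean(np.minimum(unclipped, clipped))
    value_loss = np.mean((returns - values) ** 2)
    total = policy_loss + value_coef * value_loss - entropy_coef * np.mean(entropy)
    return total, policy_loss, value_loss
```

The min of the clipped and unclipped terms caps how far one update can push the policy; with the epsilon of 0.2 above, a probability ratio beyond [0.8, 1.2] contributes no extra gradient.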