
PPO for CarRacing-v3

From-scratch PPO implementation for CarRacing-v3. No Stable-Baselines or other RL libraries used.

Setup

conda activate my_env
uv pip install -r requirements.txt

Train

python train.py --steps 500000
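To get a feel for what a 500000-step run amounts to, the arithmetic can be sketched as below. This assumes that --steps counts environment steps and that each update cycle consumes one full rollout split into non-overlapping mini-batches, using the defaults listed under Hyperparameters. training_budget is a hypothetical helper, not part of train.py.

```python
def training_budget(total_steps, rollout_steps=2048, ppo_epochs=4, minibatch_size=64):
    """Estimate rollouts and gradient updates for a run (illustrative only)."""
    # Number of complete rollouts that fit in the step budget, plus the remainder.
    full_rollouts, leftover_steps = divmod(total_steps, rollout_steps)
    # Each epoch is assumed to visit every sample once, in mini-batches.
    minibatches_per_epoch = rollout_steps // minibatch_size
    gradient_updates = full_rollouts * ppo_epochs * minibatches_per_epoch
    return {
        "full_rollouts": full_rollouts,
        "leftover_steps": leftover_steps,
        "gradient_updates": gradient_updates,
    }


budget = training_budget(500_000)
print(budget)
```

Under these assumptions the run covers 244 full rollouts and 31232 mini-batch gradient updates, with 288 steps left over.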

Evaluate

python src/evaluate.py --model models/ppo_carracing_final.pt --episodes 10

TensorBoard

tensorboard --logdir logs/tensorboard

Project Structure

src/
├── network.py        # Actor (Gaussian policy) and Critic (Value) networks
├── replay_buffer.py  # Rollout buffer with GAE computation
├── trainer.py        # PPO update with clipped surrogate objective
├── utils.py          # Environment wrappers (grayscale, resize, frame stack)
└── evaluate.py       # Evaluation script
train.py              # Main training entry point
models/               # Saved checkpoints
logs/tensorboard/     # TensorBoard logs
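
The rollout buffer is described as computing GAE. The following is a minimal, dependency-free sketch of that computation, not the code in src/replay_buffer.py. The function name compute_gae, its argument layout, and the use of plain lists are assumptions made for illustration; the defaults match the Gamma and GAE lambda values listed under Hyperparameters.

```python
from typing import List, Tuple


def compute_gae(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one rollout using GAE(lambda).

    values[t] is the critic estimate V(s_t). last_value is the estimate for
    the state reached after the final step and bootstraps an unfinished
    episode. dones[t] is True when step t ended an episode.
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(n)):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; the bootstrap term is dropped at episode ends.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode ends.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Targets for the critic: advantage plus the baseline value.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


advantages, returns = compute_gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    dones=[False, False, True],
    last_value=0.0,
    gamma=1.0,
    lam=1.0,
)
print(advantages)
```

With gamma and lambda both set to 1 and a zero critic, the advantages reduce to the undiscounted reward-to-go, which makes the recursion easy to check by hand.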

Hyperparameters

Parameter            Value
Learning rate        3e-4
Gamma                0.99
GAE lambda           0.95
Clip epsilon         0.2
PPO epochs           4
Mini-batch size      64
Rollout steps        2048
Entropy coefficient  0.01
Value coefficient    0.5
Max gradient norm    0.5
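
To show where the clip epsilon, value coefficient, and entropy coefficient enter the objective, here is a small sketch of the standard PPO loss written in plain Python. It is not the code in src/trainer.py: the names PPOConfig and ppo_loss are hypothetical, the value loss is assumed to be a plain mean squared error, and details such as advantage normalization or value clipping are left out because the README does not specify them. Gradient clipping with the max gradient norm needs an autograd framework and is not shown.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PPOConfig:
    # Values copied from the hyperparameter table above.
    learning_rate: float = 3e-4
    gamma: float = 0.99
    gae_lambda: float = 0.95
    clip_epsilon: float = 0.2
    ppo_epochs: int = 4
    minibatch_size: int = 64
    rollout_steps: int = 2048
    entropy_coef: float = 0.01
    value_coef: float = 0.5
    max_grad_norm: float = 0.5


def ppo_loss(
    new_log_probs: List[float],
    old_log_probs: List[float],
    advantages: List[float],
    values: List[float],
    returns: List[float],
    entropies: List[float],
    cfg: PPOConfig,
) -> float:
    """Combined PPO loss for one mini-batch (to be minimized)."""
    n = len(advantages)
    eps = cfg.clip_epsilon
    surrogate = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
        # Pessimistic choice between the unclipped and clipped objective.
        surrogate += min(ratio * adv, clipped_ratio * adv)
    policy_loss = -surrogate / n
    # Assumed critic loss: mean squared error against the return targets.
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    mean_entropy = sum(entropies) / n
    # The entropy bonus is subtracted so that higher entropy lowers the loss.
    return policy_loss + cfg.value_coef * value_loss - cfg.entropy_coef * mean_entropy


cfg = PPOConfig()
# A sample whose probability doubled under the new policy, with positive advantage.
loss = ppo_loss([math.log(2.0)], [0.0], [1.0], [0.0], [0.0], [0.0], cfg)
print(loss)
```

In the usage example the ratio of 2 is clipped to 1 + epsilon, so the policy term contributes only 1.2 times the advantage; with a negative advantage the unclipped term would be kept instead.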