PPO for CarRacing-v3
A from-scratch PPO implementation for the Gymnasium CarRacing-v3 environment. No Stable-Baselines or other RL libraries are used.
Setup
conda activate my_env
uv pip install -r requirements.txt
Train
python train.py --steps 500000
Evaluate
python src/evaluate.py --model models/ppo_carracing_final.pt --episodes 10
TensorBoard
tensorboard --logdir logs/tensorboard
Project Structure
src/
├── network.py # Actor (Gaussian policy) and Critic (Value) networks
├── replay_buffer.py # Rollout buffer with GAE computation
├── trainer.py # PPO update with clipped surrogate objective
├── utils.py # Environment wrappers (grayscale, resize, frame stack)
└── evaluate.py # Evaluation script
train.py # Main training entry point
models/ # Saved checkpoints
logs/tensorboard/ # TensorBoard logs
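The GAE computation that `replay_buffer.py` is described as performing can be illustrated with a minimal, framework-free sketch. This is not the repository's code: the function name `compute_gae`, its argument layout, and the use of plain Python lists are assumptions made for illustration, and the actual buffer most likely works on tensors. The defaults mirror the Gamma and GAE lambda values from the hyperparameter table.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (hypothetical helper).

    rewards, values, dones: per-step sequences of equal length T.
    last_value: critic estimate for the state that follows the final step.
    Returns (advantages, returns), each a list of length T.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated advantage of the next.
    for t in reversed(range(T)):
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]  # TD residual
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic are the advantages plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Calling `compute_gae([1.0, 1.0], [0.0, 0.0], [False, True], last_value=5.0)` ignores `last_value` because the final step ends the episode, so no bootstrapping happens past it.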
Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 3e-4 |
| Gamma | 0.99 |
| GAE lambda | 0.95 |
| Clip epsilon | 0.2 |
| PPO epochs | 4 |
| Mini-batch size | 64 |
| Rollout steps | 2048 |
| Entropy coefficient | 0.01 |
| Value coefficient | 0.5 |
| Max gradient norm | 0.5 |
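To show how the clip epsilon, value coefficient, and entropy coefficient from the table would combine in a single PPO objective, here is a small sketch in plain Python. The function `ppo_loss` and its arguments are hypothetical and not taken from `trainer.py`; a real implementation would compute the same quantities on batched tensors and then apply gradient clipping at the listed max gradient norm.

```python
import math

def ppo_loss(new_logp, old_logp, advantages, values, returns, entropies,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Scalar PPO loss over one mini-batch (illustrative, hypothetical helper)."""
    n = len(advantages)
    policy_terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        policy_terms.append(min(ratio * adv, clipped * adv))
    policy_loss = -sum(policy_terms) / n  # negate because we minimize
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    entropy = sum(entropies) / n
    # Entropy is subtracted so that higher entropy lowers the loss.
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

With a probability ratio of 2 and a positive advantage, the policy term is capped at `1 + clip_eps = 1.2` times the advantage. With a negative advantage the unclipped term is kept, since it is the more pessimistic of the two.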