
PPO for CarRacing-v3

A from-scratch PPO implementation for CarRacing-v3. It uses no Stable-Baselines or other RL libraries.

Setup

conda activate my_env
uv pip install -r requirements.txt

Train

python train.py --steps 500000

Evaluate

python src/evaluate.py --model models/ppo_carracing_final.pt --episodes 10

TensorBoard

tensorboard --logdir logs/tensorboard

Project Structure

src/
├── network.py        # Actor (Gaussian policy) and Critic (Value) networks
├── replay_buffer.py  # Rollout buffer with GAE computation
├── trainer.py        # PPO update with clipped surrogate objective
├── utils.py          # Environment wrappers (grayscale, resize, frame stack)
└── evaluate.py       # Evaluation script
train.py              # Main training entry point
models/               # Saved checkpoints
logs/tensorboard/     # TensorBoard logs
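
The "GAE computation" attributed to replay_buffer.py above can be illustrated with a minimal sketch. This is not the repository's code: the function name `compute_gae`, its arguments, and the use of plain Python lists are assumptions made for illustration, and the real buffer may store tensors and handle episode boundaries differently. The defaults mirror the Gamma and GAE lambda values listed under Hyperparameters.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (hypothetical helper).

    rewards, values, dones: equal-length sequences for steps 0..T-1.
    last_value: critic estimate for the state reached after the final step.
    Returns (advantages, returns), where returns[t] = advantages[t] + values[t].
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated advantage of its successor.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    adv, ret = compute_gae(
        rewards=[1.0, 1.0, 1.0],
        values=[0.0, 0.0, 0.0],
        dones=[False, False, True],
        last_value=5.0,
    )
    print(adv, ret)
```

The `not_done` mask zeroes both the bootstrap value and the running advantage at a terminal step, so rewards from a later episode in the same rollout do not leak backwards into an earlier one.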

Hyperparameters

Parameter            Value
Learning rate        3e-4
Gamma                0.99
GAE lambda           0.95
Clip epsilon         0.2
PPO epochs           4
Mini-batch size      64
Rollout steps        2048
Entropy coefficient  0.01
Value coefficient    0.5
Max gradient norm    0.5
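
To show how the clip epsilon, value coefficient, and entropy coefficient above typically enter a PPO update, here is a minimal sketch of the clipped surrogate loss for one mini-batch. The function `ppo_loss` and its argument names are hypothetical, and plain Python lists are used for readability; the actual trainer.py may compute the loss differently (for example on tensors, or with advantage normalization).

```python
import math


def ppo_loss(new_logp, old_logp, advantages, values, returns, entropy,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Combined PPO loss for one mini-batch (hypothetical helper).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the rollout.
    entropy: mean policy entropy over the mini-batch.
    """
    n = len(advantages)
    policy_term = 0.0
    value_term = 0.0
    for lp_new, lp_old, adv, v, ret in zip(new_logp, old_logp, advantages, values, returns):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped objectives.
        policy_term += min(ratio * adv, clipped * adv)
        value_term += (v - ret) ** 2
    policy_loss = -policy_term / n   # maximize the surrogate objective
    value_loss = value_term / n      # mean squared error of the critic
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


if __name__ == "__main__":
    loss = ppo_loss(
        new_logp=[math.log(2.0)], old_logp=[0.0],
        advantages=[1.0], values=[0.0], returns=[0.0], entropy=0.0,
    )
    print(loss)  # ratio 2.0 is clipped to 1.2, so the loss is about -1.2
```

With 2048 rollout steps and a mini-batch size of 64, one pass over the buffer is 32 mini-batches, so 4 PPO epochs would give 128 gradient steps per rollout, assuming each epoch covers the buffer exactly once.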