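Serendipity d353133b31 feat: add reinforcement learning project report and refactor the code structure of the course assignment report
- Add a personal reinforcement learning project report, including a PPO algorithm implemented from scratch in PyTorch
- Refactor the code structure of the course assignment report, extracting runtime path management and notebook execution logic into separate modules
- Update the dependency file requirements.txt to add reinforcement learning dependencies
- Simplify the model comparison results table, keeping only the baseline logistic regression model data
2026-04-30 16:54:41 +08:00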

# PPO for CarRacing-v3
A from-scratch PyTorch implementation of PPO (Proximal Policy Optimization) for CarRacing-v3. It does not use Stable-Baselines or any other RL library.
## Setup
```bash
conda activate my_env
uv pip install -r requirements.txt
```
## Train
```bash
python train.py --steps 500000
```
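To get a feel for what `--steps 500000` implies, the sketch below works out the number of rollouts and gradient steps from the values in the Hyperparameters table. It rests on assumptions this README does not confirm: that `--steps` counts environment steps, that a single environment is used, and that every PPO epoch passes over the whole rollout exactly once. The function `update_schedule` is hypothetical and not part of this repository.

```python
def update_schedule(total_steps, rollout_steps=2048, minibatch_size=64, ppo_epochs=4):
    """Estimate the update schedule under the assumptions stated above.

    Returns (rollouts, gradient_steps_per_rollout, total_gradient_steps).
    """
    # Number of complete rollouts that fit in the step budget.
    rollouts = total_steps // rollout_steps
    # Each epoch splits one rollout into mini-batches.
    minibatches_per_epoch = rollout_steps // minibatch_size
    # Every rollout is reused for several epochs.
    grad_steps_per_rollout = minibatches_per_epoch * ppo_epochs
    return rollouts, grad_steps_per_rollout, rollouts * grad_steps_per_rollout


if __name__ == "__main__":
    rollouts, per_rollout, total = update_schedule(500_000)
    print(f"rollouts={rollouts}, grad steps per rollout={per_rollout}, total={total}")
    # → rollouts=244, grad steps per rollout=128, total=31232
```

Under these assumptions a 500,000-step run collects 244 full rollouts and performs about 31,000 gradient steps.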
## Evaluate
```bash
python src/evaluate.py --model models/ppo_carracing_final.pt --episodes 10
```
## TensorBoard
```bash
tensorboard --logdir logs/tensorboard
```
## Project Structure
```
src/
├── network.py # Actor (Gaussian policy) and Critic (Value) networks
├── replay_buffer.py # Rollout buffer with GAE computation
├── trainer.py # PPO update with clipped surrogate objective
├── utils.py # Environment wrappers (grayscale, resize, frame stack)
└── evaluate.py # Evaluation script
train.py # Main training entry point
models/ # Saved checkpoints
logs/tensorboard/ # TensorBoard logs
```
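The rollout buffer is described above as computing GAE (Generalized Advantage Estimation). As a rough illustration of that step, here is a minimal pure-Python sketch. It is not the code in `replay_buffer.py`: the function name `compute_gae`, its signature, and the use of plain lists instead of tensors are all assumptions made for this example. The defaults match the Gamma and GAE lambda values in the Hyperparameters table.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for one rollout of length T.

    rewards[t], values[t] and dones[t] describe step t, where dones[t] is
    True if the episode ended at step t. last_value is the critic's estimate
    for the state reached after the final step (used for bootstrapping).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode ends.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic: advantage plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    adv, ret = compute_gae(
        rewards=[1.0, 1.0, 1.0],
        values=[0.0, 0.0, 0.0],
        dones=[False, False, True],
        last_value=5.0,  # ignored here because the last step ends the episode
    )
    print(adv, ret)
```

With `lam=1.0` the estimate reduces to the discounted return minus the value baseline, and with `lam=0.0` it reduces to the one-step TD error. Values in between trade bias against variance.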
## Hyperparameters
| Parameter | Value |
|-----------|-------|
| Learning rate | 3e-4 |
| Gamma | 0.99 |
| GAE lambda | 0.95 |
| Clip epsilon | 0.2 |
| PPO epochs | 4 |
| Mini-batch size | 64 |
| Rollout steps | 2048 |
| Entropy coefficient | 0.01 |
| Value coefficient | 0.5 |
| Max gradient norm | 0.5 |
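To show how the clip epsilon, value coefficient, and entropy coefficient could combine into one objective, here is a minimal pure-Python sketch of a clipped-surrogate PPO loss. It is an illustration under assumptions, not the code in `trainer.py`. The function name `ppo_loss`, its arguments, and the choice of a plain mean-squared-error value loss are hypothetical, and the real implementation presumably works on PyTorch tensors rather than lists.

```python
import math


def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns,
             entropies, clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Return (total_loss, policy_loss, value_loss, mean_entropy) for a mini-batch."""
    n = len(advantages)
    surrogate = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic (lower) bound of the unclipped and clipped objectives.
        surrogate.append(min(ratio * adv, clipped * adv))
    policy_loss = -sum(surrogate) / n
    # Assumed value loss: mean squared error against the return targets.
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    mean_entropy = sum(entropies) / n
    # The entropy bonus is subtracted because the total is minimized.
    total = policy_loss + value_coef * value_loss - entropy_coef * mean_entropy
    return total, policy_loss, value_loss, mean_entropy


if __name__ == "__main__":
    total, pi_loss, v_loss, ent = ppo_loss(
        new_log_probs=[math.log(2.0)], old_log_probs=[0.0],
        advantages=[1.0], values=[0.5], returns=[1.0], entropies=[1.0],
    )
    print(total, pi_loss, v_loss, ent)
```

In a PyTorch version the same clipping is typically written with `torch.clamp` and `torch.min`. The max gradient norm of 0.5 would be applied separately, after the backward pass, for example with `torch.nn.utils.clip_grad_norm_`.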