# PPO for CarRacing-v3

A from-scratch PPO implementation for Gymnasium's CarRacing-v3 environment. No Stable-Baselines or other RL libraries are used.

## Setup

```bash
conda activate my_env
uv pip install -r requirements.txt
```

## Train

```bash
python train.py --steps 500000
```

## Evaluate

```bash
python src/evaluate.py --model models/ppo_carracing_final.pt --episodes 10
```

## TensorBoard

```bash
tensorboard --logdir logs/tensorboard
```

## Project Structure

```
src/
├── network.py        # Actor (Gaussian policy) and Critic (value) networks
├── replay_buffer.py  # Rollout buffer with GAE computation
├── trainer.py        # PPO update with clipped surrogate objective
├── utils.py          # Environment wrappers (grayscale, resize, frame stack)
└── evaluate.py       # Evaluation script
train.py              # Main training entry point
models/               # Saved checkpoints
logs/tensorboard/     # TensorBoard logs
```

Minimal sketches of the preprocessing, GAE, and clipped-loss logic are given under Implementation Notes below.

## Hyperparameters

| Parameter | Value |
|-----------|-------|
| Learning rate | 3e-4 |
| Gamma (discount) | 0.99 |
| GAE lambda | 0.95 |
| Clip epsilon | 0.2 |
| PPO epochs | 4 |
| Mini-batch size | 64 |
| Rollout steps | 2048 |
| Entropy coefficient | 0.01 |
| Value coefficient | 0.5 |
| Max gradient norm | 0.5 |
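
## Implementation Notes

The wrappers in `src/utils.py` turn the raw 96×96×3 RGB CarRacing frames into stacked grayscale observations. A minimal sketch of that idea, assuming OpenCV for the image ops and an 84×84, 4-frame configuration (the class names, sizes, and cv2 dependency are illustrative assumptions, not necessarily the repo's exact code):

```python
from collections import deque

import cv2
import gymnasium as gym
import numpy as np


class PreprocessFrame(gym.ObservationWrapper):
    """Grayscale and resize each RGB frame (target size is an assumption)."""

    def __init__(self, env, size=84):
        super().__init__(env)
        self.size = size
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(size, size), dtype=np.uint8)

    def observation(self, obs):
        gray = cv2.cvtColor(obs, cv2.COLOR_RGB2GRAY)
        return cv2.resize(gray, (self.size, self.size),
                          interpolation=cv2.INTER_AREA)


class FrameStack(gym.Wrapper):
    """Stack the last k preprocessed frames along a new leading axis."""

    def __init__(self, env, k=4):
        super().__init__(env)  # expects PreprocessFrame applied first
        self.k = k
        self.frames = deque(maxlen=k)
        h, w = env.observation_space.shape
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(k, h, w), dtype=np.uint8)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        for _ in range(self.k):
            self.frames.append(obs)
        return np.stack(self.frames), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.frames.append(obs)
        return np.stack(self.frames), reward, terminated, truncated, info
```

Chained as `FrameStack(PreprocessFrame(gym.make("CarRacing-v3")))`, this yields observations of shape `(4, 84, 84)`.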
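
The rollout buffer in `src/replay_buffer.py` computes advantages with GAE(λ) before each update. A sketch of the standard backward recursion, using the table's γ = 0.99 and λ = 0.95 (the function and argument names are illustrative, not the repo's exact API):

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """GAE over one rollout of length T; last_value bootstraps the final state."""
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        next_nonterminal = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * next_nonterminal - values[t]
        # Recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lam * next_nonterminal * gae
        advantages[t] = gae
    returns = advantages + values  # regression targets for the critic
    return advantages, returns
```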
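
The update in `src/trainer.py` optimizes the clipped surrogate objective over 4 epochs of 64-sample mini-batches. A PyTorch sketch of the combined loss with the table's coefficients (tensor and function names are illustrative):

```python
import torch
import torch.nn.functional as F


def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns,
             entropy, clip_eps=0.2, vf_coef=0.5, ent_coef=0.01):
    """Combined PPO loss for one mini-batch (all arguments are 1-D tensors)."""
    # Probability ratio r(theta) = pi_theta(a|s) / pi_theta_old(a|s),
    # computed in log space for numerical stability.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipped surrogate: take the pessimistic minimum of the two terms.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    # Value head regresses toward the GAE returns.
    value_loss = F.mse_loss(values, returns)
    # Entropy bonus keeps the Gaussian policy from collapsing too early.
    return policy_loss + vf_coef * value_loss - ent_coef * entropy.mean()
```

Each backward pass would then be followed by gradient clipping at the table's max norm, e.g. `torch.nn.utils.clip_grad_norm_(params, 0.5)`, before the optimizer step.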