Serendipity fb09e66d09 feat: restructure the project and add vectorised PPO training and evaluation scripts
- Refactored the original single-environment training code into a modular structure and added vectorised-environment support to speed up data collection
- Implemented a complete PPO training pipeline, including a shared-CNN Actor-Critic network, a vectorised experience buffer, and GAE advantage estimation
- Added a training script (train_vec.py), an evaluation script (evaluate.py), and an SB3 baseline comparison script (train_sb3_baseline.py)
- Provided detailed documentation and a development log, including records of problems solved and experiment analysis
- Removed legacy project files and unified the project layout under the CW1_id_name directory
2026-05-02 13:44:08 +08:00


# CW1: PPO on CarRacing-v3

XJTLU DTS307TC Reinforcement Learning Coursework 1. A from-scratch PyTorch implementation of Proximal Policy Optimization (PPO) that learns to play the Gymnasium CarRacing-v3 environment using a discrete 5-action space. Stable-Baselines3 is not used in the main implementation (only as an external baseline for the comparison plot).

## Author

- Name: <your name>
- Student ID: <your ID>

## Environment

- Python 3.9 / 3.10
- PyTorch 2.7.0 (CUDA 12.8)
- Tested on an NVIDIA RTX 4060 Laptop GPU (8 GB)
- Windows 11 (Linux/macOS untested but should work)

## Setup

```powershell
pip install -r requirements.txt
```

If box2d-py fails to compile on Windows:

```powershell
pip install swig
pip install Box2D
pip install gymnasium
```

## Project Structure

```
CW1_xxx/
├── README.md
├── requirements.txt
├── train.py                    Single-env training entry (legacy)
├── train_vec.py                Vectorised-env training entry (recommended)
├── train_sb3_baseline.py       SB3 PPO baseline for comparison only
├── evaluate.py                 CLI evaluation: returns + plots + video
├── src/
│   ├── env_wrappers.py         SkipFrame, GrayScaleResize, FrameStack
│   ├── vec_env_wrappers.py     Vectorised env factory
│   ├── networks.py             Shared-CNN ActorCritic
│   ├── rollout_buffer.py       Single-env rollout buffer + GAE
│   ├── vec_rollout_buffer.py   Vectorised rollout buffer + GAE
│   ├── ppo_agent.py            PPO-Clip agent (act, update, schedule)
│   ├── eval_utils.py           Evaluation / plotting / video helpers
│   └── utils.py                set_seed, format_seconds
├── notebooks/
│   ├── 01_explore_env.ipynb    Environment exploration
│   ├── 02_test_network.ipynb   Network sanity checks
│   ├── 03_test_buffer.ipynb    Buffer + GAE sanity checks
│   ├── 04_test_ppo.ipynb       PPO update sanity checks
│   └── 05_evaluate.ipynb       Trained-agent evaluation (thin wrapper)
├── models/
│   └── vec_main/final.pt       Trained agent (submitted)
├── runs/                       TensorBoard logs (one subdir per experiment)
└── docs/                       Per-step technical reports
```

## Training (recommended: vectorised)

A 500K-step run takes roughly 2.5 h on a single RTX 4060 Laptop GPU.
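Each training iteration collects 512 steps per environment (2048 samples in total across 4 envs), computes GAE advantages, then runs 10 PPO epochs over minibatches of 64. The GAE recursion at the heart of this is small enough to sketch here; this is a simplified single-environment NumPy illustration (the function name and signature are made up for this sketch), not the project's actual buffer code in `src/vec_rollout_buffer.py`:

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalised Advantage Estimation, single-env sketch.

    rewards, values, dones: length-T arrays from one rollout.
    last_value: V(s_T), used to bootstrap the final step.
    Returns (advantages, returns); returns = advantages + values
    are the regression targets for the value head.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        # GAE: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    returns = advantages + values
    return advantages, returns
```

A vectorised buffer presumably runs the same backward recursion once per environment; only the bookkeeping changes.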
Launch the run with:

```powershell
python train_vec.py --n-envs 4 --total-steps 500000 --run-name vec_main --anneal-lr --anneal-ent --reward-clip 1.0
```

Key flags:

- `--n-envs 4`: parallel environments (async, multi-process)
- `--anneal-lr`: linear LR decay to 0
- `--anneal-ent`: linear entropy-coefficient decay to 0
- `--reward-clip 1.0`: floor the per-frame reward at -1.0

## Single-environment training (legacy)

```powershell
python train.py --total-steps 500000 --run-name main
```

## Monitoring

In a separate PowerShell:

```powershell
tensorboard --logdir=runs --port=6006
```

Open http://localhost:6006 and tick whichever runs you want to compare.

## Evaluation

```powershell
# Numerical eval + bar chart + training curves
python evaluate.py --ckpt models/vec_main/final.pt

# Same, plus a demo mp4 written to docs/demo.mp4
python evaluate.py --ckpt models/vec_main/final.pt --video

# Deterministic-policy evaluation (argmax over logits)
python evaluate.py --ckpt models/vec_main/final.pt --deterministic
```

Outputs land in docs/:

- `eval_summary.json`: per-episode returns + mean / std
- `fig_eval_bar.png`: bar chart of evaluation returns
- `fig_training_curves.png`: 6-panel training curves (overlays available runs)
- `demo.mp4` (if `--video`)

The notebook notebooks/05_evaluate.ipynb is a thin wrapper around the same helpers in src/eval_utils.py.

## SB3 baseline (optional, for the comparison plot)

```powershell
python train_sb3_baseline.py --total-steps 500000 --run-name sb3_baseline
```

After this run finishes, re-running evaluate.py will automatically include the SB3 curve in fig_training_curves.png if runs/sb3_baseline exists.

## Key hyperparameters (vec_main run)

| Param | Value | Source |
|-------|-------|--------|
| Total steps | 500,000 | |
| Parallel envs | 4 | AsyncVectorEnv |
| Rollout per env | 512 | total per-iter samples = 2048 |
| Update epochs | 10 | PPO paper |
| Minibatch | 64 | PPO Atari |
| Learning rate | 2.5e-4 → 0 (linear) | annealed |
| Adam eps | 1e-5 | "37 details" |
| γ (discount) | 0.99 | |
| λ (GAE) | 0.95 | |
| ε (clip) | 0.2 | PPO paper |
| c1 (vf) | 0.5 | |
| c2 (ent) | 0.01 → 0 (linear) | annealed |
| max-grad-norm | 0.5 | |
| Reward floor | -1.0 | |

## Notes

- src/ contains no Stable-Baselines3 imports. SB3 is referenced only in train_sb3_baseline.py and is used purely for the external comparison plot in the report.
- train_vec.py requires the `if __name__ == "__main__"` guard at the bottom (already present) for AsyncVectorEnv to work on Windows.
- See docs/issues_and_fixes.md for a log of practical issues encountered during development and how they were resolved.

## License & academic integrity

This is an individual coursework submission. Any external code (e.g. inspiration from CleanRL or the PPO papers) is referenced in the report. No RL-specific libraries (Stable-Baselines3, RLlib, Tianshou, etc.) are used in the main src/ implementation.
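As a closing reference, the clipped surrogate term that a PPO-Clip update step is built around uses the ε = 0.2 from the hyperparameter table above. A minimal per-sample sketch (a standalone illustration with an invented function name, not the code in `src/ppo_agent.py`, which works on batched tensors):

```python
import math

def ppo_clip_objective(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Per-sample PPO-Clip policy objective (to be maximised).

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probs
    for numerical stability. Clipping the ratio to
    [1 - eps, 1 + eps] and taking the min makes large policy
    steps unprofitable in either direction.
    """
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)
```

The full loss in the table's notation would also subtract c1 times the value loss and add c2 times the entropy bonus.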