# CW1: PPO on CarRacing-v3

XJTLU DTS307TC Reinforcement Learning Coursework 1.

A from-scratch PyTorch implementation of Proximal Policy Optimization (PPO)
that learns to play the Gymnasium **CarRacing-v3** environment using a
discrete 5-action space. Stable-Baselines3 is **not** used in the main
implementation (only as an external baseline for the comparison plot).

## Author
- Name: `<your name>`
- Student ID: `<your ID>`

## Environment
- Python 3.9 / 3.10
- PyTorch 2.7.0 (CUDA 12.8)
- Tested on NVIDIA RTX 4060 Laptop GPU (8GB)
- Windows 11 (Linux/macOS untested but should work)

## Setup

```powershell
pip install -r requirements.txt
```

If `box2d-py` fails to compile on Windows:
```powershell
pip install swig
pip install Box2D
pip install gymnasium
```

## Project Structure

```
CW1_xxx/
├── README.md
├── requirements.txt
├── train.py                    Single-env training entry (legacy)
├── train_vec.py                Vectorised-env training entry (recommended)
├── train_sb3_baseline.py       SB3 PPO baseline for comparison only
├── evaluate.py                 CLI evaluation: returns + plots + video
├── src/
│   ├── env_wrappers.py         SkipFrame, GrayScaleResize, FrameStack
│   ├── vec_env_wrappers.py     Vectorised env factory
│   ├── networks.py             Shared-CNN ActorCritic
│   ├── rollout_buffer.py       Single-env rollout buffer + GAE
│   ├── vec_rollout_buffer.py   Vectorised rollout buffer + GAE
│   ├── ppo_agent.py            PPO-Clip agent (act, update, schedule)
│   ├── eval_utils.py           Evaluation / plotting / video helpers
│   └── utils.py                set_seed, format_seconds
├── notebooks/
│   ├── 01_explore_env.ipynb    Environment exploration
│   ├── 02_test_network.ipynb   Network sanity checks
│   ├── 03_test_buffer.ipynb    Buffer + GAE sanity checks
│   ├── 04_test_ppo.ipynb       PPO update sanity checks
│   └── 05_evaluate.ipynb       Trained-agent evaluation (thin wrapper)
├── models/
│   └── vec_main/final.pt       Trained agent (submitted)
├── runs/                       TensorBoard logs (one subdir per experiment)
└── docs/                       Per-step technical reports
```

## Training (recommended: vectorised)

```powershell
# 500K steps, ~2.5h on a single RTX 4060 Laptop
python train_vec.py --n-envs 4 --total-steps 500000 `
    --run-name vec_main --anneal-lr --anneal-ent --reward-clip 1.0
```

Key flags:
- `--n-envs 4`: parallel environments (Async multi-process)
- `--anneal-lr`: linear LR decay to 0
- `--anneal-ent`: linear entropy-coef decay to 0
- `--reward-clip 1.0`: floor per-frame reward at -1.0

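The two annealing flags and the reward floor can be sketched as below. This is an illustration only, not the code in `src/ppo_agent.py`: the helper names `linear_anneal` and `floor_reward` are hypothetical, and the sketch assumes the schedules are driven by the fraction of update iterations completed.

```python
def linear_anneal(initial: float, progress: float) -> float:
    """Linearly decay `initial` to 0 as `progress` goes from 0.0 to 1.0."""
    progress = min(max(progress, 0.0), 1.0)
    return initial * (1.0 - progress)


def floor_reward(reward: float, clip: float = 1.0) -> float:
    """Bound the reward from below at -clip; larger rewards pass through."""
    return max(reward, -clip)


# Values from the vec_main run: 4 envs x 512 steps per rollout, 500,000 steps.
n_envs, rollout_len, total_steps = 4, 512, 500_000
n_iters = total_steps // (n_envs * rollout_len)  # whole update iterations

for it in (0, n_iters // 2, n_iters):
    frac = it / n_iters
    lr = linear_anneal(2.5e-4, frac)       # --anneal-lr
    ent_coef = linear_anneal(0.01, frac)   # --anneal-ent
    print(f"iter {it:3d}: lr={lr:.2e} ent_coef={ent_coef:.4f}")
```

Only the lower bound is applied to the reward here, matching the flag description above; positive rewards are left unchanged.
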
## Single-environment training (legacy)

```powershell
python train.py --total-steps 500000 --run-name main
```

## Monitoring

In a separate PowerShell window:

```powershell
tensorboard --logdir=runs --port=6006
```

Open http://localhost:6006 and tick the runs you want to compare.

## Evaluation

```powershell
# Numerical eval + bar chart + training curves
python evaluate.py --ckpt models/vec_main/final.pt

# Same plus a demo mp4 to docs/demo.mp4
python evaluate.py --ckpt models/vec_main/final.pt --video

# Deterministic-policy evaluation (argmax over logits)
python evaluate.py --ckpt models/vec_main/final.pt --deterministic
```

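The `--deterministic` flag replaces sampling with an argmax over the policy logits. A minimal sketch of the difference in plain Python follows; `select_action` is a hypothetical helper and the logits are made up, so the repository's own evaluation code in `src/eval_utils.py` may differ.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, deterministic=False, rng=random):
    """Pick an action index from policy logits.

    deterministic=True  -> argmax over logits (greedy).
    deterministic=False -> sample from the softmax distribution.
    """
    if deterministic:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Example logits for the 5 discrete actions (illustrative values only).
logits = [0.1, 2.0, -1.0, 0.5, 0.0]
greedy = select_action(logits, deterministic=True)
sampled = select_action(logits, rng=random.Random(0))
```

Greedy evaluation removes sampling noise from the policy, while stochastic evaluation matches how actions were chosen during training.
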
Outputs land in `docs/`:
- `eval_summary.json`: per-episode returns + mean / std
- `fig_eval_bar.png`: bar chart of evaluation returns
- `fig_training_curves.png`: 6-panel training curves (overlays available runs)
- `demo.mp4` (if `--video`)

The notebook `notebooks/05_evaluate.ipynb` is a thin wrapper around the same
helpers in `src/eval_utils.py`.

## SB3 baseline (optional, for the comparison plot)

```powershell
python train_sb3_baseline.py --total-steps 500000 --run-name sb3_baseline
```

After this run finishes, re-running `evaluate.py` will automatically
include the SB3 curve in `fig_training_curves.png` if `runs/sb3_baseline`
exists.

## Key hyperparameters (vec_main run)

| Param | Value | Source |
|-------|-------|--------|
| Total steps | 500,000 | |
| Parallel envs | 4 | AsyncVectorEnv |
| Rollout per env | 512 | total per-iter samples = 2048 |
| Update epochs | 10 | PPO paper |
| Minibatch | 64 | PPO Atari |
| Learning rate | 2.5e-4 → 0 (linear) | annealed |
| Adam eps | 1e-5 | "37 details" |
| γ (discount) | 0.99 | |
| λ (GAE) | 0.95 | |
| ε (clip) | 0.2 | PPO paper |
| c1 (vf) | 0.5 | |
| c2 (ent) | 0.01 → 0 (linear) | annealed |
| max-grad-norm | 0.5 | |
| Reward floor | -1.0 | |

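For reference, γ and λ enter the advantage estimate through GAE: `delta_t = r_t + γ * V(s_{t+1}) * (1 - d_t) - V(s_t)` and `A_t = delta_t + γ * λ * (1 - d_t) * A_{t+1}`. The sketch below is a single-environment, pure-Python illustration with hypothetical names and made-up data. It is not the code in `src/vec_rollout_buffer.py`, which works on batched environments.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalised Advantage Estimation over one rollout.

    rewards[t], values[t]: reward and value estimate at step t.
    dones[t]: True if the episode ended after step t.
    last_value: value estimate for the state following the final step.
    Returns (advantages, returns), with returns[t] = advantages[t] + values[t].
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    # Walk backwards so each step can reuse the advantage of the next one.
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# Tiny made-up rollout: three steps, episode terminates on the last one.
adv, ret = compute_gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.5, 0.5, 0.5],
    dones=[False, False, True],
    last_value=0.0,
)
```

The `not_done` mask stops both the bootstrap value and the accumulated advantage from leaking across an episode boundary.
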
## Notes
- `src/` contains **no Stable-Baselines3 imports**. SB3 is referenced only
  in `train_sb3_baseline.py` and is purely for the external comparison
  plot in the report.
- `train_vec.py` requires the `if __name__ == "__main__"` guard at the
  bottom (already present) for AsyncVectorEnv to work on Windows.
- See `docs/issues_and_fixes.md` for a log of practical issues encountered
  during development and how they were resolved.

## License & academic integrity
This is an individual coursework submission. Any external code (e.g.
inspiration from CleanRL or PPO papers) is referenced in the report. No
RL-specific libraries (Stable-Baselines3, RLLib, Tianshou, etc.) are used
in the main `src/` implementation.