DOI: 10.14489/vkit.2025.06.pp.019-028

Sakulin S. A., Alfimtsev A. N., Belousov V. S., Tertychny G. A.
INCREASING THE EFFICIENCY OF CONTROL OF SELF-DRIVING VEHICLES BASED ON REFINEMENT OF THE REWARD FUNCTION IN REINFORCEMENT LEARNING
(pp. 19-28)

Abstract. The article addresses the open problem of efficient interaction between self-driving vehicles in complex road conditions. Despite significant progress in autonomous driving, issues of safety and control efficiency remain unresolved. Among the methods for training multi-agent systems, reinforcement learning is a promising approach to this problem and is the main focus of the study. A new approach to constructing the reward function is proposed that takes into account many factors characteristic of real road situations. Experiments show that refining the reward function yields a stable improvement in the agents' learning performance, as confirmed by analyzing how the metrics depend on training duration. Control efficiency is assessed with several commonly used quantitative criteria that reflect different aspects of training and subsequent control, such as the agent's ability to drive safely and to minimize potential emergencies. These criteria are formalized as a partial order, which allows a more accurate assessment of agent behavior on the road. Agents trained with the refined reward function behave more aggressively and efficiently, which may improve overall traffic efficiency. The conclusion emphasizes the need for further research on applying reinforcement learning to ensure reliable interaction of self-driving cars in real-world conditions. Progress in the technologies and control methods of this area will contribute to a safer and more efficient transport system, an important goal for the future of autonomous driving.

Keywords: self-driving cars; simulation; multi-agent systems; reinforcement learning; reward function synthesis.
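The abstract does not state the form of the refined reward function. As an illustration only, the Python sketch below assumes one common pattern: a weighted sum of per-step factors. The factor names (`progress`, `overspeed`, `collision`, `off_road`, `lane_change`), the `StepObservation` fields and the weights are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class StepObservation:
    # Hypothetical per-step quantities a driving simulator might report.
    distance_travelled: float  # metres advanced along the route this step
    speed: float               # current speed, m/s
    speed_limit: float         # local speed limit, m/s
    collided: bool             # a collision occurred this step
    off_road: bool             # the vehicle left the drivable area
    changed_lane: bool         # a lane change was performed this step


# Hypothetical weights (assumption); in practice they would be tuned experimentally.
WEIGHTS = {
    "progress": 1.0,
    "overspeed": -0.5,
    "collision": -10.0,
    "off_road": -5.0,
    "lane_change": -0.1,
}


def reward_terms(obs: StepObservation) -> dict:
    """Compute each reward factor separately so it can be logged and inspected."""
    return {
        "progress": obs.distance_travelled,
        # Only the amount by which the speed limit is exceeded counts.
        "overspeed": max(0.0, obs.speed - obs.speed_limit),
        "collision": 1.0 if obs.collided else 0.0,
        "off_road": 1.0 if obs.off_road else 0.0,
        "lane_change": 1.0 if obs.changed_lane else 0.0,
    }


def shaped_reward(obs: StepObservation, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the reward factors (one common way to combine them)."""
    terms = reward_terms(obs)
    return sum(weights[name] * value for name, value in terms.items())


if __name__ == "__main__":
    obs = StepObservation(distance_travelled=2.0, speed=15.0, speed_limit=13.0,
                          collided=False, off_road=False, changed_lane=True)
    print(shaped_reward(obs))
```

Keeping each term separate in `reward_terms` makes it possible to log the individual factors during training, which helps when judging whether refining one factor actually changes the learned behavior.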
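The abstract also says that the evaluation criteria are formalized as a partial order, without specifying which one. A natural candidate, assumed here for illustration, is componentwise (Pareto) dominance over vectors of metrics: one trained agent ranks above another only if it is at least as good on every metric and strictly better on at least one; otherwise the two agents are incomparable. The metric names, the helper functions `dominates` and `non_dominated`, and the toy values are hypothetical.

```python
from typing import Dict, List

# Hypothetical evaluation metrics; True means "higher is better".
HIGHER_IS_BETTER = {
    "mean_episode_reward": True,
    "goal_reached_rate": True,
    "collision_rate": False,
    "mean_travel_time": False,
}


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if agent a Pareto-dominates agent b over all metrics."""
    at_least_as_good = True
    strictly_better = False
    for metric, higher in HIGHER_IS_BETTER.items():
        x, y = a[metric], b[metric]
        if not higher:
            # Flip the sign so that larger is always better.
            x, y = -x, -y
        if x < y:
            at_least_as_good = False
        elif x > y:
            strictly_better = True
    return at_least_as_good and strictly_better


def non_dominated(agents: Dict[str, Dict[str, float]]) -> List[str]:
    """Names of agents not dominated by any other agent (the Pareto front)."""
    return [
        name for name, metrics in agents.items()
        if not any(dominates(other, metrics)
                   for other_name, other in agents.items() if other_name != name)
    ]


if __name__ == "__main__":
    # Toy values for illustration only; not results from the article.
    agents = {
        "agent_a": {"mean_episode_reward": 40.0, "goal_reached_rate": 0.70,
                    "collision_rate": 0.10, "mean_travel_time": 60.0},
        "agent_b": {"mean_episode_reward": 55.0, "goal_reached_rate": 0.80,
                    "collision_rate": 0.08, "mean_travel_time": 52.0},
        "agent_c": {"mean_episode_reward": 35.0, "goal_reached_rate": 0.65,
                    "collision_rate": 0.02, "mean_travel_time": 75.0},
    }
    print(non_dominated(agents))
```

Because the order is partial, the result is a set of mutually incomparable agents rather than a single winner; choosing among them requires additional preferences, for example weights over the criteria.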

About the Authors

S. A. Sakulin, A. N. Alfimtsev, V. S. Belousov, G. A. Tertychny (Bauman Moscow State Technical University (National Research University), Moscow, Russia)

References

1. Panov, A. I. (2022). Simultaneous planning and learning in a hierarchical control system of a cognitive agent. Avtomatika i Telemekhanika, (6), 53–71. [in Russian language]
2. Lin, A. A., Belyaev, I. D., & Belyaeva, M. B. (2022). Features of reinforcement learning technology application. Matematicheskoe Modelirovanie Protsessov i Sistem, 236–242. [in Russian language]
3. Sakulin, S. A., & Alfimtsev, A. N. (2022). Reward shaping in reinforcement learning for unmanned vehicles of smart city. In V. I. Syryamkin (Ed.), Proceedings of the V International Forum (November 24–25, 2022) (p. 136). STT.
4. Zhou, M., Luo, J., Villella, J., et al. (2021). SMARTS: An open-source scalable multi-agent RL training school for autonomous driving. 4th Conference on Robot Learning (CoRL 2020), November 16–18, 2020, Cambridge, MA, USA. Proceedings of Machine Learning Research (PMLR), 264–285.
5. Wang, Z., Zhuang, Y., Gu, Q., et al. (2021). Reinforcement learning based negotiation-aware motion planning of autonomous vehicles. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 27 – October 1, 2021, Prague, Czech Republic (pp. 4532–4537). IEEE.
6. Bobrovskaya, O. P., & Gavrilenko, T. V. (2022). Autonomous vehicles: Implementation approaches and challenges. Uspekhi Kibernetiki, 3(2), 86–96. [in Russian language]
7. Hsu, C. C. Y., Mendler-Dünner, C., & Hardt, M. (2020). Revisiting design choices in proximal policy optimization. arXiv preprint. arXiv:2009.10897, 1–29.
8. Salenek, I. A., Seliverstov, Y. A., Seliverstov, S. A., & Noskova, N. I. (2023). Optimization of traffic light cycles using reinforcement learning. Sistemnyi Analiz v Proektirovanii i Upravlenii, 26(1), 344–350. [in Russian language]
9. Schulman, J., Wolski, F., Dhariwal, P., et al. (2017). Proximal policy optimization algorithms. arXiv preprint. arXiv:1707.06347, 1–12.
10. Berner, C., Brockman, G., Chan, B., et al. (2019). Dota 2 with large scale deep reinforcement learning. arXiv preprint. arXiv:1912.06680, 1–66.
11. Yu, C., Velu, A., Vinitsky, E., et al. (2022). The surprising effectiveness of PPO in cooperative multi-agent games. Advances in Neural Information Processing Systems, 35, 24611–24624.
12. Papoudakis, G., Christianos, F., Schäfer, L., & Albrecht, S. V. (2020). Benchmarking multi-agent deep reinforcement learning algorithms in cooperative tasks. arXiv preprint. arXiv:2006.07869, 1–33.
13. Siboo, S., Bhattacharyya, A., Raj, R. N., & Ashwin, S. H. (2023). An empirical study of DDPG and PPO-based reinforcement learning algorithms for autonomous driving. IEEE Access, 11, 125094–125108.
14. Zhao, P., Yuan, Z., Thu, K., & Miyazaki, T. (2024). Real-world autonomous driving control: An empirical study using the proximal policy optimization (PPO) algorithm. EVERGREEN Joint Journal of Novel Carbon Resource Sciences & Green Asia Strategy, 11(2), 887–899.
15. Sakulin, S. A., & Alfimtsev, A. N. (2022). Synthesis of a reward function in reinforcement learning using cognitive graphics. Vestnik Kompyuternykh i Informatsionnykh Tekhnologii, 19(8), 26–36. [in Russian language]
16. Sakulin, S. A., & Alfimtsev, A. N. (2014). Formalization of expert knowledge about web page usability based on aggregation of user criteria. Informatsionnye Tekhnologii, (6), 16–21. [in Russian language]
17. Sakulin, S. A. (2022). Visualization of aggregation operators using 3D cognitive graphics. Vestnik Kompyuternykh i Informatsionnykh Tekhnologii, 19(3), 15–22. [in Russian language]
