DOI: 10.14489/vkit.2020.09.pp.035-045

Дубенко Ю. В., Дышкант Е. Е., Гура Д. А.
АНАЛИЗ ИЕРАРХИЧЕСКОГО ОБУЧЕНИЯ С ПОДКРЕПЛЕНИЕМ ДЛЯ РЕАЛИЗАЦИИ ПОВЕДЕНЧЕСКИХ СТРАТЕГИЙ ИНТЕЛЛЕКТУАЛЬНЫХ АГЕНТОВ
(с. 35-45)

Аннотация. Рассмотрена задача по оценке возможности применения робототехнических систем как способа решения проблем, возникающих при мониторинге сложных инфраструктурных объектов. Изучены методы и алгоритмы реализаций поведенческих стратегий роботов, в частности, поисковые алгоритмы, основанные на деревьях решений. Сделан акцент на важности формирования у роботов способности к самообучению посредством обучения с подкреплением, связанного с моделированием поведения живых существ при взаимодействии с неизвестными элементами окружающей среды. Рассмотрен метод Q-learning как одна из разновидностей парадигмы обучения с подкреплением, вводящая понятие ценности действия, а также подход «иерархического обучения с подкреплением» и его разновидности Options Framework, Feudal, MaxQ. По итогам проведенного обзора методов обучения с подкреплением сделаны ключевые выводы.

Ключевые слова: обучение с подкреплением; интеллектуальные агенты; мониторинг; инфраструктурные объекты; сегментация макродействий; техническое зрение; глубокая кластеризация; роботы.

Dubenko Yu. V., Dyshkant Ye. Ye., Gura D. A.
ANALYSIS OF HIERARCHICAL LEARNING WITH REINFORCEMENT FOR THE IMPLEMENTATION OF BEHAVIORAL STRATEGIES OF INTELLIGENT AGENTS
(pp. 35-45)

Abstract. The paper evaluates whether robotic systems (intelligent agents) can be used to solve the problem of monitoring complex infrastructure objects such as buildings, structures, bridges, roads, and other transport infrastructure. Methods and algorithms for implementing the behavioral strategies of robots are examined, in particular search algorithms based on decision trees. Emphasis is placed on the importance of giving robots the ability to self-learn through reinforcement learning, which models the behavior of living creatures interacting with unknown elements of the environment. The Q-learning method is considered as a variety of reinforcement learning that introduces the concept of action value, together with the “hierarchical reinforcement learning” approach and its varieties: “Options Framework”, “Feudal”, and “MaxQ”. Two problems are identified in the segmentation of macro-actions: determining parameters such as the value and reward functions of the agents (mobile robots), and the mandatory presence of a technical vision subsystem. Solving the macro-action segmentation task therefore requires improving the methodological base by applying intelligent algorithms and methods, including deep clustering. The effectiveness of hierarchical reinforcement learning when mobile robots operate with insufficient information about the monitored object can be improved by passing visual information into the set of states, which should also make experience more portable between robots when they later perform tasks on different objects.

Keywords: reinforcement learning; intelligent agents; monitoring; infrastructure objects; segmentation of macro-actions; technical vision; deep clustering; robots.
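
As an illustration of the action-value concept that the abstract attributes to Q-learning, the following is a minimal tabular sketch. It is not taken from the paper: the five-cell corridor environment, the function names, and the hyperparameters are all hypothetical and chosen only to keep the example small.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; `step(state, action)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    # q[s][a] is the estimated value of taking action a in state s.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.randrange(n_actions)
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrap from the best action value in the next state,
            # unless the episode has ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def corridor_step(state, action):
    """Hypothetical corridor of cells 0..4: action 1 moves right, action 0 moves left.

    Reaching cell 4 ends the episode with reward 1; every other move gives 0.
    """
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


q_table = q_learning(n_states=5, n_actions=2, step=corridor_step)
```

After training, the learned action values in each non-terminal cell favor moving right, toward the rewarded cell. Hierarchical methods such as those named in the abstract build on the same value update but apply it to temporally extended macro-actions rather than to single steps.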

Информация об авторах (About the Authors)

Рус

Ю. В. Дубенко, Е. Е. Дышкант (Кубанский государственный технологический университет, Краснодар, Россия)
Д. А. Гура (Кубанский государственный технологический университет; Кубанский государственный аграрный университет имени И.Т. Трубилина, Краснодар, Россия)

Eng

Yu. V. Dubenko, Ye. Ye. Dyshkant (Kuban State Technological University, Krasnodar, Russia)
D. A. Gura (Kuban State Technological University; Kuban State Agrarian University named after I. T. Trubilin, Krasnodar, Russia)
