
DOI: 10.14489/vkit.2021.08.pp.012-021


Teplyakova A. R., Starkov S. O.
RECOGNITION OF HUMAN ACTIONS IN VIDEO SEQUENCES USING THE LSTM NETWORK
(pp. 12-21)

Abstract. The development of computer vision and the wide applicability of its applied components make research in this field highly relevant. One of the most interesting computer vision tasks is monitoring people's behavior, which involves analyzing their actions and is carried out for various purposes. Examples include systems that verify compliance with safety regulations and the wearing of personal protective equipment by factory workers, "smart home" systems that track actions, systems that monitor the condition of people in medical or social institutions, and home systems that monitor the condition of the elderly and can notify relatives in an emergency. There is no comprehensive program that solves the described problem and its variations regardless of the field of application, so developing a prototype of one, a module that recognizes human actions in video, is an important task. The article describes the creation of a software module that solves the problem of recognizing human actions in video sequences. It reviews existing datasets suitable for training a neural network and describes the collection and processing of data for a custom dataset covering four classes of actions. The key features of the stages of creating, training, and testing a neural network with the LSTM (Long Short-Term Memory) architecture are presented, together with options for its practical application. The developed module is quite flexible: the number of recognized action classes can be increased depending on the field of application, and the module can be integrated with other, similarly structured modules for monitoring people's behavior.

Keywords: computer vision; action recognition; recurrent neural networks; long short-term memory; human pose estimation.
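For orientation only, the short sketch below illustrates the kind of model the abstract describes: an LSTM classifier over sequences of pose keypoints extracted from video frames. It is not the authors' implementation; the sequence length, skeleton size, number of classes, and the TensorFlow/Keras design are illustrative assumptions.

# Minimal sketch (not the authors' code) of an LSTM action classifier over
# pose-keypoint sequences: 30-frame clips, 18 (x, y) keypoints per frame,
# four action classes, all assumed for illustration.
import numpy as np
from tensorflow.keras import layers, models

SEQ_LEN = 30                 # frames per clip (assumed)
N_KEYPOINTS = 18             # skeleton joints per frame (assumed)
N_FEATURES = N_KEYPOINTS * 2 # (x, y) coordinates per keypoint
N_CLASSES = 4                # number of recognized actions per the abstract

def build_model():
    model = models.Sequential([
        layers.Input(shape=(SEQ_LEN, N_FEATURES)),
        layers.Masking(mask_value=0.0),     # skip frames with no detected person
        layers.LSTM(64, return_sequences=True),
        layers.LSTM(32),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Synthetic stand-in data; in practice the inputs would be keypoint
    # sequences produced by a pose estimator run on the video.
    x = np.random.rand(8, SEQ_LEN, N_FEATURES).astype("float32")
    y = np.random.randint(0, N_CLASSES, size=(8,))
    model = build_model()
    model.fit(x, y, epochs=1, batch_size=4, verbose=0)
    print(model.predict(x[:1]).shape)       # (1, N_CLASSES) class probabilities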


A. R. Teplyakova, S. O. Starkov (Obninsk Institute for Nuclear Power Engineering – the branch of the National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Obninsk, Kaluga Region, Russia)



This article is available in electronic format (PDF).

The cost of a single article is 450 rubles (including 18% VAT). Within a few days after you place an order, an invoice and a receipt for bank payment will be sent to the e-mail address you specify.

Once the payment has been received in the publisher's account, the electronic version of the article will be sent to you by e-mail.

To order the article, please copy its DOI:

10.14489/vkit.2021.08.pp.012-021

and fill out the form. By submitting the form, you consent to the processing of your personal data.

 

 

 