DOI: 10.14489/vkit.2021.11.pp.027-036

Sgibnev I. V., Vishnyakov B. V.
SEMANTIC SEGMENTATION USING OPTICAL SENSORS IN THE TASK OF AUTONOMOUS DRIVING
(pp. 27-36)

Abstract. This paper addresses the problem of semantic image segmentation for the machine vision system of an off-road autonomous robotic vehicle. Most modern convolutional neural networks require computing resources that exceed the capabilities of many robotic platforms. Their main drawback is therefore their very high computational complexity, whereas real applications must run in real time on resource-constrained devices. This paper focuses on the practical application of modern lightweight architectures to semantic segmentation on mobile robotic systems. We consider encoders (backbones) based on ResNet18, ResNet34, MobileNetV2, ShuffleNetV2 and EfficientNet-B0, decoders based on U-Net, DeepLabV3 and DeepLabV3+, and additional components that can increase segmentation accuracy and reduce inference time. We propose a model that combines a ResNet34 encoder and a DeepLabV3+ decoder with Squeeze & Excitation blocks, which proved optimal in terms of inference time and accuracy. We also present our off-road dataset and a simulated dataset for semantic segmentation. Pretraining on the simulated dataset increased mIoU on our off-road dataset by 2.6 % compared with pretraining on Cityscapes. Finally, we achieved 76.1 % mIoU on the Cityscapes validation set and 85.4 % mIoU on our off-road validation set at 37 FPS (frames per second) for a 1024 × 1024 input image on a single NVIDIA GeForce RTX 2080 card using the NVIDIA TensorRT inference framework.
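The accuracy figures above are reported as mIoU (mean intersection over union). As a minimal sketch of how this metric is commonly computed, assuming the usual convention of accumulating a confusion matrix over all labeled pixels and averaging the per-class IoU = TP / (TP + FP + FN), the plain-Python code below illustrates the calculation. The function names and the ignore label 255 (borrowed from the Cityscapes train-ID convention) are illustrative assumptions; the paper's exact evaluation procedure is not described here.

```python
from typing import List, Sequence

# Assumed "void" label, following the Cityscapes train-ID convention (illustrative).
IGNORE_INDEX = 255


def confusion_matrix(pred: Sequence[int], target: Sequence[int], num_classes: int) -> List[List[int]]:
    """Count pixels into a num_classes x num_classes matrix: rows = ground truth, cols = prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == IGNORE_INDEX:
            continue  # unlabeled pixels do not contribute to the metric
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (TP + FP + FN); classes absent from both GT and prediction are skipped."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # ground truth c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, ground truth is another class
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes; the last pixel is unlabeled and ignored.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 255], num_classes=3)
print(cm)                     # [[1, 0, 0], [1, 2, 0], [0, 0, 1]]
print(round(mean_iou(cm), 4))  # per-class IoU 0.5, 2/3, 1.0 -> 0.7222
```

For a whole validation set, the matrix is typically accumulated over all images before the IoU values are computed, rather than averaging per-image scores.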

Keywords: Semantic segmentation; Autonomous driving; Lightweight neural network models; Encoder; Decoder; ResNet; DeepLabV3+; TensorRT.
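The proposed model adds Squeeze & Excitation blocks (Hu, Shen and Sun, reference 9) to the ResNet34 + DeepLabV3+ network. The sketch below shows only the generic SE operation on a single C × H × W feature map, written in plain Python for readability: global average pooling per channel (squeeze), a two-layer bottleneck s = σ(W2 · ReLU(W1 · z)) with reduction ratio r (excitation), and channel-wise rescaling. The weights are hypothetical placeholders, biases are omitted as in the SE paper's formula, and where exactly the authors insert the blocks is not stated here; a practical version would be a framework module (for example in PyTorch) operating on batched tensors.

```python
import math
from typing import List

Matrix = List[List[float]]


def squeeze_excite(x: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Apply a Squeeze-and-Excitation gate to a C x H x W feature map.

    w1: (C // r) x C weights of the reduction layer.
    w2: C x (C // r) weights of the expansion layer.
    """
    # Squeeze: global average pooling yields one descriptor z_c per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: reduce to C // r values, followed by ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, step 2: expand back to C values, followed by sigmoid -> gates in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every activation of channel c by its gate s_c.
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, gates)]


# Example: C = 2 channels, r = 2 (hidden size 1), hypothetical weights.
x = [[[1.0, 1.0], [1.0, 1.0]], [[4.0, 4.0], [4.0, 4.0]]]
out = squeeze_excite(x, w1=[[1.0, 0.0]], w2=[[0.0], [1.0]])
# Channel 0 is scaled by sigmoid(0) = 0.5, channel 1 by sigmoid(1) (about 0.731).
```

The gates let the network reweight feature channels using global context at the cost of two small fully connected layers per block, which is why SE blocks add little to inference time.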

About the Authors

I. V. Sgibnev, B. V. Vishnyakov (State Research Institute of Aviation Systems, State Scientific Center of the Russian Federation, Moscow, Russia)

References

1. He K. et al. (2016). Deep Residual Learning for Image Recognition. Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778. DOI 10.1109/CVPR.2016.90
2. Sandler M. et al. (2018). MobileNetV2: Inverted Residuals and Linear Bottlenecks. Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510-4520. DOI 10.1109/CVPR.2018.00474
3. Ma N. et al. (2018). ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. Computer Vision and Pattern Recognition. Available at: https://www.arxiv-vanity.com/papers/1807.11164/ (Accessed: 01.10.2021).
4. Tan M., Le Q. V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proceedings of the 36th International Conference on Machine Learning. Available at: https://proceedings.mlr.press/v97/tan19a/tan19a.pdf (Accessed: 01.10.2021).
5. Chen L.-C. et al. (2017). Rethinking Atrous Convolution for Semantic Image Segmentation. Available at: https://arxiv.org/pdf/1706.05587.pdf (Accessed: 01.10.2021).
6. Chen L.-C. et al. (2018). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Available at: https://arxiv.org/pdf/1802.02611.pdf (Accessed: 01.10.2021).
7. Ronneberger O., Fischer P., Brox T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, Vol. 9351, pp. 234-241. Available at: https://arxiv.org/pdf/1505.04597v1.pdf (Accessed: 01.10.2021).
8. Badrinarayanan V., Kendall A., Cipolla R. (2015). SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, (12), pp. 2481-2495. DOI 10.1109/TPAMI.2016.2644615
9. Hu J., Shen L., Sun G. (2018). Squeeze-and-Excitation Networks. Conference on Computer Vision and Pattern Recognition, pp. 7132-7141. DOI 10.1109/CVPR.2018.00745
10. Deng J. et al. (2009). ImageNet: A Large-Scale Hierarchical Image Database. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248-255. Available at: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5206848&tag=1 (Accessed: 01.10.2021). DOI 10.1109/CVPR.2009.5206848
11. Cordts M. et al. (2016). The Cityscapes Dataset for Semantic Urban Scene Understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Available at: https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2016cityscapes.pdf (Accessed: 01.10.2021). DOI 10.1109/CVPR.2016.350
12. Unreal Engine game engine. Available at: https://www.unrealengine.com/en-US/unreal (Accessed: 01.10.2021).
13. Buslaev A. et al. (2018). Albumentations: Fast and Flexible Image Augmentations. Information, Vol. 11, (2), 125. DOI 10.3390/info11020125
14. Kingma D. P., Ba J. L. (2014). Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning Representations. Available at: https://arxiv.org/pdf/1412.6980.pdf (Accessed: 01.10.2021).
15. Ruder S. (2016). An Overview of Gradient Descent Optimization Algorithms. Machine Learning. Available at: https://sgfin.github.io/files/notes/ruder_gradient.pdf (Accessed: 01.10.2021).
16. PyTorch. Key Features & Capabilities. Available at: https://pytorch.org/ (Accessed: 01.10.2021).
17. Caffe. Available at: https://caffe.berkeleyvision.org/ (Accessed: 01.10.2021).
18. MxNet. A Flexible and Efficient Library for Deep Learning. Available at: https://mxnet.apache.org/ (Accessed: 01.10.2021).
19. Keras. Simple. Flexible. Powerful. Available at: https://keras.io/ (Accessed: 01.10.2021).
20. TensorFlow. A comprehensive open source machine learning platform. Available at: https://www.tensorflow.org/ (Accessed: 01.10.2021). [in Russian]
21. NVIDIA TensorRT. Available at: https://developer.nvidia.com/tensorrt (Accessed: 01.10.2021).
