DOI: 10.14489/vkit.2025.03.pp.050-056
Sekerin A. V., Kudinov V. A. TWO-STAGE ALGORITHM FOR AUTOMATED OPINION EXTRACTION FROM RUSSIAN POLITICAL TEXTS (pp. 50-56)
Abstract. Neural network approaches to automated opinion extraction are reviewed. A two-stage algorithm is proposed, based on solving the particular tasks of opinion target (subject) extraction and targeted sentiment analysis with machine learning models of the transformer architecture. Training datasets for the Russian political domain are presented. The best results in subject extraction were shown by a DeBERTa-based model; in sentiment analysis, by a ruRoBERTa-based classifier.
Keywords: targeted sentiment analysis; opinion target extraction; transformer; politics; public opinion.
Sekerin A. V., Kudinov V. A. TWO-STAGE ALGORITHM FOR AUTOMATED OPINION EXTRACTION FROM RUSSIAN POLITICAL TEXTS (pp. 50-56)
Abstract. The article discusses neural network approaches to automated opinion extraction in Russian. A two-stage algorithm is proposed, based on solving the particular problems of opinion target extraction (subject extraction) and targeted sentiment analysis using machine learning models of the transformer architecture. Datasets of Russian political texts from microblogs, news publications and public speeches are presented, built on the existing Russian-language sets CABSAR and RuSentNE, as well as the English-language set NewsMTSC converted with machine translation. The OTE_Ru_dataset provides sentence tokens labeled with opinion targets in BIO format. The TSA_Ru_dataset provides targeted sentiment labels for the subjects of opinions in the context of their sentences, in the authors' own format. Models of the DeBERTa-base, ruRoPEBert-e5-base-512, ruBert-large, ruElectra-large, ruRoBERTa-large and XLM-V-base architectures, pretrained on Russian-language texts, were fine-tuned. Metrics for evaluating the quality of the machine learning models are described. High performance was achieved by models whose tokenizers take into account the relative position of a word in the text. The best target extraction results were achieved by the DeBERTa-based model; the best targeted sentiment classifier is the ruRoBERTa-based one. The applicability of the results to applied tasks such as the analysis of socio-political processes and the identification of ideologemes is discussed.
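The BIO markup mentioned for the OTE_Ru_dataset tags each sentence token as beginning (B), inside (I), or outside (O) an opinion target; decoding such a tag sequence back into target spans can be sketched as follows (a minimal illustration; the tag label `TARG` and the example tokens are hypothetical, not taken from the dataset):

```python
def bio_decode(tokens, tags):
    """Collect opinion-target spans from BIO tags: B-TARG begins a span,
    I-TARG continues the current span, O is outside any target."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-TARG":
            if current:                      # close a span that was open
                spans.append(" ".join(current))
            current = [token]                # start a new span
        elif tag == "I-TARG" and current:
            current.append(token)            # extend the open span
        else:
            if current:                      # O tag closes any open span
                spans.append(" ".join(current))
            current = []
    if current:                              # span running to end of sentence
        spans.append(" ".join(current))
    return spans

# Hypothetical sentence with one opinion target ("the new bill")
tokens = ["Senators", "criticized", "the", "new", "bill", "sharply"]
tags = ["O", "O", "B-TARG", "I-TARG", "I-TARG", "O"]
print(bio_decode(tokens, tags))  # ['the new bill']
```

A token classifier fine-tuned on such markup emits one tag per token, and this decoding step turns its output into the target strings passed to the second stage.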
Keywords: target sentiment analysis; opinion target extraction; transformer; politics; public opinion.
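The two-stage design described in the abstract can be sketched as a pipeline: stage one extracts opinion targets from a sentence, stage two classifies the sentiment expressed toward each target in its sentence context. The stub functions below merely stand in for the fine-tuned DeBERTa tagger and ruRoBERTa classifier; all names here are illustrative, not the authors' implementation:

```python
from typing import Callable, List, Tuple

def extract_opinions(
    sentence: str,
    extract_targets: Callable[[str], List[str]],
    classify_sentiment: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Two-stage opinion extraction: (1) find opinion targets in the
    sentence, (2) classify sentiment toward each target in context."""
    targets = extract_targets(sentence)            # stage 1: target extraction
    return [(t, classify_sentiment(sentence, t))   # stage 2: targeted sentiment
            for t in targets]

# Toy stand-ins for the two fine-tuned transformer models
def stub_targets(sentence: str) -> List[str]:
    return ["parliament"] if "parliament" in sentence else []

def stub_sentiment(sentence: str, target: str) -> str:
    return "negative" if "criticized" in sentence else "neutral"

print(extract_opinions("The press criticized parliament.",
                       stub_targets, stub_sentiment))
# [('parliament', 'negative')]
```

Separating the stages this way lets each model be trained and evaluated on its own dataset (OTE_Ru_dataset and TSA_Ru_dataset respectively), which matches the article's description of the algorithm.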
A. V. Sekerin, V. A. Kudinov (Kursk State University, Kursk, Russia)