DOI: 10.14489/vkit.2021.02.pp.024-038

Nguyen T. Viet, Duong Q. H. Tu, Kravets A. G.
ANALYSIS AND PREDICTION OF TRENDS IN THE USE OF TERMS IN COMPUTER SCIENCE BASED ON NEURAL NETWORK MODELS
(pp. 24-38)

Abstract. The paper presents a statistical analysis of texts from the digital library arXiv.org that identifies the most frequently occurring terms, bigrams and trigrams, in computer science. Three architectures are studied: a fully connected neural network, a convolutional neural network, and a recurrent neural network with long short-term memory. After hyperparameter optimization, the fully connected neural network, which showed the best mean-square scores, was trained. Results are obtained for forecasting trends in the use of terms in computer science over the next three years. Topics related to machine learning in general, and to reinforcement learning and recurrent neural networks in particular, are developing actively. The ability to predict scientific trends in advance could potentially revolutionize the way science is done, for example, by allowing funding agencies to optimize the allocation of resources in promising research areas.

Keywords: trend forecasting; computer terms; neural network; convolutional neural network; long short-term memory; hyperparameter optimization; arXiv.org.
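The bigram and trigram statistics mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the tokenizer, the helper names `tokenize` and `ngram_counts`, and the two-sentence toy corpus are assumptions made for illustration, and no stop-word filtering or term normalization is attempted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(texts, n):
    """Count all n-grams of length n across an iterable of texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical toy corpus standing in for arXiv abstracts (not real data).
abstracts = [
    "We train a convolutional neural network for image tagging.",
    "A recurrent neural network and a convolutional neural network are compared.",
]

bigrams = ngram_counts(abstracts, 2)
trigrams = ngram_counts(abstracts, 3)
top_bigram = bigrams.most_common(1)[0]
```

On a real corpus one would presumably compute such counts per publication year, so that each term yields a frequency series that a forecasting model can consume.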

Nguyen T. V., Duong Q. H. T., Kravets A. G.
ANALYSIS AND PREDICTION OF TRENDS IN THE USE OF TERMS IN COMPUTER SCIENCE BASED ON NEURAL NETWORK MODELS
(pp. 24-38)

Abstract. The widespread use of information and communication technologies, database technologies, and the Internet has led to the development of specialized digital libraries. These libraries serve a huge number of different users and play an important role as repositories and providers of information and knowledge. The automatic extraction of useful information from texts stored in digital libraries is therefore becoming an increasingly important research topic in data mining. The article presents a statistical analysis of texts in the digital library arXiv.org that identifies the most common terms, bigrams and trigrams. After hyperparameter optimization of the neural network models, the results of predicting trends in the use of terms in computer science are presented. The analysis of the statistics and the predicted usage frequency of bigram and trigram terms provide evidence that papers concerned with machine learning, reinforcement learning, generative adversarial networks, convolutional neural networks, and recurrent neural networks can be seen as the main future research trends in computer science over the next 3 years. Moreover, topics related to will experience a sudden increase in usage frequency. Being able to predict scientific trends in advance could potentially revolutionize the way science is done, for instance, by enabling funding agencies to optimize the allocation of resources towards promising research areas.

Keywords: Research trend forecasting; Computer terms; Neural network; CNN (Convolutional Neural Network); LSTM (Long Short-Term Memory); Hyperparameter optimization; arXiv.org.
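Predicting usage frequency three years ahead implies turning each term's frequency history into supervised training pairs. The sketch below shows one common way to do this with a sliding window, under stated assumptions: the yearly granularity, the window length, the made-up counts, and the names `make_windows` and `last_value_forecast` are all hypothetical, and the naive last-value forecast is only a stand-in baseline, not any of the neural networks named in the keywords. Mean squared error is likewise assumed as the scoring metric.

```python
def make_windows(series, window, horizon):
    """Split a frequency series into (input window, target horizon) pairs."""
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        targets = series[start + window:start + window + horizon]
        pairs.append((inputs, targets))
    return pairs


def mean_squared_error(predicted, actual):
    """Average of squared differences between predictions and targets."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def last_value_forecast(inputs, horizon):
    """Naive baseline: repeat the last observed value for every future step."""
    return [inputs[-1]] * horizon


# Hypothetical yearly usage counts of a single term (made-up numbers).
yearly_counts = [10, 14, 21, 30, 42, 60, 85]

# Three past years as input, three future years as target.
pairs = make_windows(yearly_counts, window=3, horizon=3)

first_inputs, first_targets = pairs[0]
baseline_error = mean_squared_error(
    last_value_forecast(first_inputs, 3), first_targets
)
```

Any trained model would be scored on the same pairs, and a baseline of this kind gives a reference error that the model should beat.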

About the Authors

T. V. Nguyen, Q. H. T. Duong (Volgograd State Technical University, Volgograd, Russia)
A. G. Kravets (Volgograd State Technical University, Volgograd, Russia; Dubna State University, Moscow region, Dubna, Russia)
