DOI: 10.14489/vkit.2026.03.pp.050-056

Shumikhina A. S., Arabov M. K.
FORECASTING AND CLASSIFICATION OF CYBER THREATS BASED ON NETWORK TRAFFIC ANALYSIS USING ENSEMBLE MACHINE LEARNING METHODS
(pp. 50-56)

Abstract. This study applies machine learning to network traffic analysis for cyber-attack classification and anomaly detection, with the aim of building an effective methodology in the face of growing cybersecurity challenges. Traditional signature-based detection often fails against novel and evolving threats, which creates a need for behavioral pattern recognition in large-scale network data. The study uses the CICIDS2017 dataset, widely used in intrusion detection research for its realistic traffic, with a focus on DDoS attacks generated with the LOIC tool during peak working hours. The methodology begins with thorough data preparation (cleaning, normalization, missing-value imputation, and outlier handling) to make the models robust. Exploratory analysis and statistical hypothesis testing (Shapiro-Wilk tests and Q-Q plots for normality, chi-square tests, and Welch's t-tests) confirm that the feature distributions are non-normal and identify protocol type, payload size, and port vulnerabilities as significant factors influencing the probability of an attack. Feature selection combines correlation analysis, mutual information scores, and LightGBM's embedded feature importance, reducing the feature set from 62 to 37 attributes while retaining the essential information. Classification uses three ensemble algorithms (Random Forest, XGBoost, and LightGBM) trained on stratified splits to account for class imbalance. The models are evaluated with precision, recall, and F1-score (per class and as macro and weighted averages), together with confusion matrices for error analysis. All models perform well, and LightGBM performs best: its accuracy reaches 1.00, its F1-score is 1.00 for rare SQL injection attacks and 0.74 for brute-force attacks, and on the imbalanced data it outperforms logistic regression and decision trees. Random Forest gives stable results for common threats, supporting the advantage of ensemble methods on high-dimensional traffic data. The approach can support more resilient intrusion detection systems, reducing recovery costs and strengthening proactive defense.
Limitations include the specificity of the dataset; future work could extend the approach to broader datasets such as CSE-CIC-IDS2018 and integrate deep learning to detect previously unknown threats.

Keywords: Cybersecurity; Cyber threat prediction; Intrusion detection; Machine learning; Attack classification; Network traffic analysis.
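
The abstract lists Welch's t-test among the hypothesis tests used to compare feature values across traffic groups. As an illustration only (not the authors' code), the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for one feature in two groups of flows; the function name `welch_t_test` and the toy values are hypothetical. Welch's variant drops the equal-variance assumption of Student's t-test. A p-value additionally needs the t-distribution CDF, which `scipy.stats.ttest_ind(a, b, equal_var=False)` provides.

```python
import math
import statistics


def welch_t_test(a, b):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, it does not assume equal variances in the two groups.
    """
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard errors of the two means (sample variance uses n - 1).
    se2_a = statistics.variance(a) / n_a
    se2_b = statistics.variance(b) / n_b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical toy values of one feature (e.g. payload size in bytes).
benign = [100, 200, 300, 400, 500]
attack = [200, 400, 600, 800, 1000]
t, df = welch_t_test(benign, attack)
print(f"t = {t:.3f}, df = {df:.2f}")  # t = -1.897, df = 5.88
```

The non-integer degrees of freedom are expected: the Welch-Satterthwaite formula interpolates between the smaller group size minus one and the pooled value.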
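
Feature selection in the study combines several criteria, one of which is mutual information. A minimal sketch of that criterion alone, assuming discrete (categorical or binned) features, estimates I(X; Y) from counts and keeps the top-k features; the helper name, the toy columns, and `k` are hypothetical, and the sketch does not reproduce the authors' combination with correlation analysis and LightGBM importance. For continuous flow features, scikit-learn's `mutual_info_classif` uses a nearest-neighbour estimator instead of plain counting.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Hypothetical helper: mutual information I(X; Y), in nats, between a
    discrete feature and the class label, with probabilities estimated from counts."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), rewritten in terms of counts.
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data: one column aligned with the label, one independent of it.
labels = ["attack", "attack", "benign", "benign"]
features = {
    "protocol": ["udp", "udp", "tcp", "tcp"],
    "flag": [0, 1, 0, 1],
}
scores = {name: mutual_information(col, labels) for name, col in features.items()}
k = 1  # features to keep (the study keeps 37 of 62, using several criteria)
selected = sorted(scores, key=scores.get, reverse=True)[:k]
print(selected)  # ['protocol']
```

The aligned column scores ln 2 (the full label entropy), while the independent one scores zero, so only the informative feature survives.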
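
Because the attack classes are imbalanced, the difference between macro and weighted averages matters when reading the reported F1-scores. The hypothetical helper below (an illustrative sketch, not the study's evaluation code) derives per-class precision, recall, and F1 from true and predicted labels and then averages F1 both ways; the toy labels are invented, and a class with no predictions or no true samples gets a score of 0.0 by assumption.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Hypothetical helper: per-class (precision, recall, F1) plus macro and weighted F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    # Macro average: every class counts equally, so rare attacks weigh as much as benign traffic.
    macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(classes)
    # Weighted average: each class weighted by its support, so frequent classes dominate.
    weighted_f1 = sum(per_class[c][2] * support[c] for c in classes) / len(y_true)
    return per_class, macro_f1, weighted_f1


# Hypothetical imbalanced toy labels: 8 benign flows, 2 brute-force flows.
y_true = ["benign"] * 8 + ["brute_force"] * 2
y_pred = ["benign"] * 9 + ["brute_force"]  # one brute-force flow is missed
per_class, macro_f1, weighted_f1 = classification_metrics(y_true, y_pred)
print(f"macro F1 = {macro_f1:.3f}, weighted F1 = {weighted_f1:.3f}")
# macro F1 = 0.804, weighted F1 = 0.886
```

Accuracy on this toy set is 0.9, and the weighted F1 stays close to it because the majority class dominates, while the macro F1 exposes the weaker recall on the rare class. scikit-learn's `classification_report` prints the same two averages.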

About the Authors

A. S. Shumikhina, M. K. Arabov (Kazan (Volga Region) Federal University, Kazan, Russia)
