DOI: 10.14489/vkit.2026.05.pp.012-021

Shigin V. V.
USING ADVERSARIAL ATTACKS TO EVALUATE THE ROBUSTNESS OF SPAM CLASSIFICATION MODELS
(pp. 12-21)

Abstract. This study examines the robustness of modern text classification models used in spam filters against adversarial attacks. Spam messages, which carry risks such as phishing, fraud, and lost user time, are a growing concern in information security. Because spam filtering relies heavily on machine learning and natural language processing (NLP), attackers have developed adversarial strategies that subtly modify spam content to make it harder for classifiers to detect. The study evaluates three widely used models (BERT, BiLSTM, and XGBoost), trained on the Enron-Spam dataset, under several adversarial conditions: partial character replacement, insertion of random noise, and word order swapping. Accuracy and F1-score are used to assess the models' performance under each type of perturbation. The results indicate that BERT and BiLSTM are more robust, showing only minor performance degradation under most types of attack. In contrast, XGBoost, which relies on TF-IDF features, is more susceptible to changes in word frequency and syntactic structure, because such manipulations disrupt the frequency-based features it depends on. The findings suggest that BERT and BiLSTM are better suited to handling adversarial perturbations that do not significantly alter the message's structure. The study highlights the need for continued research into adversarial defense strategies for spam classifiers, including model architectures and training methodologies that better handle subtle text manipulations.

Keywords: Adversarial attacks; Text classification; Spam; Information security.
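
The three perturbation types named in the abstract, and the two metrics used to score them, can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the `LOOKALIKES` mapping, the noise alphabet, and the perturbation rates are all assumptions made for illustration, since the abstract does not specify them. No classifier is involved; the sketch only shows how perturbed inputs might be produced and how Accuracy and F1-score are computed from predictions.

```python
import random

# Hypothetical look-alike substitutions; the article's actual mapping is not given.
LOOKALIKES = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def replace_chars(text, rate, rng):
    """Swap a fraction (rate) of mappable characters for look-alike symbols."""
    return "".join(
        LOOKALIKES[ch] if ch in LOOKALIKES and rng.random() < rate else ch
        for ch in text
    )


def insert_noise(text, rate, rng, alphabet="*._-"):
    """Insert a random noise character after a fraction (rate) of characters."""
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < rate:
            out.append(rng.choice(alphabet))
    return "".join(out)


def swap_words(text, n_swaps, rng):
    """Swap n_swaps randomly chosen pairs of adjacent words."""
    words = text.split()
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with `positive` as the spam class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbations are reproducible
    msg = "win a free prize now"
    print(replace_chars(msg, 1.0, rng))  # every mappable character is replaced
    print(insert_noise(msg, 0.2, rng))
    print(swap_words(msg, 2, rng))
    print(accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1]))
```

In an evaluation of this kind, each function would be applied to the test messages before they reach a trained model, and the metrics on the perturbed set would be compared with those on the clean set. A word-order swap leaves a bag-of-words TF-IDF vector unchanged unless n-gram features are used, whereas character replacement changes the tokens themselves.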

About the Authors

V. V. Shigin (Moscow State Technological University Stankin, Moscow, Russia)
