18 | 03 | 2026
DOI: 10.14489/vkit.2026.03.pp.044-049

Kurilenko S. M.
HYBRID METHOD FOR PRIVACY-PRESERVING SEMANTIC SEARCH BASED ON HOMOMORPHIC ENCRYPTION AND RANDOM PROJECTIONS
(pp. 44-49)

Abstract. Modern semantic search systems built on vector embeddings face a critical security challenge: recent embedding inversion attacks have demonstrated that attackers can reconstruct most of the original text from stored embeddings with high accuracy, creating a serious risk of confidential data exposure. Existing cryptographic solutions relying on fully homomorphic encryption (FHE) provide strong security guarantees but exhibit impractical latency, often exceeding ten seconds per query, which prevents their deployment in production environments. This paper introduces a novel hybrid method that achieves sub-second query latency while maintaining robust protection against text reconstruction attacks. The proposed approach employs random projections as a cryptographic primitive to protect the embedding database, combined with CKKS encryption for query privacy and two-stage reranking for result accuracy. A key theoretical contribution is the mathematical formalization demonstrating that random projections create an information-theoretic barrier preventing original text recovery: the reconstruction error induced by dimensionality reduction provably exceeds the thresholds required for successful inversion attacks. Experimental validation confirms practical applicability: the system achieves 0.96-second latency per query, nearly twice as fast as standard CKKS ciphertext-ciphertext mode and significantly faster than existing secure search systems. Search quality degradation remains below 5% when using medium protection profiles. The hybrid architecture balances security requirements, computational efficiency, and retrieval accuracy, enabling organizations in regulated industries to use semantic search technology while remaining compliant with data confidentiality requirements.

Keywords: Homomorphic encryption; Semantic search; Confidentiality assurance; Vector embeddings; Text reconstruction; Random projections; Approximate nearest neighbor search.
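
The abstract names three building blocks: random projections over the embedding database, encrypted queries, and two-stage reranking. The sketch below illustrates only the projection and reranking ideas, and it is not the author's implementation. Everything in it is an assumption made for illustration: the dimensions (384, 128, 256), the Gaussian projection matrices, the use of a coarse and a fine projection for the two stages, and the function names. CKKS encryption of the query is omitted entirely, so the query is handled in plaintext here.

```python
import numpy as np


def make_projection(d: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Gaussian random projection (k x d), scaled by 1/sqrt(k) so that
    inner products are preserved in expectation (Johnson-Lindenstrauss style)."""
    return rng.standard_normal((k, d)) / np.sqrt(k)


def normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit L2 norm along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def two_stage_search(query, db_coarse, db_fine, r_coarse, r_fine,
                     n_candidates=50, top_k=5):
    """Hypothetical two-stage search over projected vectors only.

    Stage 1 scores every item in the low-dimensional (coarse) space.
    Stage 2 rescores the shortlisted items in the higher-dimensional
    (fine) space. The original embeddings are never consulted.
    """
    coarse_scores = db_coarse @ (r_coarse @ query)
    candidates = np.argsort(-coarse_scores)[:n_candidates]
    fine_scores = db_fine[candidates] @ (r_fine @ query)
    order = np.argsort(-fine_scores)[:top_k]
    return candidates[order]


# Synthetic stand-in for an embedding database (assumed sizes).
rng = np.random.default_rng(0)
d, n = 384, 2000
embeddings = normalize(rng.standard_normal((n, d)))

# Only the projected copies would be stored; the matrices act as secrets.
r_coarse = make_projection(d, 128, rng)
r_fine = make_projection(d, 256, rng)
db_coarse = embeddings @ r_coarse.T
db_fine = embeddings @ r_fine.T

# A query that is a slightly perturbed copy of item 123.
query = normalize(embeddings[123] + 0.02 * rng.standard_normal(d))
result = two_stage_search(query, db_coarse, db_fine, r_coarse, r_fine)
print(result)
```

Because each projection maps 384 dimensions down to 128 or 256, many different embeddings share the same projected image, which is the intuition behind the reconstruction barrier the abstract describes. The trade-off is noisier similarity scores, which is why this sketch rescores a shortlist in the larger projected space.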

S. M. Kurilenko (Moscow Institute of Physics and Technology (National Research University), Moscow, Russia)

