10.14489/vkit.2014.010.pp.023-029

DOI: 10.14489/vkit.2014.010.pp.023-029

Крыжановский В. М., Мальсагов М. Ю., Желавская И. С.
ПОИСК БЛИЖАЙШЕГО СОСЕДА В БИНАРНОМ ПРОСТРАНСТВЕ БОЛЬШОЙ РАЗМЕРНОСТИ С ПОМОЩЬЮ НЕЙРОСЕТЕВОГО БИНАРНОГО ДЕРЕВА
(с. 23-29)

Аннотация. Представлен алгоритм нейросетевого бинарного дерева, являющийся расширением бинарного дерева поиска на случай, когда входной вектор содержит искажения. Узлами дерева являются персептроны. Отличительная особенность алгоритма – итеративный поиск решения и наличие критерия остановки. Установлено, что с увеличением размерности пространства (N > 1000) известные алгоритмы, такие как LSH (Locality Sensitive Hashing), kd-дерево, BBF-дерево (Best-Bin-First) и spill-дерево, перестают работать в отличие от рассматриваемого. При таких размерностях скорость работы предложенного алгоритма в семь раз больше скорости полного перебора.

Ключевые слова: поиск ближайшего соседа; персептрон; дерево поиска; иерархический классификатор; мультиклассовый классификатор.

Kryzhanovsky V. M., Malsagov M. Yu., Zhelavskaya I. S.
NEAREST NEIGHBOR SEARCH IN HIGH-DIMENSIONAL BINARY SPACE BY SCALAR NEURAL NETWORK TREE
(pp. 23-29)

Abstract. The problem of nearest-neighbor search in a high-dimensionality (N > 1000) configuration space is considered. The components of reference vectors take either ±1 equiprobably, so the vectors are the same distance apart from each other and distributed evenly. We measure the distance between two points with the Hamming distance. The use of most popular methods (Locality Sensitive Hashing, kd-tree, spill-tree, best-bin-first tree) proved to be inefficient in this case. We have offered a tree-like algorithm that solves the given problem statistically. The idea of the algorithm is that the search area becomes consecutively smaller. In the beginning the whole set of patterns is divided into two nonoverlapping subsets. A subset that may contain an input vector is picked using the procedure described below. The subset is divided into another two nonoverlapping subsets, and a subset that may contain the input vector is chosen again. The procedure continues until each subset consists of a single pattern. Then the input vector is associated with one of the remaining patterns using the same procedure. The division of the space into subsets and search for a set containing a particular vector can be quickly done by a simple perceptron with a “winner takes all” decision rule. Each set is controlled by a perceptron trained with the use of patterns of corresponding subset. The algorithm excels in operation speed, exceeding the exhaustive search algorithm by up to 7 times on average (with the network load of 0.25 and N = 2048, b = 0.2, the gain is 28 times). This superiority will increase with the growth of dimensionality.

Keywords: Nearest neighbor searching; Perceptron; Search tree; Hierarchical classifier; Multiclass classification.

+ - Информация об авторах (About the Authors) Click to collapse

Рус

В. М. Крыжановский, М. Ю. Мальсагов (Научно-исследовательский институт системных исследований РАН, Москва) E-mail: Этот e-mail адрес защищен от спам-ботов, для его просмотра у Вас должен быть включен Javascript
И. С. Желавская (Сколковский институт науки и технологий)

Eng

V. M. Kryzhanovsky, M. Yu. Malsagov (Scientific Research Institute for System Analysis of RAS, Moscow) E-mail: Этот e-mail адрес защищен от спам-ботов, для его просмотра у Вас должен быть включен Javascript
I. S. Zhelavskaya (Skolkovo Institute of Science and Technology)

+ - Библиографический список (References) Click to collapse

Рус

1. Friedman J. H., Bentley J. L., Finkel R. A. An Algorithm for Finding Best Matches in Logarithmic Expected Time // ACM Transactions on Mathematical Soft-ware. 1977. V. 3, № 3. P. 209 – 226.
2. An Investigation of Practical Approximate Nearest Neighbor Algorithms / Liu T. et al. // Proc. of the Advances in Neural Information Processing Systems (NIPS). 2004. P. 825 – 832.
3. Indyk P., Motwani R. Approximate Nearest Neigh-bors: Towards Removing the Curse of Dimensionality // Proc. 30th STOC. 1998. P. 604 – 613.
4. Beis J. S., Lowe D. G. Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces // Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 1997. P. 1000 – 1067.
5. Kryzhanovsky B., Kryzhanovskiy V., Litinskii L. Machine Learning in Vector Models of Neural Networks // Advances in Machine Learning II. Dedicated to the Memory of Professor Ryszard S. Michalski. Eds.: Koronacki J., Ras Z. W., Wierzchon S. T. (et al.). Series «Studies in Computational Intelligence». Springer, 2010. SCI 263. P. 427 – 443.
6. Hopfield J. J. Neural Networks and Physical Systems with Emergent Collective Computational Abilities // Proc. Nat. Acad. Sci. USA. 1982. V. 79. P. 2554 – 2588.

Eng

1. Friedman J. H., Bentley J. L., Finkel R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Soft-ware, 3(3), pp. 209-226.
2. Liu T. et al. (2004). An Investigation of Practical Approximate Nearest Neighbor Algorithms. Proc. of the Adavances in Neural Information Processing Systems (NIPS), pp. 825-832.
3. Indyk P., Motwani R. (1998). Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. Proc. 30th STOC, pp. 604-613.
4. Beis J. S., Lowe D. G. (1997). Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces. Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1000- 1067.
5. Koronacki J., Ras Z. W., Wierzchon S. T. et al. (Eds.), Kryzhanovsky B., Kryzhanovskiy V., Litinskii L. (2010). Machine Learning in Vector Models of Neural Networks. Advances in Machine Learning II. Dedicated to the Memory of Professor Ryszard S. Michalski. Series «Studies in Computational Intelligence». Springer, SCI 263, pp. 427-443.
6. Hopfield J. J. (1982). Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proc. Nat. Acad. Sci. USA. Vol. 79, pp. 2554-2588.

+ - Заказать электронную версию статьи (Purchase digital version of a single article) Click to collapse

Рус

Статью можно приобрести в электронном виде (PDF формат).

Стоимость статьи 250 руб. (в том числе НДС 18%). После оформления заказа, в течение нескольких дней, на указанный вами e-mail придут счет и квитанция для оплаты в банке.

После поступления денег на счет издательства, вам будет выслан электронный вариант статьи.

Для заказа статьи заполните форму:

{jform=1,doi=10.14489/vkit.2014.010.pp.023-029}

Eng

This article is available in electronic format (PDF).

The cost of a single article is 250 rubles. (including VAT 18%). After you place an order within a few days, you will receive following documents to your specified e-mail: account on payment and receipt to pay in the bank.

After depositing your payment on our bank account we send you file of the article by e-mail.

To order articles please fill out the form below:

{jform=2,doi=10.14489/vkit.2014.010.pp.023-029}