DOI: 10.14489/vkit.2015.010.pp.044-049

Potemkin A. V.
(pp. 44-49)

Abstract. A new approach to the analysis of information flows on the Internet is proposed. It is based on determining the structure of message dissemination in the flow, taking into account the chronology of message appearance, the similarity of message texts to one another and the characteristics of the sources. A depth-first search algorithm is used to determine the weakly connected components, whose analysis makes it possible to find changes of information in the messages. Applying this approach reduces the dimension of the information flow analysis problem.

Keywords: information flow; information dissemination structure; mass media; near-duplicate.


Potemkin A. V.
(pp. 44-49)

Abstract. The Internet is a complex system that affects society. The large volume and high transmission speed of information make automated processing necessary. To reduce the dimension of the problem of analyzing information on the Internet, an approach based on thematic information flows is used. The scientific literature describes methods for analyzing thematic information flows based on their intensity or on the presence of citations. These approaches do not make it possible to determine how information changes across different sources, and citation analysis captures only part of the relationships between messages. A new approach to the analysis of information flows on the Internet is proposed, based on determining the structure of message dissemination in the flow, taking into account the chronology of message appearance, the similarity of message texts and the characteristics of the sources. The structure of the information flow is a weighted directed graph whose nodes are the messages and whose edges are the relationships between them. The edge weight depends on the similarity measure of the message texts and on the time interval between their appearances. The direction of an edge is determined by the chronology of the messages. Based on the scientific literature, the text similarity threshold at which news reports are considered near-duplicates is substantiated. The message dissemination structure serves as the basis for determining the weakly connected components using a depth-first search algorithm. The study showed that this approach makes it possible to define the relationships between messages of thematic information flows more precisely. The analysis of the connected components makes it possible to search for information changes in the messages. The approach also reduces the dimension of the information flow analysis problem.

Keywords: Information flow; Information dissemination structure; Mass media; Near duplicate document.
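
The pipeline outlined in the abstract (a chronologically directed, weighted graph of messages, followed by a depth-first search for weakly connected components) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the threshold value or the weight formula, so everything concrete here is an assumption made for illustration only: Jaccard similarity over word sets, the threshold of 0.5, the decay constant `tau`, the weight formula, and the function names `jaccard`, `build_graph` and `weak_components`. Source characteristics, which the approach also takes into account, are omitted.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity over lowercase word sets (an assumed measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def build_graph(messages, threshold=0.5, tau=24.0):
    """Build directed weighted edges between similar messages.

    messages: list of (message_id, time_in_hours, text).
    threshold and tau are hypothetical values, not taken from the paper.
    Returns a dict {(earlier_id, later_id): weight}.
    """
    ordered = sorted(messages, key=lambda m: m[1])
    edges = {}
    for i, (id_a, t_a, text_a) in enumerate(ordered):
        for id_b, t_b, text_b in ordered[i + 1:]:
            sim = jaccard(text_a, text_b)
            if sim >= threshold:
                # Assumed weight: similarity damped by the time gap.
                edges[(id_a, id_b)] = sim / (1.0 + (t_b - t_a) / tau)
    return edges


def weak_components(nodes, edges):
    """Find weakly connected components with an iterative depth-first search."""
    # Weak connectivity ignores edge direction, so store both directions.
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(comp))
    return components


# Made-up messages for illustration: (id, hours since start, text).
messages = [
    ("m1", 0.0, "storm hits coastal city overnight"),
    ("m2", 2.0, "storm hits coastal city overnight officials say"),
    ("m3", 5.0, "central bank raises key interest rate"),
    ("m4", 6.0, "central bank raises key rate"),
]
edges = build_graph(messages)
components = weak_components([m[0] for m in messages], edges)
```

Each component groups messages that are linked, directly or through intermediaries, by text similarity. Comparing texts only within a component, rather than across the whole flow, is the sense in which the dimension of the analysis problem is reduced.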


A. V. Potemkin (Academy of the Federal Guard Service of the Russian Federation, Orel)


A. V. Potemkin (Academy of the Federal Guard Service of the Russian Federation, Orel)


