11 | 03 | 2025

DOI: 10.14489/vkit.2014.08.pp.029-034

Соколов А. П., Баженов А. Г.
(с. 29-34)

Аннотация. Рассмотрены вопросы оптимизации базы данных, учитывающей использование модульной структуры документов. Показано развитие принципа виртуальности документа, а также системы поиска информации на основе системы кодирования модулей данных. Предложено представление документа и модуля данных как совокупности лексических и логических примитивов. Показаны примеры практического применения конвертации лексических примитивов в логические для уменьшения размера базы данных, улучшения синхронизации данных и оптимизации системы поиска.

Ключевые слова: техническая документация; модуль данных; динамический контент; онтологическая система поиска.


Sokolov A. P., Bazhenov A. G.
(pp. 29-34)

Abstract. The paper deals with optimization of a database used to store data modules instead of whole documents. Here the documents are completely virtual, and the search engine is based on the system of data module codes proposed by the authors earlier. The paper does not consider the choice or optimization of a database engine, since these aspects depend more on the amount of data than on the data structure. All the proposed optimization methods concern the reuse of data modules in various documents, including converting multiple data modules into a single one. This approach not only reduces the database size but also improves the search engine and data synchronization. The basic statement is that each data module should be unique, and if it is used in various documents, it should have one or more alias codes, one for each use of this data module. The database itself therefore consists of three independent subspaces. The first contains data modules and multimedia objects. The data modules and aliases belonging to the documentation for a group of items are gathered in projects, which also include control and resource units called service data modules. The second contains the tables of contents of specific documents, which refer to the data modules or their aliases. The third contains the sets of styles and scripts used to process data modules when building a publication. The authors propose a kind of generalized XML language to describe the tagged text. Both a document and a data module are considered as compositions of primitives. The primitives are lexical, logical, and style units: lexical units are subsets of the space of words and sentences; logical units (tags) are operators that act on the lexical units and give them meaning; and style units (styles) are attributes of logical units that determine how the lexical units under those logical units are represented in a document.
Due to the virtual nature of a document, the style units can be neglected, and the resulting variable lexical units can be used to convert multiple data modules into a single one using the alias technique. The article presents practical examples of converting lexical units into logical ones. Special attention is paid to cases in which the required dynamic content can be imported from the data module code or from the comment for the corresponding code level, which is also stored in the database. These techniques can dramatically increase the share of reused information, up to fully automatic building of some standard documents. It should be noted that all the results are applied in practice at CSRI Elektropribor (Russian Federation).
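
The alias technique described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `ModuleStore`, the function `build_document`, the codes `DM-001`, `A-100`, `B-100`, and the sample text are all hypothetical, and the `$item` placeholder merely stands in for a variable lexical unit. The sketch assumes that a data module is stored once, that each alias code points at it and supplies the variable content, and that a document is only a table of contents listing codes.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class ModuleStore:
    """Hypothetical store: unique data modules plus alias codes pointing at them."""
    modules: dict = field(default_factory=dict)  # module code -> text with placeholders
    aliases: dict = field(default_factory=dict)  # alias code -> (module code, variables)

    def add_module(self, code, text):
        self.modules[code] = text

    def add_alias(self, alias_code, module_code, **variables):
        # An alias may only point at a module that is already stored.
        if module_code not in self.modules:
            raise KeyError(f"unknown data module: {module_code}")
        self.aliases[alias_code] = (module_code, variables)

    def resolve(self, code):
        """Return the text for a module code or an alias code."""
        if code in self.aliases:
            module_code, variables = self.aliases[code]
        else:
            module_code, variables = code, {}
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        return Template(self.modules[module_code]).safe_substitute(variables)


def build_document(store, toc):
    """Assemble a virtual document from its table of contents (a list of codes)."""
    return "\n".join(store.resolve(code) for code in toc)


store = ModuleStore()
store.add_module("DM-001", "Switch off $item before servicing.")
store.add_alias("A-100", "DM-001", item="the gyrocompass")  # use in one document
store.add_alias("B-100", "DM-001", item="the log")          # use in another document

manual_a = build_document(store, ["A-100"])
manual_b = build_document(store, ["B-100"])
```

In this sketch both documents reuse the single stored module, so a correction to `DM-001` reaches every document that refers to it through an alias.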

Keywords: Technical documents; Data module; Dynamic content; Ontological search engine.


А. П. Соколов, А. Г. Баженов (ОАО «Концерн «Центральный научно-исследовательский институт «Электроприбор» ГНЦ РФ, Санкт-Петербург)  


A. P. Sokolov, A. G. Bazhenov (State Research Center of the Russian Federation Concern CSRI Elektropribor, JSC)  


