DOI: 10.14489/vkit.2017.09.pp.024-031

Korsun O. N., Mikhaylov E. I.
Abstract. The problem of extracting a useful speech signal from «speech–background» and «speech–noise» streams is considered. An algorithm for detecting speech commands is proposed, based on spectral parametrization of the speech signal, multiple regression, and methods of statistical analysis, including control charts. Results of testing the algorithm on a speech database comprising 20 different isolated words are presented.

Keywords: speech command detection; multiple regression; control chart method.


Korsun O. N., Mikhaylov E. I.
Abstract. Speech technologies have developed intensively in recent years, especially automatic speech recognition as an additional channel of interaction between humans and technical devices. One of the most important stages of speech recognition is separating the speech signal from non-speech signals. Widespread methods, such as amplitude detection, have low accuracy in noisy conditions and at low signal-to-noise ratios. The article addresses the problem of improving the accuracy of detecting a useful speech signal in «speech–background» and «speech–noise» streams. The proposed approach is based on the joint use of spectral parametrization, linear multiple regression, and methods of statistical analysis, including control charts. It is assumed that the basic features that distinguish a speech signal from background, noise, and other non-speech signals are contained in a small number of different words (reference words); a group of three words is used in this article. The speech signal is decomposed over the basis of reference words. The estimate of the speech signal level is calculated in a sliding window as the sum of squares of the projections onto the reference-word vectors. When there is no speech signal in the sliding window, the estimate is small, because noise and non-speech signals are weakly correlated with speech. When a speech signal appears in the sliding window, the estimate increases rapidly, which serves as the diagnostic sign of a word. The advantage of the proposed algorithm is that it detects almost any word, including words that do not coincide with the selected reference words. The practical importance of the problem is that the detected words are then passed to automatic speech recognition. The article presents a detection algorithm based on the proposed methods and discusses the experimental results of testing the algorithm on a database of twenty different isolated words.

Keywords: Detection of speech commands; Multiple regression; Method of control charts.
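
The detection idea summarized in the abstract (projection onto a small basis of reference words, a sliding-window level estimate, and a control-chart threshold) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the spectral parametrization, so the sketch works on generic feature vectors. The three sinusoidal "reference words", the noise level, the function names, and the Shewhart-style limit of mean plus three standard deviations are all assumptions made for illustration. The reference vectors are orthonormalized first, so that the sum of squared projections equals the explained sum of squares of a linear regression on the reference vectors.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors):
    """Gram-Schmidt: orthonormal basis of the span of the reference vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip vectors that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis


def speech_level(window, basis):
    """Sum of squared projections of one window onto the reference basis."""
    return sum(dot(window, b) ** 2 for b in basis)


def sliding_levels(signal, basis, hop):
    """Level estimate for each window position, stepping by `hop` samples."""
    n = len(basis[0])
    return [speech_level(signal[i:i + n], basis)
            for i in range(0, len(signal) - n + 1, hop)]


def control_limit(noise_levels, k=3.0):
    """Upper control limit (assumed Shewhart rule): mean + k * std."""
    mean = sum(noise_levels) / len(noise_levels)
    var = sum((x - mean) ** 2 for x in noise_levels) / (len(noise_levels) - 1)
    return mean + k * math.sqrt(var)


# Synthetic stand-ins for three reference words (hypothetical data).
n = 32
refs = [[math.sin(2 * math.pi * k * t / n) for t in range(n)] for k in (2, 3, 5)]
basis = orthonormalize(refs)

rng = random.Random(0)

# Calibrate the control limit on a noise-only recording.
noise = [rng.gauss(0.0, 0.1) for _ in range(50 * n)]
limit = control_limit(sliding_levels(noise, basis, hop=n))

# Test stream: noise with one "word" placed in window number 10.
signal = [rng.gauss(0.0, 0.1) for _ in range(20 * n)]
for t in range(n):
    signal[10 * n + t] += refs[0][t] + 0.5 * refs[1][t]

levels = sliding_levels(signal, basis, hop=n)
detected = [i for i, level in enumerate(levels) if level > limit]
```

In this toy setup the inserted word lies entirely in the span of the reference vectors, so its level is far above the limit. A real word outside the reference set would only partly project onto the basis, which is why the method relies on speech being more strongly correlated with the reference words than noise is.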


О. Н. Корсун (ФГУП «Государственный научно-исследовательский институт авиационных систем» ГНЦ РФ, Москва, Россия)
Е. И. Михайлов (Московский физико-технический институт (государственный университет), Москва, Россия)



O. N. Korsun (State Research Institute of Aviation Systems State Scientific Center of Russian Federation, Moscow, Russia)
E. I. Mikhaylov (Moscow Institute of Physics and Technology (State University), Moscow, Russia)



