Speaker
Description
Most cognitive services deal with understanding the voice: recognizing emotions, speech, and the speaker. A general approach to embedding speech signals, applicable to tasks such as speaker recognition, is therefore a relevant problem. State-of-the-art speaker recognition methods are significantly restricted in their use because they are sensitive to the duration of the speech signal. In this paper, we propose a new approach to embedding speech signals with a recurrent neural network, which can be used for speaker, speech, and emotion recognition. It is shown experimentally that the proposed approach reduces the speaker recognition equal error rate by 7.5% compared with the state-of-the-art i-vector approach, with voice model vector dimensions of 16 and 100, respectively, on 2-second speech signals.
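To make the idea of a recurrent-network speech embedding concrete, here is a minimal sketch of an encoder that maps a variable-length sequence of acoustic frames to a fixed-length vector. The MFCC inputs, the GRU encoder, the mean pooling over time, and all layer sizes (including the 100-dimensional embedding) are illustrative assumptions, not the architecture described in the paper.

```python
# Minimal sketch of an RNN-based speech embedding network (illustrative only).
# Assumptions not taken from the paper: MFCC inputs, a single GRU layer,
# mean pooling over time, and the chosen layer/embedding sizes.
import torch
import torch.nn as nn


class SpeechEmbedder(nn.Module):
    def __init__(self, n_features=40, hidden_size=256, embedding_dim=100):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden_size, batch_first=True)
        self.proj = nn.Linear(hidden_size, embedding_dim)

    def forward(self, frames):
        # frames: (batch, time, n_features) acoustic features, e.g. MFCCs
        outputs, _ = self.rnn(frames)      # (batch, time, hidden_size)
        pooled = outputs.mean(dim=1)       # average over time frames
        embedding = self.proj(pooled)      # fixed-length utterance embedding
        return nn.functional.normalize(embedding, dim=-1)


# Example: embed a batch of 2-second utterances (200 frames at a 10 ms hop).
model = SpeechEmbedder()
utterances = torch.randn(8, 200, 40)
embeddings = model(utterances)             # shape: (8, 100)
```

Because the pooling step collapses the time axis, utterances of different durations yield embeddings of the same dimensionality, which is the property that lets one embedding serve speaker, speech, and emotion recognition tasks.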