COMPARATIVE ANALYSIS OF NEURAL NETWORK MODELS FOR SOLVING SPEAKER RECOGNITION TASKS

Холєв, Владислав; Барковська, Олеся

Authors: Владислав Холєв, Олеся Барковська
Publication date: 18 August 2023
Publisher: Kharkiv National University of Radio Electronics

Abstract

  The subject matter of the article is neural network models designed or adapted for voice analysis in the context of speaker identification and verification tasks. The goal of this work is to perform a comparative analysis of relevant neural network models in order to determine the model or models that best meet the formulated criteria: model type, programming language of the model's implementation, parallelization potential, binary or multiclass classification, accuracy, and computational complexity. Some of these criteria, such as accuracy and computational complexity, were chosen because they are universally important regardless of the particular application. Others were chosen because of the architecture and challenges of the scientific communication system mentioned in the work, which performs speaker identification and verification. The relevance of the paper lies in the prevalence of audio as a communication medium, which results in a wide range of practical applications of audio intelligence in various fields of human activity (business, law, military), as well as in the need to enable and encourage an efficient environment for inward-facing audio-based scientific communication among young scientists, so that they can accelerate their research and acquire scientific communication skills. To achieve the goal, the following tasks were solved: criteria for judging the models were formulated based on the needs and challenges of the proposed model; models designed for speaker identification and verification were reviewed against the formulated criteria. Results: the following neural-network-based models were reviewed: SincNet, VGGVox, Jasper, TitaNet, SpeakerNet, ECAPA_TDNN; the results of the review were compiled into a comprehensive table; optimal models were determined in accordance with the formulated criteria.
Conclusions: for future research and a practical solution to the problem of speaker authentication, it is reasonable to use a convolutional neural network implemented in the Python programming language, as Python offers a wide variety of development tools and libraries.

Современное состояние научных исследований и технологий в промышленности

oai:ojs.itssi-journal.com:arti...

Last time updated on 02/09/2023

This paper was published in Современное состояние научных исследований и технологий в промышленности.
