Video Based Automatic Speech Recognition Using Neural Networks

Lin, Alvin

oai:digitalcommons.calpoly.edu:theses-3959

Authors: Alvin Lin
Publication date: 1 December 2020
Publisher: DigitalCommons@CalPoly

Abstract

Neural network approaches have become popular in the field of automatic speech recognition (ASR). Most ASR methods use audio data to classify words. Lip-reading ASR techniques use only video data, which makes them suitable for noisy environments where audio may be compromised. This work develops a comprehensive approach to video-based ASR, including the vetting of datasets and the development of a preprocessing chain. The approach is based on neural networks, namely 3D convolutional neural networks (3D-CNN) and long short-term memory (LSTM) networks, which are designed to take in temporal data such as videos. Various combinations of neural network architectures and preprocessing techniques are explored. The best-performing architecture, a CNN with a bidirectional LSTM, compares favorably against recent works on video-based ASR.
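
The abstract pairs a 3D convolutional front end with a bidirectional LSTM. As a rough illustration of how a video clip's shape changes through such a pipeline, the sketch below does the shape bookkeeping in plain Python. The clip size, kernel, stride, padding, channel count, and hidden size are all hypothetical values chosen for the example and are not taken from the thesis. The helper names are likewise illustrative.

```python
from typing import Tuple

Shape3D = Tuple[int, int, int]  # (frames, height, width)


def conv3d_output_shape(size: Shape3D, kernel: Shape3D,
                        stride: Shape3D = (1, 1, 1),
                        padding: Shape3D = (0, 0, 0)) -> Shape3D:
    """Output (frames, height, width) of a 3D convolution without dilation.

    Uses the standard formula floor((n + 2p - k) / s) + 1 per dimension.
    """
    return tuple((n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(size, kernel, stride, padding))


def bilstm_input_shape(channels: int, size: Shape3D) -> Tuple[int, int]:
    """Flatten each frame's feature maps into one vector per time step."""
    frames, height, width = size
    return frames, channels * height * width


def bilstm_output_shape(steps: int, hidden: int) -> Tuple[int, int]:
    """A bidirectional LSTM concatenates forward and backward hidden states."""
    return steps, 2 * hidden


# Hypothetical numbers for illustration only (not from the thesis).
clip = (29, 64, 64)  # 29 frames of a 64x64 mouth-region crop
after_conv = conv3d_output_shape(clip, kernel=(3, 5, 5),
                                 stride=(1, 2, 2), padding=(1, 2, 2))
seq_shape = bilstm_input_shape(channels=32, size=after_conv)
out_shape = bilstm_output_shape(steps=seq_shape[0], hidden=256)

print(after_conv, seq_shape, out_shape)
```

With a temporal stride of 1 and matching padding, the frame count is preserved, so each video frame still corresponds to one LSTM time step. That is one common design choice, assumed here, and not necessarily the one the thesis uses.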

Last time updated on 28/10/2021

This paper was published in DigitalCommons@CalPoly.
