Global information

  author       = {Sterpu, George and Saam, Christian and Harte, Naomi},
  title        = {{How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition}},
  journal      = {IEEE/ACM Trans. Audio Speech Lang. Process.},
  volume       = {28},
  pages        = {1052--1064},
  year         = {2020},
  month        = {Mar},
  issn         = {2329-9304},
  publisher    = {IEEE},
  doi          = {10.1109/TASLP.2020.2980436}


AVSR-tf1 is an open-source research system for speech recognition.

Written entirely in Python, AVSR-tf1 aims to provide a simple and reproducible way of training and evaluating speech recognition models based on sequence-to-sequence neural networks. It can use the auditory and visual speech modalities either independently (ASR, VSR) or jointly (AVSR).