Global information
- Download link: https://sigmedia.tcd.ie/
- Contact: sigmedia_database@tcd.ie. The subject line must start with the tag [TCDTimit Question]; emails with a missing or invalid tag will be ignored.
- License:
- Reference:
@article{harte2015tcd,
title = {TCD-TIMIT: An audio-visual corpus of continuous speech},
author = {Harte, Naomi and Gillen, Eoin},
journal = {IEEE Transactions on Multimedia},
volume = {17},
number = {5},
pages = {603--615},
year = {2015},
publisher = {IEEE}
}
Description
Automatic audio-visual speech recognition currently lags behind its audio-only counterpart in terms of major progress. One of the reasons commonly cited by researchers is the scarcity of suitable research corpora. This paper details the creation of a new corpus designed for continuous audio-visual speech recognition research. TCD-TIMIT consists of high-quality audio and video footage of 62 speakers reading a total of 6913 phonetically rich sentences. Three of the speakers are professionally trained lipspeakers, recorded to test the hypothesis that lipspeakers may have an advantage over regular speakers in automatic visual speech recognition systems. Video footage was recorded from two angles: straight on, and at 30 degrees. The paper outlines the recording of footage and the post-processing required to yield video and audio clips for each sentence. Audio, visual, and joint audio-visual baseline experiments are reported. Separate experiments were run on the lipspeaker and non-lipspeaker data, and the results were compared. Visual and audio-visual baseline results on the non-lipspeakers were low overall. Results on the lipspeakers were found to be significantly higher. It is hoped that, as a publicly available database, TCD-TIMIT will now help further the state of the art in audio-visual speech recognition research.