The Human-Computer Interface (HCI) has evolved over the last four decades from the command-line interface, through the GUI and mouse, to touch and camera interaction on mobile devices. The launch of Apple's Siri in 2011 heralded the dawn of the 'voice-first' user interface. The use of voice as a primary means of HCI is projected to grow rapidly over the next few years, and the advantages to the consumer are obvious. The voice UI –

  • is much faster – humans speak at around 150 words per minute versus typing at around 40 words per minute
  • is much easier to use – convenient, hands-free and instant

However, market projections for the voice UI are always caveated by the need to improve accuracy in real-world (i.e. noisy) environments. Mainstream speech recognition technologies are audio-based and, despite advances in noise cancellation techniques, word accuracy rates decline markedly as background noise levels rise. This inability to perform in real-world environments is seen as the key obstacle to the uptake of the voice interface.

LipREAD is a visual speech recognition system whose performance is unaffected by background audio noise. In environments where a camera can be trained on the face of the speaker, LipREAD can assist ASR systems and boost word accuracy, thus improving the user experience and helping to remove the main impediment to full market exploitation of the voice interface.
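One common way a visual recogniser can "assist" an audio ASR system is late (decision-level) fusion: each recogniser scores candidate words independently, and the audio stream's weight is reduced as its estimated signal-to-noise ratio falls. The sketch below illustrates this idea only; the function name, weighting scheme, and SNR thresholds are illustrative assumptions, not the LipREAD API.

```python
def fuse_hypotheses(audio_scores, visual_scores, snr_db,
                    snr_floor=0.0, snr_ceiling=30.0):
    """Combine audio and visual word scores with an SNR-dependent weight.

    audio_scores / visual_scores: dict mapping candidate word -> log-score
    snr_db: estimated signal-to-noise ratio of the audio channel (dB)
    Returns the best word and its fused score.
    """
    # Map SNR onto [0, 1]: clean audio -> trust audio, noisy -> trust video.
    alpha = (snr_db - snr_floor) / (snr_ceiling - snr_floor)
    alpha = max(0.0, min(1.0, alpha))

    # Words missing from one recogniser's list get a large penalty score.
    missing = -1e9
    fused = {}
    for word in set(audio_scores) | set(visual_scores):
        a = audio_scores.get(word, missing)
        v = visual_scores.get(word, missing)
        fused[word] = alpha * a + (1.0 - alpha) * v

    best = max(fused, key=fused.get)
    return best, fused[best]
```

In quiet conditions the audio hypothesis dominates; in heavy noise the weight shifts to the visual stream, which is exactly the regime where lip reading pays off (e.g. distinguishing 'bat' from 'pat' when the audio channel is swamped).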