Steve Beet's Speech Technology Page

Well, I’ve been working on speech technology since 1980, and if you want to see what I’ve been up to all this time, take a look at my publications.

While at Aculab, I’ve been responsible for the development of four distinct telephone speech recognition systems:

A high-efficiency isolated word recogniser based on whole-word models (for British and American English, French, and German), running up to 16 simultaneous channels on one Analog Devices SHARC processor.
A high-accuracy connected word recogniser using a combination of phonetic and whole-word models (for British and American English, French, German, Italian, and US Spanish), running as a distributed system on networked PCs. Since its creation, the accuracy of this system has increased dramatically, response times have fallen, and memory requirements have halved. Thanks to a colleague of mine, it now includes a speaker verification system as well.
A modern cloud-based speech recognition system, operating on streamed audio and producing results with very low latency.
A lightweight (less than 5 MB) standalone small-vocabulary connected-word recogniser, incorporated within a larger speaker verification/authentication package.

Of course, there’s still more to be done, but I keep getting distracted by forays into alternative methodologies for speech synthesis (a.k.a. text-to-speech conversion, or TTS), electrical (“hybrid”) echo cancellation in telephone systems, voice morphing, codec development, and things like that.

For some years, I took a break from writing academic publications, which is a shame because some of my original work on voice activity detection, extremely fast and accurate adaptive filtering, and short-time spectral analysis on arbitrary frequency scales is mathematically interesting and remains unpublished.

To see what’s been happening in the world at large, follow some of the links on this page. They include sources of all sorts of speech-related goodies (software, data, and the like).