The History of Speech Recognition

Speech recognition software enables phones and computers to understand human utterances - whether that's a question, a command, or a general exclamation. And while a few decades ago this would have been in the realm of science fiction, nowadays it's a firmly established part of everyday life.

From checking the weather forecast and picking a playlist to sending texts and verifying your identity, the use of speech recognition is already so ingrained in society that we rarely give it a second thought!

But where did this technology come from? When did it all begin? And what does the future look like? Let's take a look at the history of speech recognition, how it's used today, and what the future has in store.

A brief history of speech recognition: a timeline

The 1950s

The first ever speech recognition system was built in 1952 by Bell Laboratories. Nicknamed 'Audrey', the clever system could recognize the sound of a spoken digit (zero to nine) with more than 90% accuracy - but only when spoken by its developer. It was much less accurate with unfamiliar voices.

The 1960s

IBM showcased the Shoebox at the 1962 World's Fair in Seattle. The device could understand 16 spoken English words. Later in the 1960s, Soviet researchers created an algorithm that could recognize 200 words. These early systems worked by matching each individual word against stored voice patterns.
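To make the idea of matching a word against stored voice patterns more concrete, here is a minimal Python sketch. It assumes dynamic time warping (DTW), a classic technique for comparing sequences spoken at different speeds; the timeline above doesn't say which method each 1960s system actually used, so treat this as an illustration rather than a reconstruction. The function names, the `TEMPLATES` dictionary and the toy numbers are all hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a
    # with the first j frames of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # A frame may be stretched (repeat a), squeezed (repeat b),
            # or matched one-to-one.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical toy "voice patterns": one number per time frame (e.g. loudness).
# Real systems would use richer acoustic features.
TEMPLATES = {
    "yes": [1, 3, 5, 3, 1],
    "no": [5, 4, 2, 1, 1],
}

# The same "yes" shape spoken more slowly still matches the "yes" template.
print(recognize([1, 1, 3, 3, 5, 5, 3, 1], TEMPLATES))  # prints "yes"
```

Because each new word needs its own stored pattern, this kind of approach suits small vocabularies, which fits the 16-word and 200-word systems described above.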

The 1970s

A US Department of Defense-funded program at Carnegie Mellon University developed Harpy, a system with a vocabulary of over 1,000 words. The biggest breakthrough here was that it could recognize not only words, but whole sentences.

The 1980s

IBM was back at the forefront in the 1980s with a voice-activated typewriter called Tangora. It had a 20,000-word vocabulary and used statistics to predict and identify words.

The 1990s

In the early 90s, Dragon Systems released the first consumer speech recognition product, called Dragon Dictate. In 1997, an upgrade called Dragon NaturallySpeaking was released. This was the first continuous speech recognition product, and it could recognize speech at a rate of 100 words per minute. This technology is still used today - in fact, Microsoft announced in 2021 that it would acquire Nuance, the company that now owns Dragon!

The 2000s onwards

AI speech-to-text technology has come on leaps and bounds in the past couple of decades. Google has led the way with its voice search product, and the likes of Apple, Amazon and Microsoft are all key players too.

What are the two types of speech recognition?

There are two types of speech recognition: speaker-dependent and speaker-independent.

  • Speaker-dependent

Speaker-dependent speech recognition software is trained to recognize a specific voice, in a similar way to voice recognition software.

New users have to 'train' the program by speaking to it - which often involves reading a few pages of text. This way, the computer can analyze the voice and learn to recognize it.

Speaker-dependent speech recognition generally provides very high accuracy.

  • Speaker-independent

Speaker-independent software is designed to recognize anyone's voice, which means no training is involved. The software is focused on word recognition rather than a specific voice.

This type of speech recognition is generally less accurate, but it's the only real option for interactive voice response (IVR) applications, such as those used by call centers, as businesses can't ask callers to read pages of text before using their systems.

How is speech recognition used today?

Here are some of the ways that speech recognition software is now used in everyday life:

  • Smartphones

Whenever you say "hey Siri", it's speech recognition software that springs into action. It powers virtual assistants like Siri and allows us to use our devices just by talking!

  • Smart speakers

Smart speakers like Amazon Echo and Apple HomePod also have virtual assistants built into them. Some 320 million smart speakers were in use in 2020, and that figure was forecast to double by 2024!

  • Call centers

Speech recognition is at play every time you call a call center and a recorded voice asks you to state your name, reference number, or a summary of your query. This is known as Interactive Voice Response.

  • Security systems

Many security systems, like those used by banks, use voice biometrics to verify a customer's identity.

  • Transcription software

Automatic transcription services, like Transcribe, use speech recognition to convert speech into text, providing you with transcripts within minutes, if not seconds.

The future of speech recognition

Speech recognition is set to become more and more widely used.

The more it's used, the more speech data is collected, and the more investment is pumped into it, the more accurate speech recognition software will get. It will get better at understanding different accents, differentiating between speakers, and even recognizing emotions. Eventually it may also learn to understand different languages and dialects simultaneously.

No one can know for sure exactly what the future holds, but speech recognition software is on track to get better, more accurate, and more useful than ever before.

If you found this interesting, you might like to learn more about AI transcription, including how it works, how it's used today, and what you can expect from it in the future.

Written By Katie Garrett
