The Suitability of Speech Recognition

I got thinking about speech recognition and how it has developed over the years. My first experience with speech recognition was when my dad bought IBM ViaVoice back in the 90s for our computer (a Pentium 120MHz with 16MB RAM). The software needed a good amount of training to improve its “out of the box” accuracy and required a bit of grunt to work properly (which was more than the old Pentium could muster at the time).

Anyway, once the software was all trained there was the small matter of using it. To put things in perspective, at the time ICQ was becoming one of the dominant instant messengers, dial-up (bought in hourly increments) was the most common method of accessing the web, and hard drives had only just tipped over the 1GB mark. The MP3 format had not yet hit the mainstream.

Being a school student at the time, I had a myriad of written assessments to complete over the years. To be honest, I was never fond of these sorts of assignments and, as a result, I needed to make numerous revisions before submission. Dictating an assignment was not ideal, nor was making corrections on the fly. Rarely would I get things right first go (either the written content or the speech recognition), plus it felt a bit stupid dictating a document out loud (even without competing with background noise). It was just easier to type it out manually (particularly when people were intentionally making noise to confuse the software).

These days, speech recognition is used for short, sharp commands (like dialing a contact on your phone) or to compose relatively short messages (like SMS or e-mail). Mobile devices have more power in them than the desktop computers commonly found fifteen years ago, and we also have access to the cloud to handle speech processing if required. It’s probably easier to get a sound bite correctly recognised or corrected than it is a whole slab of text (particularly when you are driving).

But will speech recognition supplant the humble keyboard in this next decade?

Personally, I don’t think so, mainly because I reckon it’s difficult for people to go paragraphs at a time without:

  • losing their train of thought,
  • encountering a mistake (either human error or machine error),
  • needing to go back and change content.

What I do think might become more prevalent is speech recognition for home automation devices, ranging from viewing content in the home theatre through to climate control. Mobile speech recognition will continue to improve as cellular devices increasingly become the primary portal for our digital lives.

The bottom line is that not only are there suitable times and places for speech recognition, there are also specific types of speech most suitable for it.
