Speech-recognition software has been waiting in the wings for so long that you have to wonder whether it will ever take centre stage.
Technical problems have hindered the technology since its conception at Bell Labs in 1952, when researchers invented a system that recognised spoken digits over a phone.
Plenty of improvements have come since then, and the technology is now popular in call centres, directory-assistance services and voice portals: speech-driven data services supplying the likes of airline arrival and departure information and weather forecasts.
Opus Research and Datamonitor are predicting big things for the technology, particularly in voice-driven mobile search, where search terms are spoken into a mobile device rather than typed in. Improved recognition, along with the falling cost of processing power and storage, has made this application more accurate and more practical for businesses communicating with customers.
Locally, the Inland Revenue Department uses a voice portal for customers to claim tax refunds. After receiving a letter notifying me of a refund, I rang the IRD phone number to connect to the portal. An automated voice service asked me to say my IRD number, then repeated it back to me. Next, the portal asked me to say "yes" to confirm that the bank account number nominated in the letter was the one I wanted the money deposited into. The hardest part was resisting the urge to thank the automated voice that had so kindly agreed to put money in my bank account; instead, I just hung up.
Speech recognition over the telephone, and as a search tool in mobile applications, is all well and good. In the car, in the office and at home, however, its use runs into some severe human limitations.
A 20-year-old driver is hurtling down a winding country road at night when his drunken mate in the back seat screams out “lights off”. Not funny, and quite possibly fatal. And as for your suburban dad zipping down the highway, how will he react when his wife decides the wipers should be on now because it is raining hard enough for her?
These two examples are why I believe speech recognition in a car is a non-starter. The driver loses control of some basic functions, and I personally wouldn’t want a bar of that technology in any car I was driving.
And at the office, you might like to finish your day by saying “computer off”. Just don’t say it loudly, or you might turn off your co-workers’ PCs.
The scenario for speech-recognition uptake in the home isn’t that great either. For those living alone, speech recognition for light switches, TVs, stereos and the like would be a disaster. One day you’re authoritatively giving commands to your appliances, and before you know it the years roll by and you’re saying things like “Where is the newspaper?” and “I must feed the cat” out loud while alone in the house.
Now consider speech-recognition technology in a house with six 20-something flatmates (worse, five of them graduates of Otago University!). Hmmm, I could pass on that lifestyle option.
When it comes to switching devices on and off by voice command, I think the humble physically operated switch and its niftier cousin, the remote control, are safe for a while longer.
As for those who think speech-recognition software will let them dictate the great novel that is just bursting to get out: forget it. The best speech-recognition software that will ever be invented will not compensate for a lack of talent. If you have not been able to type your great novel, don’t think you’re going to be able to recite it.




Hi there,
Just wanted to shed some light on how speech is used in cars today, because I think your examples suggest that speech in cars is dangerous. If anything, speech in cars increases safety.
First off, there are voice-activated dialing capabilities, so people can make phone calls with their voice instead of looking down at a cell phone or paging through an address list.
Secondly, there are navigation systems. It's much better to enter an address by voice than to be distracted trying to manipulate a touch screen while going down the highway.
Finally, the command-and-control features that you alluded to are more often used for things like temperature control, finding a radio station (especially with all the channels that satellite radio offers), or other distracting, hands-off-the-wheel activities. It doesn't make much sense to speech-enable things like your lights or turn signals, because those are easily reachable with your hands while driving. And even if they were speech-enabled, the push-to-talk key used in most speech-enabled cars means the system only listens when the driver presses it, so others in the car couldn't trigger commands just by calling them out.
It's estimated that by 2008, 50% of all cars sold will have or offer speech as an option. It's a reality today, not something that is coming soon. You'd be surprised!
I enjoyed your entry and hope you keep writing about speech.
Best,
Jeff Foley
Solutions Marketing Manager
Nuance
http://customercarecommunity.com/blogs/conversations2007/default.aspx
Posted by: Jeff Foley | September 6, 2007 4:42 AM
you write that vox reco might not be able to properly discriminate a user's identity in a group setting;
however, vox reco can quite easily discriminate amongst different users ...
eg, before os/x stopped using the feature, apple shipped macintosh computers in the 90's (running 'Classic' os/9) that used vox reco to authenticate the log-in ...
the famous pass-phrase was "my voice is my password" (but it could be any arbitrary text).
apple was entirely confident about the security of its vox reco: i personally verified their claim that it could distinguish between a live spoken version of the pass-phrase and a digitally recorded version of the same pass-phrase by the same speaker: the apple vox reco engine would always reject the recorded version.
while the authentication/reliability of vox reco is indeed impressive as a biometric tool, it is otherwise strange that apple has given up on vox reco technology as a broader feature, especially after being a pioneer in the field decades ago! (ironically, the brilliant scientist apple recruited from cmu - who went on to head up the whole multimedia/quicktime group at apple - was later poached by microsoft, and then poached again by google - where he is now the head of research ... so it would not surprise any serious observer if things were to come full circle & google were to introduce speech-enhanced search technology in the coming years, building on the initial impetus apple gave to the career of one of speech technology's great pioneers!)
however, what will not be surprising is if innovative applications for speech do *NOT* come from apple - given the fact that steve jobs recklessly slashed apple's R&D budget in half when he returned to apple.
since then, apple has retreated into a few narrow niches ... at which, it must be said, it has done brilliantly ... but at the cost of sacrificing any serious, fundamental innovation on the scale of its early breakthroughs (eg GUI, DTP, Multimedia, Plug-And-Play, the PDA, etc).
however, this autumn, the 'leopard' version of os/x will represent the first (token) effort in vox technology after more than a decade of inactivity at apple -- the tts (text-to-speech) will be improved to sound much, much more realistic than the creaky, legacy version that has been shipping for too long.
yet even this modest improvement will be (another) victim of Jobs' short-term tunnel vision ...
why? ... because vox (TTS, let alone vox reco) does not ship on the ipod or on an iphone (despite the adequate cpu resources to do so).
as a result, apple is over-looking one of the most obvious, KILLER APPS for vox: namely portable (second) language learning.
today, this is already a multi-billion-dollar industry that is fragmented across an inchoate mixture of legacy learning technologies (practice tapes & discs & language-labs; CBT; flash-cards; web-based distance learning; in-class teaching; textbooks; etc etc).
it is completely obvious that the ipod/iphone is the PERFECT platform on which to deliver multi-lingual dictionaries and language-learning tools that can _uniquely_ exploit vox technologies ... eg: Rosetta Stone's cbt is based on using vox reco to identify errors in pronunciation (a vital form of feedback that is not available in the 'linear' approach of a language lab) - so it would be a natural candidate for mobile learning *IF* apple were to add vox reco & tts facilities to the ipod/iphone!
certainly apple has not reached out to any of the electronic dictionary makers (like BESTA in the chinese market) to create new growth categories -- instead apple devotes what little content-development energy it does to recycling decades-old arcade _games_ as the pinnacle of its interactive experience for mobile media platforms!
the potential is enormous -- even apart from the student marketplace (which is already GIANT in china & india etc) ... the travel/tourist market all on its own is large enough to justify adding vox features to the ipod/iphone.
but this kind of basic functionality (just like the omitted GPS) is not likely to happen while Jobs is still at apple -- long ago he lost his passion for innovation ... now it is all about revenge (proving how wrong everyone else was to push him out of apple 20 years ago & then ignore his subsequent ventures for so long).
so, sadly, there is little prospect of seeing cool uses for vox technologies on the most popular mobile platforms in the world (ipod/iphone will have an installed base of 200M in the next few years) - while Jobs is still holding back apple from achieving true greatness.
in the interim, we will be stuck with using underwhelming examples of vox tech - like log-in/out.
Posted by: zahadum | September 10, 2007 3:56 PM