Okay, before I get to my point, I have to stage a little hissy fit that is related to the topic. The topic is the news that Nuance has announced “Dragon TV,” which will let future televisions understand your spoken commands. And an enormous number of my colleagues in the technical press have written about this as a great advance in “voice recognition.” And it’s not. “Voice recognition” is the recognition of a specific voice, as in biometric security applications. When you try to dictate to a machine, the application is “speech recognition.” There’s a big difference, folks. Getting it wrong is just sloppy. Okay, I feel a little better now; let’s move on.
So Nuance announced “Dragon TV,” a new interface platform that HDTV manufacturers can incorporate in their future television designs. The system recognizes spoken commands (not VOICES! Oops, sorry about that). The cool thing about this is not just that you can say channel numbers or channel names to switch, but that it also becomes part of the search interface. You can speak the name of an actor, and it will seek out programming options where that actor appears.
From where I stand, this is far more important than any of the gesture interface announcements that came out of CES 2012. (Who wants to do aerobics just to change channels?) The key to the future of television programming is the ability to access the content that you want to watch, when you want to watch it. The rapid growth of “over the top” Internet streaming demonstrates how much American viewers want to break out of the confines of the traditional channel grid, but the big problem is how to access all that content. (This is something I know a little about, as I wrote a major industry overview report for GigaOM Pro on the subject.) The traditional remote control is not the answer, and as Logitech discovered the hard way, a QWERTY keyboard doesn’t go over too well in most living rooms these days. And trying to spell out T-O-M C-R-U-I-S-E by waving at the screen is probably a non-starter.
Speech recognition could be the answer. If it is going to work, nobody is in a better position to deliver on the promise than Nuance. The company’s history starts with Visioneer, a scanning company; it turns out that the same algorithms that help with optical character recognition (OCR) also work with speech recognition. It later acquired ScanSoft (which was a descendant of the famous Kurzweil Computer Products), which in turn gobbled up many of the OCR and speech recognition companies of the day, including Caere, Lernout & Hauspie, Philips, SpeechWorks, and Locus Dialog. Since merging with Nuance in 2005, the company has continued to grow through acquisition, buying Dictaphone, Tegic, and more than a dozen other companies. As a result, Nuance is the repository of perhaps the most extensive collection of speech recognition technology.
I also suspect that lurking in some of those IP collections are some algorithms that can help identify meaning. This will be essential to the success of any new television interface that tries to sort through the metadata for the universe of movies and TV shows and YouTube clips in order to find matches to recommend in response to a user query.
There are no Dragon TVs on the market yet, but this still could be one of the most significant developments for the HDTV market this year.