Sunday, 15 February 2009

Speech recognition: Almost ready for mobile prime time

I've always wanted to see speech recognition incorporated into mobile devices. Since you don't have a big keyboard when you're on the go, you ought to be able to just talk to your phone and tell it what to do, or dictate memos to it and have it convert them into e-mails or SMS messages. In addition to being incredibly convenient, this would increase the safety of a lot of drivers. It's a spooky fact, but in surveys I've done, more than 10 percent of the US population admitted to sometimes sending text messages while driving.

Not smart, not safe.

So, is voice recognition good enough to let you just talk to your mobile device and then send the converted text as a message?

I first asked myself that question a couple of years ago when I bought a copy of Dragon NaturallySpeaking and a small voice recorder. I tried recording weblog posts and other documents while driving, and then brought the recorded sound back to my computer to convert it into text. The result was a disaster. Dragon was unable to keep pace with the recorded sound in the files, and started dropping sentences, paragraphs, and eventually entire pages of spoken text. I was so disgusted, and so disappointed, that I gave up and went back to listening to sports talk radio while I drove.

Recently a newly appointed product manager at Nuance (publisher of Dragon) sent out a survey asking for feedback on the product. Unlike most product managers, she signed the survey form with her own name and with her own e-mail address. Most product managers wouldn't do that because they don't want to be overwhelmed with feedback. I don't know how much feedback she got in general, or how overwhelming it was, but she got a note back from me describing my problems with the product and explaining why I really wasn't satisfied with it.

I didn't expect to get any reply from the company; Nuance has a remarkably restrictive policy on providing technical support unless you pay extra for it. Usually, companies that do that aren't interested in getting any sort of conversation going with their customers. But to my surprise, I got a note from the product manager not only sympathizing with my problems but offering to send me a copy of the latest version of the software and a voice recorder that she said would work well with the software. I wish my weblog address hadn't been in my signature, so I would know if they do this sort of thing for every frustrated user. But anyway I took her up on the offer.

You can see the results here. I dictated this weblog post using the voice recorder, synced it onto my computer for recognition, and then corrected the (few) errors by hand. There are pluses and minuses to the dictation system. The good news is that the program can now keep up with my dictated speech. I no longer lose sentences or paragraphs of text. I'm also surprised with the way the product recognizes trade names, so for instance when I say Home Depot or McDonald's or Nike or Apple or IKEA or Lowes, Dragon gets the names correct and properly capitalized (I didn't have to fix anything in that sentence).

On the other hand it does make mistakes -- the packaging claims about 99% accuracy, which means that you should expect one word in every hundred to be incorrect. My guess is that I'm getting somewhere between 97 and 99% accuracy. That's not bad. In fact, it's pretty darned impressive. But in practice it still means you have to go back and do a lot of corrections.
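To put rough numbers on that, here is a minimal sketch in Python. The 800-word post length is a made-up figure for illustration and the function name is hypothetical; the only inputs taken from the text are the 97 to 99% accuracy range.

```python
def expected_corrections(word_count, accuracy):
    """Expected number of misrecognized words at a given word-level accuracy."""
    # An accuracy of 0.99 means 1 word in 100 is expected to be wrong.
    return round(word_count * (1 - accuracy))


# Hypothetical post length, chosen only for illustration.
POST_LENGTH_WORDS = 800

for accuracy in (0.99, 0.98, 0.97):
    errors = expected_corrections(POST_LENGTH_WORDS, accuracy)
    print(f"{accuracy:.0%} accuracy: about {errors} words to fix")
```

Under those assumptions, dropping from 99% to 97% accuracy triples the number of words you have to find and fix by hand, which is why a small-sounding difference matters so much in practice.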

The training is close to torture: reading aloud a 20-minute excerpt from a Dilbert book while trying to pronounce every word correctly. Later I tried setting up the program without any training, and it worked exactly the same. So my advice is to skip the training.

The software is not great at understanding where punctuation should be placed in the text. I have learned that I have to give grammatical guidance by saying things like "comma," "period," and "new paragraph" in order to make sure that the text will be reasonably well formatted.

If I just speak naturally the text will come out like this making it very difficult for anyone else to read and even making it hard for me to edit without punctuation inserted it is very hard to get tell where a sentence was supposed to end and another one start add in a few wreck cognition errors by the soft wear and the text is not something you would want to send to someone uncorrected

Speaking with punctuation is unnatural, and could be somewhat distracting while driving. I have to think carefully about the text that I'm dictating, and I believe for some people that could cause them not to pay enough attention to what's happening on the road. I think I can do it safely or I wouldn't do it, but it definitely is an issue to consider.

Overall, I think this approach will make me a bit more productive, so I should be able to produce a little bit more weblog content and maybe get some other sorts of things done as well.

So it's nice for me, and I finally feel like I got my money's worth from Dragon. But is the technology ready for broad deployment in mobile devices?

I think the answer is technically yes, but practically no. Mobile devices are casual-use; features that require too much commitment or effort just don't get used. Without careful attention to spoken punctuation, the software produces errors and the sort of run-on text you saw above. Even in a short message, I think it's likely that you'd get more mistakes than you'd find acceptable. Correcting those errors on a small screen with no mouse would be tedious at best (it's an annoying task even on a PC).

More importantly, the software is very sensitive to the quality of the sound file coming into it. I believe most phone microphones and headsets wouldn't produce the required quality. You'd probably get better results with a service that just records your speech and has someone in India retype it (such services exist today).

So, the news from the world of voice recognition is hopeful for mobile users but not yet wonderful. The technology is good enough that you can definitely use it as a substitute for typing if you have physical problems. It's also a useful PC productivity tool for someone who generates a lot of text for a living.

However, I think we're not yet quite at the point where you can just talk to your phone and have it reliably transform all of your speech into text. It's getting better, but it's not all the way there yet. For a mobile device, the dream of just talking is still a dream. But I do think it's a dream that's getting closer to reality.

===========

PS: I'd also like to compliment Kristen Wylie, the product manager at Nuance who responded to my message. Take notes, folks, this is the right way to communicate with customers online -- sign your real name, use an address they can respond to rather than a no-replies mailbox, and when someone has a problem help them solve it.

Friday, 13 February 2009

VISION VAMPIRING! ...SANS THE BLOODY TRAVAILS


“We don’t like their sound, and guitar music is on the way out” – Decca Recording Company, rejecting the Beatles, 1962


Nothing is more critical and more elusive than the vision of the top management. The guys at the Decca Recording Company are not the only ones who lost a huge opportunity for lack of vision (the Beatles went on to become the most legendary rock & pop band ever); it’s rumoured that Elvis – after his first live performance – was advised by his producer to become a truck driver. So much for the producer’s vision!? In the cat-eat-cat world of contemporary business, Vision Vampiring as a philosophy is about looking into the best that the future can ‘NOT’ offer and ensuring that the organisation has a burning desire to vie for that seemingly unbelievable and unachievable objective. But to inculcate Vision Vampiring in your firm, the first step would be to understand the Vision Annihilation Theology...

Vision annihilation theology

Annihilation! Because ‘vision’ should necessarily annihilate and take over an organisation’s senses and functions at all identifiable levels! Theology... Because it requires an ocean of spirituality and mysticism to grasp the wonder that is encompassed by ‘vision’! Vision is not something that can be calculated using a formula; instead, it needs to be determined with an amazing sense of future-thought.

Vision Vampiring therefore is all about ensuring that the Vision dream of a firm is developed and spread to every link in the organisation using compelling transactions, akin to how vampires spread their clan. Vision Vampires are the individuals who ensure that the Vision dream is developed and spread across the organisation. All individuals bitten by Vision Vampires become the new Vision Vampires, who in turn spread the Vision to others; creating more Vision Vampires. Firms need to ensure that Vision Vampiring Groups (VVG) are developed at all identifiable organisational levels, with dedicated personnel focused on the Vision job. Walloping vision-ridicule-entrapment (where managers refuse to think beyond previously set boundaries and get entrapped due to fear of ridicule) is the first step for any discernible vision development at multiple organisational levels. VVGs have to be blasted with the message of not fearing ridicule while developing Vision, else the concept of Vision gets thwarted at the very outset.

Vision Vampiring is the future foresight that provides annihilating inspiration to a firm’s stakeholders; it is the ability to be clairvoyant with compelling reason, and to spread this futuristic compulsion and attitude among the necessary people. It must be backed by an obsessive pang for achievement that has the capability to be called a revolution. So effectively, Vision Vampiring is where the basis of any corporate planning starts. And ends. And starts again… It, in fact, is why any organisation demands to be the absolute leader. It is indeed Vision Vampiring which creates targets that seem unachievable, and also creates structures and people who believe in achieving those seemingly unimaginable targets... Obsessively, compulsively, futuristically!

Vision-strangulation

Critics argue that once the Vision is defined, each employee of the organisation should become enmeshed in it. But the Vision Annihilation Theology demolishes this idea. The Vision-Strangulation concept says that as we move down the levels of management, the principle and spirit of Vision is perfectly strangulated and killed because employees at lower levels get enmeshed in their job responsibilities and work pressures, instead of getting enmeshed in Vision. This occurs because stakeholders like lower level managers and employees get totally engrossed and stressed out in achieving their short to medium term objectives. It also occurs because, more often than not, the grand Corporate-level Vision does not relate to the everyday job of employees. The more one moves to lower levels in the organisation, the more the top-level Vision Statement has to be refined to include perspectives of the lower levels. A sales officer in any organisation, irrespective of the company’s Vision to be the world’s number one, would keep worrying about his own monthly targets. A recruitment executive would be more worried about planning the next recruitment requirement that might be expected within the organisation, and so on. For example, the Vision of Microsoft is ‘Empowering people through great software – any time, any place and on any device’. However, the Business Head of Microsoft’s Xbox business would obviously be worried about empowering people less through software and more through Xbox! This exemplifies how work pressures at functional levels succeed in strangulating orientation towards the overall organisational Vision.

Clearly, if Vision-Strangulation has to be effectively whipped, then there can be no single Vision that can be applied to the complete organisation. The demand of Vision Annihilation Theology is that for each identifiable level and link in the organisation, one should develop a separate Vision. So instead of attempting to force the Corporate Vision down the throats of junior stakeholders and employees, employees should instead be provided a terrific mix of top level Vision philosophy and expected best-level achievements at their own management (or functional) level. This is where VVGs at each level include link representatives from other levels who develop a relevantly different spin of what the organisation Vision is all about (quite similar to what Microsoft’s Advanced Technology Group does; or GE’s Work-Out programme at different levels). So a sales officer could be roped into the spirit of the Corporate Vision – let’s say of being the world’s number one – by ensuring he understands the Corporate Vision in principle, and is just as aware of the stupendous results expected out of him and his team in objective terms. Ditto with other levels and functions. This is the process of Sub-Vision development. However, all these sub-Visions have to be linearly aligned with each other, and most importantly with the top-level Corporate Vision.

Vision-intervention-exercises

The compelling transactions that VVGs use could range from standard workshops & meetings, to specific intervention exercises attempting to recreate and regenerate vision across the organisation. But these intervention exercises need to be formally designed, planned & implemented. The firm’s CEO (and his core team) should accept the role of being the key Vision Propagator and ensure that individual Vision Vampires & VVGs are developed at various levels. But there are issues to be handled deftly while propagating ‘Walloping Vision-Ridicule-Entrapment’. Vision has to be defined in concrete terms at the top. The top-level Vision Statement should act as a benchmark for progress. This top-level Vision could be defined using a combination of quantitative and qualitative statements. Correspondingly, Vision Vampiring needs structures and people for its successful adoption. More importantly, it is about how fervently the concept has been adopted. Some organisations feel proud that their Vision is created by their bottom level employees. And some feel proud that their Vision has been developed and approved by customers. Vision Vampiring has to ‘soar’ from top to bottom, and from inside to outside. Vision should not be created from the bottom up. Vision should not be developed by outsiders (unless the organisation is Enron where the corporation’s vision smells of totally illegal, immoral & unethical endeavours). Moreover, Vision should preferably not come out of just one person’s thought processes (Rupert Murdoch is one of the better known exceptions to this theory); instead, it should be something that is well thought out by a responsible top management. Other levels of management and employees should feel closely connected to the developed Vision statement.

Glorious Vision Statements are worthless if they’ve been made without reckoning the current & future capability and competence advancement agenda of the organisation. Stanford’s Vision of being the Harvard of the West was believably backed up by its agendas toward development of capabilities and competencies. Further, the world’s best Vision Statement cannot get the corporation anywhere unless it is backed up by sincere strategic plans and implementation controls. Initially, Apple failed & Microsoft won because of just this: Sky-high Vision; Ground-level Flop! (Later, Apple realised this; and it served the Cupertino giant well.)

Oh! And needless to say, Vision Vampiring is always the ‘continuous’ process of looking into the future. So the trick, my friend, is to regularly revisit & ReVision! Remember the dynamic nature of Vision Annihilation Theology?!