Friday, June 15, 2012

Windows Phone 7 Prototype 001: Speech Recognition on WP7


At some point in the future it will be awesome when you can just tell your computer what to do and it does it, with no typing, which will help those of us with a blistering 11 WPM hunt-and-peck technique. Siri, a mobile digital assistant using speech recognition, was voted best tech at SXSW. I don’t know about that one, although I’m sure it will get better when Apple rebuilds it and bundles it on the iPhone 5. So how would you do that on WP7? There have been some videos floating around showing Bing with some voice control, so obviously the phone has speech recognition. So what options are there?
  • System.Speech? Not included in WP7/SL
  • Nuance software like Siri? No WP7/SL version yet.
  • Invoking the SAPI dlls on the phone? No automation factory in WP7 SL.
  • Web services using System.Speech and mic on the phone? YES!
The last one was my least favorite but that works for now.
I built a quick sample app to show how to do text-to-speech and speech recognition on WP7.






In this sample there is a web service which provides access to the System.Speech APIs in .NET. Basically it’s just passing around byte arrays. On the phone it’s using the XNA audio framework to play the text-to-speech stream and to record using the microphone. The code is pretty simple and you can download it from the link at the end of this post. The only things to note are that the WCF config needs adjusting to handle larger byte uploads, and that the Microphone API is a little weird with that 1 second buffer. It would be nice if you could just do mic.start and mic.end, which would return an array of bytes, instead of managing your own stream inside the buffer ready callback.
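A minimal sketch of that start/stop wrapper idea follows. It is written in Python rather than the C# of the sample, purely to keep it short and self-contained. FakeMicrophone, Recorder and buffer_ready are made-up names for illustration and are not the XNA Microphone API; the fake device also delivers all of its buffers at once instead of roughly one per second.

```python
import io


class FakeMicrophone:
    """Hypothetical stand-in for a buffer-based microphone (NOT the XNA class)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self.buffer_ready = None  # callback invoked once per filled buffer

    def start(self):
        # A real device would raise the callback each time its buffer fills;
        # this fake simply hands over every prepared chunk immediately.
        for chunk in self._chunks:
            if self.buffer_ready is not None:
                self.buffer_ready(chunk)

    def stop(self):
        pass


class Recorder:
    """Hides the buffer-ready plumbing behind start() and stop()."""

    def __init__(self, mic):
        self._mic = mic
        self._stream = io.BytesIO()
        self._mic.buffer_ready = self._on_buffer_ready

    def _on_buffer_ready(self, chunk):
        # Append each delivered buffer to the in-memory stream.
        self._stream.write(chunk)

    def start(self):
        self._stream = io.BytesIO()  # discard audio from any earlier recording
        self._mic.start()

    def stop(self):
        self._mic.stop()
        return self._stream.getvalue()  # the whole recording as one byte array


mic = FakeMicrophone([b"\x01\x02", b"\x03\x04", b"\x05"])
recorder = Recorder(mic)
recorder.start()
audio = recorder.stop()
```

Here audio holds the three chunks joined in order, which is the single byte array you would then upload to the web service.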
Couple of downsides to this approach:
  • Recording from the phone has some static. Could be my code, or my mic is bad / not calibrated right.
  • Having to make web service calls instead of local access is not ideal (Microsoft, please add an API for the SAPI dlls). Although in the context of an app like Siri it’s not so bad, since you need to do web service lookups to get data back.
  • Speech recognition quality really depends on either a) a limited grammar set like that pizza grammar in the sample or b) training the recognizer. For the latter it would be annoying to have users train the system. Using the System.Speech stuff you’d have to have a profile for each user.
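To illustrate why a limited grammar helps, here is a rough text-level analogy in Python. It is not how System.Speech works internally: a real recognizer constrains the acoustic search, while this sketch only snaps a noisy text guess onto the nearest allowed phrase. The phrases are a hypothetical stand-in for the pizza grammar, and snap_to_grammar is a made-up helper name.

```python
import difflib
from itertools import product

# Hypothetical stand-in for the sample's pizza grammar; the real phrases may differ.
SIZES = ["small", "medium", "large"]
TOPPINGS = ["cheese", "pepperoni", "mushroom"]
GRAMMAR = [f"{size} {topping} pizza" for size, topping in product(SIZES, TOPPINGS)]


def snap_to_grammar(hypothesis, cutoff=0.6):
    """Return the closest allowed phrase, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(
        hypothesis.lower(), GRAMMAR, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


noisy = snap_to_grammar("large peperoni piza")   # a garbled in-grammar request
off_topic = snap_to_grammar("play some music")   # nothing like the allowed phrases
```

With only nine allowed phrases, even a badly garbled guess has one obvious nearest match, and anything off-topic is rejected. An open dictation vocabulary gives the recognizer no such safety net, which is where per-user training comes in.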
So until Microsoft adds some speech client APIs on the phone or Nuance releases a WP7 product, this is a decent workaround. In the future I’d like to build something similar to Siri. I shall call it Iris in homage. I’m a big fan of mobile speech apps because, frankly, it’s just not safe to Google while driving.
Since some of my designer co-workers have been posting UI sketches for WP7, I’d like to start posting some code prototypes for things I try out on the phone. That will probably last 2 weeks, but for the moment I have like 10 posts in the queue.

Sample Code

Note: You can also use the web service from Nuance, which should work as well.



1 comment:

  1. Could you upload the sample code again? or send it to me? Please
