• About
  • News & Events
  • Partners
  • Contact Us
  • 800.834.4969
In order to provide complete functionality, this web site needs your explicit consent to store browser cookies. If you don't allow cookies, you may not be able to use certain features of the web site including but not limited to: log in, buy products, see personalized content, switch between site cultures. It is recommended that you allow all cookies.
I hereby accept that the provided information can be used for marketing purposes and targeted website content.

You can find more information on our privacy page.

Siri-ously Speaking: How Voice Activation and Text-to-Speech Really Works

November 11, 2011

Siri Lives in a Data CenterWhen Apple recently unveiled the iPhone 4S, many were surprised by the lack of any external design change or hardware configuration. Yet for all its surface-level sameness, it's since become apparent that the phone's true beauty is much more than skin deep.

We're talking of course about "Siri," a mild-mannered, informative and surprisingly witty personal assistant baked straight into the operating system of each new iPhone. Press a button, ask a question (out loud) and get an answer (yup, out loud), almost instantly. Same goes for reminders, text messages and more - simply ask (or tell, nicely!) Siri to do something, sit back and listen to the sage advice of one wise lady (did we mention the computer-generated voice is female?)

But just what makes Siri tick? Software, and lots of it. Truth be told, voice activation and voice recognition software has been in the works (and in use) for some time now. Siri's just the latest in a line of technology raising the bar on how intuitive voice recognition between us (humans) and them (computers) can be.

Here's a quick, nuts and bolts breakdown of the technology at play:

  • We verbalize a command or question to our handheld device. That information is captured as audio and routed over the internet to a far-off data center.
  • Once there, the audio information is funneled (or in our case, routed through superior cable management) to stacks and stacks of servers (resting comfortably in racks, cabinets or wall-mount systems).
  • Inside these servers, complex computations begin to take effect. First, automatic speech recognition (ASR) software transcribes our speech into text. This text is then processed, tagged and parsed.
  • The information within that text is then analyzed for content and a (hopefully) acceptable answer or resolution is formulated.
  • That answer is then transformed back into natural language text, and using text-to-speech software, synthesized back into device-generated speech.

All in a matter of seconds, the entire process runs its course (sound waves > phone > data center > back to phone = you, amazed!). The glue that holds it all together? The data center. The glue that holds the data center together? Who else? Chatsworth Products, Inc. Jeff Cihocki, eContent Specialist 

Search CPI in the News
Categories
Archives