I know that I shouldn't be surprised at this point, but making it a 4S exclusive...

IanMikutel · on Oct 4, 2011

I wrote a post last night (http://news.ycombinator.com/item?id=3069745) analyzing the original 2008 Siri keynote by Tom Gruber, Siri's co-founder, CTO, and VP of Design.

The post got pretty popular on HN, and I had 5 key questions/predictions. Feel free to read all the details (and lots of quotes I transcribed straight from Gruber's video), but I'll summarize my thoughts quickly below:

1. Very few languages for Siri (where's Spanish?)

2. No API announced for developers to add tasks to Siri

3. It's still named...Siri? What happened to Assistant? I guess it is quicker to say in practice...

4. Siri's in BETA? Is this the first time Apple's released a major iPhone feature with a beta sticker?

5. No payments integration with Siri mentioned. Can it buy stuff for me, as Gruber talked about in 2008?

6. No Facebook partnership for social knowledge on Siri, or even an iPad app.

mercurio · on Oct 4, 2011

I don't think Siri is in beta. I think only the 'voice dictation for arbitrary text input' feature is in beta.

revorad · on Oct 4, 2011

So I guess you're not that fascinated any more, Ian? Maybe now you understand my cynical reactions to your original post :-P

wvenable · on Oct 4, 2011

I'm sure the jailbreakers will solve this problem.

0x12 · on Oct 4, 2011

They shouldn't have to. If there are no technical reasons why this can't work, it should.

blhack · on Oct 4, 2011

This has been standard operating procedure on iOS devices since...ever.

There's also no reason that Safari can't upload photos out of the browser...except the fact that it would allow devs to write web-based apps that use the camera, meaning less mindshare going to iOS dev. Right now (well, as of about a month ago, the last time I looked), that's impossible.

Terretta · on Oct 4, 2011

Agreed about Safari, but iCab for iOS can upload photos from the browser. I have uploaded avatars to Twitter etc. using iCab and those sites' regular upload forms.

illumin8 · on Oct 4, 2011

I believe the voice recognition accuracy of Siri is far superior to Google Voice Actions and requires the dual core CPU.

Google has taken a different approach, where your voice sample is uploaded to a Google server, processed, and downloaded back to the device. This takes less CPU power but is also far less accurate, as the voice sample must be very low quality to have a quick response time from Google's server.

Apple/Siri are taking the approach that high quality voice recognition must be done on device in order to provide the level of performance and accuracy that voice recognition requires. I think we will find that Siri actually works and doesn't have as many errors as Google's voicemail transcription.

This is the reason for requiring iPhone 4S.

drzaiusapelord · on Oct 4, 2011

Why would it be less accurate? I've worked with voice quite a bit at my last job and the codecs and bitrates for voice don't need to be this heavy-duty CD-quality stuff. Your big limitation is going to be those cheesy mics and background noise, wind, etc.

Voice compresses nicely. Turns out we humans aren't capable of making such varied sound that it can't compress. Our mouth holes are ancient technology.

In practice, I'm in love with Google's voice capabilities. It seems to understand context. It's crazy how accurate it is. I often tease my iPhone friends with it. I'm also highly skeptical that an application on a phone can outdo Google's massive libraries and server infrastructure. If anything, I'd expect the Apple voice to be worse. Regardless, I can't wait to see this stuff in action. A war for the best voice recognition would be great right now, as it's been a patent-blocked and ignored field for the most part.
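To put rough numbers on the bitrate point, here is a back-of-the-envelope sketch. The uncompressed CD and narrowband telephony figures follow from their standard sampling parameters; the 5-second clip and 200 kbps uplink are made-up assumptions for illustration, not measurements of any real service.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def upload_seconds(bitrate_kbps, clip_seconds, uplink_kbps):
    """Time to upload a clip, ignoring latency and protocol overhead."""
    return bitrate_kbps * clip_seconds / uplink_kbps


# CD audio: 44.1 kHz, 16-bit, stereo.
cd_kbps = pcm_bitrate_kbps(44100, 16, 2)
# Narrowband telephony: 8 kHz, 8-bit, mono.
phone_kbps = pcm_bitrate_kbps(8000, 8, 1)

# Hypothetical scenario: a 5-second voice command over a 200 kbps uplink.
CLIP_SECONDS = 5
UPLINK_KBPS = 200
cd_upload = upload_seconds(cd_kbps, CLIP_SECONDS, UPLINK_KBPS)
phone_upload = upload_seconds(phone_kbps, CLIP_SECONDS, UPLINK_KBPS)

print(f"CD-quality PCM: {cd_kbps} kbps, upload {cd_upload:.1f} s")
print(f"Telephony PCM:  {phone_kbps} kbps, upload {phone_upload:.1f} s")
```

Dedicated speech codecs go well below the telephony figure, so under these assumptions the upload itself is a small part of the round trip.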

brandonb · on Oct 4, 2011

You've got it backwards. Modern speech recognizers have a vocabulary of a million words and multi-gigabyte models. It's generally much more accurate to do speech recognition in the cloud, since you have more processing power and more RAM to hold large statistical models.

The rumor is that Apple is sending the audio to Nuance servers, i.e., they're doing cloud-based speech recognition.

illumin8 · on Oct 5, 2011

False, they're doing on-device speech recognition. Servers are only used if you request information from the Internet. Look at the demos and read the hands-on reports. On-device recognition makes it much more usable than Google Voice Actions.

timrichard · on Oct 5, 2011

+1 on the cloud advantage.

I tried Google Voice Actions on my Nexus One quite a while ago, but it was optimised for the US market. The accuracy for me was so bad that I didn't bother with it.

Then recently, a Google blog on RSS said the latest app had been optimised for my locale. Now, of course, it's spookily accurate.

You can't beat your algorithms in the cloud being bombarded with sample data round the clock.

rwolf · on Oct 4, 2011

    This takes less CPU power but is also far less accurate,
    as the voice sample must be very low quality to have a
    quick response time from Google's server.

Has there been a comparison of the accuracy of the two services? This claim seems unsupported.

doctoboggan · on Oct 4, 2011

Yeah, but there is a monetary reason why this shouldn't work, and that is reason enough for most companies.

rkudeshi · on Oct 4, 2011

They probably don't want to worry about backwards compatibility.

By focusing exclusively on the newest model, they can maximize the power of the feature.

jcampbell1 · on Oct 4, 2011

I think the main issue is they don't want to roll out a beta product to 100M devices all at once. The secret to quality voice recognition is a great training set. The experience will get better over time. My guess is it will be rolled out to all devices after a beta period.

mercurio · on Oct 4, 2011

It might be that the Siri functionality is tied to the A5 chip. The A5 design is a lot larger than other dual core ARM A9 designs (even after accounting for the larger GPU) and thus has lots of spare silicon to use for specialized circuits. It is possible that Apple added a custom DSP to aid speech recognition.

rev087 · on Oct 5, 2011

It seems the actual speech recognition happens on a remote server, not in the device itself.

shriphani · on Oct 4, 2011

I was just checking the hardware specs on iFixit. Looks like the iPhone 4 has 3 microphones in it. So I am guessing they are sending audio from more than 1 mic. I found no such hardware info on the teardowns of the 3GS and below. Maybe I didn't look hard enough, but there does seem to be a hardware-related reason for doing this.

EDIT: I believe more than one source is needed for fighting noise: you sample the same audio from 2 locations and can then distinguish between multiple audio sources in the input signal. This seemed to be one of the selling points of the mic array in the Kinect.
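A toy sketch of why a second mic helps, under simplifying assumptions: the voice reaches both mics identically and each mic adds its own independent noise. Real mic arrays (and whatever the iPhone or Kinect actually do) also handle time delays and directional sources; the function names and parameters here are invented for the illustration.

```python
import math
import random


def mean_square_error(signal, reference):
    """Average squared difference between a signal and the clean reference."""
    return sum((s - r) ** 2 for s, r in zip(signal, reference)) / len(reference)


def simulate_two_mics(num_samples=8000, noise_std=0.5, seed=0):
    """Simulate one voice tone picked up by two mics with independent noise."""
    rng = random.Random(seed)
    # A 200 Hz tone sampled at 8 kHz stands in for the voice.
    voice = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(num_samples)]
    mic_a = [v + rng.gauss(0, noise_std) for v in voice]
    mic_b = [v + rng.gauss(0, noise_std) for v in voice]
    # Averaging keeps the shared voice and partially cancels the unshared noise.
    combined = [(a + b) / 2 for a, b in zip(mic_a, mic_b)]
    return voice, mic_a, combined


voice, mic_a, combined = simulate_two_mics()
single_error = mean_square_error(mic_a, voice)
combined_error = mean_square_error(combined, voice)

print(f"one mic noise power:  {single_error:.3f}")
print(f"two mics noise power: {combined_error:.3f}")
```

With independent noise of equal strength, averaging two mics roughly halves the noise power, which is the simplest version of the effect.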

RexRollman · on Oct 4, 2011

We won't know this for a little while, but there might have been changes made to the software that Apple felt required the extra processing power of the A5. Of course, we won't know for sure until someone jailbreaks it and tries it on older hardware.

roc · on Oct 4, 2011

For that matter there may be custom signal processing hardware inside to support Apple's implementation.

Particularly as they did just advertise that they added a custom ISP for the camera. I'd be surprised if they wouldn't go to similar lengths for the voice assistant.

saygt · on Oct 4, 2011

It would've been great to see Siri being adopted on multiple platforms. Such a waste!