I know that I shouldn't be surprised at this point, but making it a 4S exclusive is one of the more blatant bits of planned obsolescence I've seen. I was running Siri just fine on a 2nd-generation iPod touch over two years ago, and looking at the demo, it doesn't seem to be much different, just able to hook into Apple's internal APIs.
The post got pretty popular on HN, and I had 6 key questions/predictions. Feel free to read all the details (and lots of quotes I transcribed straight from Gruber's video), but I'll summarize my thoughts quickly below:
1. Very few languages for Siri (where's Spanish?)
2. No API announced for developers to add tasks to Siri
3. It's still named...Siri? What happened to Assistant? I guess it is quicker to say in practice...
4. Siri's in BETA? Is this the first time Apple's released a major iPhone feature with a beta sticker?
5. No payments integration with Siri mentioned. Can it buy stuff for me, as Gruber talked about in 2008?
6. No Facebook partnership for social knowledge on Siri, or even an iPad app.
This has been standard operating procedure on iOS devices since...ever.
There's also no reason that Safari can't upload photos out of the browser...except the fact that it would allow devs to write web-based apps that use the camera, meaning less mindshare going to iOS dev. Right now (well, as of about a month ago, the last time I looked), that's impossible.
Agreed about Safari, but iCab for iOS can upload photos from the browser. I have uploaded avatars to Twitter etc. using iCab and those sites' regular upload forms.
I believe the voice recognition accuracy of Siri is far superior to Google Voice Actions and requires the dual core CPU.
Google has taken a different approach, where your voice sample is uploaded to a Google server, processed, and downloaded back to the device. This takes less CPU power but is also far less accurate, as the voice sample must be very low quality to have a quick response time from Google's server.
Apple/Siri are taking the approach that high quality voice recognition must be done on device in order to provide the level of performance and accuracy that voice recognition requires. I think we will find that Siri actually works and doesn't have as many errors as Google's voicemail transcription.
Why would it be less accurate? I've worked with voice quite a bit at my last job and the codecs and bitrates for voice don't need to be this heavy-duty CD-quality stuff. Your big limitation is going to be those cheesy mics and background noise, wind, etc.
Voice compresses nicely. Turns out we humans aren't capable of making such varied sound that it can't compress. Our mouth holes are ancient technology.
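To put rough numbers on the bitrate point, here is a back-of-the-envelope sketch in Python. The 5-second utterance and the 500 kbit/s uplink are assumptions picked for illustration, not figures from this thread, and nothing here reflects what Google or Apple actually send over the wire. The PCM rates are plain arithmetic; 12.2 kbit/s is the top AMR-NB rate, used only as a representative speech codec.

```python
# Rough upload-size arithmetic for one short spoken command.
# ASSUMPTIONS (not from the thread): utterance length and uplink speed.
UTTERANCE_SECONDS = 5
UPLINK_KBPS = 500


def pcm_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def upload_stats(kbps):
    """Return (kilobytes to send, seconds on the assumed uplink)."""
    kilobits = kbps * UTTERANCE_SECONDS
    return kilobits / 8, kilobits / UPLINK_KBPS


FORMATS = {
    "CD-quality PCM (44.1 kHz, 16-bit, stereo)": pcm_kbps(44_100, 16, 2),
    "Wideband PCM (16 kHz, 16-bit, mono)": pcm_kbps(16_000, 16, 1),
    "Telephone PCM (8 kHz, 8-bit, mono)": pcm_kbps(8_000, 8, 1),
    "Speech codec at 12.2 kbit/s": 12.2,
}

if __name__ == "__main__":
    for name, kbps in FORMATS.items():
        kilobytes, seconds = upload_stats(kbps)
        print(f"{name}: {kbps:.1f} kbit/s, {kilobytes:.1f} kB, {seconds:.2f} s")
```

Under these assumptions the gap between CD-quality audio and a speech codec is roughly two orders of magnitude, which is why the upload itself need not force a low-accuracy sample.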
In practice, I'm in love with Google's voice capabilities. It seems to understand context. It's crazy how accurate it is. I often tease my iPhone friends with it. I'm also highly skeptical that an application on a phone can outdo Google's massive libraries and server infrastructure. If anything, I'd expect the Apple voice to be worse. Regardless, I can't wait to see this stuff in action. A war for the best voice recognition would be great right now, as it's been a patent-blocked and ignored field for the most part.
You've got it backwards. Modern speech recognizers have a vocabulary of a million words and multi-gigabyte models. It's generally much more accurate to do speech recognition in the cloud, since you have more processing power and more RAM to hold large statistical models.
The rumor is that Apple is sending the audio to Nuance servers, i.e., they're doing cloud-based speech recognition.
False, they're doing on-device speech recognition. Servers are only used if you request information from the Internet. Look at the demos and read the hands-on reports. On-device recognition makes it much more usable than Google Voice Actions.
I tried Google Voice Actions on my Nexus One quite a while ago, but it was optimised for the US market. The accuracy for me was so bad that I didn't bother with it.
Then recently, a Google blog on RSS said the latest app had been optimised for my locale. Now, of course, it's spookily accurate.
You can't beat your algorithms in the cloud being bombarded with sample data round the clock.
> This takes less CPU power but is also far less accurate, as the voice sample must be very low quality to have a quick response time from Google's server.
Has there been a comparison of the accuracy of the two services? This claim seems unsupported.
I think the main issue is they don't want to roll out a beta product to 100M devices all at once. The secret to quality voice recognition is a great training set. The experience will get better over time. My guess is it will be rolled out to all devices after a beta period.
It might be that the Siri functionality is tied to the A5 chip. The A5 design is a lot larger than other dual core ARM A9 designs (even after accounting for the larger GPU) and thus has lots of spare silicon to use for specialized circuits. It is possible that Apple added a custom DSP to aid speech recognition.
I was just checking the hardware specs on iFixit. Looks like the iPhone 4 has 3 microphones in it, so I am guessing they are sending audio from more than one mic. I found no such hardware info on the teardown of the 3GS and below. Maybe I didn't look hard enough, but there does seem to be a hardware-related reason for doing this.
EDIT: I believe more than one source is needed for fighting noise: you sample the same audio from two locations, which lets you distinguish between multiple audio sources in the input signal. This seemed to be one of the selling points of the mic array in the Kinect.
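For anyone curious why a second mic helps, here is a minimal delay-and-sum sketch in Python. It is only an illustration of the general idea, not a description of what Apple or the Kinect actually do. It assumes the voice reaches the second mic a known number of samples later and that the noise at each mic is independent; the function names, the sine-wave "voice", and all parameter values are made up for the example.

```python
import math
import random


def simulate_two_mics(n=20_000, delay=3, noise_std=1.0, seed=0):
    """Return (clean, mic1, mic2) for a toy two-microphone recording.

    mic2 hears the same voice `delay` samples later than mic1, and each
    mic adds its own independent Gaussian noise (an assumption).
    """
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * 0.01 * t) for t in range(n)]
    mic1 = [s + rng.gauss(0, noise_std) for s in clean]
    delayed = [0.0] * delay + clean
    mic2 = [s + rng.gauss(0, noise_std) for s in delayed]
    return clean, mic1, mic2


def delay_and_sum(mic1, mic2, delay):
    """Shift mic2 back by `delay` samples and average it with mic1."""
    return [0.5 * (mic1[t] + mic2[t + delay]) for t in range(len(mic1))]


def noise_power(estimate, clean):
    """Mean squared error between an estimate and the clean voice."""
    return sum((e - c) ** 2 for e, c in zip(estimate, clean)) / len(clean)


if __name__ == "__main__":
    clean, mic1, mic2 = simulate_two_mics()
    combined = delay_and_sum(mic1, mic2, delay=3)
    print("single mic noise power:", noise_power(mic1, clean))
    print("two mics combined:     ", noise_power(combined, clean))
```

With the voice lined up and the noise uncorrelated, averaging keeps the voice intact while roughly halving the noise power. Sound arriving from a different direction has a different delay, so it does not line up and gets attenuated, which is the "distinguish between sources" part.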
We won't know this for a little while, but there might have been changes made to the software that Apple felt required the extra processing power of the A5. Of course, we won't know for sure until someone jailbreaks and tries it on older hardware.
For that matter there may be custom signal processing hardware inside to support Apple's implementation.
Particularly as they did just advertise that they added a custom ISP for the camera. I'd be surprised if they wouldn't go to similar lengths for the voice assistant.