
The problem isn't voice, it's natural language.

Natural language is a fundamentally wrong vehicle to convey information to a computer. It can be useful for some specific tasks, automated Q/A, simple interfaces to databases, stuff where I can't be properly f_ed to remember the syntax or the shortcut like IDE commands.

But the idea it can replace formal language is fundamentally and dangerously incorrect. I agree with Dijkstra's quip, we shouldn't regard formal language as a burden, but rather as a privilege.



I'd be perfectly happy with a list of Siri commands that I would have to learn to be able to do things. I don't care if I ended up sounding like:

Hey Siri

Turn lights on 50 percent

For one hour

Dim over that time

Play music.

I can learn what I need to do; JUST LET ME KNOW THE MAGIC WORDS!
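A fixed "magic words" interface like this is basically a dispatch table. Here is a minimal sketch of the idea; the phrases, handler names, and responses are all invented for illustration, not real Siri commands:

```python
# Toy dispatch table for a fixed-phrase voice interface.
# Every phrase and handler here is hypothetical.

def set_lights(level):
    # Stand-in for a real smart-home call.
    return f"lights at {level}%"

def play_music():
    return "music playing"

COMMANDS = {
    "turn lights on 50 percent": lambda: set_lights(50),
    "play music": play_music,
}

def handle(utterance):
    """Exact-match lookup: either the magic words work, or you get told so."""
    action = COMMANDS.get(utterance.lower().strip())
    return action() if action else "unknown command (see the magic list)"

handle("Play music")       # -> "music playing"
handle("dim the lights")   # -> "unknown command (see the magic list)"
```

The appeal is exactly the predictability: the set of valid utterances is finite and publishable, like a Zork verb list.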


It's like playing Zork all over again.


A lisp compiler in a voice assistant would seem like an improvement in that the user could define objects and then express the actions to be performed in the same room. But these assistants seem to drop objects between commands making them hard to program conversationally.

I guess a Lisp-like language would be ideal, and the pauses would be like parentheses.
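A rough sketch of the pauses-as-parentheses idea: assume a hypothetical speech front end that emits words plus explicit PAUSE markers, and group each pause-delimited clause into its own s-expression-style list. The token format is entirely made up:

```python
# Hypothetical: a speech front end emits words plus "PAUSE" markers;
# each PAUSE-delimited clause becomes one list, like a parenthesized form.

def clauses(tokens, pause="PAUSE"):
    """Group a flat token stream into clause lists, splitting on pauses."""
    out, cur = [], []
    for tok in tokens:
        if tok == pause:
            if cur:
                out.append(cur)
                cur = []
        else:
            cur.append(tok)
    if cur:
        out.append(cur)
    return out

stream = ["turn", "lights", "on", "PAUSE", "for", "one", "hour",
          "PAUSE", "dim", "over", "that", "time"]
clauses(stream)
# -> [['turn', 'lights', 'on'], ['for', 'one', 'hour'],
#     ['dim', 'over', 'that', 'time']]
```

The assistant would then need to keep these parsed clauses (and any objects they define) alive between commands, which is exactly what current assistants fail to do.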


But with the added complexity that sometimes the speech-to-text will just crap out completely.


Alexa, turn on lights

...I don't know how to do that

Alexa, turn lights on

...What do I turn the lights with?

Alexa, activate lights

...I don't know what you mean

...It is pitch black. You are likely to be eaten by a grue.

ALEXA TURN ON THE DAMN LIGHTS

...I don't know the word "lights"

...Oh no! You have walked into the slavering fangs of a grue!

** You have died **


Siri, turn on bathroom lights.

Downstairs or upstairs bathroom?

Downstairs.

Sorry, I didn’t understand. Downstairs or upstairs bathroom?

Downstairs bathroom.

Sorry, I didn’t understand. Downstairs or upstairs bathroom?

Cancel.

Ok. Cancelling.

Siri turn on downstairs bathroom lights.

(Turns off all lights)


For me, about once a week it's

"hey siri?"

(no response, no icon),

"hey siri?"

(no response, no icon),

"hey siri?" (louder)

(no response, no icon),

"hey siri?" (louder and slower)

(no response, no icon),

(reboot iPhone 13 Pro)

"hey siri?"

works


“Did you mean ‘bathroom LED’ or ‘bathroom’?”

Because god help you if your device names are similar to your room names…


I’ve taken to naming my lights things like Greg, The Beacons, etc.

And I added scenes so I can say “Gondor calls for aid” and the beacons will light.


Yes. And it may be worth noting that Zork is literally something like 50-year-old parser technology.


Not to take away from your point (I'd like the magic list too), but to some degree this can be worked around using Shortcuts. If you use inputs, Siri will prompt for them, which is a bit slow, but you could even use a Dictate Text action and parse the result yourself if desired.


I highly doubt there is "a" magic list. I'll bet the magic list changes constantly.


I noticed a drop in usability about the time they went with ML.


Same with the predictive keyboard, it feels more random now.


I don't know that you can do exactly all these things, but isn't this the use case for custom routines in the Amazon ecosystem?

You create the prompt and add one or more actions to take.


On the other side, humans have been fine using natural language to delegate commands to each other.

So maybe it's just that the subfield of natural language understanding is still too early to be really useful. Speech recognition itself has gotten really good but then understanding the context, the intent, etc, all that is natural language understanding, and that is often the problem.


> have been fine

Citation needed, there's a lot of disagreements and misunderstandings (some have cost lives) that could've been avoided if we didn't have 10 different ways to say the same vague thing that can be interpreted in 20 ways. You think the military uses a phonetic alphabet and specifically structured communications for fun? Or the way planes talk to ATC for example. Where precision and unambiguity is crucial, natural language always gets ditched for something more formal.


This is actually an interesting point. In the Army, we used terms that limited ambiguity thereby increasing efficiency. Even if one eliminates the complexity of language, there's still a specification problem.

I only use voice assistants to set alarms. I cannot imagine voice as a primary input. Then again, many have opted out of owning desktops and laptops in favor of mobile phones. That also seems terribly inefficient.


>Then again, many have opted out of owning desktops and laptops in favor of mobile phones. That also seems terribly inefficient

A lot of people don't need computers in the general purpose sense. I admit my mind boggles a bit when co-workers tell me their kids don't want a computer to do their school papers because their phone is fine. But, then, I'm used to keyboards and what we think of as a "computer" and have been using one for decades--and grab one when I can for any remotely complex or input-heavy task.


> A lot of people don't need computers in the general purpose sense. I admit my mind boggles a bit when co-workers tell me their kids don't want a computer to do their school papers because their phone is fine.

I grew up in the 1980s, when handwritten papers were still the norm. I do see the advantages of using a word-processor for writing papers, but don't see why it would be a necessity (at least, until University).


I think the implication is that the kids use a word processor on their phone.


It sounds ridiculous, but I'll admit that when you've got something like Samsung DeX, which lets you dock the phone for USB and HDMI out and gives you close to a full desktop OS, I'd imagine it really is enough for the casual user.


I certainly know colleagues in the industry who travel with just a tablet and external keyboard. No, they're not running IDEs etc., but they find it OK for emails, editing docs, taking notes, etc. Personally I'll spend the extra few pounds to also carry along a laptop. But I can imagine not needing/wanting a dedicated laptop when I travel at some point.


Is a tablet and keyboard really much lighter than a laptop?

https://www.theverge.com/2020/4/20/21227741/apple-ipad-pro-m...

Suggests a keyboard and large tablet is heavier than a laptop


I'm usually carrying a tablet anyway though for entertainment/reading purposes. So it's usually a choice of tablet + laptop vs. tablet + keyboard. (I admittedly don't really have a weight optimized travel laptop these days either.)

I actually do wish there were good Mac or Chromebook choices for a travel 11" or so laptop but the market seems to have settled on a thin 13" as the floor and, admittedly, the weight/size difference isn't huge.


While I am mostly a Mac person, for travel I often prefer a tiny and cheap Lenovo Chromebook that does everything (a bit poorly): Linux containers for lightweight programming and writing, and consuming media like books, audiobooks, and streaming.

In response to a grandparent comment about weight for tablets: I prefer Apple’s folio old style of cases/keyboards because of weight. I have one for both my small and large iPad Pros. Whenever I travel, I usually just take one of my iPads if I don’t need a dev environment [1].

[1] but with GitHub Codespaces and Google Colab, development on an iPad is sort of OK.


I still don't see the point of tablets. It's just a smartphone with a larger screen, and practically all people already carry phones.

Might as well go for the laptop at that point given that it can actually do far more imo, unless you ditch the phone and go for one of those half phone half tablets I guess.


I'd rather watch movies, read, play certain games, etc. on my tablet than on a phone. (Obviously there are also specific use cases like digital art.) That said, I mostly use my tablet when traveling and it's a distant third in necessity compared to either a laptop or a phone--and only somewhat more useful than a smartwatch.


Watching movies on a tablet is terrible, though. All methods for propping the device up so you can watch the movie are inferior to the way a laptop screen props itself up via hinges and a base.


On a plane I'd rather use the tablet in my lap than have to put the tray table down. And in a hotel room I'm watching on the couch if there is one. (I do also have an attachment for my tablet that will let you prop it up on a table but I mostly don't use it because it adds weight.)

For reading, I'm probably bringing my Kindle along if I don't bring my tablet.


I bought a surface for that reason. I like the portability, and it is just a normal PC with a pretty bad keyboard.


If you do not have one, buy a dock! I have an SP6 and an SP4, and having the dock makes it quite the device. Speakers, multiple external monitors, keyboard, mouse: a full desktop setup. I can grab it and either stick a keyboard cover on or just use it as a reading device on the couch.

Back to work? Set it on the table, plug in one cable, and it's back to being a desktop and charging up again.

Makes the whole thing make far more sense.


How old are you? Because larger screens become really nice as your eyes go bad. And I don't need the full size of a laptop for things I'd want to do on a tablet.


The obsession with being lighter definitely has diminishing returns. At some point another few ounces doesn't make any difference in a real, practical sense. I think people have just started to associate "lightness" == "better" despite there being no actual benefit past a certain threshold.


Right, at some point. But at the current point my tablet is too heavy to hold in hand for more than 20 seconds or so. The phone is OK. The tablet is not (for me). I only use the tablet by placing it on a table or a stand, and then actually using a laptop is much better than a tablet.

The killer tech will be a tablet that is as light as a phone.


Thanks for that. A lot of energy is currently sunk because of natural language, and I'd argue the gains from employing software (instead of human processes) for various tasks are in part due to scaling up the results of many confusing natural-language discussions about what a specific process actually comprises.


This is part of the reason Google search sucks more and more.

Around when Android appeared, and the first voice searches began, Google suddenly started to alias everything.

Search for 'Andy', 'Andrew' appears. Search for 'there', and 'they're' appears.

This has been taken further; now silly aliases such as 'debian' ↔ 'ubuntu' exist, and since Google happily drops words from your search to find a match, precision becomes impossible.

But, that's the only way to make voice search remotely work, so...


I don't think this is to support voice search: Google generally knows whether a query was initiated by voice or typing. Instead, I think it's because most users find what they're looking for faster with it.

If you have terms you don't want interpreted broadly you can put them in quotes.


Google "helpfully" ignores the quotes sometimes too. They're not the hard and fast rule they used to be.

I preached the Gospel of Google when the competition was composed of web rings and Altavista, but Google in its infinite wisdom has abandoned the advanced user with changes of this nature.


Pretty sure quote support has improved recently.

https://blog.google/products/search/how-were-improving-searc...


Considering the article lies, and tries to claim quotes always are respected, I wouldn't put much faith in it.


So what is the gospel du jour, or are we forsaken in these benighted times?


Most people are not precise enough in their terminology.


I often find the voice assistant useful for operating the phone itself, such as opening a given setting, say, making the display brighter. Trying to navigate the settings pages is very error-prone. There seems to be no universal standard as to where each setting should be found.


The real problem is people keep reorganizing where the settings are found.


There is a widely accepted and straightforward view that humans have ideas, which are expressed in languages, and that languages being ambiguous is problematic: this I'm starting to have doubts about.

Maybe we don't have clear intentions in the first place. Maybe languages are not just ambiguous, but are only meant to narrow the realm of valid interpretations down to a desired precision, rather than intended to form logically fully constrained statements. Maybe this is why intelligent entities are needed to "correctly" interpret natural language statements, because the act of interpretation is itself decision making and action.

Just my thoughts, but I do think there is more to be said than "natural languages are ambiguous".


> On the other side, humans have been fine using natural language to delegate commands to each other.

Using language to instruct humans goes wrong all the time. Just a short while ago on British Bakeoff I saw 2 of the contestants make white chocolate feathering on their biscuits by making actual feathers out of white chocolate and placing them on their biscuits. And I'm sure that will confuse quite a few people reading this too. It certainly confuses image searches. Language is a fuzzy interface. Compare to interface like clicking on a button that does the thing I want done.


How would you (easily) describe the concept of chocolate feathering to a computer without using natural language? (e.g. if you wanted the computer to generate an image, or search for an image of / recipe with chocolate feathering).


> On the other side, humans have been fine using natural language to delegate commands to each other.

And that's why all of aviation has moved to a tight phraseology, such that delegated commands are universally understood and their meaning is set in stone.

Natural language has cost many lives.


> humans have been fine using natural language to delegate commands to each other.

Not always resulting in unambiguous instructions:

"Lord Raglan wishes the cavalry to advance rapidly to the front, follow the enemy, and try to prevent the enemy carrying away the guns." ~Lord Raglan, Balaclava

"I wish him to take Cemetery Hill if practicable." ~Robert E. Lee, Gettysburg


> On the other side, humans have been fine using natural language to delegate commands to each other.

On the other hand, legalese exists and is the lingua franca of telling people what to do, and math exists.


> On the other side, humans have been fine using natural language to delegate commands to each other.

I think this is really a characterization. Mostly human communication is full of errors and problems.

What is true is that when it is important enough, humans have come up with ways that minimize communication errors and frameworks to deal with ambiguity - mostly these involve training and effort though, it really doesn't come naturally.


"really a problematic characterization"...


> humans have been fine using natural language to delegate commands to each other.

Every time we try to minimize errors, we formalize a language. I don't even think people use natural language to issue commands often. Commanding people is often considered rude.


I agree with this. We have evidence that natural language works well enough to run most of the world. AI will eventually get there.


The problem is that it's not actually a conversation. To significantly improve it, you'd want to:

- identify users by voice

- ask them clarifying questions

- remember the answers on a per-user basis

- understand "no, that was the wrong answer"

If you're going to provide a formal interface to the computer, you also have to provide teaching in that formal interface, which is far more of a burden to the user than the cost of the device. And we've completely moved away from that model (not necessarily a good thing, but that's what the market has chosen).


Calling it a burden is an assumption that ignores and belittles the end user. Sure, there are people who won't want to train their personal ai.

But I imagine there are significantly more who would appreciate clarifying requests by a teachable assistant capable of interacting with the entire digital world on their behalf, efficiently and intelligently.


I think you're right. There are glimpses of this in the voice interfaces right now. For example, Alexa will distinguish between voices and preferentially take actions for me, saying "Play Music" plays Spotify, and for my kids, it plays Amazon music.


An example backing this is voice assistants that DO work, e.g. Talon voice. But these require defining a language, and then they are very accurate and powerful.

I don't see why a voice assistant for the masses couldn't "train its own users", for example by suggesting the language it does expect. But even then, most of the time people are talking in noisy environments, or talk too fast, or don't have an understanding of how the machine might work. Regardless, who cares. They ruin the audio environment of a home. They're good for setting timers while you're cooking, that's about it.
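The "train its own users" idea can be sketched with fuzzy matching over a fixed phrase list: if the utterance isn't a known command, suggest the closest phrase the assistant does understand. The command list here is invented for illustration:

```python
import difflib

# Sketch of a fallback that teaches the user the expected phrasing:
# unknown utterances get a "did you mean" suggestion from the known list.
KNOWN = ["turn on lights", "turn off lights", "set timer", "play music"]

def respond(utterance):
    if utterance in KNOWN:
        return f"ok: {utterance}"
    # Suggest the single closest known phrase, if any is similar enough.
    close = difflib.get_close_matches(utterance, KNOWN, n=1, cutoff=0.5)
    if close:
        return f'did you mean "{close[0]}"?'
    return "unknown command"

respond("turn lights on")   # close to "turn on lights", so it suggests it
```

This nudges users toward the formal phrasing over time instead of silently failing, which is roughly what systems like Talon achieve by making the grammar explicit up front.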


Car voice assistants do this, but they're still clunky and it takes them forever to list their options. Voice interfaces just like CLI suffer from extremely bad discoverability and presentation compared to GUIs and thus will always be limited to specialty applications. CLIs at least have a league of try-hards and hobby linux users to keep them alive.


They're also fantastic at playing soothing music while your hands are busy holding a crying baby.


Only thing I use Siri for as well.


Right - natural language works for people because we have minds that are communicating. A virtual assistant has a list of things it can do, and uses language as an interface to them. So the language just becomes obfuscation instead of allowing clarification.

I've said before, I would prefer a voice assistant that optimized for traversing its menu system, in response to unambiguous noises (could be high and low pitch hums or whatever) that lets me bypass the guessing game and use the menu it's hiding


Like this: https://www.youtube.com/watch?v=8SkdfdXWYaI ? Here you traverse the AST, but the idea is similar, I think.


The problem is that it doesn't make money.

Otherwise, it works great :-) We love the hands-off usage mode because we cook a lot, so adding things to shopping lists or looking stuff up doesn't require cleaning hands in the middle of prep. Also the speakers are pretty darn good for the size and work well for music.

Doing complicated things is right out though. But the simple stuff works fine.


I'm just waiting for someone to finally release a voice assistant built around an actual language model, like GPT-3 or LaMDA.

It would be more error prone in a lot of ways, which is probably why nobody's done it yet, but it would also be a _lot_ more powerful, and fulfill the vision of conversational AI in a way the current rules-based assistants do not.

I think if powerful language models were easily accessible to normal people (in an inexpensive and completely unrestricted fashion, like with Stable Diffusion) we'd already see this happening in the open source world. Companies are going to be a lot more hesitant to try it though until they have a way to 100% prevent the models from making mistakes that could reflect poorly on the company, which is going to take _way_ longer to achieve.
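The plumbing for such an assistant might look like the sketch below: prompt the model to emit a structured intent, then validate it before acting, refusing anything malformed or hallucinated. The model call is stubbed out here; a real GPT-3- or LaMDA-class model would sit behind `model()`, and the prompt and action names are invented:

```python
import json

# Sketch of wiring a language model into a voice assistant.
PROMPT = (
    "Convert the user's request into JSON with keys "
    '"action" and "target". Request: {utterance}'
)

def model(prompt):
    # Stub standing in for an actual LLM call.
    return '{"action": "turn_on", "target": "downstairs bathroom lights"}'

KNOWN_ACTIONS = {"turn_on", "turn_off", "dim"}

def interpret(utterance):
    """Ask the model for a structured intent; validate before acting."""
    raw = model(PROMPT.format(utterance=utterance))
    try:
        intent = json.loads(raw)
    except json.JSONDecodeError:
        return None                      # model produced non-JSON: refuse
    if intent.get("action") not in KNOWN_ACTIONS:
        return None                      # unrecognized action: refuse
    return intent

interpret("Siri, turn on downstairs bathroom lights")
# -> {'action': 'turn_on', 'target': 'downstairs bathroom lights'}
```

The validation layer is the part companies are nervous about: the model handles the open-ended language, but only a whitelisted, well-formed intent is ever executed.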


Are you trying to say, Alexa should be funding the synthetic language nerds over at Lojban[0] or the Universal Networking Language[1]???

That would be a fun universe.

[0] https://mw.lojban.org/index.php?title=Lojban&setlang=en-US

[1] https://en.wikipedia.org/wiki/Universal_Networking_Language


Natural language conveys information to other people just fine. So the problem isn't that "Natural language is a fundamentally wrong vehicle to convey information to a computer". The problem is getting the computer to understand natural language to the same level as a human.


The problem is both


> we shouldn't regard formal language as a burden, but rather as a privilege

What the hell? Is riding public transport or riding a bike either a burden or a privilege? Is driving a car?

I am trying to control shit in my home, it should be neither.


Dijkstra's full essay[1] is a bit more illuminating, but essentially it's about how, for example, developing a system of symbols and formal language around mathematics has allowed "school children [to] learn to do what in earlier days only genius could achieve".

1: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD06xx/E...


I think his argument even generalizes to literacy in general. Remember that reading and writing skills don't develop naturally (as opposed to spoken language). They require a large educational investment, and used to be reserved for the wealthy and the privileged.



