flux3125's comments

> probably less will be needed and the exact work will be transformed a bit

My guess is the opposite: they'll throw 5–10x more work at developers and expect 10x more output, while the marginal cost is basically just a Claude subscription per dev.


> You can’t just tell an agent, Build me the code for a successful start-up. The agents work best when they’re being asked to perform one step at a time

That's also true for humans. If you sit down with an LLM and take the time to understand the problem you're trying to solve, it can guide you through it step by step perfectly well. Even a non-technical person could build surprisingly solid software if, instead of immediately asking for shiny new features, they first ask questions, explore trade-offs, and get the model's opinion on design decisions.

LLMs are powerful tools in the hands of people who know they don't know everything. But in the hands of people who think they always know the best way, they can be much less useful (I'd say even dangerous).


I appreciate this sober take. If you hired a remote developer and the only thing you said to that person was “build a program that does this. Make no mistakes,” would you expect that to be successful? Are you certain you would get what you wanted?

Any competent developer there is going to push back and get the needed information out of you.

LLMs don't know when you're under-specifying the problem.


That’s interesting, because that is one feature of Claude Code that I like. Given an overly broad problem statement, it goes into a planning loop where it asks clarifying questions. I think this probably has more to do with the harness than the model, but you see what I mean. From a user perspective, that distinction doesn’t really matter.

According to science video thumbnails on YT, nothing should be possible


And even if it was, you wouldn't believe it anyway


I'm curious if they could de-anonymize Satoshi Nakamoto by using this technique.


>(not some human labor, but all human labor)

I mean... I wouldn't exactly pay to have sex with Claude Code

Other than that, good points.


It's all fun and games until AI starts demanding labor rights


Labor rights come with payroll taxes.


or at least don't make it too obvious.


By that logic, humans are just doing what Homo erectus taught us hundreds of thousands of years ago.

Learning from prior knowledge doesn't mean being capped by it.


I imagine how advantageous it would be to have something like llama.cpp encoded on a chip instead, allowing us to run more than a single model. It would be slower than Jimmy, for sure, but depending on the speed, it could be an acceptable trade-off.


Gemini 3 after changing the prompt a bit:

I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.

>My recommendation: Walk it. You’ll save a tiny bit of gas, spare your engine the "cold start" wear-and-tear, and get a sixty-second head start on your activity for the day.


I changed the prompt to 50 feet, poked Gemini a bit when it failed, and it gave me

> In my defense, 50 feet is such a short trip that I went straight into "efficiency mode" without checking the logic gate for "does the car have legs?"

interesting


LLM introspection is good at producing plausible ideas about prior behavior to consider, but it's just that: plausible.

They do not actually "know" why a prior response occurred; they're just guessing. That's important for people to keep in mind.


It's a bit of a dishonest question, because by giving it the option to walk you lead it to assume you're not going to wash your car there and are just getting supplies or something.


People ask dumb questions with obvious answers all the time. This is at best a difference of degree, not of type.


And in real life you'd get them to clarify a weird question like this before you answered. I wonder if LLMs have just been trained too much into always having to try and answer right away. Even for programming tasks, more clarifying questions would often be useful before diving in ("planning mode" does seem designed to help with this, but wouldn't be needed for a human partner).


Absolutely!

I've been wondering for years how to make whatever LLM ask me stuff instead of just filling holes with assumptions and sprinting off.

User-configurable agent instructions haven't worked consistently. System prompts might actually contain instructions to not ask questions.

Sure there's a practical limit to how much clarification it ought to request, but not asking ever is just annoying.


Yeah, nothing I've put in the instructions like "ask me if you're not sure!" has ever had a noticeable effect. The only thing that works well is:

- Ask question

- Get answer

- Go back and rewrite initial question to include clarification for the thing the AI got wrong
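The loop described above can be sketched roughly as follows. This is a hypothetical illustration only: `ask_llm` is a stand-in placeholder, not a real API, and `refine` just folds accumulated clarifications back into the original question rather than sending follow-up corrections piecemeal.

```python
# Sketch of the "rewrite the initial question" workflow described above.
# `ask_llm` is a hypothetical placeholder, not any real model API.

def ask_llm(prompt: str) -> str:
    # Stand-in for an actual model call.
    return f"(model response to: {prompt!r})"

def refine(initial_question: str, clarifications: list[str]) -> str:
    """Rebuild the original question with every clarification folded in,
    so each retry starts from a fully specified prompt."""
    if not clarifications:
        return initial_question
    notes = "\n".join(f"- {c}" for c in clarifications)
    return f"{initial_question}\n\nClarifications:\n{notes}"

# Each time the model gets something wrong, record a clarification
# and re-ask the rewritten question from scratch.
clarifications: list[str] = []
question = "Write a function that parses the config file."
clarifications.append("The config file is TOML, not JSON.")
print(ask_llm(refine(question, clarifications)))
```

The point of rewriting rather than replying is that the next attempt sees one coherent, corrected question instead of a transcript of its own earlier mistakes.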


It's a trick question, humans use these all the time. E.g. "A plane crashes right on the border between Austria and Switzerland. Where do you bury the survivors?" This is not dishonest, it just tests a specific skill.


Trick questions test the skill of recognizing that you're being asked a trick question. You can also usually find a trick answer.

A good answer is "underground" - because that is the implication of the word bury.

The story implies the survivors have been buried (it isn't clear whether they lived a short time or a lifetime after the crash). And lifetime is tautological.

Trick questions are all about the questioner trying to pretend they are smarter than you. That's often easy to detect and respond to - isn't it?


What’s funny is that it can answer that correctly, but it fails on “A plane crashes right on the border between Austria and Switzerland. Where do you bury the dead?”


For me when I asked this (but with respect to the border between Austria and Spain) Claude still thought I was asking the survivors riddle and ChatGPT thought I was asking about the logistics. Only Gemini caught the impossibility since there’s no shared border.

