> probably less will be needed and the exact work will be transformed a bit
My guess is the opposite: they'll throw 5–10x more work at developers and expect 10x more output, while the marginal cost is basically just a Claude subscription per dev.
> You can’t just tell an agent, Build me the code for a successful start-up. The agents work best when they’re being asked to perform one step at a time
That's also true for humans. If you sit down with an LLM and take the time to understand the problem you're trying to solve, it can guide you through it step by step. Even a non-technical person could build surprisingly solid software if, instead of immediately asking for new shiny features, they first asked questions, explored trade-offs, and got the model's opinion on design decisions.
LLMs are powerful tools in the hands of people who know they don't know everything. But in the hands of people who think they always know the best way, they can be much less useful (I'd say even dangerous)
I appreciate this sober take. If you hired a remote developer and the only thing you said to them was "build a program that does this, and make no mistakes," would you expect that to be successful? Are you certain you would get what you wanted?
That’s interesting, because that is one feature of Claude Code that I like. Given an overly broad problem statement, it goes into a planning loop where it asks clarifying questions. I think this probably has more to do with the harness than the model, but you see what I mean. From a user perspective that distinction doesn’t really matter.
I imagine how advantageous it would be to have something like llama.cpp encoded on a chip instead, allowing us to run more than a single model. It would be slower than Jimmy, for sure, but depending on the speed, it could be an acceptable trade-off.
I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.
>My recommendation: Walk it. You’ll save a tiny bit of gas, spare your engine the "cold start" wear-and-tear, and get a sixty-second head start on your activity for the day.
I changed the prompt to 50 feet and poked Gemini a bit when it failed, and it gave me:
> In my defense, 50 feet is such a short trip that I went straight into "efficiency mode" without checking the logic gate for "does the car have legs?"
It's a bit of a dishonest question, because by giving it the option to walk you lead it to assume you're not washing the car there and are just getting supplies or something.
And in real life you'd get them to clarify a weird question like this before you answered. I wonder if LLMs have just been trained too much into always having to try and answer right away. Even for programming tasks, more clarifying questions would often be useful before diving in ("planning mode" does seem designed to help with this, but wouldn't be needed for a human partner).
It's a trick question, humans use these all the time. E.g. "A plane crashes right on the border between Austria and Switzerland. Where do you bury the survivors?"
This is not dishonest, it just tests a specific skill.
Trick questions test the skill of recognizing that you're being asked a trick question. You can also usually find a trick answer.
A good answer is "underground" - because that is the implication of the word bury.
The story implies the survivors have been buried (it isn't clear whether they lived a short time or a lifetime after the crash, and "a lifetime" would be tautological anyway).
Trick questions are all about the questioner trying to pretend they are smarter than you. That's often easy to detect and respond to - isn't it?
What’s funny is that it can answer that correctly, but it fails on “A plane crashes right on the border between Austria and Switzerland. Where do you bury the dead?”
For me when I asked this (but with respect to the border between Austria and Spain) Claude still thought I was asking the survivors riddle and ChatGPT thought I was asking about the logistics. Only Gemini caught the impossibility since there’s no shared border.