After a while, many people realize this often produces worse results by injecting additional noise into the context, like the overhead of invoking the gh CLI and parsing JSON comments, or, worse, the MCP.
But they get the dopamine loop of keeping the loop alive, flashing colors, high score/token use, and plausible-looking outputs, so it's easy to deceive oneself into thinking something remarkable was discovered.
SF's de facto stance (in practice, due to the dynamic between law enforcement agencies) seems to be that an individual generally has a right to relieve themselves even on a sidewalk (even pretextually), and being visible does not in itself rise to the level of a crime like indecent exposure without additional factors (such as the stereotypical “trench coat flashing” without pretext).
That's the way SFPD worded it to me after repeatedly failing to act on my reports (such as a woman urinating in front of my child on a sidewalk near an SF Muni bus stop). Nudity is legal. People have a right to bodily functions. Another factor is response time: for this priority, the response time can be over 12 hours (if they respond at all), and obviously they're not gonna skip trace someone for urinating on a sidewalk after they've left the scene.
Most event organizers, like the Bay to Breakers race or Outside Lands, provide facilities to mitigate the burden on the community. The city provides its own portable facilities in hotspots. Tesla could easily do the same, but it is notorious for ignoring regulations, or it just doesn't care.
> That's the way SFPD worded it to me after repeatedly failing to act on my reports (such as a woman urinating in front of my child on a sidewalk near an SF Muni bus stop). Nudity is legal.
Section 154 generally made nudity on public sidewalks or at bus stops illegal since 2012. Either your story is old or the cops are misinformed.
Bad code has real-world consequences. It's not limited to having to rewrite it. The cost might also include sanctions, lost users, attrition, and other negative consequences you don't just measure in dev hours.
Right, but that cost is also incurred by human-written code that happens to have bugs.
In theory, experienced humans introduce fewer bugs. That sounds reasonable and believable, but anyone who's ever been paid to write software knows that finding reliable humans is not an easy task unless you're at a large, established company.
Well, if you keep in mind that "professionals" means "people paid to write code" then LLMs have been generating code at the same quality OR BETTER for about a year now. Most code sucks.
If you compare it to beautiful code written by true experts, then obviously not, but that kind of code isn't what makes the world go 'round.
We should qualify that kind of statement, as it's valuable to define just what percentile of “professional developers” the quality falls into. It will likely never replace p90 developers, for example, but it's better than developers somewhere between there and p10. (Arbitrary numbers, just for illustration.)
Can you quantify the quality of a p90 or p10 developer?
I would frame it differently. There are developers successfully shipping product X. Those developers are, on average, as skilled as necessary to work on project X; otherwise they would have moved on, or the project would have failed.
Can LLMs produce the same level of quality as project X developers? The only projects I know of where this is true are toy and hobby projects.
> Can you quantify the quality of a p90 or p10 developer?
Of course not; you have switched “quality” in this statement to modify the developer instead of their work. Regarding the work: each project, as you agreed in your reply, has an average quality for its code. Some developers bring that down on the whole; others bring it up. An LLM would have a place somewhere on that spectrum.
In a one-shot scenario, I agree. But LLMs make iteration much faster. So the comparison is not really between an AI and an experienced dev coding by hand, it's between the dev iterating with an LLM and the dev iterating by hand. And the former can produce high-quality code much faster than the latter.
The question is, what happens when you have a middling dev iterating with an LLM? And in that case, the drop in quality is probably non-linear---it can get pretty bad, pretty fast.
There was a recent study posted here that showed AI introduces regressions at an alarming rate, all but one above 50%, which indicates they spend a lot of time fixing their own mistakes. You've probably seen them doing this kind of thing, making one change that breaks another, going and adjusting that thing, not realizing that's making things worse.
The study is likely "SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration". Regression rate plot is figure 6.
Read the study to understand what it is measuring and how it was measured. As I understand it, the parent's summary is fine, but you want to understand the study yourself before repeating it to others.
Bentley Software is proof that you can ship products with massive, embarrassing defects and never lose a customer. I can’t explain enterprise software procurement, but I can guarantee you product quality is not part of that equation.
It's a large leap from “we made a config-driven diagram tool and trained an LLM on that config” to “all apps will disappear.” If you're making such grand predictions, please be more precise than “AI,” which is a term we can't define.
Yeah an app doesn't "disappear" because you put an AI interface in front of it and then use a bunch of old school programming to parse LLM output and feed that into your old app. 99% of the work is still building the old app.
While nothing fundamentally changes, I have found an increased need for tests and taxonomies for them, because the LLM can “hack” the tests. So: more robust tests, with more ways to organize and run them. For example, instead of 200 tests, maybe I have 1,200, along with some lightweight tools to run tests in different parts of the test taxonomy.
A more concrete example: maybe you have tests that show you put a highlight on the active item, and tests that show you don't put the highlight on inactive items. But with an LLM, you might also want tests that wait a while and verify the highlight is not flickering on and off over time (something so absurd you wouldn't even have tested for it before AI).
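A flicker check like that can be sketched as a test that samples the UI state repeatedly over a window instead of asserting it once (all names here, including ui_state, is_highlighted, and assert_highlight_stable, are hypothetical stand-ins for your own test harness, not any real framework's API):

```python
import time

# Hypothetical stand-in for querying the rendered UI.
ui_state = {"highlighted": {"row-3"}}

def is_highlighted(item):
    # In a real test this would query the live UI, not a dict.
    return item in ui_state["highlighted"]

def assert_highlight_stable(item, duration_s=0.5, interval_s=0.05):
    """Fail if the item's highlight state toggles at any point in the window."""
    expected = is_highlighted(item)
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        assert is_highlighted(item) == expected, f"highlight on {item!r} flickered"
        time.sleep(interval_s)

assert_highlight_stable("row-3")  # active item stays highlighted
assert_highlight_stable("row-1")  # inactive item stays un-highlighted
```

The point is the sampling loop: a single snapshot assertion passes even if the highlight is blinking, so the test has to hold the invariant over time.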
The value of these tests is in catching areas of the code that are drifting towards nonsense because humans aren't reviewing as thoroughly. I don't think you can realistically have 100% data coverage, prevent every single bug, and never review the code. It's just that I've found slightly more tests are warranted if you do want to step back.
It does claim the US went to great lengths to dismiss the victims for a decade, while being in possession of the device. That raises the question of what incentives the US would have to deny its existence. To me, that was the story.
It does claim the US went to great lengths to dismiss the victims for a decade. In 2024 it obtained a single device, started testing it on animals and achieved similar effects as experienced by the victims. The victims were then invited to the White House.
To me the question is actually, what changed to make them release the story now? Biden’s been out of office for a while now… it wasn’t anything his admin did. They could’ve continued gagging the victims, claiming it’s psychosomatic, and most of us would keep on believing that, because Occam’s razor.
Lots of similar reports came out during the Maduro raid. Same symptoms. Seems we demonstrated the capabilities we were hiding. OSINT experts already put the pieces together a month ago. So did our adversaries. Cat’s out of the bag, so no sense continuing to gaslight our wounded veterans.
We probably put this fucking thing in a plane instead of a backpack. Everything’s bigger in the USA, of course.
It's like cracking the Enigma during WWII. If you let the enemy know you've cracked it, and do the obvious thing and save the lives immediately in front of you, in the long run, more people are going to die.
So pretending that there are just some crazy people working in Cuba for as long as they can is better than "holy shit, Russia has an invisible weapon that turns people crazy".
Earlier this year a recruiter contacted me about a staff role there. Within minutes of the call they asked if I’d consider senior instead. I decided not to proceed.
This has been widely known in bodybuilding and powerlifting circles: people abusing performance-enhancing drugs eat things like oats to mitigate the drugs' harmful effects on their cholesterol, and they regularly do blood work to monitor it and confirm it's working.