
It's really difficult for me to understand the level of cynicism in the HN comments on this topic. The amount of goalpost-moving and redefinition is absolutely absurd. I really get the impression that the majority of the HN comments are just sour-grapes whining, with very little value added to the discussion.

I'd like to see someone disagree with the following:

Building a C compiler targeting three architectures is hard. Building a C compiler which can correctly compile (maybe not link) the modern Linux kernel is damn hard. Building a C compiler which can correctly compile SQLite and pass its test suite at any speed is damn hard.

As for the specific issues with the concrete project as presented: this was the equivalent of a "weekend project", and it's amazing.

So what if GCC is still needed for the 16-bit stuff? So what if a human was required to steer Claude a bit? So what if the optimizing pass practically doesn't exist?

Most companies are not software companies; for them, software is a line item, an expense, an unavoidable cost. The code they write (not software engineering or architecture, but programming) tends toward glue between existing libraries to accomplish business goals, and in comparison with a correct modern C compiler it is far less performance-critical, complex, and broad. No one is seriously saying that you have to use an LLM to build your high-performance math library, or that you have to use an LLM to build anything, much in the same way that no one is seriously saying you have to rewrite the world in Rust, or TypeScript, or React, or whatever is bothering you at the moment.

I'm reminded of a classic Slashdot comment about attempting to solve a non-technical problem with technology, which is doomed to fail. It really seems that the complaints here aren't about the LLMs themselves, or the agents, but about what people and organizations do with them, which is a complaint about people, not the technology.



Someone will still have to know how to steer the LLM to fix/update/maintain the bespoke software they decided to use, so there's still a large cost there.


> This was the equivalent of a "weekend project", and it's amazing

I mean, $20k in tokens, plus the supervision by the author to keep things running, plus the number of people that got involved according to the article... doesn't look like "a weekend project".

> Building a C compiler which can correctly compile (maybe not link) the modern linux kernel is damn hard.

Is it correctly compiling it? Several people have pointed out that the compiler will not emit errors for clearly invalid code. What code is it actually generating?
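As an illustration of the point above (a hypothetical snippet, not one of the actual cases people reported), a conforming C compiler must diagnose something like a use of an undeclared identifier rather than silently generate code for it:

```shell
# Hypothetical sanity check: any conforming C compiler must reject a
# reference to an undeclared identifier with a hard error.
cat > bad.c <<'EOF'
int f(void) { return oops_undeclared; }  /* undeclared identifier */
EOF
if gcc -c bad.c 2>/dev/null; then
  echo "accepted (compiler bug)"
else
  echo "rejected"
fi
```

With GCC or Clang this prints "rejected"; the complaint in this thread is that the Claude-built compiler accepts such clearly invalid input.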

> Building a C compiler which can correctly compile sqlite and pass the test suite at any speed is damn hard.

It's even harder to claim that a C compiler correctly compiles SQLite and passes the test suite when the resulting SQLite binary fails to execute certain queries (see https://github.com/anthropics/claudes-c-compiler/issues/74).

> which, in comparison with a correct modern C compiler, is far less performance critical, complex, broad, etc.

That code might be less complex for us, but more complex for an LLM if it has to deal with lots of domain-specific context and lacks a test suite that has been developed over 40 years.

Also, if the end result of the LLM has the same problem Anthropic concedes here, that the project is so fragile that bug fixes or improvements are really hard or almost impossible, that still matters.

> it really seems that the complaints here aren't about the LLMs themselves, or the agents, but about what people/organizations do with them, which is then a complaint about people, but not the technology

It's a discussion about what LLMs can actually do and how people represent those achievements. We're pointing out that LLMs, without human supervision, generate bad code: code that's hard to change, with modifications made specifically to get failing tests to pass without challenging the underlying assumptions, code that's inconsistent and hard to understand even for the LLMs themselves.

But some people take whatever the LLM outputs at face value and then claim capabilities the models don't really have. They're still not viable without human supervision, and because the AI labs are focusing on synthetic benchmarks, they're creating models that are better at pushing crappy code through to achieve a goal.


"sour grapes" means nothing in this context




