Hacker News | mnicky's comments

This observation makes sense, because all current models probably use some kind of sparse attention architecture.

So the closer two related pieces of information are to each other in the input context, the better the chance that their relationship will be preserved.
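The locality effect can be sketched with a toy sliding-window attention mask, one common form of sparse attention (whether any given frontier model uses exactly this scheme is my assumption, not something the labs disclose):

```python
# Toy sliding-window (sparse) attention mask: each query token attends
# only to keys within `window` positions behind it, so distant token
# pairs have no direct attention path between them.
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """True where query i may attend to key j (causal + local)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=8, window=3)
# Token 7 sees tokens 5..7 directly, but not token 0:
assert mask[7, 5] and mask[7, 7]
assert not mask[7, 0]
```

Information between far-apart tokens can still flow across layers, but only through intermediate positions, which is presumably why nearby facts survive better.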


He's trying to make it sound that way, but in the legal domain, the devil lies in the details.

It seems that the government wanted to use Claude for mass analysis of commercially obtained data on American people, and Anthropic wouldn't let them (source: https://www.theatlantic.com/technology/2026/03/inside-anthro... ).

The DoD kept asking for contract changes that would at least make the legalese somewhat more permissive, but Anthropic stood their ground.

Sam Altman probably let them do it, while using language like "we have technical means of oversight and the same red lines as Anthropic". But in reality they will allow the DoD to do what Anthropic wouldn't.

See this for more information: https://www.lesswrong.com/posts/PBrggrw4mhgbksoYY/a-tale-of-...


> Very often, after a correction, it will focus a lot on the correction itself making for weird-sounding/confusing statements in commit messages and comments.

I've experienced that too. Usually when I request a correction, I add something like "Include only production-level comments (not changes)". Recently I also added a special instruction for this to CLAUDE.md.


For some time now, Claude Code's plan mode has also written the plan to a file that you can edit, etc. It's located in ~/.claude/plans/ for me. Actually, there's a whole history of plans there.

I sometimes reference some of them to build context, e.g. after a few unsuccessful tries to implement something, so that Claude doesn't try the same thing again.


Can you compare it to Opus 4.6 with thinking disabled? It seems to have very impressive benchmark scores. Could also be pretty fast.


Added a thinking-disabled Opus 4.6 timing. It took 1m 4s, coincidentally the same as 5.3-codex-low.


> What am I missing?

Largest production capacity maybe?

Also, market demand will be so high that every player's chips will be sold out.


> Largest production capacity maybe?

Anyone can buy TSMC's output...


Which I'm sure is 100% reserved through at least 2030.


Aren't they building new fabs, though? Or even those are already booked?


Can anyone buy TSMC though?


No. TSMC will not take the risk on allocating capacity to just anyone given the opportunity cost.


Not without an army


Well, a fair comparison would be with GPT-5.x Pro, which is the same class of model as Gemini Deep Think.


> can a sufficiently large non thinking model perform the same as a smaller thinking?

Models from Anthropic have always been excellent at this. See e.g. https://imgur.com/a/EwW9H6q (top-left Opus 4.6 is without thinking).


It's interesting that Opus 4.6 added a parameter to make it think extra hard.


At least now we also have a tracker: https://marginlab.ai/trackers/claude-code/


Saw this the other day and loved it. Especially seeing Opus 4.5 degrading prior to the 4.6 release (IIRC) and Codex staying very stable and even improving over time.

But FYI the blog post is not about the actual model being dumbed down, but the command line interface.


What I haven't seen discussed anywhere so far is how big a lead Anthropic seems to have in intelligence per output token, e.g. if you look at [1].

We already know that intelligence scales with the log of tokens used for reasoning, but Anthropic seems to have much more powerful non-reasoning models than its competitors.
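To make "scales with the log of tokens" concrete, here's a tiny numerical sketch; the token budgets and scores below are made up purely for illustration, not real benchmark data:

```python
# Illustrative only: hypothetical benchmark scores vs. reasoning-token
# budgets, showing what a log-linear scaling relationship looks like.
import numpy as np

tokens = np.array([1_000, 4_000, 16_000, 64_000])  # made-up budgets
scores = np.array([55.0, 62.0, 69.0, 76.0])        # made-up scores

# Fit score ~ a + b * log(tokens): a straight line in log-space.
b, a = np.polyfit(np.log(tokens), scores, deg=1)

# Each 4x increase in the token budget adds the same score delta,
# i.e. diminishing returns per token spent on reasoning.
deltas = np.diff(scores)
assert np.allclose(deltas, deltas[0])
```

Under that kind of curve, a model that scores the same with fewer (or zero) reasoning tokens effectively has a higher intercept, which is the lead I mean.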

I read somewhere that they have a policy of not advancing capabilities too much, so could it be that they are sandbagging and releasing models with artificially capped reasoning to be at a similar level to their competitors?

How do you read this?

[1] https://imgur.com/a/EwW9H6q


Intelligence per token doesn't seem quite right to me.

Intelligence per <consumable> feels closer. Per dollar, or per second, or per watt.


It is possible to think of tokens as a proxy for thinking space. At least reasoning tokens work like this.

Dollar and watt figures are not public, and time has confounders like hardware.

