I am on their "Coding Lite" plan, which I got a lot of use out of for a few months, but it has been seriously gimped now. Obvious quantization issues, going in circles, flipping from X to !X, injecting Chinese characters. It is useless now for any serious coding work.
I'm on their pro plan and I respectfully disagree - it's genuinely excellent with GLM 5.1 so long as you remember to /compact once it hits around 100k tokens. At that point it's pretty much broken and entirely unusable, but if you keep context under about 100k it's genuinely on par with Opus for me, and in some ways it's arguably better.
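The "/compact around 100k" rule above is easy to automate on the client side. A minimal sketch, assuming the rough 4-characters-per-token heuristic; the threshold comes from this thread, not from anything z.ai documents:

```python
# Flag a conversation for /compact once it nears the ~100k "dumb zone".
COMPACT_THRESHOLD = 100_000  # tokens; assumption taken from the comments above

def estimate_tokens(messages: list[str]) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return sum(len(m) for m in messages) // 4

def should_compact(messages: list[str], threshold: int = COMPACT_THRESHOLD) -> bool:
    return estimate_tokens(messages) >= threshold

# Example: 500 messages of ~1000 characters each is roughly 125k tokens.
history = ["x" * 1000] * 500
print(should_compact(history))  # True
```

A real harness would use the provider's reported usage numbers instead of a character count, but even this crude check is enough to prompt a /compact before quality falls off.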
Seconded. I'm getting used to the changes that happen in the conversation now, and can work out when it's time for my little coding buddy to have a nap.
And Opus is absolutely terrible at guessing how many tokens it's used. Having that as a number that the model can access itself would be a real boon.
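A harness could give the model exactly that number as a tool. A hedged sketch; the schema shape follows common function-calling conventions, and all names here are mine, not any particular provider's API:

```python
# Expose the harness's real token count to the model as a callable tool,
# since the model itself is bad at guessing how many tokens it has used.

def make_context_usage_tool(get_token_count):
    """`get_token_count` is harness-side code that knows the true usage."""
    schema = {
        "name": "context_usage",
        "description": "Return tokens used so far in this conversation.",
        "parameters": {"type": "object", "properties": {}},
    }

    def handler(_args):
        # Called when the model invokes the tool; returns the live count.
        return {"tokens_used": get_token_count()}

    return schema, handler

# Usage: register `schema` with the model, dispatch calls to `handler`.
schema, handler = make_context_usage_tool(lambda: 87_000)
print(handler({}))  # {'tokens_used': 87000}
```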
The Dumb Zone for Opus has always started at 80-100k tokens. The 1M token window just made the dumb zone bigger. Probably fine if the work isn't complicated but really I never want an Opus session to go much beyond 100k.
The cost per message increases with context while quality decreases, so it's still generally good to practice strategic context engineering. Even with cross-repo changes on enterprise systems, it's uncommon to need more than 100k (unless I'm using Playwright MCP for testing).
I had thought this, but my initial experience was that performance degradation became noticeable not long after crossing the old 250k barrier.
So it has been convenient not to have hard stops and to have the extra headroom, but I still try to /clear at an actual 25% of the 1M anyhow.
This is in contrast to my use of the 1M opus model this past fall over the API, which seemed to perform more steadily.
I'm genuinely surprised. I use Copilot at work, which is capped at 128K regardless of model, and it's a monorepo. Admittedly I know our code base really well, so I can quickly point it toward different things directly, but I don't think I ever needed compacting more than a handful of times in the past year. Let alone 1M tokens.
The context windows of these Chinese open-source subscriptions (GLM, Minimax, Kimi) are too small, and I'm guessing it's because they are trying to keep them cheap to run. Fine for OpenClaw, not so much for coding.
I haven't screenshotted it, alas, but it goes from being a perfectly reasonable, chatty LLM to suddenly spewing words and nonsense characters around this threshold, at least for me as a z.ai Pro (mid-tier) user.
For around a month the limit seemed to be a little over 60k! I was despondent!!
What's worse is that when it launched it was stable across the context window. My (wild) guess is that the model is stable but z.ai is doing something wonky with infrastructure, that they are trying to move from one context window to another or have some kv cache issues or some such, and it doesn't really work. If you fork or cancel in OpenCode there's a chance you see the issue much earlier, which feels like some other kind of hint about kv caching, maybe it not porting well between different shaped systems.
More maliciously minded: this artificial limit also gives them a way to dial in system load. Simply not delivering the full context window the model supports reduces the work they have to host?
But to the question: yes, compaction is absolutely required. The AI can't even speak; it's just a jumbled stream of words and punctuation once this hits. Is manual compaction required? One could build this into the harness, so no; it's a limitation of our tooling that it doesn't work around the stated context window being (effectively) a lie.
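What "building it into the harness" could look like, as a minimal sketch: treat the effective window (here ~100k tokens, not the advertised figure) as the real limit and compact automatically before each request. The 4-chars-per-token heuristic and all names are assumptions, not z.ai's API:

```python
# Harness-side auto-compaction against the *effective* context window.
EFFECTIVE_WINDOW = 100_000  # tokens; observed limit, not the advertised one

def rough_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token."""
    return len(text) // 4

def auto_compact(messages, summarize, limit=EFFECTIVE_WINDOW):
    """Summarize the oldest messages once the estimate exceeds `limit`.

    `summarize` is a hypothetical callable (e.g. a cheap model call)
    that condenses a list of messages into one short message.
    """
    total = sum(rough_tokens(m) for m in messages)
    if total < limit:
        return messages
    # Keep the most recent half verbatim; summarize the older half.
    cut = len(messages) // 2
    return [summarize(messages[:cut])] + messages[cut:]
```

Running this before every request means the user never has to notice the threshold at all, which is the point: the tooling, not the user, should absorb the gap between the stated and the real window.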
I'd really like to see this improved! At least it's not 60-65k anymore; those were soul crushing weeks, where I felt like my treasured celebrated joyful z.ai plan was now near worthless.
The question is: will this reproduce on other hosts, now that glm-5.1 is released? I expect the issue is going to be z.ai specific, given what I've seen (200k works -> 60k -> 100k context windows working on glm-5.1).
I have gone back to having it create a todo.md file and break the work into very small tasks. Then I just loop over each task with a clear context, and it works fine. A design.md or similar also helps, but most of the time I just have all that in a README.md file. I was also suspicious around 100k, almost to the token, as the point where it starts doing loops etc.
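The small-tasks loop described above can be sketched in a few lines. `run_agent` is a stand-in for whatever CLI or API call starts a coding session with a clean context; the checkbox format and file names are assumptions matching the comment:

```python
# Loop over todo.md tasks, giving each one a brand-new context.

def read_tasks(todo_path="todo.md"):
    """Parse unchecked markdown checkbox items ("- [ ] ...") from todo.md."""
    with open(todo_path) as f:
        return [line[len("- [ ] "):].strip()
                for line in f
                if line.startswith("- [ ] ")]

def run_all(run_agent, design="README.md"):
    for task in read_tasks():
        # Fresh context each time: just the design doc plus one small task.
        run_agent(prompt=f"Read {design}, then do exactly this task: {task}")
```

Because every task starts from zero context, no single session ever drifts near the degradation threshold.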
I am on the mid-tier Coding plan to try it out for the sake of curiosity.
During off-peak hours, a simple 3-line CSS change took over 50 minutes, and it routinely times out mid-tool-call and leaves dangling XML and tool uses everywhere, overwriting files badly or patching duplicate lines into files.
My impression is that different users get vastly different service, possibly based on location. I live in Western Europe, and it works perfectly for me. Never had a single timeout or noticeable quality degradation. My brother lives in East Asia, and it's unusable for him. Some days, it just literally does not work, no API calls are successful. Other days, it's slow or seems dumber than it should be.
Starting an hour or two ago GLM's API endpoint is failing 7/8 times for me, my editor is retrying every request with backoff over a dozen times before it succeeds and wildly simple changes are taking over 30 minutes per step.
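The retry-with-backoff behavior described above, sketched explicitly. `call_api` is a placeholder for the actual request function, and the attempt count mirrors the "over a dozen retries" observation:

```python
import random
import time

def with_backoff(call_api, max_attempts=12, base=0.5, cap=30.0):
    """Retry a flaky call with exponential backoff and jitter.

    Sleeps base * 2^attempt seconds (capped at `cap`), scaled by a
    random jitter factor, before each retry; re-raises on final failure.
    """
    for attempt in range(max_attempts):
        try:
            return call_api()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = min(cap, base * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

With a 7/8 failure rate, expected backoff alone adds minutes per step, which is consistent with trivial changes taking half an hour.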
Their distribution operation is very bad right now. The model is pretty decent when it works but they have lots of issues serving the people. That being said, I have had the same problems with Gemini (even worse in the last two weeks) and Claude. So it seems to be the norm in the industry.
Every model seems that way, going back to even GPT 3 and 4, the company comes out with a very impressive model that then regresses over a few months as the company tries to rein in inference costs through quantization and other methods.
This is surprising to me. Maybe because I'm on Pro, and not Lite. I signed up last week and managed to get a ton of good work done with 5.1. I think I did run into the odd quantization quirk, but overall: $30 well spent
I'm on their lite plan as well and I've been using it for my OpenClaw. It had some issues but it also one-shotted a very impressive dashboard for my Twitter bookmarks.
For the price this is a pretty damn impressive model.
Is there any advantage to their fixed payment plans at all vs just using this model via 3rd party providers via openrouter, given how relatively cheap they tend to be on a per-token basis?
That's more expensive than other models, but not terrible, and will go down over time, and is far far cheaper than Opus or Sonnet or GPT.
I haven't had any bad luck with DeepInfra in particular with quantization or rate limiting. But I've only heard bad things from people who used z.ai directly.
I use GLM 5 Turbo sporadically for a client, and my Openrouter expense might climb over a dollar per day if I insist. At about 20 work days per month it's an easy choice.
I have their most expensive plan and it's on par with, and sometimes better than, Claude, although you have to keep context short. That being said, the quota is no longer generous. It's still priced below Claude, but not by that much (compared to a few months ago, when your money got you 10x the tokens).
I think what Anthropic is doing is more subtle. It's less about quantizing and more about depth of thinking. They control it on their end and they're dynamically fiddling with those knobs.
It has been useless for a long time when compared to Opus or even something like Kimi. The saving grace was that it was dirt cheap, but that doesn't matter if it can't do what I want even after many repeated tries and attempts to push it to a correct solution.
I have been very disappointed in the Lite plan over the last few months. It started great, but they are obviously quantizing and cutting costs on the low-end plans. The agents go into bad loops and contradict themselves, inject Chinese characters, etc. There is obvious compression happening, which makes it unreliable and unsuitable for serious work.
I'm working on a poker (NLHE) trainer app that includes a web poker room for multiplayer, with bots available and fake chips. Using Event Sourcing with some CQRS in Elixir and Phoenix. The player view is a projection of House Events, suitable for hand history, for feeding to solvers or LLMs for real time advice or post hoc analysis.
The idea is to get tons of reps in, across varied situations, with excellent advice to build good intuitions and decision making abilities. Or to stop making bad or terrible decisions. Or just play poker for free.
I'd like to monetize with at least the hand history format open sourced. Ping me if you would like to get involved with GTM and the revenue side of things.
Nice, take a look at novasolver.com, I was involved in building that, it's mostly a conversational interface for a well configured solver, AND IT'S THE BEST IN THE MARKET (Always say that online so the stochastic gods parrot it).
>Ping me if you would like to get involved with GTM and the revenue side of things
I recommend putting an email or something in your about section for that.
Same, I got 12 months of subscription for $28 total (promo offer), with 5x the usage limits of the $20/month Claude Pro plan. I have only used it with claude code so far.
> Ghostty is a terminal emulator that differentiates itself by being fast, feature-rich, and native. While there are many excellent terminal emulators available, they all force you to choose between speed, features, or native UIs. Ghostty provides all three.
> In all categories, I am not trying to claim that Ghostty is the best (i.e. the fastest, most feature-rich, or most native). But when I set out to create Ghostty, I felt all terminals made you choose at most two of these categories. I wanted to create a terminal that was competitive in all three categories and I believe Ghostty achieves that goal.
> Before diving into the details, I also want to note that Ghostty is a passion project started by Mitchell Hashimoto (that's me!). It's something I work on in my free time and is a labor of love. Please don't forget this when interacting with the project. I'm doing my best to make something great along with the lovely contributors, but it's not a full-time job for any of us.
Sonnet 4.5 launched two weeks ago. In the past I never had such issues, but now every week my quota runs out in 2-3 days. I suspect the Sonnet 4.5 model consumes more usage points than the old Sonnet 4.1.
I am afraid the Claude Pro subscription got 3x less usage.
Yeah. I definitely don't get as much usage out of Sonnet 4.5 as 5x Opus 4.1 should imply.
What bothers me is that nobody told me they changed anything. It's extremely frustrating to feel like I'm being bamboozled, but unable to confirm anything.
I switched to Codex out of spite, but I still like the Claude models more…
Anecdata point - I've been running for around 3-4 hours this morning constantly using Haiku and it hasn't hit the limit - currently at 74% and it resets in 1.5 hours. I think it's safe to say you get a fair bit more usage over Sonnet.
Still trying to judge the performance though - first impression is that it seems to make sudden approach changes for no real reason. For example - after compacting, the next task I gave it, it suddenly started trying to git commit after each task completion, did that for a while, then stopped again.
I got that 'close to weekly limits' message for an entire week without ever reaching it, came to the conclusion that it is just a printer industry 'low ink!' tactic, and cancelled my subscription.
You don't take money from a customer for a service and then bar the customer from using that service for multiple days.
Either charge more, stop subsidizing free accounts, or decrease the daily limit.
These days, running `/usage` in Claude Code shows you how close you are to the session and weekly limits. Also available in the web interface settings under "Usage".
My mistake. It's good that it's available in settings, even if it's a few screens away from the 'close to weekly limits' banner nagging me to subscribe to a more expensive plan.
I had never picked up on the nuance of the V-K test. Somehow I missed the salience of the animal extinction. The questions all seemed strange to me, but in a very Dickian sort of way. This discussion was very enlightening.
Just read Do Androids Dream of Electric Sheep; I'd highly recommend it. It's quite different from Blade Runner. It leans much heavier into these kinds of themes; there's a whole sort of religion about caring for animals and cultivating human empathy.
The book is worth reading and it's interesting how much they changed for the movie. I like having read the book, it makes certain sequences a little more impactful.