My understanding is that Ollama is more of an "LLM backend": it provides a server process on your machine that answers requests more or less statelessly.
I believe it keeps the model loaded across requests, and it might keep the KV cache warm for ongoing sessions (though I doubt it, given the API shape; I don't see any "session" parameter), but that's about it. Nothing seems to be written to disk.
Features like ChatGPT's "memories" or cross-chat context require a persistence layer, and that's probably best handled by a "frontend". Ollama's API does support passing the conversation history in with each request, for example: https://github.com/ollama/ollama/blob/main/docs/api.md#chat-...
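To illustrate that split, here's a minimal sketch of a client-side "frontend" that owns the conversation history and replays it to Ollama's /api/chat endpoint on every call. The endpoint and request/response shape follow the linked docs; the default port (11434) is Ollama's standard one, and the model name "llama3" is just a placeholder for whatever model you've pulled.

```python
import requests

# Assumption: a local Ollama instance on its default port, with a model
# named "llama3" already pulled. The client keeps the history; the server
# stores nothing between calls.
OLLAMA_URL = "http://localhost:11434/api/chat"

history = []  # list of {"role": ..., "content": ...} dicts, owned by the client

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3", "messages": history, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    assistant_message = resp.json()["message"]
    history.append(assistant_message)  # keep the reply so the next call has full context
    return assistant_message["content"]

print(chat("My name is Alice."))
print(chat("What's my name?"))  # only works because the client resent the history
```

Anything resembling "memory" would live in that client layer: persisting `history` to disk, summarizing or filtering it, and deciding what to replay on the next request.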
On ChatGPT's side there must be some heavy compression/filtering going on, since there's no chance the model can hold a user's entire ChatGPT conversation history in its context window.
But practically speaking, I believe Ollama just has no concept of server-side persistent state at the moment, so there's nothing for a feature like that to build on.