My understanding is that Ollama is more of an "LLM backend": it provides a server process on your machine that answers requests more or less statelessly.
I believe it keeps the model loaded across requests, and it might keep the KV cache warm for ongoing sessions (though I doubt it, given the API shape; I don't see any "session" parameter), but that's about it. Nothing seems to be written to disk.
Features like ChatGPT's "memories" or cross-chat context require a persistence layer, and that's probably best handled by a "frontend". Ollama's API does support passing the conversation history in with each request, for example: https://github.com/ollama/ollama/blob/main/docs/api.md#chat-...
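To illustrate that split, here's a minimal sketch of a client-side "frontend" that owns the conversation history and replays it to Ollama's /api/chat endpoint on every call. The endpoint and request/response shape follow the linked docs; the default port (11434) is Ollama's standard one, and the model name "llama3" is just a placeholder for whatever model you've pulled.

```python
import requests

# Assumption: a local Ollama instance on its default port, with a model
# named "llama3" already pulled. The client keeps the history; the server
# stores nothing between calls.
OLLAMA_URL = "http://localhost:11434/api/chat"

history = []  # list of {"role": ..., "content": ...} dicts, owned by the client

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3", "messages": history, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    assistant_message = resp.json()["message"]
    history.append(assistant_message)  # keep the reply so the next call has full context
    return assistant_message["content"]

print(chat("My name is Alice."))
print(chat("What's my name?"))  # only works because the client resent the history
```

Anything resembling "memory" would live in that client layer: persisting `history` to disk, summarizing or filtering it, and deciding what to replay on the next request.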
On ChatGPT's side there must be some heavy compression/filtering going on, since there's no chance the model can hold a user's entire ChatGPT conversation history in its context window.
But practically speaking, I believe Ollama just has no concept of server-side persistent state at the moment, so there's nothing for a feature like that to build on.