Easiest is probably ollama [0]. I think the ollama API is OpenAI-compatible.

[0] https://ollama.com/
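A minimal sketch of what that looks like, assuming ollama is running locally on its default port 11434 and a model has been pulled (the model name "mixtral" here is just an example):

  # Calling a local ollama server through the OpenAI Python client.
  # Assumes `ollama serve` is running and a model has been pulled,
  # e.g. `ollama pull mixtral`. Port and path are ollama's defaults.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://localhost:11434/v1",  # ollama's OpenAI-compatible endpoint
      api_key="ollama",  # the client requires a value; ollama ignores it
  )

  response = client.chat.completions.create(
      model="mixtral",
      messages=[{"role": "user", "content": "Say hello in one sentence."}],
  )
  print(response.choices[0].message.content)

The nice part of the compatibility story is that only base_url changes when you point the same code at a different server.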



Most inference servers are OpenAI-compatible. Even the "official" llama-cpp server should work fine: https://github.com/ggerganov/llama.cpp/blob/master/examples/...
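Same pattern with llama.cpp's server example, assuming it was started with a GGUF model and is listening on its default port 8080 (the model name passed here is illustrative; the server answers with whatever model it was loaded with):

  # Same OpenAI-client pattern against llama.cpp's built-in server.
  # Assumes the server example is running, e.g. started with a Mixtral GGUF,
  # on its default port 8080.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://localhost:8080/v1",  # llama.cpp server's OpenAI-style endpoint
      api_key="none",  # no key required by default; client needs a non-empty string
  )

  response = client.chat.completions.create(
      model="mixtral",  # largely ignored; the server uses its loaded model
      messages=[{"role": "user", "content": "Say hello in one sentence."}],
  )
  print(response.choices[0].message.content)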


Ollama runs locally. What's the best option for calling the new Mixtral model on someone else's server programmatically?




