I tried Kokoro for voicing blog posts and articles and wasn't impressed, to be honest. Right now Gemini 2.5 Flash TTS is a much more capable system with generous free limits (about 10 minutes per generation and about 90 minutes per day). Voices are not very consistent between generations; for shorter pieces that's not a big deal, but it obviously will be for books.
I played with ebook generation a bunch and found that (at least for English text) a model of around 1B parameters is needed to get something emotionally usable (Chatterbox is 0.5B, Orpheus is 3B).