This is HN, so I'm surprised that no one in the comments section has run this locally. :)
Following the instructions in their repo (and moving the checkpoints/ and resources/ folder into the "nested" openvoice subfolder), I managed to get the Gradio demo running. Simple enough.
It appears to be quicker than XTTS2 on my machine (RTX 3090), and uses approximately 1.5GB of VRAM. The Gradio demo is limited to 200 characters, perhaps due to resource-usage concerns, but it seems to run at around 8x realtime (8 seconds of speech for about 1 second of processing time).
EDIT: patched the Gradio demo for longer text; it's way faster than that. One minute of speech only took ~4 seconds to render. Default voice sample, reading this very comment: https://voca.ro/18JIHDs4vI1v
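For anyone curious how I'm computing those speedups, the realtime factor is just audio duration over wall-clock time (numbers from my runs above):

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Speed relative to realtime playback: >1 means faster than realtime."""
    return audio_seconds / wall_seconds

# Gradio demo with the 200-char cap: ~8 s of speech in ~1 s of processing
print(realtime_factor(8, 1))    # 8.0
# After patching for longer text: ~60 s of speech in ~4 s
print(realtime_factor(60, 4))   # 15.0
```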
I had to write out acronyms -- XTTS2 to "ex tee tee ess two", for example.
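A tiny preprocessor automates that step; the phonetic spellings below are my own workaround, not anything OpenVoice ships:

```python
import re

# Hand-written pronounceable spellings -- my own workaround, not part of OpenVoice
ACRONYMS = {
    "XTTS2": "ex tee tee ess two",
    "TTS": "tee tee ess",
    "GPU": "gee pee you",
}

def expand_acronyms(text: str) -> str:
    """Replace known acronyms with pronounceable spellings before synthesis."""
    # Longest keys first, so "XTTS2" matches before the embedded "TTS"
    pattern = re.compile(
        "|".join(sorted(map(re.escape, ACRONYMS), key=len, reverse=True))
    )
    return pattern.sub(lambda m: ACRONYMS[m.group(0)], text)

print(expand_acronyms("XTTS2 is a TTS engine"))
# ex tee tee ess two is a tee tee ess engine
```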
The voice clarity is better than XTTS2, too, but the speech can sound a bit stilted and, well, robotic/TTS-esque compared to it. The cloning consistency is definitely a step above XTTS2 in my experience -- XTTS2 would sometimes have random pitch shifts or plosives/babble in the middle of speech.
I am trying to run it locally but it doesn't quite work for me.
I was able to run the demos all right, but when trying to use another reference speaker (in demo_part1), the result doesn't sound at all like the source (it's just a random male voice).
I'm also trying to produce French output, using a reference audio file in French for the base speaker, and a text in French. This triggers an error in api.py line 75 that the source language is not accepted.
Indeed, in api.py line 45 the only two source languages allowed are English and Chinese; simply adding French to language_marks in api.py line 43 avoids the error, but produces a weird, unintelligible result with a very heavy English accent and pronunciation.
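For anyone following along, the gate works conceptually like this (a simplified sketch of the kind of check involved -- names and structure are illustrative, not the actual OpenVoice source):

```python
# Simplified sketch of the language gate described above -- illustrative
# names, not the real api.py code.
language_marks = {
    "English": "EN",
    "Chinese": "ZH",
    # "French": "FR",  # adding a key silences the error, but the
    #                  # English/Chinese phoneme front end still mangles French
}

def check_language(language: str) -> str:
    """Reject any source language the front end has no support for."""
    if language not in language_marks:
        raise ValueError(f"Source language {language!r} is not accepted")
    return language_marks[language]
```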
I guess one would need to generate source_se again, and probably mess with config.json and checkpoint.pth as well, but I could not find instructions on how to do this...?
Edit -- tried again on https://app.myshell.ai/ The result sounds French alright, but still nothing like the original reference. It would be absolutely impossible to confuse one with the other, even for someone who didn't know the person very well.
I played with it some more and I have to agree. For actual voice _cloning_, XTTS2 sounds much, much closer to the original speaker. But the resulting output is also much more unpredictable and sometimes downright glitchy compared to OpenVoice. XTTS2 also tries to "act out" the implied emotion/tone/pitch/cadence in the input text, for better or worse.
But my use case is just to have a nice-sounding local TTS engine, and current text-to-phoneme conversion quirks aside, OpenVoice seems promising. It's fast, too.
> but when trying to use another reference speaker (in demo_part1), the result doesn’t sound at all like the source
I’ve noticed the same thing and I wonder if there is maybe some undocumented information about what makes a good voice sample for cloning, perhaps in terms of what you might call “phonemic inventory”. The reference sample seems really dense.
> Indeed, in api.py line 45 the only two source languages allowed are English and Chinese
If you look at the code, beyond what the model itself does, it relies on the surrounding infrastructure converting the input text to the International Phonetic Alphabet (IPA) as part of the process, and that is only implemented for English and Mandarin (though cleaners.py has broken references to routines for Japanese and Korean).
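The structure is roughly a per-language dispatch to a grapheme-to-IPA routine, so any language without one fails regardless of what the model could do. This is an illustrative sketch with hypothetical function names, not the real cleaners.py:

```python
# Illustrative sketch of why only two source languages work: the text
# front end dispatches to a per-language grapheme-to-IPA routine, and
# only two such routines exist. Names are hypothetical.

def english_to_ipa(text: str) -> str:
    ...  # the real code runs an English grapheme-to-phoneme pipeline

def mandarin_to_ipa(text: str) -> str:
    ...  # the real code goes through pinyin-to-IPA rules

CLEANERS = {
    "EN": english_to_ipa,
    "ZH": mandarin_to_ipa,
    # Japanese/Korean are referenced in cleaners.py, but the routines
    # they point at are missing.
}

def to_ipa(text: str, lang: str) -> str:
    """Convert text to IPA, failing loudly for unsupported languages."""
    try:
        return CLEANERS[lang](text)
    except KeyError:
        raise ValueError(f"No IPA front end for {lang!r}") from None
```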
Give https://github.com/aedocw/epub2tts a look, the latest update enables use of MS Edge cloud-based TTS so you don't need a local GPU and the quality is excellent.
I want to try chaining XTTS2 with something like RVCProject. The idea is to generate the speech in one step, then clone a voice in the audio domain in a second step.
I have got to build or buy a new computer capable of playing with all this cool shit. I built my last "gaming" PC in 2016, so its hardware isn't really ideal for AI shenanigans, and my Macbook for work is an increasingly crusty 2019 model, so that's out too.
Yeah, I could rent time on a server, but that's not as cool as just having a box in my house that I could use to play with local models. Feels like I'm missing a wave of fun stuff to experiment with, but hardware is expensive!
> its hardware isn't really ideal for AI shenanigans
FWIW, I was in the same boat as you and decided to start cheap: old gaming machines can handle AI shenanigans just fine with the right GPU. I use a 2017 workstation (Zen1) and an Nvidia P40 from around the same era, which can be had for <$200 on eBay/Amazon. The P40 has 24GB of VRAM, which is more than enough for a good chunk of quantized LLMs or diffusion models, and is in the same performance ballpark as the free Colab tensor hardware.
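Back-of-envelope for why 24GB goes a long way with quantized models (a rough weights-only estimate; it ignores KV cache, activations, and runtime overhead):

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weights-only memory footprint in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 13B model at 4-bit quantization: ~6.5 GB of weights
print(round(quantized_weight_gb(13, 4), 1))   # 6.5
# Even a 30B model at 4-bit (~15 GB) fits in the P40's 24 GB
print(round(quantized_weight_gb(30, 4), 1))   # 15.0
```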
If you're just dipping your toes without committing, I'd recommend that route. The P40 is a data center card and expects higher airflow than desktop GPUs, so you'll probably have to buy a "blower kit" or 3D-print a fan shroud, and make sure it fits inside your case. That's another $30-$50. The bigger the fan, the quieter it can run. If you already have a high-end gaming PC/workstation from 2016, you can dive into local AI for $250 all-in.
Edit: didn't realize how cheap P40s now are! I bought mine a while back.
A Mac Studio or MacBook Pro if you want to run the larger models. Otherwise, a gaming PC with an RTX 4090, or a used RTX 3090 if you want something cheaper. A used dual-3090 setup can also be a good deal, but that's more in the build-it-yourself category than off the shelf.
I went the 4090 route myself recently, and I feel like all should be warned - memory is a major bottleneck. For a lot of tasks, folks may get more mileage out of multiple 3090s if they can get them set up to run parallel.
Still waiting on being able to afford the next 4090 + eGPU case et al. There are a lot of things this rig struggles with, running OOM even on inference with some of the more recent SD models.
Sorry if this is a silly question - I was never a Mac user, but I quick googled Mac Studio and it seems it's just the computer. Can I plug it to any monitor / use any keyboard and mouse, or do I need to use everything from Apple with it?
You can, but with some caveats. Not all screen resolutions work well with macOS, though with BetterDisplay it will still usually work. If you want Touch ID, it's better to get the Magic Keyboard with Touch ID.
Any monitor and keyboard will work, however Apple keyboards have a couple extra keys not present on Windows keyboards so require some key remapping to allow access to all typical shortcut key combinations.
I'm in exactly the same boat. Yeah ofc you can run LMs on cloud servers but my dream project would be to construct a new gaming PC (mine is too old) and serve a LM on it, then serve an AI agent app which I can talk to from anywhere.
Has anyone had luck buying used GPUs, or is that something I should avoid?
I bought some used GPUs during the last mining thing. They all worked fine except for some oddball Dell models that the seller was obviously trying to fix a problem on (and they took them back without question, even paying return shipping).
And old mining GPUs are A-OK, generally: despite warnings from the peanut gallery for over a decade that mining ruins video cards, this has never really been the case. Profitable miners have always tended to treat these things very carefully, undervolting (and often underclocking) them and keeping an eye on them so they could run as cool and inexpensively as possible. Killing cards is bad for profits, so miners aimed to keep them alive.
GPUs that were used for gaming are also OK, usually. They'll have fewer hours of hard[er] work on them, but will have more thermal cycles as gaming tends to be much more intermittent than continuous mining is.
The usual caveats apply as when buying anything else (used, "new", or whatever) from randos on teh Interwebz. (And fans eventually die, and so do thermal interfaces (pads and thermal compound), but those are all easily replaceable by anyone with a small toolkit and half a brain worth of wit.)