Coqui XTTS v2 Audio Generation

Last updated on Mar 16, 2025

I had some experiments pending on my list for a really long time, and this was one of them. Below is my quick guide to voice-clone-based audio generation with Coqui XTTS v2 (although its Hindi output is terrible).

Installation

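Because Coqui, the company behind Coqui TTS, has shut down and the original project is no longer maintained, we need to use the maintained fork (published on PyPI as coqui-tts), as mentioned in a GitHub issue here.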

conda create --name coqui-xtts-v2 python=3.10
conda activate coqui-xtts-v2
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118  # For NVIDIA GPU
pip install -U transformers scipy numpy librosa
pip install -U coqui-tts  # maintained fork; also installs the tts command-line tool

Command

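# Clone the voice in male.wav and speak the text in Hindi (language code hi).
# Output goes to tts_output.wav in the current directory unless --out_path is set.
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "Namaste! Main Prakhar hu." \
    --speaker_wav male.wav \
    --language_idx hi \
    --use_cuda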
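To voice several sentences with the same reference clip, the command above can be wrapped in a small Bash loop. This is a minimal sketch, not part of my original steps: generate_all, the lines.txt input (one sentence per line) and the TTS_CMD variable are hypothetical names introduced here. The only extra CLI flag is --out_path, which gives each run its own file, and setting TTS_CMD=echo dry-runs the loop by printing each command instead of calling tts.

```shell
#!/usr/bin/env bash
# Sketch: generate one WAV per line of a text file, all cloned from the same reference clip.
# generate_all and TTS_CMD are hypothetical names introduced for this example.
set -euo pipefail

generate_all() {
  local text_file="$1" speaker_wav="$2" out_dir="$3"
  # Hypothetical override: TTS_CMD=echo prints each command instead of running tts.
  local tts_cmd="${TTS_CMD:-tts}"
  local n=0 line=""
  mkdir -p "$out_dir"
  # Read the text on fd 3 so tts keeps the terminal as its stdin (e.g. for prompts).
  # The || test also handles a last line without a trailing newline.
  while IFS= read -r line <&3 || [[ -n "$line" ]]; do
    if [[ -z "$line" ]]; then
      continue  # skip blank lines without consuming an output number
    fi
    n=$((n + 1))
    "$tts_cmd" --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
      --text "$line" \
      --speaker_wav "$speaker_wav" \
      --language_idx hi \
      --out_path "$out_dir/line_$(printf '%03d' "$n").wav" \
      --use_cuda
  done 3< "$text_file"
}
```

Usage, assuming a lines.txt next to male.wav: source the script, then run generate_all lines.txt male.wav out to get out/line_001.wav, out/line_002.wav and so on. Each tts call loads the model from scratch, so this approach is simple but slow for long lists.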

This post is just to archive my steps here. Thank you for reading.