语音编辑和零样本文本转语音(TTS),包括有声书、互联网视频和播客。VoiceCraft只需要几秒钟的参考语音就能克隆声音。
项目: https://github.com/jasonppy/VoiceCraft
模型: https://huggingface.co/pyp1/VoiceCraft/tree/main
演示页面: https://jasonppy.github.io/VoiceCraft_web/
conda create -n voicecraft python=3.9.16
conda activate voicecraft
pip install -e git+https://github.com/facebookresearch/audiocraft.git@c5157b5bf14bf83449c17ea1eeb66c19fb4bc7f0#egg=audiocraft
pip install xformers==0.0.22
pip install torchaudio==2.0.2 torch==2.0.1 # this assumes your system is compatible with CUDA 11.7, otherwise checkout https://pytorch.org/get-started/previous-versions/#v201
apt-get install ffmpeg # if you don't already have ffmpeg installed
apt-get install espeak-ng # backend for the phonemizer installed below
pip install tensorboard==2.16.2
pip install phonemizer==3.2.1
pip install datasets==2.16.0
pip install torchmetrics==0.11.1
# install MFA for getting forced-alignment, this could take a few minutes
conda install -c conda-forge montreal-forced-aligner=2.2.17 openfst=1.8.2 kaldi=5.5.1068
# conda install pocl # above gives an warning for installing pocl, not sure if really need this
# to run ipynb
conda install -n voicecraft ipykernel --no-deps --force-reinstall
# 1. clone the repo on in a directory on a drive with plenty of free space
git clone git@github.com:jasonppy/VoiceCraft.git
cd VoiceCraft
# 2. assumes you have docker installed with nvidia container container-toolkit (windows has this built into the driver)
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.13.5/install-guide.html
# sudo apt-get install -y nvidia-container-toolkit-base || yay -Syu nvidia-container-toolkit || echo etc...
# 3. Try to start an existing container otherwise create a new one passing in all GPUs
./start-jupyter.sh # linux
start-jupyter.bat # windows
# 4. now open a webpage on the host box to the URL shown at the bottom of:
docker logs jupyter
# 5. optionally look inside from another terminal
docker exec -it jupyter /bin/bash
export USER=(your_linux_username_used_above)
export HOME=/home/$USER
sudo apt-get update
# 6. confirm video card(s) are visible inside container
nvidia-smi
# 7. Now in browser, open inference_tts.ipynb and work through one cell at a time
echo GOOD LUCK
推理和训练:
查看inference_speech_editing.ipynb和inference_tts.ipynb。