I have recently been experimenting with short videos, and one part of that is AI voice cloning, which falls under TTS (Text-To-Speech, speech synthesis). As prerequisites you need at least some Linux skills and the ability to troubleshoot errors, plus a reasonably powerful machine. I do not have an NVIDIA GPU, so everything runs on CPU and RAM; during inference I saw memory use of at least around 16 GB.
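
Because inference here runs on CPU and RAM only, it can help to check the machine before installing anything. The sketch below reads MemTotal from /proc/meminfo (Linux only) and compares it with a threshold. The function names are made up for illustration, and the 16 GiB default simply mirrors the figure observed above; it is not an official requirement.

```shell
# Hypothetical pre-flight helpers; names and the 16 GiB default are illustrative.

# Print total RAM in GiB (rounded to the nearest integer), parsed from a
# meminfo-style file whose MemTotal value is given in kB.
mem_total_gib() {
    local file="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { printf "%d\n", int($2 / 1024 / 1024 + 0.5) }' "$file"
}

# Succeed if total RAM is at least the given number of GiB (default 16).
has_enough_ram() {
    local need="${1:-16}" file="${2:-/proc/meminfo}"
    [ "$(mem_total_gib "$file")" -ge "$need" ]
}
```

For example, `has_enough_ram 16 || echo "less than 16 GiB of RAM"` prints a warning on a smaller machine. The value is rounded because a machine sold as 16 GB usually reports slightly less than 16 GiB.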

Install (Ubuntu)

Clone the repo

git must be installed beforehand.

git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
# If cloning the submodules fails because of network errors, rerun the following commands until they succeed
cd CosyVoice
git submodule update --init --recursive
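
The "run until success" advice can be automated with a small retry loop. This is only a sketch and not part of the CosyVoice repository: `retry` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary.

```shell
# retry MAX DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts. Returns non-zero if every attempt failed.
retry() {
    local max="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Usage (run inside the CosyVoice checkout):
#   retry 5 10 git submodule update --init --recursive
```

Capping the number of attempts avoids looping forever when the failure is not a transient network problem, for example a wrong URL.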

Install Conda

Conda is an open-source package manager and environment manager used mainly in scientific computing, data analysis, and machine learning. It manages not only Python packages but also non-Python dependencies (such as C/C++ libraries and R packages), and it isolates each project's dependencies in a separate environment to avoid version conflicts.

Download the latest installation script

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Run the installation script

bash Miniconda3-latest-Linux-x86_64.sh

Follow the prompts:

Press Enter to read the license agreement, then type yes to accept it.
Choose the installation path (default ~/miniconda3; keeping the default is recommended).
Type yes to initialize Conda (this modifies ~/.bashrc automatically). Open a new terminal or run source ~/.bashrc afterwards so that the conda command is available.

Verify the installation

conda --version   # should print a version number (e.g. conda 24.5.0)
conda info        # show detailed Conda information

Create the Conda env:

Note: every time you use CosyVoice from now on, first make sure the active environment is cosyvoice.

conda create -n cosyvoice -y python=3.10
conda activate cosyvoice
# pynini is required by WeTextProcessing; install it with conda, which works on all platforms.
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

# If you encounter sox compatibility issues
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel
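
The reminder about always working inside the cosyvoice environment can be turned into a guard that runs before other commands. The sketch below relies on the CONDA_DEFAULT_ENV variable, which `conda activate` sets to the name of the active environment; the function name `require_conda_env` is hypothetical.

```shell
# Succeed only if the active Conda environment has the expected name
# (default: cosyvoice). Reads CONDA_DEFAULT_ENV, which `conda activate` sets.
require_conda_env() {
    local want="${1:-cosyvoice}"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$want" ]; then
        echo "active Conda env is '${CONDA_DEFAULT_ENV:-none}', expected '$want'" >&2
        echo "run: conda activate $want" >&2
        return 1
    fi
}

# Usage:
#   require_conda_env && pip install -r requirements.txt
```

Chaining with `&&` means the following command is skipped when the wrong environment is active, so packages do not end up in base by accident.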

Model download

I recommend downloading CosyVoice2-0.5B directly.

# Downloading the model with git requires git lfs to be installed
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git pretrained_models/CosyVoice2-0.5B
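
If git lfs was not installed when the model was cloned, the large weight files are left as small text "pointer" files, and loading the model fails later. The sketch below is one way to detect that case, assuming the standard Git LFS pointer format whose first line is `version https://git-lfs.github.com/spec/v1`. Both function names are hypothetical.

```shell
# Succeed if FILE looks like an unfetched Git LFS pointer rather than real content.
# A pointer is a small text file whose first line is the LFS spec version string
# (42 bytes long), so only the first 42 bytes are inspected.
is_lfs_pointer() {
    [ -f "$1" ] || return 1
    head -c 42 "$1" 2>/dev/null | grep -qxF 'version https://git-lfs.github.com/spec/v1'
}

# Print every file under DIR that is still an LFS pointer (the .git directory
# is skipped). Prints nothing when all files were fetched.
list_lfs_pointers() {
    local dir="${1:-pretrained_models/CosyVoice2-0.5B}" f
    find "$dir" -type f ! -path '*/.git/*' | while IFS= read -r f; do
        if is_lfs_pointer "$f"; then echo "$f"; fi
    done
}
```

If `list_lfs_pointers` prints any paths, running `git lfs install` followed by `git lfs pull` inside the model directory should fetch the real files.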

Start the web demo

You can use the web demo page to get familiar with CosyVoice quickly.

See the demo website for details.

Before launching, confirm once more that the active Conda environment is cosyvoice! The --model_dir path in the command is from my machine; point it at your own CosyVoice2-0.5B model directory.

python3 webui.py --port 50000 --model_dir /home/wayne/disk-A/CosyVoice2/pretrained_models/CosyVoice2-0.5B/

The interface is as follows:

First select the 3s fast-cloning mode ("3s极速复刻" in the UI).
Upload the prompt audio file and enter the matching prompt text. The prompt text must correspond exactly to what is spoken in the prompt audio; automatic transcription of the prompt audio is not supported yet.
In the synthesis text box, enter the text you want the cloned voice to speak.

Emotion syntax

CosyVoice2 supports inline emotion markers in the text, which makes it easy to adjust the delivery.

Usage:

Pause: [breath]/<breath></breath>

Laughter: [laughter]/<laughter></laughter>

Stress: <strong></strong>

Text after adding emotion markers:

我问他[breath]什么学校毕业的啊[breath][laughter]，他说他是<strong>布鲁弗莱</strong>大学，<strong>双</strong> <laughter>学士学位</laughter>，<laughter>美容美发</laughter>和<laughter>汽车</laughter><laughter>修理</laughter>[breath]，我真的<laughter>服了</laughter>。
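
With this many markers in one sentence it is easy to forget a closing tag. The sketch below checks that each paired tag listed above has as many opening as closing occurrences before the text is pasted into the web UI. It only compares counts and does not verify nesting. The function names are hypothetical, and `grep -o` is assumed to be available (GNU and BSD grep both provide it).

```shell
# Print how many times the fixed string NEEDLE occurs in TEXT.
count_occurrences() {
    printf '%s\n' "$1" | { grep -oF -- "$2" || true; } | wc -l | tr -d ' '
}

# Succeed if <strong>, <laughter> and <breath> each have equal numbers of
# opening and closing tags in TEXT; otherwise report the first mismatch.
tags_balanced() {
    local text="$1" tag open close
    for tag in strong laughter breath; do
        open=$(count_occurrences "$text" "<$tag>")
        close=$(count_occurrences "$text" "</$tag>")
        if [ "$open" -ne "$close" ]; then
            echo "unbalanced <$tag>: $open opening, $close closing" >&2
            return 1
        fi
    done
}

# Usage:
#   tags_balanced "$(cat script.txt)" && echo "markers look balanced"
```

Here `script.txt` stands for whatever file holds the synthesis text; it is not a file that CosyVoice provides.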
