IrodoriTTS
Irodori-TTS 500M v2 is a Japanese-focused voice-cloning TTS model. It runs via a local FastAPI wrapper server in servers/tts/irodoritts/.
Why a script instead of a plain pip install
A direct pip install git+https://github.com/Aratako/Irodori-TTS fails for two reasons:
- Packaging bugs in upstream pyproject.toml: the license field uses a bare string instead of a PEP 621 table, and the configs/ directory at the repo root causes setuptools auto-discovery to reject the flat layout. The install script patches both before building.
- dacvae is not on PyPI: the upstream repo declares it via [tool.uv.sources], which is a uv-only extension that pip ignores. The script pre-installs dacvae directly from GitHub before installing irodori-tts.
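As a rough illustration of the first fix, the license patch can be sketched in Python. This is not the logic of install-irodori.ps1, which is a PowerShell script whose contents are not shown here; the function name patch_license_field and the "MIT" value are placeholders chosen for the example. The sketch rewrites a bare-string license value into the table form that PEP 621 defines, and leaves an existing table untouched.

```python
import re

# Matches a line such as:  license = "MIT"
# Only a bare quoted string is matched; a table value ({...}) is not.
# Simplification: the pattern does not check that the key sits under [project].
_BARE_LICENSE = re.compile(
    r'^([ \t]*license[ \t]*=[ \t]*)("[^"\n]*")[ \t]*$',
    re.MULTILINE,
)


def patch_license_field(pyproject_text: str) -> str:
    """Rewrite `license = "X"` as `license = {text = "X"}`."""
    return _BARE_LICENSE.sub(
        lambda m: f"{m.group(1)}{{text = {m.group(2)}}}",
        pyproject_text,
    )
```

The flat-layout fix is not sketched because it depends on which packages the upstream repository actually ships.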
Setup (Windows PowerShell)

    # 1. Create and activate a venv inside the engine folder
    python -m venv scripts\tts\irodoritts\.venv
    scripts\tts\irodoritts\.venv\Scripts\Activate.ps1

    # 2. Upgrade pip
    python -m pip install -U pip

    # 3. Install server runtime deps (FastAPI, uvicorn, PyTorch)
    pip install -r scripts\tts\irodoritts\requirements.txt

    # 4. (GPU only) Reinstall PyTorch with CUDA support; skip for CPU-only installs
    pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu124

    # 5. Install irodori-tts from source via the patch script
    .\scripts\tts\irodoritts\install-irodori.ps1

    # 6. Start the server
    python scripts\tts\irodoritts\server.py

CUDA version: replace cu124 with cu118 or cu121 if your driver targets an older toolkit. If pip reports torch as already satisfied after step 3, add --force-reinstall to the step 4 command so the CUDA build actually replaces the CPU one.
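Once the server has been started, a quick way to confirm it is accepting connections is to poll its port. The sketch below uses only the standard library and assumes the default bind address and port from the environment variable table (127.0.0.1 and 8013). The helper name wait_for_port is hypothetical, and the check only proves that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8013,
                  timeout: float = 30.0) -> bool:
    """Return True as soon as a TCP connection succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on host:port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(0.5)
    return False
```

Calling wait_for_port() from a second terminal after launching server.py gives a simple yes/no answer before moving on to registration.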
Security note
Both Irodori-TTS and dacvae are installed from GitHub. The install script pins both to specific commit SHAs (defined at the top of install-irodori.ps1) to prevent silent upstream changes from affecting installs. When updating, replace the SHA constants with the new HEAD commits and verify the diff before deploying.
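To make the pinning idea concrete, here is a small Python sketch that builds a pip requirement string locked to a full commit SHA and rejects anything that is not one, such as a branch name. It is an illustration only: the install script may well clone and check out the commit instead, since it patches the source before building. The function name pinned_git_requirement is hypothetical, and the all-zero SHA in the usage note is a placeholder, not a real commit.

```python
import re

# A full Git commit SHA-1 is exactly 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_git_requirement(repo_url: str, commit_sha: str) -> str:
    """Return a `git+<url>@<sha>` requirement, refusing mutable refs."""
    if not _FULL_SHA.fullmatch(commit_sha):
        raise ValueError(f"not a full commit SHA: {commit_sha!r}")
    return f"git+{repo_url}@{commit_sha}"
```

For example, pinned_git_requirement("https://github.com/Aratako/Irodori-TTS", "0" * 40) yields a string that pip accepts as a VCS requirement, while passing "main" raises ValueError.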
Registering in TomoriBot
After the server is running, register it with /provider custom-endpoint add:
- capability: speech
- api_style: tts-clone
- endpoint_url: http://127.0.0.1:8013
- script_markup: emoji
- supports_instruct: false
Then select it with /model speech, upload a reference sample with /speech voice-add, and assign it with /speech voice-assign.
TomoriBot strips Discord custom emoji syntax before sending text to TTS. With script_markup: emoji, Unicode emojis are preserved for IrodoriTTS emotion control; other speech modes remove Unicode emojis too so they are not spoken literally.
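The custom-emoji stripping can be sketched in a few lines of Python. This is an assumption about the approach, not TomoriBot's actual code: it relies on Discord's message syntax for custom emoji, <:name:id> for static and <a:name:id> for animated, and the function name strip_discord_custom_emoji is hypothetical. Unicode emojis pass through unchanged, which matches the script_markup: emoji behavior described above.

```python
import re

# Discord custom emoji appear in message text as <:name:id> or <a:name:id>.
_CUSTOM_EMOJI = re.compile(r"<a?:[A-Za-z0-9_]+:\d+>")


def strip_discord_custom_emoji(text: str) -> str:
    """Remove Discord custom emoji markup; leave Unicode emoji in place."""
    return _CUSTOM_EMOJI.sub("", text)
```

Removing Unicode emojis for the other speech modes would need a separate pass and is not covered by this sketch.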
Environment variables
| Variable | Default | Purpose |
|---|---|---|
| IRODORI_TTS_MODEL_ID | Aratako/Irodori-TTS-500M-v2 | HuggingFace model repo |
| TOMORI_TTS_HOST | 127.0.0.1 | Server bind address |
| TOMORI_TTS_PORT | 8013 | Server port |
| IRODORI_MODEL_DEVICE | cuda / cpu | Inference device |
| IRODORI_CODEC_DEVICE | same as model device | Codec device |
| IRODORI_MODEL_PRECISION | bf16 (GPU) / fp32 (CPU) | Model precision |
| IRODORI_CODEC_PRECISION | fp32 | Codec precision |
| TOMORI_TTS_MAX_TEXT_CHARS | 1000 | Per-request text length cap |
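The way these defaults interact can be sketched as a small settings loader. This is not the code in server.py; the TTSSettings class and load_settings function are hypothetical names. It also assumes that "cuda / cpu" means "cuda when a GPU is available, otherwise cpu", which is why the caller passes cuda_available explicitly (in practice that value would likely come from torch.cuda.is_available()).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TTSSettings:
    model_id: str
    host: str
    port: int
    model_device: str
    codec_device: str
    model_precision: str
    codec_precision: str
    max_text_chars: int


def load_settings(env: Optional[Mapping[str, str]] = None,
                  cuda_available: bool = False) -> TTSSettings:
    """Resolve settings from environment variables, applying the table's defaults."""
    env = os.environ if env is None else env
    # Assumption: the device default follows GPU availability.
    model_device = env.get("IRODORI_MODEL_DEVICE", "cuda" if cuda_available else "cpu")
    on_gpu = model_device.startswith("cuda")
    return TTSSettings(
        model_id=env.get("IRODORI_TTS_MODEL_ID", "Aratako/Irodori-TTS-500M-v2"),
        host=env.get("TOMORI_TTS_HOST", "127.0.0.1"),
        port=int(env.get("TOMORI_TTS_PORT", "8013")),
        model_device=model_device,
        # The codec follows the model device unless overridden.
        codec_device=env.get("IRODORI_CODEC_DEVICE", model_device),
        # bf16 on GPU, fp32 on CPU, unless overridden.
        model_precision=env.get("IRODORI_MODEL_PRECISION", "bf16" if on_gpu else "fp32"),
        codec_precision=env.get("IRODORI_CODEC_PRECISION", "fp32"),
        max_text_chars=int(env.get("TOMORI_TTS_MAX_TEXT_CHARS", "1000")),
    )
```

Passing a plain dict as env makes the resolution easy to check without touching the real environment, for example load_settings({"IRODORI_MODEL_DEVICE": "cuda"}).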