October 6, 2025, Bishkek, Kyrgyz Republic - A new open-source text-to-speech (TTS) model, KaniTTS, is redefining what’s possible in real-time, human-like voice generation.
Developed by the AI startup NineNineSix, the system performs on par with leading commercial models from ElevenLabs, OpenAI, Google, Microsoft, and Hume AI, according to its developers, while being completely open and free under the Apache 2.0 license.
Already downloaded over 15,000 times on Hugging Face, KaniTTS has drawn international attention for its combination of speed, expressivity, and accessibility. It delivers 15 seconds of natural speech in just 1 second on a consumer NVIDIA RTX 5080 GPU, enabling true real-time performance without cloud-scale hardware.
A New Standard for Voice AI
Unlike conventional TTS models, KaniTTS doesn’t just read text aloud - it captures meaning, emotion, rhythm, and nuance, producing speech that sounds spontaneous and alive.
Its design combines efficient token-based generation with a lightweight neural vocoder, allowing low latency and high fidelity even in edge deployments.
The base model currently supports six languages - English, German, Korean, Arabic, Chinese, and Spanish - with Kyrgyz and Japanese under development.
Next on the roadmap: voice cloning, which will recreate a speaker’s tone and manner of speech from just a few seconds of audio.
“Our goal was to democratize access to advanced voice AI,” said the NineNineSix team. “With KaniTTS, even small studios and independent developers can build human-like voice interfaces that once required enterprise-level infrastructure.”
Why It Matters
The release of KaniTTS signals a major shift in the AI voice landscape:
- Open access - anyone can study, adapt, and deploy the model for research or production.
- Scalability - runs on affordable hardware, from edge devices to enterprise servers.
- Speed and realism - low latency meets expressive prosody, bridging the gap between synthetic and natural speech.
- Ethical innovation - clear use-case guidelines are intended to discourage misuse for impersonation or misinformation.
With its balance of technical excellence and openness, KaniTTS stands as one of the fastest, most capable open TTS models available today.
Performance and Technical Highlights
Languages supported: English, Arabic, Chinese, German, Korean, Spanish
Model size: 370M parameters
Latency: ~1 sec per 15 sec of audio (on a consumer NVIDIA RTX 5080 GPU)
Training data: ~80k hours (LibriTTS, Common Voice, Emilia)
Training hardware: 8× NVIDIA H100 GPUs, 45 hours training time
License: Apache 2.0 open source
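The latency and training figures above translate into two commonly used metrics: the real-time factor (RTF, generation time divided by the duration of the audio produced) and total GPU-hours. The short Python sketch below computes both. It treats the approximate published numbers as exact and assumes all eight GPUs were busy for the full 45 hours, which the release does not state explicitly; the function names are illustrative only.

```python
# Approximate figures quoted in the spec list (treated as exact here)
audio_seconds = 15.0    # seconds of speech generated
compute_seconds = 1.0   # wall-clock generation time on an RTX 5080
num_gpus = 8            # NVIDIA H100 GPUs used for training
training_hours = 45.0   # wall-clock training time


def real_time_factor(compute_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; below 1.0 means faster than real time."""
    return compute_s / audio_s


def gpu_hours(gpus: int, hours: float) -> float:
    """Total accelerator time, assuming every GPU ran for the whole training run."""
    return gpus * hours


rtf = real_time_factor(compute_seconds, audio_seconds)
print(f"Real-time factor: {rtf:.3f}")
print(f"Speed-up over real time: {1 / rtf:.0f}x")
print(f"Training budget: {gpu_hours(num_gpus, training_hours):.0f} H100 GPU-hours")
```

With the quoted numbers this gives an RTF of about 0.067 (roughly 15x faster than real time) and a training budget of about 360 H100 GPU-hours.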
About NineNineSix
NineNineSix is an AI research and development startup specializing in generative models, multimodal systems, and human-machine interaction.
Its mission is to create high-performance, open technologies that make next-generation AI tools accessible to everyone.
About the High Technology Park of the Kyrgyz Republic (HTP)
KaniTTS was developed within the ecosystem of the High Technology Park of the Kyrgyz Republic (HTP) - a government-backed initiative that fosters the country’s IT industry by offering 0% VAT, 0% corporate tax, and 0% sales tax for resident tech companies. HTP supports innovators building globally competitive products in AI, software engineering, and digital transformation, helping position the Kyrgyz Republic as one of the most business-friendly and fast-growing IT hubs in the region.
Technical details: