Kani TTS 2 Update

Bishkek, Feb 18: Kyrgyz Republic-based AI startup NineNineSix has released Kani TTS 2, an open-source text-to-speech model that can generate up to 40 seconds of continuous speech in a single pass and is aimed at improving access to speech AI for underrepresented languages.
The company said the extended generation length is intended to support longer outputs such as conversational agent responses, multi-turn dialogue, and narration, while maintaining more natural flow across continuous speech.
Kani TTS, the earlier release, gained traction among developers for its compact design and the ability to adapt the architecture to additional languages. Community contributors have used it as a base for models in Urdu, Vietnamese, Turkish and Creole, among others.
NineNineSix said Kani TTS 2 remains optimized for relatively modest compute, requiring about 3 GB of GPU memory, which keeps it viable for local deployment as well as server-based use.
The new version also supports zero-shot voice cloning, allowing developers to replicate a speaker’s tone and style from a short audio reference without additional fine-tuning, according to the company.
NineNineSix has also released the full pre-training code, which it said allows teams to train a text-to-speech system from scratch for any language, dialect, or domain, rather than relying only on adapting existing weights.
“Kani TTS 2 is the next step after our first release: we made speech generation more stable and enabled the model to produce longer audio segments. We focus on compact and open models – they are easier to deploy and adapt to different languages and accents, including low-resource ones. For us, it is important to demonstrate that world-class technologies can be built in Kyrgyzstan. That is why we released not only the model weights, but the entire pre-training code – so any team can train a TTS system from scratch for their own language,” said Nursultan Bakashov, co-founder of nineninesix.ai.
The company said the model currently supports English, Spanish and Kyrgyz, with Kyrgyz included as a core language in the base release rather than only through third-party extensions.
NineNineSix said Kani TTS 2 has around 400 million parameters and was pre-trained on roughly 10,000 hours of speech data. Full training was completed in approximately six hours using eight NVIDIA H100 GPUs, according to the company.
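A quick back-of-envelope check puts the company's figures in context. The sketch below assumes fp16 weights at 2 bytes per parameter, a common but unconfirmed storage format; the article does not specify the model's precision.

```python
# Rough sanity check on the reported Kani TTS 2 figures.
# Assumption (not stated in the article): weights stored in fp16 (2 bytes/param).
params = 400_000_000           # ~400 million parameters, per the company
bytes_per_param_fp16 = 2

# Weight memory alone, leaving the rest of the ~3 GB figure for
# activations, the audio codec, and runtime overhead.
weights_gb = params * bytes_per_param_fp16 / 1e9
print(f"fp16 weights: ~{weights_gb:.1f} GB")   # ~0.8 GB

# Total training compute in GPU-hours: six hours on eight H100s.
gpu_hours = 6 * 8
print(f"training compute: {gpu_hours} H100 GPU-hours")  # 48 GPU-hours
```

On these assumptions, the weights account for under a third of the quoted 3 GB footprint, and the full pre-training run totals roughly 48 H100 GPU-hours, which is small by current speech-model standards.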
Open-source speech models have increasingly been used by teams building voice features without relying entirely on proprietary APIs, particularly where language coverage is limited or where developers need more control over deployment and retraining.
Kani TTS 2 is available through Hugging Face, alongside a separate English model release. NineNineSix has also published its pre-training code on GitHub and released a browser-based demo through Hugging Face Spaces.

Source: DailyStraits.com
