SenseVoice Is Trending

SenseVoice

953/month (statistic for the last 90 days)

SenseVoice is a speech foundation model developed as part of the FunAudioLLM framework, designed to enhance natural voice interactions between humans and large language models (LLMs).

Key features of SenseVoice

It offers multiple speech understanding capabilities, including:

Automatic Speech Recognition (ASR)
Language Identification (LID)
Speech Emotion Recognition (SER)
Audio Event Detection (AED)
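
Because these four capabilities come from a single model, one recognition pass can carry all of them at once. The sketch below assumes the raw output is a string prefixed with tags in the form `<|language|><|EMOTION|><|event|><|itn flag|>` followed by the transcript, which is how SenseVoice-Small output is commonly presented; check the exact layout against the model version you use. The names `Utterance` and `parse_tagged` are hypothetical helpers introduced only for this illustration.

```python
import re
from dataclasses import dataclass

# Assumed tag layout (not confirmed by this page):
#   <|language|><|EMOTION|><|event|><|itn flag|>transcript
TAG = re.compile(r"<\|([^|]*)\|>")


@dataclass
class Utterance:
    language: str  # LID result, e.g. "en"
    emotion: str   # SER result, e.g. "HAPPY"
    event: str     # AED result, e.g. "Speech"
    text: str      # ASR transcript with all tags removed


def parse_tagged(raw: str) -> Utterance:
    """Split a tagged output string into its LID, SER, AED and ASR parts."""
    tags = TAG.findall(raw)
    text = TAG.sub("", raw).strip()
    # Pad with empty strings so untagged input still parses.
    language, emotion, event = (tags + ["", "", ""])[:3]
    return Utterance(language, emotion, event, text)


example = "<|en|><|HAPPY|><|Speech|><|woitn|>thanks for calling"
print(parse_tagged(example))
```

Keeping the tags in a small record like this makes it straightforward to route on language or emotion downstream without re-parsing the transcript.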

SenseVoice comes in two main variants:

SenseVoice-Small: An encoder-only speech foundation model for fast speech understanding.
SenseVoice-Large: An encoder-decoder speech foundation model for more accurate speech understanding with support for more languages.

Other notable characteristics of SenseVoice include:

Support for over 50 languages
Very low inference latency, particularly with the encoder-only SenseVoice-Small variant
Ability to detect audio events such as music, applause, and laughter
Emotion recognition, including categories like happy, angry, and sad

The SenseVoice model and its related resources are open-sourced and available on various platforms:

GitHub: The FunAudioLLM organization hosts repositories related to SenseVoice, including training, inference, and fine-tuning code.
ModelScope: Offers pre-trained SenseVoice models for download and use.
Hugging Face: Provides access to SenseVoice models, such as the SenseVoiceSmall variant.

Developers can integrate SenseVoice into their projects using the provided APIs and tools, enabling applications like speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration.
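
As one possible integration path, the sketch below loads SenseVoice-Small through the FunASR Python package and transcribes a single audio file. It assumes FunASR is installed (`pip install funasr`), that the ModelScope model id `iic/SenseVoiceSmall` is available to download, and that the result is a list of dicts with a `"text"` field; the wrapper name `transcribe` and its defaults are hypothetical. Treat it as a starting point and confirm the arguments against the FunASR version you have installed.

```python
def transcribe(audio_path: str,
               model_id: str = "iic/SenseVoiceSmall",
               device: str = "cpu") -> str:
    """Transcribe one audio file with SenseVoice via FunASR (a sketch)."""
    # Imported lazily so this module still loads when FunASR is not installed.
    from funasr import AutoModel
    from funasr.utils.postprocess_utils import rich_transcription_postprocess

    # Downloads the model on first use (assumed id on ModelScope).
    model = AutoModel(model=model_id, device=device)

    result = model.generate(
        input=audio_path,
        cache={},
        language="auto",  # let the model identify the language
        use_itn=True,     # include punctuation and inverse text normalization
    )

    # Convert the tagged raw output into readable text.
    return rich_transcription_postprocess(result[0]["text"])
```

Calling `transcribe("meeting.wav")` would return the cleaned transcript for a local file named `meeting.wav` (a placeholder path). Reloading the model on every call is wasteful, so a real application would construct the model once and reuse it.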
