What is this? This workflow trains a speech recognition system that converts spoken audio into text. It uses modern self-supervised speech encoders (wav2vec2, WavLM) as powerful feature extractors, ...
Abstract: Speech emotion recognition (SER) is an essential technology for enhancing human-computer interaction (HCI). While most SER research uses air-conducted (AC) speech, bone-conducted (BC) ...
Abstract: A robust Speech Emotion Recognition (SER) system is designed to improve human-computer interaction, particularly in healthcare, by accurately classifying seven emotions: happiness, sadness, ...
We incorporate two MOS (Mean Opinion Score) prediction models to evaluate the subjective appeal of synthesized singing. SingMOS-Pro: A specialized MOS predictor for singing voice, focusing on ...
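For context on what a MOS predictor estimates: a Mean Opinion Score is the arithmetic mean of listener ratings for a clip. The sketch below is only an illustration of that target quantity, not the SingMOS-Pro model or any part of the pipeline described above. The function name `mean_opinion_score`, the 1-5 rating scale, and the normal-approximation confidence interval are all assumptions made for this example.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, half_width) for a list of listener ratings.

    Assumes an absolute category rating scale from 1 (bad) to 5 (excellent).
    half_width is the half-width of a normal-approximation 95% interval.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")

    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        # A single rating gives no spread estimate.
        return mos, 0.0

    # Sample standard deviation divided by sqrt(n) is the standard error.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from five listeners for one synthesized clip.
ratings = [4, 5, 3, 4, 4]
mos, half_width = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

A learned MOS predictor replaces the human listeners: it takes audio as input and regresses a value on the same scale, so its outputs can be compared against scores computed this way.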