Comparative Analysis of Windows for Speech Emotion Recognition Using CNN

Abstract(s)

The paper compares the accuracy obtained in the Speech Emotion Recognition task when the Hamming and Hanning windows are used to frame the speech signal and compute the spectrogram that serves as input to a convolutional neural network. Detection of between 4 and 10 emotional states was tested for both windows. The results show significant differences in accuracy between the two window types and provide valuable insights for the development of more efficient emotional state detection systems. The best accuracies obtained were 64.1% (4 emotions), 57.8% (5 emotions), 59.8% (6 emotions), 48.4% (7 emotions), 47.8% (8 emotions), 51.4% (9 emotions), and 45.9% (10 emotions). These accuracies are at the state-of-the-art level.
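
The framing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sampling rate, frame length, hop size, the synthetic test tone, and the helper name `spectrogram` are all assumptions chosen for illustration, since the abstract does not state the parameters used in the paper. The sketch only shows how the same signal yields two different spectrograms depending on the window applied to each frame.

```python
import numpy as np

def spectrogram(signal, window, hop):
    """Log-magnitude spectrogram (dB) built from overlapping windowed frames.

    Hypothetical helper for illustration; the frame length is taken from the window.
    """
    frame_len = len(window)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames of equal length.
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    # Apply the window to every frame, then take the one-sided FFT.
    spectrum = np.fft.rfft(frames * window, axis=1)
    # Small offset avoids log(0); transpose to (frequency bins, frames).
    return 20.0 * np.log10(np.abs(spectrum) + 1e-10).T

# Assumed parameters (not taken from the paper).
sr = 16000          # sampling rate in Hz
frame_len = 400     # 25 ms frames
hop = 160           # 10 ms hop

# Synthetic 1-second, 440 Hz tone standing in for a speech recording.
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)

# Same signal, two windows: these arrays would be the alternative CNN inputs.
spec_hamming = spectrogram(signal, np.hamming(frame_len), hop)
spec_hann = spectrogram(signal, np.hanning(frame_len), hop)

print(spec_hamming.shape, spec_hann.shape)
```

The two windows differ mainly at the frame edges: `np.hamming` tapers to 0.08 at its endpoints, whereas `np.hanning` tapers to exactly 0, which changes the spectral leakage visible in each spectrogram.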

Keywords

Speech Emotion Recognition; Hamming; Hanning; CNN

Citation

Teixeira, Felipe L.; Soares, Salviano Pinto; Abreu, J.L. Pio; Oliveira, Paulo M.; Teixeira, João P. (2024). Comparative Analysis of Windows for Speech Emotion Recognition Using CNN. In 3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023). Cham: Springer Nature, Vol. 1, p. 233–248. ISBN 978-3-031-53024-1.
