Size | Format
---|---
1.57 MB | Adobe PDF
Abstract(s)
The paper compares the accuracy obtained in the Speech Emotion
Recognition task when the Hamming and Hanning windows are used
to frame the speech and compute the spectrogram that serves as
input to a convolutional neural network. Detection of 4 to 10
emotional states was tested for both windows. The results show
significant differences in accuracy between the two window types and
provide valuable insights for the development of more efficient emotional
state detection systems. The best accuracies for 4 to 10 emotions
were 64.1% (4 emotions), 57.8% (5 emotions), 59.8% (6 emotions), 48.4%
(7 emotions), 47.8% (8 emotions), 51.4% (9 emotions), and 45.9% (10
emotions). These accuracies are at the state-of-the-art level.
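
As an illustration of the windowing step the abstract describes, the following is a minimal sketch of how the two windows differ and how a windowed magnitude spectrogram could be computed. The window formulas are the standard textbook definitions (the same ones returned by `numpy.hamming` and `numpy.hanning`). The frame length, hop size, sample rate, test tone, and the helper names `hamming`, `hanning`, and `spectrogram` are assumptions chosen for the example and are not taken from the paper; the naive DFT stands in for the FFT a real pipeline would likely use.

```python
import cmath
import math


def hamming(n_samples):
    """Hamming window: 0.54 - 0.46*cos(2*pi*n/(N-1)); the ends stay at 0.08."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def hanning(n_samples):
    """Hanning (Hann) window: 0.5 - 0.5*cos(2*pi*n/(N-1)); the ends reach 0."""
    if n_samples == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def spectrogram(signal, window, hop):
    """Magnitude spectrogram: one row per frame, one column per DFT bin."""
    size = len(window)
    rows = []
    for start in range(0, len(signal) - size + 1, hop):
        # Frame the signal and apply the window sample by sample.
        frame = [signal[start + n] * window[n] for n in range(size)]
        # Naive DFT of the windowed frame, keeping bins 0..size//2 only.
        row = []
        for k in range(size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size)
                      for n in range(size))
            row.append(abs(acc))
        rows.append(row)
    return rows


# Toy input (assumption): a 1 kHz tone sampled at 8 kHz, 64-sample frames
# with 50% overlap. Real speech and the paper's settings would differ.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec_hamming = spectrogram(tone, hamming(64), hop=32)
spec_hanning = spectrogram(tone, hanning(64), hop=32)
```

Each of the two resulting arrays would be the kind of time-frequency image that could be fed to a CNN. The only difference between them is the taper applied to every frame, which is the variable the paper compares.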
Keywords
Speech Emotion Recognition, Hamming, Hanning, CNN
Citation
Teixeira, Felipe L.; Soares, Salviano Pinto; Abreu, J.L. Pio; Oliveira, Paulo M.; Teixeira, João P. (2024). Comparative Analysis of Windows for Speech Emotion Recognition Using CNN. In 3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023). Cham: Springer Nature, Vol. 1, p. 233–248. ISBN 978-3-031-53024-1.