Publication

Automatic Speech Recognition for Portuguese: A Comparative Study

Abstract(s)

This paper compares Automatic Speech Recognition (ASR) services for Portuguese, in a study carried out within the scope of the Safe Cities project. ASR technology has enabled bi-directional voice-driven interfaces, and demand for it in Portuguese is evident given the language’s global prominence. However, transcription is complex: high accuracy depends on capturing speech variability and language intricacies while remaining semantically rigorous. The study first describes the main features of the ASR services/models by Google, Microsoft, Amazon, IBM, and Voice Interaction. Three tests are then proposed to compare them. Test A uses a small dataset of six audio recordings to evaluate the accuracy of online services in terms of word hit rate, with IBM outperforming the others (pt-BR: 93.33%). Tests B and C use the Mozilla Common Voice database, filtered by a set of keywords, to compare online and offline models for Brazilian and European Portuguese in terms of accuracy (Ratcliff-Obershelp algorithm), Word Error Rate, Match Error Rate, Word Information Loss, Character Error Rate, and Response-Request Ratio. Test B highlights the higher accuracy of Google Cloud (pt-PT: 94.90%) and Azure (pt-BR: 98.11%). Test C showcases the potential of Voice Interaction’s real-time application despite its lower accuracy (pt-PT: 78.81%). The tests were carried out with a framework written in Python 3.x, running on a Raspberry Pi 4 Model B together with a desktop server, and using the REST APIs from the companies’ repositories.
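The transcription metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' framework: it assumes the standard definitions of Word Error Rate, Match Error Rate, and Word Information Loss in terms of hits (H), substitutions (S), deletions (D), and insertions (I) from a Levenshtein alignment, computes Character Error Rate with the same alignment at character level, and uses the standard library's `difflib.SequenceMatcher`, whose matching is based on the Ratcliff-Obershelp approach, for the similarity score. The function names `align_counts` and `asr_metrics` are hypothetical, as is the choice to compute similarity over characters rather than words. The Response-Request Ratio is omitted because the abstract does not define it.

```python
from difflib import SequenceMatcher


def align_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) for a Levenshtein
    alignment of two sequences (lists of words or strings of characters)."""
    n, m = len(ref), len(hyp)
    # dist[i][j] holds the edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )
    # Walk back from the bottom-right corner, counting each operation.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            if same:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def asr_metrics(reference, hypothesis):
    """Compute WER, MER, WIL, CER and a Ratcliff-Obershelp-style similarity.
    Assumes plain-text inputs; the reference must contain at least one word."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    h, s, d, i = align_counts(ref_words, hyp_words)
    errors = s + d + i
    wer = errors / (h + s + d)           # errors over reference length
    mer = errors / (h + s + d + i)       # errors over all aligned positions
    # WIL = 1 - H^2 / (N_ref * N_hyp); an empty hypothesis loses everything.
    wil = 1.0 if not hyp_words else 1 - (h * h) / ((h + s + d) * (h + s + i))
    ch, cs, cd, ci = align_counts(reference, hypothesis)
    cer = (cs + cd + ci) / (ch + cs + cd)
    # Character-level similarity (an assumption; autojunk disabled so long
    # transcripts are not affected by difflib's popularity heuristic).
    similarity = SequenceMatcher(None, reference, hypothesis, autojunk=False).ratio()
    return {"wer": wer, "mer": mer, "wil": wil, "cer": cer, "similarity": similarity}


if __name__ == "__main__":
    # Illustrative sentences only; they are not taken from the paper's datasets.
    print(asr_metrics("ligar a luz da sala", "ligar luz da sala"))
```

In practice, transcripts are usually normalized (lowercasing, punctuation removal) before scoring; that step is left out here because the abstract does not say how the study handled it.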

Keywords

Automatic Speech Recognition; Portuguese; Language Model; Transcription; Mozilla Common Voice; ASR accuracy

Citation

Borghi, Pedro Henrique; Teixeira, João Paulo; Freitas, Diamantino Rui (2024). Automatic Speech Recognition for Portuguese: A Comparative Study. In 3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023). Cham: Springer Nature, Vol. 1, p. 217–232. ISBN 978-3-031-53024-1.
