Automatic Speech Recognition for Portuguese: A Comparative Study

Borghi, Pedro Henrique; Teixeira, João Paulo; Freitas, Diamantino Rui

http://hdl.handle.net/10198/30322

Use this identifier to reference this record.


Authors

Borghi, Pedro Henrique

Teixeira, João Paulo

Freitas, Diamantino Rui

Abstract

This paper compares Automatic Speech Recognition (ASR) services for Portuguese, with the comparisons carried out within the scope of the Safe Cities project. ASR technology has enabled bi-directional voice-driven interfaces, and demand for it in Portuguese is evident given the language’s global prominence. However, the transcription process is complex, and high accuracy depends on the ability to capture speech variability and language intricacies while remaining rigorous in terms of semantics. The study first describes the main features of the ASR services/models by Google, Microsoft, Amazon, IBM, and Voice Interaction. To compare them, three tests are proposed. Test A uses a small dataset of six audio recordings to evaluate the accuracy of the online services in terms of word hit rate, with IBM outperforming the others (pt-BR: 93.33%). Tests B and C use the Mozilla Common Voice database, filtered by a set of keywords, to compare online and offline models for Brazilian and European Portuguese in terms of accuracy (Ratcliff-Obershelp algorithm), Word Error Rate, Match Error Rate, Word Information Loss, Character Error Rate, and Response-Request Ratio. Test B highlights the higher accuracy of Google Cloud (pt-PT: 94.90%) and Azure (pt-BR: 98.11%). Test C showcases the potential of Voice Interaction’s real-time application despite its lower accuracy (pt-PT: 78.81%). The tests were carried out using a framework developed in Python 3.x on a Raspberry Pi 4 Model B with a server desktop, using the REST APIs from the companies’ repositories.
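
To make the metrics named in the abstract concrete, the following is a minimal sketch of how a reference transcript and an ASR hypothesis might be scored. It is not the authors' framework: the function names and the example sentences are hypothetical, and Python's standard `difflib.SequenceMatcher` is used as a stand-in for the Ratcliff-Obershelp similarity (its documentation describes it as a closely related variant of that algorithm). Word Error Rate and Character Error Rate are computed here in their usual form, as edit distance divided by the reference length.

```python
from difflib import SequenceMatcher


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def similarity(reference, hypothesis):
    """Similarity ratio in [0, 1] from difflib (Ratcliff-Obershelp style)."""
    return SequenceMatcher(None, reference, hypothesis, autojunk=False).ratio()


# Hypothetical example pair, not taken from the paper's datasets.
reference = "ligar a luz da sala"
hypothesis = "ligar luz da sala"

print(f"WER:        {word_error_rate(reference, hypothesis):.4f}")
print(f"CER:        {character_error_rate(reference, hypothesis):.4f}")
print(f"Similarity: {similarity(reference, hypothesis):.4f}")
```

In this sketch lower WER and CER values mean a better transcription, while the similarity ratio moves in the opposite direction, reaching 1.0 for identical strings. Text normalization (case, punctuation, numerals) is left out and would need to be decided before comparing services.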

Keywords

Automatic Speech Recognition; Portuguese Language Model; Transcription; Mozilla Common Voice; ASR accuracy

URI

http://hdl.handle.net/10198/30322

Citation

Borghi, Pedro Henrique; Teixeira, João Paulo; Freitas, Diamantino Rui (2024). Automatic Speech Recognition for Portuguese: A Comparative Study. In 3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023). Cham: Springer Nature, Vol. 1, p. 217–232. ISBN 978-3-031-53024-1.