Size | Format
---|---
657.5 KB | Adobe PDF
Abstract(s)
This paper presents comparisons, developed within the scope of the Safe Cities
project, of Automatic Speech Recognition (ASR) services for Portuguese. ASR
technology has enabled bi-directional voice-driven interfaces, and demand for it
in Portuguese is evident given the language’s global prominence.
However, transcription is complex: high accuracy depends on capturing speech
variability and linguistic intricacies while remaining semantically rigorous.
The study first describes the main features of the ASR services/models from
Google, Microsoft, Amazon, IBM, and Voice Interaction. Three tests were then
proposed to compare them. Test A uses a small dataset of six audio recordings
to evaluate the accuracy of the online services in terms of word hit rate; IBM
outperformed the others (pt-BR: 93.33%). Tests B and C use the Mozilla Common
Voice database, filtered by a set of keywords, to compare online and offline
models for Brazilian and European Portuguese in terms of accuracy
(Ratcliff-Obershelp algorithm), Word Error Rate, Match Error Rate, Word
Information Loss, Character Error Rate, and Response-Request Ratio. Test B highlights
the higher accuracy of Google Cloud (pt-PT: 94.90%) and Azure (pt-BR:
98.11%). Test C showcases the potential of Voice Interaction’s real-time application
despite its lower accuracy (pt-PT: 78.81%). The tests were carried out with a
framework developed in Python 3.x, running on a Raspberry Pi 4 Model B with a
desktop server, using the REST APIs from the companies’ repositories.
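Tests B and C draw on Mozilla Common Voice filtered by a keyword set. The abstract gives neither the keyword set nor the filtering rule, so the following is only a minimal Python sketch under stated assumptions: it assumes Common Voice's tab-separated metadata files (such as `validated.tsv`) with a `sentence` column, and it keeps a clip when its transcript contains any keyword as a whole, case-insensitive word. The names `load_common_voice_tsv`, `filter_by_keywords` and `PUNCTUATION` are hypothetical helpers, not the authors' code.

```python
from __future__ import annotations

import csv
from typing import Iterable, Iterator

# Characters stripped from word edges before matching (an assumption, not from the paper).
PUNCTUATION = ".,;:!?\"'()«»"


def load_common_voice_tsv(path: str) -> list[dict[str, str]]:
    """Read a Common Voice metadata TSV (e.g. validated.tsv) into a list of dicts."""
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: quote characters inside transcripts are text, not CSV quoting.
        return list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


def filter_by_keywords(
    rows: Iterable[dict[str, str]], keywords: Iterable[str]
) -> Iterator[dict[str, str]]:
    """Yield rows whose 'sentence' contains at least one keyword as a whole word."""
    wanted = {k.lower() for k in keywords}
    for row in rows:
        # Tokenize on whitespace, trim punctuation, and compare case-insensitively.
        words = {w.strip(PUNCTUATION).lower() for w in row["sentence"].split()}
        if words & wanted:
            yield row
```

Usage would look like `list(filter_by_keywords(load_common_voice_tsv(path), keywords))` with a hypothetical keyword set such as `{"socorro", "polícia", "incêndio"}`. Whole-word matching avoids counting a keyword that only appears inside a longer word; substring matching would be the looser alternative.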
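Accuracy in Tests B and C is computed with the Ratcliff-Obershelp algorithm. Python's standard `difflib.SequenceMatcher` is documented as a refinement of Ratcliff and Obershelp's "gestalt pattern matching", and its `ratio()` returns 2M/T, where M is the number of matched elements and T is the total number of elements in both sequences. The sketch below is an illustration under assumptions: the abstract does not say whether the authors compared characters or words, or whether transcripts were normalized first, so `ro_accuracy` (a hypothetical helper) supports both granularities and applies no normalization.

```python
from difflib import SequenceMatcher


def ro_accuracy(reference: str, hypothesis: str, by_words: bool = False) -> float:
    """Ratcliff/Obershelp-style similarity in [0, 1] between two transcripts.

    SequenceMatcher.ratio() returns 2*M/T, with M matched elements and T the
    total number of elements in both sequences.
    """
    # Compare characters by default, or whitespace-separated words if requested.
    a = reference.split() if by_words else reference
    b = hypothesis.split() if by_words else hypothesis
    # autojunk=False disables difflib's "popular element" heuristic, which
    # otherwise applies to sequences of 200 or more items.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def ro_accuracy_percent(reference: str, hypothesis: str, by_words: bool = False) -> float:
    """The same score expressed as a percentage, as in the reported results."""
    return 100.0 * ro_accuracy(reference, hypothesis, by_words)
```

For example, `ro_accuracy("casa", "caso")` is 0.75: the matcher finds the block "cas", so M = 3 and T = 8.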
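Word Error Rate, Match Error Rate, Word Information Loss, Character Error Rate and Test A's word hit rate can all be derived from a minimum-edit-distance alignment of a hypothesis against its reference, which yields hits H, substitutions S, deletions D and insertions I. The sketch below assumes the common definitions WER = (S+D+I)/(H+S+D), MER = (S+D+I)/(H+S+D+I), WIL = 1 - H²/((H+S+D)(H+S+I)), CER as the same error ratio computed over characters, and word hit rate as H/(H+S+D). These formulas, whitespace tokenization, the absence of text normalization, and breaking ties between equally short alignments in favor of more hits are all assumptions; the abstract does not state the paper's exact definitions. `align` and `asr_metrics` are hypothetical helpers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class Counts:
    hits: int
    substitutions: int
    deletions: int
    insertions: int


def align(ref: Sequence[str], hyp: Sequence[str]) -> Counts:
    """Count H, S, D, I from a minimum-edit-distance alignment of hyp to ref.

    Each DP cell holds (errors, -hits, S, D, I); taking the tuple minimum picks
    the fewest errors and, among equally short alignments, the most hits.
    """
    # Row for ref[:0]: reaching hyp[:j] costs j insertions.
    prev = [(j, 0, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        # Column for hyp[:0]: reaching ref[:i] costs i deletions.
        curr = [(i, 0, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            e, nh, s, d, n_ins = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, nh - 1, s, d, n_ins)  # hit
            else:
                diag = (e + 1, nh, s + 1, d, n_ins)  # substitution
            e, nh, s, d, n_ins = prev[j]
            up = (e + 1, nh, s, d + 1, n_ins)  # deletion of ref[i - 1]
            e, nh, s, d, n_ins = curr[j - 1]
            left = (e + 1, nh, s, d, n_ins + 1)  # insertion of hyp[j - 1]
            curr.append(min(diag, up, left))
        prev = curr
    _, neg_hits, s, d, n_ins = prev[-1]
    return Counts(-neg_hits, s, d, n_ins)


def asr_metrics(reference: str, hypothesis: str) -> dict[str, float]:
    """Word-level WER, MER, WIL and hit rate plus character-level CER.

    The reference must contain at least one word.
    """
    w = align(reference.split(), hypothesis.split())
    h, s, d, i = w.hits, w.substitutions, w.deletions, w.insertions
    n_ref, n_hyp = h + s + d, h + s + i  # word counts of reference and hypothesis
    errors = s + d + i
    c = align(reference, hypothesis)  # strings are sequences of characters
    return {
        "wer": errors / n_ref,
        "mer": errors / (h + s + d + i),
        "wil": 1 - (h * h) / (n_ref * n_hyp) if n_hyp else 1.0,
        "hit_rate": h / n_ref,
        "cer": (c.substitutions + c.deletions + c.insertions) / len(reference),
    }
```

For instance, `asr_metrics("casa", "caso")` scores one substituted word (WER = MER = WIL = 1.0) and one substituted character out of four (CER = 0.25). Reporting MER and WIL alongside WER is useful because WER can exceed 1 when a service inserts many words, whereas MER and WIL stay within [0, 1].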
Keywords
Automatic Speech Recognition; Portuguese Language Model; Transcription; Mozilla Common Voice; ASR accuracy
Citation
Borghi, Pedro Henrique; Teixeira, João Paulo; Freitas, Diamantino Rui (2024). Automatic Speech Recognition for Portuguese: A Comparative Study. In 3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023). Cham: Springer Nature, Vol. 1, p. 217–232. ISBN 978-3-031-53024-1.
Publisher
Springer Nature