Publication
Automatic Speech Recognition for Portuguese: A Comparative Study
dc.contributor.author | Borghi, Pedro Henrique | |
dc.contributor.author | Teixeira, João Paulo | |
dc.contributor.author | Freitas, Diamantino Rui | |
dc.date.accessioned | 2024-10-07T11:34:10Z | |
dc.date.available | 2024-10-07T11:34:10Z | |
dc.date.issued | 2024 | |
dc.description.abstract | This paper compares Automatic Speech Recognition (ASR) services for Portuguese, in work carried out within the scope of the Safe Cities project. ASR technology has enabled bi-directional voice-driven interfaces, and its demand in Portuguese is evident due to the language’s global prominence. However, the transcription process is complex, and high accuracy depends on the ability to capture speech variability and language intricacies while remaining rigorous in terms of semantics. The study first describes the main features of the ASR services/models by Google, Microsoft, Amazon, IBM, and Voice Interaction. To compare them, three tests are proposed. Test A uses a small dataset of six audio recordings to evaluate the accuracy of online services in terms of word hit rate, with IBM outperforming the others (pt-BR: 93.33%). Tests B and C use the Mozilla Common Voice database, filtered by a set of keywords, to compare online and offline models for Brazilian and European Portuguese regarding accuracy (Ratcliff-Obershelp algorithm), Word Error Rate, Match Error Rate, Word Information Loss, Character Error Rate and Response-Request Ratio. Test B highlights the higher accuracy of Google Cloud (pt-PT: 94.90%) and Azure (pt-BR: 98.11%). Test C showcases the potential of Voice Interaction’s real-time application despite its lower accuracy (pt-PT: 78.81%). The tests were carried out with a framework developed in Python 3.x, running on a Raspberry Pi 4 model B with a server desktop and using the REST APIs from the companies’ repositories. | pt_PT |
dc.description.sponsorship | The authors are grateful to the Foundation for Science and Technology (FCT, Portugal) for financial support through national and community funds (FSE), in the form of a doctoral scholarship with reference 2022.12371.BD. The authors are also grateful to the Safe Cities – Innovation for Building Urban Safety project for financial support in the form of a research grant with reference POCI-01-0247-FEDER-041435. The authors are also grateful to the Foundation for Science and Technology (FCT, Portugal) for financial support through national funds FCT/MCTES (PIDDAC) to CeDRI (UIDB/05757/2020 and UIDP/05757/2020) and SusTEC (LA/P/0007/2021). | pt_PT |
dc.description.version | info:eu-repo/semantics/publishedVersion | pt_PT |
dc.identifier.citation | Borghi, Pedro Henrique; Teixeira, João Paulo; Freitas, Diamantino Rui (2024). Automatic Speech Recognition for Portuguese: A Comparative Study. In 3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023). Cham: Springer Nature, Vol. 1, p. 217–232. ISBN 978-3-031-53024-1. | pt_PT |
dc.identifier.doi | 10.1007/978-3-031-53025-8_16 | pt_PT |
dc.identifier.isbn | 978-3-031-53024-1 | |
dc.identifier.isbn | 978-3-031-53025-8 | |
dc.identifier.uri | http://hdl.handle.net/10198/30322 | |
dc.language.iso | eng | pt_PT |
dc.peerreviewed | yes | pt_PT |
dc.publisher | Springer Nature | pt_PT |
dc.relation | Sistema de Auxílio ao Diagnóstico Médico para Anormalidades Cardíacas Baseado em Deep Learning e Transformada Wavelet | |
dc.relation | Research Centre in Digitalization and Intelligent Robotics | |
dc.relation | Research Centre in Digitalization and Intelligent Robotics | |
dc.relation | Associate Laboratory for Sustainability and Tecnology in Mountain Regions | |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | pt_PT |
dc.subject | Automatic Speech Recognition | pt_PT |
dc.subject | Portuguese | pt_PT |
dc.subject | Language Model | pt_PT |
dc.subject | Transcription | pt_PT |
dc.subject | Mozilla Common Voice | pt_PT |
dc.subject | ASR accuracy | pt_PT |
dc.title | Automatic Speech Recognition for Portuguese: A Comparative Study | pt_PT |
dc.type | conference paper | |
dspace.entity.type | Publication | |
oaire.awardTitle | Sistema de Auxílio ao Diagnóstico Médico para Anormalidades Cardíacas Baseado em Deep Learning e Transformada Wavelet | |
oaire.awardTitle | Research Centre in Digitalization and Intelligent Robotics | |
oaire.awardTitle | Research Centre in Digitalization and Intelligent Robotics | |
oaire.awardTitle | Associate Laboratory for Sustainability and Tecnology in Mountain Regions | |
oaire.awardURI | info:eu-repo/grantAgreement/FCT//2022.12371.BD/PT | |
oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F05757%2F2020/PT | |
oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F05757%2F2020/PT | |
oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/LA%2FP%2F0007%2F2020/PT | |
oaire.citation.endPage | 232 | pt_PT |
oaire.citation.startPage | 217 | pt_PT |
oaire.citation.title | 3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023) | pt_PT |
oaire.fundingStream | 6817 - DCRRNI ID | |
oaire.fundingStream | 6817 - DCRRNI ID | |
oaire.fundingStream | 6817 - DCRRNI ID | |
person.familyName | Teixeira | |
person.givenName | João Paulo | |
person.identifier | 663194 | |
person.identifier.ciencia-id | 4F15-B322-59B4 | |
person.identifier.orcid | 0000-0002-6679-5702 | |
person.identifier.rid | N-6576-2013 | |
person.identifier.scopus-author-id | 57069567500 | |
project.funder.identifier | http://doi.org/10.13039/501100001871 | |
project.funder.identifier | http://doi.org/10.13039/501100001871 | |
project.funder.identifier | http://doi.org/10.13039/501100001871 | |
project.funder.identifier | http://doi.org/10.13039/501100001871 | |
project.funder.name | Fundação para a Ciência e a Tecnologia | |
project.funder.name | Fundação para a Ciência e a Tecnologia | |
project.funder.name | Fundação para a Ciência e a Tecnologia | |
project.funder.name | Fundação para a Ciência e a Tecnologia | |
rcaap.rights | restrictedAccess | pt_PT |
rcaap.type | conferenceObject | pt_PT |
relation.isAuthorOfPublication | 33f4af65-7ddf-46f0-8b44-a7470a8ba2bf | |
relation.isAuthorOfPublication.latestForDiscovery | 33f4af65-7ddf-46f0-8b44-a7470a8ba2bf | |
relation.isProjectOfPublication | 577a7dcb-afa2-4cac-86cb-b9e9d0aa4479 | |
relation.isProjectOfPublication | 6e01ddc8-6a82-4131-bca6-84789fa234bd | |
relation.isProjectOfPublication | d0a17270-80a8-4985-9644-a04c2a9f2dff | |
relation.isProjectOfPublication | 6255046e-bc79-4b82-8884-8b52074b4384 | |
relation.isProjectOfPublication.latestForDiscovery | d0a17270-80a8-4985-9644-a04c2a9f2dff |
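
Note on the metrics named in the abstract: the record does not include the authors' code, so the following is only a minimal sketch, under stated assumptions, of how three of the listed measures are commonly computed. Word Error Rate and Character Error Rate are taken here as Levenshtein edit distance divided by the reference length (in words and characters respectively). The similarity score uses Python's difflib.SequenceMatcher, which the standard library documents as a variant of Ratcliff-Obershelp gestalt pattern matching; whether the paper used this exact routine is an assumption. All function names and the example phrase are hypothetical.

```python
from difflib import SequenceMatcher


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """CER: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def similarity(reference, hypothesis):
    """Gestalt-matching ratio, 2*M/T, as computed by difflib (assumed stand-in)."""
    return SequenceMatcher(None, reference, hypothesis).ratio()


if __name__ == "__main__":
    # Hypothetical reference/transcription pair, for illustration only.
    ref = "ligar a luz da sala"
    hyp = "ligar luz da sala"
    print(word_error_rate(ref, hyp))
    print(char_error_rate(ref, hyp))
    print(similarity(ref, hyp))
```

Match Error Rate and Word Information Loss additionally need the separate counts of hits, substitutions, deletions and insertions from the alignment, so they are left out of this sketch.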