Automatic Speech Recognition for Portuguese: A Comparative Study

Borghi, Pedro Henrique; Teixeira, João Paulo; Freitas, Diamantino Rui

Publication

2024 Conference paper

dc.contributor.author	Borghi, Pedro Henrique
dc.contributor.author	Teixeira, João Paulo
dc.contributor.author	Freitas, Diamantino Rui
dc.date.accessioned	2024-10-07T11:34:10Z
dc.date.available	2024-10-07T11:34:10Z
dc.date.issued	2024
dc.description.abstract	This paper compares Automatic Speech Recognition (ASR) services for Portuguese, in work carried out within the scope of the Safe Cities project. ASR technology has enabled bi-directional voice-driven interfaces, and its demand in Portuguese is evident due to the language’s global prominence. However, transcription is complex, and high accuracy depends on the ability to capture speech variability and language intricacies while remaining semantically rigorous. The study first describes the main features of the ASR services/models by Google, Microsoft, Amazon, IBM, and Voice Interaction. To compare them, three tests were proposed. Test A uses a small dataset of six audio recordings to evaluate the accuracy of online services in terms of word hit rate, with IBM outperforming the others (pt-BR: 93.33%). Tests B and C use the Mozilla Common Voice database, filtered by a set of keywords, to compare online and offline models for Brazilian and European Portuguese regarding accuracy (Ratcliff-Obershelp algorithm), Word Error Rate, Match Error Rate, Word Information Loss, Character Error Rate and Response-Request Ratio. Test B highlights the higher accuracy of Google Cloud (pt-PT: 94.90%) and Azure (pt-BR: 98.11%). Test C showcases the potential of Voice Interaction’s real-time application despite its lower accuracy (pt-PT: 78.81%). The tests were carried out with a framework developed in Python 3.x on a Raspberry Pi 4 Model B with a desktop server, using the REST APIs from the companies’ repositories.	pt_PT
dc.description.sponsorship	The authors are grateful to the Foundation for Science and Technology (FCT, Portugal) for financial support through national and community funds (FSE), in the form of a doctoral scholarship with reference 2022.12371.BD. The authors are also grateful to the Safe Cities – Innovation for Building Urban Safety project for financial support in the form of a research grant with reference POCI-01-0247-FEDER-041435. The authors are also grateful to the Foundation for Science and Technology (FCT, Portugal) for financial support through national funds FCT/MCTES (PIDDAC) to CeDRI (UIDB/05757/2020 and UIDP/05757/2020) and SusTEC (LA/P/0007/2021).	pt_PT
dc.description.version	info:eu-repo/semantics/publishedVersion	pt_PT
dc.identifier.citation	Borghi, Pedro Henrique; Teixeira, João Paulo; Freitas, Diamantino Rui (2024). Automatic Speech Recognition for Portuguese: A Comparative Study. In 3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023). Cham: Springer Nature, Vol. 1, p. 217–232. ISBN 978-3-031-53024-1.	pt_PT
dc.identifier.doi	10.1007/978-3-031-53025-8_16	pt_PT
dc.identifier.isbn	978-3-031-53024-1
dc.identifier.isbn	978-3-031-53025-8
dc.identifier.uri	http://hdl.handle.net/10198/30322
dc.language.iso	eng	pt_PT
dc.peerreviewed	yes	pt_PT
dc.publisher	Springer Nature	pt_PT
dc.relation	Sistema de Auxílio ao Diagnóstico Médico para Anormalidades Cardíacas Baseado em Deep Learning e Transformada Wavelet
dc.relation	Research Centre in Digitalization and Intelligent Robotics
dc.relation	Research Centre in Digitalization and Intelligent Robotics
dc.relation	Associate Laboratory for Sustainability and Technology in Mountain Regions
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	pt_PT
dc.subject	Automatic Speech Recognition	pt_PT
dc.subject	Portuguese	pt_PT
dc.subject	Language Model	pt_PT
dc.subject	Transcription	pt_PT
dc.subject	Mozilla Common Voice	pt_PT
dc.subject	ASR accuracy	pt_PT
dc.title	Automatic Speech Recognition for Portuguese: A Comparative Study	pt_PT
dc.type	conference paper
dspace.entity.type	Publication
oaire.awardTitle	Sistema de Auxílio ao Diagnóstico Médico para Anormalidades Cardíacas Baseado em Deep Learning e Transformada Wavelet
oaire.awardTitle	Research Centre in Digitalization and Intelligent Robotics
oaire.awardTitle	Research Centre in Digitalization and Intelligent Robotics
oaire.awardTitle	Associate Laboratory for Sustainability and Technology in Mountain Regions
oaire.awardURI	info:eu-repo/grantAgreement/FCT//2022.12371.BD/PT
oaire.awardURI	info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F05757%2F2020/PT
oaire.awardURI	info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F05757%2F2020/PT
oaire.awardURI	info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/LA%2FP%2F0007%2F2020/PT
oaire.citation.endPage	232	pt_PT
oaire.citation.startPage	217	pt_PT
oaire.citation.title	3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023)	pt_PT
oaire.fundingStream	6817 - DCRRNI ID
oaire.fundingStream	6817 - DCRRNI ID
oaire.fundingStream	6817 - DCRRNI ID
person.familyName	Teixeira
person.givenName	João Paulo
person.identifier	663194
person.identifier.ciencia-id	4F15-B322-59B4
person.identifier.orcid	0000-0002-6679-5702
person.identifier.rid	N-6576-2013
person.identifier.scopus-author-id	57069567500
project.funder.identifier	http://doi.org/10.13039/501100001871
project.funder.identifier	http://doi.org/10.13039/501100001871
project.funder.identifier	http://doi.org/10.13039/501100001871
project.funder.identifier	http://doi.org/10.13039/501100001871
project.funder.name	Fundação para a Ciência e a Tecnologia
project.funder.name	Fundação para a Ciência e a Tecnologia
project.funder.name	Fundação para a Ciência e a Tecnologia
project.funder.name	Fundação para a Ciência e a Tecnologia
rcaap.rights	restrictedAccess	pt_PT
rcaap.type	conferenceObject	pt_PT
relation.isAuthorOfPublication	33f4af65-7ddf-46f0-8b44-a7470a8ba2bf
relation.isAuthorOfPublication.latestForDiscovery	33f4af65-7ddf-46f0-8b44-a7470a8ba2bf
relation.isProjectOfPublication	577a7dcb-afa2-4cac-86cb-b9e9d0aa4479
relation.isProjectOfPublication	6e01ddc8-6a82-4131-bca6-84789fa234bd
relation.isProjectOfPublication	d0a17270-80a8-4985-9644-a04c2a9f2dff
relation.isProjectOfPublication	6255046e-bc79-4b82-8884-8b52074b4384
relation.isProjectOfPublication.latestForDiscovery	d0a17270-80a8-4985-9644-a04c2a9f2dff
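
The abstract names several transcription metrics without defining them. As a rough illustration only, the sketch below computes three of them (Word Error Rate, Character Error Rate, and a Ratcliff-Obershelp-style similarity) for one reference/hypothesis pair. It is not the authors' framework: the function names and the example sentences are hypothetical, and Python's `difflib.SequenceMatcher` is used as a stand-in because it is based on the Ratcliff-Obershelp "gestalt pattern matching" idea, which may differ in detail from the implementation used in the paper.

```python
from difflib import SequenceMatcher


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Same ratio as WER, computed over characters instead of words."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def similarity(reference, hypothesis):
    """Ratcliff-Obershelp-style similarity in [0, 1] via difflib (assumption)."""
    return SequenceMatcher(None, reference, hypothesis, autojunk=False).ratio()


if __name__ == "__main__":
    # Hypothetical example pair, not taken from the paper's dataset.
    reference = "ligar a luz da sala"
    hypothesis = "ligar luz da sala"
    print("WER:", word_error_rate(reference, hypothesis))
    print("CER:", character_error_rate(reference, hypothesis))
    print("Similarity:", similarity(reference, hypothesis))
```

In this example one of the five reference words is missing from the hypothesis, so the WER is 1/5. Match Error Rate and Word Information Loss additionally need the count of correctly recognized words from the alignment, so they are left out of this sketch.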

Files

Main

Name: Automatic Speech.pdf
Size: 657.5 KB
Format: Adobe Portable Document Format

Collections

ESTiG - Publicações em Proceedings Indexadas à WoS/Scopus