Publication

Automatic Speech Recognition for Portuguese: A Comparative Study

dc.contributor.authorBorghi, Pedro Henrique
dc.contributor.authorTeixeira, João Paulo
dc.contributor.authorFreitas, Diamantino Rui
dc.date.accessioned2024-10-07T11:34:10Z
dc.date.available2024-10-07T11:34:10Z
dc.date.issued2024
dc.description.abstractThis paper compares Automatic Speech Recognition (ASR) services for Portuguese; the comparisons were developed within the scope of the Safe Cities project. ASR technology has enabled bi-directional voice-driven interfaces, and its demand in Portuguese is evident due to the language’s global prominence. However, the transcription process is complex, and high accuracy depends on the ability to capture speech variability and language intricacies while remaining rigorous in terms of semantics. The study first describes the main features of the ASR services/models by Google, Microsoft, Amazon, IBM, and Voice Interaction. To compare them, three tests were proposed. Test A uses a small dataset of six audio recordings to evaluate the accuracy of online services in terms of word hit rate, with IBM outperforming the others (pt-BR: 93.33%). Tests B and C use the Mozilla Common Voice database, filtered by a keyword set, to compare online and offline models for Brazilian and European Portuguese regarding accuracy (Ratcliff-Obershelp algorithm), Word Error Rate, Match Error Rate, Word Information Loss, Character Error Rate and Response-Request Ratio. Test B highlights the higher accuracy of Google Cloud (pt-PT: 94.90%) and Azure (pt-BR: 98.11%). Test C showcases the potential of Voice Interaction’s real-time application despite its lower accuracy (pt-PT: 78.81%). The tests were carried out with a framework developed in Python 3.x on a Raspberry Pi 4 Model B with a server desktop and the REST APIs from the companies’ repositories.pt_PT
dc.description.sponsorshipThe authors are grateful to the Foundation for Science and Technology (FCT, Portugal) for financial support through national and community funds (FSE), in the form of a doctoral scholarship with reference 2022.12371.BD. The authors are also grateful to the Safe Cities – Innovation for Building Urban Safety project for financial support in the form of a research grant with reference POCI-01-0247-FEDER-041435. The authors are also grateful to the Foundation for Science and Technology (FCT, Portugal) for financial support through national funds FCT/MCTES (PIDDAC) to CeDRI (UIDB/05757/2020 and UIDP/05757/2020) and SusTEC (LA/P/0007/2021).pt_PT
dc.description.versioninfo:eu-repo/semantics/publishedVersionpt_PT
dc.identifier.citationBorghi, Pedro Henrique; Teixeira, João Paulo; Freitas, Diamantino Rui (2024). Automatic Speech Recognition for Portuguese: A Comparative Study. In 3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023). Cham: Springer Nature, Vol. 1, p. 217–232. ISBN 978-3-031-53024-1.pt_PT
dc.identifier.doi10.1007/978-3-031-53025-8_16pt_PT
dc.identifier.isbn978-3-031-53024-1
dc.identifier.isbn978-3-031-53025-8
dc.identifier.urihttp://hdl.handle.net/10198/30322
dc.language.isoengpt_PT
dc.peerreviewedyespt_PT
dc.publisherSpringer Naturept_PT
dc.relationSistema de Auxílio ao Diagnóstico Médico para Anormalidades Cardíacas Baseado em Deep Learning e Transformada Wavelet
dc.relationResearch Centre in Digitalization and Intelligent Robotics
dc.relationResearch Centre in Digitalization and Intelligent Robotics
dc.relationAssociate Laboratory for Sustainability and Technology in Mountain Regions
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/pt_PT
dc.subjectAutomatic Speech Recognitionpt_PT
dc.subjectPortuguesept_PT
dc.subjectLanguage Modelpt_PT
dc.subjectTranscriptionpt_PT
dc.subjectMozilla Common Voicept_PT
dc.subjectASR accuracypt_PT
dc.titleAutomatic Speech Recognition for Portuguese: A Comparative Studypt_PT
dc.typeconference paper
dspace.entity.typePublication
oaire.awardTitleSistema de Auxílio ao Diagnóstico Médico para Anormalidades Cardíacas Baseado em Deep Learning e Transformada Wavelet
oaire.awardTitleResearch Centre in Digitalization and Intelligent Robotics
oaire.awardTitleResearch Centre in Digitalization and Intelligent Robotics
oaire.awardTitleAssociate Laboratory for Sustainability and Technology in Mountain Regions
oaire.awardURIinfo:eu-repo/grantAgreement/FCT//2022.12371.BD/PT
oaire.awardURIinfo:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F05757%2F2020/PT
oaire.awardURIinfo:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F05757%2F2020/PT
oaire.awardURIinfo:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/LA%2FP%2F0007%2F2020/PT
oaire.citation.endPage232pt_PT
oaire.citation.startPage217pt_PT
oaire.citation.title3rd International Conference on Optimization, Learning Algorithms and Applications (OL2A 2023)pt_PT
oaire.fundingStream6817 - DCRRNI ID
oaire.fundingStream6817 - DCRRNI ID
oaire.fundingStream6817 - DCRRNI ID
person.familyNameTeixeira
person.givenNameJoão Paulo
person.identifier663194
person.identifier.ciencia-id4F15-B322-59B4
person.identifier.orcid0000-0002-6679-5702
person.identifier.ridN-6576-2013
person.identifier.scopus-author-id57069567500
project.funder.identifierhttp://doi.org/10.13039/501100001871
project.funder.identifierhttp://doi.org/10.13039/501100001871
project.funder.identifierhttp://doi.org/10.13039/501100001871
project.funder.identifierhttp://doi.org/10.13039/501100001871
project.funder.nameFundação para a Ciência e a Tecnologia
project.funder.nameFundação para a Ciência e a Tecnologia
project.funder.nameFundação para a Ciência e a Tecnologia
project.funder.nameFundação para a Ciência e a Tecnologia
rcaap.rightsrestrictedAccesspt_PT
rcaap.typeconferenceObjectpt_PT
relation.isAuthorOfPublication33f4af65-7ddf-46f0-8b44-a7470a8ba2bf
relation.isAuthorOfPublication.latestForDiscovery33f4af65-7ddf-46f0-8b44-a7470a8ba2bf
relation.isProjectOfPublication577a7dcb-afa2-4cac-86cb-b9e9d0aa4479
relation.isProjectOfPublication6e01ddc8-6a82-4131-bca6-84789fa234bd
relation.isProjectOfPublicationd0a17270-80a8-4985-9644-a04c2a9f2dff
relation.isProjectOfPublication6255046e-bc79-4b82-8884-8b52074b4384
relation.isProjectOfPublication.latestForDiscoveryd0a17270-80a8-4985-9644-a04c2a9f2dff
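
Note on the metrics named in the abstract: the sketch below illustrates how Word Error Rate, Match Error Rate, Word Information Loss, Character Error Rate and a Ratcliff-Obershelp-style similarity could be computed for one reference/transcription pair. It is not the authors' framework. The function names `align_counts` and `asr_metrics` and the example sentences are hypothetical, the error-rate formulas are the commonly used definitions based on hit, substitution, deletion and insertion counts, and `difflib.SequenceMatcher.ratio()` is used as a stand-in for the Ratcliff-Obershelp score (Python's documentation describes its algorithm as closely related, not identical).

```python
from difflib import SequenceMatcher


def align_counts(ref, hyp):
    """Count (hits, substitutions, deletions, insertions) along one
    minimum-cost Levenshtein alignment of two token sequences."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner, classifying each step.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1  # reference token missing from the hypothesis
            i -= 1
        else:
            ins += 1   # extra token in the hypothesis
            j -= 1
    return hits, subs, dels, ins


def asr_metrics(reference, hypothesis):
    """Return a dict of metrics for one non-empty reference/hypothesis pair."""
    if not reference.split() or not hypothesis.split():
        raise ValueError("reference and hypothesis must both contain words")
    h, s, d, i = align_counts(reference.split(), hypothesis.split())
    _, cs, cd, ci = align_counts(list(reference), list(hypothesis))
    return {
        # Stand-in for the Ratcliff-Obershelp similarity (assumption).
        "similarity": SequenceMatcher(None, reference, hypothesis).ratio(),
        "wer": (s + d + i) / (h + s + d),
        "mer": (s + d + i) / (h + s + d + i),
        "wil": 1 - (h * h) / ((h + s + d) * (h + s + i)),
        "cer": (cs + cd + ci) / len(reference),
    }


if __name__ == "__main__":
    # Hypothetical pair: the recognizer drops the word "a".
    for name, value in asr_metrics("ligar a luz da sala",
                                   "ligar luz da sala").items():
        print(f"{name}: {value:.4f}")
```

In practice, text normalization (case, punctuation, numerals) applied before scoring changes all of these values, so any comparison between services should apply the same normalization to every transcription.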

Files

Original bundle
Name: Automatic Speech.pdf
Size: 657.5 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.75 KB
Format: Item-specific license agreed upon to submission