Browsing by Author "Manfron, Enrico"
- Deep Learning and Machine Learning Techniques Applied to Speaker Identification on Small Datasets
  Publication. Manfron, Enrico; Teixeira, João Paulo; Minetto, Rodrigo
  In this study, we explore the capabilities of speaker recognition technology for biometric authentication, with the aim of developing speaker recognition-based access control systems and providing a resource for future research on secure and efficient speaker identification solutions. We focused on developing and evaluating machine learning and deep learning models for speaker identification. The models were trained and tested on private datasets with 32 speakers and public datasets with 1251 to 6112 speakers. The Gaussian Mixture Model performed well on our private datasets, identifying speakers with 93.10% and 95% accuracy. The Multilayer Perceptron achieved a peak accuracy of 93.33% on the Framed Trim private dataset. The VGG-M model, after initial training on larger datasets, achieved accuracies of 90.34% and 98.33% on our private datasets. Finally, the ResNet50 model slightly outperformed the other models on two versions of our private dataset, achieving accuracies of 97.93% and 100%.
- Speaker recognition for door opening systems
  Publication. Manfron, Enrico; Teixeira, João Paulo; Minetto, Rodrigo
  Besides being an important communication tool, the voice can also serve for identification purposes, since it carries an individual signature for each person. Speaker recognition technologies can use this signature as an authentication method for access to environments. This work explores the development and testing of machine learning and deep learning models, specifically the GMM, VGG-M, and ResNet50 models, for speaker recognition-based access control, with the goal of building a system that grants access to CeDRI's laboratory. The deep learning models were evaluated on their performance in recognizing speakers from audio samples, with emphasis on the Equal Error Rate (EER) metric. The models were first trained and tested on public datasets with 1251 to 6112 speakers and then fine-tuned on private datasets with 32 speakers from CeDRI's laboratory. We compared the performance of the ResNet50, VGG-M, and GMM models for speaker verification. In experiments on our private datasets, the ResNet50 model outperformed the other models, achieving the lowest EER of 0.7% on the Framed Silence Removed dataset. On the same dataset, the VGG-M model achieved an EER of 5% and the GMM model achieved an EER of 2.13%. Our best model did not reach the current state-of-the-art EER of 2.87% on the VoxCeleb 1 verification dataset. However, our best ResNet50 implementation achieved an EER of 5.96% while being trained on only a small fraction of the data typically used. This result indicates that our model is robust and efficient and still has a significant margin for improvement. This thesis provides insights into the capabilities of these models in a real-world application, aiming to deploy the system on a platform for practical use in laboratory access authorization. The results of this study contribute to the field of biometric security by demonstrating the potential of speaker recognition systems in controlled environments.
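Since the second abstract reports its results as Equal Error Rates, a minimal sketch of how an EER can be computed from verification scores may help in reading those numbers. This is not the authors' implementation: the function name `equal_error_rate`, the score arrays, and the convention that a trial is accepted when its score is at or above the threshold are all assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER from similarity scores (hypothetical helper).

    Assumes higher scores mean "more likely the same speaker" and that
    a trial is accepted when its score is >= the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scoring at or above it.
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # The EER lies where the two error rates are closest to equal.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

if __name__ == "__main__":
    # Toy scores invented for this example, not data from the study.
    eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4],
                                      [0.6, 0.3, 0.2, 0.1])
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER means the false acceptance and false rejection rates balance at a lower value, which is why the abstract treats 0.7% as better than 2.13% or 5%.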
