Search Results
Now showing 1 - 10 of 15
- Deep-learning in identification of vocal pathologies. Publication. Teixeira, Felipe; Teixeira, João Paulo. The work addresses a classification problem with four classes of vocal pathologies using a Deep Neural Network. Three groups of features extracted from the speech of subjects with Dysphonia, Vocal Fold Paralysis, Laryngitis Chronica and controls were tested. The best group of features is related to the voice source: relative jitter, relative shimmer, and HNR. A Deep Neural Network architecture with two levels was tested: the first level consists of 7 estimators and the second level of a decision maker. At the second level of the Deep Neural Network, an accuracy of 39.5% is reached for the diagnosis among the 4 classes under analysis (see the stacking sketch after this list).
- A narrative review of speech and EEG features for schizophrenia detection: progress and challenges. Publication. Teixeira, Felipe; Costa, Miguel Rocha; Abreu, J.L. Pio; Cabral, Manuel; Soares, Salviano; Teixeira, João Paulo. Schizophrenia is a mental illness that affects an estimated 21 million people worldwide. The literature establishes that electroencephalography (EEG) is a well-implemented means of studying and diagnosing mental disorders. However, it is known that speech and language provide unique and essential information about human thought. Semantic and emotional content, semantic coherence, syntactic structure, and complexity can thus be combined in a machine learning process to detect schizophrenia. Several studies show that early identification is crucial to prevent the onset of illness or mitigate possible complications. Therefore, it is necessary to identify disease-specific biomarkers for an early diagnosis support system. This work contributes to improving our knowledge about schizophrenia and the features that can identify this mental illness via speech and EEG. The emotional state is a specific characteristic of schizophrenia that can be identified with speech emotion analysis. The speech features most used in the literature reviewed are the fundamental frequency (F0), intensity/loudness (I), frequency formants (F1, F2, and F3), Mel-frequency cepstral coefficients (MFCCs), the duration of pauses and sentences (SD), and the duration of silence between words. Combining at least two feature categories achieved high accuracy in schizophrenia classification; prosodic and spectral or temporal features achieved the highest accuracy. The work with the highest accuracy used the prosodic and spectral features QEVA, SDVV, and SSDL, which were derived from F0 and the spectrogram. The emotional state can be identified with most of the features previously mentioned (F0, I, F1, F2, F3, MFCCs, and SD), linear prediction cepstral coefficients (LPCC), linear spectral features (LSF), and the pause rate. Using event-related potentials (ERP), the most promising features found in the literature are mismatch negativity (MMN), P2, P3, P50, N1, and N2. The EEG features with the highest accuracy in classifying schizophrenia subjects are the nonlinear features, such as Cx, HFD, and Lya.
- Parameters for vocal acoustic analysis - cured database. Publication. Fernandes, Joana Filipa Teixeira; Silva, Letícia; Teixeira, Felipe; Guedes, Victor; Santos, Juliana Hermsdorf; Teixeira, João Paulo. This paper describes the construction and organization of a database of speech parameters extracted from a speech database. The article intends to inform the community about the existence of this database for future research. The database includes parameters extracted from sounds produced by patients distributed among 19 diseases and control subjects. The set of parameters consists of jitter, shimmer, Harmonic to Noise Ratio (HNR), Noise to Harmonic Ratio (NHR), autocorrelation and Mel Frequency Cepstral Coefficients (MFCC) extracted from the sound of the sustained vowels /a/, /i/ and /u/ at high, low and normal tones, and from a short German sentence. The cured database has a total of 707 pathological subjects (distributed across the various diseases) and 194 control subjects, for a total of 901 subjects (see the parameter-extraction sketch after this list).
- Acoustic analysis of chronic laryngitis - statistical analysis of sustained speech parameters. Publication. Teixeira, João Paulo; Fernandes, Joana Filipa Teixeira; Teixeira, Felipe; Fernandes, Paula Odete. This paper describes the statistical analysis of a set of features extracted from the sustained-vowel speech of patients with chronic laryngitis and control subjects. The idea is to identify which features can be useful in an intelligent classification system to discriminate between pathologic and healthy voices. The set of features analysed consists of jitter, shimmer, Harmonic to Noise Ratio (HNR), Noise to Harmonic Ratio (NHR) and autocorrelation extracted from the sound of the sustained vowels /a/, /i/ and /u/ at low, neutral and high tones. The results showed that, apart from absolute jitter, no statistically significant difference exists between male and female voices with respect to the pathologic/healthy classification. Each of the analysed parameters is likely to show a statistical difference between the control and Chronic Laryngitis groups. This is important information, indicating that these features can be used in an intelligent system to classify healthy versus Chronic Laryngitis voices (see the group-comparison sketch after this list).
- Analysis of the middle and long latency ERP components in Schizophrenia. Publication. Costa, Miguel Rocha; Teixeira, Felipe; Teixeira, João Paulo. Schizophrenia is a complex and disabling mental disorder estimated to affect 21 million people worldwide. Electroencephalography (EEG) has proven to be an excellent tool to improve and aid the current diagnosis of mental disorders such as schizophrenia. The illness comprises various disabilities associated with sensory processing and perception. In this work, the first 10-200 ms of brain activity after the self-generation via button presses (condition 1) and passive presentation (condition 2) of auditory stimuli was addressed. A time-domain analysis of the event-related potentials (ERPs), specifically the MLAEP, N1, and P2 components, was conducted on 49 schizophrenic patients (SZ) and 32 healthy controls (HC), provided by a public dataset. The amplitudes, latencies, and scalp distribution of the peaks were used to compare groups. Suppression, measured as the difference between the neural activity of the two conditions, was also evaluated. With the exception of the N1 peak during condition (1), patients exhibited significantly reduced amplitudes in all waveforms analyzed in both conditions. The SZ group also demonstrated a peak delay in the MLAEP during condition (2) and a modestly earlier P2 peak during condition (1). Furthermore, patients exhibited less N1 suppression and more P2 suppression. Finally, the spatial distribution of scalp activity during the MLAEP peak in both conditions, the N1 peak in condition (1), and N1 suppression differed considerably between groups. These findings and measurements will be used with the aim of developing an intelligent system capable of accurately diagnosing schizophrenia (see the ERP peak sketch after this list).
- Harmonic to noise ratio measurement - selection of window and length. Publication. Fernandes, Joana Filipa Teixeira; Teixeira, Felipe; Guedes, Victor; Candido Junior, Arnaldo; Teixeira, João Paulo. The Harmonic to Noise Ratio (HNR) measures the ratio between the periodic and non-periodic components of a speech sound. It has become increasingly important in vocal acoustic analysis for diagnosing pathologic voices. This parameter can be measured with the Praat software, which is commonly accepted by the scientific community as an accurate measure. However, the measure depends on the type of window used and its length. In this paper, the influence of the window and its length was analysed. The Hanning, Hamming and Blackman windows, with lengths between 6 and 24 glottal periods, were tested. Speech files of control subjects and pathologic subjects were used. The results showed that the Hanning window with a length of 12 glottal periods gives HNR measures closest to the Praat measures (see the HNR sketch after this list).
- Utilização de ferramentas de machine learning no diagnóstico de patologias da laringe. Publication. Teixeira, Felipe; Teixeira, João Paulo. This work concerns the study and use of a set of machine learning tools, namely decision trees, support vector machines (SVMs) and deep learning (Deep Neural Networks), with the purpose of classifying pathological versus normal speech and identifying the pathology with these tools. The pathologies used in this study are chronic laryngitis, dysphonia and vocal fold paralysis. The German Saarbrucken Voice Database (SVD) was used, which is freely available online from the Institute of Phonetics of Saarland University. This database contains voice signals, healthy and pathological, from more than 2000 subjects. Three groups of parameters were used. Group I (a) contains parameters such as relative jitter, relative shimmer and Harmonic to Noise Ratio (HNR), determined on stationary speech segments, reaching 80.7% accuracy in distinguishing healthy from pathological subjects with SVM. Group I (b) contains the parameters of group I (a) plus Noise to Harmonic Ratio (NHR) and autocorrelation, determined on stationary speech segments, reaching 79.2% accuracy in distinguishing healthy from pathological subjects with SVM. Group II is based on Mel Frequency Cepstral Coefficients (MFCCs), determined on stationary speech segments, reaching 83.3% accuracy in distinguishing healthy subjects from laryngitis with SVM. Group III consists of MFCC coefficients extracted from continuous speech, reaching 71% accuracy in distinguishing healthy from pathological subjects with neural networks. A statistical analysis of the group I (b) parameters was carried out to identify unique characteristics in certain parameters that would allow the pathologies to be differentiated. In the course of this work, although it was not an initial objective, a prototype software for voice recording, parameter extraction and pathology classification was started.
- Transfer learning with AudioSet to voice pathologies identification in continuous speech. Publication. Guedes, Victor; Teixeira, Felipe; Oliveira, Alessa Anjos de; Fernandes, Joana Filipa Teixeira; Silva, Letícia; Candido Junior, Arnaldo; Teixeira, João Paulo. The classification of voice pathologies with Deep Learning concepts has increased considerably in recent times. Among the works developed, there are good results for classification on sustained vowels, but few related works for classification on continuous speech. This work uses the German Saarbrücken Voice Database with the phrase "Guten Morgen, wie geht es Ihnen?" to classify four classes: dysphonia, laryngitis, paralysis of the vocal cords and healthy voices. Transfer learning concepts were used with the AudioSet database. Two models, based on a Long Short-Term Memory network and a Convolutional Network, were developed to classify the extracted embeddings and compare the best results, using cross-validation. The final results reached an f1-score of 40% for the four classes, 66% for Dysphonia vs. Healthy, 67% for Laryngitis vs. Healthy and 80% for Paralysis vs. Healthy (see the LSTM sketch after this list).
- Classification of control/pathologic subjects with support vector machines. Publication. Teixeira, Felipe; Fernandes, Joana Filipa Teixeira; Guedes, Victor; Candido Junior, Arnaldo; Teixeira, João Paulo. The diagnosis of pathologies using vocal acoustic analysis has the advantage of being a noninvasive and inexpensive technique compared to the traditional techniques in use. In this work, SVMs were experimentally tested to diagnose dysphonia, chronic laryngitis or vocal cord paralysis. Three groups of parameters were tested: jitter, shimmer and HNR; MFCCs extracted from sustained vowels; and MFCCs extracted from a short sentence. The first group showed its importance in this type of diagnosis, while the second group showed low discriminative power. The SVM functions and methods were also tested using the dataset with and without gender separation. The best accuracy was 71%, obtained using the jitter, shimmer and HNR parameters without gender separation (see the SVM sketch after this list).
- F0, LPC, and MFCC analysis for emotion recognition based on speech. Publication. Teixeira, Felipe; Teixeira, João Paulo; Soares, Salviano; Abreu, J.L. Pio. In this work, research was done to understand what is needed to build a database to recognise emotions through speech. Some features that can yield a good success rate for emotion recognition through speech were investigated, as were some characteristics (symptoms) that can be associated with a specific emotional state. We also studied some features that can be used to identify particular emotional states. A Speech Emotion Recognition (SER) system was built with SVM, and a binary analysis was compared with a multi-category analysis. The binary analysis achieved an accuracy of 87.5% and the multi-class analysis 42.6%. The parameters Fundamental Frequency (F0), Linear Predictive Coefficients (LPC), and Mel Frequency Cepstral Coefficients (MFCC) were used. The modest accuracy of this work was achieved using only the F0, LPC and MFCC features (see the feature-extraction sketch after this list).
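For the "Deep-learning in identification of vocal pathologies" entry, the two-level architecture (seven first-level estimators feeding a second-level decision maker) resembles a stacking ensemble. A minimal sketch under that assumption, using scikit-learn (not named in the abstract) and a hypothetical feature matrix X of relative jitter, relative shimmer and HNR with four-class labels y:

```python
# Hedged sketch of a two-level scheme: 7 neural-network estimators at level 1,
# a neural-network decision maker at level 2. X (jitter, shimmer, HNR) and y
# (4 pathology classes) are hypothetical and assumed to be loaded elsewhere.
from sklearn.ensemble import StackingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

level1 = [
    (f"mlp_{i}", make_pipeline(StandardScaler(),
                               MLPClassifier(hidden_layer_sizes=(32, 16),
                                             max_iter=2000, random_state=i)))
    for i in range(7)                      # seven first-level estimators
]
decision_maker = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000)

model = StackingClassifier(estimators=level1,
                           final_estimator=decision_maker,
                           stack_method="predict_proba", cv=5)
# model.fit(X_train, y_train); model.score(X_test, y_test)
```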
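For the "Parameters for vocal acoustic analysis - cured database" entry, jitter, shimmer and HNR of the kind stored in the database are commonly obtained from Praat; a minimal sketch with the parselmouth Python wrapper (the wrapper, file name and thresholds are assumptions, not details from the paper):

```python
# Hedged sketch: extract relative jitter, relative shimmer and HNR from one
# sustained-vowel recording via Praat commands exposed by parselmouth.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("vowel_a_normal.wav")      # hypothetical file name
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)

jitter_local = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer_local = call([snd, point_process], "Get shimmer (local)",
                     0, 0, 0.0001, 0.02, 1.3, 1.6)
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", 0, 0)

print(f"jitter={jitter_local:.4f}  shimmer={shimmer_local:.4f}  HNR={hnr:.1f} dB")
```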
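For the "Acoustic analysis of chronic laryngitis" entry, a minimal sketch of the kind of per-parameter group comparison described; the choice of a Welch t-test and the hypothetical input arrays are assumptions, not the authors' exact procedure:

```python
# Hedged sketch: test whether one parameter (e.g. relative jitter) differs
# between control and chronic-laryngitis groups.
import numpy as np
from scipy import stats

def compare_groups(control_values, pathologic_values, alpha=0.05):
    """Welch t-test on one acoustic parameter; returns the statistic, p-value
    and whether the difference is significant at the given alpha."""
    t_stat, p_value = stats.ttest_ind(np.asarray(control_values),
                                      np.asarray(pathologic_values),
                                      equal_var=False)
    return t_stat, p_value, p_value < alpha

# usage with hypothetical arrays:
# compare_groups(jitter_control, jitter_laryngitis)
```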
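For the "Analysis of the middle and long latency ERP components in Schizophrenia" entry, component amplitudes and latencies like those compared there are typically the extreme value inside a fixed search window of the averaged waveform; a minimal sketch, with window bounds and sampling rate as assumptions:

```python
# Hedged sketch: peak amplitude and latency of an ERP component within a
# time window of an averaged waveform (1-D numpy array).
import numpy as np

def peak_in_window(erp, fs, t_start_ms, t_end_ms, polarity=-1):
    """polarity=-1 for negative components such as N1, +1 for P2.
    Returns (amplitude, latency in ms)."""
    i0 = int(t_start_ms / 1000 * fs)
    i1 = int(t_end_ms / 1000 * fs)
    idx = int(np.argmax(polarity * erp[i0:i1]))
    return erp[i0 + idx], (i0 + idx) / fs * 1000

# e.g. (assumed 1 kHz sampling, 80-120 ms N1 search window):
# n1_amp, n1_lat = peak_in_window(avg_waveform, fs=1000, t_start_ms=80, t_end_ms=120)
```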
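For the "Harmonic to noise ratio measurement" entry, a Boersma-style autocorrelation HNR can illustrate why the window and its length matter; a minimal numpy sketch of a single-frame estimate using the recommended Hanning window and 12 glottal periods (a simplified illustration, not the authors' implementation or Praat's exact algorithm):

```python
# Hedged sketch: single-frame HNR estimate from the normalized autocorrelation
# peak near the expected glottal period, after applying a tapering window.
import numpy as np

def hnr_frame(signal, fs, f0, n_periods=12, window=np.hanning):
    period = int(round(fs / f0))                 # samples per glottal period
    frame = signal[: n_periods * period].astype(float)
    frame = (frame - frame.mean()) * window(len(frame))
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]                              # normalized autocorrelation
    lo, hi = int(0.8 * period), int(1.2 * period)
    r_max = ac[lo:hi].max()                      # harmonic energy fraction
    return 10 * np.log10(r_max / (1 - r_max))    # HNR in dB
```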
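For the "Transfer learning with AudioSet" entry, a minimal Keras sketch of the LSTM branch: classify sequences of 128-dimensional AudioSet/VGGish-style embeddings into the four classes. The embedding extraction is assumed to happen upstream, and the layer sizes and training settings are assumptions:

```python
# Hedged sketch: LSTM classifier over padded sequences of 128-d embeddings.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 128)),       # (time steps, embedding dim)
    tf.keras.layers.Masking(mask_value=0.0),        # ignore zero-padded frames
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax")  # dysphonia, laryngitis, paralysis, healthy
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_embeddings, y, validation_split=0.2, epochs=50)
```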
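For the "Classification of control/pathologic subjects with support vector machines" entry, a minimal scikit-learn sketch of an SVM on the first parameter group (jitter, shimmer, HNR); the RBF kernel, the hyperparameters and the hypothetical X/y are assumptions:

```python
# Hedged sketch: standardize the 3 features and cross-validate an RBF SVM.
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
# X: n_subjects x 3 matrix (jitter, shimmer, HNR); y: control/pathologic labels
# scores = cross_val_score(clf, X, y, cv=5)
# print(f"mean accuracy: {scores.mean():.3f}")
```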
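For the "F0, LPC, and MFCC analysis for emotion recognition" entry, a minimal sketch of extracting the three named feature families with librosa; the library, file name and parameter values (F0 range, LPC order, number of MFCCs) are assumptions, not taken from the paper:

```python
# Hedged sketch: compute F0, LPC and MFCC features for one utterance.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)        # hypothetical file name
f0 = librosa.yin(y, fmin=65, fmax=500, sr=sr)         # fundamental frequency track
lpc = librosa.lpc(y, order=12)                        # linear prediction coefficients
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # 13 MFCCs per frame

# one simple fixed-length vector that could feed an SVM-based SER system
features = np.concatenate([[f0.mean()], lpc[1:], mfcc.mean(axis=1)])
```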