Browse by author "Souza, Felipe Bueno de"
- Extraction of discriminative regions over genomic sequences
  Publication. Souza, Felipe Bueno de; Rufino, José; Pinto, Maria Alice; Lopes, Fabrício Martins
  As computing technologies continue to evolve, new generations of processors have achieved higher levels of computational power and efficiency. Tasks that once required high-end computers can now be performed on personal systems, and many scientific fields, including biology, benefit from this. At the same time, advances in DNA sequencing technology are driving exponential growth in the volume and complexity of available genomic data. This scenario requires methods that can handle and analyze such data efficiently, in a scalable and interpretable manner. In this context, this work proposes a novel methodology, GREAC (Genomic Region Extraction and Classifier), for extracting discriminative regions from genomic sequences, reducing data dimensionality, identifying biologically relevant patterns, and classifying variants. The methodology is grounded in digital signal processing principles, such as filters and sequence transformations, and uses k-mers as the primary source of information to filter and identify informative genomic regions. The relative frequency values of these regions are then measured to construct standardized signals across different variants. Each reference signal represents the characteristic behavior of a variant, so that variants can be classified through statistical divergence measures, distance metrics, and supervised classifiers such as XGBoost. GREAC was implemented in the Julia programming language and is public-domain, open-source software, emphasizing efficiency, transparency, and scientific reproducibility. The implementation runs on personal computers, which promotes accessibility and encourages contributions from the scientific community for further improvements. GREAC thus represents a significant contribution to bioinformatics and computational genomics, presenting a novel methodology for pattern recognition in genomic sequences.
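The idea of turning k-mer relative frequencies into a signal along a sequence can be illustrated with a small sketch. This is not GREAC's implementation (which the abstract says is written in Julia); Python is used here only for illustration, and the function names, window size, step, and the set of "informative" k-mers are all hypothetical choices.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def window_signal(seq, informative_kmers, k, window, step):
    """Slide a window over seq and, for each position, return the fraction
    of the window's k-mers that belong to informative_kmers.

    informative_kmers is an assumed, externally chosen set; how GREAC
    selects its k-mers is not described in the abstract.
    """
    signal = []
    for start in range(0, len(seq) - window + 1, step):
        counts = kmer_counts(seq[start:start + window], k)
        total = sum(counts.values())
        hits = sum(counts[kmer] for kmer in informative_kmers)
        signal.append(hits / total if total else 0.0)
    return signal


if __name__ == "__main__":
    # Toy sequence and toy k-mer set, for illustration only.
    print(window_signal("ACGTACGTAA", {"ACG", "TAA"}, k=3, window=6, step=2))
```

Regions where the signal is high are candidates for the kind of discriminative region the abstract describes; comparing such signals between variants would then require a distance or divergence measure of the reader's choosing.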
- Resonant recognition model as a preprocessing technique for RNA classification
  Publication. Souza, Felipe Bueno de; Pimenta-Zanon, Matheus Henrique; Henriques, Dora; Pinto, M. Alice; Balsa, Carlos; Rufino, José; Lopes, Fabrício Martins
  The development of high-throughput sequencing technologies, such as RNA-Seq, has enabled the generation of large volumes of biological data. Computational methods are therefore needed to interpret this data and contribute to knowledge discovery. RNA sequences are products of the transcription of genomic DNA sequences and represent the gene expression process that organisms use to synthesize protein or RNA molecules. These RNA sequences can be compared between organisms of the same or different species to reveal functionally similar proteins. There are several classes of RNA sequences (mRNA, rRNA, tRNA, ncRNA, etc.), each with different biological functions. Correctly identifying the class of an RNA sequence is important because of the huge volume of unlabelled data available. In this context, this study proposes an approach based on the Resonant Recognition Model (RRM) for feature extraction and classification of the ncRNA and mRNA classes. The proposed approach was assessed on the dataset from the PLEK method. Despite the reduction in input data size achieved with the RRM, the results show high accuracy for primary protein sequences translated from RNA sequences, signaling the potential of the proposed approach for RNA classification.
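As a rough illustration of the RRM preprocessing step, the sketch below maps a protein sequence to a numerical signal using electron-ion interaction potential (EIIP) values and computes its power spectrum with a plain discrete Fourier transform. It is not the authors' pipeline: the EIIP table is the commonly cited one and should be checked against a primary source before use, the function names are hypothetical, and the full RRM also combines spectra across sequences, which is omitted here.

```python
import cmath

# Commonly cited EIIP values per amino acid (assumption: verify before use).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def eiip_signal(protein):
    """Replace each amino acid letter with its EIIP value."""
    return [EIIP[residue] for residue in protein.upper()]


def power_spectrum(signal):
    """Power at DFT frequencies 1..n//2 of the mean-centered signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for f in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * f * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


if __name__ == "__main__":
    # Toy alternating sequence: the power concentrates at the highest frequency.
    spec = power_spectrum(eiip_signal("LDLDLDLD"))
    print(spec.index(max(spec)) + 1)  # frequency index of the dominant peak
```

The spectrum has at most half as many values as the sequence has residues, which is one simple way to see the reduction in input size that the abstract mentions; a classifier would then be trained on features derived from such spectra.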
