Browse by author "Souza, Felipe Bueno de"

Showing 1 - 4 of 4

An Efficient Feature Extraction Method for Identifying Signatures of Viral Genomic Variants
Publication . Souza, Felipe Bueno de; Pimenta-Zanon, Matheus Henrique; Henriques, Dora; Pinto, M. Alice; Balsa, Carlos; Rufino, José; Lopes, Fabrício Martins
Genomic analysis is a powerful way to understand viral pathogens and their variations. However, most genomic analysis methods rely on sequence alignment, which has a high computational cost. This study introduces a novel methodology to extract discriminative regions from viral genomes. Using exclusive k-mers through strategically defined sliding windows, our approach identifies genomic regions with high concentrations of variant-specific signatures, achieving high-accuracy classification while requiring modest computational resources. The data-driven and nonparametric nature of our approach enables pattern extraction without imposing predefined distributions, enhancing both analytical flexibility and result interpretability. By balancing minimal k-mer sizes with maximum discriminative power, our method generalizes well even with limited training samples. The computational efficiency of the methodology, together with the biological transparency and explainability of its results, makes it accessible to research environments with restricted processing capacity, potentially accelerating genomic signature discovery across diverse viral pathogens, contributing to better variant tracking and characterization, and opening further possibilities for genomic analysis studies.
2026, Conference paper, Restricted access
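The abstract above describes exclusive k-mers counted through sliding windows. A minimal sketch of that idea follows, assuming toy sequences and hypothetical helper names (`kmers`, `exclusive_kmers`, `window_density`) that do not come from the paper; the window size, step, and k used by the authors are not stated in this listing.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exclusive_kmers(genomes_by_variant, k):
    """Map each variant to the k-mers seen in its genomes and in no other variant."""
    seen = {
        variant: set().union(*(kmers(g, k) for g in genomes))
        for variant, genomes in genomes_by_variant.items()
    }
    exclusive = {}
    for variant, own in seen.items():
        others = set().union(*(s for v, s in seen.items() if v != variant))
        exclusive[variant] = own - others
    return exclusive


def window_density(seq, signature, k, window, step):
    """For each window of k-mer start positions, return (start, fraction of
    positions whose k-mer belongs to the signature set)."""
    hits = [seq[i:i + k] in signature for i in range(len(seq) - k + 1)]
    return [
        (start, sum(hits[start:start + window]) / window)
        for start in range(0, len(hits) - window + 1, step)
    ]


# Toy data (hypothetical): two "variants" with one short sequence each.
genomes = {"A": ["ACGTACGTAA"], "B": ["ACGTTTGTAA"]}
signatures = exclusive_kmers(genomes, k=3)
densities = window_density("ACGTTTGTAA", signatures["B"], k=3, window=4, step=2)
```

Windows with a high fraction are candidate discriminative regions for that variant; how the paper thresholds or ranks them is not specified here.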
Extraction of discriminative regions over genomic sequences
Publication . Souza, Felipe Bueno de; Rufino, José; Pinto, Maria Alice; Lopes, Fabrício Martins
As computing technologies continue to evolve, new generations of processors have achieved increased levels of computational power and efficiency. Tasks that once required high-end computers can now be performed on personal systems, which benefits many scientific fields, including biology. Alongside this computational progress, advances in DNA sequencing technology have driven exponential growth in the volume and complexity of available genomic data. This scenario requires methods that can handle and analyze such data efficiently, in a scalable and interpretable manner. In this context, this work proposes a novel methodology, GREAC (Genomic Region Extraction and Classifier), for extracting discriminative regions from genomic sequences, reducing data dimensionality, identifying biologically relevant patterns, and classifying variants. The proposed methodology is grounded in digital signal processing principles, such as filters and sequence transformation, and employs k-mers as the primary source of information to filter and identify informative genomic regions. The relative frequency values of these regions are then measured to construct standardized signals across different variants. Each reference signal represents the characteristic behavior of a variant, enabling the identification of genomic patterns that allow their classification through statistical divergence measures, distance metrics, and supervised classifiers such as XGBoost. GREAC was implemented in the Julia programming language and is public-domain, open-source software, emphasizing efficiency, transparency, and scientific reproducibility. The implementation runs on personal computers, thereby promoting accessibility and encouraging contributions from the scientific community for further improvements. GREAC thus represents a significant contribution to the fields of bioinformatics and computational genomics, presenting a novel methodology for pattern recognition in genomic sequences.
2025, Master's dissertation, Open access
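The abstract above mentions per-region relative frequencies, reference signals per variant, and classification by distance metrics. The sketch below illustrates that pipeline in Python under loud assumptions: it is not GREAC's code (which the abstract says is written in Julia), the function names and the `(start, end)` region format are hypothetical, and nearest-reference Euclidean distance is only one simple stand-in for the divergence measures, distance metrics, and XGBoost classifiers the abstract lists.

```python
import math


def region_signal(seq, regions, signature, k):
    """Relative frequency of signature k-mers inside each (start, end) region."""
    signal = []
    for start, end in regions:
        positions = range(start, min(end, len(seq)) - k + 1)
        hits = sum(seq[i:i + k] in signature for i in positions)
        signal.append(hits / max(len(positions), 1))  # avoid division by zero
    return signal


def reference_signal(signals):
    """Element-wise mean of several equally long signals (one per training genome)."""
    return [sum(column) / len(column) for column in zip(*signals)]


def classify(signal, references):
    """Return the label whose reference signal is closest in Euclidean distance."""
    return min(references, key=lambda label: math.dist(signal, references[label]))


# Toy data (hypothetical): one signature set, two regions, two reference signals.
signature_b = {"GTT", "TTT", "TTG", "TGT"}
sig = region_signal("ACGTTTGTAA", [(0, 5), (2, 8)], signature_b, k=3)
refs = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
label = classify([0.9, 0.2], refs)
```

Because every genome is reduced to one number per region, the classifier input has as many dimensions as there are selected regions, which is the dimensionality reduction the abstract refers to.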
Resonant recognition model as a preprocessing technique for RNA classification
Publication . Souza, Felipe Bueno de; Pimenta-Zanon, Matheus Henrique; Henriques, Dora; Pinto, M. Alice; Balsa, Carlos; Rufino, José; Lopes, Fabrício Martins
The development of high-throughput sequencing technologies, such as RNA-Seq, has enabled the generation of large volumes of biological data. Computational methods are therefore needed to interpret this massive volume of data and contribute to knowledge discovery. RNA sequences are products of the transcription of genomic DNA sequences and represent the gene expression process that organisms use to synthesize protein or RNA molecules. These RNA sequences can be compared between organisms of the same or different species to reveal proteins with similar functions. There are several classes of RNA sequences (mRNA, rRNA, tRNA, ncRNA, etc.), with different biological functions. The correct identification of each class of RNA sequences is important because of the huge volume of unlabelled data available. In this context, this study proposes an approach based on the Resonant Recognition Model (RRM) for feature extraction and classification of the ncRNA and mRNA classes. To assess the proposed approach, the dataset from the PLEK method was adopted. Despite the reduction of the input data size achieved using the RRM, the results show high accuracy for primary protein sequences translated from RNA sequences, signaling the potential of the proposed approach to classify RNA.
2025, Conference paper, Restricted access
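The abstract above relies on the Resonant Recognition Model, which maps each amino acid to a numerical value (the electron-ion interaction potential, EIIP) and examines the spectrum of the resulting signal. The following is a minimal sketch assuming the EIIP values as commonly tabulated in the RRM literature (verify them against a primary source before use) and hypothetical function names; it does not reproduce the paper's feature set or its classifier.

```python
import cmath

# EIIP values per amino acid, as commonly tabulated for the RRM
# (assumption: check against a primary source before relying on them).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def eiip_signal(protein):
    """Turn a protein sequence (one-letter codes) into a numerical signal."""
    return [EIIP[residue] for residue in protein]


def power_spectrum(signal):
    """Power of the discrete Fourier transform at frequencies 1..n//2,
    computed on the mean-centered signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for f in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * f * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# Toy sequence (hypothetical): strict alternation puts all power at the
# highest frequency, which is the last entry of the spectrum.
spectrum = power_spectrum(eiip_signal("LDLDLDLD"))
peak_index = spectrum.index(max(spectrum))
```

The spectrum has about half as many values as the sequence has residues, and a classifier can consume it (or a summary of it) instead of the raw sequence; how the paper reduces and selects these values is not stated in the abstract.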
Resonant recognition model as a preprocessing technique for RNA classification
Publication . Souza, Felipe Bueno de; Pimenta-Zanon, Matheus; Henriques, Dora; Pinto, M. Alice; Balsa, Carlos; Rufino, José; Lopes, Fabrício Martins
The development of high-throughput sequencing technologies, such as RNA-Seq, has enabled the generation of large volumes of biological data. Computational methods are therefore needed to interpret this massive volume of data and contribute to knowledge discovery. RNA sequences are products of the transcription of genomic DNA sequences and represent the gene expression process that organisms use to synthesize protein or RNA molecules. These RNA sequences can be compared between organisms of the same or different species to reveal proteins with similar functions. There are several classes of RNA sequences (mRNA, rRNA, tRNA, ncRNA, etc.), with different biological functions. The correct identification of each class of RNA sequences is important because of the huge volume of unlabelled data available. In this context, this study proposes an approach based on the Resonant Recognition Model (RRM) for feature extraction and classification of the ncRNA and mRNA classes. To assess the proposed approach, the dataset from the PLEK method was adopted. Despite the reduction of the input data size achieved using the RRM, the results show high accuracy for primary protein sequences translated from RNA sequences, signaling the potential of the proposed approach to classify RNA.
2025, Conference paper, Open access
