
Search Results

Now showing 1 - 10 of 34
  • Meta-learning approach for bacteria classification and identification of informative genes of the Bacillus megaterium: tomato roots tissue interaction
    Publication . Rodrigues, Vânia; Deusdado, Sérgio
    Plant growth-promoting rhizobacteria (PGPRs) are bacteria that colonize plant roots. These beneficial bacteria influence plant development through multiple mechanisms, such as nutrient availability, alleviation of biotic and abiotic stress, and secretion of phytohormones. Therefore, their inoculation constitutes a powerful tool towards sustainable agriculture and crop production. To understand the plant-PGPR interaction, we present the classification of PGPR using machine learning and meta-learning classifiers, namely Support Vector Machine (SVM), Kernel Logistic Regression (KLR), meta-SVM and meta-KLR, to predict the presence of Bacillus megaterium inoculated in tomato root tissues using publicly available transcriptomic data. The original dataset presents 36 significantly differentially expressed genes. As meta-KLR achieved near-optimal performance considering all the relevant metrics, this meta-learner was afterwards used to identify the informative genes (IGs). The outcomes showed 157 IGs, including all of the significantly differentially expressed genes previously identified. Among the IGs, 113 were identified as tomato genes, 5 as Bacillus subtilis proteins, 1 as an Escherichia coli protein and 6 were unidentified. Then, a functional enrichment analysis of the tomato IGs showed 175 biological processes, 22 molecular functions and 20 KEGG pathways involved in the B. megaterium–tomato interaction. Furthermore, the biological network study of their Arabidopsis thaliana orthologous genes identified co-expression, predicted interaction, shared protein domain and co-localization networks.
  • Metalearning approach for leukemia informative genes prioritization
    Publication . Rodrigues, Vânia; Deusdado, Sérgio
    The discovery of diagnostic or prognostic biomarkers is fundamental to optimizing therapeutics for patients. By enhancing the interpretability of the prediction model, this work aims to optimize leukemia diagnosis while retaining high performance in the identification of informative genes. For this purpose, we used an optimal parameterization of the Kernel Logistic Regression method to classify leukemia microarray gene expression data, applying metalearners to select attributes and so reduce the data dimensionality before passing the data to the classifier. Pearson correlation and the chi-squared statistic were the attribute evaluators applied in the metalearners, with information gain as the single-attribute evaluator. The implemented models relied on 10-fold cross-validation. The metalearner approach identified 12 common genes, with a highest average merit of 0.999. The practical work was developed using the public data mining software WEKA.
  • Pathogens-in-foods: a database of occurrence of microbial hazards in foods commercialised in Europe
    Publication . Deusdado, Sérgio; Cadavez, Vasco; Rodrigues, Vânia; Kooh, Pauline; Moez, Sanna; Gonzales-Barron, Ursula
    The objective of this study was to build a database of the occurrence (both prevalence and counts) of the most important biological hazards in foods commercialised in Europe. For this, systematic literature searches were first conducted for every pathogen, namely Salmonella, Campylobacter, Shigatoxin-producing Escherichia coli, Listeria monocytogenes, Yersinia enterocolitica, Bacillus cereus, Clostridium perfringens, Staphylococcus aureus, Toxoplasma gondii, norovirus, Hepatitis A virus, Hepatitis E virus, Cryptosporidium and Giardia duodenalis. After screening for relevance and methodological quality assessment, data were carefully extracted from the primary studies into a harmonised arrangement consisting of primary study characteristics, food characteristics and stage within the food chain, microbiological methods, prevalence results, enumeration results and potential for bias. Based on the microbiological survey results extracted from 977 primary studies, the database Pathogens-in-Foods has been constructed to facilitate data access and retrieval according to hazard, food class, country or any other relevant variable, with the ability to execute simple statistical calculations.
  • Bioinformatic analysis of the structure and function of biological information
    Publication . Choupina, Altino; Deusdado, Sérgio
    Knowledge derived from genomic and computational technologies is growing in geometric progression. Understanding this avalanche of data is closely tied to the formidable development of the field of bioinformatics. By enabling the global assessment of this extraordinary quantity of data, bioinformatics has considerably accelerated scientific discovery. Genome analysis software comprises several program packages, which accompany the whole process from the reception of the graphs produced by the sequencer to the publication of the data in online databases. These characteristics, together with free access for academics, file compatibility, and their date of conception, are the main selection factors in the choices made.
  • Solanum lycopersicum - Fusarium oxysporum Fo47 interaction study using ml classifiers in transcriptomic data
    Publication . Rodrigues, Vânia; Deusdado, Sérgio
    Fusarium oxysporum Fo47 is a pervasive endophyte that can colonize plant roots, initiating an interaction that can provide phytosanitary defenses. The response triggered by this non-pathogenic fungus is not well understood. To elucidate the Solanum lycopersicum - Fusarium oxysporum Fo47 interaction, machine learning methods were used to identify the informative genes (IGs) using publicly available transcriptomic data. The assembled dataset revealed 244 significantly differentially expressed genes (DEGs). The experimental work with machine learning classifiers achieved significant identification of these DEGs. The parameterization of Multilayer Perceptron (MLP) classifiers and Kernel Logistic Regression metalearners (meta-KLR) was optimized, with MLP-b and meta-KLR-b achieving near-optimal performance. Afterwards, these classifiers were used as attribute evaluators, identifying two sets (A, B) of highest-rated genes: 393 (set A) by MLP-b and 317 (set B) by meta-KLR-b. Regarding the percentage of significantly differentially expressed genes found by the classifiers relative to the total of 244 DEGs, set A presented 92.2%, while set B presented 84.8%. Considering B⊂A, the IGs identified by MLP-b (set A) were used in the subsequent analysis. Among these 393 IGs, 379 were identified as Solanum lycopersicum genes, 1 as an Escherichia coli protein (Hygromycin-B 4-O-kinase), 1 as a Saccharomyces cerevisiae protein (galactose-responsive transcription factor GAL4) and 12 were unidentified. Then, a functional enrichment analysis of the Solanum lycopersicum IGs showed 283 biological processes and 20 biological pathways involved in the Solanum lycopersicum - Fo47 interaction.
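    The percentages quoted in this abstract can be turned back into approximate gene counts. The short sketch below does that arithmetic; the resulting counts are derived here for illustration and are not figures reported in the abstract, and the function name `recovered_count` is hypothetical.

```python
# Figures taken from the abstract.
TOTAL_DEGS = 244          # significantly differentially expressed genes
PERCENT_SET_A = 92.2      # share of DEGs found by MLP-b (set A)
PERCENT_SET_B = 84.8      # share of DEGs found by meta-KLR-b (set B)


def recovered_count(percent: float, total: int = TOTAL_DEGS) -> int:
    """Return the whole number of genes closest to `percent` of `total`."""
    return round(percent / 100 * total)


# Derived (not reported) counts of DEGs recovered by each classifier.
degs_in_set_a = recovered_count(PERCENT_SET_A)
degs_in_set_b = recovered_count(PERCENT_SET_B)

print(f"Set A: about {degs_in_set_a} of {TOTAL_DEGS} DEGs")
print(f"Set B: about {degs_in_set_b} of {TOTAL_DEGS} DEGs")
```

    Rounding to the nearest whole gene is an assumption; converting these counts back to percentages reproduces the 92.2% and 84.8% quoted above to one decimal place.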
  • Deterministic Classifiers Accuracy Optimization for Cancer Microarray Data
    Publication . Rodrigues, Vânia; Deusdado, Sérgio
    The objective of this study was to improve classification accuracy in cancer microarray gene expression data using a collection of machine learning algorithms available in WEKA. State-of-the-art deterministic classification methods, such as Kernel Logistic Regression, Support Vector Machine, Stochastic Gradient Descent and Logistic Model Trees, were applied to publicly available cancer microarray datasets, aiming to discover regularities that provide insights to support characterization and correct diagnosis for each cancer typology. The implemented models, relying on 10-fold cross-validation and parameterized to enhance accuracy, reached accuracy above 90%. Moreover, despite the variety of methodologies, no statistically significant differences were registered between them at the 0.05 significance level, confirming that all the selected methods are effective for this type of analysis.
  • Gene expression analysis of Solanum lycopersicum - Bacillus megaterium Interaction to identify informative genes using machine learning classifiers
    Publication . Rodrigues, Vânia; Deusdado, Sérgio
    There has been growing interest in identifying specific plant growth-promoting rhizobacteria that confer health, growth, and protective benefits to the plant host. Understanding the mechanisms of this association, as well as the differences that determine the different outcomes, can be exploited to optimize beneficial interactions. To this end, we developed a classifier capable of predicting the presence of Bacillus megaterium inoculated in tomato root tissue and of identifying potential informative genes related to their interaction. Two machine learning models, Kernel Logistic Regression and Multilayer Perceptron, were studied. Of the 4 Multilayer Perceptron classifiers tested with different parameters (MLP-a, MLP-b, MLP-c and MLP-d), MLP-a and MLP-c achieved near-optimal performance considering all the relevant metrics. Then, these classifiers were used as attribute evaluators to identify two sets of informative genes (IGs). MLP-a showed 216 highest-rated attributes. Among these IGs, 173 were identified as Solanum lycopersicum genes, 37 were assigned to 5 Bacillus subtilis proteins, 4 were assigned to 1 Escherichia coli protein and 2 were unidentified. MLP-c, on the other hand, showed the same highest-rated attributes plus 27 new attributes. Based on the results of MLP-a and MLP-c, and considering the identified tomato IGs, a functional enrichment analysis was carried out, showing nine and eight biological pathways, respectively. Furthermore, the same IGs were used to compose biological networks from Arabidopsis thaliana orthologous genes. The biological networks identified for the first set were co-expression, shared protein domains, predicted interaction and co-localization. The second set presented the same networks plus physical interaction.
  • BBMS++: a bioinformatics meta-search engine
    Publication . Carocho, Márcio; Deusdado, Sérgio
    This article describes the creation and implementation of a meta-search engine [1-7] in the field of bioinformatics (BBMS, Basic Bioinformatics Meta-searcher), developed from scratch in Portuguese. BBMS provides centralized access to the online search engines of the main primary biological databases. It is quite easy to understand and gives access to the biological information held in the world's most important public databases without the user having to leave the developed website. BBMS has been updated to cover a wider range of bioinformation sources, but it focuses mainly on nucleotides, proteins and metabolic pathways [8,9]. Additionally, BBMS searches the scientific literature in the main bioinformatics publications. The program is autonomous: the user only needs to enter the search keywords and indicate which database should be searched. By default, results are provided in English, but a translation panel built into the meta-search engine allows them to be translated into several languages, including Portuguese, in real time. Another important aspect is the global Internet search, which can be used when the user does not know specifically where to look. In this mode, the BBMS meta-search engine searches databases, specific search engines, and image and video banks, among other relevant locations. For this operation, it is based on the workings of the "Bioinformatic Harvester" [10], another meta-search engine. Another application integrated into BBMS is the search for scientific articles, which allows rigorous searching of the most important databases. In the second version of BBMS, BBMS++, in addition to graphical improvements, access was added to online bioinformatics applications, such as running "BLAST" and "e-PCR" searches, among others.
    To gauge the usability, usefulness and efficiency of BBMS, a survey was carried out involving about 30 respondents, with very satisfactory final results. BBMS++ is accessible at http://www.esa.ipb.pt/bbms
  • Computational experiments on optimizing the E. coli fermentation process using OptFerm
    Publication . Teixeira, Tânia; Deusdado, Sérgio
    Over the last two decades, as a result of the accelerated development of research in molecular biology, the quantity of genomic, proteomic, metabolomic and phylogenetic data has grown exponentially, forcing researchers to resort to computational tools to store, communicate and process the biological data discovered. Consequently, bioinformatics has emerged from this need, addressing the management of bioinformation and supporting the inference of knowledge that leads to functional understanding. The need to include, adapt and maximize the productivity of bioprocesses in different biotechnological industries has generated strong interest in computer programs that can help optimize biotechnological production relying on microorganisms. In this context, several programs have been developed that allow some bioprocesses to be modelled, simulated and optimized in silico. Accordingly, a state-of-the-art study of the existing tools was carried out in order to make a well-founded selection of the tool(s) most suitable for this work. OptFerm is software that integrates a varied set of AI algorithms, notably ones based on artificial neural networks, with the aim of supporting the optimization of fermentation bioprocesses. It is an easy-to-use, modular tool, programmed in Java, that allows various simulation, optimization and parameter estimation tasks to be performed under different conditions with regard to state variables, parameters and feeding profiles, among others, used in fed-batch bioreactors. To test the applicability and efficiency of the software, tests were carried out using OptFerm, taking the fermentation process of Escherichia coli as the model. The experimental results produced by the algorithms based on the different AI techniques were collected and their performance was compared.
    Finally, a comparative analysis was carried out to validate the methodologies used against the results obtained in the laboratory using physical bioreactors.
  • SimSearch: A new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences
    Publication . Deusdado, Sérgio; Carvalho, Paulo
    In this paper, we propose SimSearch, an algorithm implementing a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences. The initial phase of SimSearch is devoted to filling the binary similarity matrices by signalling the distances between occurrences of the same symbol. The scoring scheme is then applied when the maximal extension of the pattern is analysed. Employing bit parallelism to analyse the upper triangle of the global similarity matrix, the new methodology searches the sequence(s) for all the exact and approximate patterns in regular or reverse order. The algorithm accepts parameterization to work with greater seeds for near-optimal results. Performance tests show a significant efficiency improvement over traditional optimal methods based on dynamic programming. When the new algorithm’s efficiency is compared against heuristic-based methods at equal required sensitivity, the proposed algorithm remains acceptable.
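    The abstract does not spell out how the distances between occurrences of the same symbol are stored. As a rough illustration of the idea only, and not the SimSearch algorithm itself, the sketch below records for each position the distance back to the previous occurrence of the same symbol. The function name `distance_series` and the convention of using 0 for a first occurrence are assumptions made for this example.

```python
from typing import Dict, List


def distance_series(sequence: str) -> List[int]:
    """For each position, return the distance to the previous occurrence
    of the same symbol, or 0 if the symbol has not been seen before.

    Illustrative only: the 0-for-first-occurrence convention is an
    assumption, not something stated in the SimSearch abstract.
    """
    last_seen: Dict[str, int] = {}   # symbol -> index of its latest occurrence
    distances: List[int] = []
    for index, symbol in enumerate(sequence):
        if symbol in last_seen:
            distances.append(index - last_seen[symbol])
        else:
            distances.append(0)
        last_seen[symbol] = index
    return distances


if __name__ == "__main__":
    # Small DNA-like example sequence.
    print(distance_series("GATTACA"))
```

    A repeated pattern shows up in such a series as a run of equal distances, which is one plausible reason a distance-based representation helps when looking for similar regions; how SimSearch actually exploits this is described in the paper, not here.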