Browse by author "Silva, Lucas Ribeiro"

Showing 1 - 3 of 3

Development of an intelligent agent for knowledge extraction in the pathogens in foods (PIF) database with machine learning
Publication . Silva, Lucas Ribeiro; Alves, Paulo; Cadavez, Vasco
Scientific databases like the Pathogens in Foods (PIF) Database hold valuable public health data but are often inaccessible to experts lacking programming skills. This research addresses this gap by developing and evaluating a novel Visual Natural Language Interface (V-NLI) for the PIF database. The resulting PIF Intelligent Agent empowers users to perform complex queries, conduct meta-analyses, and generate dynamic reports using natural language. The agent uses a hybrid, dual-mode architecture separating language interpretation from statistical computation. An "Open Chat Mode" offers a flexible exploratory interface via a tool-calling Small Language Model (SLM) with Retrieval-Augmented Generation (RAG). A "Guided Meta-Analysis Mode" provides a structured workflow for generating reproducible scientific reports through a dedicated Rserver backend. A comprehensive evaluation benchmarked five language models: four SLMs, namely Phi-4 Mini (3.8B), MFDoom/deepseek-r1-tool-calling (14B), Cogito (14B), and Qwen 3 (8B), plus the larger Gemini 2.5 Pro. While all models achieved flawless functional accuracy, their effectiveness was determined by interpretive quality. The ability to generate concise, factually coherent text was the key differentiator, with smaller, instruction-tuned models matching or exceeding larger models in conciseness. The end-to-end system proved highly reliable, validating the architecture and establishing interpretive fidelity as a critical benchmark for domain-specific agents.
2025 . Master's dissertation . Embargoed access
Pathogens-in-Foods (PIF): An open-access European database of occurrence data of biological hazards in foods
Publication . Gonzales-Barron, Ursula; Faria, Ana Sofia; Thebault, Anne; Guillier, Laurent; Mendes, Lucas Ribeiro; Silva, Lucas Ribeiro; Messens, Winy; Kooh, Pauline; Cadavez, Vasco
The collection of occurrence data of foodborne pathogens in foods faces the hindrances of dispersion of information, lack of standardisation and harmonisation, and ultimately, high expenditure in time and resources. The Pathogens-in-Foods (PIF) database was conceived as a solution to centralise published data on prevalence and concentration of pathogenic bacteria, viruses and parasites occurring in foods, obtained through systematic review (SR), and categorised in harmonised data structures under controlled terminologies. The present article outlines how PIF was constructed to adhere to the FAIR (findability, accessibility, interoperability and reusability) principles for scientific data management, and proceeds with a description of the PIF concept, which entails two phases: the SR process and the population of PIF. The protocolled SR process is supported by a well-defined search strategy, inclusion criteria, and rules for internal validation assessment, whereas the population of PIF with new data relies on data extraction, validation and release. The article then introduces a novel data quality approach, named the CCC approach (data consistency, conformity and completeness), which ensures proper interpretation of data, richness of data, and flawless transcription of data. After a brief explanation of the three PIF components (database, back-end and front-end), the article proceeds with the exposition of the data model, as well as the capabilities of the front-end, including data search, insertion and curation. The future of PIF lies in expanding its capabilities, addressing emerging challenges, and leveraging technological advancements to maintain its relevance and utility in the evolving landscape of food safety.
2025 . Scientific article . Restricted access
A Retrieval-Augmented Natural Language Interface for Data Description and Meta-Analysis in the Pathogens-in-Foods (PIF) Database
Publication . Silva, Lucas Ribeiro; Gonzales-Barron, Ursula; Cadavez, Vasco
Food-safety occurrence databases are increasingly important for surveillance, evidence appraisal, and quantitative risk assessment, yet their routine analytical use remains constrained by the need for database literacy and statistical programming. Building on the curated and harmonized Pathogens-in-Foods (PIF) database, we developed and evaluated a retrieval-augmented natural-language interface designed to support grounded querying and reproducible evidence synthesis. The system includes two complementary modes: an Open Chat Mode for exploratory, tool-mediated interrogation of the database and a Guided Meta-Analysis Mode that couples structured user input to a deterministic R-based analytical pipeline. Evaluation included four compact language models: Phi-4 Mini (3.8B), DeepSeek-R1 Tool-Calling (14B), Cogito (14B), and Qwen 3 (8B), together with Gemini 2.5 Pro as a larger proprietary baseline model. Within a 10-query benchmark, all models achieved 100% tool selection accuracy and retrieval correctness; for the five argument-bearing queries, all models also achieved 100% argument extraction F1-score, indicating reliable grounding of database operations for the evaluated query set. In a guided case study on Toxoplasma in meat and meat products (153 records from 65 studies), the system achieved 100% numerical concordance and high visual informativeness; the highest report quality index was 93% with Qwen 3 (8B). Performance differences across models arose primarily from the factual precision and economy of their written interpretations rather than from failures in tool execution. These findings support hybrid, evidence-grounded analytical interfaces built on curated data resources and deterministic statistical backends as practical tools for accelerating surveillance-oriented evidence synthesis in food protection.
2026 . Scientific article . Open access
