ESTiG - Students' Master's Dissertations
Permanent URI for this collection:
Recent entries
- Architecture for scalable deployment of AI models
  Costa, João Vítor Nogueira da; Lopes, Rui Pedro; Rufino, José
  This work presents a modular architecture for the scalable deployment of Artificial Intelligence (AI) models that combines Infrastructure-as-Code, container orchestration, and automated observability-driven control loops. The system provisions compute resources on on-premises Proxmox environments using Terraform, applies post-provision configuration with Ansible, orchestrates containerized services through Docker Swarm, serves Machine Learning (ML) models via TorchServe, and stores and visualizes operational metrics using InfluxDB and Grafana. The final design closes an autonomous feedback loop in which Grafana alerts trigger a backend that executes Terraform actions to add or remove worker nodes; newly created machines are configured and joined to the cluster automatically by Ansible. The prototype was validated with two pretrained image classification models (ResNet-18, DenseNet-161), demonstrating functional correctness (idempotent provisioning, service replication, load balancing, and failover) and performance benefits under load when elastic scaling is enabled. While the approach proved portable between Amazon Web Services (AWS) and Proxmox and effective for medium-scale workloads, the evaluation surfaced practical constraints, most notably Virtual Machine (VM) provisioning latency and a five-minute alert resolution delay, that limit responsiveness to short bursts. The architecture meets its primary objectives of scalable, automated model serving with minimal operator intervention, and outlines opportunities for reducing reaction time (e.g., container-level scaling before VM creation) and enhancing scheduling sophistication.
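  A minimal sketch of the alert-driven scaling loop described above: an HTTP backend receives Grafana webhook alerts and runs Terraform to grow or shrink the worker pool. The endpoint, the payload handling, the worker_count variable, and the scaling limits are illustrative assumptions, not the dissertation's actual code.

  ```python
  import json
  import subprocess
  from http.server import BaseHTTPRequestHandler, HTTPServer

  WORKERS = {"count": 2}          # current desired number of worker VMs
  LIMITS = {"min": 1, "max": 8}   # guard rails for elastic scaling

  def apply_worker_count(count: int) -> None:
      """Run Terraform non-interactively to converge on the desired node count."""
      subprocess.run(
          ["terraform", "apply", "-auto-approve", f"-var=worker_count={count}"],
          check=True,
      )

  class AlertHandler(BaseHTTPRequestHandler):
      def do_POST(self):
          body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
          # assume "firing" means overload (scale up) and "resolved" means scale down
          delta = 1 if body.get("status") == "firing" else -1
          new_count = max(LIMITS["min"], min(LIMITS["max"], WORKERS["count"] + delta))
          if new_count != WORKERS["count"]:
              WORKERS["count"] = new_count
              apply_worker_count(new_count)  # Ansible would then configure and join the new VM
          self.send_response(204)
          self.end_headers()

  if __name__ == "__main__":
      HTTPServer(("", 8080), AlertHandler).serve_forever()
  ```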
- Optimization of a feature selection tool for inference of gene regulatory networks
  Cunha, João Vítor Fuzetti da; Rufino, José; Lopes, Fabrício Martins
  This dissertation concerns the computational optimization of DimReduction, a feature selection tool for the inference of Gene Regulatory Networks (GRN). The primary aim was to make it faster and more scalable, so that it can handle large amounts of data, which would bring it closer to the bioinformatics community. The approach involved translating the original Java GUI-based implementation into a CLI version and re-implementing the latter in Python. As the performance of the Python version was lower than expected, the focus returned to the Java CLI version. The major bottleneck in this version was identified and addressed: eliminating the explicit invocation of the Garbage Collector (GC) reduced the runtime for a reference dataset (with 4511 genes) from more than 2 days to 42 minutes. The optimized Java version was then parallelized using a threaded approach, which yielded near-linear speedups. The new parallel Java implementation was then compared with other reference platforms from the literature (GENIE3, CLR, ARACNE, C3NET, BC3NET, MRNET, MRNETB, KBOOST and PCIT). The findings indicate that, even though some alternatives achieve higher quality metrics (AUROC/AUPR), DimReduction's speed makes it a competitive tool in the field.
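  The near-linear speedups are possible because the feature-selection criterion can be scored independently per target gene. A toy Python sketch of that parallel pattern (the dissertation parallelized the Java version with threads; score_gene here is a hypothetical stand-in for the real criterion):

  ```python
  from multiprocessing import Pool
  import random

  def score_gene(gene_id: int) -> tuple[int, float]:
      """Hypothetical stand-in for the per-gene feature-selection criterion."""
      rng = random.Random(gene_id)                    # deterministic per gene
      return gene_id, sum(rng.random() for _ in range(100_000))

  if __name__ == "__main__":
      genes = range(4511)                 # size of the reference dataset above
      with Pool() as pool:                # one worker process per core
          scores = dict(pool.map(score_gene, genes))
      print(f"scored {len(scores)} genes")
  ```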
- WEXGrid: a modular Python framework for defining, managing, and executing complex workflows in local and LSF grid environments
  Marques, João Pedro da Silva; Matos, Paulo
  WEXGrid is a modular framework and software tool designed to address the challenges of creating, managing, and executing scientific workflows on local workstations and distributed Load Sharing Facility (LSF) grid environments. It breaks workflows into discrete targets with explicit inputs, outputs, execution actions, and resource requirements, all of which are defined programmatically in Python; this keeps the underlying concepts accessible to computational scientists. The system then builds a dependency graph in which the execution order is derived automatically from data and control dependencies. This graph can be used to drive dynamic scheduling based on resource availability and task readiness. Cache management reduces redundant computation by validating outputs through timestamps and checksums, which in turn influences scheduling decisions and optimizes resource utilization. WEXGrid supports parallel and asynchronous task execution, balancing workload distribution across resources with data-locality concerns to avoid Input/Output (I/O) bottlenecks. Its fault-tolerant architecture includes mechanisms for detecting and isolating faults to avoid cascading failures. Its provenance capture and metadata interoperability follow the Findable, Accessible, Interoperable, Reusable (FAIR) principles for reproducibility and portability. WEXGrid aims to enable scalability and maintainability, providing a unified interface for defining and executing workflows on heterogeneous computational infrastructures without sacrificing performance or transparency. Future enhancements include adaptive scheduling informed by real-time telemetry, richer provenance integration, and extended interoperability with cloud and containerized environments, ensuring sustained efficiency and reproducibility across heterogeneous scientific computing platforms.
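  A hypothetical sketch of the programmatic style the abstract describes: targets declared with explicit inputs, outputs, actions, and resources, from which a dependency graph is derived. The Target/build_graph API below is assumed for illustration and is not WEXGrid's actual interface.

  ```python
  from dataclasses import dataclass, field
  from typing import Callable

  @dataclass
  class Target:
      name: str
      inputs: list[str]
      outputs: list[str]
      action: Callable[[], None]
      resources: dict = field(default_factory=dict)

  def build_graph(targets: list[Target]) -> dict[str, list[str]]:
      """A target depends on whichever target produces one of its inputs."""
      producers = {out: t.name for t in targets for out in t.outputs}
      return {t.name: [producers[i] for i in t.inputs if i in producers]
              for t in targets}

  preprocess = Target("preprocess", ["raw.csv"], ["clean.csv"],
                      lambda: print("cleaning"), {"cores": 1})
  train = Target("train", ["clean.csv"], ["model.bin"],
                 lambda: print("training"), {"cores": 8, "queue": "lsf"})

  print(build_graph([preprocess, train]))  # {'preprocess': [], 'train': ['preprocess']}
  ```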
- Intelligent selection of human resources with LLM models
  Santana, Matheus Patriarca; Teixeira, João Paulo; Gonçalves, Diego Bertolini
  The résumé screening process in the sales area faces significant challenges due to the diversity of formats, terminologies, and levels of detail. To overcome this dependence on manual analysis, this work investigated the application of natural language models (LLMs) to automate the extraction and standardization of relevant information from résumés. The methodology used models such as GPT-4.1, GPT-4.1 Mini, and Gemini 2.5 Pro, in addition to the ChatPDF tool and supporting libraries for text extraction. Specific prompts were designed to structure nominal and ordinal attributes consistently. Model performance was evaluated with metrics such as accuracy and mean arithmetic error. Subsequently, GPT-4.1, which achieved the best performance, was applied to an extended set of 50 résumés for validation. The extracted data were fed to a classifier model, resulting in a Mean Absolute Error (MAE) of 0.76, on a 10-point scale, when compared with real data, which validates the reliability of the extraction method for automatic classification. The results demonstrate that natural language models are effective for data extraction, with GPT-4.1 standing out. It is concluded that the use of LLMs is a promising approach for automated screening, as it reduces manual effort and increases the consistency of evaluations; the approach was consolidated in an integrated web prototype to demonstrate its practical applicability.
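  A hedged sketch of the prompt-driven extraction step, assuming the OpenAI Python client; the prompt wording and the attribute schema are illustrative, not the dissertation's actual prompts.

  ```python
  import json
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  PROMPT = (
      "Extract the following attributes from the résumé below and reply with "
      "JSON only: name (string), years_of_sales_experience (integer), "
      "education_level (one of: none, secondary, bachelor, master, doctorate).\n\n"
      "Résumé:\n{resume_text}"
  )

  def extract_attributes(resume_text: str) -> dict:
      """One structured-extraction call; temperature 0 favors consistent output."""
      response = client.chat.completions.create(
          model="gpt-4.1",
          messages=[{"role": "user", "content": PROMPT.format(resume_text=resume_text)}],
          temperature=0,
      )
      return json.loads(response.choices[0].message.content)
  ```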
- Multi-agent system for diagnosing defects on a car assembly line
  Izidorio, Felipe Merenda; Leitão, Paulo; Barbosa, José; Alves, Gleifer Vaz
  Traditional approaches to diagnosing geometric defects in automotive assembly lines are based on isolated methods, which have limitations in terms of robustness and early detection of anomalies. This dissertation presents a hierarchical multi-agent architecture for collaborative defect diagnosis, organized into three layers: Point Agents perform local analysis by applying multiple diagnostic algorithms; Station Agents coordinate groups of agents within each station; and an Inter-Station Agent provides a systemic view by identifying correlations between stations. Coordination uses correlation-based clustering and leader election, enabling efficient aggregation of diagnostics. Communication flows hierarchically, and laterally between correlated agents. This organization provides scalability, modularity, and robustness by confining local failures. Experimental validation demonstrates that the collaborative architecture achieves superior accuracy compared to isolated methods, showing that the complementarity between distributed algorithms provides more robust diagnostics and early-warning capabilities.
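  A toy illustration of the hierarchical fusion idea (not the dissertation's algorithms): point agents issue local verdicts, a station agent aggregates them, and the inter-station layer would correlate the station-level results. The tolerance value and the majority-vote rule are arbitrary choices standing in for the richer clustering and leader-election scheme.

  ```python
  from collections import Counter

  def point_agent(measurements: list[float], tolerance: float = 1.0) -> str:
      """Local diagnosis: flag a geometric deviation beyond tolerance."""
      return "defect" if max(abs(m) for m in measurements) > tolerance else "ok"

  def station_agent(local_diagnoses: list[str]) -> str:
      """Fuse point-level verdicts for one station by majority vote."""
      return Counter(local_diagnoses).most_common(1)[0][0]

  stations = {
      "framing":  [point_agent(m) for m in ([0.2, 0.4], [1.3, 0.1], [1.1, 1.5])],
      "painting": [point_agent(m) for m in ([0.1, 0.2], [0.3, 0.2])],
  }
  verdicts = {name: station_agent(d) for name, d in stations.items()}
  print(verdicts)  # {'framing': 'defect', 'painting': 'ok'}
  ```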
- Predictive modeling of media audience based on time series
  Silva, Bruno Filipe Lopes da; Alves, Paulo; Fernandes, José Eduardo
  The rapid evolution of media consumption habits and the increasing competition between television and digital platforms have intensified the need for accurate audience forecasting. Understanding how audiences fluctuate over time is crucial for broadcasters, advertisers, and content producers seeking to optimize programming strategies and allocate resources efficiently. This dissertation presents a comprehensive study on the prediction of television audience ratings using machine learning and statistical models. The work compares multiple modelling approaches, including Linear Regression, Ridge Regression, Random Forest, Gradient Boosting (LightGBM), Long Short-Term Memory (LSTM) networks, and the SARIMA statistical model. The analysis was conducted on four datasets derived from Portuguese television audience data, covering pre- and post-COVID-19 periods and incorporating different program-type schemes. It is important to emphasize that exclusively exogenous variables were used, that is, variables external to the audience generation process itself, deliberately excluding endogenous variables, in order to evaluate the predictive capacity of the models based only on contextual and programmatic factors. A rigorous preprocessing pipeline was implemented, including data cleaning, feature encoding, temporal normalization, and seasonality analysis. Hyperparameter optimization was performed using grid and randomized search methods, and models were evaluated according to MAE, RMSE, MSE, and R2 metrics. The results demonstrate that ensemble-based methods, particularly Random Forest and LightGBM, consistently outperform linear and statistical baselines, achieving R2 scores above 0.93. The LSTM network effectively captured temporal dependencies but showed sensitivity to the reduction of training data in the post-COVID subsets, while the SARIMA model proved less suitable for capturing nonlinear audience dynamics. The study also identifies clear evidence of seasonal and behavioural patterns in television audiences, which can be leveraged to improve future forecasting models. Future research directions include integrating external data sources such as social media and streaming platform metrics. Such extensions could further enhance the contextual understanding of audience behaviour and support data-driven decision-making in the broadcasting industry.
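  A sketch of the comparison protocol on a synthetic stand-in for the audience data; the real study used Portuguese TV ratings and a fuller model set (LightGBM, LSTM, SARIMA) with grid and randomized hyperparameter search.

  ```python
  import numpy as np
  from sklearn.ensemble import RandomForestRegressor
  from sklearn.linear_model import LinearRegression, Ridge
  from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
  from sklearn.model_selection import train_test_split

  rng = np.random.default_rng(0)
  X = rng.random((1000, 8))                              # exogenous features only
  y = X @ rng.random(8) + 0.1 * rng.standard_normal(1000)

  # shuffle=False keeps the temporal order, as a time-series split requires
  X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)
  for name, model in [("linear", LinearRegression()),
                      ("ridge", Ridge(alpha=1.0)),
                      ("forest", RandomForestRegressor(n_estimators=200, random_state=0))]:
      pred = model.fit(X_tr, y_tr).predict(X_te)
      print(f"{name}: MAE={mean_absolute_error(y_te, pred):.3f} "
            f"RMSE={np.sqrt(mean_squared_error(y_te, pred)):.3f} "
            f"R2={r2_score(y_te, pred):.3f}")
  ```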
- Development of server side application for smart waste systems
  Merah, Soheyb; Lopes, Rui Pedro
  This thesis presents the design and implementation of a scalable backend server for real-time monitoring and analytics of smart waste management systems. The server collects data from decentralized sensors installed in urban dumpsters for the detection and classification of volatile organic compounds through an electronic nose sensor. Upon detecting hazardous gas thresholds, the system generates automated alerts to waste-collection teams, thereby improving worker safety and reducing environmental risks. Additionally, the backend provides a web-based dashboard for visualizing historical and live analytics, enabling municipal authorities to optimize collection routes, predict fill-level trends, and reduce operational costs. The architecture leverages a microservices approach with RESTful APIs, a time-series database for sensor data storage, and message queueing for event-driven notifications. The proposed solution contributes to the fields of Internet of Things (IoT) in smart cities, environmental monitoring, and backend systems engineering.
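  A minimal sketch of the ingestion-and-alert path: readings are persisted and, past a hazard threshold, an event is queued for the collection teams. The threshold value and the in-memory stand-ins for the time-series database and the message queue are assumptions for illustration.

  ```python
  import queue
  import time

  VOC_ALERT_PPM = 50.0                 # illustrative hazardous-gas threshold
  alerts: queue.Queue = queue.Queue()  # stands in for the message queue
  store: list[dict] = []               # stands in for the time-series database

  def ingest(dumpster_id: str, voc_ppm: float) -> None:
      """Persist a sensor reading and raise an event if it crosses the threshold."""
      reading = {"dumpster": dumpster_id, "voc_ppm": voc_ppm, "ts": time.time()}
      store.append(reading)
      if voc_ppm >= VOC_ALERT_PPM:
          alerts.put(reading)          # event-driven notification to collection teams

  ingest("bin-17", 12.3)
  ingest("bin-17", 63.8)
  print(alerts.qsize(), "alert(s) pending")  # -> 1 alert(s) pending
  ```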
- Extraction of discriminative regions over genomic sequences
  Souza, Felipe Bueno de; Rufino, José; Pinto, Maria Alice; Lopes, Fabrício Martins
  As computing technologies continue to evolve, new generations of processors have achieved increased levels of computational power and efficiency. This progress enables tasks that once required high-end computers to be performed on personal systems, allowing many scientific fields to benefit, including biology. Alongside this computational progress, advances in DNA sequencing technology are responsible for the exponential growth in the volume and complexity of available genomic data. This scenario requires methods that can efficiently handle and analyze such data in a scalable and interpretable manner, addressing the high volume and inherent complexity of biological sequences. In this context, this work proposes a novel methodology, GREAC (Genomic Region Extraction and Classifier), for extracting discriminative regions from genomic sequences, reducing data dimensionality, identifying biologically relevant patterns, and classifying variants. The proposed methodology is grounded in digital signal processing principles, such as filtering and sequence transformation, employing k-mers as the primary source of information to filter and identify informative genomic regions. The relative frequency values of these regions are then measured to construct standardized signals across different variants. Each reference signal represents the characteristic behavior of a variant, enabling the identification of genomic patterns that allow their classification through statistical divergence measures, distance metrics, and supervised classifiers such as XGBoost. GREAC was implemented in the Julia programming language and is public-domain, open-source software, emphasizing efficiency, transparency, and scientific reproducibility. The implementation runs on personal computers, thereby promoting accessibility and encouraging contributions from the scientific community for further improvements. GREAC thus represents a significant contribution to the fields of bioinformatics and computational genomics, presenting a novel methodology for pattern recognition in genomic sequences.
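  The core signal construction is easy to sketch: count k-mers over a sequence, normalize to relative frequencies, and compare signals with a divergence or distance measure. The choice of k = 3, the toy sequences, and the L1 distance below are arbitrary illustrations of that idea (GREAC itself is written in Julia).

  ```python
  from collections import Counter

  def kmer_signal(sequence: str, k: int = 3) -> dict[str, float]:
      """Relative frequency of each k-mer in a sliding window over the sequence."""
      counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
      total = sum(counts.values())
      return {kmer: n / total for kmer, n in counts.items()}

  ref = kmer_signal("ATGCGATACGCTTGCGATCG")     # reference variant signal
  query = kmer_signal("ATGCGATACGGTTGCGATCG")   # sequence to classify
  # a simple L1 distance stands in for the statistical divergence measures
  dist = sum(abs(ref.get(k, 0.0) - query.get(k, 0.0))
             for k in ref.keys() | query.keys())
  print(round(dist, 3))
  ```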
- Cyber-Ed: a digital hands-on platform for learning cybersecurity
  Rocha, Alexandra Sofia Dias Alves; Pedrosa, Tiago; Lopes, Rui Pedro
  Cybersecurity has become a critical area in contemporary society, especially after the COVID-19 pandemic, which accelerated digital transformation and exposed growing vulnerabilities. The shortage of qualified professionals to face these challenges shows the need for educational methodologies that promote practical, adaptive skills aligned with emerging technologies. The study includes a systematic review that grounds the methodological choices, analyzing the advantages and limitations of traditional and innovative approaches, with a focus on gamification, Artificial Intelligence (AI), and virtual environments. This dissertation proposes an automated educational platform that combines virtual laboratories (Labtainers) and Capture The Flag (CTF) competitions in dynamic, secure environments accessible via a virtual private network (VPN). The solution integrates gamification and automation principles, using technologies such as Terraform, Ansible, React, FastAPI, and Proxmox, with the goal of providing realistic, diversified, and inclusive learning experiences in cybersecurity training. Several hands-on learning scenarios were implemented and tested, covering simulated attacks, such as SQL Injection (SQLi), and system administration exercises. The results demonstrate effectiveness in scenario creation and potential for application in academic and business contexts. Despite limitations, such as the current number of simultaneous users supported and the restricted diversity of scenarios, the platform constitutes a solid foundation for future expansion, including migration to cloud infrastructures and AI-mediated personalization. This work thus contributes to the advancement of hands-on cybersecurity education, preparing professionals for a rapidly changing digital landscape.
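  A hypothetical sketch of one gamification building block in the FastAPI style the platform uses: a flag-submission endpoint that checks a hash and awards points. The route names, the flag store, and the scoring rule are illustrative assumptions, not the platform's actual API.

  ```python
  import hashlib
  from fastapi import FastAPI, HTTPException

  app = FastAPI()
  # store only hashes so a leaked database does not expose the flags themselves
  FLAGS = {"sqli-basics": hashlib.sha256(b"CTF{example}").hexdigest()}
  SCORES: dict[str, int] = {}

  @app.post("/challenges/{challenge_id}/submit")
  def submit(challenge_id: str, user: str, flag: str) -> dict:
      expected = FLAGS.get(challenge_id)
      if expected is None:
          raise HTTPException(status_code=404, detail="unknown challenge")
      if hashlib.sha256(flag.encode()).hexdigest() != expected:
          raise HTTPException(status_code=400, detail="wrong flag")
      SCORES[user] = SCORES.get(user, 0) + 100  # award points on a correct flag
      return {"user": user, "score": SCORES[user]}
  ```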
- Chatbots to transform user experience and interaction in immersive 3D scenarios
  Sampaio, Telmo Fernando de Oliveira; Oliveira, Pedro Filipe; Matos, Paulo
  The growing adoption of immersive three-dimensional environments in commercial contexts has created a critical need for intelligent conversational interfaces that transcend the limitations of traditional two-dimensional chatbots. This dissertation presents the development of an integrated system that combines conversational agents based on Artificial Intelligence (AI) with virtual reality environments. The system implements a Retrieval-Augmented Generation (RAG) architecture that ensures factually accurate answers through an automated pipeline for acquiring and processing web data, combining large language models (GPT-4o-mini) with vector databases (Pinecone), real-time voice processing (Convai), and three-dimensional (3D) environments built in Unity, optimized for Meta Quest devices. Empirical evaluation in two distinct commercial scenarios (BNH and Rádio Popular) showed an overall success rate of 94.6%, with average response times of 2.17 seconds, validating the effectiveness of the proposed architecture. The system includes an administrative management platform that enables real-time monitoring, analysis of usage patterns, and dynamic control of the knowledge base, ensuring compliance with the General Data Protection Regulation (GDPR) through automatic anonymization mechanisms. The results confirm the technical feasibility and commercial applicability of conversational agents in immersive environments, demonstrating generalization across different domains and opening the way to new forms of human-computer interaction in e-commerce and customer service contexts.
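  A minimal retrieval-then-generation sketch under stated assumptions: an in-memory vector store stands in for Pinecone, and the toy embed() replaces a real embedding model; only the RAG flow mirrors the system above.

  ```python
  import math

  def embed(text: str) -> list[float]:
      """Toy bag-of-letters embedding; a real system calls an embedding model."""
      vec = [0.0] * 26
      for ch in text.lower():
          if ch.isascii() and ch.isalpha():
              vec[ord(ch) - ord("a")] += 1.0
      norm = math.sqrt(sum(v * v for v in vec)) or 1.0
      return [v / norm for v in vec]

  DOCS = ["The store opens at 9am on weekdays.",
          "Returns are accepted within 30 days."]
  INDEX = [(doc, embed(doc)) for doc in DOCS]    # offline ingestion step

  def retrieve(query: str, top_k: int = 1) -> list[str]:
      q = embed(query)
      ranked = sorted(INDEX, key=lambda dv: -sum(a * b for a, b in zip(q, dv[1])))
      return [doc for doc, _ in ranked[:top_k]]

  context = retrieve("When does the store open?")
  prompt = f"Answer using only this context: {context}\nQuestion: When does the store open?"
  print(prompt)  # this prompt would then be sent to the language model
  ```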
