ESTiG - Dissertações de Mestrado Alunos
Recent entries
- The application of active learning methodologies in the description of the salt effect on the solubility of amino acids
  Piske, Christopher Andrey; Abranches, João Dinis Oliveira; Pinho, Simão; Leite, Priscilla dos Santos Gaschi
  In aqueous solutions containing electrolytes, ions influence both the solubility and the stability of biomolecules. However, inconsistencies across published data highlight the need for a critical review. To address this, a database of glycine solubility in electrolyte solutions spanning 1996 to 2024 was constructed, and the experimental data were critically evaluated. Gaussian Process (GP) models were implemented to analyze, predict, and validate solubility behavior. The GP model successfully captures salting-in and salting-out trends, along with specific ion effects reported in the literature. It also provides predictive uncertainty estimates that help identify potentially inconsistent data points or sets. This uncertainty-based analysis enables the reconciliation of conflicting datasets and helps prioritize new experimental measurements in regions where data are sparse or less reliable. By applying a data-filtering method that removes experimental points falling outside the uncertainty range of the model, the influence of inconsistent values is reduced, resulting in a more robust model fit and improved prediction accuracy. The GP thus establishes a quantitative foundation for consolidating current knowledge on the solubility of glycine in saline solutions and for identifying methodological inconsistencies in the literature.
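The uncertainty-based filtering idea in this abstract can be sketched with scikit-learn's GaussianProcessRegressor. The data, kernel choice, and 3-sigma threshold below are illustrative assumptions, not the dissertation's actual dataset or setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Synthetic stand-in: x = salt molality, y = glycine solubility
x = np.linspace(0.0, 2.0, 40).reshape(-1, 1)
y = 3.0 - 0.8 * x.ravel() + rng.normal(0.0, 0.05, 40)
y[10] += 1.0  # a deliberately inconsistent measurement

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(x, y)

# Flag points falling outside the model's predictive uncertainty band
mean, std = gp.predict(x, return_std=True)
keep = np.abs(y - mean) <= 3.0 * std
x_clean, y_clean = x[keep], y[keep]

# Refit on the filtered data for a more robust model
gp_refit = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(x_clean, y_clean)
```

The refit step mirrors the abstract's claim: removing points outside the uncertainty range reduces the influence of inconsistent values on the final fit.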
- Avaliação do comportamento ao fogo de sistemas construtivos light wood frame
  Duarte, Yuranick Ivoaltino de Carvalho; Mesquita, L.M.R.
  The growing demand for sustainable construction solutions has driven the use of timber systems, such as Light Wood Frame (LWF), in the construction sector. This system stands out for its thermal efficiency, structural lightness, and reduced ecological footprint, consisting of solid timber and timber-derived panels combined with insulating materials, notably rock wool. However, the fire resistance of these structures remains one of the main obstacles to their wide adoption, demanding in-depth research into their thermal and mechanical behaviour under fire conditions. This work evaluates the fire behaviour of Light Wood Frame walls made of multilayer panels, composed of MDF cladding, a spruce-pine timber frame, and rock wool insulation, through experimental fire-resistance tests carried out on two distinct wall configurations, complemented by numerical modelling. The tests made it possible to assess the fire performance of the walls, namely the fire-resistance time, the temperature evolution on the unexposed side, and the charring of the structural elements, with differences observed between the two configurations tested. The experimental results were used to calibrate and validate a numerical model developed according to Eurocode 5 (EN 1995-1-2), with agreement found between the experimental and numerical results.
- Intelligent OCR application for text extraction and structuring on online platforms and newspapers
  Junior, Paulo Roberto Machado Silva; Alves, Paulo; Fernandes, José Eduardo; Cunha, Márcio Rodrigues da
  The monitoring of print media is an important function for the advertising industry, enabling the identification of advertisements in newspapers and magazines for market analysis. However, automating this extraction is challenging due to the complex layouts of these publications. Conventional Optical Character Recognition (OCR) systems, capable of transcribing individual characters, often fail to retain structural organization and logical reading order. To address these issues, the proposed approach integrates Document Layout Analysis (DLA) with OCR in a multi-stage process. YOLOv10 and YOLOv12 models detect and segment document elements, and the resulting regions are then passed to PaddleOCR for text extraction. Experimental results show that the first pre-trained model achieved a mAP@50 of 0.728 on a 2,000-image sample from DocLayNet. The second pre-trained model achieved a mAP@50 of 0.519 on a custom dataset. The fusion strategy reduced detection redundancy, and comparative evaluation against a production baseline indicates competitive performance. The final workflow produces a semi-structured JSON output that preserves the association between bounding-box coordinates and extracted text. Future work will assess Vision Language Models (VLMs) to improve reading-order reconstruction in more complex layouts.
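A semi-structured JSON output tying bounding-box coordinates to extracted text, as described above, might look roughly like this. The field names (`label`, `bbox`, `text`) and the simple reading-order heuristic are illustrative assumptions, not the pipeline's exact schema.

```python
import json

# Detected layout regions with their bounding boxes and OCR'd text
regions = [
    {"label": "paragraph", "bbox": [34, 100, 300, 420], "text": "The company said..."},
    {"label": "headline", "bbox": [34, 20, 590, 88], "text": "Local firm expands"},
    {"label": "advert", "bbox": [320, 100, 590, 420], "text": "Summer sale: 50% off"},
]

# Simple top-to-bottom, left-to-right reading-order heuristic:
# sort by the top edge (y1), then by the left edge (x1)
regions.sort(key=lambda r: (r["bbox"][1], r["bbox"][0]))

document = {"page": 1, "regions": regions}
print(json.dumps(document, indent=2))
```

Keeping the box with each text span is what lets downstream consumers reconstruct layout or highlight detected advertisements on the original page image.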
- Planeamento de lajes estruturais: integração dos tempos de execução com recurso à simulação Monte Carlo
  Almeida, Matheus Mendes de; Oliveira, Rui
  Construction faces major time-management challenges, which are critical to project success. This study proposes a comparative analysis of different slab systems, including solid slabs, prestressed-joist slabs, steel deck, and hollow-core slabs, using the Critical Path Method (CPM), the Program Evaluation and Review Technique (PERT), and Monte Carlo simulation to incorporate uncertainty, in a Brazilian context. The research is divided into three stages. The first is a literature review on project planning and control, exploring the use of CPM, PERT, and Monte Carlo simulation. In the second stage, through a case study, detailed data are collected via a questionnaire addressed to site technicians, seeking to identify the duration of each stage of the construction processes. In this stage, activity durations are also gathered from labour-productivity coefficients (RUPs, Unit Productivity Ratios) extracted from the SINAPI tables. These data are used to build the schedule for each system. Finally, the third stage applies Monte Carlo simulation to the resulting schedules, together with CPM and PERT, integrating uncertainty for a more complete picture. The study offers a well-founded comparison of the construction systems, with emphasis on time management, providing a solid basis for strategic decisions in construction. In addition, the results may serve as a reference for future studies on project planning and control involving this type of structural slab.
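Monte Carlo simulation over PERT-style estimates, as applied in the third stage above, can be sketched as follows. The activity durations are hypothetical, and the triangular distribution is used here as a common stand-in for the Beta-PERT distribution.

```python
import random

# (optimistic, most likely, pessimistic) durations in days
# for activities assumed to lie on the critical path
activities = [(2, 3, 6), (4, 5, 9), (1, 2, 4), (3, 4, 7)]

def simulate(n=10_000, seed=42):
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        # random.triangular(low, high, mode): sample each activity duration,
        # then sum along the (serial) critical path
        totals.append(sum(rng.triangular(opt, pes, ml) for opt, ml, pes in activities))
    return sorted(totals)

totals = simulate()
p50 = totals[len(totals) // 2]
p90 = totals[int(len(totals) * 0.9)]
print(f"median ~ {p50:.1f} days, 90th percentile ~ {p90:.1f} days")
```

Reading percentiles off the simulated distribution, rather than a single deterministic CPM sum, is what lets the schedule carry its uncertainty explicitly.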
- Architecture for scalable deployment of AI models
  Costa, João Vítor Nogueira da; Lopes, Rui Pedro; Rufino, José
  This work presents a modular architecture for the scalable deployment of Artificial Intelligence (AI) models that combines Infrastructure-as-Code, container orchestration, and automated observability-driven control loops. The system provisions compute resources on on-premises Proxmox environments using Terraform, applies post-provision configuration with Ansible, orchestrates containerized services through Docker Swarm, serves Machine Learning (ML) models via TorchServe, and stores and visualizes operational metrics using InfluxDB and Grafana. The final design closes an autonomous feedback loop in which Grafana alerts trigger a backend that executes Terraform actions to add or remove worker nodes; newly created machines are configured and joined to the cluster automatically by Ansible. The prototype was validated with two pretrained image classification models (ResNet-18, DenseNet-161), demonstrating functional correctness (idempotent provisioning, service replication, load balancing, and failover) and performance benefits under load when elastic scaling is enabled. While the approach proved portable between Amazon Web Services (AWS) and Proxmox and effective for medium-scale workloads, the evaluation surfaced practical constraints, most notably Virtual Machine (VM) provisioning latency and a five-minute alert resolution delay, that limit responsiveness to short bursts. The architecture meets its primary objectives of scalable, automated model serving with minimal operator intervention, and outlines opportunities for reducing reaction time (e.g., container-level scaling before VM creation) and enhancing scheduling sophistication.
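The core of the observability-driven control loop above can be sketched as a threshold rule. The thresholds and worker limits below are illustrative assumptions; in the real system the decision is made by Grafana alerts driving Terraform actions, not by a Python function.

```python
def scaling_decision(cpu_avg, workers, high=0.80, low=0.20, min_w=1, max_w=8):
    """Return the desired worker count for the observed average CPU load."""
    if cpu_avg > high and workers < max_w:
        return workers + 1  # alert fires: provision one more worker VM
    if cpu_avg < low and workers > min_w:
        return workers - 1  # sustained idle: tear one worker down
    return workers          # load within band: hold steady

print(scaling_decision(0.92, 2))  # scale out
print(scaling_decision(0.05, 2))  # scale in
print(scaling_decision(0.50, 2))  # hold steady
```

Bounding the cluster between `min_w` and `max_w` prevents runaway scale-out, and the dead band between `low` and `high` avoids oscillating decisions.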
- Optimization of a feature selection tool for inference of gene regulatory networks
  Cunha, João Vítor Fuzetti da; Rufino, José; Lopes, Fabrício Martins
  This dissertation concerns the computational optimization of DimReduction, a feature selection tool for the inference of Gene Regulatory Networks (GRN). The primary aim was to make it faster and more scalable, so that it can handle large amounts of data and thus become more useful to the bioinformatics community. The approach involved translating the original Java GUI-based implementation into a CLI version and re-implementing the latter in Python. Since the performance of the Python version was lower than expected, the focus returned to the Java CLI version. Its major bottleneck was identified and addressed: eliminating the explicit invocation of the Garbage Collector (GC) reduced the runtime on a reference dataset (with 4511 genes) from more than 2 days to 42 minutes. The optimized Java version was then parallelized using a threaded approach, which yielded near-linear speedups. The parallel Java implementation was then compared with reference platforms from the literature (GENIE3, CLR, ARACNE, C3NET, BC3NET, MRNET, MRNETB, KBOOST and PCIT). The findings indicate that even though some alternatives achieve higher quality metrics (AUROC/AUPR), DimReduction's speed makes it a competitive tool in the field.
- WEXGrid: a modular Python framework for defining, managing, and executing complex workflows in local and LSF grid environments
  Marques, João Pedro da Silva; Matos, Paulo
  WEXGrid is a modular framework and software tool designed to address the challenges of creating, managing, and executing scientific workflows on local workstations and distributed Load Sharing Facility (LSF) Grid environments. It breaks workflows into discrete targets with explicit inputs, outputs, execution actions, and resource requirements, all of which are defined programmatically in Python. This ensures that the underlying concepts remain accessible to computational scientists. The system then builds a dependency graph in which the execution order is automated based on data and control dependencies. This graph can be used to drive dynamic scheduling based on resource availability and task readiness. Cache management reduces redundant computations by validating outputs through timestamps and checksums, which in turn influences scheduling decisions and optimizes resource utilization. WEXGrid supports parallel and asynchronous task execution, balancing workload distribution across resources with data-locality concerns to avoid Input/Output (I/O) bottlenecks. Its fault-tolerant architecture includes mechanisms for detecting and isolating faults to avoid cascading failures. Its provenance capture and metadata interoperability meet the Findable, Accessible, Interoperable, Reusable (FAIR) principles for reproducibility and portability. WEXGrid aims to enable scalability and maintainability, providing a unified interface for defining and executing workflows on heterogeneous computational infrastructures without sacrificing performance or transparency. Future enhancements include adaptive scheduling informed by real-time telemetry, richer provenance integration, and extended interoperability with cloud and containerized environments. These enhancements will ensure sustained efficiency and reproducibility across heterogeneous scientific computing platforms.
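Two of the mechanisms described above, dependency-graph-driven execution order and checksum-validated caching, can be sketched as follows. The target names and the tiny API are illustrative assumptions, not WEXGrid's actual interface.

```python
import hashlib
from graphlib import TopologicalSorter

# target -> (dependencies, action producing the target's output)
targets = {
    "raw": (set(), lambda deps: "raw-data"),
    "clean": ({"raw"}, lambda deps: deps["raw"].upper()),
    "report": ({"clean"}, lambda deps: f"report({deps['clean']})"),
}

cache = {}  # target -> (checksum of its inputs, cached output)

def checksum(values):
    """Fingerprint a target's inputs so unchanged inputs mean a cache hit."""
    return hashlib.sha256(repr(sorted(values.items())).encode()).hexdigest()

def run(targets, cache):
    outputs = {}
    # static_order() yields each target only after all its dependencies
    order = TopologicalSorter({t: d for t, (d, _) in targets.items()}).static_order()
    for name in order:
        deps, action = targets[name]
        inputs = {d: outputs[d] for d in deps}
        key = checksum(inputs)
        if name in cache and cache[name][0] == key:
            outputs[name] = cache[name][1]  # cache hit: skip recomputation
        else:
            outputs[name] = action(inputs)
            cache[name] = (key, outputs[name])
    return outputs

first = run(targets, cache)
second = run(targets, cache)  # second run is served entirely from cache
```

Real systems typically fingerprint file timestamps or content hashes rather than in-memory values, but the skip-if-unchanged logic is the same.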
- Seleção inteligente de recursos humanos com modelos LLM
  Santana, Matheus Patriarca; Teixeira, João Paulo; Gonçalves, Diego Bertolini
  The resume-screening process in the sales area faces significant challenges due to the diversity of formats, terminologies, and levels of detail. To reduce the dependence on manual analysis, this work investigated the application of large language models (LLMs) to automate the extraction and standardization of relevant information from resumes. The methodology used models such as GPT-4.1, GPT-4.1 Mini, and Gemini 2.5 Pro, alongside the ChatPDF tool and supporting libraries for text extraction. Specific prompts were designed to structure nominal and ordinal attributes consistently. Model performance was evaluated with metrics such as accuracy and mean error. Subsequently, GPT-4.1, which performed best, was applied to an extended set of 50 resumes for validation. The extracted data were fed to a classifier model, yielding a Mean Absolute Error (MAE) of 0.76, on a 10-point scale, against reference data, which validates the reliability of the extraction method for automatic classification. The results show that language models are effective at data extraction, with GPT-4.1 standing out. It is concluded that the use of LLMs is a promising approach for automated screening, as it reduces manual effort and increases the consistency of evaluations; the approach was consolidated in an integrated web prototype to demonstrate its practical applicability.
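The final evaluation step above, comparing classifier scores against reference ratings on a 10-point scale, reduces to a Mean Absolute Error computation. The scores below are illustrative, not the study's data.

```python
# Classifier scores from LLM-extracted attributes vs. human reference scores,
# both on a 10-point scale (values are made up for illustration)
predicted = [7.5, 6.0, 8.2, 5.5, 9.0]
reference = [8.0, 6.5, 8.0, 5.0, 8.5]

# MAE: mean of the absolute differences between prediction and reference
mae = sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
print(f"MAE = {mae:.2f} on a 10-point scale")
```

On a 10-point scale, an MAE like the study's reported 0.76 means the automated scores deviate from the human reference by well under one point on average.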
- Multi-agent system for diagnosing defects on a car assembly line
  Izidorio, Felipe Merenda; Leitão, Paulo; Barbosa, José; Alves, Gleifer Vaz
  Traditional approaches to diagnosing geometric defects in automotive assembly lines are based on isolated methods, which have limitations in terms of robustness and early detection of anomalies. This dissertation presents a hierarchical multi-agent architecture for collaborative defect diagnosis, organized into three layers: Point Agents perform local analysis by applying multiple diagnostic algorithms; Station Agents coordinate groups of agents within each station; and an Inter-Station Agent provides a systemic view by identifying correlations between stations. Coordination uses correlation-based clustering and leader election, enabling efficient aggregation of diagnostics. Communication flows hierarchically and laterally between correlated agents. This organization provides scalability, modularity, and robustness by confining local failures. Experimental validation demonstrates that the collaborative architecture achieves superior accuracy compared to isolated methods, showing that the complementarity between distributed algorithms provides more robust diagnostics and early-warning capabilities.
- Predictive modeling of media audience based on time series
  Silva, Bruno Filipe Lopes da; Alves, Paulo; Fernandes, José Eduardo
  The rapid evolution of media consumption habits and the increasing competition between television and digital platforms have intensified the need for accurate audience forecasting. Understanding how audiences fluctuate over time is crucial for broadcasters, advertisers, and content producers seeking to optimize programming strategies and allocate resources efficiently. This dissertation presents a comprehensive study on the prediction of television audience ratings using machine learning and statistical models. The work compares multiple modelling approaches, including Linear Regression, Ridge Regression, Random Forest, Gradient Boosting (LightGBM), Long Short-Term Memory (LSTM) networks, and the SARIMA statistical model. The analysis was conducted on four datasets derived from Portuguese television audience data, covering pre- and post-COVID-19 periods and incorporating different program-type schemes. It is important to emphasize that exclusively exogenous variables were used, that is, variables external to the audience-generation process itself, deliberately excluding endogenous variables, in order to evaluate the predictive capacity of the models based only on contextual and programmatic factors. A rigorous preprocessing pipeline was implemented, including data cleaning, feature encoding, temporal normalization, and seasonality analysis. Hyperparameter optimization was performed using grid and randomized search methods, and models were evaluated according to MAE, RMSE, MSE, and R2 metrics. The results demonstrate that ensemble-based methods, particularly Random Forest and LightGBM, consistently outperform linear and statistical baselines, achieving R2 scores above 0.93. The LSTM network effectively captured temporal dependencies but showed sensitivity to the reduction of training data in the post-COVID subsets, while the SARIMA model proved less suitable for capturing nonlinear audience dynamics. The study also identifies clear evidence of seasonal and behavioural patterns in television audiences, which can be leveraged to improve future forecasting models. Future research directions include the integration of external data sources such as social media and streaming platform metrics. Such extensions could further enhance the contextual understanding of audience behaviour and support data-driven decision-making in the broadcasting industry.
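The model-comparison protocol above (exogenous features only, scored with MAE, RMSE, and R2) can be sketched with scikit-learn. The synthetic data below merely stands in for the Portuguese audience datasets and implies nothing about the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(1)
# Synthetic exogenous features (e.g. hour slot, weekday, program-type code)
X = rng.uniform(0.0, 1.0, size=(500, 3))
# Audience signal with a nonlinear component a linear model cannot capture
y = 2.0 * X[:, 0] + np.sin(12.0 * X[:, 1]) + rng.normal(0.0, 0.1, 500)

X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

results = {}
for model in (LinearRegression(),
              RandomForestRegressor(n_estimators=200, random_state=0)):
    pred = model.fit(X_train, y_train).predict(X_test)
    results[type(model).__name__] = {
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": mean_squared_error(y_test, pred) ** 0.5,
        "R2": r2_score(y_test, pred),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Because the target contains a nonlinear term, the ensemble model should recover more of the signal than the linear baseline, mirroring the pattern the dissertation reports.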
