Publication

Synthetic data generation for volatile organic compounds recognition

Abstract

Machine learning (ML) models for recognizing volatile organic compounds (VOCs) are typically developed with limited datasets, because gathering sensor data at scale is expensive, and this scarcity is an obstacle to their development. The Bosch BME688 is a multi-gas sensor that provides detailed environmental data, but building representative datasets with it requires large experimental campaigns. To overcome this issue, we introduce a Python library for synthetic data generation for the BME688. The tool uses Kernel Density Estimation (KDE) to model the empirical gas resistance distribution under various heater profiles and applies mathematical gas mixing to generate configurable multi-gas simulations. Validation experiments on coffee and oil gases show that the resulting datasets retain the statistical characteristics of actual measurements, both at the stepwise level of gas resistance distributions and at the multivariate level under Principal Component Analysis (PCA). The library supports reproducible ML experimentation, prototyping of ML algorithms on mixtures at configurable percentages, and systematic evaluation of VOC recognition systems. The contribution of this work is a modular, lightweight framework that addresses data scarcity, facilitates reproducible research, and speeds up the development of ML-based air quality monitoring solutions.
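The two ideas named in the abstract, per-step KDE sampling and percentage-based gas mixing, can be illustrated with a minimal sketch. This is not the library's API, which the abstract does not describe. The function names, the rule-of-thumb bandwidth, the choice to work in log-resistance, and the linear mixing rule are all assumptions made for illustration. The "measured" arrays are random placeholders, not BME688 data.

```python
import numpy as np


def rule_of_thumb_bandwidth(samples: np.ndarray) -> float:
    """Normal-reference rule-of-thumb bandwidth for a 1-D Gaussian KDE."""
    n = samples.size
    return 1.06 * samples.std(ddof=1) * n ** (-1 / 5)


def sample_kde(samples: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw `size` values from a Gaussian KDE fitted to 1-D `samples`.

    Sampling from a Gaussian KDE amounts to picking an observation at
    random and adding Gaussian noise whose scale is the kernel bandwidth.
    """
    h = rule_of_thumb_bandwidth(samples)
    centers = rng.choice(samples, size=size, replace=True)
    return centers + rng.normal(0.0, h, size=size)


def synthesize(measured: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Generate synthetic scans from measured gas resistances.

    `measured` has shape (n_scans, n_heater_steps). One KDE is fitted per
    heater step, in log-resistance (an assumption) so that the generated
    resistances stay positive.
    """
    log_measured = np.log(measured)
    columns = [
        sample_kde(log_measured[:, step], size, rng)
        for step in range(log_measured.shape[1])
    ]
    return np.exp(np.column_stack(columns))


def mix(gas_a: np.ndarray, gas_b: np.ndarray, fraction_a: float) -> np.ndarray:
    """Hypothetical linear mixing rule: fraction_a of gas A, the rest gas B."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie in [0, 1]")
    return fraction_a * gas_a + (1.0 - fraction_a) * gas_b


# Placeholder "measurements": random values standing in for real sensor scans.
rng = np.random.default_rng(0)
n_steps = 10
coffee = rng.lognormal(mean=11.0, sigma=0.2, size=(200, n_steps))
oil = rng.lognormal(mean=12.0, sigma=0.2, size=(200, n_steps))

synthetic_coffee = synthesize(coffee, 500, rng)
synthetic_oil = synthesize(oil, 500, rng)
blend = mix(synthetic_coffee, synthetic_oil, fraction_a=0.7)  # 70% coffee, 30% oil
```

Because each heater step is sampled independently here, the sketch reproduces per-step distributions but not correlations between steps. A multivariate KDE over whole scans would be one way to preserve them.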

Citation

Ahmadi, Mahdia; Ibrahim, Ahmad Gamal; Jvarsheishvili, Mariam; Igrejas, Getúlio; Izidorio, Felipe; Lopes, Rui Pedro; Soares, Caio; Rodrigues, João Pedro (2025). Synthetic data generation for volatile organic compounds recognition. In RECPAD 2025 - 31st Portuguese Conference on Pattern Recognition. Aveiro, Portugal.
