Publication

Comparing RL policies for robotic pusher

File: Comparing RL Policies.pdf (2.98 MB, Adobe PDF)

Abstract(s)

Reinforcement learning (RL) has become established as a promising approach to optimizing robotic tasks, enabling improvements in performance and energy efficiency. This study investigates the effectiveness of five RL algorithms in the Pusher environment: Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3). We evaluated training time, computational efficiency, and reward values to identify the most balanced solution between accuracy and energy consumption. The results indicate that PPO offers the best compromise between performance and efficiency, with reduced training time and stable learning. SAC achieves the best rewards but requires more training time, while A2C struggles in continuous action spaces. DDPG and TD3, despite good results, have high computational consumption, which limits their viability for real-time industrial applications. These findings highlight the importance of considering energy efficiency when choosing RL algorithms for robotic applications. As future work, we propose implementing these algorithms in a real-world environment and exploring hybrid approaches that combine different strategies to improve accuracy and minimize energy consumption.
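
A comparison of this kind can be reproduced in outline with off-the-shelf tooling. The sketch below trains each of the five algorithms on Gymnasium's Pusher task and records wall-clock training time and mean episodic reward. It is a minimal sketch only: the use of Stable-Baselines3, the `Pusher-v4` environment id, the timestep budget, and the default hyperparameters are assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch (not the paper's setup): train five RL algorithms on
# Gymnasium's Pusher task and record training time and mean episodic reward.
import time

import gymnasium as gym
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3
from stable_baselines3.common.evaluation import evaluate_policy

# The five algorithms compared in the study.
ALGORITHMS = {"A2C": A2C, "PPO": PPO, "DDPG": DDPG, "SAC": SAC, "TD3": TD3}


def compare(env_id: str = "Pusher-v4", timesteps: int = 50_000) -> dict:
    """Train each algorithm and return per-algorithm timing and reward metrics."""
    results = {}
    for name, algo in ALGORITHMS.items():
        env = gym.make(env_id)
        model = algo("MlpPolicy", env, verbose=0)

        # Wall-clock training time as a rough proxy for computational cost.
        start = time.perf_counter()
        model.learn(total_timesteps=timesteps)
        train_time = time.perf_counter() - start

        # Mean and std of episodic reward over a small evaluation batch.
        mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
        results[name] = {
            "train_time_s": train_time,
            "mean_reward": mean_reward,
            "std_reward": std_reward,
        }
        env.close()
    return results


if __name__ == "__main__":
    for name, metrics in compare().items():
        print(name, metrics)
```

Comparing training time alongside reward in this way reflects the paper's emphasis on the trade-off between accuracy and computational (and hence energy) cost; a full energy analysis would require hardware-level measurement rather than wall-clock timing.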

Keywords

Reinforcement learning; Autonomous robotics; Policies; Performance; Agent training

Citation

Bonjour, Pedro; Lopes, Rui Pedro (2026). Comparing RL policies for robotic pusher. In 5th International Conference OL2A. Cham: Springer Nature. p. 220–230. ISBN 9783032001399
