| Name: | Description: | Size: | Format: |
|---|---|---|---|
| | | 2.98 MB | Adobe PDF |
Authors
Bonjour, Pedro; Lopes, Rui Pedro
Abstract(s)
Reinforcement learning (RL) has become established as a promising approach to optimizing robotic tasks, improving both performance and energy efficiency. This study investigates the effectiveness of five RL algorithms in the Pusher environment: Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3). We evaluated training time, computational efficiency, and reward values to identify the most balanced solution between accuracy and energy consumption. The results indicate that PPO offers the best compromise between performance and efficiency, with reduced training time and stable learning. SAC achieves the highest rewards but requires more training time, while A2C struggles in continuous action spaces. DDPG and TD3, despite their good results, have high computational costs, which limits their viability for real-time industrial applications. These findings highlight the importance of considering energy efficiency when choosing RL algorithms for robotic applications. As future work, we propose implementing these algorithms in a real-world environment, as well as exploring hybrid approaches that combine different strategies to improve accuracy and minimize energy consumption.
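The record does not include the authors' code; the following is a minimal sketch of the kind of comparison the abstract describes, assuming Gymnasium's Pusher task (here the `Pusher-v5` id) and the Stable-Baselines3 implementations of the five algorithms. The library choice, environment version, timestep budget, and evaluation settings are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the comparison described in the abstract.
# Assumptions: Gymnasium's "Pusher-v5" environment, Stable-Baselines3
# algorithm implementations, default hyperparameters, and a small
# timestep budget. The paper's actual tooling is not specified here.
import time

import gymnasium as gym
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3
from stable_baselines3.common.evaluation import evaluate_policy

ALGORITHMS = {"A2C": A2C, "PPO": PPO, "DDPG": DDPG, "SAC": SAC, "TD3": TD3}


def compare(total_timesteps: int = 100_000, eval_episodes: int = 10):
    """Train each algorithm on Pusher and record wall-clock time and reward."""
    results = {}
    for name, algo in ALGORITHMS.items():
        env = gym.make("Pusher-v5")
        model = algo("MlpPolicy", env, verbose=0)

        # Wall-clock training time as a rough proxy for computational cost.
        start = time.perf_counter()
        model.learn(total_timesteps=total_timesteps)
        train_time = time.perf_counter() - start

        # Mean and std of episodic return over a fixed number of episodes.
        mean_reward, std_reward = evaluate_policy(
            model, env, n_eval_episodes=eval_episodes
        )
        results[name] = {
            "train_time_s": train_time,
            "mean_reward": mean_reward,
            "std_reward": std_reward,
        }
        env.close()
    return results


if __name__ == "__main__":
    for name, stats in compare().items():
        print(name, stats)
```

Wall-clock time is only a proxy for the computational and energy efficiency the abstract discusses; a study like this one would additionally need hardware-level power or resource monitoring, which is outside the scope of this sketch.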
Keywords
Reinforcement learning; Autonomous robotics; Policies; Performance; Agent training
Citation
Bonjour, Pedro; Lopes, Rui Pedro (2026). Comparing RL policies for robotic pusher. In 5th International Conference OL2A. Cham: Springer Nature, pp. 220–230. ISBN 9783032001399.
Publisher
Springer Nature
