Publication

Comparing RL policies for robotic pusher

File: Comparing RL Policies.pdf (2.98 MB, Adobe PDF)

Abstract(s)

Reinforcement learning (RL) has become established as a promising approach to optimizing robotic tasks, enabling improvements in performance and energy efficiency. This study investigates the effectiveness of five RL algorithms in the Pusher environment: Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3). We evaluated training time, computational efficiency, and reward values to identify the most balanced solution between accuracy and energy consumption. The results indicate that PPO offers the best compromise between performance and efficiency, with reduced training time and stable learning. SAC achieves the best rewards but requires more training time, while A2C struggles in continuous action spaces. DDPG and TD3, despite good results, have high computational consumption, which limits their viability for real-time industrial applications. These findings highlight the importance of considering energy efficiency when choosing RL algorithms for robotic applications. As future work, we propose implementing these algorithms in a real-world environment and exploring hybrid approaches that combine different strategies to improve accuracy and minimize energy consumption.
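
A comparison of this kind can be reproduced in outline with off-the-shelf tooling. The sketch below trains each of the five algorithms on Gymnasium's Pusher task and records wall-clock training time and mean episodic reward. It is a minimal sketch only: the use of Stable-Baselines3, the `Pusher-v4` environment id, the timestep budget, and the default hyperparameters are assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch (not the paper's setup): train five RL algorithms on
# Gymnasium's Pusher task and record training time and mean episodic reward.
import time

import gymnasium as gym
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3
from stable_baselines3.common.evaluation import evaluate_policy

# The five algorithms compared in the study.
ALGORITHMS = {"A2C": A2C, "PPO": PPO, "DDPG": DDPG, "SAC": SAC, "TD3": TD3}


def compare(env_id: str = "Pusher-v4", timesteps: int = 50_000) -> dict:
    """Train each algorithm and return per-algorithm timing and reward metrics."""
    results = {}
    for name, algo in ALGORITHMS.items():
        env = gym.make(env_id)
        model = algo("MlpPolicy", env, verbose=0)

        # Wall-clock training time as a rough proxy for computational cost.
        start = time.perf_counter()
        model.learn(total_timesteps=timesteps)
        train_time = time.perf_counter() - start

        # Mean and std of episodic reward over a small evaluation batch.
        mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
        results[name] = {
            "train_time_s": train_time,
            "mean_reward": mean_reward,
            "std_reward": std_reward,
        }
        env.close()
    return results


if __name__ == "__main__":
    for name, metrics in compare().items():
        print(name, metrics)
```

Comparing training time alongside reward in this way reflects the paper's emphasis on the trade-off between accuracy and computational (and hence energy) cost; a full energy analysis would require hardware-level measurement rather than wall-clock timing.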

Keywords

Reinforcement learning; Autonomous robotics; Policies; Performance; Agent training

Citation

Bonjour, Pedro; Lopes, Rui Pedro (2026). Comparing RL policies for robotic pusher. In 5th International Conference OL2A. Cham: Springer Nature. p. 220–230. ISBN 9783032001399
