New ICRA and RAL papers

We got three new ICRA papers:

  1. Farraj, F. B.; Osa, T.; Pedemonte, N.; Peters, J.; Neumann, G.; Giordano, P.R.; (2017). A Learning-based Shared Control Architecture for Interactive Task Execution, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).

    Shared control is a key technology for various robotic applications in which a robotic system and a human operator are meant to collaborate efficiently. In order to achieve efficient task execution in shared control, it is essential to predict the desired behavior for a given situation or context to simplify the control task for the human operator. To do this prediction, we use Learning from Demonstration (LfD), which is a popular approach for transferring human skills to robots.
    We encode the demonstrated behavior as trajectory distributions and generalize the learned distributions to new situations. The goal of this paper is to present a shared control framework that uses learned expert distributions to gain more autonomy. Our approach controls the balance between the controller’s autonomy and the human preference based on the distributions of the demonstrated trajectories. Moreover, the learned distributions are autonomously refined from collaborative task executions, resulting in a master-slave system with increasing autonomy that requires less user input with an increasing number of task executions. We experimentally validated that our shared control approach enables efficient task executions. Moreover, the conducted experiments demonstrated that the developed system improves its performances through interactive task executions with our shared control.

  2. End, F.; Akrour, R.; Peters, J.; Neumann, G. (2017). Layered Direct Policy Search for Learning Hierarchical Skills, Proceedings of the International Conference on Robotics and Automation (ICRA).

    Solutions to real world robotic tasks often require complex behaviors in high dimensional continuous state and action spaces. Reinforcement Learning (RL) is aimed at learning such behaviors but often fails for lack of scalability. To address this issue, Hierarchical RL (HRL) algorithms leverage hierarchical policies to exploit the structure of a task. However, many HRL algorithms rely on task specific knowledge such as a set of predefined sub-policies or sub-goals. In this paper we propose a new HRL algorithm based on information theoretic principles to autonomously uncover a diverse set of sub-policies and their activation policies. Moreover, the learning process mirrors the policys structure and is thus also hierarchical, consisting on a set of independent optimization problems. The hierarchical structure of the learning process allows us to control the learning rate of the sub-policies and the gating individually and add specific information theoretic constraints to each layer to ensure the diversification of the sub-policies. We evaluate our algorithm on two high dimensional continuous tasks and experimentally demonstrate its ability to autonomously discover a rich set of sub-policies.

  3. Gabriel, A.; Akrour, R.; Peters, J.; Neumann, G. (2017). Empowered Skills, Proceedings of the International Conference on Robotics and Automation (ICRA).

    Robot Reinforcement Learning (RL) algorithms return a policy that maximizes a global cumulative reward signal but typically do not create diverse behavior. Hence, the policy will typically only capture a single solution of a task. However, many motor tasks have a large variety of solutions and the knowledge about these solutions can have several advantages. For example, in an adversarial setting such as robot table tennis, the lack of diversity renders the behavior predictable and hence easy to counter for the opponent. In an interactive setting such as learning from human feedback, an emphasis on diversity gives the human more opportunity for guiding the robot and avoid the latter to be stuck in local minima of the task. In order to increase diversity of the learned behaviors, we leverage prior work on intrinsic motivation and empowerment. We derive a new intrinsic motivation signal by enriching the description of a task with an outcome space, representing interesting aspects of a sensorimotor stream. For example, in table tennis, the outcome space could be given by the return position and return ball speed. The intrinsic motivation is now given by the diversity of future outcomes, a concept also known as empowerment. We derive a new policy search algorithm that maximizes a trade-off between the extrinsic reward and this intrinsic motivation criterion. Experiments on a planar reaching task and simulated robot table tennis demonstrate that our algorithm can learn a diverse set of behaviors within the area of interest of the tasks.

and one RAL paper accepted:

  1. Osa, T.; Ghalamzan E. A. M.;, Stolkin, R.; Lioutikov, R.; Peters, J.; Neumann, G. Guiding Trajectory Optimization by Demonstrated Distributions, IEEE Robotics and Automation Letters (RA-L), IEEE.

    Trajectory optimization is an essential tool for motion planning under multiple constraints of robotic manipulators. Optimization-based methods can explicitly optimize a trajectory by leveraging prior knowledge of the system and have been used in various applications such as collision avoidance. However, these methods often require a hand-coded cost function in order to achieve the desired behavior. Specifying such cost function for a complex desired behavior, e.g., disentangling a rope, is a nontrivial task that is often even infeasible. Learning from demonstration (LfD) methods offer an alternative way to program robot motion. LfD methods are less dependent on analytical models and instead learn the behavior of experts implicitly from the demonstrated trajectories. However, the problem of adapting the demonstrations to new situations, e.g., avoiding newly introduced obstacles, has not been fully investigated in the literature. In this paper, we present a motion planning framework that combines the advantages of optimization-based and demonstration-based methods. We learn a distribution of trajectories demonstrated by human experts and use it to guide the trajectory optimization process. The resulting trajectory maintains the demonstrated behaviors, which are essential to performing the task successfully, while adapting the trajectory to avoid obstacles. In simulated experiments and with a real robotic system, we verify that our approach optimizes the trajectory to avoid obstacles and encodes the demonstrated behavior in the resulting trajectory.

Congratulations to all authors!