“Learning on the Job”: discover our latest research, to appear in the IEEE RA-L journal.

We are pleased to announce that our paper “Learning on the Job: Long-Term Behavioural Adaptation in Human-Robot Interactions”, by Francesco Del Duchetto and Prof. Marc Hanheide, has been accepted for publication in the IEEE Robotics and Automation Letters (RA-L) journal. The work grew out of research carried out during the long-term deployment of our robot Lindsey at The Collection museum, as part of the “Lindsey – A Robot Tour Guide” project.

In this work, we design a learning framework that allows autonomous robots that interact socially with people to adapt their own behaviour online. In particular, we propose this approach with the goal of improving Lindsey's ability to deliver engaging guided tours to museum visitors.

A picture of Lindsey in the archaeological gallery of The Collection museum during a guided tour. The robot's learning framework detects user engagement online and uses Reinforcement Learning to optimise its interaction policy so as to maximise overall engagement.

The robot is programmed with an initial “static” policy, which is then updated over time as the robot explores different states and action choices. To let learning progress continually, balancing the exploration of new states and actions against the exploitation of those already known to be successful, we use UCBVI (Upper Confidence Bound Value Iteration), a Reinforcement Learning (RL) algorithm based on the principle of “optimism in the face of uncertainty”. The learning signal for optimising the policy is the users' engagement level, which is detected in real time during the interaction from the robot's head camera by the learned regression model presented in our previous work, “Are You Still With Me?”.
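To make the “optimism in the face of uncertainty” idea more concrete, here is a minimal tabular sketch in the spirit of UCBVI. It is not the implementation used on Lindsey: the three-stop tour, the two behaviour options, the reward values, the function names and the `bonus_scale` parameter are all hypothetical, and the bonus is a simplified count-based term that omits the logarithmic confidence factor used in the algorithm's analysis. The engagement score is assumed to lie in [0, 1].

```python
import math


def optimistic_q(n_states, n_actions, horizon, visits, next_visits, reward_sum, bonus_scale=1.0):
    """Backward induction on the empirical model plus an exploration bonus."""
    v = [[0.0] * n_states for _ in range(horizon + 1)]  # v[horizon] stays 0
    q = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(horizon)]
    for h in reversed(range(horizon)):
        cap = float(horizon - h)  # best possible return, rewards assumed in [0, 1]
        for s in range(n_states):
            for a in range(n_actions):
                n = visits[s][a]
                if n == 0:
                    q[h][s][a] = cap  # untried pair: assume the best
                    continue
                mean_reward = reward_sum[s][a] / n
                expected_next = sum(
                    next_visits[s][a][s2] / n * v[h + 1][s2] for s2 in range(n_states)
                )
                bonus = bonus_scale * horizon / math.sqrt(n)  # shrinks with experience
                q[h][s][a] = min(cap, mean_reward + expected_next + bonus)
            v[h][s] = max(q[h][s])
    return q


def run_ucbvi(step, n_states, n_actions, horizon, episodes, start_state=0):
    """Replan optimistically before each episode, then act greedily on the result."""
    visits = [[0] * n_actions for _ in range(n_states)]
    next_visits = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        q = optimistic_q(n_states, n_actions, horizon, visits, next_visits, reward_sum)
        s = start_state
        for h in range(horizon):
            a = max(range(n_actions), key=lambda act: q[h][s][act])  # greedy on optimistic Q
            s2, r = step(s, a)
            visits[s][a] += 1
            next_visits[s][a][s2] += 1
            reward_sum[s][a] += r
            s = s2
    return visits


def toy_step(state, action):
    """Hypothetical tour: action 1 is more engaging (0.8) than action 0 (0.2)."""
    return min(state + 1, 2), (0.8 if action == 1 else 0.2)


visits = run_ucbvi(toy_step, n_states=3, n_actions=2, horizon=3, episodes=200)
print(visits)  # per-stop visit counts for [action 0, action 1]
```

In this toy run, both options are tried at every stop while their bonuses are still large; once the bonus on the weaker option has shrunk, the more engaging option receives most of the visits. No explicit exploration schedule is needed, because exploration comes from the bonus alone.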

Average number of stops visited per tour and rate of tours completed to the end, shown per week over the year. The learning period is highlighted in green.

The results show that, after a couple of months of exploration, the learned policy keeps users engaged for longer than the initial static policy: the number of items visited during a tour increased by 22.8%, and the probability of completing the tour increased by 30%.

The preprint of this article is available on arXiv at: https://arxiv.org/abs/2203.10518.