Jorge Cortés


Exploiting bias for cooperative planning in multi-agent tree search
A. Ma, M. Ouimet, J. Cortés
IEEE Robotics and Automation Letters 5 (2) (2020), 1819-1826


Graph search over states and actions is a valuable tool for robotic planning and navigation. However, the required computation is sensitive to the size of the state and action spaces, and in multi-agent planning this sensitivity is further exacerbated by the number of agents and by sparse reward signals that depend on the cooperation of agents. To tackle these problems, we introduce an algorithm that is pre-trained in a centralized fashion but implemented on robots in a distributed way at runtime. The centralized portion uses imitation learning to iteratively construct policies that help guide an individual agent's own runtime search as well as predict other agents' future actions by exploiting previously discovered joint actions. Our algorithm includes a novel method of tree search based on a mixture of the individual and joint action space, which can be interpreted as a cascading effect where agents are biased by exploration of new actions, exploitation of previously profitable ones, and recommendations provided by deep neural nets. Simulations show the efficacy of the proposed method in cooperative scenarios with sparse rewards.
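
To make the three sources of bias named in the abstract concrete (exploration, exploitation, and a learned recommendation), here is a minimal single-agent sketch of a generic prior-weighted upper-confidence selection rule of the kind common in neural-guided tree search. It is not the paper's algorithm, which mixes individual and joint action spaces across agents; the function name `select_action`, the `stats` and `prior` dictionaries, and the constant `c_explore` are all hypothetical names chosen for illustration.

```python
import math


def select_action(stats, prior, c_explore=1.0):
    """Pick the action with the best exploitation + prior-weighted exploration score.

    stats: dict mapping action -> (visit_count, total_reward)   [hypothetical layout]
    prior: dict mapping action -> probability suggested by a learned policy
    c_explore: weight on the exploration term (assumed tuning constant)
    """
    total_visits = sum(n for n, _ in stats.values())

    def score(action):
        n, w = stats[action]
        # Exploitation: average reward observed so far for this action.
        exploit = w / n if n > 0 else 0.0
        # Exploration: larger for rarely visited actions, scaled by the
        # learned policy's recommendation for that action.
        explore = c_explore * prior[action] * math.sqrt(total_visits + 1) / (1 + n)
        return exploit + explore

    return max(stats, key=score)


# Usage: an unvisited action with equal prior is preferred over a
# well-visited action with a modest average reward.
stats = {"left": (10, 2.0), "right": (0, 0.0)}
prior = {"left": 0.5, "right": 0.5}
choice = select_action(stats, prior)
```

Setting `c_explore` to zero reduces the rule to pure exploitation of previously profitable actions, while a sharply peaked `prior` lets the learned recommendation dominate before any visits have been recorded.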


Mechanical and Aerospace Engineering, University of California, San Diego
9500 Gilman Dr, La Jolla, California, 92093-0411

Ph: 1-858-822-7930
Fax: 1-858-822-3107

cortes at
Skype id: jorgilliyo