Extended reality (XR) technologies will make it possible for humans to interact with AI agents that are virtually embodied in their environments. This could enable powerful new forms of AI-human collaboration, but it will require major advances both in the fundamental capabilities of agents and in their ability to interact smoothly with users. Breakthroughs in spatial awareness, gesture recognition, shared control, collaborative intelligence and visualisation techniques will be essential. What are the outstanding challenges in each of these areas, how will they be overcome, and what kinds of AI-human collaboration will this enable over the next 5, 10 and 25 years?