Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping

William Liang1,2, Sam Wang1, Hung-Ju Wang1,3, Osbert Bastani1, Yecheng Jason Ma1,3†, Dinesh Jayaraman1†

1University of Pennsylvania

2University of California, Berkeley

3Dyna Robotics

†Equal advising

Paper · Code · Thread · ICLR 2026

Abstract

TLDR. Tether performs autonomous real-world functional play involving structured, task-directed interactions. We introduce a policy that performs trajectory warping anchored by keypoint correspondences, which is extremely data efficient and robust to significant spatial and semantic environment variation. Running the policy within a VLM-guided multi-task loop, we generate a stream of play data that consistently improves downstream policy learning over time.

The ability to conduct and learn from interaction and experience is a central challenge in robotics, offering a scalable alternative to labor-intensive human demonstrations. However, realizing such "play" requires (1) a policy robust to diverse, potentially out-of-distribution environment states, and (2) a procedure that continuously produces useful robot experience. To address these challenges, we introduce Tether, a method for autonomous functional play involving structured, task-directed interactions. First, we design a novel open-loop policy that warps actions from a small set of source demonstrations (≤10) by anchoring them to semantic keypoint correspondences in the target scene. We show that this design is extremely data-efficient and robust even under significant spatial and semantic variations. Second, we deploy this policy for autonomous functional play in the real world via a continuous cycle of task selection, execution, evaluation, and improvement, guided by the visual understanding capabilities of vision-language models. This procedure generates diverse, high-quality datasets with minimal human intervention. In a household-like multi-object setup, our method is the first to perform many hours of autonomous multi-task play in the real world starting from only a handful of demonstrations. This produces a stream of data that consistently improves the performance of closed-loop imitation policies over time, ultimately yielding over 1000 expert-level trajectories and training policies competitive with those learned from human-collected demonstrations.

Method

Correspondence-Driven Trajectory Warping. Given a few demos, our policy computes keypoint correspondences and produces a warped trajectory action plan, which is executed open-loop.
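The warping step above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: it assumes the correspondences are matched 3D keypoints and uses a standard least-squares rigid alignment (Kabsch/Procrustes) to map demo keypoints onto the target scene, then applies the same transform to the demo waypoints. Function names and the choice of a single rigid transform are assumptions for illustration.

```python
import numpy as np

def estimate_rigid_transform(src_pts, tgt_pts):
    """Least-squares rigid transform (Kabsch/Procrustes) mapping
    src_pts -> tgt_pts, both (N, 3) arrays of matched keypoints."""
    src_c = src_pts.mean(axis=0)
    tgt_c = tgt_pts.mean(axis=0)
    # Cross-covariance of centered keypoint sets.
    H = (src_pts - src_c).T @ (tgt_pts - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t

def warp_trajectory(demo_traj, demo_kps, scene_kps):
    """Warp a (T, 3) demo waypoint trajectory into the target scene
    by anchoring it to the keypoint correspondences."""
    R, t = estimate_rigid_transform(demo_kps, scene_kps)
    return demo_traj @ R.T + t
```

The warped waypoint sequence would then be handed to the controller and executed open-loop, as described above.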
Autonomous Multi-Task Play with Vision-Language Models. Our play procedure continuously runs the Tether policy, cycling across different tasks and querying a VLM for plan generation and success detection.
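The play cycle can be summarized as a simple loop. The sketch below is a hypothetical skeleton, not the paper's code: `propose_task` and `check_success` stand in for the VLM queries (plan generation and success detection), and `execute_policy` stands in for the warping policy rollout; all names are illustrative.

```python
def play_loop(propose_task, execute_policy, check_success, observe,
              n_episodes):
    """Autonomous play cycle: task selection -> execution ->
    success evaluation -> data logging, repeated continuously."""
    dataset = []
    for _ in range(n_episodes):
        obs = observe()                       # current scene observation
        task = propose_task(obs)              # VLM selects a feasible task
        traj = execute_policy(task, obs)      # open-loop warped rollout
        if check_success(observe(), task):    # VLM judges the outcome
            dataset.append((task, traj))      # keep successful trajectories
    return dataset
```

Because failed episodes still perturb the scene, the loop doubles as environment randomization: each rollout leaves objects in new configurations for the next task.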

Robust Imitation

Tasks. Our Tether policy is robust to spatial and semantic environment variation, as evaluated on in-distribution (first row) and out-of-distribution (second row) objects, as well as challenging manipulation skills (third row).
Main Policy Comparison. Our Tether policy surpasses imitation learning baselines in the low data regime.

Autonomous Play

Timelapse. Tether performs over 24 hours of autonomous real-world play with minimal human intervention. We record a subsection of its run, played at 100x speed.
Autonomous Play Statistics. In 26 hours of play, Tether produces over 1000 trajectories across 6 tasks (left) and significantly expands data diversity, visualized as heatmaps of object poses at the beginning and end of play (right).
Generated Trajectories. Tether produces expert-level trajectories with randomization and resets induced by play.
Spontaneous Correction. While a flipped bowl is nearly impossible to fix with one arm and requires intervention, Tether corrects this case by chance, highlighting that at scale, coincidences may result in unexpected behaviors.
Downstream Policy Learning Results. Tether's stream of data consistently improves policy performance over time, achieving results competitive with policies trained on an equal number of human-collected demos (black).
Comparison on Play Distribution. Evaluated on the distribution of environment states encountered during play, the Tether policy is more robust and outperforms more data-hungry diffusion policies.

Citation

@misc{liang2026tether,
  title={Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping},
  author={William Liang and Sam Wang and Hung-Ju Wang and Osbert Bastani and Yecheng Jason Ma and Dinesh Jayaraman},
  year={2026},
  eprint={2603.03278},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.03278},
}