WIZARD re-frames robotic adaptation as a direct parameter inference problem. Instead of slow, gradient-based fine-tuning, it learns to map task evidence directly to the parameter delta (a LoRA adapter) needed to specialize a general policy. This is done in three stages:
- Perception: A multimodal encoder processes a language prompt and a single demonstration video, creating a compact "task embedding" that captures the task's semantics and kinematics.
- Reasoning: The core meta-network, the Adapter Generator, takes this task embedding and synthesizes the weights for a LoRA adapter in a single forward pass.
- Action: The generated adapter is injected into the frozen, pre-trained VLA backbone, and the resulting specialized policy is rolled out to execute the new task zero-shot, with no gradient updates.
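The three stages above can be sketched as a minimal PyTorch toy. Everything here is illustrative: the class names (`AdapterGenerator`, `LoRALinear`), the shapes, and the single-layer setup are assumptions, not WIZARD's actual architecture, and the task embedding is a random stand-in for the multimodal encoder's output.

```python
import torch
import torch.nn as nn


class AdapterGenerator(nn.Module):
    """Hypothetical hypernetwork: maps a task embedding to the two LoRA
    factors (A, B) for one target linear layer, in a single forward pass.
    A real system would share a trunk and emit adapters for many layers."""

    def __init__(self, embed_dim, in_features, out_features, rank=4):
        super().__init__()
        self.rank, self.in_features, self.out_features = rank, in_features, out_features
        self.to_A = nn.Linear(embed_dim, rank * in_features)
        self.to_B = nn.Linear(embed_dim, out_features * rank)

    def forward(self, task_emb):
        A = self.to_A(task_emb).view(self.rank, self.in_features)
        B = self.to_B(task_emb).view(self.out_features, self.rank)
        return A, B


class LoRALinear(nn.Module):
    """A frozen base linear layer plus an injectable low-rank delta:
    y = W x + B (A x), where A and B come from the generator, not SGD."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pre-trained backbone stays frozen
        self.A, self.B = None, None

    def inject(self, A, B):
        self.A, self.B = A, B

    def forward(self, x):
        y = self.base(x)
        if self.A is not None:
            y = y + x @ self.A.t() @ self.B.t()  # low-rank correction
        return y


# Perception (stubbed): a random vector stands in for the task embedding
# produced by the multimodal encoder from the prompt + demo video.
embed_dim, d_in, d_out = 32, 16, 8
task_emb = torch.randn(embed_dim)

# Reasoning: one forward pass of the generator yields the adapter weights.
gen = AdapterGenerator(embed_dim, d_in, d_out)
A, B = gen(task_emb)

# Action: inject the adapter into the frozen layer and roll out the policy.
layer = LoRALinear(nn.Linear(d_in, d_out))
layer.inject(A, B)
obs = torch.randn(d_in)
action = layer(obs)  # specialized policy output, no gradient steps taken
```

The key design point the sketch captures is that specialization happens purely through inference: the only trainable parameters live in the generator, which is meta-trained across tasks, while the backbone's weights are never touched at adaptation time.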