PoLAR: Factorizing Extent and Mode in Latent Actions

Abstract

Latent action pretraining learns representations of visual change from pairs of observations, but existing methods typically encode each transition as a single unstructured representation that entangles transition extent and transition mode. We introduce Polar Latent Actions with Radial structure (PoLAR), which imposes a radial-direction structure on latent actions, encouraging radius to encode transition extent and direction to retain transition mode.
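As a rough illustration of the radial-direction idea, the sketch below treats a latent action as a plain Euclidean vector z and writes it as z = r·u. Here the radius r = ‖z‖ and the unit direction u = z/‖z‖. The helper names `polar_decompose` and `polar_compose` are hypothetical. This Euclidean version is also a simplification, since PoLAR itself places latent actions in hyperbolic space.

```python
import math
from typing import List, Tuple

Vector = List[float]


def polar_decompose(z: Vector, eps: float = 1e-8) -> Tuple[float, Vector]:
    """Split a latent action z into (radius, unit direction).

    Hypothetical helper: the radius ||z|| is meant to carry transition extent,
    and the direction z / ||z|| is meant to carry transition mode.
    """
    r = math.sqrt(sum(x * x for x in z))
    # Guard against the zero vector, whose direction is undefined.
    u = [x / max(r, eps) for x in z]
    return r, u


def polar_compose(r: float, u: Vector) -> Vector:
    """Rebuild a latent action from a radius and a unit direction."""
    return [r * x for x in u]


# Two transitions with the same mode (direction) but different extents (radii).
small = polar_compose(0.5, [0.6, 0.8])
large = polar_compose(2.0, [0.6, 0.8])
print(polar_decompose(small))  # radius ~0.5, direction ~[0.6, 0.8]
print(polar_decompose(large))  # radius ~2.0, same direction
```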

PoLAR uses the temporal offset between two observations as a weak proxy for transition extent. It encourages latent actions computed from observation pairs separated by larger temporal gaps to occupy larger radii. We instantiate this structure in hyperbolic space, whose volume grows exponentially with radius. This makes it a natural fit for the more diverse transition modes that arise at larger extents. Across in-task and large-scale pretraining settings, PoLAR improves downstream policy performance in simulation and real-world robot experiments, outperforming latent action baselines and strong pretrained VLAs.
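One way to read "larger temporal gaps occupy larger radii" is as a pairwise ranking constraint on radius. The sketch below makes two assumptions. First, it uses the Poincaré-ball model with curvature -1, where a point x lies at hyperbolic distance 2·artanh(‖x‖) from the origin. Second, it uses a margin hinge loss as a stand-in for PoLAR's actual radial objective, which this page does not specify. The margin value and function names are hypothetical. The block also compares the circumference of a circle of radius ρ: 2π·sinh(ρ) in the hyperbolic plane versus 2π·ρ in the Euclidean plane. This illustrates the expanding volume mentioned above.

```python
import math
from itertools import combinations
from typing import List, Sequence


def hyperbolic_radius(x: Sequence[float]) -> float:
    """Distance from the origin in the Poincare ball (curvature -1): 2 * artanh(||x||)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    return 2.0 * math.atanh(norm)


def radial_ranking_loss(latents: List[Sequence[float]], offsets: List[int], margin: float = 0.1) -> float:
    """Hypothetical hinge loss: a pair with a larger temporal offset should sit at a larger radius.

    For every pair (i, j) with offsets[i] < offsets[j], penalize
    max(0, margin - (radius_j - radius_i)). Pairs with equal offsets are skipped.
    """
    radii = [hyperbolic_radius(z) for z in latents]
    total, count = 0.0, 0
    for i, j in combinations(range(len(latents)), 2):
        if offsets[i] == offsets[j]:
            continue
        lo, hi = (i, j) if offsets[i] < offsets[j] else (j, i)
        total += max(0.0, margin - (radii[hi] - radii[lo]))
        count += 1
    return total / count if count else 0.0


# Circle circumference at radius rho: hyperbolic grows like e^rho, Euclidean only linearly.
for rho in (1.0, 3.0, 5.0):
    print(rho, 2 * math.pi * math.sinh(rho), 2 * math.pi * rho)

# Latents whose radius grows with offset incur zero loss; reversing the offsets is penalized.
ordered = [[0.1, 0.0], [0.5, 0.0], [0.9, 0.0]]
print(radial_ranking_loss(ordered, offsets=[1, 4, 8]))        # 0.0
print(radial_ranking_loss(ordered, offsets=[8, 4, 1]) > 0.0)  # True
```

A hinge on radius differences only constrains the ordering of radii. It leaves directions free, which is the point of separating extent from mode.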

Overview

PoLAR factorizes transition extent and mode in latent actions. — **Figure 1.** PoLAR uses temporal offset to order transition extent along radius, allowing similar transition modes to remain in similar directions. Sweeping the radius token with fixed direction increases decoded transition extent.
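Figure 1 describes latent actions as a radius token plus direction tokens. The toy tokenizer below is a hypothetical stand-in, not PoLAR's tokenizer. It assumes the radius is quantized into uniform bins to form one radius token. It also assumes the direction is snapped to the nearest entry of a small fixed codebook of unit vectors to form one direction token. The bin count, `R_MAX`, and the codebook are made-up settings. Sweeping the radius token while holding the direction token fixed yields latents whose radius grows while their direction stays the same. PoLAR's decoder, which turns such latents into visual transitions, is not modeled here.

```python
import math
from typing import List, Tuple

Vector = List[float]

# Hypothetical settings: 8 radius bins on [0, R_MAX) and a 4-entry direction codebook in 2D.
R_MAX = 4.0
NUM_RADIUS_BINS = 8
DIRECTION_CODEBOOK: List[Vector] = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]


def tokenize(z: Vector) -> Tuple[int, int]:
    """Map a latent to (radius_token, direction_token)."""
    r = math.sqrt(sum(v * v for v in z))
    # Uniform radius bins; radii beyond R_MAX fall into the last bin.
    radius_token = min(int(r / R_MAX * NUM_RADIUS_BINS), NUM_RADIUS_BINS - 1)
    # Nearest codebook direction by cosine similarity (dot product of unit vectors).
    u = [v / r for v in z] if r > 0 else [0.0] * len(z)
    direction_token = max(range(len(DIRECTION_CODEBOOK)),
                          key=lambda k: sum(a * b for a, b in zip(u, DIRECTION_CODEBOOK[k])))
    return radius_token, direction_token


def detokenize(radius_token: int, direction_token: int) -> Vector:
    """Rebuild a latent from the bin-center radius and the codebook direction."""
    r = (radius_token + 0.5) * R_MAX / NUM_RADIUS_BINS
    return [r * v for v in DIRECTION_CODEBOOK[direction_token]]


# Radius sweep: fix the direction token and step the radius token upward.
_, direction = tokenize([0.3, 1.2])  # closest to [0, 1], so direction == 1
for radius_token in range(NUM_RADIUS_BINS):
    print(radius_token, detokenize(radius_token, direction))
```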

Evaluation Tasks

Results

Real Robot Rollouts

Rollouts shown at 2× playback.

Pick & Place Banana

PoLAR: Success
UniVLA: Failed
Villa-X: Failed

Cup Stack

PoLAR: Success
π0.5: Failed
UniVLA: Failed

Open Pot & Banana

PoLAR: Success
UniVLA: Failed
π0.5: Failed

Model Zoo

Public PoLAR checkpoints are available on Hugging Face.

Tokenizer

PoLAR Tokenizer

Radial-direction latent action tokenizer pretrained on BridgeData V2.


VLA

PoLAR VLA

Latent VLA trained with PoLAR action tokens on BridgeData V2.


Analysis

Temporal offset and radial supervision diagnostics. — **Figure 5.** Temporal offset is an effective proxy for transition extent, and PoLAR radii increase with temporal offset while flat baselines remain nearly constant.
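A simple way to run a diagnostic like the one in Figure 5 is a rank correlation between temporal offset and latent radius. A radially structured encoder should give a clearly positive value. Radii that stay nearly constant, with an order unrelated to offset, give a value near zero. The sketch below computes Spearman's ρ in pure Python by assigning average ranks for ties and then taking the Pearson correlation of the ranks. The sample numbers are made up for illustration and are not results from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are tied
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of average ranks (0.0 if either side is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


# Illustrative, made-up values: temporal offsets and the radius of each latent action.
offsets = [1, 2, 4, 8, 1, 2, 4, 8]
structured_radii = [0.4, 0.9, 1.7, 2.6, 0.5, 1.0, 1.5, 2.8]
flat_radii = [0.97, 0.99, 1.00, 0.98, 1.04, 1.02, 1.01, 1.03]
print(spearman(offsets, structured_radii))  # ~0.98
print(spearman(offsets, flat_radii))        # 0.0
```

Rank correlation is used here instead of raw Pearson correlation because it only asks whether radius increases monotonically with offset. It does not ask whether the increase is linear.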

Radius controls transition extent. — **Figure 6.** With direction tokens fixed, increasing the radial token produces progressively larger visual transitions.

Additional radius sweep examples. — **Appendix Figure.** Additional radius-sweep examples show the same behavior across more transitions.

Citation

@misc{jeong2026polar,
  title         = {PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning},
  author        = {Youngjoon Jeong and Jihwan Yu and Minsoo Jo and Junha Chun and Taesup Kim},
  year          = {2026},
  eprint        = {2606.21139},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2606.21139}
}
