DexEXO: A Wearability-First Dexterous Exoskeleton for Operator-Agnostic Demonstration and Learning

Under Review

TL;DR: DexEXO enables scalable dexterous demonstrations through a user-agnostic design and exact visual-kinematic embodiment.

Abstract

Scaling dexterous robot learning is constrained by the difficulty of collecting high-quality demonstrations across diverse operators. Existing wearable interfaces often trade comfort and cross-user adaptability for kinematic fidelity, while embodiment mismatch between demonstration and deployment requires visual post-processing before policy training. We present DexEXO, a wearability-first hand exoskeleton that aligns visual appearance, contact geometry, and kinematics at the hardware level. DexEXO features a pose-tolerant thumb mechanism and a slider-based finger interface analytically modeled to support hand lengths from 140 mm to 217 mm, reducing operator-specific fitting and enabling scalable cross-operator data collection. A passive hand visually matches the deployed robot, allowing direct policy training from raw wrist-mounted RGB observations. User studies demonstrate improved comfort and usability compared to prior wearable systems. Using visually aligned observations alone, we train diffusion policies that achieve competitive performance while substantially simplifying the end-to-end pipeline. These results show that prioritizing wearability and hardware-level embodiment alignment reduces both human and algorithmic bottlenecks without sacrificing task performance.

Hardware Demonstrations

Policy Rollouts

Framework

Hardware Design

DexEXO hardware design framework
Mechanical overview of DexEXO. DexEXO integrates a linkage-driven wearable exoskeleton, a passive data-capture hand, and an onboard sensing/power module. Insets highlight key subsystems: (a) passive finger slider for cross-user fit, (b) pose-tolerant thumb coupling interface, (c) parallel four-bar finger linkage for motion transmission, and (d) passive hand thumb that reproduces the intended thumb DOF.
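Inset (c) refers to a parallel four-bar linkage for motion transmission. As a rough illustration of why such a linkage can carry a finger's rotation to a remote link, the sketch below assumes an ideal planar parallelogram four-bar (opposite links of equal length). This is an assumption and may not match DexEXO's actual geometry; the function names and link lengths are hypothetical placeholders, not values from the design.

```python
import math


def parallelogram_four_bar(theta_rad, crank_len=30.0, ground_len=40.0):
    """Joint positions (mm) of an ideal planar parallelogram four-bar.

    Ground pivots sit at A = (0, 0) and D = (ground_len, 0). The input
    crank AB and output rocker DC share the same length, and the coupler
    BC has the same length as the ground link AD, so DC stays parallel
    to AB. All dimensions are hypothetical placeholders.
    """
    a = (0.0, 0.0)
    d = (ground_len, 0.0)
    # Input crank rotated by theta about A.
    b = (a[0] + crank_len * math.cos(theta_rad),
         a[1] + crank_len * math.sin(theta_rad))
    # In a parallelogram the rocker mirrors the crank, offset by the ground link.
    c = (d[0] + crank_len * math.cos(theta_rad),
         d[1] + crank_len * math.sin(theta_rad))
    return a, b, c, d


def link_angle(p, q):
    """Angle (rad) of the link from point p to point q."""
    return math.atan2(q[1] - p[1], q[0] - p[0])


if __name__ == "__main__":
    a, b, c, d = parallelogram_four_bar(math.radians(35.0))
    print("input crank angle (deg): ", math.degrees(link_angle(a, b)))
    print("output rocker angle (deg):", math.degrees(link_angle(d, c)))
    print("coupler length (mm):      ", math.dist(b, c))
```

Under this assumption the output rocker reproduces the input angle one-to-one while the coupler stays parallel to the ground link, which is the property that makes a parallelogram linkage a natural candidate for transmitting joint motion without gearing.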

Data Collection to Policy Deployment Pipeline

Data collection to policy deployment learning framework
An overview of the demonstration data modalities, policy training, and inference with visually aligned observations.

The passive hand produces visually aligned observations, so policies can be trained directly on raw images without any post-processing. The left video is a demonstration collected from the wrist camera; the right video shows the wrist camera view during a policy rollout (8x speed).

Results

User Study: Subjective Feedback Results

Subjective feedback results
Finger independence is not applicable to teleoperation, as the user's natural hand motion is unconstrained.
A user study with 14 university participants (7 female, 7 male; ages 18-27), with hand lengths ranging from 165 mm to 195 mm, evaluated DexEXO against DexUMI and vision-based teleoperation. DexEXO was rated significantly higher for comfort and finger independence and lower for frustration. These ratings support its wearability-first design, which accommodates a wide range of hand sizes without rigid alignment or per-user calibration.

User Study: Quantitative Performance Results

Quantitative user study metrics
For the scissors-cutting, page-flipping, cup-stacking, and piano-playing tasks, completion time was defined as the average time from picking up the scissors to finishing the cut, the time for 5 page flips, the average time for a successful 3-cup stack, and the time to play 16 piano notes, respectively.
* DexUMI failed the scissors-cutting task because its added exoskeleton structure prevented the fingers from fitting into the scissor handles, although an actual XHand would have fit. Vision-based teleoperation lacked the precision and responsiveness needed to manipulate the scissors effectively.
DexEXO achieved the highest success rates on scissors cutting, piano playing, page flipping, and cup stacking, demonstrating robust cross-user operation.
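The success-rate and completion-time metrics described above can be aggregated with a few lines of code. The sketch below is a minimal illustration, not the authors' analysis script: the `Trial` record and `summarize` helper are hypothetical names, the example values are placeholders rather than study data, and averaging completion time over successful trials only is an assumption drawn from the "successful 3-cup stack" wording.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Trial:
    """One attempt at a task (hypothetical record layout)."""
    success: bool
    seconds: Optional[float]  # completion time; None if not recorded


def summarize(trials: List[Trial]) -> dict:
    """Return success rate and mean completion time over successful trials."""
    if not trials:
        raise ValueError("at least one trial is required")
    # Only successful trials with a recorded time contribute to the mean.
    times = [t.seconds for t in trials if t.success and t.seconds is not None]
    return {
        "success_rate": sum(t.success for t in trials) / len(trials),
        "mean_time_s": mean(times) if times else None,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; not data from the study.
    cup_stacking = [
        Trial(True, 10.0),
        Trial(False, None),
        Trial(True, 14.0),
        Trial(True, 12.0),
    ]
    print(summarize(cup_stacking))
```

Returning `None` for the mean time when no trial succeeds keeps a failed condition, such as the scissors-cutting case noted above, distinct from a condition with a measured time.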

Policy Evaluation Comparisons

Policy evaluation chart
Policy evaluation results for the block pick-and-place, carton opening, and bottle grasping tasks. In an ablation study, adding absolute finger values as a policy condition produced no significant difference in performance.

Full Video

Citation

BibTeX

Under review