# SmolVLA Base (ONNX Export)
This repository contains an ONNX export of the SmolVLA base policy model from the LeRobot ecosystem.
SmolVLA is Hugging Face’s lightweight vision‑language‑action model for robotics. The original model has roughly 450 million parameters and is designed to be fine‑tuned on robot datasets collected with LeRobot.
The ONNX export in this repo preserves the weights and behavior of the PyTorch model `lerobot/smolvla_base`, but packages the policy as several smaller ONNX graphs to enable hardware‑agnostic inference via ONNX Runtime.
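As a minimal sketch of getting started, the graphs can be fetched from this repo with `huggingface_hub` and loaded with ONNX Runtime. The repo id is taken from this model card; the execution provider and file choice here are just illustrative:

```python
# Minimal sketch: download one exported graph and open an ONNX Runtime session.
# Assumes the huggingface_hub and onnxruntime packages are installed.
from huggingface_hub import hf_hub_download
import onnxruntime as ort

# Fetch the vision encoder graph from this repository.
vision_path = hf_hub_download(
    repo_id="ainekko/smolvla_base_onnx",
    filename="smolvlm_vision.onnx",
)

# Any ONNX Runtime execution provider works; CPU is the portable default.
session = ort.InferenceSession(vision_path, providers=["CPUExecutionProvider"])
print([inp.name for inp in session.get_inputs()])
```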
## Contents
The export splits the SmolVLA architecture into multiple components. Each .onnx file corresponds to a specific part of the model:
| File | Role in the SmolVLA architecture |
|---|---|
| `smolvlm_vision.onnx` | Vision encoder; processes RGB camera frames and produces visual embeddings. |
| `smolvlm_text.onnx` | Text encoder; converts tokenized instructions into language embeddings. |
| `smolvlm_expert_prefill.onnx` | “Prefill” stage of the action expert; conditions on vision and language context. |
| `smolvlm_expert_decode.onnx` | “Decode” stage of the action expert; autoregressively generates action tokens. |
| `state_projector.onnx` | Projects the robot’s sensorimotor state into the model’s latent space. |
| `time_in_projector.onnx` | Projects the current timestep into the latent space. |
| `time_out_projector.onnx` | Projects internal time features back into the expert. |
| `action_in_projector.onnx` | Projects previous action chunks into the latent space (for chunked generation). |
| `action_out_projector.onnx` | Projects the model’s output back into continuous control actions. |
All graphs are exported at ONNX opset 17 and use static input shapes.
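Because the shapes are static, the exact tensor names and shapes each component expects can be read straight out of the graphs. The sketch below iterates over the files listed in the table above (it assumes they have already been downloaded to the working directory) and prints each graph’s input/output signature:

```python
# Inspect the static input/output signatures of every exported graph.
# File names come from the table above; paths assume local copies.
import onnxruntime as ort

GRAPHS = [
    "smolvlm_vision.onnx",
    "smolvlm_text.onnx",
    "smolvlm_expert_prefill.onnx",
    "smolvlm_expert_decode.onnx",
    "state_projector.onnx",
    "time_in_projector.onnx",
    "time_out_projector.onnx",
    "action_in_projector.onnx",
    "action_out_projector.onnx",
]

for path in GRAPHS:
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    print(path)
    for t in sess.get_inputs():
        print(f"  in : {t.name} {t.shape} {t.type}")
    for t in sess.get_outputs():
        print(f"  out: {t.name} {t.shape} {t.type}")
```

Matching these signatures up is how the components are wired together at inference time (vision and text embeddings feed the prefill stage, the projectors map robot state, time, and actions in and out of the expert’s latent space).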