---
license: mit
tags:
- self-supervised-learning
- world-models
- equivariance
- vision
- pytorch
datasets:
- 3DIEBench
- STL10
---

# seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models

<p align="center">
  <a href="https://openreview.net/forum?id=GKt3VRaCU1"><img src="https://img.shields.io/badge/NeurIPS%202025-Paper-blue" alt="Paper"></a>
  <a href="https://hafezgh.github.io/seq-jepa/"><img src="https://img.shields.io/badge/Project-Page-green" alt="Project Page"></a>
  <a href="https://github.com/hafezgh/seq-jepa"><img src="https://img.shields.io/badge/GitHub-Code-black" alt="Code"></a>
</p>

## Model Description

seq-JEPA is a joint-embedding predictive architecture that processes views sequentially, conditioning each prediction on the action relating consecutive views. This sequential, action-conditioned design naturally segregates the learned representations into ones suited for equivariance-demanding tasks and ones suited for invariance-demanding tasks.

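To make the mechanism above concrete, here is a minimal, self-contained sketch of the idea in PyTorch. All module names and dimensions here are illustrative stand-ins, not the repository's actual API: a toy encoder replaces the real backbone, and a single transformer layer plays the role of the sequence aggregator.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the seq-JEPA idea (hypothetical names, NOT the repo's API):
# encode each view, condition it on the action leading to the next view,
# aggregate the sequence causally, and predict the next view's embedding.
embed_dim, act_dim, seq_len, batch = 64, 4, 3, 2

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, embed_dim))  # stand-in for a ResNet
act_embed = nn.Linear(act_dim, embed_dim)                                 # learned action embedding
aggregator = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
predictor = nn.Linear(embed_dim, embed_dim)                               # predicts the next embedding

views = torch.randn(batch, seq_len, 3, 16, 16)   # a short sequence of views
actions = torch.randn(batch, seq_len, act_dim)   # actions relating consecutive views

z = encoder(views.reshape(-1, 3, 16, 16)).reshape(batch, seq_len, embed_dim)
tokens = z + act_embed(actions)                   # action-conditioned view tokens
mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
agg = aggregator(tokens, src_mask=mask)           # causal aggregation over the sequence
pred = predictor(agg[:, -1])                      # predicted embedding of the next view
print(pred.shape)                                 # (batch, embed_dim)
```

In the actual model, the prediction target is the next view's embedding from an EMA target encoder (hence the `ema` arguments in the checkpoints below), so the loss is computed entirely in latent space.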
## Available Checkpoints

| Checkpoint | Dataset | Training | Download |
|------------|---------|----------|----------|
| `3diebench_rot_seqlen3.pth` | 3DIEBench | seq-len = 3, rotation conditioning | [Download](https://huggingface.co/Hafez/seq-JEPA/resolve/main/3diebench_rot_seqlen3.pth) |
| `stl10_pls.pth` | STL10 | PLS (predictive learning across saccades) | [Download](https://huggingface.co/Hafez/seq-JEPA/resolve/main/stl10_pls.pth) |

## Usage

First, clone the repository to access the model definitions:

```bash
git clone https://github.com/hafezgh/seq-jepa.git
cd seq-jepa
```

Then load the checkpoints:

```python
import torch
from models import SeqJEPA_Transforms, SeqJEPA_PLS

# 3DIEBench checkpoint (seq-len = 3, rotation conditioning)
kwargs = {
    "num_heads": 4, "n_channels": 3, "num_enc_layers": 3,
    "num_classes": 55, "act_cond": True, "pred_hidden": 1024,
    "act_projdim": 128, "act_latentdim": 4, "cifar_resnet": False,
    "learn_act_emb": True
}
model = SeqJEPA_Transforms(img_size=128, ema=True, ema_decay=0.996, **kwargs)
ckpt = torch.load('3diebench_rot_seqlen3.pth', map_location='cpu')
model.load_state_dict(ckpt['model_state_dict'])

# STL10 PLS checkpoint (predictive learning across saccades)
kwargs = {
    "num_heads": 4, "n_channels": 3, "num_enc_layers": 3,
    "num_classes": 10, "act_cond": True, "pred_hidden": 1024,
    "act_projdim": 128, "act_latentdim": 2, "cifar_resnet": True,
    "learn_act_emb": True, "pos_dim": 2
}
model = SeqJEPA_PLS(fovea_size=32, img_size=96, ema=True, ema_decay=0.996, **kwargs)
ckpt = torch.load('stl10_pls.pth', map_location='cpu')
model.load_state_dict(ckpt['model_state_dict'])
```

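The checkpoints store a dictionary whose `'model_state_dict'` entry holds the weights, which is the standard PyTorch convention. The load pattern itself is plain PyTorch; here is a generic, self-contained round-trip with a dummy module and a made-up filename, just to illustrate the mechanics:

```python
import torch
import torch.nn as nn

# Generic PyTorch checkpoint round trip -- the same pattern the snippets above rely on.
model = nn.Linear(8, 2)
torch.save({"model_state_dict": model.state_dict()}, "demo_ckpt.pth")  # hypothetical file

restored = nn.Linear(8, 2)
ckpt = torch.load("demo_ckpt.pth", map_location="cpu")  # map_location avoids needing a GPU
restored.load_state_dict(ckpt["model_state_dict"])
restored.eval()  # disable dropout / BN updates before inference

x = torch.randn(1, 8)
assert torch.allclose(model(x), restored(x))  # identical weights => identical outputs
```

Passing `map_location='cpu'` is worthwhile whenever a checkpoint may have been saved from a GPU run, and calling `.eval()` before inference keeps normalization and dropout layers in deterministic mode.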
## Citation

```bibtex
@inproceedings{
  ghaemi2025seqjepa,
  title={seq-{JEPA}: Autoregressive Predictive Learning of Invariant-Equivariant World Models},
  author={Hafez Ghaemi and Eilif Benjamin Muller and Shahab Bakhtiari},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=GKt3VRaCU1}
}
```