---
license: apache-2.0
tags:
- vision
- ocr
- compression
- autoencoding
---
# Bad Autoencoding - Model Checkpoints

Checkpoints for the paper *Optical Context Compression Is Just (Bad) Autoencoding* by Ivan Lee, Cheng Yang, and Taylor Berg-Kirkpatrick.
## Links

- Paper: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)

## Available Checkpoints

| Checkpoint | Objective | Hybrid Tokens | Training | PPL |
|---|---|---|---|---|
| `vision_base_hybrid0_reconstruction` | Reconstruction | 0 | Direct | 1.03 |
| `vision_base_hybrid0_lm_direct` | Language Modeling | 0 | Direct (no recon init) | 5.08 |
| `vision_base_hybrid0_lm_recon_init` | Language Modeling | 0 | Initialized from reconstruction | 5.06 |
## Naming Convention

`{regime}_{size}_{hybrid}_[reconstruction|lm]_[direct|recon_init]`

- regime: vision, conv1d_residual, meanpool, text
- size: tiny, small, base, large (for vision); compression target (for others)
- hybrid: hybrid0 (pure vision/compression) or hybrid100 (100 text tokens + vision)
- objective: reconstruction or lm (language modeling)
- training: direct (trained from scratch) or recon_init (initialized from reconstruction checkpoint)
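
For scripting over many checkpoints, a small helper that assembles names from these components can be handy. This is an illustrative sketch only; `build_name` and its arguments are not part of the released code.

```python
from typing import Optional


def build_name(regime: str, size: str, hybrid: int, objective: str,
               training: Optional[str] = None) -> str:
    """Assemble a checkpoint name following the convention above.

    The training suffix is omitted for reconstruction checkpoints,
    matching names like vision_base_hybrid0_reconstruction.
    """
    parts = [regime, size, f"hybrid{hybrid}", objective]
    if training is not None:
        parts.append(training)
    return "_".join(parts)


# Names matching the checkpoints listed in the table above:
print(build_name("vision", "base", 0, "reconstruction"))    # vision_base_hybrid0_reconstruction
print(build_name("vision", "base", 0, "lm", "direct"))      # vision_base_hybrid0_lm_direct
print(build_name("vision", "base", 0, "lm", "recon_init"))  # vision_base_hybrid0_lm_recon_init
```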
## Model Details

- Architecture: DeepSeek-OCR with trainable vision encoder
- Image Size: 768x768 (base)
- Encoder Status: Trained (not frozen)
- Dataset: 510k samples from FineWiki
## Usage

```python
from huggingface_hub import hf_hub_download

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_hybrid0_lm_direct/model.pt",
    repo_type="model",
)
```
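
Once downloaded, the file can be loaded as an ordinary PyTorch checkpoint. The sketch below assumes `model.pt` is a plain `torch.save()` artifact (e.g. a state dict); instantiating the DeepSeek-OCR model around it depends on the paper's training code and is not shown here.

```python
import torch

# Load the raw checkpoint on CPU so no GPU is needed just to inspect it.
# Assumption: model.pt is a standard torch.save() file (e.g. a state dict).
state = torch.load(checkpoint_path, map_location="cpu")

# Peek at the contents before wiring the weights into a model.
if isinstance(state, dict):
    print(list(state.keys())[:10])
```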
## Citation

```bibtex
@article{lee2025optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}
```