---
license: apache-2.0
tags:
- vision
- ocr
- compression
- autoencoding
---
# Bad Autoencoding - Model Checkpoints
Checkpoints for the paper **"Optical Context Compression Is Just (Bad) Autoencoding"** by Ivan Lee, Cheng Yang, and Taylor Berg-Kirkpatrick.
## Links
- **Paper**: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
- **Code**: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)
## Available Checkpoints
Naming convention: `{regime}_{config}_h{N}_{objective}[_recon-init]`
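
The naming convention above can be split into its fields mechanically. The following is a minimal sketch, not part of the released code: the `CheckpointName` tuple and the `parse_checkpoint_name` helper are hypothetical names introduced here, and the regular expression assumes that no individual field contains an underscore, which holds for the checkpoints listed in this README.

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Fields of a checkpoint name (hypothetical helper type)."""
    regime: str       # e.g. "vision", "text", "meanpool", "conv1d"
    config: str       # e.g. "base", "tiny", "ctx277", "w4s4", "t250"
    h: int            # the N in h{N}
    objective: str    # e.g. "recon" or "lm"
    recon_init: bool  # True if the name ends with "_recon-init"


# Assumption: fields never contain "_" themselves.
_NAME_PATTERN = re.compile(
    r"^(?P<regime>[^_]+)_(?P<config>[^_]+)_h(?P<h>\d+)"
    r"_(?P<objective>[^_]+)(?P<init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a name like 'conv1d_t250_h0_lm_recon-init' into its fields."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized checkpoint name: {name!r}")
    return CheckpointName(
        regime=match["regime"],
        config=match["config"],
        h=int(match["h"]),
        objective=match["objective"],
        recon_init=match["init"] is not None,
    )


if __name__ == "__main__":
    print(parse_checkpoint_name("conv1d_t250_h0_lm_recon-init"))
```

Parsing names this way can be handy for filtering checkpoints by regime or objective before downloading them.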
### Reconstruction
| Checkpoint | Regime | Compression ratio (CR) | Perplexity (PPL) |
|------------|--------|------------------------|------------------|
| `vision_base_h0_recon` | Vision base | 3.60 | 1.03 |
| `meanpool_w4s4_h0_recon` | Meanpool w4s4 | 3.97 | 1.04 |
| `conv1d_t250_h0_recon` | Conv1D t250 | 3.97 | 1.00 |
| `vision_tiny_h0_recon` | Vision tiny | 12.82 | 1.14 |
| `conv1d_t63_h0_recon` | Conv1D t63 | 15.38 | 1.01 |
### Language Modeling
| Checkpoint | Regime | Compression ratio (CR) | Init | Perplexity (PPL) |
|------------|--------|------------------------|------|------------------|
| `vision_base_h0_lm` | Vision base | 3.60 | Direct | 5.08 |
| `vision_base_h0_lm_recon-init` | Vision base | 3.60 | From recon | 5.06 |
| `text_ctx277_h0_lm` | Text ctx277 (Truncation) | 3.60 | Direct | 5.02 |
| `meanpool_w4s4_h0_lm_recon-init` | Meanpool w4s4 | 3.97 | From recon | 5.02 |
| `conv1d_t250_h0_lm_recon-init` | Conv1D t250 | 3.97 | From recon | 4.96 |
## Model Details
- **Architecture**: DeepSeek-OCR with vision encoder
- **Vision checkpoints**: Trained encoder (base=768x768, tiny=384x384)
- **Text checkpoints**: Truncation baseline (no vision encoder), context=277 tokens
- **Meanpool checkpoints**: Frozen encoder, window=4, stride=4
- **Conv1D checkpoints**: Trained hierarchical encoder (t250=CR 3.97, t63=CR 15.38)
- **Dataset**: 510k samples from FineWiki
## Usage
```python
from huggingface_hub import hf_hub_download

# Download a specific checkpoint; the return value is the local path of the cached file
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_h0_lm/model.pt",
    repo_type="model",
)
```
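
To fetch other checkpoints by name, the download call can be wrapped in a small helper. This is a sketch under one assumption: that every checkpoint directory contains a `model.pt` file, as `vision_base_h0_lm` does in the example above. The `checkpoint_filename` and `download_checkpoint` functions are hypothetical names introduced here, not part of the repository.

```python
def checkpoint_filename(checkpoint: str, weights_file: str = "model.pt") -> str:
    """Build the in-repo path for a checkpoint.

    Assumption: each checkpoint lives at '<checkpoint>/model.pt', mirroring
    the single example given in this README.
    """
    return f"{checkpoint}/{weights_file}"


def download_checkpoint(checkpoint: str, repo_id: str = "ivnle/bad-autoencoding") -> str:
    """Download one checkpoint and return the local path of the cached file."""
    # Imported lazily so the path helper above works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id,
        filename=checkpoint_filename(checkpoint),
        repo_type="model",
    )


if __name__ == "__main__":
    # Requires network access and the huggingface_hub package.
    print(download_checkpoint("conv1d_t250_h0_lm_recon-init"))
```

This README does not describe the contents of the downloaded file; the code repository linked above is the place to look for loading instructions.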
## Citation
```bibtex
@article{lee2025optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}
```