---
license: apache-2.0
tags:
- vision
- ocr
- compression
- autoencoding
---

# Bad Autoencoding - Model Checkpoints

Checkpoints for the paper: **"Optical Context Compression Is Just (Bad) Autoencoding"**

Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

## Links

- **Paper**: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
- **Code**: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)

## Available Checkpoints

Naming convention: `{regime}_{config}_h{N}_{objective}[_recon-init]`

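The naming convention can be unpacked programmatically. The following is a minimal sketch, not part of the released code: `CheckpointName` and `parse_checkpoint_name` are hypothetical names, and the regular expression is an assumption inferred only from the checkpoint names listed in this card.

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Fields of the {regime}_{config}_h{N}_{objective}[_recon-init] pattern."""
    regime: str
    config: str
    h: int
    objective: str
    recon_init: bool


# Assumption: regime and config contain only lowercase letters and digits,
# and the objective contains only lowercase letters (e.g. "recon", "lm").
_NAME_PATTERN = re.compile(
    r"^(?P<regime>[a-z0-9]+)_(?P<config>[a-z0-9]+)_h(?P<h>\d+)"
    r"_(?P<objective>[a-z]+)(?P<init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a checkpoint name into its components, or raise ValueError."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint name: {name!r}")
    return CheckpointName(
        regime=match["regime"],
        config=match["config"],
        h=int(match["h"]),
        objective=match["objective"],
        recon_init=match["init"] is not None,
    )
```

For example, `parse_checkpoint_name("vision_base_h0_lm_recon-init")` yields regime `vision`, config `base`, objective `lm`, and `recon_init=True`.
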
### Reconstruction

| Checkpoint | Regime | Compression ratio (CR) | Perplexity (PPL) |
|------------|--------|------------------------|------------------|
| `vision_base_h0_recon` | Vision base | 3.60 | 1.03 |
| `meanpool_w4s4_h0_recon` | Meanpool w4s4 | 3.97 | 1.04 |
| `conv1d_t250_h0_recon` | Conv1D t250 | 3.97 | 1.00 |
| `vision_tiny_h0_recon` | Vision tiny | 12.82 | 1.14 |
| `conv1d_t63_h0_recon` | Conv1D t63 | 15.38 | 1.01 |

### Language Modeling

| Checkpoint | Regime | Compression ratio (CR) | Init | Perplexity (PPL) |
|------------|--------|------------------------|------|------------------|
| `vision_base_h0_lm` | Vision base | 3.60 | Direct | 5.08 |
| `vision_base_h0_lm_recon-init` | Vision base | 3.60 | From recon | 5.06 |
| `text_ctx277_h0_lm` | Text ctx277 (Truncation) | 3.60 | Direct | 5.02 |
| `meanpool_w4s4_h0_lm_recon-init` | Meanpool w4s4 | 3.97 | From recon | 5.02 |
| `conv1d_t250_h0_lm_recon-init` | Conv1D t250 | 3.97 | From recon | 4.96 |

## Model Details

- **Architecture**: DeepSeek-OCR with vision encoder
- **Vision checkpoints**: Trained encoder (base=768x768, tiny=384x384)
- **Text checkpoints**: Truncation baseline (no vision encoder), context=277 tokens
- **Meanpool checkpoints**: Frozen encoder, window=4, stride=4
- **Conv1D checkpoints**: Trained hierarchical encoder (t250=CR 3.97, t63=CR 15.38)
- **Dataset**: 510k samples from FineWiki

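To make the meanpool setting concrete, here is a generic sketch of windowed mean pooling over a sequence of embedding vectors with `window=4` and `stride=4`. It illustrates the general operation only and is not the repository's implementation: `mean_pool` is a hypothetical helper, and averaging a shorter trailing window (rather than dropping or padding it) is an assumption.

```python
from typing import List, Sequence


def mean_pool(
    embeddings: Sequence[Sequence[float]],
    window: int = 4,
    stride: int = 4,
) -> List[List[float]]:
    """Average consecutive embedding vectors in windows of `window`,
    advancing by `stride` positions each step."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    pooled: List[List[float]] = []
    for start in range(0, len(embeddings), stride):
        # Assumption: a trailing window with fewer than `window` vectors
        # is averaged over the vectors that are present.
        chunk = embeddings[start:start + window]
        dim = len(chunk[0])
        pooled.append(
            [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        )
    return pooled
```

With non-overlapping windows of four, a sequence of 8 vectors is reduced to 2, so the sequence length shrinks by roughly a factor of four.
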
## Usage

```python
from huggingface_hub import hf_hub_download

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_h0_lm/model.pt",
    repo_type="model",
)
```

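To fetch a different checkpoint, the filename can be derived from the checkpoint name. This sketch assumes every checkpoint is stored as `{name}/model.pt`, matching the single example above; that layout is not confirmed for the other checkpoints, and `checkpoint_filename` and `download_checkpoint` are hypothetical helpers.

```python
REPO_ID = "ivnle/bad-autoencoding"


def checkpoint_filename(name: str) -> str:
    """Build the in-repo path for a checkpoint.

    Assumption: all checkpoints follow the `{name}/model.pt` layout
    used by `vision_base_h0_lm` in the example above.
    """
    if not name or "/" in name:
        raise ValueError(f"Invalid checkpoint name: {name!r}")
    return f"{name}/model.pt"


def download_checkpoint(name: str) -> str:
    """Download one checkpoint and return the local path of the cached file."""
    # Imported here so the helper above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_filename(name),
        repo_type="model",
    )
```

For instance, `download_checkpoint("conv1d_t250_h0_lm_recon-init")` would request `conv1d_t250_h0_lm_recon-init/model.pt` from the repository, provided the assumed layout holds.
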
## Citation

```bibtex
@article{lee2025optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}
```