---
license: apache-2.0
tags:
- vision
- ocr
- compression
- autoencoding
---

# Bad Autoencoding - Model Checkpoints

Checkpoints for the paper: **"Optical Context Compression Is Just (Bad) Autoencoding"**

Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

## Links

- **Paper**: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
- **Code**: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)

## Available Checkpoints

| Checkpoint | Objective | Hybrid | Training | PPL |
|------------|-----------|--------|----------|-----|
| `vision_base_h0_recon` | Reconstruction | 0 | - | 1.03 |
| `vision_base_h0_lm` | LM | 0 | Direct | 5.08 |
| `vision_base_h0_lm_recon-init` | LM | 0 | From reconstruction | 5.06 |

## Naming Convention

```
{regime}_{config}_h{N}_{objective}[_recon-init]
```

| Field | Values | Description |
|-------|--------|-------------|
| regime | vision, conv1d, meanpool, text | Compression architecture |
| config | base/small/tiny/large, t500/t250, w10s10, ctx525 | Regime-specific config |
| h{N} | h0, h100 | Hybrid text tokens (0 = pure vision) |
| objective | recon, lm | Training objective |
| recon-init | (optional) | LM initialized from reconstruction checkpoint |

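The naming convention above can be split back into its fields with a small parser. The following is a minimal sketch, not part of the released code: the names `CheckpointName` and `parse_checkpoint_name` are hypothetical, and the pattern assumes that the config field never contains an underscore.

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Fields of a checkpoint name (hypothetical helper type)."""
    regime: str
    config: str
    hybrid_tokens: int
    objective: str
    recon_init: bool


# Mirrors {regime}_{config}_h{N}_{objective}[_recon-init].
# Assumption: the config field contains only letters and digits.
_NAME_RE = re.compile(
    r"^(?P<regime>vision|conv1d|meanpool|text)"
    r"_(?P<config>[A-Za-z0-9]+)"
    r"_h(?P<hybrid>\d+)"
    r"_(?P<objective>recon|lm)"
    r"(?P<init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a checkpoint name into its fields, or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized checkpoint name: {name!r}")
    return CheckpointName(
        regime=match["regime"],
        config=match["config"],
        hybrid_tokens=int(match["hybrid"]),
        objective=match["objective"],
        recon_init=match["init"] is not None,
    )
```

For example, `parse_checkpoint_name("vision_base_h0_lm_recon-init")` yields the `vision` regime, the `base` config, zero hybrid text tokens, the `lm` objective, and `recon_init=True`.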
## Model Details

- **Architecture**: DeepSeek-OCR with trainable vision encoder
- **Image Size**: 768x768 (base)
- **Encoder Status**: Trained (not frozen)
- **Dataset**: 510k samples from FineWiki

## Usage

```python
from huggingface_hub import hf_hub_download

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_h0_lm/model.pt",
    repo_type="model",
)
```

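To download a different checkpoint, the `filename` argument can be assembled from the naming-convention fields. The sketch below is illustrative only: `checkpoint_filename` and `download_checkpoint` are hypothetical helpers, and it assumes every checkpoint directory contains a `model.pt` file, as in the example above.

```python
def checkpoint_filename(regime, config, hybrid_tokens, objective, recon_init=False):
    """Build the in-repo path for a checkpoint from its naming fields.

    Assumption: each checkpoint lives at "<name>/model.pt", matching the
    single example path shown in the usage snippet.
    """
    name = f"{regime}_{config}_h{hybrid_tokens}_{objective}"
    if recon_init:
        name += "_recon-init"
    return f"{name}/model.pt"


def download_checkpoint(filename, repo_id="ivnle/bad-autoencoding"):
    """Download one checkpoint file and return its local path."""
    # Imported lazily so the filename helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, repo_type="model")
```

For instance, `checkpoint_filename("vision", "base", 0, "lm", recon_init=True)` produces the path for the `vision_base_h0_lm_recon-init` checkpoint, which can then be passed to `download_checkpoint`.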
## Citation

```bibtex
@article{lee2024optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}
```