---
license: apache-2.0
tags:
- vision
- ocr
- compression
- autoencoding
---
# Bad Autoencoding - Model Checkpoints
Checkpoints for the paper: **"Optical Context Compression Is Just (Bad) Autoencoding"**
Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick
## Links
- **Paper**: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
- **Code**: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)
## Available Checkpoints
### Vision (base, 768x768)
| Checkpoint | Objective | Hybrid tokens | Training | Compression ratio (CR) | Perplexity (PPL) |
|------------|-----------|--------|----------|-----|-----|
| `vision_base_h0_recon` | Reconstruction | 0 | - | 3.60 | 1.03 |
| `vision_base_h0_lm` | LM | 0 | Direct | 3.60 | 5.08 |
| `vision_base_h0_lm_recon-init` | LM | 0 | From recon | 3.60 | 5.06 |
### Meanpool (w4s4)
| Checkpoint | Objective | Hybrid tokens | Training | Compression ratio (CR) | Perplexity (PPL) |
|------------|-----------|--------|----------|-----|-----|
| `meanpool_w4s4_h0_recon` | Reconstruction | 0 | - | 3.97 | 1.04 |
| `meanpool_w4s4_h0_lm_recon-init` | LM | 0 | From recon | 3.97 | 5.02 |
## Naming Convention
```
{regime}_{config}_h{N}_{objective}[_recon-init]
```
| Field | Values | Description |
|-------|--------|-------------|
| regime | vision, conv1d, meanpool, text | Compression architecture |
| config | base/small/tiny/large, t500/t250, w4s4/w10s10, ctx525 | Regime-specific config |
| h{N} | h0, h100 | Number of hybrid text tokens (0 = pure vision/compression) |
| objective | recon, lm | Training objective |
| recon-init | (optional) | LM initialized from reconstruction checkpoint |
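
The convention above can be split back into its fields with a small parser. This is a minimal sketch rather than part of the released code: the names `CheckpointName` and `parse_checkpoint_name` are hypothetical, and the pattern assumes that `config` contains only letters and digits, as in the values listed in the table.

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Fields of a checkpoint name (hypothetical helper type)."""
    regime: str
    config: str
    hybrid_tokens: int
    objective: str
    recon_init: bool


# Assumption: config is alphanumeric (e.g. base, w4s4, ctx525), so it never contains "_".
_NAME_PATTERN = re.compile(
    r"^(?P<regime>vision|conv1d|meanpool|text)"
    r"_(?P<config>[A-Za-z0-9]+)"
    r"_h(?P<hybrid>\d+)"
    r"_(?P<objective>recon|lm)"
    r"(?P<recon_init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a checkpoint name into its fields, or raise ValueError if it does not match."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized checkpoint name: {name!r}")
    return CheckpointName(
        regime=match.group("regime"),
        config=match.group("config"),
        hybrid_tokens=int(match.group("hybrid")),
        objective=match.group("objective"),
        recon_init=match.group("recon_init") is not None,
    )
```

For example, `parse_checkpoint_name("vision_base_h0_lm_recon-init")` yields regime `vision`, config `base`, zero hybrid tokens, objective `lm`, and `recon_init=True`.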
## Model Details
- **Architecture**: DeepSeek-OCR with trainable vision encoder
- **Encoder Status**: Trained (not frozen)
- **Dataset**: 510k samples from FineWiki
## Usage
```python
from huggingface_hub import hf_hub_download

# Download a specific checkpoint; returns the local path of the cached file
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_h0_lm/model.pt",
    repo_type="model",
)
```
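
To fetch any checkpoint from the tables by name, the call above can be wrapped in a small helper. This is a sketch under one assumption: every checkpoint is stored as `<checkpoint name>/model.pt`, which is confirmed here only for `vision_base_h0_lm`. The function names `checkpoint_filename` and `download_checkpoint` are hypothetical.

```python
def checkpoint_filename(name: str, weights_file: str = "model.pt") -> str:
    """Build the in-repo path for a checkpoint, assuming a `<name>/model.pt` layout."""
    return f"{name}/{weights_file}"


def download_checkpoint(name: str, repo_id: str = "ivnle/bad-autoencoding") -> str:
    """Download one checkpoint and return the local path of the cached file."""
    # Imported lazily so the path helper above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id,
        filename=checkpoint_filename(name),
        repo_type="model",
    )
```

If the assumed layout does not hold for a given checkpoint, the download fails with an error from `huggingface_hub` instead of returning a wrong file.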
## Citation
```bibtex
@article{lee2025optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}
```