---
license: apache-2.0
tags:
  - vision
  - ocr
  - compression
  - autoencoding
---

# Bad Autoencoding - Model Checkpoints

Checkpoints for the paper: **"Optical Context Compression Is Just (Bad) Autoencoding"**

Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

## Links

- **Paper**: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
- **Code**: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)

## Available Checkpoints

In the tables below, CR is the compression ratio, PPL is the perplexity, and Hybrid is the number of hybrid text tokens (the `h{N}` field described under Naming Convention).

### Vision (base, 768x768)

| Checkpoint | Objective | Hybrid | Training | CR | PPL |
|------------|-----------|--------|----------|-----|-----|
| `vision_base_h0_recon` | Reconstruction | 0 | - | 3.60 | 1.03 |
| `vision_base_h0_lm` | LM | 0 | Direct | 3.60 | 5.08 |
| `vision_base_h0_lm_recon-init` | LM | 0 | From recon | 3.60 | 5.06 |

### Meanpool (w4s4)

| Checkpoint | Objective | Hybrid | Training | CR | PPL |
|------------|-----------|--------|----------|-----|-----|
| `meanpool_w4s4_h0_recon` | Reconstruction | 0 | - | 3.97 | 1.04 |
| `meanpool_w4s4_h0_lm_recon-init` | LM | 0 | From recon | 3.97 | 5.02 |

## Naming Convention

```
{regime}_{config}_h{N}_{objective}[_recon-init]
```

| Field | Values | Description |
|-------|--------|-------------|
| regime | vision, conv1d, meanpool, text | Compression architecture |
| config | base/small/tiny/large, t500/t250, w4s4/w10s10, ctx525 | Regime-specific config |
| h{N} | h0, h100 | Hybrid text tokens (0 = pure vision/compression) |
| objective | recon, lm | Training objective |
| recon-init | (optional) | LM initialized from reconstruction checkpoint |
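
The naming pattern above can be split back into its fields with a regular expression. The following is a minimal sketch, not part of the released code: the `CheckpointName` type and `parse_checkpoint_name` helper are hypothetical names introduced here, and the pattern assumes that `config` contains only letters and digits, as in the values listed in the table.

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Fields of a checkpoint name (hypothetical helper type)."""
    regime: str
    config: str
    hybrid_tokens: int
    objective: str
    recon_init: bool


# Mirrors {regime}_{config}_h{N}_{objective}[_recon-init].
# Assumption: config is alphanumeric (e.g. base, w4s4, ctx525).
_NAME_PATTERN = re.compile(
    r"^(?P<regime>vision|conv1d|meanpool|text)"
    r"_(?P<config>[A-Za-z0-9]+)"
    r"_h(?P<hybrid>\d+)"
    r"_(?P<objective>recon|lm)"
    r"(?P<recon_init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a checkpoint name into its fields, or raise ValueError."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a recognized checkpoint name: {name!r}")
    return CheckpointName(
        regime=match.group("regime"),
        config=match.group("config"),
        hybrid_tokens=int(match.group("hybrid")),
        objective=match.group("objective"),
        recon_init=match.group("recon_init") is not None,
    )
```

For example, `parse_checkpoint_name("meanpool_w4s4_h0_lm_recon-init")` yields the regime `meanpool`, config `w4s4`, zero hybrid tokens, the `lm` objective, and `recon_init=True`.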

## Model Details

- **Architecture**: DeepSeek-OCR with trainable vision encoder
- **Encoder Status**: Trained (not frozen)
- **Dataset**: 510k samples from FineWiki

## Usage

```python
from huggingface_hub import hf_hub_download

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_h0_lm/model.pt",
    repo_type="model"
)
```
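
To fetch several checkpoints at once, the same call can be wrapped in a small loop. This is a sketch under one assumption: that every checkpoint folder contains a `model.pt` file, as the `vision_base_h0_lm` example does. The helper names `checkpoint_filename` and `download_checkpoints` are hypothetical and not part of the repository.

```python
from typing import Dict, Iterable

REPO_ID = "ivnle/bad-autoencoding"


def checkpoint_filename(name: str) -> str:
    """Build the in-repo path for a checkpoint.

    Assumption: each checkpoint lives at <name>/model.pt,
    matching the single example shown above.
    """
    return f"{name}/model.pt"


def download_checkpoints(names: Iterable[str]) -> Dict[str, str]:
    """Download each named checkpoint and return a name -> local path map."""
    # Imported here so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(
            repo_id=REPO_ID,
            filename=checkpoint_filename(name),
            repo_type="model",
        )
        for name in names
    }
```

Calling `download_checkpoints(["vision_base_h0_recon", "vision_base_h0_lm"])` would return the local cache paths for both files, provided the assumed layout holds.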

## Citation

```bibtex
@article{lee2025optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}
```