---
license: apache-2.0
tags:
- vision
- ocr
- compression
- autoencoding
---

# Bad Autoencoding - Model Checkpoints

Checkpoints for the paper:

**"Optical Context Compression Is Just (Bad) Autoencoding"**

Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

## Links

- **Paper**: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
- **Code**: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)

## Available Checkpoints

| Checkpoint | Objective | Hybrid text tokens | Training | PPL |
|------------|-----------|--------------------|----------|-----|
| `vision_base_h0_recon` | Reconstruction | 0 | - | 1.03 |
| `vision_base_h0_lm` | LM | 0 | Direct | 5.08 |
| `vision_base_h0_lm_recon-init` | LM | 0 | From reconstruction | 5.06 |

## Naming Convention

```
{regime}_{config}_h{N}_{objective}[_recon-init]
```

| Field | Values | Description |
|-------|--------|-------------|
| regime | vision, conv1d, meanpool, text | Compression architecture |
| config | base/small/tiny/large, t500/t250, w10s10, ctx525 | Regime-specific configuration |
| h{N} | h0, h100 | Number of hybrid text tokens (0 = pure vision) |
| objective | recon, lm | Training objective |
| recon-init | (optional) | LM checkpoint initialized from a reconstruction checkpoint |

## Model Details

- **Architecture**: DeepSeek-OCR with a trainable vision encoder
- **Image Size**: 768×768 (`base` config)
- **Encoder Status**: Trained (not frozen)
- **Dataset**: 510k samples from FineWiki

## Usage

```python
from huggingface_hub import hf_hub_download

# Download a specific checkpoint; returns the local path of the cached file
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_h0_lm/model.pt",
    repo_type="model",
)
print(checkpoint_path)
```

## Citation

```bibtex
@article{lee2025optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}
```
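
## Parsing Checkpoint Names

Because checkpoint names follow the naming convention described in this card, they can be decoded mechanically. The sketch below is a hypothetical helper: `parse_checkpoint_name`, `CheckpointName`, and `hub_filename` are not part of the released code. It assumes the `config` field never contains an underscore (true for every value listed in the naming table) and that each checkpoint is stored as `<name>/model.pt`, as in the Usage example.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern mirroring {regime}_{config}_h{N}_{objective}[_recon-init].
# Assumes the config field never contains an underscore.
_NAME_RE = re.compile(
    r"(?P<regime>vision|conv1d|meanpool|text)"
    r"_(?P<config>[A-Za-z0-9]+)"
    r"_h(?P<hybrid>\d+)"
    r"_(?P<objective>recon|lm)"
    r"(?P<recon_init>_recon-init)?"
)


@dataclass(frozen=True)
class CheckpointName:
    regime: str
    config: str
    hybrid_tokens: int  # 0 = pure vision
    objective: str  # "recon" or "lm"
    recon_init: bool  # LM initialized from a reconstruction checkpoint


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a checkpoint name into its fields; raise ValueError if malformed."""
    m = _NAME_RE.fullmatch(name)  # fullmatch: no stray prefix or suffix allowed
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    recon_init = m.group("recon_init") is not None
    if recon_init and m.group("objective") != "lm":
        # The _recon-init suffix only applies to LM checkpoints.
        raise ValueError(f"_recon-init requires the lm objective: {name!r}")
    return CheckpointName(
        regime=m.group("regime"),
        config=m.group("config"),
        hybrid_tokens=int(m.group("hybrid")),
        objective=m.group("objective"),
        recon_init=recon_init,
    )


def hub_filename(name: str) -> str:
    """Assumed file layout, following the Usage example: <name>/model.pt."""
    parse_checkpoint_name(name)  # validate before building the path
    return f"{name}/model.pt"


if __name__ == "__main__":
    for n in ["vision_base_h0_recon", "vision_base_h0_lm", "vision_base_h0_lm_recon-init"]:
        print(parse_checkpoint_name(n), hub_filename(n))
```

The string returned by `hub_filename` can be passed as the `filename` argument to `hf_hub_download`. Validating first means a typo in a checkpoint name fails locally with a clear message rather than as a missing-file error from the Hub.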