---
license: apache-2.0
tags:
- vision
- ocr
- compression
- autoencoding
---

# Bad Autoencoding - Model Checkpoints

Checkpoints for the paper:

**"Optical Context Compression Is Just (Bad) Autoencoding"**
Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

## Links

- **Paper**: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
- **Code**: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)

## Available Checkpoints

| Checkpoint | Objective | Hybrid Tokens | Training | PPL |
|------------|-----------|---------------|----------|-----|
| `vision_base_hybrid0_reconstruction` | Reconstruction | 0 | Direct | 1.03 |
| `vision_base_hybrid0_lm_direct` | Language Modeling | 0 | Direct (no recon init) | 5.08 |
| `vision_base_hybrid0_lm_recon_init` | Language Modeling | 0 | Initialized from reconstruction | 5.06 |

### Naming Convention

`{regime}_{size}_{hybrid}_[reconstruction|lm]_[direct|recon_init]`

- **regime**: vision, conv1d_residual, meanpool, text
- **size**: tiny, small, base, large (for vision); compression target (for others)
- **hybrid**: hybrid0 (pure vision/compression) or hybrid100 (100 text tokens + vision)
- **objective**: reconstruction or lm (language modeling)
- **training**: direct (trained from scratch) or recon_init (initialized from reconstruction checkpoint)

The reconstruction checkpoint listed above, `vision_base_hybrid0_reconstruction`, carries no training suffix.

## Model Details

- **Architecture**: DeepSeek-OCR with trainable vision encoder
- **Image Size**: 768x768 (base)
- **Encoder Status**: Trained (not frozen)
- **Dataset**: 510k samples from FineWiki

## Usage

```python
from huggingface_hub import hf_hub_download

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_hybrid0_lm_direct/model.pt",
    repo_type="model"
)
```

## Citation

```bibtex
@article{lee2025optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}
```
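
## Example: Parsing Checkpoint Names

The naming convention described above can be sketched as a small parser, which may help when selecting checkpoints programmatically. This is a minimal illustration, not part of the released code: `CheckpointName`, `parse_checkpoint_name`, and the field names are hypothetical. It assumes the training suffix is optional (the listed reconstruction checkpoint has none) and accepts any string as the size, since the format of the compression target for non-vision regimes is not specified here.

```python
import re
from typing import NamedTuple, Optional

# Regimes listed in the naming convention.
REGIMES = ("conv1d_residual", "meanpool", "vision", "text")


class CheckpointName(NamedTuple):
    """Hypothetical container for the parts of a checkpoint name."""
    regime: str
    size: str
    hybrid_tokens: int
    objective: str
    training: Optional[str]  # None when the name has no training suffix


_PATTERN = re.compile(
    r"^(?P<regime>" + "|".join(REGIMES) + r")_"
    r"(?P<size>.+)_"                      # size or compression target (format assumed free-form)
    r"hybrid(?P<hybrid>\d+)_"
    r"(?P<objective>reconstruction|lm)"
    r"(?:_(?P<training>direct|recon_init))?$"  # assumption: suffix may be absent
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a checkpoint name into its components, or raise ValueError."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"Name does not follow the convention: {name!r}")
    return CheckpointName(
        regime=match["regime"],
        size=match["size"],
        hybrid_tokens=int(match["hybrid"]),
        objective=match["objective"],
        training=match["training"],
    )


parsed = parse_checkpoint_name("vision_base_hybrid0_lm_recon_init")
print(parsed.objective, parsed.training)  # prints "lm recon_init"
```

Anchoring the pattern on the known regime names keeps multi-word regimes such as `conv1d_residual` from being confused with the size field.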