---
license: apache-2.0
tags:
- vision
- ocr
- compression
- autoencoding
---

# Bad Autoencoding - Model Checkpoints

Checkpoints for the paper:

**"Optical Context Compression Is Just (Bad) Autoencoding"**

Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

## Links

- **Paper**: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
- **Code**: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)

## Available Checkpoints

Naming convention: `{regime}_{config}_h{N}_{objective}[_recon-init]`

In the tables below, CR is the compression ratio and PPL is perplexity (lower is better).

### Reconstruction

| Checkpoint | Regime | CR | PPL |
|------------|--------|-----|-----|
| `vision_base_h0_recon` | Vision base | 3.60 | 1.03 |
| `meanpool_w4s4_h0_recon` | Meanpool w4s4 | 3.97 | 1.04 |
| `conv1d_t250_h0_recon` | Conv1D t250 | 3.97 | 1.00 |
| `vision_tiny_h0_recon` | Vision tiny | 12.82 | 1.14 |
| `conv1d_t63_h0_recon` | Conv1D t63 | 15.38 | 1.01 |

### Language Modeling

| Checkpoint | Regime | CR | Init | PPL |
|------------|--------|-----|------|-----|
| `vision_base_h0_lm` | Vision base | 3.60 | Direct | 5.08 |
| `vision_base_h0_lm_recon-init` | Vision base | 3.60 | From recon | 5.06 |
| `text_ctx277_h0_lm` | Text ctx277 (Truncation) | 3.60 | Direct | 5.02 |
| `meanpool_w4s4_h0_lm_recon-init` | Meanpool w4s4 | 3.97 | From recon | 5.02 |
| `conv1d_t250_h0_lm_recon-init` | Conv1D t250 | 3.97 | From recon | 4.96 |

## Model Details

- **Architecture**: DeepSeek-OCR with vision encoder
- **Vision checkpoints**: Trained encoder (base=768x768, tiny=384x384)
- **Text checkpoints**: Truncation baseline (no vision encoder), context=277 tokens
- **Meanpool checkpoints**: Frozen encoder, window=4, stride=4
- **Conv1D checkpoints**: Trained hierarchical encoder (t250=CR 3.97, t63=CR 15.38)
- **Dataset**: 510k samples from FineWiki

## Usage

```python
from huggingface_hub import hf_hub_download

# Download a specific checkpoint; returns the local path of the cached file
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_h0_lm/model.pt",
    repo_type="model"
)
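
# Hypothetical helper, not taken from the repository: a minimal sketch that
# splits a checkpoint name into the fields of the naming convention
# {regime}_{config}_h{N}_{objective}[_recon-init].
import re

CHECKPOINT_NAME = re.compile(
    r"^(?P<regime>[a-z0-9]+)_(?P<config>[a-z0-9]+)"
    r"_h(?P<h>\d+)_(?P<objective>[a-z]+)(?P<recon_init>_recon-init)?$"
)

def parse_checkpoint_name(name: str) -> dict:
    match = CHECKPOINT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a checkpoint name: {name!r}")
    fields = match.groupdict()
    fields["h"] = int(fields["h"])
    # True when the name carries the "_recon-init" suffix
    fields["recon_init"] = fields["recon_init"] is not None
    return fields

# Assumption: every checkpoint uses the "{name}/model.pt" layout shown above.
def checkpoint_filename(name: str) -> str:
    parse_checkpoint_name(name)  # reject names that break the convention
    return f"{name}/model.pt"

print(parse_checkpoint_name("conv1d_t250_h0_lm_recon-init"))
# {'regime': 'conv1d', 'config': 't250', 'h': 0, 'objective': 'lm', 'recon_init': True}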
```

## Citation

```bibtex
@article{lee2025optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}
```