+---
+license: apache-2.0
+tags:
+  - vision
+  - ocr
+  - compression
+  - autoencoding
+---
+# Bad Autoencoding - Model Checkpoints
+Checkpoints for the paper: **"Optical Context Compression Is Just (Bad) Autoencoding"** (ICML 2025)
+## Links
+- **Paper**: [Coming soon]
+- **Code**: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)
+## Available Checkpoints
+| Checkpoint | Objective | Training | Perplexity (PPL) |
+|------------|-----------|----------|------------------|
+| `vision_base_reconstruction` | Reconstruction | Direct | 1.03 |
+| `vision_base_lm_direct` | Language Modeling | Direct (no recon init) | 5.08 |
+| `vision_base_lm_recon_init` | Language Modeling | Initialized from reconstruction | 5.06 |
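The PPL column can be related to an average loss. Perplexity is conventionally the exponential of the mean per-token negative log-likelihood. The sketch below assumes the reported values follow that convention with natural logarithms, which this model card does not state; the function names are hypothetical.

```python
import math


def ppl_from_nll(mean_nll: float) -> float:
    """Perplexity from mean per-token negative log-likelihood (in nats)."""
    return math.exp(mean_nll)


def nll_from_ppl(ppl: float) -> float:
    """Inverse mapping: mean per-token NLL (in nats) from perplexity."""
    if ppl < 1.0:
        raise ValueError("perplexity is at least 1 under this definition")
    return math.log(ppl)


# PPL values copied from the table above.
reported_ppl = {
    "vision_base_reconstruction": 1.03,
    "vision_base_lm_direct": 5.08,
    "vision_base_lm_recon_init": 5.06,
}

# Assumption: PPL = exp(mean NLL in nats); not confirmed by the model card.
for name, ppl in reported_ppl.items():
    print(f"{name}: {nll_from_ppl(ppl):.3f} nats/token")
```

Under this reading, the reconstruction checkpoint's 1.03 corresponds to a loss close to zero, while the two language-modeling checkpoints differ only slightly from each other.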
+## Model Details
+- **Architecture**: DeepSeek-OCR with trainable vision encoder
+- **Image Size**: 768x768 (base)
+- **Encoder Status**: Trained (not frozen)
+- **Dataset**: 510k samples
+## Usage
+```python
+from huggingface_hub import hf_hub_download
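+# Download a specific checkpoint. The file is stored in the local
+# Hugging Face cache and the path to the cached copy is returned.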
+checkpoint_path = hf_hub_download(
+    repo_id="ivnle/bad-autoencoding",
+    filename="vision_base_lm_direct/model.pt",
+    repo_type="model"
+)
+```
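To fetch any of the three checkpoints by name, the call above can be wrapped in a small helper. This is a sketch, not part of the repository: `checkpoint_filename` and `download_checkpoint` are hypothetical names, and it assumes every checkpoint uses the same `<name>/model.pt` layout shown for `vision_base_lm_direct`, which should be checked against the repository's file list.

```python
# Checkpoint names as listed in the "Available Checkpoints" table.
CHECKPOINTS = (
    "vision_base_reconstruction",
    "vision_base_lm_direct",
    "vision_base_lm_recon_init",
)

REPO_ID = "ivnle/bad-autoencoding"


def checkpoint_filename(name: str) -> str:
    """Return the assumed in-repo path for a checkpoint.

    Assumption: all checkpoints follow the `<name>/model.pt` layout
    that the usage example shows for `vision_base_lm_direct`.
    """
    if name not in CHECKPOINTS:
        raise ValueError(
            f"unknown checkpoint {name!r}; expected one of {CHECKPOINTS}"
        )
    return f"{name}/model.pt"


def download_checkpoint(name: str) -> str:
    """Download one checkpoint and return its local cache path."""
    # Imported lazily so checkpoint_filename works without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_filename(name),
        repo_type="model",
    )
```

For example, `download_checkpoint("vision_base_lm_recon_init")` would request `vision_base_lm_recon_init/model.pt` if the assumed layout holds; an unknown name raises `ValueError` before any network access.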
+## Citation
+```bibtex
+@inproceedings{bad-autoencoding2025,
+  title={Optical Context Compression Is Just (Bad) Autoencoding},
+  author={...},
+  booktitle={ICML},
+  year={2025}
+}
+```