	---
	license: apache-2.0
	tags:
	- vision
	- ocr
	- compression
	- autoencoding
	---

	# Bad Autoencoding - Model Checkpoints

	Checkpoints for the paper: "Optical Context Compression Is Just (Bad) Autoencoding"

	Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

	## Links

	- Paper: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
	- Code: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)

	## Available Checkpoints

	| Checkpoint | Objective | Hybrid | Training | PPL |
	|------------|-----------|--------|----------|-----|
	| `vision_base_h0_recon` | Reconstruction | 0 | - | 1.03 |
	| `vision_base_h0_lm` | LM | 0 | Direct | 5.08 |
	| `vision_base_h0_lm_recon-init` | LM | 0 | From reconstruction | 5.06 |
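
To compare the PPL column across objectives, it can help to convert perplexity back to an average per-token loss. The sketch below assumes PPL is the standard per-token perplexity, i.e. the exponential of the mean negative log-likelihood in nats; this card does not state the exact definition, so treat that as an assumption. The helper names `mean_nll_nats` and `perplexity` are illustrative and not part of the released code.

```python
import math

# PPL values copied from the checkpoint table above.
REPORTED_PPL = {
    "vision_base_h0_recon": 1.03,
    "vision_base_h0_lm": 5.08,
    "vision_base_h0_lm_recon-init": 5.06,
}


def mean_nll_nats(ppl: float) -> float:
    """Mean per-token negative log-likelihood implied by a perplexity.

    Assumes ppl = exp(mean NLL), the usual definition.
    """
    return math.log(ppl)


def perplexity(token_nlls: list[float]) -> float:
    """Perplexity from a non-empty list of per-token NLLs (in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


if __name__ == "__main__":
    for name, ppl in REPORTED_PPL.items():
        print(f"{name}: PPL {ppl:.2f} -> mean NLL {mean_nll_nats(ppl):.4f} nats/token")
```

Under this assumption, a reconstruction perplexity near 1 corresponds to a mean loss close to zero.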

	## Naming Convention

	```
	{regime}_{config}_h{N}_{objective}[_recon-init]
	```

	| Field | Values | Description |
	|-------|--------|-------------|
	| regime | vision, conv1d, meanpool, text | Compression architecture |
	| config | base/small/tiny/large, t500/t250, w10s10, ctx525 | Regime-specific config |
	| h{N} | h0, h100 | Hybrid text tokens (0 = pure vision) |
	| objective | recon, lm | Training objective |
	| recon-init | (optional) | LM initialized from reconstruction checkpoint |
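
The naming convention can be parsed mechanically. The following is a minimal sketch, not code from the repository: `CheckpointName` and `parse_checkpoint_name` are hypothetical names, and the regular expression assumes that the regime and config fields never contain underscores, which holds for every value listed in the table.

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    regime: str          # compression architecture, e.g. "vision"
    config: str          # regime-specific config, e.g. "base" or "t500"
    hybrid_tokens: int   # N in h{N}; 0 means pure vision
    objective: str       # "recon" or "lm"
    recon_init: bool     # True if the name ends in "_recon-init"


# Assumption: regime and config contain only lowercase letters and digits.
_NAME_RE = re.compile(
    r"^(?P<regime>vision|conv1d|meanpool|text)"
    r"_(?P<config>[a-z0-9]+)"
    r"_h(?P<hybrid>\d+)"
    r"_(?P<objective>recon|lm)"
    r"(?P<recon_init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a checkpoint name into the fields of the naming convention."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid checkpoint name: {name!r}")
    return CheckpointName(
        regime=match["regime"],
        config=match["config"],
        hybrid_tokens=int(match["hybrid"]),
        objective=match["objective"],
        recon_init=match["recon_init"] is not None,
    )
```

For example, `parse_checkpoint_name("vision_base_h0_lm_recon-init")` yields the regime `vision`, config `base`, zero hybrid tokens, the `lm` objective, and `recon_init=True`.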

	## Model Details

	- Architecture: DeepSeek-OCR with trainable vision encoder
	- Image Size: 768x768 (base)
	- Encoder Status: Trained (not frozen)
	- Dataset: 510k samples from FineWiki

	## Usage

	```python
	from huggingface_hub import hf_hub_download

	# Download a specific checkpoint
	checkpoint_path = hf_hub_download(
	    repo_id="ivnle/bad-autoencoding",
	    filename="vision_base_h0_lm/model.pt",
	    repo_type="model",
	)
	```
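
To fetch other checkpoints, the file path can be assembled from the naming-convention fields. This is a sketch under two assumptions: every checkpoint is stored as `{name}/model.pt` (the card only shows this layout for `vision_base_h0_lm`), and the `recon-init` suffix only applies to the `lm` objective. The helper names `build_checkpoint_name`, `checkpoint_filename`, and `download_checkpoint` are hypothetical; only `hf_hub_download` comes from `huggingface_hub`.

```python
def build_checkpoint_name(
    regime: str,
    config: str,
    hybrid_tokens: int,
    objective: str,
    recon_init: bool = False,
) -> str:
    """Assemble {regime}_{config}_h{N}_{objective}[_recon-init]."""
    if objective not in ("recon", "lm"):
        raise ValueError(f"unknown objective: {objective!r}")
    if recon_init and objective != "lm":
        # Assumption: only LM checkpoints are initialized from reconstruction.
        raise ValueError("recon-init only applies to the lm objective")
    suffix = "_recon-init" if recon_init else ""
    return f"{regime}_{config}_h{hybrid_tokens}_{objective}{suffix}"


def checkpoint_filename(name: str, weights_file: str = "model.pt") -> str:
    """Path inside the repo; assumes the {name}/model.pt layout."""
    return f"{name}/{weights_file}"


def download_checkpoint(name: str, repo_id: str = "ivnle/bad-autoencoding") -> str:
    """Download one checkpoint and return its local path (needs network access)."""
    from huggingface_hub import hf_hub_download  # imported lazily

    return hf_hub_download(
        repo_id=repo_id,
        filename=checkpoint_filename(name),
        repo_type="model",
    )


if __name__ == "__main__":
    name = build_checkpoint_name("vision", "base", 0, "lm", recon_init=True)
    print(download_checkpoint(name))
```

If a checkpoint does not follow the assumed layout, `hf_hub_download` raises an error rather than returning a path, so a failed download is a signal to check the repository's file listing.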

	## Citation

	```bibtex
	@article{lee2024optical,
	  title={Optical Context Compression Is Just (Bad) Autoencoding},
	  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
	  journal={arXiv preprint arXiv:2512.03643},
	  year={2025}
	}
	```