metadata
license: apache-2.0
tags:
  - vision
  - ocr
  - compression
  - autoencoding

Bad Autoencoding - Model Checkpoints

Checkpoints for the paper: "Optical Context Compression Is Just (Bad) Autoencoding"

Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

Links

Available Checkpoints

| Checkpoint | Objective | Hybrid Tokens | Training | PPL |
|---|---|---|---|---|
| vision_base_hybrid0_reconstruction | Reconstruction | 0 | Direct | 1.03 |
| vision_base_hybrid0_lm_direct | Language Modeling | 0 | Direct (no recon init) | 5.08 |
| vision_base_hybrid0_lm_recon_init | Language Modeling | 0 | Initialized from reconstruction | 5.06 |

Naming Convention

{regime}_{size}_{hybrid}_[reconstruction|lm]_[direct|recon_init]

  • regime: vision, conv1d_residual, meanpool, text
  • size: tiny, small, base, large (for vision); compression target (for others)
  • hybrid: hybrid0 (pure vision/compression) or hybrid100 (100 text tokens + vision)
  • objective: reconstruction or lm (language modeling)
  • training: direct (trained from scratch) or recon_init (initialized from reconstruction checkpoint); among the checkpoints listed above, only the lm checkpoints carry this suffix
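
The convention above can be sketched as a small parser. This is an illustrative helper, not part of the released code: `CheckpointName`, `REGIMES`, and `parse_checkpoint_name` are hypothetical names. It assumes the hybrid field always looks like `hybrid<N>`, and it treats the training suffix as optional because the reconstruction checkpoint listed above has none. The format of the "compression target" size for non-vision regimes is not specified here, so the size field is accepted as-is.

```python
import re
from typing import NamedTuple, Optional


class CheckpointName(NamedTuple):
    """Fields of a checkpoint name (hypothetical container, for illustration)."""
    regime: str
    size: str
    hybrid_tokens: int
    objective: str
    training: Optional[str]  # None when the name has no training suffix


# Regimes listed in the naming convention; the multi-word regime comes first
# so the alternation matches it as a whole.
REGIMES = ("conv1d_residual", "meanpool", "vision", "text")

_PATTERN = re.compile(
    r"^(?P<regime>" + "|".join(REGIMES) + r")"
    r"_(?P<size>.+?)"                       # size or compression target, taken as-is
    r"_hybrid(?P<hybrid>\d+)"               # assumption: always 'hybrid' + integer
    r"_(?P<objective>reconstruction|lm)"
    r"(?:_(?P<training>direct|recon_init))?$"
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a checkpoint name into its components, or raise ValueError."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return CheckpointName(
        regime=match["regime"],
        size=match["size"],
        hybrid_tokens=int(match["hybrid"]),
        objective=match["objective"],
        training=match["training"],
    )
```

For example, `parse_checkpoint_name("vision_base_hybrid0_lm_recon_init")` yields regime `vision`, size `base`, 0 hybrid tokens, objective `lm`, and training `recon_init`.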

Model Details

  • Architecture: DeepSeek-OCR with trainable vision encoder
  • Image Size: 768x768 (base)
  • Encoder Status: Trained (not frozen)
  • Dataset: 510k samples from FineWiki

Usage

from huggingface_hub import hf_hub_download

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_hybrid0_lm_direct/model.pt",
    repo_type="model"
)
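
To fetch every checkpoint in the table rather than a single one, the call above can be wrapped in a loop. This is a minimal sketch under one assumption: that each checkpoint lives at `<checkpoint name>/model.pt` in the repository. That layout is shown above only for `vision_base_hybrid0_lm_direct`; for the other two it is inferred, not confirmed. `TABLE_CHECKPOINTS`, `checkpoint_filename`, and `download_checkpoints` are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import Dict, Iterable

# Checkpoint names taken from the "Available Checkpoints" table.
TABLE_CHECKPOINTS = (
    "vision_base_hybrid0_reconstruction",
    "vision_base_hybrid0_lm_direct",
    "vision_base_hybrid0_lm_recon_init",
)


def checkpoint_filename(name: str) -> str:
    """Repo-relative path of a checkpoint (assumed '<name>/model.pt' layout)."""
    return f"{name}/model.pt"


def download_checkpoints(
    names: Iterable[str] = TABLE_CHECKPOINTS,
    repo_id: str = "ivnle/bad-autoencoding",
) -> Dict[str, Path]:
    """Download each named checkpoint and return a name -> local path mapping."""
    # Imported here so the helpers above stay usable without huggingface_hub.
    from huggingface_hub import hf_hub_download

    paths = {}
    for name in names:
        local = hf_hub_download(
            repo_id=repo_id,
            filename=checkpoint_filename(name),
            repo_type="model",
        )
        paths[name] = Path(local)
    return paths
```

Calling `download_checkpoints()` returns local cache paths; files already present in the Hugging Face cache are not downloaded again.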

Citation

@article{lee2024optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}