ivnle committed on
Commit 1d404e5 · verified · 1 Parent(s): b66dc55

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +15 -26
README.md CHANGED
@@ -20,39 +20,28 @@ Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

  ## Available Checkpoints

- ### Vision (base, 768x768)
+ Naming convention: `{regime}_{config}_h{N}_{objective}[_recon-init]`

- | Checkpoint | Objective | Hybrid | Training | CR | PPL |
- |------------|-----------|--------|----------|-----|-----|
- | `vision_base_h0_recon` | Reconstruction | 0 | - | 3.60 | 1.03 |
- | `vision_base_h0_lm` | LM | 0 | Direct | 3.60 | 5.08 |
- | `vision_base_h0_lm_recon-init` | LM | 0 | From recon | 3.60 | 5.06 |
+ ### Reconstruction

- ### Meanpool (w4s4)
+ | Checkpoint | Regime | CR | PPL |
+ |------------|--------|-----|-----|
+ | `vision_base_h0_recon` | Vision base | 3.60 | 1.03 |
+ | `meanpool_w4s4_h0_recon` | Meanpool w4s4 | 3.97 | 1.04 |

- | Checkpoint | Objective | Hybrid | Training | CR | PPL |
- |------------|-----------|--------|----------|-----|-----|
- | `meanpool_w4s4_h0_recon` | Reconstruction | 0 | - | 3.97 | 1.04 |
- | `meanpool_w4s4_h0_lm_recon-init` | LM | 0 | From recon | 3.97 | 5.02 |
+ ### Language Modeling

- ## Naming Convention
-
- ```
- {regime}_{config}_h{N}_{objective}[_recon-init]
- ```
-
- | Field | Values | Description |
- |-------|--------|-------------|
- | regime | vision, conv1d, meanpool, text | Compression architecture |
- | config | base/small/tiny/large, t500/t250, w4s4/w10s10, ctx525 | Regime-specific config |
- | h{N} | h0, h100 | Hybrid text tokens (0 = pure vision/compression) |
- | objective | recon, lm | Training objective |
- | recon-init | (optional) | LM initialized from reconstruction checkpoint |
+ | Checkpoint | Regime | CR | Init | PPL |
+ |------------|--------|-----|------|-----|
+ | `vision_base_h0_lm` | Vision base | 3.60 | Direct | 5.08 |
+ | `vision_base_h0_lm_recon-init` | Vision base | 3.60 | From recon | 5.06 |
+ | `meanpool_w4s4_h0_lm_recon-init` | Meanpool w4s4 | 3.97 | From recon | 5.02 |

  ## Model Details

- - **Architecture**: DeepSeek-OCR with trainable vision encoder
- - **Encoder Status**: Trained (not frozen)
+ - **Architecture**: DeepSeek-OCR with vision encoder
+ - **Vision checkpoints**: Trained encoder, 768x768 (base)
+ - **Meanpool checkpoints**: Frozen encoder, window=4, stride=4
  - **Dataset**: 510k samples from FineWiki

  ## Usage
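
Note on the naming convention in this change: the pattern `{regime}_{config}_h{N}_{objective}[_recon-init]` is regular enough to parse mechanically. The following is a minimal sketch, not code from the repository. The names `CheckpointName` and `parse_checkpoint_name` are hypothetical, the objective values (`recon`, `lm`) come from the field table this commit removes, and the assumption that `regime` and `config` contain only lowercase letters and digits is inferred from the listed checkpoints.

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Fields of a checkpoint name (hypothetical helper type)."""
    regime: str          # e.g. "vision", "meanpool"
    config: str          # e.g. "base", "w4s4"
    hybrid_tokens: int   # the N in h{N}
    objective: str       # "recon" or "lm"
    recon_init: bool     # True when the "_recon-init" suffix is present


# Assumption: regime and config are lowercase alphanumerics with no underscores.
_NAME_RE = re.compile(
    r"^(?P<regime>[a-z0-9]+)"
    r"_(?P<config>[a-z0-9]+)"
    r"_h(?P<hybrid>\d+)"
    r"_(?P<objective>recon|lm)"
    r"(?P<init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a checkpoint name into its fields, or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized checkpoint name: {name!r}")
    return CheckpointName(
        regime=match["regime"],
        config=match["config"],
        hybrid_tokens=int(match["hybrid"]),
        objective=match["objective"],
        recon_init=match["init"] is not None,
    )
```

For example, `parse_checkpoint_name("vision_base_h0_lm_recon-init")` yields the regime `vision`, the config `base`, zero hybrid tokens, the objective `lm`, and `recon_init=True`. Names that do not follow the pattern raise `ValueError` rather than being partially parsed.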