diff --git "a/vision_tiny_h0_recon/train.log" "b/vision_tiny_h0_recon/train.log"
new file mode 100644
--- /dev/null
+++ "b/vision_tiny_h0_recon/train.log"
@@ -0,0 +1,2622 @@
+2025-11-18 21:47:22,384 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_tiny_reconstruction_20251118_214704', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='tiny', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251118_214704', batch_size=4, gradient_accumulation_steps=12, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=500, initial_validation=True, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint=None, init_from_checkpoint=None, aux_loss_weight=0.5, num_workers=16, prefetch_factor=2, seed=42, eval_seed=42, device='cuda', compile=True, use_optimized_model=False, use_encoder_checkpointing=False)
+2025-11-18 21:47:22,384 - INFO - Using preset vision prompt: 'free_ocr' → ''\nFree OCR.''
+2025-11-18 21:47:22,384 - INFO - Setting random seed: 42
+2025-11-18 21:47:22,853 - INFO - Auto-generated W&B run name: production_vision_tiny_reconstruction_20251118_214704
+2025-11-18 21:47:24,009 - INFO - Initialized W&B run: vision-compression-2/production_vision_tiny_reconstruction_20251118_214704 (ID: tto6r4hl)
+2025-11-18 21:47:24,009 - INFO - Loading model and tokenizer...
+2025-11-18 21:47:35,415 - INFO - Compiling model with torch.compile...
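The scheduler figures the log reports shortly after startup (warmup_steps=1041, total_steps=10417) follow directly from these Namespace arguments and the 500000-sample training set loaded later. A minimal sketch of that arithmetic plus the differential-LR AdamW setup; the `encoder`/`decoder` stand-ins are illustrative, not the trainer's actual modules:

```python
import math

import torch
import torch.nn as nn

# Hypothetical stand-ins for the trained submodules (names are illustrative)
encoder = nn.Linear(8, 8)
decoder = nn.Linear(8, 8)

# From the Namespace: batch_size=4, gradient_accumulation_steps=12,
# num_epochs=1, warmup_ratio=0.1; the log later reports 500000 train samples.
batch_size, grad_accum, num_epochs = 4, 12, 1
num_samples = 500_000
warmup_ratio = 0.1

steps_per_epoch = math.ceil(num_samples / (batch_size * grad_accum))
total_steps = steps_per_epoch * num_epochs       # 10417, matching the log
warmup_steps = int(warmup_ratio * total_steps)   # 1041, matching the log

# Differential learning rates via AdamW parameter groups
# (encoder_lr=1e-05, learning_rate=0.0001, weight_decay=0.01)
optimizer = torch.optim.AdamW(
    [
        {"params": encoder.parameters(), "lr": 1e-5},
        {"params": decoder.parameters(), "lr": 1e-4},
    ],
    weight_decay=0.01,
)
```

This matches the "474 param tensors @ lr=1e-05 / 2236 param tensors @ lr=0.0001" split the optimizer log line describes; the fused-kernel flag is omitted here for portability.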
+2025-11-18 21:47:35,416 - INFO - Note: First forward pass will compile (may take several minutes) +2025-11-18 21:47:36,452 - INFO - Created Vision Compression trainer (mode: tiny) +2025-11-18 21:47:36,453 - INFO - Training objective: reconstruction +2025-11-18 21:47:36,520 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640 +2025-11-18 21:47:36,521 - INFO - Loading training data from data/training/splits_510k/train.jsonl +2025-11-18 21:51:44,942 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl +2025-11-18 21:51:44,942 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-18 21:51:44,969 - INFO - Loading validation data from data/training/splits_510k/val.jsonl +2025-11-18 21:51:47,803 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl +2025-11-18 21:51:47,803 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-18 21:51:47,833 - INFO - Created AdamW optimizer with differential LR: + Encoder: 474 param tensors @ lr=1e-05 + Decoder: 2236 param tensors @ lr=0.0001 + Fused kernels: True +2025-11-18 21:51:47,834 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 +2025-11-18 21:51:47,834 - INFO - Starting training loop... +2025-11-18 21:51:47,834 - INFO - +====================================================================== +2025-11-18 21:51:47,834 - INFO - Running initial validation (before any training)... 
+2025-11-18 21:51:47,834 - INFO - ====================================================================== +2025-11-18 22:03:21,537 - INFO - Validation loss: 0.5682, perplexity: 1.77 +2025-11-18 22:03:21,539 - INFO - Qualitative metrics (n=5): +2025-11-18 22:03:21,539 - INFO - BLEU: 0.5692 +2025-11-18 22:03:21,540 - INFO - METEOR: 0.7497 +2025-11-18 22:03:21,541 - INFO - Edit Distance: 0.2998 +2025-11-18 22:03:21,541 - INFO - F-measure: 0.7539 +2025-11-18 22:03:21,542 - INFO - +====================================================================== +2025-11-18 22:03:21,542 - INFO - Qualitative Evaluation Samples: +2025-11-18 22:03:21,543 - INFO - ====================================================================== +2025-11-18 22:03:21,544 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-18 22:03:21,544 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-18 22:03:21,545 - INFO - Generated: 'Q gave it four stars out of five and said that “Perhaps the [album’s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it’s a survey-bore. But it’s...' +2025-11-18 22:03:21,546 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-18 22:03:21,547 - INFO - ---------------------------------------------------------------------- +2025-11-18 22:03:21,547 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-18 22:03:21,548 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-18 22:03:21,549 - INFO - Generated: 'was Sirre Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROTC; ...' 
+2025-11-18 22:03:21,550 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-18 22:03:21,551 - INFO - ---------------------------------------------------------------------- +2025-11-18 22:03:21,552 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-18 22:03:21,552 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-18 22:03:21,554 - INFO - Generated: 'at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Betl steps the ax and bo...' +2025-11-18 22:03:21,554 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-18 22:03:21,555 - INFO - ---------------------------------------------------------------------- +2025-11-18 22:03:21,556 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-18 22:03:21,556 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-18 22:03:21,557 - INFO - Generated: '# Oriya (Unicode block) Oriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-18 22:03:21,558 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-18 22:03:21,558 - INFO - ---------------------------------------------------------------------- +2025-11-18 22:03:21,559 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-18 22:03:21,559 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-18 22:03:21,560 - INFO - Generated: '| Name | Age | Description |\n|---------------------------|-----|-------------------------------------------------...' +2025-11-18 22:03:21,560 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-18 22:03:21,561 - INFO - ---------------------------------------------------------------------- +2025-11-18 22:03:21,562 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_0.jsonl +2025-11-18 22:03:22,724 - INFO - Initial validation - Loss: 0.5682, Perplexity: 1.77 +2025-11-18 22:03:22,724 - INFO - ====================================================================== + +2025-11-18 22:03:22,724 - INFO - +====================================================================== +2025-11-18 22:03:22,725 - INFO - Epoch 1/1 +2025-11-18 22:03:22,725 - INFO - ====================================================================== +2025-11-18 22:04:40,423 - INFO - Effective context tokens (per-sample): 78 | Compression ratio: 12.82x +2025-11-18 22:04:40,423 - INFO - Target tokens per sample: 1000 +2025-11-18 22:07:56,367 - INFO - Epoch 1 Step 10 (Global: 10): loss=0.4593, ppl=1.58, grad_norm=5.66, lr=1.09e-06, throughput=1754 tok/s +2025-11-18 22:10:59,044 - INFO - Epoch 1 Step 20 (Global: 20): loss=0.4012, ppl=1.49, grad_norm=7.34, lr=1.17e-06, throughput=2628 tok/s +2025-11-18 22:13:57,114 - INFO - Epoch 1 Step 30 (Global: 30): loss=0.4750, ppl=1.61, grad_norm=6.84, lr=1.26e-06, throughput=2696 tok/s +2025-11-18 22:16:53,778 - INFO - Epoch 1 Step 40 (Global: 40): loss=0.4553, ppl=1.58, grad_norm=7.16, lr=1.35e-06, throughput=2717 tok/s 
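Two figures in this stretch of the log are simple derived quantities: perplexity is the exponential of the mean token loss, and the compression ratio is target tokens over effective context tokens (the 78 is presumably the 73 tiny-mode vision tokens plus a handful of prompt tokens; that breakdown is an assumption). A quick check:

```python
import math

# Perplexity from the initial validation loss
val_loss = 0.5682
perplexity = math.exp(val_loss)         # ≈ 1.77, as logged

# Compression ratio from the per-sample token counts
context_tokens = 78                     # effective context tokens per sample
target_tokens = 1000                    # target tokens per sample
ratio = target_tokens / context_tokens  # ≈ 12.82x, as logged
```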
+2025-11-18 22:19:45,903 - INFO - Epoch 1 Step 50 (Global: 50): loss=0.4331, ppl=1.54, grad_norm=10.62, lr=1.43e-06, throughput=2789 tok/s +2025-11-18 22:22:57,386 - INFO - Epoch 1 Step 60 (Global: 60): loss=0.4176, ppl=1.52, grad_norm=10.44, lr=1.52e-06, throughput=2507 tok/s +2025-11-18 22:25:59,106 - INFO - Epoch 1 Step 70 (Global: 70): loss=0.3816, ppl=1.46, grad_norm=9.38, lr=1.61e-06, throughput=2641 tok/s +2025-11-18 22:28:48,803 - INFO - Epoch 1 Step 80 (Global: 80): loss=0.4099, ppl=1.51, grad_norm=9.44, lr=1.69e-06, throughput=2829 tok/s +2025-11-18 22:31:40,582 - INFO - Epoch 1 Step 90 (Global: 90): loss=0.4108, ppl=1.51, grad_norm=8.81, lr=1.78e-06, throughput=2794 tok/s +2025-11-18 22:34:32,069 - INFO - Epoch 1 Step 100 (Global: 100): loss=0.3907, ppl=1.48, grad_norm=6.91, lr=1.86e-06, throughput=2799 tok/s +2025-11-18 22:37:53,837 - INFO - Epoch 1 Step 110 (Global: 110): loss=0.3336, ppl=1.40, grad_norm=6.81, lr=1.95e-06, throughput=2379 tok/s +2025-11-18 22:41:16,930 - INFO - Epoch 1 Step 120 (Global: 120): loss=0.4168, ppl=1.52, grad_norm=7.72, lr=2.04e-06, throughput=2364 tok/s +2025-11-18 22:44:08,416 - INFO - Epoch 1 Step 130 (Global: 130): loss=0.4065, ppl=1.50, grad_norm=6.62, lr=2.12e-06, throughput=2799 tok/s +2025-11-18 22:47:17,564 - INFO - Epoch 1 Step 140 (Global: 140): loss=0.4069, ppl=1.50, grad_norm=5.44, lr=2.21e-06, throughput=2538 tok/s +2025-11-18 22:50:40,684 - INFO - Epoch 1 Step 150 (Global: 150): loss=0.4204, ppl=1.52, grad_norm=6.88, lr=2.30e-06, throughput=2363 tok/s +2025-11-18 22:53:36,594 - INFO - Epoch 1 Step 160 (Global: 160): loss=0.3670, ppl=1.44, grad_norm=6.34, lr=2.38e-06, throughput=2729 tok/s +2025-11-18 22:56:35,070 - INFO - Epoch 1 Step 170 (Global: 170): loss=0.3369, ppl=1.40, grad_norm=7.41, lr=2.47e-06, throughput=2689 tok/s +2025-11-18 22:59:31,336 - INFO - Epoch 1 Step 180 (Global: 180): loss=0.3959, ppl=1.49, grad_norm=7.91, lr=2.56e-06, throughput=2723 tok/s +2025-11-18 23:02:29,381 - INFO - Epoch 1 Step 
190 (Global: 190): loss=0.4015, ppl=1.49, grad_norm=7.28, lr=2.64e-06, throughput=2696 tok/s +2025-11-18 23:05:24,293 - INFO - Epoch 1 Step 200 (Global: 200): loss=0.4286, ppl=1.54, grad_norm=7.12, lr=2.73e-06, throughput=2744 tok/s +2025-11-18 23:08:19,744 - INFO - Epoch 1 Step 210 (Global: 210): loss=0.3792, ppl=1.46, grad_norm=9.56, lr=2.82e-06, throughput=2736 tok/s +2025-11-18 23:11:19,778 - INFO - Epoch 1 Step 220 (Global: 220): loss=0.3509, ppl=1.42, grad_norm=5.56, lr=2.90e-06, throughput=2666 tok/s +2025-11-18 23:14:14,273 - INFO - Epoch 1 Step 230 (Global: 230): loss=0.3500, ppl=1.42, grad_norm=7.72, lr=2.99e-06, throughput=2751 tok/s +2025-11-18 23:17:03,981 - INFO - Epoch 1 Step 240 (Global: 240): loss=0.3363, ppl=1.40, grad_norm=5.69, lr=3.07e-06, throughput=2828 tok/s +2025-11-18 23:19:54,105 - INFO - Epoch 1 Step 250 (Global: 250): loss=0.3040, ppl=1.36, grad_norm=7.28, lr=3.16e-06, throughput=2821 tok/s +2025-11-18 23:22:42,877 - INFO - Epoch 1 Step 260 (Global: 260): loss=0.3367, ppl=1.40, grad_norm=6.00, lr=3.25e-06, throughput=2844 tok/s +2025-11-18 23:25:32,457 - INFO - Epoch 1 Step 270 (Global: 270): loss=0.3157, ppl=1.37, grad_norm=5.47, lr=3.33e-06, throughput=2831 tok/s +2025-11-18 23:28:22,818 - INFO - Epoch 1 Step 280 (Global: 280): loss=0.3380, ppl=1.40, grad_norm=6.47, lr=3.42e-06, throughput=2818 tok/s +2025-11-18 23:31:12,884 - INFO - Epoch 1 Step 290 (Global: 290): loss=0.3536, ppl=1.42, grad_norm=6.47, lr=3.51e-06, throughput=2822 tok/s +2025-11-18 23:34:09,621 - INFO - Epoch 1 Step 300 (Global: 300): loss=0.3451, ppl=1.41, grad_norm=8.56, lr=3.59e-06, throughput=2716 tok/s +2025-11-18 23:37:15,267 - INFO - Epoch 1 Step 310 (Global: 310): loss=0.3132, ppl=1.37, grad_norm=7.41, lr=3.68e-06, throughput=2586 tok/s +2025-11-18 23:40:09,041 - INFO - Epoch 1 Step 320 (Global: 320): loss=0.3167, ppl=1.37, grad_norm=5.69, lr=3.77e-06, throughput=2762 tok/s +2025-11-18 23:42:57,981 - INFO - Epoch 1 Step 330 (Global: 330): loss=0.3268, 
ppl=1.39, grad_norm=6.72, lr=3.85e-06, throughput=2841 tok/s +2025-11-18 23:45:45,951 - INFO - Epoch 1 Step 340 (Global: 340): loss=0.3578, ppl=1.43, grad_norm=9.19, lr=3.94e-06, throughput=2858 tok/s +2025-11-18 23:48:34,044 - INFO - Epoch 1 Step 350 (Global: 350): loss=0.3080, ppl=1.36, grad_norm=6.72, lr=4.03e-06, throughput=2856 tok/s +2025-11-18 23:51:21,028 - INFO - Epoch 1 Step 360 (Global: 360): loss=0.3180, ppl=1.37, grad_norm=5.94, lr=4.11e-06, throughput=2875 tok/s +2025-11-18 23:54:09,399 - INFO - Epoch 1 Step 370 (Global: 370): loss=0.3211, ppl=1.38, grad_norm=6.81, lr=4.20e-06, throughput=2851 tok/s +2025-11-18 23:56:57,007 - INFO - Epoch 1 Step 380 (Global: 380): loss=0.3210, ppl=1.38, grad_norm=6.09, lr=4.29e-06, throughput=2864 tok/s +2025-11-18 23:59:44,844 - INFO - Epoch 1 Step 390 (Global: 390): loss=0.2967, ppl=1.35, grad_norm=6.81, lr=4.37e-06, throughput=2860 tok/s +2025-11-19 00:02:33,507 - INFO - Epoch 1 Step 400 (Global: 400): loss=0.2875, ppl=1.33, grad_norm=8.62, lr=4.46e-06, throughput=2846 tok/s +2025-11-19 00:05:22,959 - INFO - Epoch 1 Step 410 (Global: 410): loss=0.3211, ppl=1.38, grad_norm=5.31, lr=4.54e-06, throughput=2833 tok/s +2025-11-19 00:08:11,069 - INFO - Epoch 1 Step 420 (Global: 420): loss=0.3299, ppl=1.39, grad_norm=4.94, lr=4.63e-06, throughput=2855 tok/s +2025-11-19 00:11:01,897 - INFO - Epoch 1 Step 430 (Global: 430): loss=0.3459, ppl=1.41, grad_norm=8.31, lr=4.72e-06, throughput=2810 tok/s +2025-11-19 00:13:49,639 - INFO - Epoch 1 Step 440 (Global: 440): loss=0.3007, ppl=1.35, grad_norm=8.44, lr=4.80e-06, throughput=2862 tok/s +2025-11-19 00:16:36,703 - INFO - Epoch 1 Step 450 (Global: 450): loss=0.2797, ppl=1.32, grad_norm=6.00, lr=4.89e-06, throughput=2873 tok/s +2025-11-19 00:19:33,822 - INFO - Epoch 1 Step 460 (Global: 460): loss=0.2799, ppl=1.32, grad_norm=7.09, lr=4.98e-06, throughput=2710 tok/s +2025-11-19 00:22:21,436 - INFO - Epoch 1 Step 470 (Global: 470): loss=0.3711, ppl=1.45, grad_norm=9.38, lr=5.06e-06, 
throughput=2864 tok/s +2025-11-19 00:25:10,111 - INFO - Epoch 1 Step 480 (Global: 480): loss=0.3132, ppl=1.37, grad_norm=6.19, lr=5.15e-06, throughput=2846 tok/s +2025-11-19 00:27:59,148 - INFO - Epoch 1 Step 490 (Global: 490): loss=0.2771, ppl=1.32, grad_norm=6.56, lr=5.24e-06, throughput=2840 tok/s +2025-11-19 00:30:49,601 - INFO - Epoch 1 Step 500 (Global: 500): loss=0.3272, ppl=1.39, grad_norm=5.72, lr=5.32e-06, throughput=2816 tok/s +2025-11-19 00:30:49,601 - INFO - +Running validation at step 500... +2025-11-19 00:40:37,494 - INFO - Validation loss: 0.3008, perplexity: 1.35 +2025-11-19 00:40:37,495 - INFO - Qualitative metrics (n=5): +2025-11-19 00:40:37,495 - INFO - BLEU: 0.7318 +2025-11-19 00:40:37,495 - INFO - METEOR: 0.8309 +2025-11-19 00:40:37,495 - INFO - Edit Distance: 0.2378 +2025-11-19 00:40:37,495 - INFO - F-measure: 0.8412 +2025-11-19 00:40:37,495 - INFO - +====================================================================== +2025-11-19 00:40:37,495 - INFO - Qualitative Evaluation Samples: +2025-11-19 00:40:37,496 - INFO - ====================================================================== +2025-11-19 00:40:37,496 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-19 00:40:37,496 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-19 00:40:37,496 - INFO - Generated: ' Q gave it four stars out of five and said that "Perhaps the album\'s seemingly illogical sequencing songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s not t...' +2025-11-19 00:40:37,496 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-19 00:40:37,496 - INFO - ---------------------------------------------------------------------- +2025-11-19 00:40:37,496 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-19 00:40:37,496 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-19 00:40:37,496 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the women president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 00:40:37,497 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 00:40:37,497 - INFO - ---------------------------------------------------------------------- +2025-11-19 00:40:37,497 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-19 00:40:37,497 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-19 00:40:37,497 - INFO - Generated: ' at the meeting Laymaia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beet stops the ax and ...' +2025-11-19 00:40:37,497 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-19 00:40:37,497 - INFO - ---------------------------------------------------------------------- +2025-11-19 00:40:37,497 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-19 00:40:37,497 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." 
+2025-11-19 00:40:37,497 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 00:40:37,497 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 00:40:37,498 - INFO - ---------------------------------------------------------------------- +2025-11-19 00:40:37,498 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-19 00:40:37,498 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-19 00:40:37,498 - INFO - Generated: ' | | The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stygian Abyss and Labyrinth of Worlds | June 2, 2011 | DOS | Blue Sky Productions / L...' +2025-11-19 00:40:37,498 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 
+2025-11-19 00:40:37,498 - INFO - ---------------------------------------------------------------------- +2025-11-19 00:40:37,499 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_500.jsonl +2025-11-19 00:41:14,831 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-19 00:41:14,847 - INFO - New best validation loss: 0.3008, perplexity: 1.35 +2025-11-19 00:44:02,723 - INFO - Epoch 1 Step 510 (Global: 510): loss=0.3269, ppl=1.39, grad_norm=7.00, lr=5.41e-06, throughput=2859 tok/s +2025-11-19 00:46:52,844 - INFO - Epoch 1 Step 520 (Global: 520): loss=0.2981, ppl=1.35, grad_norm=5.41, lr=5.50e-06, throughput=2822 tok/s +2025-11-19 00:49:41,803 - INFO - Epoch 1 Step 530 (Global: 530): loss=0.3001, ppl=1.35, grad_norm=5.69, lr=5.58e-06, throughput=2841 tok/s +2025-11-19 00:52:34,153 - INFO - Epoch 1 Step 540 (Global: 540): loss=0.2907, ppl=1.34, grad_norm=5.09, lr=5.67e-06, throughput=2785 tok/s +2025-11-19 00:55:24,471 - INFO - Epoch 1 Step 550 (Global: 550): loss=0.3149, ppl=1.37, grad_norm=6.56, lr=5.76e-06, throughput=2818 tok/s +2025-11-19 00:58:15,308 - INFO - Epoch 1 Step 560 (Global: 560): loss=0.2529, ppl=1.29, grad_norm=5.12, lr=5.84e-06, throughput=2810 tok/s +2025-11-19 01:01:05,328 - INFO - Epoch 1 Step 570 (Global: 570): loss=0.3121, ppl=1.37, grad_norm=8.38, lr=5.93e-06, throughput=2823 tok/s +2025-11-19 01:03:56,423 - INFO - Epoch 1 Step 580 (Global: 580): loss=0.2624, ppl=1.30, grad_norm=5.38, lr=6.01e-06, throughput=2806 tok/s +2025-11-19 01:06:46,726 - INFO - Epoch 1 Step 590 (Global: 590): loss=0.2855, ppl=1.33, grad_norm=6.00, lr=6.10e-06, throughput=2819 tok/s +2025-11-19 01:09:47,056 - INFO - Epoch 1 Step 600 (Global: 600): loss=0.2759, ppl=1.32, grad_norm=7.03, lr=6.19e-06, throughput=2662 tok/s +2025-11-19 01:12:36,697 - INFO - Epoch 1 Step 610 (Global: 610): loss=0.2765, ppl=1.32, grad_norm=8.25, lr=6.27e-06, 
throughput=2830 tok/s +2025-11-19 01:15:25,403 - INFO - Epoch 1 Step 620 (Global: 620): loss=0.2990, ppl=1.35, grad_norm=7.06, lr=6.36e-06, throughput=2845 tok/s +2025-11-19 01:18:12,911 - INFO - Epoch 1 Step 630 (Global: 630): loss=0.3108, ppl=1.36, grad_norm=5.88, lr=6.45e-06, throughput=2866 tok/s +2025-11-19 01:21:01,242 - INFO - Epoch 1 Step 640 (Global: 640): loss=0.3032, ppl=1.35, grad_norm=4.88, lr=6.53e-06, throughput=2852 tok/s +2025-11-19 01:23:51,677 - INFO - Epoch 1 Step 650 (Global: 650): loss=0.2966, ppl=1.35, grad_norm=7.69, lr=6.62e-06, throughput=2816 tok/s +2025-11-19 01:26:42,894 - INFO - Epoch 1 Step 660 (Global: 660): loss=0.2566, ppl=1.29, grad_norm=4.34, lr=6.71e-06, throughput=2803 tok/s +2025-11-19 01:29:30,718 - INFO - Epoch 1 Step 670 (Global: 670): loss=0.2913, ppl=1.34, grad_norm=7.53, lr=6.79e-06, throughput=2860 tok/s +2025-11-19 01:32:27,958 - INFO - Epoch 1 Step 680 (Global: 680): loss=0.3132, ppl=1.37, grad_norm=4.94, lr=6.88e-06, throughput=2708 tok/s +2025-11-19 01:35:17,373 - INFO - Epoch 1 Step 690 (Global: 690): loss=0.2729, ppl=1.31, grad_norm=4.88, lr=6.97e-06, throughput=2833 tok/s +2025-11-19 01:38:05,406 - INFO - Epoch 1 Step 700 (Global: 700): loss=0.3139, ppl=1.37, grad_norm=7.59, lr=7.05e-06, throughput=2857 tok/s +2025-11-19 01:40:53,593 - INFO - Epoch 1 Step 710 (Global: 710): loss=0.2841, ppl=1.33, grad_norm=5.94, lr=7.14e-06, throughput=2854 tok/s +2025-11-19 01:43:41,710 - INFO - Epoch 1 Step 720 (Global: 720): loss=0.3153, ppl=1.37, grad_norm=6.12, lr=7.22e-06, throughput=2855 tok/s +2025-11-19 01:46:29,933 - INFO - Epoch 1 Step 730 (Global: 730): loss=0.3197, ppl=1.38, grad_norm=5.56, lr=7.31e-06, throughput=2853 tok/s +2025-11-19 01:49:18,837 - INFO - Epoch 1 Step 740 (Global: 740): loss=0.2919, ppl=1.34, grad_norm=5.91, lr=7.40e-06, throughput=2842 tok/s +2025-11-19 01:52:06,942 - INFO - Epoch 1 Step 750 (Global: 750): loss=0.2401, ppl=1.27, grad_norm=4.78, lr=7.48e-06, throughput=2855 tok/s +2025-11-19 
01:54:54,556 - INFO - Epoch 1 Step 760 (Global: 760): loss=0.3049, ppl=1.36, grad_norm=8.81, lr=7.57e-06, throughput=2864 tok/s +2025-11-19 01:57:42,164 - INFO - Epoch 1 Step 770 (Global: 770): loss=0.2898, ppl=1.34, grad_norm=6.31, lr=7.66e-06, throughput=2864 tok/s +2025-11-19 02:00:30,044 - INFO - Epoch 1 Step 780 (Global: 780): loss=0.2717, ppl=1.31, grad_norm=4.78, lr=7.74e-06, throughput=2859 tok/s +2025-11-19 02:03:29,864 - INFO - Epoch 1 Step 790 (Global: 790): loss=0.3003, ppl=1.35, grad_norm=5.53, lr=7.83e-06, throughput=2669 tok/s +2025-11-19 02:06:24,582 - INFO - Epoch 1 Step 800 (Global: 800): loss=0.2726, ppl=1.31, grad_norm=7.78, lr=7.92e-06, throughput=2747 tok/s +2025-11-19 02:09:15,713 - INFO - Epoch 1 Step 810 (Global: 810): loss=0.2793, ppl=1.32, grad_norm=5.78, lr=8.00e-06, throughput=2805 tok/s +2025-11-19 02:12:06,933 - INFO - Epoch 1 Step 820 (Global: 820): loss=0.2705, ppl=1.31, grad_norm=5.69, lr=8.09e-06, throughput=2803 tok/s +2025-11-19 02:14:55,215 - INFO - Epoch 1 Step 830 (Global: 830): loss=0.2797, ppl=1.32, grad_norm=6.00, lr=8.18e-06, throughput=2852 tok/s +2025-11-19 02:17:43,285 - INFO - Epoch 1 Step 840 (Global: 840): loss=0.4875, ppl=1.63, grad_norm=15.62, lr=8.26e-06, throughput=2856 tok/s +2025-11-19 02:20:36,536 - INFO - Epoch 1 Step 850 (Global: 850): loss=0.3820, ppl=1.47, grad_norm=6.78, lr=8.35e-06, throughput=2771 tok/s +2025-11-19 02:23:31,249 - INFO - Epoch 1 Step 860 (Global: 860): loss=0.3222, ppl=1.38, grad_norm=5.16, lr=8.44e-06, throughput=2747 tok/s +2025-11-19 02:26:23,802 - INFO - Epoch 1 Step 870 (Global: 870): loss=0.2418, ppl=1.27, grad_norm=4.62, lr=8.52e-06, throughput=2782 tok/s +2025-11-19 02:29:23,820 - INFO - Epoch 1 Step 880 (Global: 880): loss=0.2997, ppl=1.35, grad_norm=4.50, lr=8.61e-06, throughput=2666 tok/s +2025-11-19 02:32:12,479 - INFO - Epoch 1 Step 890 (Global: 890): loss=0.2914, ppl=1.34, grad_norm=7.53, lr=8.69e-06, throughput=2846 tok/s +2025-11-19 02:35:02,284 - INFO - Epoch 1 Step 900 
(Global: 900): loss=0.2557, ppl=1.29, grad_norm=5.47, lr=8.78e-06, throughput=2827 tok/s +2025-11-19 02:37:52,021 - INFO - Epoch 1 Step 910 (Global: 910): loss=0.2940, ppl=1.34, grad_norm=5.41, lr=8.87e-06, throughput=2828 tok/s +2025-11-19 02:40:41,117 - INFO - Epoch 1 Step 920 (Global: 920): loss=0.2685, ppl=1.31, grad_norm=4.50, lr=8.95e-06, throughput=2839 tok/s +2025-11-19 02:43:34,847 - INFO - Epoch 1 Step 930 (Global: 930): loss=0.2778, ppl=1.32, grad_norm=4.59, lr=9.04e-06, throughput=2763 tok/s +2025-11-19 02:46:26,162 - INFO - Epoch 1 Step 940 (Global: 940): loss=0.2981, ppl=1.35, grad_norm=5.34, lr=9.13e-06, throughput=2802 tok/s +2025-11-19 02:49:17,556 - INFO - Epoch 1 Step 950 (Global: 950): loss=0.2779, ppl=1.32, grad_norm=5.00, lr=9.21e-06, throughput=2801 tok/s +2025-11-19 02:52:08,925 - INFO - Epoch 1 Step 960 (Global: 960): loss=0.2835, ppl=1.33, grad_norm=6.19, lr=9.30e-06, throughput=2801 tok/s +2025-11-19 02:55:01,256 - INFO - Epoch 1 Step 970 (Global: 970): loss=0.2984, ppl=1.35, grad_norm=8.56, lr=9.39e-06, throughput=2785 tok/s +2025-11-19 02:57:53,379 - INFO - Epoch 1 Step 980 (Global: 980): loss=0.2848, ppl=1.33, grad_norm=5.97, lr=9.47e-06, throughput=2789 tok/s +2025-11-19 03:00:44,419 - INFO - Epoch 1 Step 990 (Global: 990): loss=0.2959, ppl=1.34, grad_norm=4.56, lr=9.56e-06, throughput=2806 tok/s +2025-11-19 03:03:42,925 - INFO - Epoch 1 Step 1000 (Global: 1000): loss=0.2878, ppl=1.33, grad_norm=6.47, lr=9.65e-06, throughput=2689 tok/s +2025-11-19 03:03:42,925 - INFO - +Running validation at step 1000... 
+2025-11-19 03:13:37,960 - INFO - Validation loss: 0.2869, perplexity: 1.33 +2025-11-19 03:13:37,961 - INFO - Qualitative metrics (n=5): +2025-11-19 03:13:37,961 - INFO - BLEU: 0.7226 +2025-11-19 03:13:37,961 - INFO - METEOR: 0.8156 +2025-11-19 03:13:37,961 - INFO - Edit Distance: 0.2368 +2025-11-19 03:13:37,961 - INFO - F-measure: 0.8249 +2025-11-19 03:13:37,961 - INFO - +====================================================================== +2025-11-19 03:13:37,962 - INFO - Qualitative Evaluation Samples: +2025-11-19 03:13:37,962 - INFO - ====================================================================== +2025-11-19 03:13:37,962 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-19 03:13:37,962 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-19 03:13:37,962 - INFO - Generated: ' Q gave it four stars out of five and said that "Perhaps the [album\'s] seemingly illegal goodness of songs makes sense if they wish to lure their audience into thinking it is yay-source. But it\'s not ...' +2025-11-19 03:13:37,962 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-19 03:13:37,962 - INFO - ---------------------------------------------------------------------- +2025-11-19 03:13:37,962 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-19 03:13:37,962 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-19 03:13:37,962 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Muslim Student Assembly; the leader of Army ROTC;...' +2025-11-19 03:13:37,962 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 03:13:37,963 - INFO - ---------------------------------------------------------------------- +2025-11-19 03:13:37,963 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-19 03:13:37,963 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-19 03:13:37,963 - INFO - Generated: ' at the meeting Laymia healed. His weapon of choice is a giant ray, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beet stops the ax and ...' +2025-11-19 03:13:37,963 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-19 03:13:37,963 - INFO - ---------------------------------------------------------------------- +2025-11-19 03:13:37,963 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-19 03:13:37,963 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-19 03:13:37,963 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 03:13:37,963 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 03:13:37,964 - INFO - ---------------------------------------------------------------------- +2025-11-19 03:13:37,964 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-19 03:13:37,964 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." 
+2025-11-19 03:13:37,964 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stylian Abyss and Labyrinth of Worlds | June 2, 2011 | DOS | Blue Sky Productions ...' +2025-11-19 03:13:37,964 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 03:13:37,964 - INFO - ---------------------------------------------------------------------- +2025-11-19 03:13:37,965 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_1000.jsonl +2025-11-19 03:14:30,165 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-19 03:14:30,184 - INFO - New best validation loss: 0.2869, perplexity: 1.33 +2025-11-19 03:17:24,835 - INFO - Epoch 1 Step 1010 (Global: 1010): loss=0.3248, ppl=1.38, grad_norm=6.84, lr=9.73e-06, throughput=2749 tok/s +2025-11-19 03:20:13,926 - INFO - Epoch 1 Step 1020 (Global: 1020): loss=0.3387, ppl=1.40, grad_norm=6.19, lr=9.82e-06, throughput=2839 tok/s +2025-11-19 03:23:04,099 - INFO - Epoch 1 Step 1030 (Global: 1030): loss=0.2856, ppl=1.33, grad_norm=9.94, lr=9.90e-06, throughput=2821 tok/s +2025-11-19 03:25:53,835 - INFO - Epoch 1 Step 1040 (Global: 1040): loss=0.2608, ppl=1.30, grad_norm=6.25, lr=9.99e-06, throughput=2828 tok/s +2025-11-19 03:28:43,751 - INFO - Epoch 1 Step 1050 (Global: 1050): loss=0.2562, ppl=1.29, grad_norm=6.41, lr=1.00e-05, throughput=2825 tok/s +2025-11-19 03:31:31,671 - INFO - Epoch 1 Step 1060 (Global: 1060): loss=0.2644, ppl=1.30, grad_norm=4.84, lr=1.00e-05, throughput=2859 tok/s +2025-11-19 03:34:22,074 - INFO - Epoch 1 Step 1070 (Global: 1070): loss=0.2830, ppl=1.33, grad_norm=4.12, lr=1.00e-05, throughput=2817 tok/s +2025-11-19 03:37:12,486 - INFO - Epoch 1 Step 1080 (Global: 1080): loss=0.3081, ppl=1.36, grad_norm=5.47, lr=1.00e-05, throughput=2817 tok/s 
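The pattern visible at steps 500 and 1000 ("Saved checkpoint ... New best validation loss") is a best-loss gate: a checkpoint is written only when validation loss improves on the best seen so far. A minimal sketch under assumed names; the real trainer would persist model/optimizer state (e.g. via torch.save) where this returns True:

```python
import math

class BestCheckpointGate:
    """Track the best validation loss; signal when a new best arrives."""

    def __init__(self):
        self.best_loss = math.inf

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            return True   # caller saves best_checkpoint.pt here
        return False

# Validation losses from this log: 0.5682 (initial), 0.3008, 0.2869
gate = BestCheckpointGate()
decisions = [gate.step(loss) for loss in (0.5682, 0.3008, 0.2869)]
```

Each loss in the sequence improves on the last, so every call reports a new best, consistent with the two "New best validation loss" lines (the initial validation precedes any training, so no checkpoint is written for it in this run).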
+2025-11-19 03:40:10,438 - INFO - Epoch 1 Step 1090 (Global: 1090): loss=0.2853, ppl=1.33, grad_norm=7.06, lr=1.00e-05, throughput=2697 tok/s +2025-11-19 03:42:58,693 - INFO - Epoch 1 Step 1100 (Global: 1100): loss=0.3044, ppl=1.36, grad_norm=6.34, lr=1.00e-05, throughput=2853 tok/s +2025-11-19 03:45:48,482 - INFO - Epoch 1 Step 1110 (Global: 1110): loss=0.2495, ppl=1.28, grad_norm=5.62, lr=1.00e-05, throughput=2827 tok/s +2025-11-19 03:48:39,687 - INFO - Epoch 1 Step 1120 (Global: 1120): loss=0.3001, ppl=1.35, grad_norm=7.84, lr=1.00e-05, throughput=2804 tok/s +2025-11-19 03:51:30,579 - INFO - Epoch 1 Step 1130 (Global: 1130): loss=0.2778, ppl=1.32, grad_norm=5.41, lr=1.00e-05, throughput=2809 tok/s +2025-11-19 03:54:20,398 - INFO - Epoch 1 Step 1140 (Global: 1140): loss=0.3027, ppl=1.35, grad_norm=5.56, lr=1.00e-05, throughput=2827 tok/s +2025-11-19 03:57:10,181 - INFO - Epoch 1 Step 1150 (Global: 1150): loss=0.2728, ppl=1.31, grad_norm=7.25, lr=1.00e-05, throughput=2827 tok/s +2025-11-19 03:59:59,417 - INFO - Epoch 1 Step 1160 (Global: 1160): loss=0.3053, ppl=1.36, grad_norm=5.78, lr=1.00e-05, throughput=2836 tok/s +2025-11-19 04:03:00,031 - INFO - Epoch 1 Step 1170 (Global: 1170): loss=0.2539, ppl=1.29, grad_norm=6.06, lr=1.00e-05, throughput=2658 tok/s +2025-11-19 04:05:50,076 - INFO - Epoch 1 Step 1180 (Global: 1180): loss=0.2643, ppl=1.30, grad_norm=6.78, lr=9.99e-06, throughput=2823 tok/s +2025-11-19 04:08:41,287 - INFO - Epoch 1 Step 1190 (Global: 1190): loss=0.2871, ppl=1.33, grad_norm=6.38, lr=9.99e-06, throughput=2804 tok/s +2025-11-19 04:11:32,011 - INFO - Epoch 1 Step 1200 (Global: 1200): loss=0.2408, ppl=1.27, grad_norm=4.75, lr=9.99e-06, throughput=2812 tok/s +2025-11-19 04:14:22,478 - INFO - Epoch 1 Step 1210 (Global: 1210): loss=0.2236, ppl=1.25, grad_norm=5.31, lr=9.99e-06, throughput=2816 tok/s +2025-11-19 04:17:11,636 - INFO - Epoch 1 Step 1220 (Global: 1220): loss=0.2877, ppl=1.33, grad_norm=6.19, lr=9.99e-06, throughput=2838 tok/s +2025-11-19 
04:20:01,745 - INFO - Epoch 1 Step 1230 (Global: 1230): loss=0.2489, ppl=1.28, grad_norm=4.38, lr=9.99e-06, throughput=2822 tok/s +2025-11-19 04:22:53,198 - INFO - Epoch 1 Step 1240 (Global: 1240): loss=0.2464, ppl=1.28, grad_norm=4.56, lr=9.99e-06, throughput=2800 tok/s +2025-11-19 04:25:43,667 - INFO - Epoch 1 Step 1250 (Global: 1250): loss=0.2408, ppl=1.27, grad_norm=3.61, lr=9.99e-06, throughput=2816 tok/s +2025-11-19 04:28:34,273 - INFO - Epoch 1 Step 1260 (Global: 1260): loss=0.2772, ppl=1.32, grad_norm=6.12, lr=9.99e-06, throughput=2814 tok/s +2025-11-19 04:31:24,547 - INFO - Epoch 1 Step 1270 (Global: 1270): loss=0.2496, ppl=1.28, grad_norm=6.62, lr=9.99e-06, throughput=2819 tok/s +2025-11-19 04:34:16,437 - INFO - Epoch 1 Step 1280 (Global: 1280): loss=0.2498, ppl=1.28, grad_norm=4.66, lr=9.98e-06, throughput=2793 tok/s +2025-11-19 04:37:08,706 - INFO - Epoch 1 Step 1290 (Global: 1290): loss=0.2262, ppl=1.25, grad_norm=4.09, lr=9.98e-06, throughput=2786 tok/s +2025-11-19 04:40:01,128 - INFO - Epoch 1 Step 1300 (Global: 1300): loss=0.2357, ppl=1.27, grad_norm=4.62, lr=9.98e-06, throughput=2784 tok/s +2025-11-19 04:42:57,021 - INFO - Epoch 1 Step 1310 (Global: 1310): loss=0.2292, ppl=1.26, grad_norm=4.06, lr=9.98e-06, throughput=2729 tok/s +2025-11-19 04:45:48,540 - INFO - Epoch 1 Step 1320 (Global: 1320): loss=0.2674, ppl=1.31, grad_norm=5.31, lr=9.98e-06, throughput=2799 tok/s +2025-11-19 04:48:36,892 - INFO - Epoch 1 Step 1330 (Global: 1330): loss=0.2664, ppl=1.31, grad_norm=4.00, lr=9.98e-06, throughput=2851 tok/s +2025-11-19 04:51:24,377 - INFO - Epoch 1 Step 1340 (Global: 1340): loss=0.2336, ppl=1.26, grad_norm=4.56, lr=9.97e-06, throughput=2866 tok/s +2025-11-19 04:54:21,885 - INFO - Epoch 1 Step 1350 (Global: 1350): loss=0.2627, ppl=1.30, grad_norm=7.22, lr=9.97e-06, throughput=2704 tok/s +2025-11-19 04:57:09,664 - INFO - Epoch 1 Step 1360 (Global: 1360): loss=0.2789, ppl=1.32, grad_norm=5.25, lr=9.97e-06, throughput=2861 tok/s +2025-11-19 
04:59:57,329 - INFO - Epoch 1 Step 1370 (Global: 1370): loss=0.2515, ppl=1.29, grad_norm=4.81, lr=9.97e-06, throughput=2863 tok/s +2025-11-19 05:02:44,066 - INFO - Epoch 1 Step 1380 (Global: 1380): loss=0.2415, ppl=1.27, grad_norm=7.25, lr=9.97e-06, throughput=2879 tok/s +2025-11-19 05:05:30,898 - INFO - Epoch 1 Step 1390 (Global: 1390): loss=0.2471, ppl=1.28, grad_norm=4.19, lr=9.97e-06, throughput=2877 tok/s +2025-11-19 05:08:21,238 - INFO - Epoch 1 Step 1400 (Global: 1400): loss=0.2927, ppl=1.34, grad_norm=9.12, lr=9.96e-06, throughput=2818 tok/s +2025-11-19 05:11:11,392 - INFO - Epoch 1 Step 1410 (Global: 1410): loss=0.2472, ppl=1.28, grad_norm=6.81, lr=9.96e-06, throughput=2821 tok/s +2025-11-19 05:14:00,876 - INFO - Epoch 1 Step 1420 (Global: 1420): loss=0.2850, ppl=1.33, grad_norm=5.66, lr=9.96e-06, throughput=2832 tok/s +2025-11-19 05:16:48,881 - INFO - Epoch 1 Step 1430 (Global: 1430): loss=0.2401, ppl=1.27, grad_norm=6.12, lr=9.96e-06, throughput=2857 tok/s +2025-11-19 05:19:37,130 - INFO - Epoch 1 Step 1440 (Global: 1440): loss=0.2649, ppl=1.30, grad_norm=6.34, lr=9.96e-06, throughput=2853 tok/s +2025-11-19 05:22:25,313 - INFO - Epoch 1 Step 1450 (Global: 1450): loss=0.2640, ppl=1.30, grad_norm=4.72, lr=9.95e-06, throughput=2854 tok/s +2025-11-19 05:25:13,667 - INFO - Epoch 1 Step 1460 (Global: 1460): loss=0.2605, ppl=1.30, grad_norm=6.06, lr=9.95e-06, throughput=2851 tok/s +2025-11-19 05:28:02,951 - INFO - Epoch 1 Step 1470 (Global: 1470): loss=0.2523, ppl=1.29, grad_norm=4.22, lr=9.95e-06, throughput=2836 tok/s +2025-11-19 05:30:51,320 - INFO - Epoch 1 Step 1480 (Global: 1480): loss=0.2503, ppl=1.28, grad_norm=6.41, lr=9.95e-06, throughput=2851 tok/s +2025-11-19 05:33:48,022 - INFO - Epoch 1 Step 1490 (Global: 1490): loss=0.2421, ppl=1.27, grad_norm=4.53, lr=9.94e-06, throughput=2716 tok/s +2025-11-19 05:36:35,576 - INFO - Epoch 1 Step 1500 (Global: 1500): loss=0.2791, ppl=1.32, grad_norm=5.06, lr=9.94e-06, throughput=2865 tok/s +2025-11-19 
05:36:35,577 - INFO - +Running validation at step 1500... +2025-11-19 05:46:37,075 - INFO - Validation loss: 0.2520, perplexity: 1.29 +2025-11-19 05:46:37,076 - INFO - Qualitative metrics (n=5): +2025-11-19 05:46:37,076 - INFO - BLEU: 0.5810 +2025-11-19 05:46:37,076 - INFO - METEOR: 0.6987 +2025-11-19 05:46:37,076 - INFO - Edit Distance: 0.3056 +2025-11-19 05:46:37,076 - INFO - F-measure: 0.7285 +2025-11-19 05:46:37,076 - INFO - +====================================================================== +2025-11-19 05:46:37,076 - INFO - Qualitative Evaluation Samples: +2025-11-19 05:46:37,077 - INFO - ====================================================================== +2025-11-19 05:46:37,077 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-19 05:46:37,077 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-19 05:46:37,077 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps the album\'s seemingly illegal consequence of songs makes sense if they wish to lure their audience into thinking it\'s a six-pourer. But it\'s not...' +2025-11-19 05:46:37,077 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-19 05:46:37,077 - INFO - ---------------------------------------------------------------------- +2025-11-19 05:46:37,077 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-19 05:46:37,078 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-19 05:46:37,078 - INFO - Generated: ', was Sirène Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 
+2025-11-19 05:46:37,078 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 05:46:37,078 - INFO - ---------------------------------------------------------------------- +2025-11-19 05:46:37,078 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-19 05:46:37,078 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-19 05:46:37,078 - INFO - Generated: ' at the meeting Layina headed. His weapon of choice is a giant axe, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Steel steps the ax and...' +2025-11-19 05:46:37,078 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-19 05:46:37,078 - INFO - ---------------------------------------------------------------------- +2025-11-19 05:46:37,078 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-19 05:46:37,079 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-19 05:46:37,079 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 05:46:37,079 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-19 05:46:37,079 - INFO - ---------------------------------------------------------------------- +2025-11-19 05:46:37,080 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-19 05:46:37,080 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-19 05:46:37,080 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 05:46:37,080 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 05:46:37,080 - INFO - ---------------------------------------------------------------------- +2025-11-19 05:46:37,081 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_1500.jsonl +2025-11-19 05:47:27,851 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-19 05:47:27,867 - INFO - New best validation loss: 0.2520, perplexity: 1.29 +2025-11-19 05:50:14,398 - INFO - Epoch 1 Step 1510 (Global: 1510): loss=0.2463, ppl=1.28, grad_norm=4.97, lr=9.94e-06, throughput=2883 tok/s +2025-11-19 05:53:00,349 - INFO - Epoch 1 Step 1520 (Global: 1520): loss=0.2700, ppl=1.31, grad_norm=6.88, lr=9.94e-06, throughput=2892 tok/s +2025-11-19 05:55:45,771 - INFO - Epoch 1 Step 1530 (Global: 1530): loss=0.2293, ppl=1.26, grad_norm=4.38, lr=9.93e-06, throughput=2902 tok/s +2025-11-19 05:58:32,764 - INFO - Epoch 1 Step 1540 (Global: 1540): loss=0.2037, ppl=1.23, grad_norm=5.44, lr=9.93e-06, throughput=2874 tok/s +2025-11-19 06:01:20,509 - INFO - Epoch 1 Step 1550 (Global: 1550): loss=0.2072, ppl=1.23, grad_norm=4.09, lr=9.93e-06, throughput=2862 tok/s +2025-11-19 06:04:08,054 - INFO - Epoch 1 Step 1560 (Global: 1560): loss=0.2644, ppl=1.30, grad_norm=7.25, lr=9.92e-06, throughput=2865 tok/s +2025-11-19 06:06:55,411 - INFO - Epoch 1 Step 1570 (Global: 1570): loss=0.2415, ppl=1.27, grad_norm=4.12, lr=9.92e-06, throughput=2868 
tok/s +2025-11-19 06:09:42,944 - INFO - Epoch 1 Step 1580 (Global: 1580): loss=0.2407, ppl=1.27, grad_norm=4.44, lr=9.92e-06, throughput=2865 tok/s +2025-11-19 06:12:30,897 - INFO - Epoch 1 Step 1590 (Global: 1590): loss=0.2473, ppl=1.28, grad_norm=6.09, lr=9.92e-06, throughput=2858 tok/s +2025-11-19 06:15:18,636 - INFO - Epoch 1 Step 1600 (Global: 1600): loss=0.2253, ppl=1.25, grad_norm=5.00, lr=9.91e-06, throughput=2862 tok/s +2025-11-19 06:18:05,982 - INFO - Epoch 1 Step 1610 (Global: 1610): loss=0.2567, ppl=1.29, grad_norm=5.34, lr=9.91e-06, throughput=2868 tok/s +2025-11-19 06:21:02,688 - INFO - Epoch 1 Step 1620 (Global: 1620): loss=0.2268, ppl=1.25, grad_norm=3.95, lr=9.91e-06, throughput=2716 tok/s +2025-11-19 06:23:49,402 - INFO - Epoch 1 Step 1630 (Global: 1630): loss=0.2769, ppl=1.32, grad_norm=6.09, lr=9.90e-06, throughput=2879 tok/s +2025-11-19 06:26:36,722 - INFO - Epoch 1 Step 1640 (Global: 1640): loss=0.2127, ppl=1.24, grad_norm=4.72, lr=9.90e-06, throughput=2869 tok/s +2025-11-19 06:29:24,296 - INFO - Epoch 1 Step 1650 (Global: 1650): loss=0.2226, ppl=1.25, grad_norm=4.47, lr=9.90e-06, throughput=2864 tok/s +2025-11-19 06:32:11,085 - INFO - Epoch 1 Step 1660 (Global: 1660): loss=0.2455, ppl=1.28, grad_norm=5.34, lr=9.89e-06, throughput=2878 tok/s +2025-11-19 06:34:59,094 - INFO - Epoch 1 Step 1670 (Global: 1670): loss=0.2577, ppl=1.29, grad_norm=6.91, lr=9.89e-06, throughput=2857 tok/s +2025-11-19 06:37:46,682 - INFO - Epoch 1 Step 1680 (Global: 1680): loss=0.2194, ppl=1.25, grad_norm=5.06, lr=9.89e-06, throughput=2864 tok/s +2025-11-19 06:40:34,281 - INFO - Epoch 1 Step 1690 (Global: 1690): loss=0.2529, ppl=1.29, grad_norm=4.88, lr=9.88e-06, throughput=2864 tok/s +2025-11-19 06:43:21,965 - INFO - Epoch 1 Step 1700 (Global: 1700): loss=0.2429, ppl=1.27, grad_norm=4.94, lr=9.88e-06, throughput=2863 tok/s +2025-11-19 06:46:09,642 - INFO - Epoch 1 Step 1710 (Global: 1710): loss=0.2521, ppl=1.29, grad_norm=3.83, lr=9.87e-06, throughput=2863 tok/s 
+2025-11-19 06:48:57,113 - INFO - Epoch 1 Step 1720 (Global: 1720): loss=0.2308, ppl=1.26, grad_norm=4.97, lr=9.87e-06, throughput=2866 tok/s +2025-11-19 06:51:44,783 - INFO - Epoch 1 Step 1730 (Global: 1730): loss=0.1951, ppl=1.22, grad_norm=4.19, lr=9.87e-06, throughput=2863 tok/s +2025-11-19 06:54:31,688 - INFO - Epoch 1 Step 1740 (Global: 1740): loss=0.2394, ppl=1.27, grad_norm=4.06, lr=9.86e-06, throughput=2876 tok/s +2025-11-19 06:57:18,604 - INFO - Epoch 1 Step 1750 (Global: 1750): loss=0.2338, ppl=1.26, grad_norm=5.41, lr=9.86e-06, throughput=2876 tok/s +2025-11-19 07:00:14,599 - INFO - Epoch 1 Step 1760 (Global: 1760): loss=0.2132, ppl=1.24, grad_norm=4.25, lr=9.86e-06, throughput=2727 tok/s +2025-11-19 07:03:02,149 - INFO - Epoch 1 Step 1770 (Global: 1770): loss=0.2498, ppl=1.28, grad_norm=4.03, lr=9.85e-06, throughput=2865 tok/s +2025-11-19 07:05:48,902 - INFO - Epoch 1 Step 1780 (Global: 1780): loss=0.2356, ppl=1.27, grad_norm=4.78, lr=9.85e-06, throughput=2879 tok/s +2025-11-19 07:08:36,226 - INFO - Epoch 1 Step 1790 (Global: 1790): loss=0.2218, ppl=1.25, grad_norm=4.44, lr=9.84e-06, throughput=2869 tok/s +2025-11-19 07:11:24,130 - INFO - Epoch 1 Step 1800 (Global: 1800): loss=0.2171, ppl=1.24, grad_norm=7.22, lr=9.84e-06, throughput=2859 tok/s +2025-11-19 07:14:11,810 - INFO - Epoch 1 Step 1810 (Global: 1810): loss=0.2427, ppl=1.27, grad_norm=5.34, lr=9.83e-06, throughput=2863 tok/s +2025-11-19 07:16:59,383 - INFO - Epoch 1 Step 1820 (Global: 1820): loss=0.2245, ppl=1.25, grad_norm=4.31, lr=9.83e-06, throughput=2864 tok/s +2025-11-19 07:19:48,362 - INFO - Epoch 1 Step 1830 (Global: 1830): loss=0.1944, ppl=1.21, grad_norm=3.59, lr=9.83e-06, throughput=2841 tok/s +2025-11-19 07:22:36,912 - INFO - Epoch 1 Step 1840 (Global: 1840): loss=0.2525, ppl=1.29, grad_norm=4.31, lr=9.82e-06, throughput=2848 tok/s +2025-11-19 07:25:26,253 - INFO - Epoch 1 Step 1850 (Global: 1850): loss=0.2413, ppl=1.27, grad_norm=5.28, lr=9.82e-06, throughput=2835 tok/s +2025-11-19 
07:28:14,212 - INFO - Epoch 1 Step 1860 (Global: 1860): loss=0.2306, ppl=1.26, grad_norm=4.69, lr=9.81e-06, throughput=2858 tok/s +2025-11-19 07:31:11,367 - INFO - Epoch 1 Step 1870 (Global: 1870): loss=0.2331, ppl=1.26, grad_norm=4.28, lr=9.81e-06, throughput=2710 tok/s +2025-11-19 07:33:59,677 - INFO - Epoch 1 Step 1880 (Global: 1880): loss=0.2002, ppl=1.22, grad_norm=3.34, lr=9.80e-06, throughput=2852 tok/s +2025-11-19 07:36:47,203 - INFO - Epoch 1 Step 1890 (Global: 1890): loss=0.2132, ppl=1.24, grad_norm=4.78, lr=9.80e-06, throughput=2865 tok/s +2025-11-19 07:39:34,632 - INFO - Epoch 1 Step 1900 (Global: 1900): loss=0.2305, ppl=1.26, grad_norm=5.56, lr=9.79e-06, throughput=2867 tok/s +2025-11-19 07:42:23,014 - INFO - Epoch 1 Step 1910 (Global: 1910): loss=0.2367, ppl=1.27, grad_norm=5.81, lr=9.79e-06, throughput=2851 tok/s +2025-11-19 07:45:10,562 - INFO - Epoch 1 Step 1920 (Global: 1920): loss=0.2218, ppl=1.25, grad_norm=6.03, lr=9.78e-06, throughput=2865 tok/s +2025-11-19 07:47:57,705 - INFO - Epoch 1 Step 1930 (Global: 1930): loss=0.2100, ppl=1.23, grad_norm=6.38, lr=9.78e-06, throughput=2872 tok/s +2025-11-19 07:50:44,776 - INFO - Epoch 1 Step 1940 (Global: 1940): loss=0.2399, ppl=1.27, grad_norm=4.50, lr=9.77e-06, throughput=2873 tok/s +2025-11-19 07:53:32,203 - INFO - Epoch 1 Step 1950 (Global: 1950): loss=0.2438, ppl=1.28, grad_norm=4.97, lr=9.77e-06, throughput=2867 tok/s +2025-11-19 07:56:19,417 - INFO - Epoch 1 Step 1960 (Global: 1960): loss=0.2025, ppl=1.22, grad_norm=3.97, lr=9.76e-06, throughput=2871 tok/s +2025-11-19 07:59:07,117 - INFO - Epoch 1 Step 1970 (Global: 1970): loss=0.2164, ppl=1.24, grad_norm=7.41, lr=9.76e-06, throughput=2862 tok/s +2025-11-19 08:01:54,265 - INFO - Epoch 1 Step 1980 (Global: 1980): loss=0.2400, ppl=1.27, grad_norm=4.78, lr=9.75e-06, throughput=2872 tok/s +2025-11-19 08:04:41,833 - INFO - Epoch 1 Step 1990 (Global: 1990): loss=0.2283, ppl=1.26, grad_norm=4.47, lr=9.75e-06, throughput=2865 tok/s +2025-11-19 
08:07:29,445 - INFO - Epoch 1 Step 2000 (Global: 2000): loss=0.2469, ppl=1.28, grad_norm=4.84, lr=9.74e-06, throughput=2864 tok/s +2025-11-19 08:07:29,446 - INFO - +Running validation at step 2000... +2025-11-19 08:17:03,473 - INFO - Validation loss: 0.2304, perplexity: 1.26 +2025-11-19 08:17:03,474 - INFO - Qualitative metrics (n=5): +2025-11-19 08:17:03,474 - INFO - BLEU: 0.7917 +2025-11-19 08:17:03,474 - INFO - METEOR: 0.8655 +2025-11-19 08:17:03,474 - INFO - Edit Distance: 0.1709 +2025-11-19 08:17:03,474 - INFO - F-measure: 0.8781 +2025-11-19 08:17:03,474 - INFO - +====================================================================== +2025-11-19 08:17:03,475 - INFO - Qualitative Evaluation Samples: +2025-11-19 08:17:03,475 - INFO - ====================================================================== +2025-11-19 08:17:03,475 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-19 08:17:03,475 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-19 08:17:03,475 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illegal consequences songs makes sense if they wish to lure their audience into thinking it\'s a spy-voice. But it\'s not ...' +2025-11-19 08:17:03,475 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-19 08:17:03,475 - INFO - ---------------------------------------------------------------------- +2025-11-19 08:17:03,475 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-19 08:17:03,475 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-19 08:17:03,475 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 08:17:03,476 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 08:17:03,476 - INFO - ---------------------------------------------------------------------- +2025-11-19 08:17:03,476 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-19 08:17:03,476 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-19 08:17:03,476 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Best steps out as and b...' +2025-11-19 08:17:03,476 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-19 08:17:03,476 - INFO - ---------------------------------------------------------------------- +2025-11-19 08:17:03,476 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-19 08:17:03,476 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-19 08:17:03,477 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 08:17:03,477 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-19 08:17:03,477 - INFO - ---------------------------------------------------------------------- +2025-11-19 08:17:03,477 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-19 08:17:03,477 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-19 08:17:03,477 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 08:17:03,477 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 08:17:03,477 - INFO - ---------------------------------------------------------------------- +2025-11-19 08:17:03,478 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_2000.jsonl +2025-11-19 08:17:52,459 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-19 08:17:52,479 - INFO - New best validation loss: 0.2304, perplexity: 1.26 +2025-11-19 08:20:41,840 - INFO - Epoch 1 Step 2010 (Global: 2010): loss=0.2508, ppl=1.29, grad_norm=4.72, lr=9.74e-06, throughput=2834 tok/s +2025-11-19 08:23:40,115 - INFO - Epoch 1 Step 2020 (Global: 2020): loss=0.2317, ppl=1.26, grad_norm=4.78, lr=9.73e-06, throughput=2692 tok/s +2025-11-19 08:26:29,153 - INFO - Epoch 1 Step 2030 (Global: 2030): loss=0.2246, ppl=1.25, grad_norm=4.47, lr=9.73e-06, throughput=2840 tok/s +2025-11-19 08:29:17,491 - INFO - Epoch 1 Step 2040 (Global: 2040): loss=0.2163, ppl=1.24, grad_norm=4.62, lr=9.72e-06, throughput=2851 tok/s +2025-11-19 08:32:06,112 - INFO - Epoch 1 Step 2050 (Global: 2050): loss=0.2432, ppl=1.28, grad_norm=9.44, lr=9.72e-06, throughput=2847 tok/s +2025-11-19 08:34:54,699 - INFO - Epoch 1 Step 2060 (Global: 2060): loss=0.2187, ppl=1.24, grad_norm=4.62, lr=9.71e-06, throughput=2847 tok/s +2025-11-19 08:37:42,055 - INFO - Epoch 1 Step 2070 (Global: 2070): loss=0.2232, ppl=1.25, grad_norm=6.81, lr=9.71e-06, throughput=2868 
tok/s +2025-11-19 08:40:31,079 - INFO - Epoch 1 Step 2080 (Global: 2080): loss=0.2157, ppl=1.24, grad_norm=4.78, lr=9.70e-06, throughput=2840 tok/s +2025-11-19 08:43:20,527 - INFO - Epoch 1 Step 2090 (Global: 2090): loss=0.1928, ppl=1.21, grad_norm=4.06, lr=9.69e-06, throughput=2833 tok/s +2025-11-19 08:46:20,911 - INFO - Epoch 1 Step 2100 (Global: 2100): loss=0.2328, ppl=1.26, grad_norm=5.03, lr=9.69e-06, throughput=2661 tok/s +2025-11-19 08:49:12,134 - INFO - Epoch 1 Step 2110 (Global: 2110): loss=0.2022, ppl=1.22, grad_norm=3.89, lr=9.68e-06, throughput=2803 tok/s +2025-11-19 08:52:04,937 - INFO - Epoch 1 Step 2120 (Global: 2120): loss=0.2303, ppl=1.26, grad_norm=4.81, lr=9.68e-06, throughput=2778 tok/s +2025-11-19 08:54:58,717 - INFO - Epoch 1 Step 2130 (Global: 2130): loss=0.2045, ppl=1.23, grad_norm=3.62, lr=9.67e-06, throughput=2762 tok/s +2025-11-19 08:57:50,532 - INFO - Epoch 1 Step 2140 (Global: 2140): loss=0.1918, ppl=1.21, grad_norm=4.28, lr=9.66e-06, throughput=2794 tok/s +2025-11-19 09:00:42,187 - INFO - Epoch 1 Step 2150 (Global: 2150): loss=0.2163, ppl=1.24, grad_norm=4.62, lr=9.66e-06, throughput=2796 tok/s +2025-11-19 09:03:37,736 - INFO - Epoch 1 Step 2160 (Global: 2160): loss=0.2146, ppl=1.24, grad_norm=3.42, lr=9.65e-06, throughput=2734 tok/s +2025-11-19 09:06:34,399 - INFO - Epoch 1 Step 2170 (Global: 2170): loss=0.2210, ppl=1.25, grad_norm=4.06, lr=9.65e-06, throughput=2717 tok/s +2025-11-19 09:09:28,014 - INFO - Epoch 1 Step 2180 (Global: 2180): loss=0.2070, ppl=1.23, grad_norm=4.97, lr=9.64e-06, throughput=2765 tok/s +2025-11-19 09:12:23,512 - INFO - Epoch 1 Step 2190 (Global: 2190): loss=0.2109, ppl=1.23, grad_norm=38.75, lr=9.63e-06, throughput=2735 tok/s +2025-11-19 09:15:21,616 - INFO - Epoch 1 Step 2200 (Global: 2200): loss=0.2152, ppl=1.24, grad_norm=4.59, lr=9.63e-06, throughput=2695 tok/s +2025-11-19 09:18:16,998 - INFO - Epoch 1 Step 2210 (Global: 2210): loss=0.2006, ppl=1.22, grad_norm=3.83, lr=9.62e-06, throughput=2737 tok/s 
+2025-11-19 09:21:13,872 - INFO - Epoch 1 Step 2220 (Global: 2220): loss=0.2147, ppl=1.24, grad_norm=4.84, lr=9.61e-06, throughput=2714 tok/s +2025-11-19 09:24:19,708 - INFO - Epoch 1 Step 2230 (Global: 2230): loss=0.2236, ppl=1.25, grad_norm=6.62, lr=9.61e-06, throughput=2583 tok/s +2025-11-19 09:27:13,796 - INFO - Epoch 1 Step 2240 (Global: 2240): loss=0.2485, ppl=1.28, grad_norm=5.78, lr=9.60e-06, throughput=2757 tok/s +2025-11-19 09:30:09,572 - INFO - Epoch 1 Step 2250 (Global: 2250): loss=0.2151, ppl=1.24, grad_norm=4.53, lr=9.60e-06, throughput=2731 tok/s +2025-11-19 09:33:02,584 - INFO - Epoch 1 Step 2260 (Global: 2260): loss=0.2350, ppl=1.26, grad_norm=4.62, lr=9.59e-06, throughput=2774 tok/s +2025-11-19 09:35:55,204 - INFO - Epoch 1 Step 2270 (Global: 2270): loss=0.2208, ppl=1.25, grad_norm=4.97, lr=9.58e-06, throughput=2781 tok/s +2025-11-19 09:38:46,875 - INFO - Epoch 1 Step 2280 (Global: 2280): loss=0.2011, ppl=1.22, grad_norm=3.84, lr=9.58e-06, throughput=2796 tok/s +2025-11-19 09:41:36,781 - INFO - Epoch 1 Step 2290 (Global: 2290): loss=0.2093, ppl=1.23, grad_norm=6.44, lr=9.57e-06, throughput=2825 tok/s +2025-11-19 09:44:28,776 - INFO - Epoch 1 Step 2300 (Global: 2300): loss=0.1733, ppl=1.19, grad_norm=5.22, lr=9.56e-06, throughput=2791 tok/s +2025-11-19 09:47:21,813 - INFO - Epoch 1 Step 2310 (Global: 2310): loss=0.2129, ppl=1.24, grad_norm=3.86, lr=9.55e-06, throughput=2774 tok/s +2025-11-19 09:50:12,073 - INFO - Epoch 1 Step 2320 (Global: 2320): loss=0.2076, ppl=1.23, grad_norm=3.72, lr=9.55e-06, throughput=2819 tok/s +2025-11-19 09:53:02,739 - INFO - Epoch 1 Step 2330 (Global: 2330): loss=0.2224, ppl=1.25, grad_norm=4.38, lr=9.54e-06, throughput=2813 tok/s +2025-11-19 09:55:53,699 - INFO - Epoch 1 Step 2340 (Global: 2340): loss=0.1962, ppl=1.22, grad_norm=3.78, lr=9.53e-06, throughput=2808 tok/s +2025-11-19 09:58:43,828 - INFO - Epoch 1 Step 2350 (Global: 2350): loss=0.2960, ppl=1.34, grad_norm=6.59, lr=9.53e-06, throughput=2821 tok/s +2025-11-19 
10:01:34,252 - INFO - Epoch 1 Step 2360 (Global: 2360): loss=0.2004, ppl=1.22, grad_norm=5.97, lr=9.52e-06, throughput=2817 tok/s +2025-11-19 10:04:26,630 - INFO - Epoch 1 Step 2370 (Global: 2370): loss=0.2148, ppl=1.24, grad_norm=4.06, lr=9.51e-06, throughput=2785 tok/s +2025-11-19 10:07:18,733 - INFO - Epoch 1 Step 2380 (Global: 2380): loss=0.2085, ppl=1.23, grad_norm=3.94, lr=9.51e-06, throughput=2789 tok/s +2025-11-19 10:10:09,227 - INFO - Epoch 1 Step 2390 (Global: 2390): loss=0.2003, ppl=1.22, grad_norm=3.56, lr=9.50e-06, throughput=2815 tok/s +2025-11-19 10:13:00,576 - INFO - Epoch 1 Step 2400 (Global: 2400): loss=0.2419, ppl=1.27, grad_norm=5.72, lr=9.49e-06, throughput=2801 tok/s +2025-11-19 10:15:53,363 - INFO - Epoch 1 Step 2410 (Global: 2410): loss=0.2311, ppl=1.26, grad_norm=4.66, lr=9.48e-06, throughput=2778 tok/s +2025-11-19 10:18:44,182 - INFO - Epoch 1 Step 2420 (Global: 2420): loss=0.2126, ppl=1.24, grad_norm=4.75, lr=9.48e-06, throughput=2810 tok/s +2025-11-19 10:21:36,095 - INFO - Epoch 1 Step 2430 (Global: 2430): loss=0.1936, ppl=1.21, grad_norm=4.50, lr=9.47e-06, throughput=2792 tok/s +2025-11-19 10:24:26,611 - INFO - Epoch 1 Step 2440 (Global: 2440): loss=0.2025, ppl=1.22, grad_norm=5.22, lr=9.46e-06, throughput=2815 tok/s +2025-11-19 10:27:26,796 - INFO - Epoch 1 Step 2450 (Global: 2450): loss=0.2216, ppl=1.25, grad_norm=4.81, lr=9.45e-06, throughput=2664 tok/s +2025-11-19 10:30:14,909 - INFO - Epoch 1 Step 2460 (Global: 2460): loss=0.2105, ppl=1.23, grad_norm=3.34, lr=9.45e-06, throughput=2855 tok/s +2025-11-19 10:33:05,744 - INFO - Epoch 1 Step 2470 (Global: 2470): loss=0.1969, ppl=1.22, grad_norm=5.22, lr=9.44e-06, throughput=2810 tok/s +2025-11-19 10:36:00,774 - INFO - Epoch 1 Step 2480 (Global: 2480): loss=0.2011, ppl=1.22, grad_norm=4.97, lr=9.43e-06, throughput=2742 tok/s +2025-11-19 10:38:52,185 - INFO - Epoch 1 Step 2490 (Global: 2490): loss=0.2395, ppl=1.27, grad_norm=4.38, lr=9.42e-06, throughput=2800 tok/s +2025-11-19 
10:41:44,408 - INFO - Epoch 1 Step 2500 (Global: 2500): loss=0.2418, ppl=1.27, grad_norm=3.53, lr=9.41e-06, throughput=2787 tok/s +2025-11-19 10:41:44,409 - INFO - +Running validation at step 2500... +2025-11-19 10:51:38,920 - INFO - Validation loss: 0.2181, perplexity: 1.24 +2025-11-19 10:51:38,921 - INFO - Qualitative metrics (n=5): +2025-11-19 10:51:38,921 - INFO - BLEU: 0.7988 +2025-11-19 10:51:38,921 - INFO - METEOR: 0.8762 +2025-11-19 10:51:38,921 - INFO - Edit Distance: 0.1896 +2025-11-19 10:51:38,922 - INFO - F-measure: 0.8844 +2025-11-19 10:51:38,922 - INFO - +====================================================================== +2025-11-19 10:51:38,922 - INFO - Qualitative Evaluation Samples: +2025-11-19 10:51:38,922 - INFO - ====================================================================== +2025-11-19 10:51:38,922 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-19 10:51:38,922 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-19 10:51:38,922 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illegal consequence of songs makes sense if they wish to lure their audience into thinking it\'s as you-save-ver. But it\'...' +2025-11-19 10:51:38,922 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-19 10:51:38,922 - INFO - ---------------------------------------------------------------------- +2025-11-19 10:51:38,922 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-19 10:51:38,922 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-19 10:51:38,923 - INFO - Generated: ', was Simone Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of ARY; and...' +2025-11-19 10:51:38,923 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 10:51:38,923 - INFO - ---------------------------------------------------------------------- +2025-11-19 10:51:38,923 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-19 10:51:38,923 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-19 10:51:38,923 - INFO - Generated: ' at the meeting Laynusia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beet steps out of his...' +2025-11-19 10:51:38,924 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-19 10:51:38,924 - INFO - ---------------------------------------------------------------------- +2025-11-19 10:51:38,924 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-19 10:51:38,924 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-19 10:51:38,924 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 10:51:38,924 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-19 10:51:38,925 - INFO - ---------------------------------------------------------------------- +2025-11-19 10:51:38,925 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-19 10:51:38,925 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-19 10:51:38,925 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stygian Abyss and Labyrinth of Worlds | June 2, 2011 | DOS ...' +2025-11-19 10:51:38,925 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 10:51:38,925 - INFO - ---------------------------------------------------------------------- +2025-11-19 10:51:38,926 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_2500.jsonl +2025-11-19 10:52:29,205 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-19 10:52:29,225 - INFO - New best validation loss: 0.2181, perplexity: 1.24 +2025-11-19 10:55:20,776 - INFO - Epoch 1 Step 2510 (Global: 2510): loss=0.1945, ppl=1.21, grad_norm=4.53, lr=9.41e-06, throughput=2798 tok/s +2025-11-19 10:58:13,406 - INFO - Epoch 1 Step 2520 (Global: 2520): loss=0.2162, ppl=1.24, grad_norm=4.75, lr=9.40e-06, throughput=2781 tok/s +2025-11-19 11:01:04,288 - INFO - Epoch 1 Step 2530 (Global: 2530): loss=0.2112, ppl=1.24, grad_norm=4.44, lr=9.39e-06, throughput=2809 tok/s +2025-11-19 11:03:58,151 - INFO - Epoch 1 Step 2540 (Global: 2540): loss=0.2222, ppl=1.25, grad_norm=4.34, lr=9.38e-06, throughput=2761 tok/s +2025-11-19 11:06:51,452 - INFO - Epoch 1 Step 2550 (Global: 2550): loss=0.2169, ppl=1.24, grad_norm=4.53, lr=9.37e-06, throughput=2770 tok/s +2025-11-19 11:09:45,105 - INFO - Epoch 1 Step 2560 (Global: 2560): loss=0.2215, ppl=1.25, grad_norm=3.70, lr=9.37e-06, throughput=2764 tok/s +2025-11-19 11:12:44,849 - INFO - Epoch 1 
Step 2570 (Global: 2570): loss=0.2066, ppl=1.23, grad_norm=15.81, lr=9.36e-06, throughput=2671 tok/s +2025-11-19 11:15:38,713 - INFO - Epoch 1 Step 2580 (Global: 2580): loss=0.2203, ppl=1.25, grad_norm=5.28, lr=9.35e-06, throughput=2761 tok/s +2025-11-19 11:18:29,688 - INFO - Epoch 1 Step 2590 (Global: 2590): loss=0.2194, ppl=1.25, grad_norm=3.59, lr=9.34e-06, throughput=2807 tok/s +2025-11-19 11:21:21,569 - INFO - Epoch 1 Step 2600 (Global: 2600): loss=0.2250, ppl=1.25, grad_norm=6.84, lr=9.33e-06, throughput=2793 tok/s +2025-11-19 11:24:15,861 - INFO - Epoch 1 Step 2610 (Global: 2610): loss=0.1940, ppl=1.21, grad_norm=4.59, lr=9.32e-06, throughput=2754 tok/s +2025-11-19 11:27:08,899 - INFO - Epoch 1 Step 2620 (Global: 2620): loss=0.1956, ppl=1.22, grad_norm=4.34, lr=9.32e-06, throughput=2774 tok/s +2025-11-19 11:29:58,785 - INFO - Epoch 1 Step 2630 (Global: 2630): loss=0.1756, ppl=1.19, grad_norm=4.84, lr=9.31e-06, throughput=2825 tok/s +2025-11-19 11:32:49,385 - INFO - Epoch 1 Step 2640 (Global: 2640): loss=0.2041, ppl=1.23, grad_norm=5.69, lr=9.30e-06, throughput=2814 tok/s +2025-11-19 11:35:40,565 - INFO - Epoch 1 Step 2650 (Global: 2650): loss=0.2194, ppl=1.25, grad_norm=9.50, lr=9.29e-06, throughput=2804 tok/s +2025-11-19 11:38:31,711 - INFO - Epoch 1 Step 2660 (Global: 2660): loss=0.1996, ppl=1.22, grad_norm=4.03, lr=9.28e-06, throughput=2805 tok/s +2025-11-19 11:41:20,945 - INFO - Epoch 1 Step 2670 (Global: 2670): loss=0.1920, ppl=1.21, grad_norm=4.75, lr=9.27e-06, throughput=2836 tok/s +2025-11-19 11:44:12,230 - INFO - Epoch 1 Step 2680 (Global: 2680): loss=0.2171, ppl=1.24, grad_norm=4.47, lr=9.26e-06, throughput=2802 tok/s +2025-11-19 11:47:02,639 - INFO - Epoch 1 Step 2690 (Global: 2690): loss=0.2473, ppl=1.28, grad_norm=6.38, lr=9.26e-06, throughput=2817 tok/s +2025-11-19 11:49:53,455 - INFO - Epoch 1 Step 2700 (Global: 2700): loss=0.1919, ppl=1.21, grad_norm=3.55, lr=9.25e-06, throughput=2810 tok/s +2025-11-19 11:52:51,566 - INFO - Epoch 1 Step 2710 
(Global: 2710): loss=0.2430, ppl=1.28, grad_norm=4.75, lr=9.24e-06, throughput=2695 tok/s +2025-11-19 11:55:41,867 - INFO - Epoch 1 Step 2720 (Global: 2720): loss=0.2140, ppl=1.24, grad_norm=5.59, lr=9.23e-06, throughput=2819 tok/s +2025-11-19 11:58:31,772 - INFO - Epoch 1 Step 2730 (Global: 2730): loss=0.2095, ppl=1.23, grad_norm=4.88, lr=9.22e-06, throughput=2825 tok/s +2025-11-19 12:01:21,346 - INFO - Epoch 1 Step 2740 (Global: 2740): loss=0.2029, ppl=1.22, grad_norm=6.28, lr=9.21e-06, throughput=2831 tok/s +2025-11-19 12:04:12,545 - INFO - Epoch 1 Step 2750 (Global: 2750): loss=0.2009, ppl=1.22, grad_norm=5.06, lr=9.20e-06, throughput=2804 tok/s +2025-11-19 12:07:02,439 - INFO - Epoch 1 Step 2760 (Global: 2760): loss=0.1782, ppl=1.20, grad_norm=4.81, lr=9.19e-06, throughput=2825 tok/s +2025-11-19 12:09:54,406 - INFO - Epoch 1 Step 2770 (Global: 2770): loss=0.2169, ppl=1.24, grad_norm=3.95, lr=9.18e-06, throughput=2791 tok/s +2025-11-19 12:12:43,935 - INFO - Epoch 1 Step 2780 (Global: 2780): loss=0.1852, ppl=1.20, grad_norm=4.72, lr=9.17e-06, throughput=2831 tok/s +2025-11-19 12:15:32,898 - INFO - Epoch 1 Step 2790 (Global: 2790): loss=0.1776, ppl=1.19, grad_norm=5.47, lr=9.17e-06, throughput=2841 tok/s +2025-11-19 12:18:22,438 - INFO - Epoch 1 Step 2800 (Global: 2800): loss=0.2015, ppl=1.22, grad_norm=11.12, lr=9.16e-06, throughput=2831 tok/s +2025-11-19 12:21:12,611 - INFO - Epoch 1 Step 2810 (Global: 2810): loss=0.1938, ppl=1.21, grad_norm=5.09, lr=9.15e-06, throughput=2821 tok/s +2025-11-19 12:24:01,960 - INFO - Epoch 1 Step 2820 (Global: 2820): loss=0.2158, ppl=1.24, grad_norm=5.94, lr=9.14e-06, throughput=2834 tok/s +2025-11-19 12:26:50,079 - INFO - Epoch 1 Step 2830 (Global: 2830): loss=0.1947, ppl=1.21, grad_norm=4.62, lr=9.13e-06, throughput=2855 tok/s +2025-11-19 12:29:40,295 - INFO - Epoch 1 Step 2840 (Global: 2840): loss=0.1980, ppl=1.22, grad_norm=3.98, lr=9.12e-06, throughput=2820 tok/s +2025-11-19 12:32:30,287 - INFO - Epoch 1 Step 2850 (Global: 
2850): loss=0.1839, ppl=1.20, grad_norm=3.77, lr=9.11e-06, throughput=2824 tok/s +2025-11-19 12:35:19,901 - INFO - Epoch 1 Step 2860 (Global: 2860): loss=0.1852, ppl=1.20, grad_norm=5.75, lr=9.10e-06, throughput=2830 tok/s +2025-11-19 12:38:09,760 - INFO - Epoch 1 Step 2870 (Global: 2870): loss=0.1962, ppl=1.22, grad_norm=4.41, lr=9.09e-06, throughput=2826 tok/s +2025-11-19 12:40:58,024 - INFO - Epoch 1 Step 2880 (Global: 2880): loss=0.1961, ppl=1.22, grad_norm=4.44, lr=9.08e-06, throughput=2853 tok/s +2025-11-19 12:43:46,146 - INFO - Epoch 1 Step 2890 (Global: 2890): loss=0.1896, ppl=1.21, grad_norm=4.41, lr=9.07e-06, throughput=2855 tok/s +2025-11-19 12:46:35,402 - INFO - Epoch 1 Step 2900 (Global: 2900): loss=0.1851, ppl=1.20, grad_norm=4.50, lr=9.06e-06, throughput=2836 tok/s +2025-11-19 12:49:25,411 - INFO - Epoch 1 Step 2910 (Global: 2910): loss=0.2075, ppl=1.23, grad_norm=5.69, lr=9.05e-06, throughput=2823 tok/s +2025-11-19 12:52:14,755 - INFO - Epoch 1 Step 2920 (Global: 2920): loss=0.1983, ppl=1.22, grad_norm=4.44, lr=9.04e-06, throughput=2835 tok/s +2025-11-19 12:55:12,662 - INFO - Epoch 1 Step 2930 (Global: 2930): loss=0.1833, ppl=1.20, grad_norm=3.19, lr=9.03e-06, throughput=2698 tok/s +2025-11-19 12:58:02,401 - INFO - Epoch 1 Step 2940 (Global: 2940): loss=0.2030, ppl=1.23, grad_norm=5.19, lr=9.02e-06, throughput=2828 tok/s +2025-11-19 13:00:53,690 - INFO - Epoch 1 Step 2950 (Global: 2950): loss=0.1855, ppl=1.20, grad_norm=5.28, lr=9.01e-06, throughput=2802 tok/s +2025-11-19 13:03:43,951 - INFO - Epoch 1 Step 2960 (Global: 2960): loss=0.1776, ppl=1.19, grad_norm=5.75, lr=9.00e-06, throughput=2819 tok/s +2025-11-19 13:06:37,458 - INFO - Epoch 1 Step 2970 (Global: 2970): loss=0.1935, ppl=1.21, grad_norm=3.80, lr=8.99e-06, throughput=2766 tok/s +2025-11-19 13:09:30,516 - INFO - Epoch 1 Step 2980 (Global: 2980): loss=0.1840, ppl=1.20, grad_norm=3.89, lr=8.98e-06, throughput=2774 tok/s +2025-11-19 13:12:22,163 - INFO - Epoch 1 Step 2990 (Global: 2990): 
loss=0.2065, ppl=1.23, grad_norm=5.09, lr=8.97e-06, throughput=2796 tok/s +2025-11-19 13:15:12,730 - INFO - Epoch 1 Step 3000 (Global: 3000): loss=0.1989, ppl=1.22, grad_norm=6.91, lr=8.96e-06, throughput=2814 tok/s +2025-11-19 13:15:12,730 - INFO - +Running validation at step 3000... +2025-11-19 13:25:23,539 - INFO - Validation loss: 0.1959, perplexity: 1.22 +2025-11-19 13:25:23,540 - INFO - Qualitative metrics (n=5): +2025-11-19 13:25:23,540 - INFO - BLEU: 0.8649 +2025-11-19 13:25:23,540 - INFO - METEOR: 0.9298 +2025-11-19 13:25:23,540 - INFO - Edit Distance: 0.1361 +2025-11-19 13:25:23,540 - INFO - F-measure: 0.9230 +2025-11-19 13:25:23,540 - INFO - +====================================================================== +2025-11-19 13:25:23,541 - INFO - Qualitative Evaluation Samples: +2025-11-19 13:25:23,541 - INFO - ====================================================================== +2025-11-19 13:25:23,541 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-19 13:25:23,541 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-19 13:25:23,541 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illegal consequence of songs makes sense if they wish to rule their audience into thinking it\'s a yes–no–worse. But it\'s...' +2025-11-19 13:25:23,541 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-19 13:25:23,541 - INFO - ---------------------------------------------------------------------- +2025-11-19 13:25:23,541 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-19 13:25:23,541 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-19 13:25:23,541 - INFO - Generated: ', was Sierra Abro-Quacha, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of ARry ROT...' +2025-11-19 13:25:23,541 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 13:25:23,542 - INFO - ---------------------------------------------------------------------- +2025-11-19 13:25:23,542 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-19 13:25:23,542 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-19 13:25:23,542 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beetl stops the ax and ...' +2025-11-19 13:25:23,542 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-19 13:25:23,542 - INFO - ---------------------------------------------------------------------- +2025-11-19 13:25:23,542 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-19 13:25:23,542 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-19 13:25:23,542 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 13:25:23,543 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-19 13:25:23,543 - INFO - ---------------------------------------------------------------------- +2025-11-19 13:25:23,543 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-19 13:25:23,543 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-19 13:25:23,543 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 13:25:23,543 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 13:25:23,543 - INFO - ---------------------------------------------------------------------- +2025-11-19 13:25:23,544 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_3000.jsonl +2025-11-19 13:26:12,706 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-19 13:26:12,726 - INFO - New best validation loss: 0.1959, perplexity: 1.22 +2025-11-19 13:29:08,389 - INFO - Epoch 1 Step 3010 (Global: 3010): loss=0.1829, ppl=1.20, grad_norm=4.66, lr=8.95e-06, throughput=2733 tok/s +2025-11-19 13:32:01,185 - INFO - Epoch 1 Step 3020 (Global: 3020): loss=0.1847, ppl=1.20, grad_norm=8.38, lr=8.94e-06, throughput=2778 tok/s +2025-11-19 13:35:02,496 - INFO - Epoch 1 Step 3030 (Global: 3030): loss=0.1783, ppl=1.20, grad_norm=4.44, lr=8.93e-06, throughput=2647 tok/s +2025-11-19 13:37:56,315 - INFO - Epoch 1 Step 3040 (Global: 3040): loss=0.1792, ppl=1.20, grad_norm=5.31, lr=8.92e-06, throughput=2762 tok/s +2025-11-19 13:40:47,077 - INFO - Epoch 1 Step 3050 (Global: 3050): loss=0.2132, ppl=1.24, grad_norm=4.16, lr=8.91e-06, throughput=2811 tok/s +2025-11-19 13:43:39,508 - INFO - Epoch 1 Step 3060 (Global: 3060): loss=0.1959, ppl=1.22, grad_norm=8.06, lr=8.90e-06, throughput=2784 tok/s +2025-11-19 13:46:33,101 - INFO - Epoch 1 Step 3070 (Global: 3070): loss=0.1812, ppl=1.20, grad_norm=5.00, lr=8.89e-06, throughput=2765 
tok/s +2025-11-19 13:49:24,471 - INFO - Epoch 1 Step 3080 (Global: 3080): loss=0.1827, ppl=1.20, grad_norm=4.12, lr=8.88e-06, throughput=2801 tok/s +2025-11-19 13:52:16,014 - INFO - Epoch 1 Step 3090 (Global: 3090): loss=0.1376, ppl=1.15, grad_norm=3.81, lr=8.87e-06, throughput=2798 tok/s +2025-11-19 13:55:11,145 - INFO - Epoch 1 Step 3100 (Global: 3100): loss=0.1780, ppl=1.19, grad_norm=4.53, lr=8.86e-06, throughput=2741 tok/s +2025-11-19 13:58:14,278 - INFO - Epoch 1 Step 3110 (Global: 3110): loss=0.2171, ppl=1.24, grad_norm=8.12, lr=8.85e-06, throughput=2621 tok/s +2025-11-19 14:01:09,147 - INFO - Epoch 1 Step 3120 (Global: 3120): loss=0.2131, ppl=1.24, grad_norm=5.44, lr=8.84e-06, throughput=2745 tok/s +2025-11-19 14:04:04,001 - INFO - Epoch 1 Step 3130 (Global: 3130): loss=0.2493, ppl=1.28, grad_norm=4.12, lr=8.82e-06, throughput=2745 tok/s +2025-11-19 14:06:56,780 - INFO - Epoch 1 Step 3140 (Global: 3140): loss=0.1894, ppl=1.21, grad_norm=8.25, lr=8.81e-06, throughput=2778 tok/s +2025-11-19 14:09:52,429 - INFO - Epoch 1 Step 3150 (Global: 3150): loss=0.2156, ppl=1.24, grad_norm=3.41, lr=8.80e-06, throughput=2733 tok/s +2025-11-19 14:12:43,848 - INFO - Epoch 1 Step 3160 (Global: 3160): loss=0.1960, ppl=1.22, grad_norm=3.25, lr=8.79e-06, throughput=2800 tok/s +2025-11-19 14:15:35,963 - INFO - Epoch 1 Step 3170 (Global: 3170): loss=0.1941, ppl=1.21, grad_norm=3.89, lr=8.78e-06, throughput=2789 tok/s +2025-11-19 14:18:29,561 - INFO - Epoch 1 Step 3180 (Global: 3180): loss=0.2098, ppl=1.23, grad_norm=6.62, lr=8.77e-06, throughput=2765 tok/s +2025-11-19 14:21:20,921 - INFO - Epoch 1 Step 3190 (Global: 3190): loss=0.2042, ppl=1.23, grad_norm=5.88, lr=8.76e-06, throughput=2801 tok/s +2025-11-19 14:24:11,052 - INFO - Epoch 1 Step 3200 (Global: 3200): loss=0.1682, ppl=1.18, grad_norm=4.06, lr=8.75e-06, throughput=2821 tok/s +2025-11-19 14:27:05,373 - INFO - Epoch 1 Step 3210 (Global: 3210): loss=0.2018, ppl=1.22, grad_norm=5.22, lr=8.74e-06, throughput=2754 tok/s 
+2025-11-19 14:29:58,591 - INFO - Epoch 1 Step 3220 (Global: 3220): loss=0.2048, ppl=1.23, grad_norm=5.03, lr=8.73e-06, throughput=2771 tok/s +2025-11-19 14:32:50,269 - INFO - Epoch 1 Step 3230 (Global: 3230): loss=0.1794, ppl=1.20, grad_norm=3.38, lr=8.71e-06, throughput=2796 tok/s +2025-11-19 14:35:42,778 - INFO - Epoch 1 Step 3240 (Global: 3240): loss=0.1965, ppl=1.22, grad_norm=3.98, lr=8.70e-06, throughput=2782 tok/s +2025-11-19 14:38:32,946 - INFO - Epoch 1 Step 3250 (Global: 3250): loss=0.1698, ppl=1.19, grad_norm=4.59, lr=8.69e-06, throughput=2821 tok/s +2025-11-19 14:41:23,338 - INFO - Epoch 1 Step 3260 (Global: 3260): loss=0.1782, ppl=1.20, grad_norm=4.72, lr=8.68e-06, throughput=2817 tok/s +2025-11-19 14:44:18,200 - INFO - Epoch 1 Step 3270 (Global: 3270): loss=0.1981, ppl=1.22, grad_norm=4.94, lr=8.67e-06, throughput=2745 tok/s +2025-11-19 14:47:08,235 - INFO - Epoch 1 Step 3280 (Global: 3280): loss=0.1958, ppl=1.22, grad_norm=3.64, lr=8.66e-06, throughput=2823 tok/s +2025-11-19 14:50:00,150 - INFO - Epoch 1 Step 3290 (Global: 3290): loss=0.2075, ppl=1.23, grad_norm=3.73, lr=8.65e-06, throughput=2792 tok/s +2025-11-19 14:52:50,359 - INFO - Epoch 1 Step 3300 (Global: 3300): loss=0.2265, ppl=1.25, grad_norm=6.34, lr=8.63e-06, throughput=2820 tok/s +2025-11-19 14:55:39,346 - INFO - Epoch 1 Step 3310 (Global: 3310): loss=0.1831, ppl=1.20, grad_norm=3.97, lr=8.62e-06, throughput=2840 tok/s +2025-11-19 14:58:29,993 - INFO - Epoch 1 Step 3320 (Global: 3320): loss=0.1960, ppl=1.22, grad_norm=6.22, lr=8.61e-06, throughput=2813 tok/s +2025-11-19 15:01:18,947 - INFO - Epoch 1 Step 3330 (Global: 3330): loss=0.1522, ppl=1.16, grad_norm=3.06, lr=8.60e-06, throughput=2841 tok/s +2025-11-19 15:04:07,063 - INFO - Epoch 1 Step 3340 (Global: 3340): loss=0.2372, ppl=1.27, grad_norm=7.69, lr=8.59e-06, throughput=2855 tok/s +2025-11-19 15:06:58,095 - INFO - Epoch 1 Step 3350 (Global: 3350): loss=0.2262, ppl=1.25, grad_norm=4.59, lr=8.58e-06, throughput=2807 tok/s +2025-11-19 
15:09:47,601 - INFO - Epoch 1 Step 3360 (Global: 3360): loss=0.1711, ppl=1.19, grad_norm=3.00, lr=8.57e-06, throughput=2832 tok/s +2025-11-19 15:12:36,625 - INFO - Epoch 1 Step 3370 (Global: 3370): loss=0.1853, ppl=1.20, grad_norm=4.06, lr=8.55e-06, throughput=2840 tok/s +2025-11-19 15:15:29,228 - INFO - Epoch 1 Step 3380 (Global: 3380): loss=0.1660, ppl=1.18, grad_norm=4.06, lr=8.54e-06, throughput=2781 tok/s +2025-11-19 15:18:19,777 - INFO - Epoch 1 Step 3390 (Global: 3390): loss=0.1781, ppl=1.20, grad_norm=6.75, lr=8.53e-06, throughput=2814 tok/s +2025-11-19 15:21:08,979 - INFO - Epoch 1 Step 3400 (Global: 3400): loss=0.1865, ppl=1.21, grad_norm=3.92, lr=8.52e-06, throughput=2837 tok/s +2025-11-19 15:24:01,408 - INFO - Epoch 1 Step 3410 (Global: 3410): loss=0.1944, ppl=1.21, grad_norm=3.48, lr=8.51e-06, throughput=2784 tok/s +2025-11-19 15:27:02,194 - INFO - Epoch 1 Step 3420 (Global: 3420): loss=0.1945, ppl=1.21, grad_norm=4.62, lr=8.49e-06, throughput=2655 tok/s +2025-11-19 15:29:53,524 - INFO - Epoch 1 Step 3430 (Global: 3430): loss=0.1933, ppl=1.21, grad_norm=4.66, lr=8.48e-06, throughput=2802 tok/s +2025-11-19 15:32:46,165 - INFO - Epoch 1 Step 3440 (Global: 3440): loss=0.1772, ppl=1.19, grad_norm=4.31, lr=8.47e-06, throughput=2780 tok/s +2025-11-19 15:35:35,517 - INFO - Epoch 1 Step 3450 (Global: 3450): loss=0.1863, ppl=1.20, grad_norm=8.06, lr=8.46e-06, throughput=2834 tok/s +2025-11-19 15:38:27,557 - INFO - Epoch 1 Step 3460 (Global: 3460): loss=0.1890, ppl=1.21, grad_norm=4.69, lr=8.45e-06, throughput=2790 tok/s +2025-11-19 15:41:20,202 - INFO - Epoch 1 Step 3470 (Global: 3470): loss=0.1755, ppl=1.19, grad_norm=8.75, lr=8.43e-06, throughput=2780 tok/s +2025-11-19 15:44:10,406 - INFO - Epoch 1 Step 3480 (Global: 3480): loss=0.1603, ppl=1.17, grad_norm=3.19, lr=8.42e-06, throughput=2820 tok/s +2025-11-19 15:47:03,244 - INFO - Epoch 1 Step 3490 (Global: 3490): loss=0.1893, ppl=1.21, grad_norm=6.59, lr=8.41e-06, throughput=2777 tok/s +2025-11-19 
15:49:54,165 - INFO - Epoch 1 Step 3500 (Global: 3500): loss=0.2129, ppl=1.24, grad_norm=5.16, lr=8.40e-06, throughput=2808 tok/s +2025-11-19 15:49:54,165 - INFO - +Running validation at step 3500... +2025-11-19 15:59:52,843 - INFO - Validation loss: 0.1853, perplexity: 1.20 +2025-11-19 15:59:52,843 - INFO - Qualitative metrics (n=5): +2025-11-19 15:59:52,844 - INFO - BLEU: 0.7993 +2025-11-19 15:59:52,844 - INFO - METEOR: 0.8594 +2025-11-19 15:59:52,844 - INFO - Edit Distance: 0.1902 +2025-11-19 15:59:52,844 - INFO - F-measure: 0.8847 +2025-11-19 15:59:52,844 - INFO - +====================================================================== +2025-11-19 15:59:52,844 - INFO - Qualitative Evaluation Samples: +2025-11-19 15:59:52,844 - INFO - ====================================================================== +2025-11-19 15:59:52,844 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-19 15:59:52,844 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-19 15:59:52,844 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps the [album\'s] seemingly illegal consequence of songs makes sense if they wish to lure their audience into thinking it\'s as-you-verse. But it\'s n...' +2025-11-19 15:59:52,844 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-19 15:59:52,845 - INFO - ---------------------------------------------------------------------- +2025-11-19 15:59:52,845 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-19 15:59:52,845 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-19 15:59:52,845 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the women president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 15:59:52,845 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 15:59:52,845 - INFO - ---------------------------------------------------------------------- +2025-11-19 15:59:52,845 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-19 15:59:52,845 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-19 15:59:52,845 - INFO - Generated: ' at the meeting Layma headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beet stops the ax and bo...' +2025-11-19 15:59:52,845 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-19 15:59:52,845 - INFO - ---------------------------------------------------------------------- +2025-11-19 15:59:52,845 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-19 15:59:52,846 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-19 15:59:52,846 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 15:59:52,846 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-19 15:59:52,846 - INFO - ---------------------------------------------------------------------- +2025-11-19 15:59:52,846 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-19 15:59:52,846 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-19 15:59:52,846 - INFO - Generated: ' | | The Sims 3: Generations | May 31, 2011 | Windows ...' +2025-11-19 15:59:52,846 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 15:59:52,846 - INFO - ---------------------------------------------------------------------- +2025-11-19 15:59:52,848 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_3500.jsonl +2025-11-19 16:00:42,388 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-19 16:00:42,408 - INFO - New best validation loss: 0.1853, perplexity: 1.20 +2025-11-19 16:03:36,712 - INFO - Epoch 1 Step 3510 (Global: 3510): loss=0.1806, ppl=1.20, grad_norm=2.86, lr=8.38e-06, throughput=2754 tok/s +2025-11-19 16:06:30,021 - INFO - Epoch 1 Step 3520 (Global: 3520): loss=0.1798, ppl=1.20, grad_norm=3.14, lr=8.37e-06, throughput=2770 tok/s +2025-11-19 16:09:23,294 - INFO - Epoch 1 Step 3530 (Global: 3530): loss=0.1665, ppl=1.18, grad_norm=3.45, lr=8.36e-06, throughput=2770 tok/s +2025-11-19 16:12:17,095 - INFO - Epoch 1 Step 3540 (Global: 3540): loss=0.1854, ppl=1.20, grad_norm=3.95, lr=8.35e-06, throughput=2762 tok/s +2025-11-19 16:15:10,956 - INFO - Epoch 1 Step 3550 (Global: 3550): loss=0.1917, ppl=1.21, grad_norm=4.34, lr=8.33e-06, throughput=2761 tok/s +2025-11-19 16:18:05,925 - INFO - Epoch 1 Step 3560 (Global: 3560): loss=0.2077, ppl=1.23, grad_norm=5.59, lr=8.32e-06, throughput=2743 tok/s +2025-11-19 16:21:17,404 - INFO - Epoch 1 Step 3570 (Global: 3570): loss=0.1826, ppl=1.20, grad_norm=3.92, lr=8.31e-06, throughput=2507 tok/s +2025-11-19 
16:24:11,316 - INFO - Epoch 1 Step 3580 (Global: 3580): loss=0.1991, ppl=1.22, grad_norm=4.03, lr=8.30e-06, throughput=2760 tok/s +2025-11-19 16:27:05,151 - INFO - Epoch 1 Step 3590 (Global: 3590): loss=0.1974, ppl=1.22, grad_norm=3.91, lr=8.28e-06, throughput=2761 tok/s +2025-11-19 16:29:58,518 - INFO - Epoch 1 Step 3600 (Global: 3600): loss=0.1750, ppl=1.19, grad_norm=4.22, lr=8.27e-06, throughput=2769 tok/s +2025-11-19 16:32:51,812 - INFO - Epoch 1 Step 3610 (Global: 3610): loss=0.1810, ppl=1.20, grad_norm=4.28, lr=8.26e-06, throughput=2770 tok/s +2025-11-19 16:35:48,249 - INFO - Epoch 1 Step 3620 (Global: 3620): loss=0.1556, ppl=1.17, grad_norm=2.70, lr=8.25e-06, throughput=2721 tok/s +2025-11-19 16:38:41,806 - INFO - Epoch 1 Step 3630 (Global: 3630): loss=0.1412, ppl=1.15, grad_norm=2.73, lr=8.23e-06, throughput=2766 tok/s +2025-11-19 16:41:35,331 - INFO - Epoch 1 Step 3640 (Global: 3640): loss=0.1822, ppl=1.20, grad_norm=4.59, lr=8.22e-06, throughput=2766 tok/s +2025-11-19 16:44:31,404 - INFO - Epoch 1 Step 3650 (Global: 3650): loss=0.1718, ppl=1.19, grad_norm=4.62, lr=8.21e-06, throughput=2726 tok/s +2025-11-19 16:47:22,564 - INFO - Epoch 1 Step 3660 (Global: 3660): loss=0.1756, ppl=1.19, grad_norm=4.78, lr=8.20e-06, throughput=2804 tok/s +2025-11-19 16:50:12,904 - INFO - Epoch 1 Step 3670 (Global: 3670): loss=0.1721, ppl=1.19, grad_norm=4.31, lr=8.18e-06, throughput=2818 tok/s +2025-11-19 16:53:08,070 - INFO - Epoch 1 Step 3680 (Global: 3680): loss=0.1300, ppl=1.14, grad_norm=4.03, lr=8.17e-06, throughput=2740 tok/s +2025-11-19 16:55:58,959 - INFO - Epoch 1 Step 3690 (Global: 3690): loss=0.1698, ppl=1.19, grad_norm=7.38, lr=8.16e-06, throughput=2809 tok/s +2025-11-19 16:58:51,893 - INFO - Epoch 1 Step 3700 (Global: 3700): loss=0.1835, ppl=1.20, grad_norm=4.38, lr=8.14e-06, throughput=2776 tok/s +2025-11-19 17:01:43,488 - INFO - Epoch 1 Step 3710 (Global: 3710): loss=0.1625, ppl=1.18, grad_norm=3.08, lr=8.13e-06, throughput=2797 tok/s +2025-11-19 
17:04:35,072 - INFO - Epoch 1 Step 3720 (Global: 3720): loss=0.1579, ppl=1.17, grad_norm=3.45, lr=8.12e-06, throughput=2798 tok/s +2025-11-19 17:07:29,882 - INFO - Epoch 1 Step 3730 (Global: 3730): loss=0.1930, ppl=1.21, grad_norm=5.19, lr=8.10e-06, throughput=2746 tok/s +2025-11-19 17:10:23,746 - INFO - Epoch 1 Step 3740 (Global: 3740): loss=0.2120, ppl=1.24, grad_norm=4.47, lr=8.09e-06, throughput=2761 tok/s +2025-11-19 17:13:21,471 - INFO - Epoch 1 Step 3750 (Global: 3750): loss=0.1647, ppl=1.18, grad_norm=3.25, lr=8.08e-06, throughput=2701 tok/s +2025-11-19 17:16:25,203 - INFO - Epoch 1 Step 3760 (Global: 3760): loss=0.1788, ppl=1.20, grad_norm=4.09, lr=8.06e-06, throughput=2613 tok/s +2025-11-19 17:19:18,749 - INFO - Epoch 1 Step 3770 (Global: 3770): loss=0.1680, ppl=1.18, grad_norm=3.89, lr=8.05e-06, throughput=2766 tok/s +2025-11-19 17:22:09,091 - INFO - Epoch 1 Step 3780 (Global: 3780): loss=0.1761, ppl=1.19, grad_norm=4.53, lr=8.04e-06, throughput=2818 tok/s +2025-11-19 17:25:02,254 - INFO - Epoch 1 Step 3790 (Global: 3790): loss=0.1588, ppl=1.17, grad_norm=2.69, lr=8.02e-06, throughput=2772 tok/s +2025-11-19 17:27:53,451 - INFO - Epoch 1 Step 3800 (Global: 3800): loss=0.1790, ppl=1.20, grad_norm=4.22, lr=8.01e-06, throughput=2804 tok/s +2025-11-19 17:30:43,843 - INFO - Epoch 1 Step 3810 (Global: 3810): loss=0.1707, ppl=1.19, grad_norm=3.77, lr=8.00e-06, throughput=2817 tok/s +2025-11-19 17:33:38,875 - INFO - Epoch 1 Step 3820 (Global: 3820): loss=0.1830, ppl=1.20, grad_norm=4.56, lr=7.98e-06, throughput=2742 tok/s +2025-11-19 17:36:30,116 - INFO - Epoch 1 Step 3830 (Global: 3830): loss=0.1737, ppl=1.19, grad_norm=3.25, lr=7.97e-06, throughput=2803 tok/s +2025-11-19 17:39:22,133 - INFO - Epoch 1 Step 3840 (Global: 3840): loss=0.1646, ppl=1.18, grad_norm=3.69, lr=7.96e-06, throughput=2790 tok/s +2025-11-19 17:42:13,672 - INFO - Epoch 1 Step 3850 (Global: 3850): loss=0.2175, ppl=1.24, grad_norm=3.41, lr=7.94e-06, throughput=2798 tok/s +2025-11-19 
17:45:02,946 - INFO - Epoch 1 Step 3860 (Global: 3860): loss=0.1455, ppl=1.16, grad_norm=4.44, lr=7.93e-06, throughput=2836 tok/s +2025-11-19 17:47:57,866 - INFO - Epoch 1 Step 3870 (Global: 3870): loss=0.1789, ppl=1.20, grad_norm=4.53, lr=7.92e-06, throughput=2744 tok/s +2025-11-19 17:50:50,190 - INFO - Epoch 1 Step 3880 (Global: 3880): loss=0.1734, ppl=1.19, grad_norm=7.44, lr=7.90e-06, throughput=2785 tok/s +2025-11-19 17:53:39,668 - INFO - Epoch 1 Step 3890 (Global: 3890): loss=0.1933, ppl=1.21, grad_norm=4.62, lr=7.89e-06, throughput=2832 tok/s +2025-11-19 17:56:33,742 - INFO - Epoch 1 Step 3900 (Global: 3900): loss=0.1583, ppl=1.17, grad_norm=3.34, lr=7.88e-06, throughput=2757 tok/s +2025-11-19 17:59:23,995 - INFO - Epoch 1 Step 3910 (Global: 3910): loss=0.2078, ppl=1.23, grad_norm=4.44, lr=7.86e-06, throughput=2819 tok/s +2025-11-19 18:02:14,570 - INFO - Epoch 1 Step 3920 (Global: 3920): loss=0.1648, ppl=1.18, grad_norm=3.75, lr=7.85e-06, throughput=2814 tok/s +2025-11-19 18:05:08,562 - INFO - Epoch 1 Step 3930 (Global: 3930): loss=0.1708, ppl=1.19, grad_norm=4.94, lr=7.83e-06, throughput=2759 tok/s +2025-11-19 18:07:59,607 - INFO - Epoch 1 Step 3940 (Global: 3940): loss=0.1687, ppl=1.18, grad_norm=5.34, lr=7.82e-06, throughput=2806 tok/s +2025-11-19 18:10:53,989 - INFO - Epoch 1 Step 3950 (Global: 3950): loss=0.1729, ppl=1.19, grad_norm=4.50, lr=7.81e-06, throughput=2753 tok/s +2025-11-19 18:13:46,283 - INFO - Epoch 1 Step 3960 (Global: 3960): loss=0.1806, ppl=1.20, grad_norm=3.88, lr=7.79e-06, throughput=2786 tok/s +2025-11-19 18:16:39,805 - INFO - Epoch 1 Step 3970 (Global: 3970): loss=0.1802, ppl=1.20, grad_norm=3.95, lr=7.78e-06, throughput=2766 tok/s +2025-11-19 18:19:38,072 - INFO - Epoch 1 Step 3980 (Global: 3980): loss=0.2068, ppl=1.23, grad_norm=3.72, lr=7.77e-06, throughput=2693 tok/s +2025-11-19 18:22:31,808 - INFO - Epoch 1 Step 3990 (Global: 3990): loss=0.1571, ppl=1.17, grad_norm=4.00, lr=7.75e-06, throughput=2763 tok/s +2025-11-19 
18:25:24,380 - INFO - Epoch 1 Step 4000 (Global: 4000): loss=0.1656, ppl=1.18, grad_norm=4.12, lr=7.74e-06, throughput=2781 tok/s +2025-11-19 18:25:24,381 - INFO - +Running validation at step 4000... +2025-11-19 18:35:43,100 - INFO - Validation loss: 0.1742, perplexity: 1.19 +2025-11-19 18:35:43,101 - INFO - Qualitative metrics (n=5): +2025-11-19 18:35:43,101 - INFO - BLEU: 0.8398 +2025-11-19 18:35:43,101 - INFO - METEOR: 0.8965 +2025-11-19 18:35:43,101 - INFO - Edit Distance: 0.1444 +2025-11-19 18:35:43,101 - INFO - F-measure: 0.8981 +2025-11-19 18:35:43,102 - INFO - +====================================================================== +2025-11-19 18:35:43,102 - INFO - Qualitative Evaluation Samples: +2025-11-19 18:35:43,102 - INFO - ====================================================================== +2025-11-19 18:35:43,102 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-19 18:35:43,102 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-19 18:35:43,103 - INFO - Generated: ' Q gave it four stars out of five and said that "Perhaps the album\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s no...' +2025-11-19 18:35:43,103 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-19 18:35:43,103 - INFO - ---------------------------------------------------------------------- +2025-11-19 18:35:43,103 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-19 18:35:43,103 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-19 18:35:43,104 - INFO - Generated: 's, was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of Army RO...' +2025-11-19 18:35:43,104 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 18:35:43,104 - INFO - ---------------------------------------------------------------------- +2025-11-19 18:35:43,104 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-19 18:35:43,104 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-19 18:35:43,105 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beet slops the ax and b...' +2025-11-19 18:35:43,105 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-19 18:35:43,105 - INFO - ---------------------------------------------------------------------- +2025-11-19 18:35:43,105 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-19 18:35:43,105 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-19 18:35:43,106 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 18:35:43,106 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-19 18:35:43,106 - INFO - ---------------------------------------------------------------------- +2025-11-19 18:35:43,106 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-19 18:35:43,106 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-19 18:35:43,107 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxim Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stygian A...' +2025-11-19 18:35:43,107 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 18:35:43,107 - INFO - ---------------------------------------------------------------------- +2025-11-19 18:35:43,109 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_4000.jsonl +2025-11-19 18:36:29,466 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-19 18:36:29,491 - INFO - New best validation loss: 0.1742, perplexity: 1.19 +2025-11-19 18:39:24,764 - INFO - Epoch 1 Step 4010 (Global: 4010): loss=0.1670, ppl=1.18, grad_norm=3.30, lr=7.72e-06, throughput=2739 tok/s +2025-11-19 18:42:20,665 - INFO - Epoch 1 Step 4020 (Global: 4020): loss=0.2014, ppl=1.22, grad_norm=4.28, lr=7.71e-06, throughput=2729 tok/s +2025-11-19 18:45:11,461 - INFO - Epoch 1 Step 4030 (Global: 4030): loss=0.1414, ppl=1.15, grad_norm=5.56, lr=7.70e-06, throughput=2810 tok/s +2025-11-19 18:48:01,493 - INFO - Epoch 1 Step 4040 (Global: 4040): loss=0.1766, ppl=1.19, grad_norm=5.06, lr=7.68e-06, throughput=2823 tok/s +2025-11-19 18:50:56,371 - INFO - Epoch 1 Step 4050 (Global: 4050): loss=0.1572, ppl=1.17, grad_norm=7.06, lr=7.67e-06, throughput=2745 tok/s +2025-11-19 18:53:48,270 - INFO - Epoch 1 Step 4060 (Global: 4060): loss=0.2009, ppl=1.22, grad_norm=5.78, lr=7.65e-06, throughput=2792 tok/s +2025-11-19 18:56:48,262 - INFO - Epoch 1 Step 4070 (Global: 4070): loss=0.1667, ppl=1.18, 
grad_norm=3.86, lr=7.64e-06, throughput=2667 tok/s +2025-11-19 18:59:43,238 - INFO - Epoch 1 Step 4080 (Global: 4080): loss=0.1667, ppl=1.18, grad_norm=4.19, lr=7.62e-06, throughput=2743 tok/s +2025-11-19 19:02:34,966 - INFO - Epoch 1 Step 4090 (Global: 4090): loss=0.1996, ppl=1.22, grad_norm=4.84, lr=7.61e-06, throughput=2795 tok/s +2025-11-19 19:05:25,225 - INFO - Epoch 1 Step 4100 (Global: 4100): loss=0.1641, ppl=1.18, grad_norm=4.12, lr=7.60e-06, throughput=2819 tok/s +2025-11-19 19:08:20,476 - INFO - Epoch 1 Step 4110 (Global: 4110): loss=0.1703, ppl=1.19, grad_norm=3.41, lr=7.58e-06, throughput=2739 tok/s +2025-11-19 19:11:11,180 - INFO - Epoch 1 Step 4120 (Global: 4120): loss=0.1487, ppl=1.16, grad_norm=3.25, lr=7.57e-06, throughput=2812 tok/s +2025-11-19 19:14:03,953 - INFO - Epoch 1 Step 4130 (Global: 4130): loss=0.1880, ppl=1.21, grad_norm=3.91, lr=7.55e-06, throughput=2778 tok/s +2025-11-19 19:16:56,064 - INFO - Epoch 1 Step 4140 (Global: 4140): loss=0.1717, ppl=1.19, grad_norm=4.09, lr=7.54e-06, throughput=2789 tok/s +2025-11-19 19:19:45,633 - INFO - Epoch 1 Step 4150 (Global: 4150): loss=0.1797, ppl=1.20, grad_norm=4.12, lr=7.52e-06, throughput=2831 tok/s +2025-11-19 19:22:39,481 - INFO - Epoch 1 Step 4160 (Global: 4160): loss=0.1661, ppl=1.18, grad_norm=3.45, lr=7.51e-06, throughput=2761 tok/s +2025-11-19 19:25:30,051 - INFO - Epoch 1 Step 4170 (Global: 4170): loss=0.1874, ppl=1.21, grad_norm=3.89, lr=7.49e-06, throughput=2814 tok/s +2025-11-19 19:28:21,337 - INFO - Epoch 1 Step 4180 (Global: 4180): loss=0.1725, ppl=1.19, grad_norm=4.94, lr=7.48e-06, throughput=2802 tok/s +2025-11-19 19:31:15,547 - INFO - Epoch 1 Step 4190 (Global: 4190): loss=0.1535, ppl=1.17, grad_norm=3.91, lr=7.47e-06, throughput=2755 tok/s +2025-11-19 19:34:06,898 - INFO - Epoch 1 Step 4200 (Global: 4200): loss=0.1928, ppl=1.21, grad_norm=4.12, lr=7.45e-06, throughput=2801 tok/s +2025-11-19 19:37:00,993 - INFO - Epoch 1 Step 4210 (Global: 4210): loss=0.1654, ppl=1.18, 
grad_norm=6.62, lr=7.44e-06, throughput=2757 tok/s +2025-11-19 19:39:53,348 - INFO - Epoch 1 Step 4220 (Global: 4220): loss=0.1633, ppl=1.18, grad_norm=3.81, lr=7.42e-06, throughput=2785 tok/s +2025-11-19 19:42:43,626 - INFO - Epoch 1 Step 4230 (Global: 4230): loss=0.1704, ppl=1.19, grad_norm=4.56, lr=7.41e-06, throughput=2819 tok/s +2025-11-19 19:45:37,641 - INFO - Epoch 1 Step 4240 (Global: 4240): loss=0.1659, ppl=1.18, grad_norm=3.05, lr=7.39e-06, throughput=2758 tok/s +2025-11-19 19:48:29,161 - INFO - Epoch 1 Step 4250 (Global: 4250): loss=0.1751, ppl=1.19, grad_norm=3.94, lr=7.38e-06, throughput=2799 tok/s +2025-11-19 19:51:20,291 - INFO - Epoch 1 Step 4260 (Global: 4260): loss=0.1548, ppl=1.17, grad_norm=3.95, lr=7.36e-06, throughput=2805 tok/s +2025-11-19 19:54:13,699 - INFO - Epoch 1 Step 4270 (Global: 4270): loss=0.1617, ppl=1.18, grad_norm=3.30, lr=7.35e-06, throughput=2768 tok/s +2025-11-19 19:57:04,322 - INFO - Epoch 1 Step 4280 (Global: 4280): loss=0.1427, ppl=1.15, grad_norm=4.22, lr=7.33e-06, throughput=2813 tok/s +2025-11-19 19:59:54,972 - INFO - Epoch 1 Step 4290 (Global: 4290): loss=0.1523, ppl=1.16, grad_norm=3.22, lr=7.32e-06, throughput=2813 tok/s +2025-11-19 20:02:47,184 - INFO - Epoch 1 Step 4300 (Global: 4300): loss=0.1383, ppl=1.15, grad_norm=3.91, lr=7.30e-06, throughput=2787 tok/s +2025-11-19 20:05:38,707 - INFO - Epoch 1 Step 4310 (Global: 4310): loss=0.1364, ppl=1.15, grad_norm=3.41, lr=7.29e-06, throughput=2799 tok/s +2025-11-19 20:08:29,166 - INFO - Epoch 1 Step 4320 (Global: 4320): loss=0.1807, ppl=1.20, grad_norm=3.58, lr=7.27e-06, throughput=2816 tok/s +2025-11-19 20:11:18,391 - INFO - Epoch 1 Step 4330 (Global: 4330): loss=0.1571, ppl=1.17, grad_norm=3.84, lr=7.26e-06, throughput=2836 tok/s +2025-11-19 20:14:06,723 - INFO - Epoch 1 Step 4340 (Global: 4340): loss=0.1535, ppl=1.17, grad_norm=3.72, lr=7.24e-06, throughput=2852 tok/s +2025-11-19 20:17:00,771 - INFO - Epoch 1 Step 4350 (Global: 4350): loss=0.1521, ppl=1.16, 
grad_norm=4.56, lr=7.23e-06, throughput=2758 tok/s +2025-11-19 20:19:53,644 - INFO - Epoch 1 Step 4360 (Global: 4360): loss=0.1429, ppl=1.15, grad_norm=3.41, lr=7.21e-06, throughput=2777 tok/s +2025-11-19 20:22:43,297 - INFO - Epoch 1 Step 4370 (Global: 4370): loss=0.1731, ppl=1.19, grad_norm=5.28, lr=7.20e-06, throughput=2829 tok/s +2025-11-19 20:25:32,712 - INFO - Epoch 1 Step 4380 (Global: 4380): loss=0.1698, ppl=1.19, grad_norm=6.78, lr=7.18e-06, throughput=2833 tok/s +2025-11-19 20:28:35,511 - INFO - Epoch 1 Step 4390 (Global: 4390): loss=0.1634, ppl=1.18, grad_norm=4.69, lr=7.17e-06, throughput=2626 tok/s +2025-11-19 20:31:25,595 - INFO - Epoch 1 Step 4400 (Global: 4400): loss=0.1463, ppl=1.16, grad_norm=3.53, lr=7.15e-06, throughput=2822 tok/s +2025-11-19 20:34:21,311 - INFO - Epoch 1 Step 4410 (Global: 4410): loss=0.1684, ppl=1.18, grad_norm=3.91, lr=7.14e-06, throughput=2732 tok/s +2025-11-19 20:37:16,576 - INFO - Epoch 1 Step 4420 (Global: 4420): loss=0.1823, ppl=1.20, grad_norm=3.94, lr=7.12e-06, throughput=2739 tok/s +2025-11-19 20:40:09,777 - INFO - Epoch 1 Step 4430 (Global: 4430): loss=0.1552, ppl=1.17, grad_norm=3.83, lr=7.11e-06, throughput=2771 tok/s +2025-11-19 20:43:03,070 - INFO - Epoch 1 Step 4440 (Global: 4440): loss=0.1651, ppl=1.18, grad_norm=6.56, lr=7.09e-06, throughput=2770 tok/s +2025-11-19 20:45:55,036 - INFO - Epoch 1 Step 4450 (Global: 4450): loss=0.1431, ppl=1.15, grad_norm=4.09, lr=7.08e-06, throughput=2791 tok/s +2025-11-19 20:48:47,557 - INFO - Epoch 1 Step 4460 (Global: 4460): loss=0.1710, ppl=1.19, grad_norm=5.38, lr=7.06e-06, throughput=2782 tok/s +2025-11-19 20:51:38,599 - INFO - Epoch 1 Step 4470 (Global: 4470): loss=0.1528, ppl=1.17, grad_norm=3.91, lr=7.05e-06, throughput=2806 tok/s +2025-11-19 20:54:30,357 - INFO - Epoch 1 Step 4480 (Global: 4480): loss=0.1723, ppl=1.19, grad_norm=3.72, lr=7.03e-06, throughput=2795 tok/s +2025-11-19 20:57:19,541 - INFO - Epoch 1 Step 4490 (Global: 4490): loss=0.1587, ppl=1.17, 
grad_norm=4.88, lr=7.02e-06, throughput=2837 tok/s +2025-11-19 21:00:09,645 - INFO - Epoch 1 Step 4500 (Global: 4500): loss=0.1574, ppl=1.17, grad_norm=4.16, lr=7.00e-06, throughput=2822 tok/s +2025-11-19 21:00:09,646 - INFO - +Running validation at step 4500... +2025-11-19 21:09:55,449 - INFO - Validation loss: 0.1662, perplexity: 1.18 +2025-11-19 21:09:55,449 - INFO - Qualitative metrics (n=5): +2025-11-19 21:09:55,449 - INFO - BLEU: 0.8469 +2025-11-19 21:09:55,449 - INFO - METEOR: 0.9118 +2025-11-19 21:09:55,450 - INFO - Edit Distance: 0.1526 +2025-11-19 21:09:55,450 - INFO - F-measure: 0.9164 +2025-11-19 21:09:55,450 - INFO - +====================================================================== +2025-11-19 21:09:55,450 - INFO - Qualitative Evaluation Samples: +2025-11-19 21:09:55,450 - INFO - ====================================================================== +2025-11-19 21:09:55,450 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-19 21:09:55,450 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-19 21:09:55,450 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps the album\'s seemingly illogical suggestion of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s not...' +2025-11-19 21:09:55,450 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-19 21:09:55,450 - INFO - ---------------------------------------------------------------------- +2025-11-19 21:09:55,450 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-19 21:09:55,451 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-19 21:09:55,451 - INFO - Generated: 's, was Sierra Obu–Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of ARGY ROT...' +2025-11-19 21:09:55,451 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 21:09:55,451 - INFO - ---------------------------------------------------------------------- +2025-11-19 21:09:55,451 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-19 21:09:55,451 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-19 21:09:55,451 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beet stops the ax and b...' +2025-11-19 21:09:55,451 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-19 21:09:55,451 - INFO - ---------------------------------------------------------------------- +2025-11-19 21:09:55,452 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-19 21:09:55,452 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-19 21:09:55,452 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 21:09:55,452 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-19 21:09:55,452 - INFO - ---------------------------------------------------------------------- +2025-11-19 21:09:55,452 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-19 21:09:55,452 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-19 21:09:55,452 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underw...' +2025-11-19 21:09:55,452 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 21:09:55,453 - INFO - ---------------------------------------------------------------------- +2025-11-19 21:09:55,454 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_4500.jsonl +2025-11-19 21:10:55,273 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-19 21:10:55,285 - INFO - New best validation loss: 0.1662, perplexity: 1.18 +2025-11-19 21:13:46,420 - INFO - Epoch 1 Step 4510 (Global: 4510): loss=0.1527, ppl=1.16, grad_norm=3.17, lr=6.99e-06, throughput=2805 tok/s +2025-11-19 21:16:35,915 - INFO - Epoch 1 Step 4520 (Global: 4520): loss=0.1550, ppl=1.17, grad_norm=3.12, lr=6.97e-06, throughput=2832 tok/s +2025-11-19 21:19:27,616 - INFO - Epoch 1 Step 4530 (Global: 4530): loss=0.1583, ppl=1.17, grad_norm=3.97, lr=6.96e-06, throughput=2796 tok/s +2025-11-19 21:22:19,887 - INFO - Epoch 1 Step 4540 (Global: 4540): loss=0.1522, ppl=1.16, grad_norm=4.22, lr=6.94e-06, throughput=2786 tok/s +2025-11-19 21:25:14,225 - INFO - Epoch 1 Step 4550 (Global: 4550): loss=0.1695, ppl=1.18, grad_norm=3.53, lr=6.92e-06, throughput=2753 tok/s +2025-11-19 21:28:07,606 - INFO - Epoch 1 Step 4560 (Global: 4560): loss=0.1661, ppl=1.18, grad_norm=9.81, lr=6.91e-06, throughput=2769 tok/s +2025-11-19 21:31:01,219 - INFO - Epoch 1 Step 4570 (Global: 4570): loss=0.1660, ppl=1.18, grad_norm=4.38, 
lr=6.89e-06, throughput=2765 tok/s +2025-11-19 21:33:56,714 - INFO - Epoch 1 Step 4580 (Global: 4580): loss=0.1504, ppl=1.16, grad_norm=2.72, lr=6.88e-06, throughput=2735 tok/s +2025-11-19 21:36:50,890 - INFO - Epoch 1 Step 4590 (Global: 4590): loss=0.1498, ppl=1.16, grad_norm=3.78, lr=6.86e-06, throughput=2756 tok/s +2025-11-19 21:39:42,304 - INFO - Epoch 1 Step 4600 (Global: 4600): loss=0.1584, ppl=1.17, grad_norm=4.69, lr=6.85e-06, throughput=2800 tok/s +2025-11-19 21:42:33,380 - INFO - Epoch 1 Step 4610 (Global: 4610): loss=0.1874, ppl=1.21, grad_norm=3.22, lr=6.83e-06, throughput=2806 tok/s +2025-11-19 21:45:24,004 - INFO - Epoch 1 Step 4620 (Global: 4620): loss=0.1760, ppl=1.19, grad_norm=4.91, lr=6.82e-06, throughput=2813 tok/s +2025-11-19 21:48:13,763 - INFO - Epoch 1 Step 4630 (Global: 4630): loss=0.1595, ppl=1.17, grad_norm=4.53, lr=6.80e-06, throughput=2828 tok/s +2025-11-19 21:51:06,575 - INFO - Epoch 1 Step 4640 (Global: 4640): loss=0.1835, ppl=1.20, grad_norm=3.48, lr=6.78e-06, throughput=2778 tok/s +2025-11-19 21:53:57,597 - INFO - Epoch 1 Step 4650 (Global: 4650): loss=0.1734, ppl=1.19, grad_norm=4.06, lr=6.77e-06, throughput=2807 tok/s +2025-11-19 21:56:49,856 - INFO - Epoch 1 Step 4660 (Global: 4660): loss=0.1491, ppl=1.16, grad_norm=3.23, lr=6.75e-06, throughput=2787 tok/s +2025-11-19 21:59:50,529 - INFO - Epoch 1 Step 4670 (Global: 4670): loss=0.1661, ppl=1.18, grad_norm=3.86, lr=6.74e-06, throughput=2657 tok/s +2025-11-19 22:02:41,950 - INFO - Epoch 1 Step 4680 (Global: 4680): loss=0.1400, ppl=1.15, grad_norm=3.34, lr=6.72e-06, throughput=2800 tok/s +2025-11-19 22:05:34,972 - INFO - Epoch 1 Step 4690 (Global: 4690): loss=0.1454, ppl=1.16, grad_norm=4.09, lr=6.71e-06, throughput=2774 tok/s +2025-11-19 22:08:27,992 - INFO - Epoch 1 Step 4700 (Global: 4700): loss=0.1567, ppl=1.17, grad_norm=5.22, lr=6.69e-06, throughput=2774 tok/s +2025-11-19 22:11:20,349 - INFO - Epoch 1 Step 4710 (Global: 4710): loss=0.1522, ppl=1.16, grad_norm=4.94, 
lr=6.67e-06, throughput=2785 tok/s +2025-11-19 22:14:13,480 - INFO - Epoch 1 Step 4720 (Global: 4720): loss=0.1674, ppl=1.18, grad_norm=5.50, lr=6.66e-06, throughput=2773 tok/s +2025-11-19 22:17:05,162 - INFO - Epoch 1 Step 4730 (Global: 4730): loss=0.1600, ppl=1.17, grad_norm=3.27, lr=6.64e-06, throughput=2796 tok/s +2025-11-19 22:19:57,663 - INFO - Epoch 1 Step 4740 (Global: 4740): loss=0.1882, ppl=1.21, grad_norm=3.89, lr=6.63e-06, throughput=2783 tok/s +2025-11-19 22:22:50,254 - INFO - Epoch 1 Step 4750 (Global: 4750): loss=0.1615, ppl=1.18, grad_norm=4.16, lr=6.61e-06, throughput=2781 tok/s +2025-11-19 22:25:43,832 - INFO - Epoch 1 Step 4760 (Global: 4760): loss=0.1643, ppl=1.18, grad_norm=4.28, lr=6.60e-06, throughput=2765 tok/s +2025-11-19 22:28:35,200 - INFO - Epoch 1 Step 4770 (Global: 4770): loss=0.1622, ppl=1.18, grad_norm=5.22, lr=6.58e-06, throughput=2801 tok/s +2025-11-19 22:31:25,583 - INFO - Epoch 1 Step 4780 (Global: 4780): loss=0.1599, ppl=1.17, grad_norm=3.95, lr=6.56e-06, throughput=2817 tok/s +2025-11-19 22:34:17,728 - INFO - Epoch 1 Step 4790 (Global: 4790): loss=0.1335, ppl=1.14, grad_norm=2.94, lr=6.55e-06, throughput=2788 tok/s +2025-11-19 22:37:10,191 - INFO - Epoch 1 Step 4800 (Global: 4800): loss=0.1831, ppl=1.20, grad_norm=3.06, lr=6.53e-06, throughput=2783 tok/s +2025-11-19 22:40:07,975 - INFO - Epoch 1 Step 4810 (Global: 4810): loss=0.1820, ppl=1.20, grad_norm=5.03, lr=6.52e-06, throughput=2700 tok/s +2025-11-19 22:43:09,696 - INFO - Epoch 1 Step 4820 (Global: 4820): loss=0.1635, ppl=1.18, grad_norm=4.19, lr=6.50e-06, throughput=2641 tok/s +2025-11-19 22:46:03,688 - INFO - Epoch 1 Step 4830 (Global: 4830): loss=0.1611, ppl=1.17, grad_norm=3.59, lr=6.48e-06, throughput=2759 tok/s +2025-11-19 22:48:58,395 - INFO - Epoch 1 Step 4840 (Global: 4840): loss=0.1576, ppl=1.17, grad_norm=2.98, lr=6.47e-06, throughput=2748 tok/s +2025-11-19 22:51:51,545 - INFO - Epoch 1 Step 4850 (Global: 4850): loss=0.1535, ppl=1.17, grad_norm=6.88, 
lr=6.45e-06, throughput=2772 tok/s +2025-11-19 22:54:43,842 - INFO - Epoch 1 Step 4860 (Global: 4860): loss=0.1664, ppl=1.18, grad_norm=4.22, lr=6.44e-06, throughput=2786 tok/s +2025-11-19 22:57:35,087 - INFO - Epoch 1 Step 4870 (Global: 4870): loss=0.1632, ppl=1.18, grad_norm=4.19, lr=6.42e-06, throughput=2803 tok/s +2025-11-19 23:00:29,193 - INFO - Epoch 1 Step 4880 (Global: 4880): loss=0.1319, ppl=1.14, grad_norm=4.88, lr=6.40e-06, throughput=2757 tok/s +2025-11-19 23:03:25,501 - INFO - Epoch 1 Step 4890 (Global: 4890): loss=0.1364, ppl=1.15, grad_norm=3.77, lr=6.39e-06, throughput=2723 tok/s +2025-11-19 23:06:21,948 - INFO - Epoch 1 Step 4900 (Global: 4900): loss=0.1599, ppl=1.17, grad_norm=4.84, lr=6.37e-06, throughput=2720 tok/s +2025-11-19 23:09:17,314 - INFO - Epoch 1 Step 4910 (Global: 4910): loss=0.1438, ppl=1.15, grad_norm=3.42, lr=6.35e-06, throughput=2737 tok/s +2025-11-19 23:12:10,711 - INFO - Epoch 1 Step 4920 (Global: 4920): loss=0.1624, ppl=1.18, grad_norm=3.14, lr=6.34e-06, throughput=2768 tok/s +2025-11-19 23:15:04,275 - INFO - Epoch 1 Step 4930 (Global: 4930): loss=0.1517, ppl=1.16, grad_norm=3.73, lr=6.32e-06, throughput=2766 tok/s +2025-11-19 23:17:57,628 - INFO - Epoch 1 Step 4940 (Global: 4940): loss=0.1381, ppl=1.15, grad_norm=3.53, lr=6.31e-06, throughput=2769 tok/s +2025-11-19 23:20:50,553 - INFO - Epoch 1 Step 4950 (Global: 4950): loss=0.2048, ppl=1.23, grad_norm=4.00, lr=6.29e-06, throughput=2776 tok/s +2025-11-19 23:23:43,485 - INFO - Epoch 1 Step 4960 (Global: 4960): loss=0.1436, ppl=1.15, grad_norm=7.03, lr=6.27e-06, throughput=2776 tok/s +2025-11-19 23:26:36,548 - INFO - Epoch 1 Step 4970 (Global: 4970): loss=0.1391, ppl=1.15, grad_norm=3.34, lr=6.26e-06, throughput=2774 tok/s +2025-11-19 23:29:28,950 - INFO - Epoch 1 Step 4980 (Global: 4980): loss=0.1430, ppl=1.15, grad_norm=3.50, lr=6.24e-06, throughput=2784 tok/s +2025-11-19 23:32:22,263 - INFO - Epoch 1 Step 4990 (Global: 4990): loss=0.1730, ppl=1.19, grad_norm=4.47, 
lr=6.23e-06, throughput=2770 tok/s +2025-11-19 23:35:14,333 - INFO - Epoch 1 Step 5000 (Global: 5000): loss=0.1654, ppl=1.18, grad_norm=3.47, lr=6.21e-06, throughput=2790 tok/s +2025-11-19 23:35:14,333 - INFO - +Running validation at step 5000... +2025-11-19 23:45:09,591 - INFO - Validation loss: 0.1556, perplexity: 1.17 +2025-11-19 23:45:09,592 - INFO - Qualitative metrics (n=5): +2025-11-19 23:45:09,592 - INFO - BLEU: 0.8320 +2025-11-19 23:45:09,592 - INFO - METEOR: 0.8916 +2025-11-19 23:45:09,592 - INFO - Edit Distance: 0.1388 +2025-11-19 23:45:09,592 - INFO - F-measure: 0.8971 +2025-11-19 23:45:09,592 - INFO - +====================================================================== +2025-11-19 23:45:09,593 - INFO - Qualitative Evaluation Samples: +2025-11-19 23:45:09,593 - INFO - ====================================================================== +2025-11-19 23:45:09,593 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-19 23:45:09,593 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-19 23:45:09,593 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s seemingly] illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-19 23:45:09,593 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-19 23:45:09,594 - INFO - ---------------------------------------------------------------------- +2025-11-19 23:45:09,594 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-19 23:45:09,594 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-19 23:45:09,594 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of ARoy ROT...' +2025-11-19 23:45:09,594 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-19 23:45:09,595 - INFO - ---------------------------------------------------------------------- +2025-11-19 23:45:09,595 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-19 23:45:09,595 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-19 23:45:09,595 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beet stops the ax and b...' +2025-11-19 23:45:09,595 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-19 23:45:09,596 - INFO - ---------------------------------------------------------------------- +2025-11-19 23:45:09,596 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-19 23:45:09,596 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-19 23:45:09,596 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-19 23:45:09,596 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-19 23:45:09,596 - INFO - ---------------------------------------------------------------------- +2025-11-19 23:45:09,597 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-19 23:45:09,597 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-19 23:45:09,597 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stygian Abyss ...' +2025-11-19 23:45:09,597 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-19 23:45:09,597 - INFO - ---------------------------------------------------------------------- +2025-11-19 23:45:09,598 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_5000.jsonl +2025-11-19 23:46:02,193 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-19 23:46:02,216 - INFO - New best validation loss: 0.1556, perplexity: 1.17 +2025-11-19 23:48:55,757 - INFO - Epoch 1 Step 5010 (Global: 5010): loss=0.1760, ppl=1.19, grad_norm=3.98, lr=6.19e-06, throughput=2766 tok/s +2025-11-19 23:51:48,815 - INFO - Epoch 1 Step 5020 (Global: 5020): loss=0.1614, ppl=1.18, grad_norm=5.00, lr=6.18e-06, throughput=2774 tok/s +2025-11-19 23:54:51,804 - INFO - Epoch 1 Step 5030 (Global: 5030): loss=0.1345, ppl=1.14, grad_norm=2.89, lr=6.16e-06, throughput=2623 tok/s +2025-11-19 23:57:43,853 - INFO - Epoch 1 Step 5040 (Global: 5040): loss=0.1401, ppl=1.15, grad_norm=4.38, lr=6.14e-06, throughput=2790 tok/s +2025-11-20 00:00:37,604 - INFO - Epoch 1 Step 5050 (Global: 5050): loss=0.1844, ppl=1.20, grad_norm=4.88, lr=6.13e-06, throughput=2763 tok/s +2025-11-20 00:03:29,656 - INFO - Epoch 1 Step 5060 (Global: 5060): loss=0.1304, ppl=1.14, grad_norm=2.84, lr=6.11e-06, throughput=2790 tok/s +2025-11-20 00:06:21,949 - INFO - Epoch 1 Step 5070 (Global: 5070): loss=0.1537, 
ppl=1.17, grad_norm=2.88, lr=6.10e-06, throughput=2786 tok/s +2025-11-20 00:09:12,607 - INFO - Epoch 1 Step 5080 (Global: 5080): loss=0.1888, ppl=1.21, grad_norm=4.12, lr=6.08e-06, throughput=2813 tok/s +2025-11-20 00:12:04,026 - INFO - Epoch 1 Step 5090 (Global: 5090): loss=0.1780, ppl=1.19, grad_norm=4.47, lr=6.06e-06, throughput=2800 tok/s +2025-11-20 00:14:54,739 - INFO - Epoch 1 Step 5100 (Global: 5100): loss=0.1339, ppl=1.14, grad_norm=3.41, lr=6.05e-06, throughput=2812 tok/s +2025-11-20 00:17:46,077 - INFO - Epoch 1 Step 5110 (Global: 5110): loss=0.1455, ppl=1.16, grad_norm=4.09, lr=6.03e-06, throughput=2802 tok/s +2025-11-20 00:20:37,548 - INFO - Epoch 1 Step 5120 (Global: 5120): loss=0.1771, ppl=1.19, grad_norm=6.75, lr=6.01e-06, throughput=2799 tok/s +2025-11-20 00:23:29,554 - INFO - Epoch 1 Step 5130 (Global: 5130): loss=0.1594, ppl=1.17, grad_norm=7.19, lr=6.00e-06, throughput=2791 tok/s +2025-11-20 00:26:23,211 - INFO - Epoch 1 Step 5140 (Global: 5140): loss=0.1535, ppl=1.17, grad_norm=4.34, lr=5.98e-06, throughput=2764 tok/s +2025-11-20 00:29:16,287 - INFO - Epoch 1 Step 5150 (Global: 5150): loss=0.1495, ppl=1.16, grad_norm=4.25, lr=5.96e-06, throughput=2773 tok/s +2025-11-20 00:32:08,300 - INFO - Epoch 1 Step 5160 (Global: 5160): loss=0.1652, ppl=1.18, grad_norm=4.09, lr=5.95e-06, throughput=2791 tok/s +2025-11-20 00:35:02,004 - INFO - Epoch 1 Step 5170 (Global: 5170): loss=0.1649, ppl=1.18, grad_norm=4.19, lr=5.93e-06, throughput=2763 tok/s +2025-11-20 00:37:54,170 - INFO - Epoch 1 Step 5180 (Global: 5180): loss=0.1517, ppl=1.16, grad_norm=3.69, lr=5.91e-06, throughput=2788 tok/s +2025-11-20 00:40:47,269 - INFO - Epoch 1 Step 5190 (Global: 5190): loss=0.1444, ppl=1.16, grad_norm=5.19, lr=5.90e-06, throughput=2773 tok/s +2025-11-20 00:43:40,503 - INFO - Epoch 1 Step 5200 (Global: 5200): loss=0.1533, ppl=1.17, grad_norm=4.09, lr=5.88e-06, throughput=2771 tok/s +2025-11-20 00:46:33,281 - INFO - Epoch 1 Step 5210 (Global: 5210): loss=0.1766, ppl=1.19, 
grad_norm=5.81, lr=5.87e-06, throughput=2778 tok/s +2025-11-20 00:49:26,853 - INFO - Epoch 1 Step 5220 (Global: 5220): loss=0.1585, ppl=1.17, grad_norm=4.03, lr=5.85e-06, throughput=2765 tok/s +2025-11-20 00:52:18,643 - INFO - Epoch 1 Step 5230 (Global: 5230): loss=0.1485, ppl=1.16, grad_norm=4.34, lr=5.83e-06, throughput=2794 tok/s +2025-11-20 00:55:11,177 - INFO - Epoch 1 Step 5240 (Global: 5240): loss=0.1428, ppl=1.15, grad_norm=3.50, lr=5.82e-06, throughput=2782 tok/s +2025-11-20 00:58:02,780 - INFO - Epoch 1 Step 5250 (Global: 5250): loss=0.1514, ppl=1.16, grad_norm=3.22, lr=5.80e-06, throughput=2797 tok/s +2025-11-20 01:00:54,664 - INFO - Epoch 1 Step 5260 (Global: 5260): loss=0.1561, ppl=1.17, grad_norm=3.80, lr=5.78e-06, throughput=2793 tok/s +2025-11-20 01:03:50,481 - INFO - Epoch 1 Step 5270 (Global: 5270): loss=0.1537, ppl=1.17, grad_norm=4.81, lr=5.77e-06, throughput=2730 tok/s +2025-11-20 01:06:44,241 - INFO - Epoch 1 Step 5280 (Global: 5280): loss=0.1606, ppl=1.17, grad_norm=5.00, lr=5.75e-06, throughput=2762 tok/s +2025-11-20 01:09:37,648 - INFO - Epoch 1 Step 5290 (Global: 5290): loss=0.1287, ppl=1.14, grad_norm=3.80, lr=5.73e-06, throughput=2768 tok/s +2025-11-20 01:12:33,456 - INFO - Epoch 1 Step 5300 (Global: 5300): loss=0.1393, ppl=1.15, grad_norm=2.47, lr=5.72e-06, throughput=2730 tok/s +2025-11-20 01:15:27,926 - INFO - Epoch 1 Step 5310 (Global: 5310): loss=0.1433, ppl=1.15, grad_norm=2.91, lr=5.70e-06, throughput=2751 tok/s +2025-11-20 01:18:23,041 - INFO - Epoch 1 Step 5320 (Global: 5320): loss=0.1610, ppl=1.17, grad_norm=7.53, lr=5.68e-06, throughput=2741 tok/s +2025-11-20 01:21:15,299 - INFO - Epoch 1 Step 5330 (Global: 5330): loss=0.1601, ppl=1.17, grad_norm=3.94, lr=5.67e-06, throughput=2787 tok/s +2025-11-20 01:24:07,009 - INFO - Epoch 1 Step 5340 (Global: 5340): loss=0.1385, ppl=1.15, grad_norm=3.02, lr=5.65e-06, throughput=2795 tok/s +2025-11-20 01:27:01,444 - INFO - Epoch 1 Step 5350 (Global: 5350): loss=0.1474, ppl=1.16, 
grad_norm=5.62, lr=5.63e-06, throughput=2752 tok/s +2025-11-20 01:29:54,687 - INFO - Epoch 1 Step 5360 (Global: 5360): loss=0.1521, ppl=1.16, grad_norm=3.69, lr=5.62e-06, throughput=2771 tok/s +2025-11-20 01:32:49,762 - INFO - Epoch 1 Step 5370 (Global: 5370): loss=0.1631, ppl=1.18, grad_norm=3.89, lr=5.60e-06, throughput=2742 tok/s +2025-11-20 01:35:43,002 - INFO - Epoch 1 Step 5380 (Global: 5380): loss=0.1690, ppl=1.18, grad_norm=3.00, lr=5.58e-06, throughput=2771 tok/s +2025-11-20 01:38:35,719 - INFO - Epoch 1 Step 5390 (Global: 5390): loss=0.1507, ppl=1.16, grad_norm=3.53, lr=5.57e-06, throughput=2779 tok/s +2025-11-20 01:41:29,263 - INFO - Epoch 1 Step 5400 (Global: 5400): loss=0.1220, ppl=1.13, grad_norm=3.05, lr=5.55e-06, throughput=2766 tok/s +2025-11-20 01:44:22,171 - INFO - Epoch 1 Step 5410 (Global: 5410): loss=0.1439, ppl=1.15, grad_norm=3.03, lr=5.53e-06, throughput=2776 tok/s +2025-11-20 01:47:15,013 - INFO - Epoch 1 Step 5420 (Global: 5420): loss=0.1472, ppl=1.16, grad_norm=4.59, lr=5.52e-06, throughput=2777 tok/s +2025-11-20 01:50:05,651 - INFO - Epoch 1 Step 5430 (Global: 5430): loss=0.1602, ppl=1.17, grad_norm=4.47, lr=5.50e-06, throughput=2813 tok/s +2025-11-20 01:52:56,436 - INFO - Epoch 1 Step 5440 (Global: 5440): loss=0.1647, ppl=1.18, grad_norm=4.25, lr=5.48e-06, throughput=2811 tok/s +2025-11-20 01:55:46,571 - INFO - Epoch 1 Step 5450 (Global: 5450): loss=0.1704, ppl=1.19, grad_norm=3.33, lr=5.47e-06, throughput=2821 tok/s +2025-11-20 01:58:37,200 - INFO - Epoch 1 Step 5460 (Global: 5460): loss=0.1635, ppl=1.18, grad_norm=3.36, lr=5.45e-06, throughput=2813 tok/s +2025-11-20 02:01:27,975 - INFO - Epoch 1 Step 5470 (Global: 5470): loss=0.1505, ppl=1.16, grad_norm=3.97, lr=5.43e-06, throughput=2811 tok/s +2025-11-20 02:04:19,043 - INFO - Epoch 1 Step 5480 (Global: 5480): loss=0.1369, ppl=1.15, grad_norm=4.19, lr=5.42e-06, throughput=2806 tok/s +2025-11-20 02:07:09,542 - INFO - Epoch 1 Step 5490 (Global: 5490): loss=0.1523, ppl=1.16, 
grad_norm=4.31, lr=5.40e-06, throughput=2815 tok/s +2025-11-20 02:10:00,275 - INFO - Epoch 1 Step 5500 (Global: 5500): loss=0.1504, ppl=1.16, grad_norm=3.12, lr=5.38e-06, throughput=2811 tok/s +2025-11-20 02:10:00,276 - INFO - +Running validation at step 5500... +2025-11-20 02:20:06,250 - INFO - Validation loss: 0.1494, perplexity: 1.16 +2025-11-20 02:20:06,251 - INFO - Qualitative metrics (n=5): +2025-11-20 02:20:06,251 - INFO - BLEU: 0.8582 +2025-11-20 02:20:06,251 - INFO - METEOR: 0.9152 +2025-11-20 02:20:06,251 - INFO - Edit Distance: 0.1310 +2025-11-20 02:20:06,251 - INFO - F-measure: 0.9197 +2025-11-20 02:20:06,252 - INFO - +====================================================================== +2025-11-20 02:20:06,252 - INFO - Qualitative Evaluation Samples: +2025-11-20 02:20:06,252 - INFO - ====================================================================== +2025-11-20 02:20:06,252 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-20 02:20:06,252 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-20 02:20:06,253 - INFO - Generated: ' Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s a so-wyure. But it\'s n...' +2025-11-20 02:20:06,253 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-20 02:20:06,253 - INFO - ---------------------------------------------------------------------- +2025-11-20 02:20:06,253 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-20 02:20:06,253 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-20 02:20:06,253 - INFO - Generated: 's, was Simone Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of Army RO...' +2025-11-20 02:20:06,253 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-20 02:20:06,253 - INFO - ---------------------------------------------------------------------- +2025-11-20 02:20:06,254 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-20 02:20:06,254 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-20 02:20:06,254 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-20 02:20:06,254 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-20 02:20:06,255 - INFO - ---------------------------------------------------------------------- +2025-11-20 02:20:06,255 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-20 02:20:06,255 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-20 02:20:06,255 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-20 02:20:06,255 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-20 02:20:06,255 - INFO - ---------------------------------------------------------------------- +2025-11-20 02:20:06,255 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-20 02:20:06,256 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-20 02:20:06,256 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stygian Abyss ...' +2025-11-20 02:20:06,256 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-20 02:20:06,256 - INFO - ---------------------------------------------------------------------- +2025-11-20 02:20:06,257 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_5500.jsonl +2025-11-20 02:21:00,886 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 02:21:00,912 - INFO - New best validation loss: 0.1494, perplexity: 1.16 +2025-11-20 02:23:51,672 - INFO - Epoch 1 Step 5510 (Global: 5510): loss=0.1483, ppl=1.16, grad_norm=5.75, lr=5.37e-06, throughput=2811 tok/s +2025-11-20 02:26:42,338 - INFO - Epoch 1 Step 5520 (Global: 5520): loss=0.1459, ppl=1.16, grad_norm=3.47, lr=5.35e-06, throughput=2813 tok/s +2025-11-20 02:29:32,834 - INFO - Epoch 1 Step 5530 (Global: 5530): loss=0.1540, ppl=1.17, grad_norm=3.98, lr=5.33e-06, throughput=2815 tok/s +2025-11-20 02:32:22,696 - INFO - Epoch 1 Step 5540 (Global: 5540): loss=0.1381, ppl=1.15, grad_norm=3.39, lr=5.32e-06, throughput=2826 tok/s +2025-11-20 02:35:23,323 - INFO - Epoch 1 Step 5550 (Global: 5550): loss=0.1552, ppl=1.17, grad_norm=2.92, lr=5.30e-06, throughput=2657 tok/s +2025-11-20 02:38:13,452 - INFO - Epoch 1 Step 5560 (Global: 5560): loss=0.1333, ppl=1.14, grad_norm=3.09, lr=5.28e-06, throughput=2821 tok/s +2025-11-20 02:41:02,731 - INFO - Epoch 1 Step 5570 (Global: 5570): loss=0.1515, 
ppl=1.16, grad_norm=5.28, lr=5.27e-06, throughput=2836 tok/s +2025-11-20 02:43:51,903 - INFO - Epoch 1 Step 5580 (Global: 5580): loss=0.1513, ppl=1.16, grad_norm=4.47, lr=5.25e-06, throughput=2837 tok/s +2025-11-20 02:46:42,345 - INFO - Epoch 1 Step 5590 (Global: 5590): loss=0.1858, ppl=1.20, grad_norm=4.53, lr=5.23e-06, throughput=2816 tok/s +2025-11-20 02:49:32,303 - INFO - Epoch 1 Step 5600 (Global: 5600): loss=0.1618, ppl=1.18, grad_norm=3.70, lr=5.22e-06, throughput=2824 tok/s +2025-11-20 02:52:22,456 - INFO - Epoch 1 Step 5610 (Global: 5610): loss=0.1327, ppl=1.14, grad_norm=3.28, lr=5.20e-06, throughput=2821 tok/s +2025-11-20 02:55:12,615 - INFO - Epoch 1 Step 5620 (Global: 5620): loss=0.1746, ppl=1.19, grad_norm=3.28, lr=5.18e-06, throughput=2821 tok/s +2025-11-20 02:58:02,151 - INFO - Epoch 1 Step 5630 (Global: 5630): loss=0.1214, ppl=1.13, grad_norm=3.02, lr=5.17e-06, throughput=2831 tok/s +2025-11-20 03:00:52,707 - INFO - Epoch 1 Step 5640 (Global: 5640): loss=0.1285, ppl=1.14, grad_norm=3.52, lr=5.15e-06, throughput=2814 tok/s +2025-11-20 03:03:44,093 - INFO - Epoch 1 Step 5650 (Global: 5650): loss=0.1479, ppl=1.16, grad_norm=3.78, lr=5.13e-06, throughput=2801 tok/s +2025-11-20 03:06:34,715 - INFO - Epoch 1 Step 5660 (Global: 5660): loss=0.1589, ppl=1.17, grad_norm=3.50, lr=5.12e-06, throughput=2813 tok/s +2025-11-20 03:09:24,751 - INFO - Epoch 1 Step 5670 (Global: 5670): loss=0.1448, ppl=1.16, grad_norm=4.00, lr=5.10e-06, throughput=2823 tok/s +2025-11-20 03:12:15,495 - INFO - Epoch 1 Step 5680 (Global: 5680): loss=0.1526, ppl=1.16, grad_norm=4.25, lr=5.08e-06, throughput=2811 tok/s +2025-11-20 03:15:06,613 - INFO - Epoch 1 Step 5690 (Global: 5690): loss=0.1249, ppl=1.13, grad_norm=3.28, lr=5.07e-06, throughput=2805 tok/s +2025-11-20 03:17:57,685 - INFO - Epoch 1 Step 5700 (Global: 5700): loss=0.1355, ppl=1.15, grad_norm=2.86, lr=5.05e-06, throughput=2806 tok/s +2025-11-20 03:20:48,668 - INFO - Epoch 1 Step 5710 (Global: 5710): loss=0.1227, ppl=1.13, 
grad_norm=4.31, lr=5.03e-06, throughput=2807 tok/s +2025-11-20 03:23:39,009 - INFO - Epoch 1 Step 5720 (Global: 5720): loss=0.1382, ppl=1.15, grad_norm=3.52, lr=5.02e-06, throughput=2818 tok/s +2025-11-20 03:26:28,430 - INFO - Epoch 1 Step 5730 (Global: 5730): loss=0.1416, ppl=1.15, grad_norm=2.91, lr=5.00e-06, throughput=2833 tok/s +2025-11-20 03:29:20,121 - INFO - Epoch 1 Step 5740 (Global: 5740): loss=0.1487, ppl=1.16, grad_norm=4.53, lr=4.98e-06, throughput=2796 tok/s +2025-11-20 03:32:10,424 - INFO - Epoch 1 Step 5750 (Global: 5750): loss=0.1371, ppl=1.15, grad_norm=5.12, lr=4.96e-06, throughput=2819 tok/s +2025-11-20 03:34:59,551 - INFO - Epoch 1 Step 5760 (Global: 5760): loss=0.1423, ppl=1.15, grad_norm=5.41, lr=4.95e-06, throughput=2838 tok/s +2025-11-20 03:37:49,840 - INFO - Epoch 1 Step 5770 (Global: 5770): loss=0.1384, ppl=1.15, grad_norm=2.86, lr=4.93e-06, throughput=2819 tok/s +2025-11-20 03:40:41,259 - INFO - Epoch 1 Step 5780 (Global: 5780): loss=0.1387, ppl=1.15, grad_norm=5.56, lr=4.91e-06, throughput=2800 tok/s +2025-11-20 03:43:31,689 - INFO - Epoch 1 Step 5790 (Global: 5790): loss=0.1270, ppl=1.14, grad_norm=3.11, lr=4.90e-06, throughput=2816 tok/s +2025-11-20 03:46:20,862 - INFO - Epoch 1 Step 5800 (Global: 5800): loss=0.1471, ppl=1.16, grad_norm=3.38, lr=4.88e-06, throughput=2837 tok/s +2025-11-20 03:49:12,132 - INFO - Epoch 1 Step 5810 (Global: 5810): loss=0.1473, ppl=1.16, grad_norm=3.28, lr=4.86e-06, throughput=2803 tok/s +2025-11-20 03:52:00,628 - INFO - Epoch 1 Step 5820 (Global: 5820): loss=0.1241, ppl=1.13, grad_norm=3.89, lr=4.85e-06, throughput=2849 tok/s +2025-11-20 03:54:48,118 - INFO - Epoch 1 Step 5830 (Global: 5830): loss=0.1532, ppl=1.17, grad_norm=3.47, lr=4.83e-06, throughput=2866 tok/s +2025-11-20 03:57:36,017 - INFO - Epoch 1 Step 5840 (Global: 5840): loss=0.1418, ppl=1.15, grad_norm=3.28, lr=4.81e-06, throughput=2859 tok/s +2025-11-20 04:00:24,405 - INFO - Epoch 1 Step 5850 (Global: 5850): loss=0.1423, ppl=1.15, 
grad_norm=4.12, lr=4.80e-06, throughput=2851 tok/s +2025-11-20 04:03:13,576 - INFO - Epoch 1 Step 5860 (Global: 5860): loss=0.1704, ppl=1.19, grad_norm=3.97, lr=4.78e-06, throughput=2837 tok/s +2025-11-20 04:06:02,795 - INFO - Epoch 1 Step 5870 (Global: 5870): loss=0.1473, ppl=1.16, grad_norm=2.81, lr=4.76e-06, throughput=2837 tok/s +2025-11-20 04:08:51,716 - INFO - Epoch 1 Step 5880 (Global: 5880): loss=0.1293, ppl=1.14, grad_norm=3.91, lr=4.75e-06, throughput=2842 tok/s +2025-11-20 04:11:41,200 - INFO - Epoch 1 Step 5890 (Global: 5890): loss=0.1685, ppl=1.18, grad_norm=4.31, lr=4.73e-06, throughput=2832 tok/s +2025-11-20 04:14:30,666 - INFO - Epoch 1 Step 5900 (Global: 5900): loss=0.1774, ppl=1.19, grad_norm=5.44, lr=4.71e-06, throughput=2832 tok/s +2025-11-20 04:17:19,720 - INFO - Epoch 1 Step 5910 (Global: 5910): loss=0.1683, ppl=1.18, grad_norm=3.81, lr=4.70e-06, throughput=2839 tok/s +2025-11-20 04:20:08,256 - INFO - Epoch 1 Step 5920 (Global: 5920): loss=0.1406, ppl=1.15, grad_norm=3.06, lr=4.68e-06, throughput=2848 tok/s +2025-11-20 04:22:58,174 - INFO - Epoch 1 Step 5930 (Global: 5930): loss=0.1344, ppl=1.14, grad_norm=4.00, lr=4.66e-06, throughput=2825 tok/s +2025-11-20 04:25:47,368 - INFO - Epoch 1 Step 5940 (Global: 5940): loss=0.1337, ppl=1.14, grad_norm=3.02, lr=4.65e-06, throughput=2837 tok/s +2025-11-20 04:28:36,556 - INFO - Epoch 1 Step 5950 (Global: 5950): loss=0.1495, ppl=1.16, grad_norm=4.50, lr=4.63e-06, throughput=2837 tok/s +2025-11-20 04:31:23,580 - INFO - Epoch 1 Step 5960 (Global: 5960): loss=0.1171, ppl=1.12, grad_norm=2.88, lr=4.61e-06, throughput=2874 tok/s +2025-11-20 04:34:12,535 - INFO - Epoch 1 Step 5970 (Global: 5970): loss=0.1384, ppl=1.15, grad_norm=3.98, lr=4.60e-06, throughput=2841 tok/s +2025-11-20 04:37:02,358 - INFO - Epoch 1 Step 5980 (Global: 5980): loss=0.1362, ppl=1.15, grad_norm=3.28, lr=4.58e-06, throughput=2826 tok/s +2025-11-20 04:39:50,638 - INFO - Epoch 1 Step 5990 (Global: 5990): loss=0.1454, ppl=1.16, 
grad_norm=3.38, lr=4.56e-06, throughput=2852 tok/s +2025-11-20 04:42:39,557 - INFO - Epoch 1 Step 6000 (Global: 6000): loss=0.1508, ppl=1.16, grad_norm=3.45, lr=4.55e-06, throughput=2842 tok/s +2025-11-20 04:42:39,557 - INFO - +Running validation at step 6000... +2025-11-20 04:52:16,472 - INFO - Validation loss: 0.1451, perplexity: 1.16 +2025-11-20 04:52:16,473 - INFO - Qualitative metrics (n=5): +2025-11-20 04:52:16,473 - INFO - BLEU: 0.8322 +2025-11-20 04:52:16,473 - INFO - METEOR: 0.9035 +2025-11-20 04:52:16,473 - INFO - Edit Distance: 0.1404 +2025-11-20 04:52:16,473 - INFO - F-measure: 0.9059 +2025-11-20 04:52:16,473 - INFO - +====================================================================== +2025-11-20 04:52:16,473 - INFO - Qualitative Evaluation Samples: +2025-11-20 04:52:16,473 - INFO - ====================================================================== +2025-11-20 04:52:16,474 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-20 04:52:16,474 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-20 04:52:16,474 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s a yes–no–worse. But it\'...' +2025-11-20 04:52:16,474 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-20 04:52:16,474 - INFO - ---------------------------------------------------------------------- +2025-11-20 04:52:16,474 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-20 04:52:16,474 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-20 04:52:16,474 - INFO - Generated: 's, was Rienne Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of Army RO...' +2025-11-20 04:52:16,474 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-20 04:52:16,474 - INFO - ---------------------------------------------------------------------- +2025-11-20 04:52:16,474 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-20 04:52:16,475 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-20 04:52:16,475 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant axe, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and ...' +2025-11-20 04:52:16,475 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-20 04:52:16,475 - INFO - ---------------------------------------------------------------------- +2025-11-20 04:52:16,475 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-20 04:52:16,475 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-20 04:52:16,475 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-20 04:52:16,475 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-20 04:52:16,475 - INFO - ---------------------------------------------------------------------- +2025-11-20 04:52:16,475 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-20 04:52:16,476 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-20 04:52:16,476 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stygian A...' +2025-11-20 04:52:16,476 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-20 04:52:16,476 - INFO - ---------------------------------------------------------------------- +2025-11-20 04:52:16,477 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_6000.jsonl +2025-11-20 04:53:05,921 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 04:53:05,931 - INFO - New best validation loss: 0.1451, perplexity: 1.16 +2025-11-20 04:55:53,812 - INFO - Epoch 1 Step 6010 (Global: 6010): loss=0.1455, ppl=1.16, grad_norm=2.91, lr=4.53e-06, throughput=2859 tok/s +2025-11-20 04:58:41,871 - INFO - Epoch 1 Step 6020 (Global: 6020): loss=0.1306, ppl=1.14, grad_norm=3.38, lr=4.51e-06, throughput=2856 tok/s +2025-11-20 05:01:30,827 - INFO - Epoch 1 Step 6030 (Global: 6030): loss=0.1582, ppl=1.17, grad_norm=4.38, lr=4.50e-06, throughput=2841 tok/s +2025-11-20 05:04:20,183 - INFO - Epoch 1 Step 6040 (Global: 6040): loss=0.1631, ppl=1.18, grad_norm=3.67, lr=4.48e-06, throughput=2834 tok/s +2025-11-20 05:07:10,678 - INFO - Epoch 1 Step 6050 (Global: 6050): loss=0.1396, ppl=1.15, grad_norm=3.31, lr=4.46e-06, throughput=2815 tok/s +2025-11-20 05:10:10,340 - INFO - Epoch 1 Step 6060 (Global: 6060): loss=0.1322, ppl=1.14, grad_norm=4.59, lr=4.45e-06, throughput=2672 tok/s +2025-11-20 05:13:00,187 - INFO - Epoch 1 Step 6070 (Global: 6070): loss=0.1626, ppl=1.18, 
grad_norm=3.59, lr=4.43e-06, throughput=2826 tok/s +2025-11-20 05:15:50,307 - INFO - Epoch 1 Step 6080 (Global: 6080): loss=0.1382, ppl=1.15, grad_norm=3.73, lr=4.41e-06, throughput=2822 tok/s +2025-11-20 05:18:40,559 - INFO - Epoch 1 Step 6090 (Global: 6090): loss=0.1349, ppl=1.14, grad_norm=3.03, lr=4.40e-06, throughput=2819 tok/s +2025-11-20 05:21:30,199 - INFO - Epoch 1 Step 6100 (Global: 6100): loss=0.1411, ppl=1.15, grad_norm=4.09, lr=4.38e-06, throughput=2830 tok/s +2025-11-20 05:24:19,922 - INFO - Epoch 1 Step 6110 (Global: 6110): loss=0.1163, ppl=1.12, grad_norm=3.42, lr=4.36e-06, throughput=2828 tok/s +2025-11-20 05:27:08,244 - INFO - Epoch 1 Step 6120 (Global: 6120): loss=0.1784, ppl=1.20, grad_norm=3.11, lr=4.35e-06, throughput=2852 tok/s +2025-11-20 05:29:58,553 - INFO - Epoch 1 Step 6130 (Global: 6130): loss=0.1480, ppl=1.16, grad_norm=4.31, lr=4.33e-06, throughput=2818 tok/s +2025-11-20 05:32:48,058 - INFO - Epoch 1 Step 6140 (Global: 6140): loss=0.1546, ppl=1.17, grad_norm=4.22, lr=4.31e-06, throughput=2832 tok/s +2025-11-20 05:35:38,013 - INFO - Epoch 1 Step 6150 (Global: 6150): loss=0.1541, ppl=1.17, grad_norm=4.16, lr=4.30e-06, throughput=2824 tok/s +2025-11-20 05:38:28,147 - INFO - Epoch 1 Step 6160 (Global: 6160): loss=0.1471, ppl=1.16, grad_norm=4.41, lr=4.28e-06, throughput=2821 tok/s +2025-11-20 05:41:17,836 - INFO - Epoch 1 Step 6170 (Global: 6170): loss=0.1521, ppl=1.16, grad_norm=4.53, lr=4.26e-06, throughput=2829 tok/s +2025-11-20 05:44:07,898 - INFO - Epoch 1 Step 6180 (Global: 6180): loss=0.1611, ppl=1.17, grad_norm=3.73, lr=4.25e-06, throughput=2823 tok/s +2025-11-20 05:46:58,032 - INFO - Epoch 1 Step 6190 (Global: 6190): loss=0.1614, ppl=1.18, grad_norm=2.69, lr=4.23e-06, throughput=2821 tok/s +2025-11-20 05:49:48,545 - INFO - Epoch 1 Step 6200 (Global: 6200): loss=0.1266, ppl=1.13, grad_norm=2.73, lr=4.21e-06, throughput=2815 tok/s +2025-11-20 05:52:38,300 - INFO - Epoch 1 Step 6210 (Global: 6210): loss=0.1501, ppl=1.16, 
grad_norm=3.28, lr=4.20e-06, throughput=2828 tok/s +2025-11-20 05:55:28,342 - INFO - Epoch 1 Step 6220 (Global: 6220): loss=0.1465, ppl=1.16, grad_norm=4.16, lr=4.18e-06, throughput=2823 tok/s +2025-11-20 05:58:18,020 - INFO - Epoch 1 Step 6230 (Global: 6230): loss=0.1313, ppl=1.14, grad_norm=4.16, lr=4.16e-06, throughput=2829 tok/s +2025-11-20 06:01:06,495 - INFO - Epoch 1 Step 6240 (Global: 6240): loss=0.1354, ppl=1.14, grad_norm=3.55, lr=4.15e-06, throughput=2849 tok/s +2025-11-20 06:03:56,156 - INFO - Epoch 1 Step 6250 (Global: 6250): loss=0.1253, ppl=1.13, grad_norm=2.97, lr=4.13e-06, throughput=2829 tok/s +2025-11-20 06:06:45,172 - INFO - Epoch 1 Step 6260 (Global: 6260): loss=0.1428, ppl=1.15, grad_norm=3.61, lr=4.12e-06, throughput=2840 tok/s +2025-11-20 06:09:34,706 - INFO - Epoch 1 Step 6270 (Global: 6270): loss=0.1436, ppl=1.15, grad_norm=3.39, lr=4.10e-06, throughput=2831 tok/s +2025-11-20 06:12:23,999 - INFO - Epoch 1 Step 6280 (Global: 6280): loss=0.1206, ppl=1.13, grad_norm=3.17, lr=4.08e-06, throughput=2835 tok/s +2025-11-20 06:15:13,168 - INFO - Epoch 1 Step 6290 (Global: 6290): loss=0.1260, ppl=1.13, grad_norm=2.45, lr=4.07e-06, throughput=2837 tok/s +2025-11-20 06:18:02,609 - INFO - Epoch 1 Step 6300 (Global: 6300): loss=0.1520, ppl=1.16, grad_norm=3.36, lr=4.05e-06, throughput=2833 tok/s +2025-11-20 06:20:52,795 - INFO - Epoch 1 Step 6310 (Global: 6310): loss=0.1456, ppl=1.16, grad_norm=3.08, lr=4.03e-06, throughput=2820 tok/s +2025-11-20 06:23:42,366 - INFO - Epoch 1 Step 6320 (Global: 6320): loss=0.1648, ppl=1.18, grad_norm=3.81, lr=4.02e-06, throughput=2831 tok/s +2025-11-20 06:26:41,595 - INFO - Epoch 1 Step 6330 (Global: 6330): loss=0.1248, ppl=1.13, grad_norm=4.00, lr=4.00e-06, throughput=2678 tok/s +2025-11-20 06:29:32,418 - INFO - Epoch 1 Step 6340 (Global: 6340): loss=0.1468, ppl=1.16, grad_norm=3.16, lr=3.98e-06, throughput=2810 tok/s +2025-11-20 06:32:22,049 - INFO - Epoch 1 Step 6350 (Global: 6350): loss=0.1409, ppl=1.15, 
grad_norm=3.89, lr=3.97e-06, throughput=2830 tok/s +2025-11-20 06:35:11,503 - INFO - Epoch 1 Step 6360 (Global: 6360): loss=0.1316, ppl=1.14, grad_norm=4.34, lr=3.95e-06, throughput=2833 tok/s +2025-11-20 06:38:00,353 - INFO - Epoch 1 Step 6370 (Global: 6370): loss=0.1581, ppl=1.17, grad_norm=2.62, lr=3.93e-06, throughput=2843 tok/s +2025-11-20 06:40:49,761 - INFO - Epoch 1 Step 6380 (Global: 6380): loss=0.1481, ppl=1.16, grad_norm=3.56, lr=3.92e-06, throughput=2833 tok/s +2025-11-20 06:43:40,150 - INFO - Epoch 1 Step 6390 (Global: 6390): loss=0.1430, ppl=1.15, grad_norm=3.58, lr=3.90e-06, throughput=2817 tok/s +2025-11-20 06:46:30,536 - INFO - Epoch 1 Step 6400 (Global: 6400): loss=0.1662, ppl=1.18, grad_norm=4.34, lr=3.89e-06, throughput=2817 tok/s +2025-11-20 06:49:20,143 - INFO - Epoch 1 Step 6410 (Global: 6410): loss=0.1549, ppl=1.17, grad_norm=3.86, lr=3.87e-06, throughput=2830 tok/s +2025-11-20 06:52:11,284 - INFO - Epoch 1 Step 6420 (Global: 6420): loss=0.1332, ppl=1.14, grad_norm=3.62, lr=3.85e-06, throughput=2805 tok/s +2025-11-20 06:55:01,198 - INFO - Epoch 1 Step 6430 (Global: 6430): loss=0.1471, ppl=1.16, grad_norm=3.25, lr=3.84e-06, throughput=2825 tok/s +2025-11-20 06:57:52,236 - INFO - Epoch 1 Step 6440 (Global: 6440): loss=0.1535, ppl=1.17, grad_norm=4.59, lr=3.82e-06, throughput=2806 tok/s +2025-11-20 07:00:42,143 - INFO - Epoch 1 Step 6450 (Global: 6450): loss=0.1496, ppl=1.16, grad_norm=4.47, lr=3.80e-06, throughput=2825 tok/s +2025-11-20 07:03:31,452 - INFO - Epoch 1 Step 6460 (Global: 6460): loss=0.1486, ppl=1.16, grad_norm=4.81, lr=3.79e-06, throughput=2835 tok/s +2025-11-20 07:06:22,074 - INFO - Epoch 1 Step 6470 (Global: 6470): loss=0.1445, ppl=1.16, grad_norm=3.25, lr=3.77e-06, throughput=2813 tok/s +2025-11-20 07:09:12,042 - INFO - Epoch 1 Step 6480 (Global: 6480): loss=0.1462, ppl=1.16, grad_norm=3.47, lr=3.76e-06, throughput=2824 tok/s +2025-11-20 07:12:01,505 - INFO - Epoch 1 Step 6490 (Global: 6490): loss=0.1274, ppl=1.14, 
grad_norm=3.05, lr=3.74e-06, throughput=2833 tok/s +2025-11-20 07:14:50,574 - INFO - Epoch 1 Step 6500 (Global: 6500): loss=0.1546, ppl=1.17, grad_norm=3.27, lr=3.72e-06, throughput=2839 tok/s +2025-11-20 07:14:50,574 - INFO - +Running validation at step 6500... +2025-11-20 07:24:37,030 - INFO - Validation loss: 0.1407, perplexity: 1.15 +2025-11-20 07:24:37,031 - INFO - Qualitative metrics (n=5): +2025-11-20 07:24:37,031 - INFO - BLEU: 0.8395 +2025-11-20 07:24:37,031 - INFO - METEOR: 0.9086 +2025-11-20 07:24:37,031 - INFO - Edit Distance: 0.1454 +2025-11-20 07:24:37,031 - INFO - F-measure: 0.9135 +2025-11-20 07:24:37,031 - INFO - +====================================================================== +2025-11-20 07:24:37,031 - INFO - Qualitative Evaluation Samples: +2025-11-20 07:24:37,031 - INFO - ====================================================================== +2025-11-20 07:24:37,031 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-20 07:24:37,031 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-20 07:24:37,032 - INFO - Generated: ' Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s a yes–no–worse. But it...' +2025-11-20 07:24:37,032 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-20 07:24:37,032 - INFO - ---------------------------------------------------------------------- +2025-11-20 07:24:37,032 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-20 07:24:37,032 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-20 07:24:37,032 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-20 07:24:37,032 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-20 07:24:37,032 - INFO - ---------------------------------------------------------------------- +2025-11-20 07:24:37,032 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-20 07:24:37,032 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-20 07:24:37,032 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant axe, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and ...' +2025-11-20 07:24:37,033 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-20 07:24:37,033 - INFO - ---------------------------------------------------------------------- +2025-11-20 07:24:37,033 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-20 07:24:37,033 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-20 07:24:37,033 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-20 07:24:37,033 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-20 07:24:37,033 - INFO - ---------------------------------------------------------------------- +2025-11-20 07:24:37,033 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-20 07:24:37,033 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-20 07:24:37,033 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima ...' +2025-11-20 07:24:37,033 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-20 07:24:37,034 - INFO - ---------------------------------------------------------------------- +2025-11-20 07:24:37,035 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_6500.jsonl +2025-11-20 07:25:27,916 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 07:25:27,937 - INFO - New best validation loss: 0.1407, perplexity: 1.15 +2025-11-20 07:28:18,343 - INFO - Epoch 1 Step 6510 (Global: 6510): loss=0.1188, ppl=1.13, grad_norm=3.73, lr=3.71e-06, throughput=2817 tok/s +2025-11-20 07:31:06,921 - INFO - Epoch 1 Step 6520 (Global: 6520): loss=0.1326, ppl=1.14, grad_norm=3.39, lr=3.69e-06, throughput=2847 tok/s +2025-11-20 07:33:55,499 - INFO - Epoch 1 Step 6530 (Global: 6530): loss=0.1335, ppl=1.14, grad_norm=4.47, lr=3.67e-06, throughput=2847 tok/s +2025-11-20 07:36:44,069 - INFO - Epoch 1 Step 6540 (Global: 6540): loss=0.1252, ppl=1.13, grad_norm=3.45, lr=3.66e-06, throughput=2848 tok/s +2025-11-20 07:39:33,216 - INFO - Epoch 1 Step 6550 (Global: 6550): loss=0.1399, ppl=1.15, grad_norm=3.61, lr=3.64e-06, throughput=2838 tok/s +2025-11-20 07:42:21,647 - INFO - Epoch 1 Step 6560 (Global: 6560): loss=0.1495, ppl=1.16, grad_norm=4.22, lr=3.63e-06, throughput=2850 tok/s +2025-11-20 07:45:10,357 - INFO - Epoch 1 Step 6570 (Global: 6570): loss=0.1467, ppl=1.16, grad_norm=3.23, 
lr=3.61e-06, throughput=2845 tok/s +2025-11-20 07:47:59,317 - INFO - Epoch 1 Step 6580 (Global: 6580): loss=0.1263, ppl=1.13, grad_norm=3.58, lr=3.59e-06, throughput=2841 tok/s +2025-11-20 07:50:48,045 - INFO - Epoch 1 Step 6590 (Global: 6590): loss=0.1443, ppl=1.16, grad_norm=4.59, lr=3.58e-06, throughput=2845 tok/s +2025-11-20 07:53:35,885 - INFO - Epoch 1 Step 6600 (Global: 6600): loss=0.1238, ppl=1.13, grad_norm=3.72, lr=3.56e-06, throughput=2860 tok/s +2025-11-20 07:56:24,786 - INFO - Epoch 1 Step 6610 (Global: 6610): loss=0.1396, ppl=1.15, grad_norm=3.50, lr=3.55e-06, throughput=2842 tok/s +2025-11-20 07:59:13,495 - INFO - Epoch 1 Step 6620 (Global: 6620): loss=0.1443, ppl=1.16, grad_norm=4.16, lr=3.53e-06, throughput=2845 tok/s +2025-11-20 08:02:02,599 - INFO - Epoch 1 Step 6630 (Global: 6630): loss=0.1361, ppl=1.15, grad_norm=4.09, lr=3.51e-06, throughput=2839 tok/s +2025-11-20 08:04:51,962 - INFO - Epoch 1 Step 6640 (Global: 6640): loss=0.1661, ppl=1.18, grad_norm=4.16, lr=3.50e-06, throughput=2834 tok/s +2025-11-20 08:07:42,417 - INFO - Epoch 1 Step 6650 (Global: 6650): loss=0.1166, ppl=1.12, grad_norm=3.38, lr=3.48e-06, throughput=2816 tok/s +2025-11-20 08:10:32,762 - INFO - Epoch 1 Step 6660 (Global: 6660): loss=0.1372, ppl=1.15, grad_norm=3.31, lr=3.47e-06, throughput=2818 tok/s +2025-11-20 08:13:23,468 - INFO - Epoch 1 Step 6670 (Global: 6670): loss=0.1661, ppl=1.18, grad_norm=3.64, lr=3.45e-06, throughput=2812 tok/s +2025-11-20 08:16:14,876 - INFO - Epoch 1 Step 6680 (Global: 6680): loss=0.1288, ppl=1.14, grad_norm=3.41, lr=3.43e-06, throughput=2800 tok/s +2025-11-20 08:19:05,046 - INFO - Epoch 1 Step 6690 (Global: 6690): loss=0.1330, ppl=1.14, grad_norm=2.95, lr=3.42e-06, throughput=2821 tok/s +2025-11-20 08:21:55,625 - INFO - Epoch 1 Step 6700 (Global: 6700): loss=0.1434, ppl=1.15, grad_norm=4.44, lr=3.40e-06, throughput=2814 tok/s +2025-11-20 08:24:45,602 - INFO - Epoch 1 Step 6710 (Global: 6710): loss=0.1302, ppl=1.14, grad_norm=3.41, 
lr=3.39e-06, throughput=2824 tok/s +2025-11-20 08:27:34,808 - INFO - Epoch 1 Step 6720 (Global: 6720): loss=0.1472, ppl=1.16, grad_norm=4.69, lr=3.37e-06, throughput=2837 tok/s +2025-11-20 08:30:23,505 - INFO - Epoch 1 Step 6730 (Global: 6730): loss=0.1243, ppl=1.13, grad_norm=6.88, lr=3.35e-06, throughput=2845 tok/s +2025-11-20 08:33:12,342 - INFO - Epoch 1 Step 6740 (Global: 6740): loss=0.1555, ppl=1.17, grad_norm=3.72, lr=3.34e-06, throughput=2843 tok/s +2025-11-20 08:36:01,336 - INFO - Epoch 1 Step 6750 (Global: 6750): loss=0.1363, ppl=1.15, grad_norm=4.59, lr=3.32e-06, throughput=2840 tok/s +2025-11-20 08:38:50,012 - INFO - Epoch 1 Step 6760 (Global: 6760): loss=0.1254, ppl=1.13, grad_norm=2.70, lr=3.31e-06, throughput=2846 tok/s +2025-11-20 08:41:37,869 - INFO - Epoch 1 Step 6770 (Global: 6770): loss=0.1246, ppl=1.13, grad_norm=4.12, lr=3.29e-06, throughput=2860 tok/s +2025-11-20 08:44:26,166 - INFO - Epoch 1 Step 6780 (Global: 6780): loss=0.1252, ppl=1.13, grad_norm=3.55, lr=3.28e-06, throughput=2852 tok/s +2025-11-20 08:47:14,772 - INFO - Epoch 1 Step 6790 (Global: 6790): loss=0.1266, ppl=1.13, grad_norm=3.23, lr=3.26e-06, throughput=2847 tok/s +2025-11-20 08:50:03,327 - INFO - Epoch 1 Step 6800 (Global: 6800): loss=0.1411, ppl=1.15, grad_norm=4.97, lr=3.24e-06, throughput=2848 tok/s +2025-11-20 08:52:52,346 - INFO - Epoch 1 Step 6810 (Global: 6810): loss=0.1554, ppl=1.17, grad_norm=3.67, lr=3.23e-06, throughput=2840 tok/s +2025-11-20 08:55:41,017 - INFO - Epoch 1 Step 6820 (Global: 6820): loss=0.1517, ppl=1.16, grad_norm=4.09, lr=3.21e-06, throughput=2846 tok/s +2025-11-20 08:58:29,988 - INFO - Epoch 1 Step 6830 (Global: 6830): loss=0.1210, ppl=1.13, grad_norm=3.69, lr=3.20e-06, throughput=2841 tok/s +2025-11-20 09:01:19,241 - INFO - Epoch 1 Step 6840 (Global: 6840): loss=0.1411, ppl=1.15, grad_norm=2.91, lr=3.18e-06, throughput=2836 tok/s +2025-11-20 09:04:09,731 - INFO - Epoch 1 Step 6850 (Global: 6850): loss=0.1368, ppl=1.15, grad_norm=3.16, 
lr=3.17e-06, throughput=2815 tok/s +2025-11-20 09:07:10,245 - INFO - Epoch 1 Step 6860 (Global: 6860): loss=0.1347, ppl=1.14, grad_norm=2.88, lr=3.15e-06, throughput=2662 tok/s +2025-11-20 09:10:01,393 - INFO - Epoch 1 Step 6870 (Global: 6870): loss=0.1120, ppl=1.12, grad_norm=5.44, lr=3.13e-06, throughput=2805 tok/s +2025-11-20 09:12:52,562 - INFO - Epoch 1 Step 6880 (Global: 6880): loss=0.1164, ppl=1.12, grad_norm=2.36, lr=3.12e-06, throughput=2804 tok/s +2025-11-20 09:15:41,910 - INFO - Epoch 1 Step 6890 (Global: 6890): loss=0.1401, ppl=1.15, grad_norm=4.50, lr=3.10e-06, throughput=2834 tok/s +2025-11-20 09:18:31,758 - INFO - Epoch 1 Step 6900 (Global: 6900): loss=0.1265, ppl=1.13, grad_norm=3.52, lr=3.09e-06, throughput=2826 tok/s +2025-11-20 09:21:21,930 - INFO - Epoch 1 Step 6910 (Global: 6910): loss=0.1393, ppl=1.15, grad_norm=2.84, lr=3.07e-06, throughput=2821 tok/s +2025-11-20 09:24:13,731 - INFO - Epoch 1 Step 6920 (Global: 6920): loss=0.1389, ppl=1.15, grad_norm=3.25, lr=3.06e-06, throughput=2794 tok/s +2025-11-20 09:27:04,246 - INFO - Epoch 1 Step 6930 (Global: 6930): loss=0.1345, ppl=1.14, grad_norm=3.66, lr=3.04e-06, throughput=2815 tok/s +2025-11-20 09:29:52,667 - INFO - Epoch 1 Step 6940 (Global: 6940): loss=0.1358, ppl=1.15, grad_norm=4.88, lr=3.03e-06, throughput=2850 tok/s +2025-11-20 09:32:41,834 - INFO - Epoch 1 Step 6950 (Global: 6950): loss=0.1390, ppl=1.15, grad_norm=4.03, lr=3.01e-06, throughput=2837 tok/s +2025-11-20 09:35:31,984 - INFO - Epoch 1 Step 6960 (Global: 6960): loss=0.1045, ppl=1.11, grad_norm=3.27, lr=3.00e-06, throughput=2821 tok/s +2025-11-20 09:38:22,547 - INFO - Epoch 1 Step 6970 (Global: 6970): loss=0.1295, ppl=1.14, grad_norm=3.25, lr=2.98e-06, throughput=2814 tok/s +2025-11-20 09:41:13,028 - INFO - Epoch 1 Step 6980 (Global: 6980): loss=0.1564, ppl=1.17, grad_norm=4.50, lr=2.96e-06, throughput=2816 tok/s +2025-11-20 09:44:02,528 - INFO - Epoch 1 Step 6990 (Global: 6990): loss=0.1498, ppl=1.16, grad_norm=5.38, 
lr=2.95e-06, throughput=2832 tok/s +2025-11-20 09:46:53,059 - INFO - Epoch 1 Step 7000 (Global: 7000): loss=0.1345, ppl=1.14, grad_norm=3.22, lr=2.93e-06, throughput=2815 tok/s +2025-11-20 09:46:53,059 - INFO - +Running validation at step 7000... +2025-11-20 09:56:33,399 - INFO - Validation loss: 0.1375, perplexity: 1.15 +2025-11-20 09:56:33,399 - INFO - Qualitative metrics (n=5): +2025-11-20 09:56:33,400 - INFO - BLEU: 0.8286 +2025-11-20 09:56:33,400 - INFO - METEOR: 0.9038 +2025-11-20 09:56:33,400 - INFO - Edit Distance: 0.1462 +2025-11-20 09:56:33,400 - INFO - F-measure: 0.9056 +2025-11-20 09:56:33,400 - INFO - +====================================================================== +2025-11-20 09:56:33,400 - INFO - Qualitative Evaluation Samples: +2025-11-20 09:56:33,400 - INFO - ====================================================================== +2025-11-20 09:56:33,401 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-20 09:56:33,401 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-20 09:56:33,401 - INFO - Generated: ' Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s a yes-or-wubert. But i...' +2025-11-20 09:56:33,401 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-20 09:56:33,401 - INFO - ---------------------------------------------------------------------- +2025-11-20 09:56:33,401 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-20 09:56:33,401 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-20 09:56:33,401 - INFO - Generated: 's, was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of Army RO...' +2025-11-20 09:56:33,401 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-20 09:56:33,401 - INFO - ---------------------------------------------------------------------- +2025-11-20 09:56:33,401 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-20 09:56:33,402 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-20 09:56:33,402 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant axe, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and ...' +2025-11-20 09:56:33,402 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-20 09:56:33,402 - INFO - ---------------------------------------------------------------------- +2025-11-20 09:56:33,402 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-20 09:56:33,402 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-20 09:56:33,402 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-20 09:56:33,402 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-20 09:56:33,402 - INFO - ---------------------------------------------------------------------- +2025-11-20 09:56:33,402 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-20 09:56:33,402 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-20 09:56:33,403 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stygian Abyss a...' +2025-11-20 09:56:33,403 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-20 09:56:33,403 - INFO - ---------------------------------------------------------------------- +2025-11-20 13:35:24,056 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_tiny_reconstruction_20251118_214704', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='tiny', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251118_214704', batch_size=4, gradient_accumulation_steps=12, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=True, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint='outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt', init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=16, prefetch_factor=2, seed=42, eval_seed=42, 
debug_log_sample_ids=False, device='cuda', compile=True, use_optimized_model=False, use_encoder_checkpointing=False) +2025-11-20 13:35:24,056 - INFO - Resuming training from checkpoint: outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 13:35:24,056 - INFO - Continuing outputs in directory: outputs/production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 13:35:24,056 - INFO - Using preset vision prompt: 'free_ocr' → ''\nFree OCR.'' +2025-11-20 13:35:24,056 - INFO - Setting random seed: 42 +2025-11-20 13:35:24,346 - INFO - Peeking checkpoint metadata from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 13:35:38,734 - INFO - Checkpoint metadata: epoch=0, batch_idx=77999, global_step=6500 +2025-11-20 13:35:38,734 - INFO - W&B run ID: tto6r4hl +2025-11-20 13:35:39,299 - INFO - Auto-generated W&B run name: production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 13:35:39,301 - INFO - Resuming W&B run with ID: tto6r4hl +2025-11-20 13:35:40,468 - INFO - Initialized W&B run: vision-compression-2/production_vision_tiny_reconstruction_20251118_214704 (ID: tto6r4hl) +2025-11-20 13:35:40,469 - INFO - Loading model and tokenizer... +2025-11-20 13:36:00,029 - INFO - Compiling model with torch.compile... 
+2025-11-20 13:36:00,029 - INFO - Note: First forward pass will compile (may take several minutes) +2025-11-20 13:36:00,948 - INFO - Created Vision Compression trainer (mode: tiny) +2025-11-20 13:36:00,948 - INFO - Training objective: reconstruction +2025-11-20 13:36:00,979 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640 +2025-11-20 13:36:00,980 - INFO - Loading training data from data/training/splits_510k/train.jsonl +2025-11-20 13:38:36,824 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl +2025-11-20 13:38:36,824 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 13:38:36,924 - INFO - Loading validation data from data/training/splits_510k/val.jsonl +2025-11-20 13:38:39,460 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl +2025-11-20 13:38:39,461 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 13:38:39,492 - INFO - Created AdamW optimizer with differential LR: + Encoder: 474 param tensors @ lr=1e-05 + Decoder: 2236 param tensors @ lr=0.0001 + Fused kernels: True +2025-11-20 13:38:39,492 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 +2025-11-20 13:38:39,492 - INFO - Loading checkpoint state (model/optimizer/scheduler) from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 13:38:53,100 - INFO - ✓ Successfully loaded optimizer state from checkpoint +2025-11-20 13:38:53,101 - INFO - ✓ Successfully loaded scheduler state from checkpoint +2025-11-20 13:38:53,105 - WARNING - Failed to restore RNG states: RNG state must be a torch.ByteTensor. Continuing with current RNG state. 
+2025-11-20 13:38:53,106 - INFO - Resumed from epoch 0, batch 77999, global_step 6500 + Best validation loss: 0.1407 +2025-11-20 13:38:53,106 - INFO - W&B run ID: tto6r4hl +2025-11-20 13:38:53,106 - INFO - ✓ Sampler state loaded (500000 samples) +2025-11-20 13:38:53,127 - INFO - Restored training state: epoch=0, batch_idx=77999, global_step=6500, best_val_loss=0.1407 +2025-11-20 13:38:53,132 - INFO - Resuming mid-epoch: will skip first 78000 batches of epoch 0 +2025-11-20 13:38:53,132 - INFO - Starting training loop... +2025-11-20 13:38:53,133 - INFO - +====================================================================== +2025-11-20 13:38:53,133 - INFO - Running initial validation (before any training)... +2025-11-20 13:38:53,133 - INFO - ====================================================================== +2025-11-20 13:43:04,583 - WARNING - socket.send() raised exception. 
+2025-11-20 13:45:00,652 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_tiny_reconstruction_20251118_214704', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='tiny', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251118_214704', batch_size=4, gradient_accumulation_steps=12, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=False, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint='outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt', init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=16, prefetch_factor=2, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=True, use_optimized_model=False, use_encoder_checkpointing=False) +2025-11-20 13:45:00,652 - INFO - Resuming training from checkpoint: outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 13:45:00,652 - INFO - Continuing outputs in directory: outputs/production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 13:45:00,652 - INFO - Using preset vision prompt: 'free_ocr' → ''\nFree OCR.'' +2025-11-20 13:45:00,652 - INFO - Setting random seed: 42 +2025-11-20 13:45:00,897 - INFO - Peeking checkpoint metadata from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 
13:45:15,551 - INFO - Checkpoint metadata: epoch=0, batch_idx=77999, global_step=6500 +2025-11-20 13:45:15,551 - INFO - W&B run ID: tto6r4hl +2025-11-20 13:45:16,173 - INFO - Auto-generated W&B run name: production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 13:45:16,175 - INFO - Resuming W&B run with ID: tto6r4hl +2025-11-20 13:45:21,329 - INFO - Initialized W&B run: vision-compression-2/production_vision_tiny_reconstruction_20251118_214704 (ID: tto6r4hl) +2025-11-20 13:45:21,329 - INFO - Loading model and tokenizer... +2025-11-20 13:45:30,602 - INFO - Compiling model with torch.compile... +2025-11-20 13:45:30,602 - INFO - Note: First forward pass will compile (may take several minutes) +2025-11-20 13:45:31,446 - INFO - Created Vision Compression trainer (mode: tiny) +2025-11-20 13:45:31,446 - INFO - Training objective: reconstruction +2025-11-20 13:45:31,478 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640 +2025-11-20 13:45:31,478 - INFO - Loading training data from data/training/splits_510k/train.jsonl +2025-11-20 13:48:05,235 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl +2025-11-20 13:48:05,236 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 13:48:05,350 - INFO - Loading validation data from data/training/splits_510k/val.jsonl +2025-11-20 13:48:07,890 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl +2025-11-20 13:48:07,891 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 13:48:07,924 - INFO - Created AdamW optimizer with differential LR: + Encoder: 474 param tensors @ lr=1e-05 + Decoder: 2236 param tensors @ lr=0.0001 + Fused kernels: True +2025-11-20 13:48:07,924 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 +2025-11-20 13:48:07,924 - INFO - Loading checkpoint state (model/optimizer/scheduler) from 
outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 13:48:20,958 - INFO - ✓ Successfully loaded optimizer state from checkpoint +2025-11-20 13:48:20,959 - INFO - ✓ Successfully loaded scheduler state from checkpoint +2025-11-20 13:48:20,960 - WARNING - Failed to restore RNG states: RNG state must be a torch.ByteTensor. Continuing with current RNG state. +2025-11-20 13:48:20,960 - INFO - Resumed from epoch 0, batch 77999, global_step 6500 + Best validation loss: 0.1407 +2025-11-20 13:48:20,960 - INFO - W&B run ID: tto6r4hl +2025-11-20 13:48:20,960 - INFO - ✓ Sampler state loaded (500000 samples) +2025-11-20 13:48:20,990 - INFO - Restored training state: epoch=0, batch_idx=77999, global_step=6500, best_val_loss=0.1407 +2025-11-20 13:48:20,995 - INFO - Resuming mid-epoch: will skip first 78000 batches of epoch 0 +2025-11-20 13:48:20,995 - INFO - Starting training loop... +2025-11-20 13:48:20,996 - INFO - +====================================================================== +2025-11-20 13:48:20,996 - INFO - Epoch 1/1 +2025-11-20 13:48:20,996 - INFO - ====================================================================== +2025-11-20 13:48:20,996 - INFO - Skipping first 78000 batches (mid-epoch resume) +2025-11-20 14:33:17,110 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_tiny_reconstruction_20251118_214704', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='tiny', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251118_214704', batch_size=4, gradient_accumulation_steps=12, learning_rate=0.0001, 
weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=False, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint='outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt', init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=16, prefetch_factor=2, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=True, use_optimized_model=False, use_encoder_checkpointing=False) +2025-11-20 14:33:17,110 - INFO - Resuming training from checkpoint: outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 14:33:17,110 - INFO - Continuing outputs in directory: outputs/production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 14:33:17,110 - INFO - Using preset vision prompt: 'free_ocr' → ''\nFree OCR.'' +2025-11-20 14:33:17,110 - INFO - Setting random seed: 42 +2025-11-20 14:33:17,438 - INFO - Peeking checkpoint metadata from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 14:33:31,518 - INFO - Checkpoint metadata: epoch=0, batch_idx=77999, global_step=6500 +2025-11-20 14:33:31,519 - INFO - W&B run ID: tto6r4hl +2025-11-20 14:33:32,044 - INFO - Auto-generated W&B run name: production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 14:33:32,045 - INFO - Resuming W&B run with ID: tto6r4hl +2025-11-20 14:33:33,249 - INFO - Initialized W&B run: vision-compression-2/production_vision_tiny_reconstruction_20251118_214704 (ID: tto6r4hl) +2025-11-20 14:33:33,250 - INFO - Loading model and tokenizer... +2025-11-20 14:33:41,824 - INFO - Compiling model with torch.compile... 
+2025-11-20 14:33:41,824 - INFO - Note: First forward pass will compile (may take several minutes) +2025-11-20 14:33:42,724 - INFO - Created Vision Compression trainer (mode: tiny) +2025-11-20 14:33:42,724 - INFO - Training objective: reconstruction +2025-11-20 14:33:42,755 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640 +2025-11-20 14:33:42,755 - INFO - Loading training data from data/training/splits_510k/train.jsonl +2025-11-20 14:36:20,701 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl +2025-11-20 14:36:20,701 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 14:42:15,749 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_tiny_reconstruction_20251118_214704', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='tiny', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251118_214704', batch_size=4, gradient_accumulation_steps=12, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=False, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint='outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt', init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=16, prefetch_factor=2, seed=42, eval_seed=42, debug_log_sample_ids=False, 
device='cuda', compile=True, use_optimized_model=False, use_encoder_checkpointing=False) +2025-11-20 14:42:15,750 - INFO - Resuming training from checkpoint: outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 14:42:15,750 - INFO - Continuing outputs in directory: outputs/production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 14:42:15,750 - INFO - Using preset vision prompt: 'free_ocr' → ''\nFree OCR.'' +2025-11-20 14:42:15,750 - INFO - Setting random seed: 42 +2025-11-20 14:42:15,958 - INFO - Peeking checkpoint metadata from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 14:42:29,968 - INFO - Checkpoint metadata: epoch=0, batch_idx=77999, global_step=6500 +2025-11-20 14:42:29,968 - INFO - W&B run ID: tto6r4hl +2025-11-20 14:42:30,530 - INFO - Auto-generated W&B run name: production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 14:42:30,531 - INFO - Resuming W&B run with ID: tto6r4hl +2025-11-20 14:42:31,678 - INFO - Initialized W&B run: vision-compression-2/production_vision_tiny_reconstruction_20251118_214704 (ID: tto6r4hl) +2025-11-20 14:42:31,678 - INFO - Loading model and tokenizer... +2025-11-20 14:42:40,123 - INFO - Compiling model with torch.compile... 
+2025-11-20 14:42:40,123 - INFO - Note: First forward pass will compile (may take several minutes) +2025-11-20 14:42:41,041 - INFO - Created Vision Compression trainer (mode: tiny) +2025-11-20 14:42:41,041 - INFO - Training objective: reconstruction +2025-11-20 14:42:41,071 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640 +2025-11-20 14:42:41,071 - INFO - Loading training data from data/training/splits_510k/train.jsonl +2025-11-20 14:45:29,895 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl +2025-11-20 14:45:29,895 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 14:45:29,992 - INFO - Loading validation data from data/training/splits_510k/val.jsonl +2025-11-20 14:45:32,488 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl +2025-11-20 14:45:32,489 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 14:45:32,516 - INFO - Created AdamW optimizer with differential LR: + Encoder: 474 param tensors @ lr=1e-05 + Decoder: 2236 param tensors @ lr=0.0001 + Fused kernels: True +2025-11-20 14:45:32,516 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 +2025-11-20 14:45:32,516 - INFO - Loading checkpoint state (model/optimizer/scheduler) from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 14:45:45,008 - INFO - ✓ Successfully loaded optimizer state from checkpoint +2025-11-20 14:45:45,009 - INFO - ✓ Successfully loaded scheduler state from checkpoint +2025-11-20 14:45:45,009 - WARNING - Failed to restore RNG states: RNG state must be a torch.ByteTensor. Continuing with current RNG state. 
+2025-11-20 14:45:45,009 - INFO - Resumed from epoch 0, batch 77999, global_step 6500 + Best validation loss: 0.1407 +2025-11-20 14:45:45,010 - INFO - W&B run ID: tto6r4hl +2025-11-20 14:45:45,010 - INFO - ✓ Sampler state loaded (500000 samples) +2025-11-20 14:45:45,025 - INFO - Restored training state: epoch=0, batch_idx=77999, global_step=6500, best_val_loss=0.1407 +2025-11-20 14:45:45,030 - INFO - Resuming mid-epoch: will skip first 78000 batches of epoch 0 +2025-11-20 14:45:45,030 - INFO - Starting training loop... +2025-11-20 14:45:45,031 - INFO - +====================================================================== +2025-11-20 14:45:45,031 - INFO - Epoch 1/1 +2025-11-20 14:45:45,031 - INFO - ====================================================================== +2025-11-20 14:53:21,348 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_tiny_reconstruction_20251118_214704', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='tiny', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251118_214704', batch_size=4, gradient_accumulation_steps=12, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=False, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint='outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt', init_from_checkpoint=None, allow_objective_switch=False, 
aux_loss_weight=0.5, num_workers=16, prefetch_factor=2, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=True, use_optimized_model=False, use_encoder_checkpointing=False) +2025-11-20 14:53:21,348 - INFO - Resuming training from checkpoint: outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 14:53:21,348 - INFO - Continuing outputs in directory: outputs/production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 14:53:21,348 - INFO - Using preset vision prompt: 'free_ocr' → ''\nFree OCR.'' +2025-11-20 14:53:21,348 - INFO - Setting random seed: 42 +2025-11-20 14:53:21,619 - INFO - Peeking checkpoint metadata from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 14:53:34,319 - INFO - Checkpoint metadata: epoch=0, batch_idx=77999, global_step=6500 +2025-11-20 14:53:34,319 - INFO - W&B run ID: tto6r4hl +2025-11-20 14:53:34,829 - INFO - Auto-generated W&B run name: production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 14:53:34,830 - INFO - Resuming W&B run with ID: tto6r4hl +2025-11-20 14:53:35,985 - INFO - Initialized W&B run: vision-compression-2/production_vision_tiny_reconstruction_20251118_214704 (ID: tto6r4hl) +2025-11-20 14:53:35,985 - INFO - Loading model and tokenizer... +2025-11-20 14:53:44,917 - INFO - Compiling model with torch.compile... 
+2025-11-20 14:53:44,918 - INFO - Note: First forward pass will compile (may take several minutes) +2025-11-20 14:53:45,748 - INFO - Created Vision Compression trainer (mode: tiny) +2025-11-20 14:53:45,748 - INFO - Training objective: reconstruction +2025-11-20 14:53:45,784 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640 +2025-11-20 14:53:45,784 - INFO - Loading training data from data/training/splits_510k/train.jsonl +2025-11-20 14:56:26,884 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl +2025-11-20 14:56:26,884 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 14:56:26,884 - INFO - Mid-epoch resume: skipping first 312000 samples at sampler level (batch 78000) +2025-11-20 14:56:26,981 - INFO - Loading validation data from data/training/splits_510k/val.jsonl +2025-11-20 14:56:29,604 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl +2025-11-20 14:56:29,604 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 14:56:29,631 - INFO - Created AdamW optimizer with differential LR: + Encoder: 474 param tensors @ lr=1e-05 + Decoder: 2236 param tensors @ lr=0.0001 + Fused kernels: True +2025-11-20 14:56:29,631 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 +2025-11-20 14:56:29,631 - INFO - Loading checkpoint state (model/optimizer/scheduler) from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 14:56:42,472 - INFO - ✓ Successfully loaded optimizer state from checkpoint +2025-11-20 14:56:42,473 - INFO - ✓ Successfully loaded scheduler state from checkpoint +2025-11-20 14:56:42,474 - WARNING - Failed to restore RNG states: RNG state must be a torch.ByteTensor. Continuing with current RNG state. 
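The scheduler and resume figures logged above are internally consistent, and can be checked with a short sketch. This is not the trainer's code; it is a minimal reconstruction assuming ceiling division for batches/steps and truncation for the warmup count, using only values reported in the log (500000 samples, batch_size=4, gradient_accumulation_steps=12, warmup_ratio=0.1, global_step=6500).

```python
import math

# Values as reported in the log; the rounding conventions are assumptions.
num_samples = 500_000
batch_size = 4
grad_accum = 12
warmup_ratio = 0.1

batches_per_epoch = math.ceil(num_samples / batch_size)   # 125000 batches
total_steps = math.ceil(batches_per_epoch / grad_accum)   # matches total_steps=10417
warmup_steps = int(warmup_ratio * total_steps)            # matches warmup_steps=1041

# Mid-epoch resume: optimizer step 6500 corresponds to batch 78000,
# i.e. 312000 samples skipped at the sampler level, as logged.
resume_batch = 6500 * grad_accum                          # 78000
skipped_samples = resume_batch * batch_size               # 312000
```

This also explains why the resume log skips "first 78000 batches" while the sampler skips "first 312000 samples": they are the same point in the epoch expressed in batches vs. samples.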
+2025-11-20 14:56:42,474 - INFO - Resumed from epoch 0, batch 77999, global_step 6500 + Best validation loss: 0.1407 +2025-11-20 14:56:42,474 - INFO - W&B run ID: tto6r4hl +2025-11-20 14:56:42,474 - INFO - ✓ Sampler state loaded (500000 samples) +2025-11-20 14:56:42,492 - INFO - Restored training state: epoch=0, batch_idx=77999, global_step=6500, best_val_loss=0.1407 +2025-11-20 14:56:42,497 - INFO - Resuming mid-epoch: will skip first 78000 batches of epoch 0 +2025-11-20 14:56:42,498 - INFO - Starting training loop... +2025-11-20 14:56:42,498 - INFO - +====================================================================== +2025-11-20 14:56:42,498 - INFO - Epoch 1/1 +2025-11-20 14:56:42,498 - INFO - ====================================================================== +2025-11-20 15:04:35,546 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_tiny_reconstruction_20251118_214704', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='tiny', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251118_214704', batch_size=4, gradient_accumulation_steps=12, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=False, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint='outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt', init_from_checkpoint=None, allow_objective_switch=False, 
aux_loss_weight=0.5, num_workers=16, prefetch_factor=2, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=True, use_optimized_model=False, use_encoder_checkpointing=False) +2025-11-20 15:04:35,547 - INFO - Resuming training from checkpoint: outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 15:04:35,547 - INFO - Continuing outputs in directory: outputs/production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 15:04:35,547 - INFO - Using preset vision prompt: 'free_ocr' → ''\nFree OCR.'' +2025-11-20 15:04:35,547 - INFO - Setting random seed: 42 +2025-11-20 15:04:35,768 - INFO - Peeking checkpoint metadata from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 15:04:50,018 - INFO - Checkpoint metadata: epoch=0, batch_idx=77999, global_step=6500 +2025-11-20 15:04:50,018 - INFO - W&B run ID: tto6r4hl +2025-11-20 15:04:50,583 - INFO - Auto-generated W&B run name: production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 15:04:50,584 - INFO - Resuming W&B run with ID: tto6r4hl +2025-11-20 15:04:51,684 - INFO - Initialized W&B run: vision-compression-2/production_vision_tiny_reconstruction_20251118_214704 (ID: tto6r4hl) +2025-11-20 15:04:51,685 - INFO - Loading model and tokenizer... +2025-11-20 15:05:00,105 - INFO - Compiling model with torch.compile... 
+2025-11-20 15:05:00,105 - INFO - Note: First forward pass will compile (may take several minutes) +2025-11-20 15:05:00,945 - INFO - Created Vision Compression trainer (mode: tiny) +2025-11-20 15:05:00,946 - INFO - Training objective: reconstruction +2025-11-20 15:05:00,946 - INFO - Loading training data from data/training/splits_510k/train.jsonl +2025-11-20 15:07:36,397 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl +2025-11-20 15:07:36,397 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 15:07:36,397 - INFO - Mid-epoch resume: skipping first 312000 samples at sampler level (batch 78000) +2025-11-20 15:07:36,503 - INFO - Loading validation data from data/training/splits_510k/val.jsonl +2025-11-20 15:07:39,020 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl +2025-11-20 15:07:39,020 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 15:07:39,048 - INFO - Created AdamW optimizer with differential LR: + Encoder: 474 param tensors @ lr=1e-05 + Decoder: 2236 param tensors @ lr=0.0001 + Fused kernels: True +2025-11-20 15:07:39,049 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 +2025-11-20 15:07:39,049 - INFO - Loading checkpoint state (model/optimizer/scheduler) from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 15:07:52,357 - INFO - ✓ Successfully loaded optimizer state from checkpoint +2025-11-20 15:07:52,358 - INFO - ✓ Successfully loaded scheduler state from checkpoint +2025-11-20 15:07:52,359 - WARNING - Failed to restore RNG states: RNG state must be a torch.ByteTensor. Continuing with current RNG state. 
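The repeated warning above ("RNG state must be a torch.ByteTensor") means the checkpointed RNG state could not be restored, so each resumed run continues with a fresh RNG state. The intended mechanism is a save/restore round trip; the sketch below illustrates it with Python's stdlib `random` as a stand-in for `torch.get_rng_state`/`torch.set_rng_state` (an assumption, since the actual trainer code is not shown here).

```python
import random

# Save the generator state, draw, restore, and draw again: a successful
# restore makes the post-resume draws bit-identical to the saved point.
state = random.getstate()
a = [random.random() for _ in range(3)]
random.setstate(state)              # restore -> identical draws
b = [random.random() for _ in range(3)]
assert a == b

# When restore fails (e.g. the deserialized state has the wrong type, as the
# warning suggests), falling back to the current state keeps training running,
# but the resumed data order is no longer identical to an uninterrupted run.
```

This is consistent with the trainer's behavior of logging a WARNING and continuing rather than aborting the resume.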
+2025-11-20 15:07:52,359 - INFO - Resumed from epoch 0, batch 77999, global_step 6500 + Best validation loss: 0.1407 +2025-11-20 15:07:52,359 - INFO - W&B run ID: tto6r4hl +2025-11-20 15:07:52,359 - INFO - ✓ Sampler state loaded (500000 samples) +2025-11-20 15:07:52,380 - INFO - Restored training state: epoch=0, batch_idx=77999, global_step=6500, best_val_loss=0.1407 +2025-11-20 15:07:52,385 - INFO - Resuming mid-epoch: will skip first 78000 batches of epoch 0 +2025-11-20 15:07:52,385 - INFO - Starting training loop... +2025-11-20 15:07:52,385 - INFO - +====================================================================== +2025-11-20 15:07:52,385 - INFO - Epoch 1/1 +2025-11-20 15:07:52,385 - INFO - ====================================================================== +2025-11-20 15:09:07,296 - INFO - Effective context tokens (per-sample): 78 | Compression ratio: 12.82x +2025-11-20 15:09:07,296 - INFO - Target tokens per sample: 1000 +2025-11-20 15:12:39,079 - INFO - Epoch 1 Step 10 (Global: 6510): loss=0.1190, ppl=1.13, grad_norm=3.78, lr=3.71e-06, throughput=1674 tok/s +2025-11-20 15:15:41,811 - INFO - Epoch 1 Step 20 (Global: 6520): loss=0.1329, ppl=1.14, grad_norm=3.19, lr=3.69e-06, throughput=2627 tok/s +2025-11-20 15:18:41,333 - INFO - Epoch 1 Step 30 (Global: 6530): loss=0.1336, ppl=1.14, grad_norm=6.44, lr=3.67e-06, throughput=2674 tok/s +2025-11-20 15:21:39,156 - INFO - Epoch 1 Step 40 (Global: 6540): loss=0.1257, ppl=1.13, grad_norm=2.48, lr=3.66e-06, throughput=2699 tok/s +2025-11-20 15:24:49,071 - INFO - Epoch 1 Step 50 (Global: 6550): loss=0.1413, ppl=1.15, grad_norm=5.00, lr=3.64e-06, throughput=2527 tok/s +2025-11-20 15:27:51,047 - INFO - Epoch 1 Step 60 (Global: 6560): loss=0.1501, ppl=1.16, grad_norm=3.73, lr=3.63e-06, throughput=2638 tok/s +2025-11-20 15:30:48,429 - INFO - Epoch 1 Step 70 (Global: 6570): loss=0.1466, ppl=1.16, grad_norm=2.88, lr=3.61e-06, throughput=2706 tok/s +2025-11-20 15:33:47,880 - INFO - Epoch 1 Step 80 (Global: 6580): 
loss=0.1262, ppl=1.13, grad_norm=3.73, lr=3.59e-06, throughput=2675 tok/s +2025-11-20 15:36:44,855 - INFO - Epoch 1 Step 90 (Global: 6590): loss=0.1449, ppl=1.16, grad_norm=2.80, lr=3.58e-06, throughput=2712 tok/s +2025-11-20 15:39:41,715 - INFO - Epoch 1 Step 100 (Global: 6600): loss=0.1247, ppl=1.13, grad_norm=3.94, lr=3.56e-06, throughput=2714 tok/s +2025-11-20 15:42:49,578 - INFO - Epoch 1 Step 110 (Global: 6610): loss=0.1404, ppl=1.15, grad_norm=4.03, lr=3.55e-06, throughput=2555 tok/s +2025-11-20 15:45:45,409 - INFO - Epoch 1 Step 120 (Global: 6620): loss=0.1436, ppl=1.15, grad_norm=4.81, lr=3.53e-06, throughput=2730 tok/s +2025-11-20 15:48:41,961 - INFO - Epoch 1 Step 130 (Global: 6630): loss=0.1387, ppl=1.15, grad_norm=3.36, lr=3.51e-06, throughput=2719 tok/s +2025-11-20 15:51:38,005 - INFO - Epoch 1 Step 140 (Global: 6640): loss=0.1664, ppl=1.18, grad_norm=6.00, lr=3.50e-06, throughput=2727 tok/s +2025-11-20 15:54:34,063 - INFO - Epoch 1 Step 150 (Global: 6650): loss=0.1170, ppl=1.12, grad_norm=3.61, lr=3.48e-06, throughput=2726 tok/s +2025-11-20 15:57:42,411 - INFO - Epoch 1 Step 160 (Global: 6660): loss=0.1383, ppl=1.15, grad_norm=3.12, lr=3.47e-06, throughput=2549 tok/s +2025-11-20 16:00:39,327 - INFO - Epoch 1 Step 170 (Global: 6670): loss=0.1678, ppl=1.18, grad_norm=3.16, lr=3.45e-06, throughput=2713 tok/s +2025-11-20 16:03:36,234 - INFO - Epoch 1 Step 180 (Global: 6680): loss=0.1291, ppl=1.14, grad_norm=3.77, lr=3.43e-06, throughput=2713 tok/s +2025-11-20 16:06:31,687 - INFO - Epoch 1 Step 190 (Global: 6690): loss=0.1338, ppl=1.14, grad_norm=4.84, lr=3.42e-06, throughput=2736 tok/s +2025-11-20 16:09:28,780 - INFO - Epoch 1 Step 200 (Global: 6700): loss=0.1436, ppl=1.15, grad_norm=3.25, lr=3.40e-06, throughput=2710 tok/s +2025-11-20 16:12:28,155 - INFO - Epoch 1 Step 210 (Global: 6710): loss=0.1304, ppl=1.14, grad_norm=3.25, lr=3.39e-06, throughput=2676 tok/s +2025-11-20 16:15:38,093 - INFO - Epoch 1 Step 220 (Global: 6720): loss=0.1474, ppl=1.16, 
grad_norm=3.72, lr=3.37e-06, throughput=2527 tok/s +2025-11-20 16:18:35,060 - INFO - Epoch 1 Step 230 (Global: 6730): loss=0.1262, ppl=1.13, grad_norm=4.41, lr=3.35e-06, throughput=2712 tok/s +2025-11-20 16:21:31,805 - INFO - Epoch 1 Step 240 (Global: 6740): loss=0.1552, ppl=1.17, grad_norm=3.52, lr=3.34e-06, throughput=2716 tok/s +2025-11-20 16:24:29,738 - INFO - Epoch 1 Step 250 (Global: 6750): loss=0.1371, ppl=1.15, grad_norm=4.62, lr=3.32e-06, throughput=2698 tok/s +2025-11-20 16:27:26,925 - INFO - Epoch 1 Step 260 (Global: 6760): loss=0.1241, ppl=1.13, grad_norm=2.86, lr=3.31e-06, throughput=2709 tok/s +2025-11-20 16:30:23,235 - INFO - Epoch 1 Step 270 (Global: 6770): loss=0.1246, ppl=1.13, grad_norm=3.77, lr=3.29e-06, throughput=2723 tok/s +2025-11-20 16:33:31,938 - INFO - Epoch 1 Step 280 (Global: 6780): loss=0.1240, ppl=1.13, grad_norm=3.86, lr=3.28e-06, throughput=2544 tok/s +2025-11-20 16:36:29,510 - INFO - Epoch 1 Step 290 (Global: 6790): loss=0.1281, ppl=1.14, grad_norm=3.42, lr=3.26e-06, throughput=2703 tok/s +2025-11-20 16:39:26,185 - INFO - Epoch 1 Step 300 (Global: 6800): loss=0.1428, ppl=1.15, grad_norm=3.86, lr=3.24e-06, throughput=2717 tok/s +2025-11-20 16:42:24,293 - INFO - Epoch 1 Step 310 (Global: 6810): loss=0.1557, ppl=1.17, grad_norm=4.12, lr=3.23e-06, throughput=2695 tok/s +2025-11-20 17:11:17,051 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_tiny_reconstruction_20251118_214704', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='tiny', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251118_214704', batch_size=4, 
gradient_accumulation_steps=12, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=False, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint='outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt', init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=16, prefetch_factor=2, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=True, use_optimized_model=False, use_encoder_checkpointing=False) +2025-11-20 17:11:17,051 - INFO - Resuming training from checkpoint: outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 17:11:17,052 - INFO - Continuing outputs in directory: outputs/production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 17:11:17,052 - INFO - Using preset vision prompt: 'free_ocr' → ''\nFree OCR.'' +2025-11-20 17:11:17,052 - INFO - Setting random seed: 42 +2025-11-20 17:11:17,294 - INFO - Peeking checkpoint metadata from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 17:11:32,369 - INFO - Checkpoint metadata: epoch=0, batch_idx=77999, global_step=6500 +2025-11-20 17:11:32,369 - INFO - W&B run ID: tto6r4hl +2025-11-20 17:11:32,902 - INFO - Auto-generated W&B run name: production_vision_tiny_reconstruction_20251118_214704 +2025-11-20 17:11:32,904 - INFO - Checkpoint has WandB run ID: tto6r4hl +2025-11-20 17:11:32,904 - INFO - Creating fresh WandB run (not resuming to avoid stale data) +2025-11-20 17:11:34,290 - INFO - Initialized W&B run: vision-compression-2/production_vision_tiny_reconstruction_20251118_214704 (ID: wim42ieq) +2025-11-20 17:11:34,291 - INFO - Loading model and tokenizer... 
+2025-11-20 17:11:43,434 - INFO - Compiling model with torch.compile... +2025-11-20 17:11:43,434 - INFO - Note: First forward pass will compile (may take several minutes) +2025-11-20 17:11:44,387 - INFO - Created Vision Compression trainer (mode: tiny) +2025-11-20 17:11:44,387 - INFO - Training objective: reconstruction +2025-11-20 17:11:44,423 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640 +2025-11-20 17:11:44,423 - INFO - Loading training data from data/training/splits_510k/train.jsonl +2025-11-20 17:14:33,305 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl +2025-11-20 17:14:33,305 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 17:14:33,306 - INFO - Mid-epoch resume: skipping first 312000 samples at sampler level (batch 78000) +2025-11-20 17:14:33,462 - INFO - Loading validation data from data/training/splits_510k/val.jsonl +2025-11-20 17:14:36,198 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl +2025-11-20 17:14:36,199 - INFO - Vision mode: tiny (73 tokens, 512x512) +2025-11-20 17:14:36,251 - INFO - Created AdamW optimizer with differential LR: + Encoder: 474 param tensors @ lr=1e-05 + Decoder: 2236 param tensors @ lr=0.0001 + Fused kernels: True +2025-11-20 17:14:36,252 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 +2025-11-20 17:14:36,252 - INFO - Loading checkpoint state (model/optimizer/scheduler) from outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-20 17:14:55,057 - INFO - ✓ Successfully loaded optimizer state from checkpoint +2025-11-20 17:14:55,057 - INFO - ✓ Successfully loaded scheduler state from checkpoint +2025-11-20 17:14:55,059 - WARNING - Failed to restore RNG states: RNG state must be a torch.ByteTensor. Continuing with current RNG state. 
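Two derived metrics in the step logs can be sanity-checked from first principles: the reported perplexity is exp(loss), and the "12.82x" compression ratio is the 1000 target tokens per sample over the 78 effective context tokens. A quick check (two-decimal rounding is assumed from the log formatting):

```python
import math

# Perplexity is exp(loss), matching the per-step logs and the best
# validation loss of 0.1407 (which would log as ppl=1.15).
assert round(math.exp(0.1194), 2) == 1.13   # e.g. Step 10 below
assert round(math.exp(0.1407), 2) == 1.15

# Compression ratio: 1000 target tokens / 78 effective context tokens.
assert round(1000 / 78, 2) == 12.82
```

Note the 78 effective context tokens exceed the 73 vision tokens of `tiny` mode; the log does not say what accounts for the difference (the prompt tokens are a plausible candidate, but that is a guess).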
+2025-11-20 17:14:55,059 - INFO - Resumed from epoch 0, batch 77999, global_step 6500 + Best validation loss: 0.1407 +2025-11-20 17:14:55,059 - INFO - W&B run ID: tto6r4hl +2025-11-20 17:14:55,059 - INFO - ✓ Sampler state loaded (500000 samples) +2025-11-20 17:14:56,817 - INFO - Restored training state: epoch=0, batch_idx=77999, global_step=6500, best_val_loss=0.1407 +2025-11-20 17:14:56,826 - INFO - Resuming mid-epoch: will skip first 78000 batches of epoch 0 +2025-11-20 17:14:56,826 - INFO - Starting training loop... +2025-11-20 17:14:56,826 - INFO - +====================================================================== +2025-11-20 17:14:56,826 - INFO - Epoch 1/1 +2025-11-20 17:14:56,826 - INFO - ====================================================================== +2025-11-20 17:16:37,837 - INFO - Effective context tokens (per-sample): 78 | Compression ratio: 12.82x +2025-11-20 17:16:37,837 - INFO - Target tokens per sample: 1000 +2025-11-20 17:19:47,867 - INFO - Epoch 1 Step 10 (Global: 6510): loss=0.1194, ppl=1.13, grad_norm=4.34, lr=3.71e-06, throughput=1649 tok/s +2025-11-20 17:22:50,509 - INFO - Epoch 1 Step 20 (Global: 6520): loss=0.1325, ppl=1.14, grad_norm=3.36, lr=3.69e-06, throughput=2628 tok/s +2025-11-20 17:25:46,757 - INFO - Epoch 1 Step 30 (Global: 6530): loss=0.1335, ppl=1.14, grad_norm=6.81, lr=3.67e-06, throughput=2723 tok/s +2025-11-20 17:28:40,059 - INFO - Epoch 1 Step 40 (Global: 6540): loss=0.1257, ppl=1.13, grad_norm=3.86, lr=3.66e-06, throughput=2770 tok/s +2025-11-20 17:31:40,974 - INFO - Epoch 1 Step 50 (Global: 6550): loss=0.1396, ppl=1.15, grad_norm=3.72, lr=3.64e-06, throughput=2653 tok/s +2025-11-20 17:34:30,568 - INFO - Epoch 1 Step 60 (Global: 6560): loss=0.1494, ppl=1.16, grad_norm=5.34, lr=3.63e-06, throughput=2830 tok/s +2025-11-20 17:37:22,927 - INFO - Epoch 1 Step 70 (Global: 6570): loss=0.1460, ppl=1.16, grad_norm=3.50, lr=3.61e-06, throughput=2785 tok/s +2025-11-20 17:40:16,621 - INFO - Epoch 1 Step 80 (Global: 6580): 
loss=0.1254, ppl=1.13, grad_norm=3.16, lr=3.59e-06, throughput=2764 tok/s +2025-11-20 17:43:09,003 - INFO - Epoch 1 Step 90 (Global: 6590): loss=0.1445, ppl=1.16, grad_norm=5.19, lr=3.58e-06, throughput=2785 tok/s +2025-11-20 17:46:14,667 - INFO - Epoch 1 Step 100 (Global: 6600): loss=0.1247, ppl=1.13, grad_norm=2.97, lr=3.56e-06, throughput=2585 tok/s +2025-11-20 17:49:10,995 - INFO - Epoch 1 Step 110 (Global: 6610): loss=0.1401, ppl=1.15, grad_norm=3.47, lr=3.55e-06, throughput=2722 tok/s +2025-11-20 17:52:01,878 - INFO - Epoch 1 Step 120 (Global: 6620): loss=0.1435, ppl=1.15, grad_norm=4.72, lr=3.53e-06, throughput=2809 tok/s +2025-11-20 17:54:53,335 - INFO - Epoch 1 Step 130 (Global: 6630): loss=0.1366, ppl=1.15, grad_norm=6.19, lr=3.51e-06, throughput=2800 tok/s +2025-11-20 17:57:45,372 - INFO - Epoch 1 Step 140 (Global: 6640): loss=0.1650, ppl=1.18, grad_norm=6.12, lr=3.50e-06, throughput=2790 tok/s +2025-11-20 18:00:36,106 - INFO - Epoch 1 Step 150 (Global: 6650): loss=0.1169, ppl=1.12, grad_norm=2.86, lr=3.48e-06, throughput=2811 tok/s +2025-11-20 18:03:38,008 - INFO - Epoch 1 Step 160 (Global: 6660): loss=0.1374, ppl=1.15, grad_norm=4.69, lr=3.47e-06, throughput=2639 tok/s +2025-11-20 18:06:30,260 - INFO - Epoch 1 Step 170 (Global: 6670): loss=0.1670, ppl=1.18, grad_norm=3.33, lr=3.45e-06, throughput=2787 tok/s +2025-11-20 18:09:21,939 - INFO - Epoch 1 Step 180 (Global: 6680): loss=0.1274, ppl=1.14, grad_norm=4.44, lr=3.43e-06, throughput=2796 tok/s +2025-11-20 18:12:11,593 - INFO - Epoch 1 Step 190 (Global: 6690): loss=0.1339, ppl=1.14, grad_norm=2.78, lr=3.42e-06, throughput=2829 tok/s +2025-11-20 18:15:01,026 - INFO - Epoch 1 Step 200 (Global: 6700): loss=0.1420, ppl=1.15, grad_norm=5.22, lr=3.40e-06, throughput=2833 tok/s +2025-11-20 18:18:00,488 - INFO - Epoch 1 Step 210 (Global: 6710): loss=0.1299, ppl=1.14, grad_norm=2.80, lr=3.39e-06, throughput=2675 tok/s +2025-11-20 18:20:50,608 - INFO - Epoch 1 Step 220 (Global: 6720): loss=0.1469, ppl=1.16, 
grad_norm=3.95, lr=3.37e-06, throughput=2822 tok/s +2025-11-20 18:23:41,120 - INFO - Epoch 1 Step 230 (Global: 6730): loss=0.1256, ppl=1.13, grad_norm=5.88, lr=3.35e-06, throughput=2815 tok/s +2025-11-20 18:26:30,414 - INFO - Epoch 1 Step 240 (Global: 6740): loss=0.1556, ppl=1.17, grad_norm=2.86, lr=3.34e-06, throughput=2835 tok/s +2025-11-20 18:29:20,481 - INFO - Epoch 1 Step 250 (Global: 6750): loss=0.1367, ppl=1.15, grad_norm=4.16, lr=3.32e-06, throughput=2822 tok/s +2025-11-20 18:32:09,385 - INFO - Epoch 1 Step 260 (Global: 6760): loss=0.1241, ppl=1.13, grad_norm=2.95, lr=3.31e-06, throughput=2842 tok/s +2025-11-20 18:35:08,507 - INFO - Epoch 1 Step 270 (Global: 6770): loss=0.1244, ppl=1.13, grad_norm=3.34, lr=3.29e-06, throughput=2680 tok/s +2025-11-20 18:37:56,784 - INFO - Epoch 1 Step 280 (Global: 6780): loss=0.1251, ppl=1.13, grad_norm=2.67, lr=3.28e-06, throughput=2852 tok/s +2025-11-20 18:40:45,685 - INFO - Epoch 1 Step 290 (Global: 6790): loss=0.1277, ppl=1.14, grad_norm=3.30, lr=3.26e-06, throughput=2842 tok/s +2025-11-20 18:43:34,454 - INFO - Epoch 1 Step 300 (Global: 6800): loss=0.1413, ppl=1.15, grad_norm=4.16, lr=3.24e-06, throughput=2844 tok/s +2025-11-20 18:46:42,738 - INFO - Epoch 1 Step 310 (Global: 6810): loss=0.1550, ppl=1.17, grad_norm=3.58, lr=3.23e-06, throughput=2549 tok/s +2025-11-20 18:49:34,726 - INFO - Epoch 1 Step 320 (Global: 6820): loss=0.1512, ppl=1.16, grad_norm=3.41, lr=3.21e-06, throughput=2791 tok/s +2025-11-20 18:52:35,467 - INFO - Epoch 1 Step 330 (Global: 6830): loss=0.1216, ppl=1.13, grad_norm=2.73, lr=3.20e-06, throughput=2656 tok/s +2025-11-20 18:55:25,174 - INFO - Epoch 1 Step 340 (Global: 6840): loss=0.1406, ppl=1.15, grad_norm=2.97, lr=3.18e-06, throughput=2828 tok/s +2025-11-20 18:58:14,737 - INFO - Epoch 1 Step 350 (Global: 6850): loss=0.1380, ppl=1.15, grad_norm=3.64, lr=3.17e-06, throughput=2831 tok/s +2025-11-20 19:01:02,890 - INFO - Epoch 1 Step 360 (Global: 6860): loss=0.1343, ppl=1.14, grad_norm=3.11, 
lr=3.15e-06, throughput=2855 tok/s +2025-11-20 19:03:51,737 - INFO - Epoch 1 Step 370 (Global: 6870): loss=0.1121, ppl=1.12, grad_norm=3.44, lr=3.13e-06, throughput=2843 tok/s +2025-11-20 19:06:50,407 - INFO - Epoch 1 Step 380 (Global: 6880): loss=0.1162, ppl=1.12, grad_norm=2.09, lr=3.12e-06, throughput=2687 tok/s +2025-11-20 19:09:38,842 - INFO - Epoch 1 Step 390 (Global: 6890): loss=0.1385, ppl=1.15, grad_norm=3.61, lr=3.10e-06, throughput=2850 tok/s +2025-11-20 19:12:28,531 - INFO - Epoch 1 Step 400 (Global: 6900): loss=0.1272, ppl=1.14, grad_norm=3.41, lr=3.09e-06, throughput=2829 tok/s +2025-11-20 19:15:17,419 - INFO - Epoch 1 Step 410 (Global: 6910): loss=0.1399, ppl=1.15, grad_norm=2.73, lr=3.07e-06, throughput=2842 tok/s +2025-11-20 19:18:05,790 - INFO - Epoch 1 Step 420 (Global: 6920): loss=0.1391, ppl=1.15, grad_norm=2.41, lr=3.06e-06, throughput=2851 tok/s +2025-11-20 19:20:55,661 - INFO - Epoch 1 Step 430 (Global: 6930): loss=0.1340, ppl=1.14, grad_norm=2.89, lr=3.04e-06, throughput=2826 tok/s +2025-11-20 19:23:55,849 - INFO - Epoch 1 Step 440 (Global: 6940): loss=0.1349, ppl=1.14, grad_norm=3.59, lr=3.03e-06, throughput=2664 tok/s +2025-11-20 19:26:46,511 - INFO - Epoch 1 Step 450 (Global: 6950): loss=0.1381, ppl=1.15, grad_norm=4.12, lr=3.01e-06, throughput=2813 tok/s +2025-11-20 19:29:35,516 - INFO - Epoch 1 Step 460 (Global: 6960): loss=0.1053, ppl=1.11, grad_norm=2.53, lr=3.00e-06, throughput=2840 tok/s +2025-11-20 19:32:24,762 - INFO - Epoch 1 Step 470 (Global: 6970): loss=0.1298, ppl=1.14, grad_norm=3.03, lr=2.98e-06, throughput=2836 tok/s +2025-11-20 19:35:13,451 - INFO - Epoch 1 Step 480 (Global: 6980): loss=0.1555, ppl=1.17, grad_norm=4.91, lr=2.96e-06, throughput=2846 tok/s +2025-11-20 19:38:02,447 - INFO - Epoch 1 Step 490 (Global: 6990): loss=0.1477, ppl=1.16, grad_norm=4.69, lr=2.95e-06, throughput=2840 tok/s +2025-11-20 19:41:01,295 - INFO - Epoch 1 Step 500 (Global: 7000): loss=0.1348, ppl=1.14, grad_norm=3.41, lr=2.93e-06, 
throughput=2684 tok/s +2025-11-20 19:43:49,748 - INFO - Epoch 1 Step 510 (Global: 7010): loss=0.1469, ppl=1.16, grad_norm=3.33, lr=2.92e-06, throughput=2850 tok/s +2025-11-20 19:46:38,153 - INFO - Epoch 1 Step 520 (Global: 7020): loss=0.1139, ppl=1.12, grad_norm=3.41, lr=2.90e-06, throughput=2850 tok/s +2025-11-20 19:49:26,668 - INFO - Epoch 1 Step 530 (Global: 7030): loss=0.1418, ppl=1.15, grad_norm=4.50, lr=2.89e-06, throughput=2848 tok/s +2025-11-20 19:52:15,473 - INFO - Epoch 1 Step 540 (Global: 7040): loss=0.1290, ppl=1.14, grad_norm=3.14, lr=2.87e-06, throughput=2844 tok/s +2025-11-20 19:55:13,740 - INFO - Epoch 1 Step 550 (Global: 7050): loss=0.1224, ppl=1.13, grad_norm=4.91, lr=2.86e-06, throughput=2693 tok/s +2025-11-20 19:58:02,676 - INFO - Epoch 1 Step 560 (Global: 7060): loss=0.1394, ppl=1.15, grad_norm=5.53, lr=2.84e-06, throughput=2841 tok/s +2025-11-20 20:00:51,212 - INFO - Epoch 1 Step 570 (Global: 7070): loss=0.1480, ppl=1.16, grad_norm=3.36, lr=2.83e-06, throughput=2848 tok/s +2025-11-20 20:03:38,883 - INFO - Epoch 1 Step 580 (Global: 7080): loss=0.1226, ppl=1.13, grad_norm=3.53, lr=2.81e-06, throughput=2863 tok/s +2025-11-20 20:06:26,828 - INFO - Epoch 1 Step 590 (Global: 7090): loss=0.1337, ppl=1.14, grad_norm=4.34, lr=2.80e-06, throughput=2858 tok/s +2025-11-20 20:09:16,405 - INFO - Epoch 1 Step 600 (Global: 7100): loss=0.1392, ppl=1.15, grad_norm=3.92, lr=2.78e-06, throughput=2831 tok/s +2025-11-20 20:12:15,512 - INFO - Epoch 1 Step 610 (Global: 7110): loss=0.1246, ppl=1.13, grad_norm=3.20, lr=2.77e-06, throughput=2680 tok/s +2025-11-20 20:15:05,567 - INFO - Epoch 1 Step 620 (Global: 7120): loss=0.1423, ppl=1.15, grad_norm=4.03, lr=2.75e-06, throughput=2823 tok/s +2025-11-20 20:17:55,698 - INFO - Epoch 1 Step 630 (Global: 7130): loss=0.1348, ppl=1.14, grad_norm=2.92, lr=2.74e-06, throughput=2821 tok/s +2025-11-20 20:20:45,520 - INFO - Epoch 1 Step 640 (Global: 7140): loss=0.1428, ppl=1.15, grad_norm=3.61, lr=2.72e-06, throughput=2827 tok/s 
+2025-11-20 20:23:36,977 - INFO - Epoch 1 Step 650 (Global: 7150): loss=0.1255, ppl=1.13, grad_norm=4.34, lr=2.71e-06, throughput=2800 tok/s +2025-11-20 20:26:28,472 - INFO - Epoch 1 Step 660 (Global: 7160): loss=0.1215, ppl=1.13, grad_norm=2.53, lr=2.69e-06, throughput=2799 tok/s +2025-11-20 20:29:28,229 - INFO - Epoch 1 Step 670 (Global: 7170): loss=0.1248, ppl=1.13, grad_norm=3.89, lr=2.68e-06, throughput=2670 tok/s +2025-11-20 20:32:17,765 - INFO - Epoch 1 Step 680 (Global: 7180): loss=0.1526, ppl=1.16, grad_norm=3.23, lr=2.66e-06, throughput=2831 tok/s +2025-11-20 20:35:09,753 - INFO - Epoch 1 Step 690 (Global: 7190): loss=0.1320, ppl=1.14, grad_norm=4.66, lr=2.65e-06, throughput=2791 tok/s +2025-11-20 20:37:58,010 - INFO - Epoch 1 Step 700 (Global: 7200): loss=0.1358, ppl=1.15, grad_norm=4.25, lr=2.63e-06, throughput=2853 tok/s +2025-11-20 20:40:47,859 - INFO - Epoch 1 Step 710 (Global: 7210): loss=0.1320, ppl=1.14, grad_norm=2.80, lr=2.62e-06, throughput=2826 tok/s +2025-11-20 20:43:48,112 - INFO - Epoch 1 Step 720 (Global: 7220): loss=0.1214, ppl=1.13, grad_norm=3.09, lr=2.60e-06, throughput=2663 tok/s +2025-11-20 20:46:39,337 - INFO - Epoch 1 Step 730 (Global: 7230): loss=0.1388, ppl=1.15, grad_norm=3.33, lr=2.59e-06, throughput=2803 tok/s +2025-11-20 20:49:29,132 - INFO - Epoch 1 Step 740 (Global: 7240): loss=0.1275, ppl=1.14, grad_norm=3.58, lr=2.58e-06, throughput=2827 tok/s +2025-11-20 20:52:19,249 - INFO - Epoch 1 Step 750 (Global: 7250): loss=0.1280, ppl=1.14, grad_norm=4.62, lr=2.56e-06, throughput=2822 tok/s +2025-11-20 20:55:09,041 - INFO - Epoch 1 Step 760 (Global: 7260): loss=0.1362, ppl=1.15, grad_norm=4.88, lr=2.55e-06, throughput=2827 tok/s +2025-11-20 20:58:02,956 - INFO - Epoch 1 Step 770 (Global: 7270): loss=0.1496, ppl=1.16, grad_norm=6.38, lr=2.53e-06, throughput=2760 tok/s +2025-11-20 21:01:05,831 - INFO - Epoch 1 Step 780 (Global: 7280): loss=0.1174, ppl=1.12, grad_norm=2.81, lr=2.52e-06, throughput=2625 tok/s +2025-11-20 21:03:57,458 
- INFO - Epoch 1 Step 790 (Global: 7290): loss=0.1313, ppl=1.14, grad_norm=3.08, lr=2.50e-06, throughput=2797 tok/s +2025-11-20 21:06:48,947 - INFO - Epoch 1 Step 800 (Global: 7300): loss=0.1298, ppl=1.14, grad_norm=3.55, lr=2.49e-06, throughput=2799 tok/s +2025-11-20 21:09:43,104 - INFO - Epoch 1 Step 810 (Global: 7310): loss=0.1467, ppl=1.16, grad_norm=3.09, lr=2.47e-06, throughput=2756 tok/s +2025-11-20 21:12:37,868 - INFO - Epoch 1 Step 820 (Global: 7320): loss=0.1445, ppl=1.16, grad_norm=3.20, lr=2.46e-06, throughput=2747 tok/s +2025-11-20 21:15:29,878 - INFO - Epoch 1 Step 830 (Global: 7330): loss=0.1442, ppl=1.16, grad_norm=2.84, lr=2.44e-06, throughput=2791 tok/s +2025-11-20 21:18:32,167 - INFO - Epoch 1 Step 840 (Global: 7340): loss=0.1336, ppl=1.14, grad_norm=3.17, lr=2.43e-06, throughput=2633 tok/s +2025-11-20 21:21:25,812 - INFO - Epoch 1 Step 850 (Global: 7350): loss=0.1341, ppl=1.14, grad_norm=3.61, lr=2.42e-06, throughput=2764 tok/s +2025-11-20 21:24:17,646 - INFO - Epoch 1 Step 860 (Global: 7360): loss=0.1361, ppl=1.15, grad_norm=3.00, lr=2.40e-06, throughput=2793 tok/s +2025-11-20 21:27:10,454 - INFO - Epoch 1 Step 870 (Global: 7370): loss=0.1438, ppl=1.15, grad_norm=3.41, lr=2.39e-06, throughput=2778 tok/s +2025-11-20 21:30:04,474 - INFO - Epoch 1 Step 880 (Global: 7380): loss=0.1240, ppl=1.13, grad_norm=4.31, lr=2.37e-06, throughput=2758 tok/s +2025-11-20 21:33:04,048 - INFO - Epoch 1 Step 890 (Global: 7390): loss=0.1610, ppl=1.17, grad_norm=3.36, lr=2.36e-06, throughput=2673 tok/s +2025-11-20 21:35:56,727 - INFO - Epoch 1 Step 900 (Global: 7400): loss=0.1158, ppl=1.12, grad_norm=4.22, lr=2.34e-06, throughput=2780 tok/s +2025-11-20 21:38:48,446 - INFO - Epoch 1 Step 910 (Global: 7410): loss=0.1371, ppl=1.15, grad_norm=7.31, lr=2.33e-06, throughput=2795 tok/s +2025-11-20 21:41:38,923 - INFO - Epoch 1 Step 920 (Global: 7420): loss=0.2015, ppl=1.22, grad_norm=4.03, lr=2.32e-06, throughput=2816 tok/s +2025-11-20 21:44:30,088 - INFO - Epoch 1 Step 930 
(Global: 7430): loss=0.1443, ppl=1.16, grad_norm=3.19, lr=2.30e-06, throughput=2804 tok/s +2025-11-20 21:47:21,008 - INFO - Epoch 1 Step 940 (Global: 7440): loss=0.1434, ppl=1.15, grad_norm=2.81, lr=2.29e-06, throughput=2808 tok/s +2025-11-20 21:50:24,263 - INFO - Epoch 1 Step 950 (Global: 7450): loss=0.1422, ppl=1.15, grad_norm=3.98, lr=2.27e-06, throughput=2619 tok/s +2025-11-20 21:53:17,063 - INFO - Epoch 1 Step 960 (Global: 7460): loss=0.1334, ppl=1.14, grad_norm=2.95, lr=2.26e-06, throughput=2778 tok/s +2025-11-20 21:56:09,823 - INFO - Epoch 1 Step 970 (Global: 7470): loss=0.1301, ppl=1.14, grad_norm=3.62, lr=2.25e-06, throughput=2778 tok/s +2025-11-20 21:59:02,792 - INFO - Epoch 1 Step 980 (Global: 7480): loss=0.1183, ppl=1.13, grad_norm=5.94, lr=2.23e-06, throughput=2775 tok/s +2025-11-20 22:01:54,975 - INFO - Epoch 1 Step 990 (Global: 7490): loss=0.1630, ppl=1.18, grad_norm=6.69, lr=2.22e-06, throughput=2788 tok/s +2025-11-20 22:04:47,959 - INFO - Epoch 1 Step 1000 (Global: 7500): loss=0.1322, ppl=1.14, grad_norm=2.86, lr=2.20e-06, throughput=2775 tok/s +2025-11-20 22:07:52,615 - INFO - Epoch 1 Step 1010 (Global: 7510): loss=0.1323, ppl=1.14, grad_norm=3.42, lr=2.19e-06, throughput=2599 tok/s +2025-11-20 22:10:45,235 - INFO - Epoch 1 Step 1020 (Global: 7520): loss=0.1199, ppl=1.13, grad_norm=3.47, lr=2.18e-06, throughput=2781 tok/s +2025-11-20 22:13:39,638 - INFO - Epoch 1 Step 1030 (Global: 7530): loss=0.1406, ppl=1.15, grad_norm=3.70, lr=2.16e-06, throughput=2752 tok/s +2025-11-20 22:16:34,561 - INFO - Epoch 1 Step 1040 (Global: 7540): loss=0.1101, ppl=1.12, grad_norm=4.16, lr=2.15e-06, throughput=2744 tok/s +2025-11-20 22:19:31,374 - INFO - Epoch 1 Step 1050 (Global: 7550): loss=0.1190, ppl=1.13, grad_norm=3.78, lr=2.14e-06, throughput=2715 tok/s +2025-11-20 22:22:37,360 - INFO - Epoch 1 Step 1060 (Global: 7560): loss=0.1277, ppl=1.14, grad_norm=2.86, lr=2.12e-06, throughput=2581 tok/s +2025-11-20 22:25:32,932 - INFO - Epoch 1 Step 1070 (Global: 7570): 
loss=0.1350, ppl=1.14, grad_norm=5.84, lr=2.11e-06, throughput=2734 tok/s +2025-11-20 22:28:23,609 - INFO - Epoch 1 Step 1080 (Global: 7580): loss=0.1338, ppl=1.14, grad_norm=4.34, lr=2.09e-06, throughput=2812 tok/s +2025-11-20 22:31:16,579 - INFO - Epoch 1 Step 1090 (Global: 7590): loss=0.1227, ppl=1.13, grad_norm=3.30, lr=2.08e-06, throughput=2775 tok/s +2025-11-20 22:34:07,813 - INFO - Epoch 1 Step 1100 (Global: 7600): loss=0.1310, ppl=1.14, grad_norm=4.78, lr=2.07e-06, throughput=2803 tok/s +2025-11-20 22:36:59,673 - INFO - Epoch 1 Step 1110 (Global: 7610): loss=0.1515, ppl=1.16, grad_norm=2.89, lr=2.05e-06, throughput=2793 tok/s +2025-11-20 22:40:03,193 - INFO - Epoch 1 Step 1120 (Global: 7620): loss=0.1055, ppl=1.11, grad_norm=2.27, lr=2.04e-06, throughput=2616 tok/s +2025-11-20 22:42:57,098 - INFO - Epoch 1 Step 1130 (Global: 7630): loss=0.1338, ppl=1.14, grad_norm=2.97, lr=2.03e-06, throughput=2760 tok/s +2025-11-20 22:45:48,323 - INFO - Epoch 1 Step 1140 (Global: 7640): loss=0.1339, ppl=1.14, grad_norm=5.91, lr=2.01e-06, throughput=2803 tok/s +2025-11-20 22:48:39,450 - INFO - Epoch 1 Step 1150 (Global: 7650): loss=0.1221, ppl=1.13, grad_norm=3.06, lr=2.00e-06, throughput=2805 tok/s +2025-11-20 22:51:34,248 - INFO - Epoch 1 Step 1160 (Global: 7660): loss=0.1123, ppl=1.12, grad_norm=2.70, lr=1.99e-06, throughput=2746 tok/s +2025-11-20 22:54:27,858 - INFO - Epoch 1 Step 1170 (Global: 7670): loss=0.1244, ppl=1.13, grad_norm=3.77, lr=1.97e-06, throughput=2765 tok/s +2025-11-20 22:57:28,861 - INFO - Epoch 1 Step 1180 (Global: 7680): loss=0.1429, ppl=1.15, grad_norm=4.03, lr=1.96e-06, throughput=2652 tok/s +2025-11-20 23:00:19,306 - INFO - Epoch 1 Step 1190 (Global: 7690): loss=0.1561, ppl=1.17, grad_norm=3.20, lr=1.95e-06, throughput=2816 tok/s +2025-11-20 23:03:11,638 - INFO - Epoch 1 Step 1200 (Global: 7700): loss=0.1306, ppl=1.14, grad_norm=3.64, lr=1.93e-06, throughput=2785 tok/s +2025-11-20 23:06:04,775 - INFO - Epoch 1 Step 1210 (Global: 7710): 
loss=0.1420, ppl=1.15, grad_norm=5.41, lr=1.92e-06, throughput=2772 tok/s +2025-11-20 23:08:56,172 - INFO - Epoch 1 Step 1220 (Global: 7720): loss=0.1391, ppl=1.15, grad_norm=3.91, lr=1.91e-06, throughput=2801 tok/s +2025-11-20 23:11:59,867 - INFO - Epoch 1 Step 1230 (Global: 7730): loss=0.1423, ppl=1.15, grad_norm=5.22, lr=1.89e-06, throughput=2613 tok/s +2025-11-20 23:14:52,737 - INFO - Epoch 1 Step 1240 (Global: 7740): loss=0.1318, ppl=1.14, grad_norm=5.22, lr=1.88e-06, throughput=2777 tok/s +2025-11-20 23:17:48,442 - INFO - Epoch 1 Step 1250 (Global: 7750): loss=0.1209, ppl=1.13, grad_norm=2.42, lr=1.87e-06, throughput=2732 tok/s +2025-11-20 23:20:41,146 - INFO - Epoch 1 Step 1260 (Global: 7760): loss=0.1236, ppl=1.13, grad_norm=2.56, lr=1.85e-06, throughput=2779 tok/s +2025-11-20 23:23:33,140 - INFO - Epoch 1 Step 1270 (Global: 7770): loss=0.1416, ppl=1.15, grad_norm=4.56, lr=1.84e-06, throughput=2791 tok/s +2025-11-20 23:26:25,915 - INFO - Epoch 1 Step 1280 (Global: 7780): loss=0.1415, ppl=1.15, grad_norm=3.28, lr=1.83e-06, throughput=2778 tok/s +2025-11-20 23:29:32,076 - INFO - Epoch 1 Step 1290 (Global: 7790): loss=0.1382, ppl=1.15, grad_norm=3.17, lr=1.82e-06, throughput=2578 tok/s +2025-11-20 23:32:23,388 - INFO - Epoch 1 Step 1300 (Global: 7800): loss=0.1373, ppl=1.15, grad_norm=5.41, lr=1.80e-06, throughput=2802 tok/s +2025-11-20 23:35:15,934 - INFO - Epoch 1 Step 1310 (Global: 7810): loss=0.1439, ppl=1.15, grad_norm=7.16, lr=1.79e-06, throughput=2782 tok/s +2025-11-20 23:38:07,835 - INFO - Epoch 1 Step 1320 (Global: 7820): loss=0.1273, ppl=1.14, grad_norm=3.36, lr=1.78e-06, throughput=2792 tok/s +2025-11-20 23:40:58,969 - INFO - Epoch 1 Step 1330 (Global: 7830): loss=0.1316, ppl=1.14, grad_norm=4.19, lr=1.76e-06, throughput=2805 tok/s +2025-11-20 23:43:49,806 - INFO - Epoch 1 Step 1340 (Global: 7840): loss=0.1494, ppl=1.16, grad_norm=3.62, lr=1.75e-06, throughput=2810 tok/s +2025-11-20 23:46:52,190 - INFO - Epoch 1 Step 1350 (Global: 7850): 
loss=0.1341, ppl=1.14, grad_norm=3.70, lr=1.74e-06, throughput=2632 tok/s +2025-11-20 23:49:45,768 - INFO - Epoch 1 Step 1360 (Global: 7860): loss=0.1179, ppl=1.13, grad_norm=2.88, lr=1.73e-06, throughput=2765 tok/s +2025-11-20 23:52:37,492 - INFO - Epoch 1 Step 1370 (Global: 7870): loss=0.1230, ppl=1.13, grad_norm=3.09, lr=1.71e-06, throughput=2795 tok/s +2025-11-20 23:55:29,177 - INFO - Epoch 1 Step 1380 (Global: 7880): loss=0.1251, ppl=1.13, grad_norm=3.02, lr=1.70e-06, throughput=2796 tok/s +2025-11-20 23:58:20,858 - INFO - Epoch 1 Step 1390 (Global: 7890): loss=0.1309, ppl=1.14, grad_norm=3.23, lr=1.69e-06, throughput=2796 tok/s +2025-11-21 00:01:26,035 - INFO - Epoch 1 Step 1400 (Global: 7900): loss=0.1081, ppl=1.11, grad_norm=3.72, lr=1.68e-06, throughput=2592 tok/s +2025-11-21 00:04:18,479 - INFO - Epoch 1 Step 1410 (Global: 7910): loss=0.1319, ppl=1.14, grad_norm=2.86, lr=1.66e-06, throughput=2784 tok/s +2025-11-21 00:07:10,732 - INFO - Epoch 1 Step 1420 (Global: 7920): loss=0.1229, ppl=1.13, grad_norm=4.03, lr=1.65e-06, throughput=2787 tok/s +2025-11-21 00:10:03,628 - INFO - Epoch 1 Step 1430 (Global: 7930): loss=0.1441, ppl=1.16, grad_norm=3.70, lr=1.64e-06, throughput=2776 tok/s +2025-11-21 00:12:57,673 - INFO - Epoch 1 Step 1440 (Global: 7940): loss=0.1160, ppl=1.12, grad_norm=2.52, lr=1.63e-06, throughput=2758 tok/s +2025-11-21 00:15:54,638 - INFO - Epoch 1 Step 1450 (Global: 7950): loss=0.1398, ppl=1.15, grad_norm=5.97, lr=1.61e-06, throughput=2712 tok/s +2025-11-21 00:18:57,534 - INFO - Epoch 1 Step 1460 (Global: 7960): loss=0.1214, ppl=1.13, grad_norm=2.89, lr=1.60e-06, throughput=2624 tok/s +2025-11-21 00:21:52,188 - INFO - Epoch 1 Step 1470 (Global: 7970): loss=0.1254, ppl=1.13, grad_norm=2.86, lr=1.59e-06, throughput=2748 tok/s +2025-11-21 00:24:45,939 - INFO - Epoch 1 Step 1480 (Global: 7980): loss=0.1263, ppl=1.13, grad_norm=2.58, lr=1.58e-06, throughput=2763 tok/s +2025-11-21 00:27:37,706 - INFO - Epoch 1 Step 1490 (Global: 7990): 
loss=0.1353, ppl=1.14, grad_norm=3.06, lr=1.56e-06, throughput=2795 tok/s
+2025-11-21 00:30:33,192 - INFO - Epoch 1 Step 1500 (Global: 8000): loss=0.1377, ppl=1.15, grad_norm=3.03, lr=1.55e-06, throughput=2735 tok/s
+2025-11-21 00:30:33,192 - INFO - 
+Running validation at step 8000...
+2025-11-21 00:41:32,723 - INFO - Validation loss: 0.1345, perplexity: 1.14
+2025-11-21 00:41:32,723 - INFO - Qualitative metrics (n=5):
+2025-11-21 00:41:32,723 - INFO - BLEU: 0.8374
+2025-11-21 00:41:32,723 - INFO - METEOR: 0.9103
+2025-11-21 00:41:32,723 - INFO - Edit Distance: 0.1482
+2025-11-21 00:41:32,724 - INFO - F-measure: 0.9123
+2025-11-21 00:41:32,724 - INFO - 
+======================================================================
+2025-11-21 00:41:32,724 - INFO - Qualitative Evaluation Samples:
+2025-11-21 00:41:32,724 - INFO - ======================================================================
+2025-11-21 00:41:32,724 - INFO - 
+Sample 1 (ID: sample_141920_chunk_1):
+2025-11-21 00:41:32,724 - INFO - Context: [Image: sample_141920_chunk_1]
+ "
+Free OCR."
+2025-11-21 00:41:32,724 - INFO - Generated: ' Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s a yes-or-wubert. But i...'
+2025-11-21 00:41:32,724 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
+2025-11-21 00:41:32,724 - INFO - ----------------------------------------------------------------------
+2025-11-21 00:41:32,724 - INFO - 
+Sample 2 (ID: sample_170543_chunk_2):
+2025-11-21 00:41:32,724 - INFO - Context: [Image: sample_170543_chunk_2]
+ "
+Free OCR."
+2025-11-21 00:41:32,725 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
+2025-11-21 00:41:32,725 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
+2025-11-21 00:41:32,725 - INFO - ----------------------------------------------------------------------
+2025-11-21 00:41:32,725 - INFO - 
+Sample 3 (ID: sample_107152_chunk_9):
+2025-11-21 00:41:32,725 - INFO - Context: [Image: sample_107152_chunk_9]
+ "
+Free OCR."
+2025-11-21 00:41:32,725 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
+2025-11-21 00:41:32,725 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
+2025-11-21 00:41:32,725 - INFO - ----------------------------------------------------------------------
+2025-11-21 00:41:32,725 - INFO - 
+Sample 4 (ID: sample_069148_chunk_0):
+2025-11-21 00:41:32,725 - INFO - Context: [Image: sample_069148_chunk_0]
+ "
+Free OCR."
+2025-11-21 00:41:32,725 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
+2025-11-21 00:41:32,725 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
+2025-11-21 00:41:32,726 - INFO - ----------------------------------------------------------------------
+2025-11-21 00:41:32,726 - INFO - 
+Sample 5 (ID: sample_103176_chunk_4):
+2025-11-21 00:41:32,726 - INFO - Context: [Image: sample_103176_chunk_4]
+ "
+Free OCR."
+2025-11-21 00:41:32,726 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stygian Abyss ...'
+2025-11-21 00:41:32,726 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
+2025-11-21 00:41:32,726 - INFO - ----------------------------------------------------------------------
+2025-11-21 00:41:32,727 - INFO - 
+Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_8000.jsonl
+2025-11-21 00:42:19,834 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt
+2025-11-21 00:42:19,841 - INFO - New best validation loss: 0.1345, perplexity: 1.14
+2025-11-21 00:45:10,731 - INFO - Epoch 1 Step 1510 (Global: 8010): loss=0.1388, ppl=1.15, grad_norm=2.77, lr=1.54e-06, throughput=2809 tok/s
+2025-11-21 00:48:00,913 - INFO - Epoch 1 Step 1520 (Global: 8020): loss=0.1270, ppl=1.14, grad_norm=3.97, lr=1.53e-06, throughput=2821 tok/s
+2025-11-21 00:50:53,382 - INFO - Epoch 1 Step 1530 (Global: 8030): loss=0.1366, ppl=1.15, grad_norm=3.81, lr=1.52e-06, throughput=2783 tok/s
+2025-11-21 00:53:43,364 - INFO - Epoch 1 Step 1540 (Global: 8040): loss=0.1458, ppl=1.16, grad_norm=4.16, lr=1.50e-06, throughput=2824 tok/s
+2025-11-21 00:56:37,923 - INFO - Epoch 1 Step 1550 (Global: 8050): loss=0.1339, ppl=1.14, grad_norm=3.36, lr=1.49e-06, throughput=2750 tok/s
+2025-11-21 00:59:32,624 - INFO - Epoch 1 Step 1560 (Global: 8060): loss=0.1110, ppl=1.12, grad_norm=3.89, lr=1.48e-06, throughput=2748 tok/s
+2025-11-21 01:02:25,388 - INFO - Epoch 1 Step 1570 (Global: 8070): loss=0.1453, 
ppl=1.16, grad_norm=3.14, lr=1.47e-06, throughput=2778 tok/s +2025-11-21 01:05:21,192 - INFO - Epoch 1 Step 1580 (Global: 8080): loss=0.1242, ppl=1.13, grad_norm=3.78, lr=1.46e-06, throughput=2730 tok/s +2025-11-21 01:08:11,340 - INFO - Epoch 1 Step 1590 (Global: 8090): loss=0.1354, ppl=1.14, grad_norm=3.08, lr=1.44e-06, throughput=2821 tok/s +2025-11-21 01:11:10,932 - INFO - Epoch 1 Step 1600 (Global: 8100): loss=0.1340, ppl=1.14, grad_norm=2.86, lr=1.43e-06, throughput=2673 tok/s +2025-11-21 01:14:00,951 - INFO - Epoch 1 Step 1610 (Global: 8110): loss=0.1262, ppl=1.13, grad_norm=2.88, lr=1.42e-06, throughput=2823 tok/s +2025-11-21 01:16:50,036 - INFO - Epoch 1 Step 1620 (Global: 8120): loss=0.1265, ppl=1.13, grad_norm=2.78, lr=1.41e-06, throughput=2839 tok/s +2025-11-21 01:19:39,717 - INFO - Epoch 1 Step 1630 (Global: 8130): loss=0.1341, ppl=1.14, grad_norm=4.94, lr=1.40e-06, throughput=2829 tok/s +2025-11-21 01:22:30,815 - INFO - Epoch 1 Step 1640 (Global: 8140): loss=0.1405, ppl=1.15, grad_norm=4.91, lr=1.39e-06, throughput=2805 tok/s +2025-11-21 01:25:22,562 - INFO - Epoch 1 Step 1650 (Global: 8150): loss=0.1539, ppl=1.17, grad_norm=2.98, lr=1.37e-06, throughput=2795 tok/s +2025-11-21 01:28:12,088 - INFO - Epoch 1 Step 1660 (Global: 8160): loss=0.1277, ppl=1.14, grad_norm=3.72, lr=1.36e-06, throughput=2831 tok/s +2025-11-21 01:31:00,927 - INFO - Epoch 1 Step 1670 (Global: 8170): loss=0.1326, ppl=1.14, grad_norm=3.95, lr=1.35e-06, throughput=2843 tok/s +2025-11-21 01:33:49,016 - INFO - Epoch 1 Step 1680 (Global: 8180): loss=0.1247, ppl=1.13, grad_norm=2.75, lr=1.34e-06, throughput=2856 tok/s +2025-11-21 01:36:36,964 - INFO - Epoch 1 Step 1690 (Global: 8190): loss=0.1369, ppl=1.15, grad_norm=3.56, lr=1.33e-06, throughput=2858 tok/s +2025-11-21 01:39:26,090 - INFO - Epoch 1 Step 1700 (Global: 8200): loss=0.1187, ppl=1.13, grad_norm=3.28, lr=1.32e-06, throughput=2838 tok/s +2025-11-21 01:42:14,374 - INFO - Epoch 1 Step 1710 (Global: 8210): loss=0.1434, ppl=1.15, 
grad_norm=3.45, lr=1.31e-06, throughput=2852 tok/s +2025-11-21 01:45:03,608 - INFO - Epoch 1 Step 1720 (Global: 8220): loss=0.1689, ppl=1.18, grad_norm=4.38, lr=1.29e-06, throughput=2836 tok/s +2025-11-21 01:48:02,300 - INFO - Epoch 1 Step 1730 (Global: 8230): loss=0.1306, ppl=1.14, grad_norm=4.50, lr=1.28e-06, throughput=2686 tok/s +2025-11-21 01:50:51,714 - INFO - Epoch 1 Step 1740 (Global: 8240): loss=0.1301, ppl=1.14, grad_norm=2.84, lr=1.27e-06, throughput=2833 tok/s +2025-11-21 01:53:40,922 - INFO - Epoch 1 Step 1750 (Global: 8250): loss=0.1091, ppl=1.12, grad_norm=2.38, lr=1.26e-06, throughput=2837 tok/s +2025-11-21 01:56:30,237 - INFO - Epoch 1 Step 1760 (Global: 8260): loss=0.1205, ppl=1.13, grad_norm=2.83, lr=1.25e-06, throughput=2835 tok/s +2025-11-21 01:59:19,241 - INFO - Epoch 1 Step 1770 (Global: 8270): loss=0.1442, ppl=1.16, grad_norm=3.23, lr=1.24e-06, throughput=2840 tok/s +2025-11-21 02:02:09,103 - INFO - Epoch 1 Step 1780 (Global: 8280): loss=0.1283, ppl=1.14, grad_norm=4.28, lr=1.23e-06, throughput=2826 tok/s +2025-11-21 02:04:59,349 - INFO - Epoch 1 Step 1790 (Global: 8290): loss=0.1352, ppl=1.14, grad_norm=2.66, lr=1.22e-06, throughput=2820 tok/s +2025-11-21 02:07:49,844 - INFO - Epoch 1 Step 1800 (Global: 8300): loss=0.1333, ppl=1.14, grad_norm=3.23, lr=1.21e-06, throughput=2815 tok/s +2025-11-21 02:10:40,837 - INFO - Epoch 1 Step 1810 (Global: 8310): loss=0.1445, ppl=1.16, grad_norm=5.53, lr=1.20e-06, throughput=2807 tok/s +2025-11-21 02:13:29,875 - INFO - Epoch 1 Step 1820 (Global: 8320): loss=0.1278, ppl=1.14, grad_norm=4.06, lr=1.18e-06, throughput=2840 tok/s +2025-11-21 02:16:30,851 - INFO - Epoch 1 Step 1830 (Global: 8330): loss=0.1359, ppl=1.15, grad_norm=3.28, lr=1.17e-06, throughput=2652 tok/s +2025-11-21 02:19:25,823 - INFO - Epoch 1 Step 1840 (Global: 8340): loss=0.1358, ppl=1.15, grad_norm=2.50, lr=1.16e-06, throughput=2743 tok/s +2025-11-21 02:22:24,809 - INFO - Epoch 1 Step 1850 (Global: 8350): loss=0.1578, ppl=1.17, 
grad_norm=3.22, lr=1.15e-06, throughput=2682 tok/s +2025-11-21 02:25:14,683 - INFO - Epoch 1 Step 1860 (Global: 8360): loss=0.1472, ppl=1.16, grad_norm=3.39, lr=1.14e-06, throughput=2826 tok/s +2025-11-21 02:28:04,091 - INFO - Epoch 1 Step 1870 (Global: 8370): loss=0.1292, ppl=1.14, grad_norm=3.67, lr=1.13e-06, throughput=2833 tok/s +2025-11-21 02:30:52,943 - INFO - Epoch 1 Step 1880 (Global: 8380): loss=0.1435, ppl=1.15, grad_norm=3.55, lr=1.12e-06, throughput=2843 tok/s +2025-11-21 02:33:42,090 - INFO - Epoch 1 Step 1890 (Global: 8390): loss=0.1309, ppl=1.14, grad_norm=3.00, lr=1.11e-06, throughput=2838 tok/s +2025-11-21 02:36:30,793 - INFO - Epoch 1 Step 1900 (Global: 8400): loss=0.1272, ppl=1.14, grad_norm=4.03, lr=1.10e-06, throughput=2845 tok/s +2025-11-21 02:39:19,954 - INFO - Epoch 1 Step 1910 (Global: 8410): loss=0.1264, ppl=1.13, grad_norm=3.44, lr=1.09e-06, throughput=2838 tok/s +2025-11-21 02:42:07,879 - INFO - Epoch 1 Step 1920 (Global: 8420): loss=0.1225, ppl=1.13, grad_norm=4.50, lr=1.08e-06, throughput=2858 tok/s +2025-11-21 02:44:56,068 - INFO - Epoch 1 Step 1930 (Global: 8430): loss=0.1181, ppl=1.13, grad_norm=3.47, lr=1.07e-06, throughput=2854 tok/s +2025-11-21 02:47:44,327 - INFO - Epoch 1 Step 1940 (Global: 8440): loss=0.1637, ppl=1.18, grad_norm=2.91, lr=1.06e-06, throughput=2853 tok/s +2025-11-21 02:50:32,585 - INFO - Epoch 1 Step 1950 (Global: 8450): loss=0.1176, ppl=1.12, grad_norm=2.75, lr=1.05e-06, throughput=2853 tok/s +2025-11-21 02:53:21,399 - INFO - Epoch 1 Step 1960 (Global: 8460): loss=0.1266, ppl=1.13, grad_norm=3.41, lr=1.04e-06, throughput=2843 tok/s +2025-11-21 02:56:09,921 - INFO - Epoch 1 Step 1970 (Global: 8470): loss=0.0940, ppl=1.10, grad_norm=2.22, lr=1.03e-06, throughput=2848 tok/s +2025-11-21 02:59:07,578 - INFO - Epoch 1 Step 1980 (Global: 8480): loss=0.1429, ppl=1.15, grad_norm=3.97, lr=1.02e-06, throughput=2702 tok/s +2025-11-21 03:01:55,770 - INFO - Epoch 1 Step 1990 (Global: 8490): loss=0.1286, ppl=1.14, 
grad_norm=5.31, lr=1.01e-06, throughput=2854 tok/s +2025-11-21 03:04:44,277 - INFO - Epoch 1 Step 2000 (Global: 8500): loss=0.1284, ppl=1.14, grad_norm=3.53, lr=9.96e-07, throughput=2849 tok/s +2025-11-21 03:07:33,728 - INFO - Epoch 1 Step 2010 (Global: 8510): loss=0.1346, ppl=1.14, grad_norm=2.91, lr=9.86e-07, throughput=2833 tok/s +2025-11-21 03:10:23,029 - INFO - Epoch 1 Step 2020 (Global: 8520): loss=0.1420, ppl=1.15, grad_norm=4.34, lr=9.76e-07, throughput=2835 tok/s +2025-11-21 03:13:13,306 - INFO - Epoch 1 Step 2030 (Global: 8530): loss=0.1467, ppl=1.16, grad_norm=3.94, lr=9.67e-07, throughput=2819 tok/s +2025-11-21 03:16:03,327 - INFO - Epoch 1 Step 2040 (Global: 8540): loss=0.1500, ppl=1.16, grad_norm=4.12, lr=9.57e-07, throughput=2823 tok/s +2025-11-21 03:18:52,922 - INFO - Epoch 1 Step 2050 (Global: 8550): loss=0.1351, ppl=1.14, grad_norm=3.66, lr=9.47e-07, throughput=2830 tok/s +2025-11-21 03:21:41,819 - INFO - Epoch 1 Step 2060 (Global: 8560): loss=0.1319, ppl=1.14, grad_norm=3.17, lr=9.37e-07, throughput=2842 tok/s +2025-11-21 03:24:31,100 - INFO - Epoch 1 Step 2070 (Global: 8570): loss=0.1180, ppl=1.13, grad_norm=3.39, lr=9.27e-07, throughput=2836 tok/s +2025-11-21 03:27:29,660 - INFO - Epoch 1 Step 2080 (Global: 8580): loss=0.1524, ppl=1.16, grad_norm=3.97, lr=9.18e-07, throughput=2688 tok/s +2025-11-21 03:30:18,861 - INFO - Epoch 1 Step 2090 (Global: 8590): loss=0.1164, ppl=1.12, grad_norm=2.17, lr=9.08e-07, throughput=2837 tok/s +2025-11-21 03:33:07,976 - INFO - Epoch 1 Step 2100 (Global: 8600): loss=0.1382, ppl=1.15, grad_norm=3.22, lr=8.98e-07, throughput=2838 tok/s +2025-11-21 03:35:56,774 - INFO - Epoch 1 Step 2110 (Global: 8610): loss=0.1244, ppl=1.13, grad_norm=3.53, lr=8.89e-07, throughput=2844 tok/s +2025-11-21 03:38:45,732 - INFO - Epoch 1 Step 2120 (Global: 8620): loss=0.1278, ppl=1.14, grad_norm=3.22, lr=8.79e-07, throughput=2841 tok/s +2025-11-21 03:41:34,603 - INFO - Epoch 1 Step 2130 (Global: 8630): loss=0.1260, ppl=1.13, 
grad_norm=3.20, lr=8.70e-07, throughput=2842 tok/s +2025-11-21 03:44:24,458 - INFO - Epoch 1 Step 2140 (Global: 8640): loss=0.1149, ppl=1.12, grad_norm=4.41, lr=8.60e-07, throughput=2826 tok/s +2025-11-21 03:47:13,161 - INFO - Epoch 1 Step 2150 (Global: 8650): loss=0.1248, ppl=1.13, grad_norm=2.94, lr=8.51e-07, throughput=2845 tok/s +2025-11-21 03:50:01,516 - INFO - Epoch 1 Step 2160 (Global: 8660): loss=0.1280, ppl=1.14, grad_norm=3.23, lr=8.42e-07, throughput=2851 tok/s +2025-11-21 03:52:49,179 - INFO - Epoch 1 Step 2170 (Global: 8670): loss=0.1220, ppl=1.13, grad_norm=4.06, lr=8.32e-07, throughput=2863 tok/s +2025-11-21 03:55:37,373 - INFO - Epoch 1 Step 2180 (Global: 8680): loss=0.1242, ppl=1.13, grad_norm=2.91, lr=8.23e-07, throughput=2854 tok/s +2025-11-21 03:58:36,026 - INFO - Epoch 1 Step 2190 (Global: 8690): loss=0.1325, ppl=1.14, grad_norm=2.92, lr=8.14e-07, throughput=2687 tok/s +2025-11-21 04:01:25,783 - INFO - Epoch 1 Step 2200 (Global: 8700): loss=0.1341, ppl=1.14, grad_norm=2.97, lr=8.05e-07, throughput=2828 tok/s +2025-11-21 04:04:15,124 - INFO - Epoch 1 Step 2210 (Global: 8710): loss=0.1246, ppl=1.13, grad_norm=2.97, lr=7.96e-07, throughput=2835 tok/s +2025-11-21 04:07:04,436 - INFO - Epoch 1 Step 2220 (Global: 8720): loss=0.1374, ppl=1.15, grad_norm=4.66, lr=7.87e-07, throughput=2835 tok/s +2025-11-21 04:09:54,044 - INFO - Epoch 1 Step 2230 (Global: 8730): loss=0.1590, ppl=1.17, grad_norm=2.89, lr=7.78e-07, throughput=2830 tok/s +2025-11-21 04:12:43,933 - INFO - Epoch 1 Step 2240 (Global: 8740): loss=0.1111, ppl=1.12, grad_norm=2.77, lr=7.69e-07, throughput=2825 tok/s +2025-11-21 04:15:33,349 - INFO - Epoch 1 Step 2250 (Global: 8750): loss=0.1347, ppl=1.14, grad_norm=3.23, lr=7.60e-07, throughput=2833 tok/s +2025-11-21 04:18:22,389 - INFO - Epoch 1 Step 2260 (Global: 8760): loss=0.1571, ppl=1.17, grad_norm=4.19, lr=7.51e-07, throughput=2840 tok/s +2025-11-21 04:21:11,442 - INFO - Epoch 1 Step 2270 (Global: 8770): loss=0.1311, ppl=1.14, 
grad_norm=4.47, lr=7.42e-07, throughput=2839 tok/s +2025-11-21 04:23:59,493 - INFO - Epoch 1 Step 2280 (Global: 8780): loss=0.1264, ppl=1.13, grad_norm=3.62, lr=7.33e-07, throughput=2856 tok/s +2025-11-21 04:26:50,488 - INFO - Epoch 1 Step 2290 (Global: 8790): loss=0.1290, ppl=1.14, grad_norm=3.48, lr=7.25e-07, throughput=2807 tok/s +2025-11-21 04:29:39,486 - INFO - Epoch 1 Step 2300 (Global: 8800): loss=0.1104, ppl=1.12, grad_norm=3.00, lr=7.16e-07, throughput=2840 tok/s +2025-11-21 04:32:38,175 - INFO - Epoch 1 Step 2310 (Global: 8810): loss=0.1396, ppl=1.15, grad_norm=2.89, lr=7.07e-07, throughput=2686 tok/s +2025-11-21 04:35:26,620 - INFO - Epoch 1 Step 2320 (Global: 8820): loss=0.1330, ppl=1.14, grad_norm=2.72, lr=6.99e-07, throughput=2850 tok/s +2025-11-21 04:38:15,307 - INFO - Epoch 1 Step 2330 (Global: 8830): loss=0.1370, ppl=1.15, grad_norm=3.92, lr=6.90e-07, throughput=2846 tok/s +2025-11-21 04:41:03,830 - INFO - Epoch 1 Step 2340 (Global: 8840): loss=0.1282, ppl=1.14, grad_norm=2.73, lr=6.82e-07, throughput=2848 tok/s +2025-11-21 04:43:51,734 - INFO - Epoch 1 Step 2350 (Global: 8850): loss=0.1535, ppl=1.17, grad_norm=3.22, lr=6.74e-07, throughput=2859 tok/s +2025-11-21 04:46:40,170 - INFO - Epoch 1 Step 2360 (Global: 8860): loss=0.1213, ppl=1.13, grad_norm=4.03, lr=6.65e-07, throughput=2850 tok/s +2025-11-21 04:49:27,598 - INFO - Epoch 1 Step 2370 (Global: 8870): loss=0.1430, ppl=1.15, grad_norm=3.89, lr=6.57e-07, throughput=2867 tok/s +2025-11-21 04:52:15,883 - INFO - Epoch 1 Step 2380 (Global: 8880): loss=0.1423, ppl=1.15, grad_norm=8.44, lr=6.49e-07, throughput=2852 tok/s +2025-11-21 04:55:03,092 - INFO - Epoch 1 Step 2390 (Global: 8890): loss=0.1517, ppl=1.16, grad_norm=3.84, lr=6.40e-07, throughput=2871 tok/s +2025-11-21 04:57:51,593 - INFO - Epoch 1 Step 2400 (Global: 8900): loss=0.1278, ppl=1.14, grad_norm=3.30, lr=6.32e-07, throughput=2849 tok/s +2025-11-21 05:00:39,969 - INFO - Epoch 1 Step 2410 (Global: 8910): loss=0.1320, ppl=1.14, 
grad_norm=4.03, lr=6.24e-07, throughput=2851 tok/s +2025-11-21 05:03:39,512 - INFO - Epoch 1 Step 2420 (Global: 8920): loss=0.1496, ppl=1.16, grad_norm=2.50, lr=6.16e-07, throughput=2673 tok/s +2025-11-21 05:06:29,950 - INFO - Epoch 1 Step 2430 (Global: 8930): loss=0.1049, ppl=1.11, grad_norm=4.34, lr=6.08e-07, throughput=2816 tok/s +2025-11-21 05:09:20,049 - INFO - Epoch 1 Step 2440 (Global: 8940): loss=0.1553, ppl=1.17, grad_norm=3.47, lr=6.00e-07, throughput=2822 tok/s +2025-11-21 05:12:08,768 - INFO - Epoch 1 Step 2450 (Global: 8950): loss=0.1225, ppl=1.13, grad_norm=5.88, lr=5.92e-07, throughput=2845 tok/s +2025-11-21 05:14:59,724 - INFO - Epoch 1 Step 2460 (Global: 8960): loss=0.1237, ppl=1.13, grad_norm=4.91, lr=5.84e-07, throughput=2808 tok/s +2025-11-21 05:17:50,763 - INFO - Epoch 1 Step 2470 (Global: 8970): loss=0.1285, ppl=1.14, grad_norm=3.86, lr=5.76e-07, throughput=2806 tok/s +2025-11-21 05:20:41,278 - INFO - Epoch 1 Step 2480 (Global: 8980): loss=0.1315, ppl=1.14, grad_norm=3.83, lr=5.68e-07, throughput=2815 tok/s +2025-11-21 05:23:32,441 - INFO - Epoch 1 Step 2490 (Global: 8990): loss=0.1126, ppl=1.12, grad_norm=2.64, lr=5.61e-07, throughput=2804 tok/s +2025-11-21 05:26:23,237 - INFO - Epoch 1 Step 2500 (Global: 9000): loss=0.1098, ppl=1.12, grad_norm=3.22, lr=5.53e-07, throughput=2810 tok/s +2025-11-21 05:29:11,471 - INFO - Epoch 1 Step 2510 (Global: 9010): loss=0.1362, ppl=1.15, grad_norm=2.72, lr=5.45e-07, throughput=2853 tok/s +2025-11-21 05:32:00,031 - INFO - Epoch 1 Step 2520 (Global: 9020): loss=0.1437, ppl=1.15, grad_norm=2.75, lr=5.38e-07, throughput=2848 tok/s +2025-11-21 05:34:57,947 - INFO - Epoch 1 Step 2530 (Global: 9030): loss=0.1347, ppl=1.14, grad_norm=3.28, lr=5.30e-07, throughput=2698 tok/s +2025-11-21 05:37:47,307 - INFO - Epoch 1 Step 2540 (Global: 9040): loss=0.1337, ppl=1.14, grad_norm=4.16, lr=5.23e-07, throughput=2834 tok/s +2025-11-21 05:40:35,372 - INFO - Epoch 1 Step 2550 (Global: 9050): loss=0.1397, ppl=1.15, 
grad_norm=3.14, lr=5.15e-07, throughput=2856 tok/s +2025-11-21 05:43:23,570 - INFO - Epoch 1 Step 2560 (Global: 9060): loss=0.1487, ppl=1.16, grad_norm=4.09, lr=5.08e-07, throughput=2854 tok/s +2025-11-21 05:46:11,938 - INFO - Epoch 1 Step 2570 (Global: 9070): loss=0.1145, ppl=1.12, grad_norm=3.47, lr=5.01e-07, throughput=2851 tok/s +2025-11-21 05:49:00,624 - INFO - Epoch 1 Step 2580 (Global: 9080): loss=0.1026, ppl=1.11, grad_norm=2.58, lr=4.93e-07, throughput=2846 tok/s +2025-11-21 05:51:49,465 - INFO - Epoch 1 Step 2590 (Global: 9090): loss=0.1177, ppl=1.12, grad_norm=2.95, lr=4.86e-07, throughput=2843 tok/s +2025-11-21 05:54:38,545 - INFO - Epoch 1 Step 2600 (Global: 9100): loss=0.1352, ppl=1.14, grad_norm=3.02, lr=4.79e-07, throughput=2839 tok/s +2025-11-21 05:57:27,314 - INFO - Epoch 1 Step 2610 (Global: 9110): loss=0.1342, ppl=1.14, grad_norm=4.06, lr=4.72e-07, throughput=2844 tok/s +2025-11-21 06:00:16,508 - INFO - Epoch 1 Step 2620 (Global: 9120): loss=0.1003, ppl=1.11, grad_norm=2.48, lr=4.65e-07, throughput=2837 tok/s +2025-11-21 06:03:05,624 - INFO - Epoch 1 Step 2630 (Global: 9130): loss=0.1181, ppl=1.13, grad_norm=3.47, lr=4.58e-07, throughput=2838 tok/s +2025-11-21 06:06:04,519 - INFO - Epoch 1 Step 2640 (Global: 9140): loss=0.1293, ppl=1.14, grad_norm=3.06, lr=4.51e-07, throughput=2683 tok/s +2025-11-21 06:08:53,166 - INFO - Epoch 1 Step 2650 (Global: 9150): loss=0.1147, ppl=1.12, grad_norm=4.34, lr=4.44e-07, throughput=2846 tok/s +2025-11-21 06:11:41,440 - INFO - Epoch 1 Step 2660 (Global: 9160): loss=0.1619, ppl=1.18, grad_norm=6.16, lr=4.37e-07, throughput=2853 tok/s +2025-11-21 06:14:28,835 - INFO - Epoch 1 Step 2670 (Global: 9170): loss=0.1179, ppl=1.13, grad_norm=3.27, lr=4.30e-07, throughput=2868 tok/s +2025-11-21 06:17:17,462 - INFO - Epoch 1 Step 2680 (Global: 9180): loss=0.1319, ppl=1.14, grad_norm=2.25, lr=4.23e-07, throughput=2847 tok/s +2025-11-21 06:20:07,971 - INFO - Epoch 1 Step 2690 (Global: 9190): loss=0.1397, ppl=1.15, 
grad_norm=3.89, lr=4.17e-07, throughput=2815 tok/s +2025-11-21 06:22:58,312 - INFO - Epoch 1 Step 2700 (Global: 9200): loss=0.1544, ppl=1.17, grad_norm=4.81, lr=4.10e-07, throughput=2818 tok/s +2025-11-21 06:25:57,581 - INFO - Epoch 1 Step 2710 (Global: 9210): loss=0.1358, ppl=1.15, grad_norm=6.06, lr=4.03e-07, throughput=2678 tok/s +2025-11-21 06:28:44,725 - INFO - Epoch 1 Step 2720 (Global: 9220): loss=0.1393, ppl=1.15, grad_norm=2.95, lr=3.97e-07, throughput=2872 tok/s +2025-11-21 06:31:33,121 - INFO - Epoch 1 Step 2730 (Global: 9230): loss=0.1143, ppl=1.12, grad_norm=3.09, lr=3.90e-07, throughput=2850 tok/s +2025-11-21 06:34:22,408 - INFO - Epoch 1 Step 2740 (Global: 9240): loss=0.1381, ppl=1.15, grad_norm=2.66, lr=3.84e-07, throughput=2835 tok/s +2025-11-21 06:37:10,418 - INFO - Epoch 1 Step 2750 (Global: 9250): loss=0.1292, ppl=1.14, grad_norm=2.73, lr=3.77e-07, throughput=2857 tok/s +2025-11-21 06:40:02,940 - INFO - Epoch 1 Step 2760 (Global: 9260): loss=0.1361, ppl=1.15, grad_norm=3.08, lr=3.71e-07, throughput=2782 tok/s +2025-11-21 06:42:52,359 - INFO - Epoch 1 Step 2770 (Global: 9270): loss=0.1138, ppl=1.12, grad_norm=2.72, lr=3.65e-07, throughput=2833 tok/s +2025-11-21 06:45:42,017 - INFO - Epoch 1 Step 2780 (Global: 9280): loss=0.1543, ppl=1.17, grad_norm=3.41, lr=3.58e-07, throughput=2829 tok/s +2025-11-21 06:48:31,143 - INFO - Epoch 1 Step 2790 (Global: 9290): loss=0.1194, ppl=1.13, grad_norm=3.17, lr=3.52e-07, throughput=2838 tok/s +2025-11-21 06:51:20,189 - INFO - Epoch 1 Step 2800 (Global: 9300): loss=0.1324, ppl=1.14, grad_norm=4.16, lr=3.46e-07, throughput=2840 tok/s +2025-11-21 06:54:07,907 - INFO - Epoch 1 Step 2810 (Global: 9310): loss=0.1239, ppl=1.13, grad_norm=6.12, lr=3.40e-07, throughput=2862 tok/s +2025-11-21 06:57:07,456 - INFO - Epoch 1 Step 2820 (Global: 9320): loss=0.1320, ppl=1.14, grad_norm=2.66, lr=3.34e-07, throughput=2673 tok/s +2025-11-21 06:59:56,168 - INFO - Epoch 1 Step 2830 (Global: 9330): loss=0.1385, ppl=1.15, 
grad_norm=6.06, lr=3.28e-07, throughput=2845 tok/s +2025-11-21 07:02:45,851 - INFO - Epoch 1 Step 2840 (Global: 9340): loss=0.1344, ppl=1.14, grad_norm=2.86, lr=3.22e-07, throughput=2829 tok/s +2025-11-21 07:05:35,306 - INFO - Epoch 1 Step 2850 (Global: 9350): loss=0.1312, ppl=1.14, grad_norm=2.84, lr=3.16e-07, throughput=2833 tok/s +2025-11-21 07:08:24,386 - INFO - Epoch 1 Step 2860 (Global: 9360): loss=0.1240, ppl=1.13, grad_norm=3.62, lr=3.10e-07, throughput=2839 tok/s +2025-11-21 07:11:14,901 - INFO - Epoch 1 Step 2870 (Global: 9370): loss=0.1156, ppl=1.12, grad_norm=3.12, lr=3.05e-07, throughput=2815 tok/s +2025-11-21 07:14:05,250 - INFO - Epoch 1 Step 2880 (Global: 9380): loss=0.1352, ppl=1.14, grad_norm=3.47, lr=2.99e-07, throughput=2818 tok/s +2025-11-21 07:16:54,413 - INFO - Epoch 1 Step 2890 (Global: 9390): loss=0.1138, ppl=1.12, grad_norm=2.94, lr=2.93e-07, throughput=2838 tok/s +2025-11-21 07:19:44,944 - INFO - Epoch 1 Step 2900 (Global: 9400): loss=0.1351, ppl=1.14, grad_norm=3.67, lr=2.88e-07, throughput=2815 tok/s +2025-11-21 07:22:34,862 - INFO - Epoch 1 Step 2910 (Global: 9410): loss=0.1482, ppl=1.16, grad_norm=3.38, lr=2.82e-07, throughput=2825 tok/s +2025-11-21 07:25:26,505 - INFO - Epoch 1 Step 2920 (Global: 9420): loss=0.1198, ppl=1.13, grad_norm=2.75, lr=2.76e-07, throughput=2797 tok/s +2025-11-21 07:28:24,955 - INFO - Epoch 1 Step 2930 (Global: 9430): loss=0.1414, ppl=1.15, grad_norm=4.84, lr=2.71e-07, throughput=2690 tok/s +2025-11-21 07:31:13,890 - INFO - Epoch 1 Step 2940 (Global: 9440): loss=0.1178, ppl=1.12, grad_norm=2.84, lr=2.66e-07, throughput=2841 tok/s +2025-11-21 07:34:03,397 - INFO - Epoch 1 Step 2950 (Global: 9450): loss=0.1262, ppl=1.13, grad_norm=4.03, lr=2.60e-07, throughput=2832 tok/s +2025-11-21 07:36:51,295 - INFO - Epoch 1 Step 2960 (Global: 9460): loss=0.1549, ppl=1.17, grad_norm=3.27, lr=2.55e-07, throughput=2859 tok/s +2025-11-21 07:39:39,499 - INFO - Epoch 1 Step 2970 (Global: 9470): loss=0.1527, ppl=1.17, 
grad_norm=4.34, lr=2.50e-07, throughput=2854 tok/s +2025-11-21 07:42:28,162 - INFO - Epoch 1 Step 2980 (Global: 9480): loss=0.1468, ppl=1.16, grad_norm=3.80, lr=2.44e-07, throughput=2846 tok/s +2025-11-21 07:45:17,623 - INFO - Epoch 1 Step 2990 (Global: 9490): loss=0.1472, ppl=1.16, grad_norm=2.86, lr=2.39e-07, throughput=2833 tok/s +2025-11-21 07:48:06,350 - INFO - Epoch 1 Step 3000 (Global: 9500): loss=0.1258, ppl=1.13, grad_norm=3.17, lr=2.34e-07, throughput=2845 tok/s +2025-11-21 07:50:54,994 - INFO - Epoch 1 Step 3010 (Global: 9510): loss=0.1483, ppl=1.16, grad_norm=4.06, lr=2.29e-07, throughput=2846 tok/s +2025-11-21 07:53:44,056 - INFO - Epoch 1 Step 3020 (Global: 9520): loss=0.1539, ppl=1.17, grad_norm=4.56, lr=2.24e-07, throughput=2839 tok/s +2025-11-21 07:56:42,734 - INFO - Epoch 1 Step 3030 (Global: 9530): loss=0.1241, ppl=1.13, grad_norm=4.69, lr=2.19e-07, throughput=2686 tok/s +2025-11-21 07:59:31,830 - INFO - Epoch 1 Step 3040 (Global: 9540): loss=0.1309, ppl=1.14, grad_norm=4.44, lr=2.14e-07, throughput=2839 tok/s +2025-11-21 08:02:20,400 - INFO - Epoch 1 Step 3050 (Global: 9550): loss=0.1238, ppl=1.13, grad_norm=3.23, lr=2.10e-07, throughput=2848 tok/s +2025-11-21 08:05:09,985 - INFO - Epoch 1 Step 3060 (Global: 9560): loss=0.1363, ppl=1.15, grad_norm=4.00, lr=2.05e-07, throughput=2830 tok/s +2025-11-21 08:07:58,832 - INFO - Epoch 1 Step 3070 (Global: 9570): loss=0.1444, ppl=1.16, grad_norm=2.97, lr=2.00e-07, throughput=2843 tok/s +2025-11-21 08:10:48,848 - INFO - Epoch 1 Step 3080 (Global: 9580): loss=0.1355, ppl=1.15, grad_norm=4.16, lr=1.95e-07, throughput=2823 tok/s +2025-11-21 08:13:39,611 - INFO - Epoch 1 Step 3090 (Global: 9590): loss=0.1258, ppl=1.13, grad_norm=2.95, lr=1.91e-07, throughput=2811 tok/s +2025-11-21 08:16:28,719 - INFO - Epoch 1 Step 3100 (Global: 9600): loss=0.1377, ppl=1.15, grad_norm=3.67, lr=1.86e-07, throughput=2838 tok/s +2025-11-21 08:19:19,187 - INFO - Epoch 1 Step 3110 (Global: 9610): loss=0.1308, ppl=1.14, 
grad_norm=3.05, lr=1.82e-07, throughput=2816 tok/s +2025-11-21 08:22:20,273 - INFO - Epoch 1 Step 3120 (Global: 9620): loss=0.1928, ppl=1.21, grad_norm=5.19, lr=1.77e-07, throughput=2651 tok/s +2025-11-21 08:25:10,197 - INFO - Epoch 1 Step 3130 (Global: 9630): loss=0.1565, ppl=1.17, grad_norm=4.06, lr=1.73e-07, throughput=2825 tok/s +2025-11-21 08:27:59,217 - INFO - Epoch 1 Step 3140 (Global: 9640): loss=0.1374, ppl=1.15, grad_norm=5.59, lr=1.68e-07, throughput=2840 tok/s +2025-11-21 08:30:48,235 - INFO - Epoch 1 Step 3150 (Global: 9650): loss=0.1283, ppl=1.14, grad_norm=2.61, lr=1.64e-07, throughput=2840 tok/s +2025-11-21 08:33:36,771 - INFO - Epoch 1 Step 3160 (Global: 9660): loss=0.1388, ppl=1.15, grad_norm=4.06, lr=1.60e-07, throughput=2848 tok/s +2025-11-21 08:36:25,602 - INFO - Epoch 1 Step 3170 (Global: 9670): loss=0.1445, ppl=1.16, grad_norm=2.83, lr=1.56e-07, throughput=2843 tok/s +2025-11-21 08:39:13,752 - INFO - Epoch 1 Step 3180 (Global: 9680): loss=0.1064, ppl=1.11, grad_norm=5.19, lr=1.52e-07, throughput=2855 tok/s +2025-11-21 08:42:02,266 - INFO - Epoch 1 Step 3190 (Global: 9690): loss=0.1268, ppl=1.14, grad_norm=2.98, lr=1.48e-07, throughput=2848 tok/s +2025-11-21 08:44:50,547 - INFO - Epoch 1 Step 3200 (Global: 9700): loss=0.1532, ppl=1.17, grad_norm=3.59, lr=1.44e-07, throughput=2852 tok/s +2025-11-21 08:47:39,567 - INFO - Epoch 1 Step 3210 (Global: 9710): loss=0.1374, ppl=1.15, grad_norm=3.41, lr=1.40e-07, throughput=2840 tok/s +2025-11-21 08:50:27,297 - INFO - Epoch 1 Step 3220 (Global: 9720): loss=0.1393, ppl=1.15, grad_norm=5.41, lr=1.36e-07, throughput=2862 tok/s +2025-11-21 08:53:25,718 - INFO - Epoch 1 Step 3230 (Global: 9730): loss=0.1557, ppl=1.17, grad_norm=4.78, lr=1.32e-07, throughput=2690 tok/s +2025-11-21 08:56:15,630 - INFO - Epoch 1 Step 3240 (Global: 9740): loss=0.1332, ppl=1.14, grad_norm=2.50, lr=1.28e-07, throughput=2825 tok/s +2025-11-21 08:59:04,433 - INFO - Epoch 1 Step 3250 (Global: 9750): loss=0.1260, ppl=1.13, 
grad_norm=2.27, lr=1.24e-07, throughput=2844 tok/s +2025-11-21 09:01:53,305 - INFO - Epoch 1 Step 3260 (Global: 9760): loss=0.1333, ppl=1.14, grad_norm=3.39, lr=1.21e-07, throughput=2842 tok/s +2025-11-21 09:04:42,103 - INFO - Epoch 1 Step 3270 (Global: 9770): loss=0.1340, ppl=1.14, grad_norm=4.25, lr=1.17e-07, throughput=2844 tok/s +2025-11-21 09:07:31,259 - INFO - Epoch 1 Step 3280 (Global: 9780): loss=0.1307, ppl=1.14, grad_norm=3.55, lr=1.13e-07, throughput=2838 tok/s +2025-11-21 09:10:19,818 - INFO - Epoch 1 Step 3290 (Global: 9790): loss=0.1352, ppl=1.14, grad_norm=3.58, lr=1.10e-07, throughput=2848 tok/s +2025-11-21 09:13:09,995 - INFO - Epoch 1 Step 3300 (Global: 9800): loss=0.1356, ppl=1.15, grad_norm=4.47, lr=1.06e-07, throughput=2821 tok/s +2025-11-21 09:15:59,508 - INFO - Epoch 1 Step 3310 (Global: 9810): loss=0.1278, ppl=1.14, grad_norm=3.66, lr=1.03e-07, throughput=2832 tok/s +2025-11-21 09:18:50,119 - INFO - Epoch 1 Step 3320 (Global: 9820): loss=0.1639, ppl=1.18, grad_norm=4.91, lr=9.97e-08, throughput=2813 tok/s +2025-11-21 09:21:39,947 - INFO - Epoch 1 Step 3330 (Global: 9830): loss=0.1297, ppl=1.14, grad_norm=3.45, lr=9.64e-08, throughput=2826 tok/s +2025-11-21 09:24:30,161 - INFO - Epoch 1 Step 3340 (Global: 9840): loss=0.1276, ppl=1.14, grad_norm=2.81, lr=9.32e-08, throughput=2820 tok/s +2025-11-21 09:27:32,151 - INFO - Epoch 1 Step 3350 (Global: 9850): loss=0.1419, ppl=1.15, grad_norm=4.53, lr=9.00e-08, throughput=2638 tok/s +2025-11-21 09:30:23,798 - INFO - Epoch 1 Step 3360 (Global: 9860): loss=0.1150, ppl=1.12, grad_norm=2.64, lr=8.68e-08, throughput=2796 tok/s +2025-11-21 09:33:16,004 - INFO - Epoch 1 Step 3370 (Global: 9870): loss=0.1235, ppl=1.13, grad_norm=5.25, lr=8.37e-08, throughput=2787 tok/s +2025-11-21 09:36:06,705 - INFO - Epoch 1 Step 3380 (Global: 9880): loss=0.1371, ppl=1.15, grad_norm=4.03, lr=8.07e-08, throughput=2812 tok/s +2025-11-21 09:38:58,114 - INFO - Epoch 1 Step 3390 (Global: 9890): loss=0.1196, ppl=1.13, 
grad_norm=4.06, lr=7.77e-08, throughput=2800 tok/s +2025-11-21 09:41:49,165 - INFO - Epoch 1 Step 3400 (Global: 9900): loss=0.1222, ppl=1.13, grad_norm=2.78, lr=7.48e-08, throughput=2806 tok/s +2025-11-21 09:44:39,012 - INFO - Epoch 1 Step 3410 (Global: 9910): loss=0.1278, ppl=1.14, grad_norm=2.97, lr=7.20e-08, throughput=2826 tok/s +2025-11-21 09:47:29,085 - INFO - Epoch 1 Step 3420 (Global: 9920): loss=0.1225, ppl=1.13, grad_norm=2.80, lr=6.92e-08, throughput=2822 tok/s +2025-11-21 09:50:19,630 - INFO - Epoch 1 Step 3430 (Global: 9930): loss=0.1312, ppl=1.14, grad_norm=3.88, lr=6.64e-08, throughput=2815 tok/s +2025-11-21 09:53:07,915 - INFO - Epoch 1 Step 3440 (Global: 9940): loss=0.1254, ppl=1.13, grad_norm=5.16, lr=6.37e-08, throughput=2852 tok/s +2025-11-21 09:55:59,010 - INFO - Epoch 1 Step 3450 (Global: 9950): loss=0.1341, ppl=1.14, grad_norm=3.27, lr=6.11e-08, throughput=2805 tok/s +2025-11-21 09:58:49,680 - INFO - Epoch 1 Step 3460 (Global: 9960): loss=0.1429, ppl=1.15, grad_norm=4.88, lr=5.85e-08, throughput=2812 tok/s +2025-11-21 10:01:39,473 - INFO - Epoch 1 Step 3470 (Global: 9970): loss=0.1372, ppl=1.15, grad_norm=2.62, lr=5.60e-08, throughput=2827 tok/s +2025-11-21 10:04:41,204 - INFO - Epoch 1 Step 3480 (Global: 9980): loss=0.1329, ppl=1.14, grad_norm=2.81, lr=5.35e-08, throughput=2641 tok/s +2025-11-21 10:07:31,304 - INFO - Epoch 1 Step 3490 (Global: 9990): loss=0.1287, ppl=1.14, grad_norm=2.83, lr=5.11e-08, throughput=2822 tok/s +2025-11-21 10:10:21,860 - INFO - Epoch 1 Step 3500 (Global: 10000): loss=0.1457, ppl=1.16, grad_norm=5.12, lr=4.87e-08, throughput=2814 tok/s +2025-11-21 10:10:21,860 - INFO - +Running validation at step 10000... 
+2025-11-21 10:20:17,734 - INFO - Validation loss: 0.1336, perplexity: 1.14 +2025-11-21 10:20:17,735 - INFO - Qualitative metrics (n=5): +2025-11-21 10:20:17,735 - INFO - BLEU: 0.8406 +2025-11-21 10:20:17,735 - INFO - METEOR: 0.9138 +2025-11-21 10:20:17,735 - INFO - Edit Distance: 0.1422 +2025-11-21 10:20:17,735 - INFO - F-measure: 0.9229 +2025-11-21 10:20:17,735 - INFO - +====================================================================== +2025-11-21 10:20:17,735 - INFO - Qualitative Evaluation Samples: +2025-11-21 10:20:17,736 - INFO - ====================================================================== +2025-11-21 10:20:17,736 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-21 10:20:17,736 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." +2025-11-21 10:20:17,736 - INFO - Generated: ' Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s a yes-or-wubert. But i...' +2025-11-21 10:20:17,736 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-21 10:20:17,736 - INFO - ---------------------------------------------------------------------- +2025-11-21 10:20:17,736 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-21 10:20:17,737 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-21 10:20:17,737 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-21 10:20:17,737 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-21 10:20:17,737 - INFO - ---------------------------------------------------------------------- +2025-11-21 10:20:17,737 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-21 10:20:17,737 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-21 10:20:17,737 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-21 10:20:17,737 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-21 10:20:17,737 - INFO - ---------------------------------------------------------------------- +2025-11-21 10:20:17,738 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-21 10:20:17,738 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-21 10:20:17,738 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-21 10:20:17,738 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-21 10:20:17,738 - INFO - ---------------------------------------------------------------------- +2025-11-21 10:20:17,738 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-21 10:20:17,738 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." 
+2025-11-21 10:20:17,738 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stygian Abyss ...' +2025-11-21 10:20:17,739 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-21 10:20:17,739 - INFO - ---------------------------------------------------------------------- +2025-11-21 10:20:17,741 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_10000.jsonl +2025-11-21 10:21:11,535 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/best_checkpoint.pt +2025-11-21 10:21:11,556 - INFO - New best validation loss: 0.1336, perplexity: 1.14 +2025-11-21 10:24:04,441 - INFO - Epoch 1 Step 3510 (Global: 10010): loss=0.1391, ppl=1.15, grad_norm=3.14, lr=4.64e-08, throughput=2777 tok/s +2025-11-21 10:26:57,134 - INFO - Epoch 1 Step 3520 (Global: 10020): loss=0.1323, ppl=1.14, grad_norm=5.00, lr=4.42e-08, throughput=2780 tok/s +2025-11-21 10:29:50,554 - INFO - Epoch 1 Step 3530 (Global: 10030): loss=0.1379, ppl=1.15, grad_norm=3.50, lr=4.20e-08, throughput=2768 tok/s +2025-11-21 10:32:44,477 - INFO - Epoch 1 Step 3540 (Global: 10040): loss=0.1440, ppl=1.15, grad_norm=5.09, lr=3.98e-08, throughput=2760 tok/s +2025-11-21 10:35:35,523 - INFO - Epoch 1 Step 3550 (Global: 10050): loss=0.1383, ppl=1.15, grad_norm=3.70, lr=3.78e-08, throughput=2806 tok/s +2025-11-21 10:38:34,117 - INFO - Epoch 1 Step 3560 (Global: 10060): loss=0.1341, ppl=1.14, grad_norm=2.69, lr=3.57e-08, throughput=2688 tok/s +2025-11-21 10:41:21,935 - INFO - Epoch 1 Step 3570 (Global: 10070): loss=0.1344, ppl=1.14, grad_norm=3.72, lr=3.38e-08, throughput=2860 tok/s +2025-11-21 10:44:10,208 - INFO - Epoch 1 Step 3580 (Global: 10080): loss=0.1333, ppl=1.14, grad_norm=2.95, lr=3.18e-08, throughput=2853 tok/s +2025-11-21 10:46:58,895 - INFO - Epoch 1 Step 3590 (Global: 
10090): loss=0.1407, ppl=1.15, grad_norm=3.31, lr=3.00e-08, throughput=2846 tok/s +2025-11-21 10:49:47,291 - INFO - Epoch 1 Step 3600 (Global: 10100): loss=0.1303, ppl=1.14, grad_norm=2.98, lr=2.82e-08, throughput=2850 tok/s +2025-11-21 10:52:35,344 - INFO - Epoch 1 Step 3610 (Global: 10110): loss=0.1440, ppl=1.15, grad_norm=3.50, lr=2.64e-08, throughput=2856 tok/s +2025-11-21 10:55:22,799 - INFO - Epoch 1 Step 3620 (Global: 10120): loss=0.1195, ppl=1.13, grad_norm=2.98, lr=2.47e-08, throughput=2866 tok/s +2025-11-21 10:58:11,275 - INFO - Epoch 1 Step 3630 (Global: 10130): loss=0.1362, ppl=1.15, grad_norm=3.23, lr=2.31e-08, throughput=2849 tok/s +2025-11-21 11:00:59,535 - INFO - Epoch 1 Step 3640 (Global: 10140): loss=0.1227, ppl=1.13, grad_norm=3.44, lr=2.15e-08, throughput=2853 tok/s +2025-11-21 11:03:48,676 - INFO - Epoch 1 Step 3650 (Global: 10150): loss=0.1464, ppl=1.16, grad_norm=3.31, lr=2.00e-08, throughput=2838 tok/s +2025-11-21 11:06:46,601 - INFO - Epoch 1 Step 3660 (Global: 10160): loss=0.1424, ppl=1.15, grad_norm=3.44, lr=1.85e-08, throughput=2698 tok/s +2025-11-21 11:09:35,134 - INFO - Epoch 1 Step 3670 (Global: 10170): loss=0.1362, ppl=1.15, grad_norm=3.67, lr=1.71e-08, throughput=2848 tok/s +2025-11-21 11:12:24,251 - INFO - Epoch 1 Step 3680 (Global: 10180): loss=0.1291, ppl=1.14, grad_norm=3.89, lr=1.58e-08, throughput=2838 tok/s +2025-11-21 11:15:13,102 - INFO - Epoch 1 Step 3690 (Global: 10190): loss=0.1591, ppl=1.17, grad_norm=3.33, lr=1.45e-08, throughput=2843 tok/s +2025-11-21 11:18:02,812 - INFO - Epoch 1 Step 3700 (Global: 10200): loss=0.1411, ppl=1.15, grad_norm=4.53, lr=1.32e-08, throughput=2828 tok/s +2025-11-21 11:20:51,897 - INFO - Epoch 1 Step 3710 (Global: 10210): loss=0.1665, ppl=1.18, grad_norm=3.03, lr=1.20e-08, throughput=2839 tok/s +2025-11-21 11:23:40,991 - INFO - Epoch 1 Step 3720 (Global: 10220): loss=0.1376, ppl=1.15, grad_norm=2.80, lr=1.09e-08, throughput=2839 tok/s +2025-11-21 11:26:29,567 - INFO - Epoch 1 Step 3730 
(Global: 10230): loss=0.1268, ppl=1.14, grad_norm=2.95, lr=9.81e-09, throughput=2847 tok/s +2025-11-21 11:29:18,203 - INFO - Epoch 1 Step 3740 (Global: 10240): loss=0.1347, ppl=1.14, grad_norm=3.02, lr=8.79e-09, throughput=2846 tok/s +2025-11-21 11:32:06,369 - INFO - Epoch 1 Step 3750 (Global: 10250): loss=0.1260, ppl=1.13, grad_norm=3.59, lr=7.83e-09, throughput=2854 tok/s +2025-11-21 11:34:54,377 - INFO - Epoch 1 Step 3760 (Global: 10260): loss=0.1352, ppl=1.14, grad_norm=3.48, lr=6.92e-09, throughput=2857 tok/s +2025-11-21 11:37:43,375 - INFO - Epoch 1 Step 3770 (Global: 10270): loss=0.1582, ppl=1.17, grad_norm=5.31, lr=6.06e-09, throughput=2840 tok/s +2025-11-21 11:40:31,534 - INFO - Epoch 1 Step 3780 (Global: 10280): loss=0.1668, ppl=1.18, grad_norm=4.16, lr=5.27e-09, throughput=2854 tok/s +2025-11-21 11:43:30,287 - INFO - Epoch 1 Step 3790 (Global: 10290): loss=0.1366, ppl=1.15, grad_norm=4.75, lr=4.53e-09, throughput=2685 tok/s +2025-11-21 11:46:19,326 - INFO - Epoch 1 Step 3800 (Global: 10300): loss=0.1478, ppl=1.16, grad_norm=4.75, lr=3.84e-09, throughput=2840 tok/s +2025-11-21 11:49:07,885 - INFO - Epoch 1 Step 3810 (Global: 10310): loss=0.1308, ppl=1.14, grad_norm=2.69, lr=3.21e-09, throughput=2848 tok/s +2025-11-21 11:51:55,579 - INFO - Epoch 1 Step 3820 (Global: 10320): loss=0.1296, ppl=1.14, grad_norm=2.50, lr=2.64e-09, throughput=2862 tok/s +2025-11-21 11:54:44,478 - INFO - Epoch 1 Step 3830 (Global: 10330): loss=0.1508, ppl=1.16, grad_norm=3.83, lr=2.12e-09, throughput=2842 tok/s +2025-11-21 11:57:32,826 - INFO - Epoch 1 Step 3840 (Global: 10340): loss=0.1372, ppl=1.15, grad_norm=3.47, lr=1.66e-09, throughput=2851 tok/s +2025-11-21 12:00:20,664 - INFO - Epoch 1 Step 3850 (Global: 10350): loss=0.1360, ppl=1.15, grad_norm=2.98, lr=1.26e-09, throughput=2860 tok/s +2025-11-21 12:03:08,562 - INFO - Epoch 1 Step 3860 (Global: 10360): loss=0.1216, ppl=1.13, grad_norm=2.97, lr=9.12e-10, throughput=2859 tok/s +2025-11-21 12:05:56,370 - INFO - Epoch 1 Step 
3870 (Global: 10370): loss=0.1482, ppl=1.16, grad_norm=6.97, lr=6.20e-10, throughput=2860 tok/s +2025-11-21 12:08:54,750 - INFO - Epoch 1 Step 3880 (Global: 10380): loss=0.1275, ppl=1.14, grad_norm=2.95, lr=3.84e-10, throughput=2691 tok/s +2025-11-21 12:11:44,598 - INFO - Epoch 1 Step 3890 (Global: 10390): loss=0.1100, ppl=1.12, grad_norm=2.70, lr=2.05e-10, throughput=2826 tok/s +2025-11-21 12:14:34,988 - INFO - Epoch 1 Step 3900 (Global: 10400): loss=0.1316, ppl=1.14, grad_norm=3.11, lr=8.11e-11, throughput=2817 tok/s +2025-11-21 12:17:25,567 - INFO - Epoch 1 Step 3910 (Global: 10410): loss=0.1391, ppl=1.15, grad_norm=3.70, lr=1.38e-11, throughput=2814 tok/s +2025-11-21 12:19:12,429 - INFO - Flushing 8 remainder batches from gradient accumulation +2025-11-21 12:19:12,431 - INFO - Rescaling gradients by 1.50x (compensating for 8/12 batches) +2025-11-21 12:19:12,648 - INFO - Remainder batch: loss=0.1286, ppl=1.14, grad_norm=3.78 +2025-11-21 12:19:12,659 - INFO - Epoch 1 training: loss=0.1342, ppl=1.14, grad_norm=3.70, throughput=2738 tok/s (68655.8s total) +2025-11-21 12:19:12,660 - INFO - +Running final validation... +2025-11-21 12:29:15,838 - INFO - Validation loss: 0.1336, perplexity: 1.14 +2025-11-21 12:29:15,839 - INFO - Qualitative metrics (n=5): +2025-11-21 12:29:15,840 - INFO - BLEU: 0.8408 +2025-11-21 12:29:15,840 - INFO - METEOR: 0.9152 +2025-11-21 12:29:15,840 - INFO - Edit Distance: 0.1440 +2025-11-21 12:29:15,840 - INFO - F-measure: 0.9169 +2025-11-21 12:29:15,840 - INFO - +====================================================================== +2025-11-21 12:29:15,841 - INFO - Qualitative Evaluation Samples: +2025-11-21 12:29:15,841 - INFO - ====================================================================== +2025-11-21 12:29:15,841 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-21 12:29:15,841 - INFO - Context: [Image: sample_141920_chunk_1] + " +Free OCR." 
+2025-11-21 12:29:15,841 - INFO - Generated: ' Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s a yes-or-wubert. But i...' +2025-11-21 12:29:15,842 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-21 12:29:15,842 - INFO - ---------------------------------------------------------------------- +2025-11-21 12:29:15,842 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-21 12:29:15,842 - INFO - Context: [Image: sample_170543_chunk_2] + " +Free OCR." +2025-11-21 12:29:15,843 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-21 12:29:15,843 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-21 12:29:15,843 - INFO - ---------------------------------------------------------------------- +2025-11-21 12:29:15,843 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-21 12:29:15,843 - INFO - Context: [Image: sample_107152_chunk_9] + " +Free OCR." +2025-11-21 12:29:15,844 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-21 12:29:15,844 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. 
Oga falls for the trick, but Beel stops the ax and b...' +2025-11-21 12:29:15,844 - INFO - ---------------------------------------------------------------------- +2025-11-21 12:29:15,844 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-21 12:29:15,844 - INFO - Context: [Image: sample_069148_chunk_0] + " +Free OCR." +2025-11-21 12:29:15,844 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-21 12:29:15,845 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-21 12:29:15,845 - INFO - ---------------------------------------------------------------------- +2025-11-21 12:29:15,845 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-21 12:29:15,845 - INFO - Context: [Image: sample_103176_chunk_4] + " +Free OCR." +2025-11-21 12:29:15,845 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores | [ 132 ] |\n| Ultima Underworld: The Stygian Abyss ...' +2025-11-21 12:29:15,846 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-21 12:29:15,846 - INFO - ---------------------------------------------------------------------- +2025-11-21 12:29:15,847 - INFO - +Qualitative samples saved to: outputs/production_vision_tiny_reconstruction_20251118_214704/qualitative_step_10417.jsonl +2025-11-21 12:29:16,514 - INFO - +Training complete! 
+2025-11-21 12:30:05,937 - INFO - Saved checkpoint to outputs/production_vision_tiny_reconstruction_20251118_214704/final_checkpoint.pt +2025-11-21 12:30:05,947 - INFO - Final checkpoint saved to outputs/production_vision_tiny_reconstruction_20251118_214704/final_checkpoint.pt +2025-11-21 12:30:05,948 - INFO - Best validation loss: 0.1336, perplexity: 1.14 +2025-11-21 12:30:05,948 - INFO - Checkpoints saved to outputs/production_vision_tiny_reconstruction_20251118_214704 +2025-11-21 12:30:06,577 - INFO - W&B run finished