2025-11-29 17:16:02,813 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train_arrow', output_dir='outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554', objective='lm', val_data_path='data/training/splits_510k/val_arrow', max_samples=None, vision_mode='base', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='\nFree OCR.', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=True, compression_target=None, conv_kernel=5, timestamp=None, batch_size=12, gradient_accumulation_steps=4, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=True, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint=None, resume=None, init_from_checkpoint='./outputs/production_vision_base_reconstruction_20251120_220510/best_checkpoint.pt', allow_objective_switch=True, aux_loss_weight=0.5, num_workers=8, prefetch_factor=2, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=False, compile_mode='default', use_optimized_model=True, use_encoder_checkpointing=True, use_decoder_checkpointing=True, use_8bit_optimizer=True)
2025-11-29 17:16:02,813 - WARNING - --train_projection is deprecated. Use --train_encoder instead. Automatically setting --train_encoder=True.
2025-11-29 17:16:02,813 - INFO - Will initialize model from checkpoint: ./outputs/production_vision_base_reconstruction_20251120_220510/best_checkpoint.pt
2025-11-29 17:16:02,814 - INFO - Using custom vision prompt: '\nFree OCR.'
2025-11-29 17:16:02,814 - INFO - Setting random seed: 42
2025-11-29 17:16:03,893 - INFO - Auto-generated W&B run name: production_vision_base_lm_20251129_171603
2025-11-29 17:16:05,563 - INFO - Initialized W&B run: vision-compression-2/production_vision_base_lm_20251129_171603 (ID: m8g7fh9k)
2025-11-29 17:16:05,564 - INFO - Loading model and tokenizer...
2025-11-29 17:16:16,942 - INFO - Enabling decoder gradient checkpointing...
2025-11-29 17:16:16,951 - INFO - ✓ Decoder checkpointing enabled for 12 transformer layers
2025-11-29 17:16:16,951 - INFO - Expected: ~30-50% activation memory reduction, ~15-20% compute overhead
2025-11-29 17:16:16,983 - INFO - Created Vision Compression trainer (mode: base)
2025-11-29 17:16:16,984 - INFO - Training objective: lm
2025-11-29 17:16:16,984 - INFO -
| ================================================================================ | |
| 2025-11-29 17:16:16,984 - INFO - TWO-STAGE TRAINING: Loading Stage 1 checkpoint | |
| 2025-11-29 17:16:16,984 - INFO - ================================================================================ | |
| 2025-11-29 17:16:16,984 - INFO - Peeking checkpoint metadata from outputs/production_vision_base_reconstruction_20251120_220510/best_checkpoint.pt | |
| 2025-11-29 17:16:30,184 - WARNING - Checkpoint best_checkpoint.pt has no format_version field. Assuming compatibility with current version 2.0. This checkpoint was created before versioning was added. | |
| 2025-11-29 17:16:30,185 - INFO - Checkpoint metadata: epoch=0, batch_idx=239999, global_step=10000 | |
| 2025-11-29 17:16:30,185 - INFO - W&B run ID: 1jsg7rd3 | |
| 2025-11-29 17:16:30,368 - INFO - β Objective switch: reconstruction β lm (two-stage training) | |
| 2025-11-29 17:16:30,371 - INFO - Loading model weights for two-stage training from outputs/production_vision_base_reconstruction_20251120_220510/best_checkpoint.pt | |
| 2025-11-29 17:16:43,955 - WARNING - Checkpoint best_checkpoint.pt has no format_version field. Assuming compatibility with current version 2.0. This checkpoint was created before versioning was added. | |
| 2025-11-29 17:16:43,978 - INFO - torch.compile mismatch: checkpoint=compiled, model=uncompiled. Normalizing keys by removing _orig_mod. prefix. | |
| 2025-11-29 17:16:44,074 - INFO - β Skipping optimizer/scheduler/RNG states (two-stage training) | |
| 2025-11-29 17:16:44,129 - INFO - | |
Stage 1 → Stage 2 Transition:
2025-11-29 17:16:44,132 - INFO - Stage 1 checkpoint: ./outputs/production_vision_base_reconstruction_20251120_220510/best_checkpoint.pt
2025-11-29 17:16:44,132 - INFO - Stage 1 regime: vision
2025-11-29 17:16:44,132 - INFO - Stage 1 objective: reconstruction
2025-11-29 17:16:44,132 - INFO - Stage 1 epoch: 0
2025-11-29 17:16:44,133 - INFO - Stage 1 best_val_loss: 0.03180741931886878
2025-11-29 17:16:44,133 - INFO - Stage 1 W&B run: 1jsg7rd3
2025-11-29 17:16:44,133 - INFO -
Stage 2 regime: vision ✓ MATCH
2025-11-29 17:16:44,133 - INFO - Stage 2 objective: lm (CHANGED from reconstruction)
2025-11-29 17:16:44,133 - INFO -
✓ Successfully loaded model weights from Stage 1
2025-11-29 17:16:44,134 - INFO - ✓ Fresh optimizer will be created for Stage 2
2025-11-29 17:16:44,134 - INFO - ✓ New W&B run will track Stage 2
2025-11-29 17:16:44,134 - INFO - ================================================================================
2025-11-29 17:16:44,172 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640
2025-11-29 17:16:44,172 - INFO - Logged Stage 1 metadata to W&B config for tracking
2025-11-29 17:16:44,172 - INFO - Loading training data from data/training/splits_510k/train_arrow
2025-11-29 17:16:44,173 - INFO - Detected Arrow format: data/training/splits_510k/train_arrow
2025-11-29 17:16:44,173 - INFO - Loading Arrow dataset from data/training/splits_510k/train_arrow (memory-mapped)
2025-11-29 17:16:44,408 - INFO - Loaded 500,000 samples from data/training/splits_510k/train_arrow (memory-mapped)
2025-11-29 17:16:44,409 - INFO - Vision mode: base (273 tokens, 1024x1024)
2025-11-29 17:16:44,431 - INFO - Loading validation data from data/training/splits_510k/val_arrow
2025-11-29 17:16:44,432 - INFO - Detected Arrow format: data/training/splits_510k/val_arrow
2025-11-29 17:16:44,432 - INFO - Loading Arrow dataset from data/training/splits_510k/val_arrow (memory-mapped)
2025-11-29 17:16:44,443 - INFO - Loaded 10,000 samples from data/training/splits_510k/val_arrow (memory-mapped)
2025-11-29 17:16:44,443 - INFO - Vision mode: base (273 tokens, 1024x1024)
2025-11-29 17:16:46,279 - INFO - Created 8-bit AdamW optimizer (bitsandbytes) with differential LR:
  Encoder: 474 param tensors @ lr=1e-05
  Decoder: 2236 param tensors @ lr=0.0001
  Memory savings: ~75% optimizer state (16.8GB for 2.8B params)
  Expected overhead: ~2-5%
2025-11-29 17:16:46,279 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417
2025-11-29 17:16:46,289 - INFO - Logged optimizer config to W&B: type=adamw_8bit, memory=6.21GB
2025-11-29 17:16:46,289 - INFO - Starting training loop...
2025-11-29 17:16:46,289 - INFO -
======================================================================
2025-11-29 17:16:46,289 - INFO - Running initial validation (before any training)...
2025-11-29 17:16:46,289 - INFO - ======================================================================
2025-11-29 17:28:20,765 - WARNING - NLTK wordnet data missing - METEOR score unavailable. Run: python -m nltk.downloader wordnet omw-1.4
2025-11-29 17:28:20,786 - INFO - Validation loss: 3.1288, perplexity: 22.85
2025-11-29 17:28:20,786 - INFO -
======================================================================
2025-11-29 17:28:20,787 - INFO - Qualitative Evaluation Samples:
2025-11-29 17:28:20,787 - INFO - ======================================================================
2025-11-29 17:28:20,787 - INFO -
Sample 1 (ID: sample_141920_chunk_1):
2025-11-29 17:28:20,787 - INFO - Context: [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-29 17:28:20,788 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-29 17:28:20,788 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-29 17:28:20,788 - INFO - ----------------------------------------------------------------------
2025-11-29 17:28:20,788 - INFO -
Sample 2 (ID: sample_170543_chunk_2):
2025-11-29 17:28:20,789 - INFO - Context: [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-29 17:28:20,789 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-29 17:28:20,789 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-29 17:28:20,789 - INFO - ----------------------------------------------------------------------
2025-11-29 17:28:20,789 - INFO -
Sample 3 (ID: sample_107152_chunk_9):
2025-11-29 17:28:20,789 - INFO - Context: [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-29 17:28:20,790 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-29 17:28:20,790 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-29 17:28:20,790 - INFO - ----------------------------------------------------------------------
2025-11-29 17:28:20,790 - INFO -
Sample 4 (ID: sample_069148_chunk_0):
2025-11-29 17:28:20,791 - INFO - Context: [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-29 17:28:20,791 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01...'
2025-11-29 17:28:20,791 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...'
2025-11-29 17:28:20,791 - INFO - ----------------------------------------------------------------------
2025-11-29 17:28:20,792 - INFO -
Sample 5 (ID: sample_103176_chunk_4):
2025-11-29 17:28:20,792 - INFO - Context: [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-29 17:28:20,792 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-29 17:28:20,792 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...'
2025-11-29 17:28:20,793 - INFO - ----------------------------------------------------------------------
2025-11-29 17:28:20,794 - INFO -
Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/qualitative_step_0.jsonl
2025-11-29 17:28:21,616 - INFO - Initial validation - Loss: 3.1288, Perplexity: 22.85
2025-11-29 17:28:21,617 - INFO - ======================================================================
2025-11-29 17:28:25,109 - INFO - Cleared GPU memory cache after initial validation
2025-11-29 17:28:25,111 - INFO -
======================================================================
2025-11-29 17:28:25,111 - INFO - Epoch 1/1
2025-11-29 17:28:25,112 - INFO - ======================================================================
2025-11-29 17:28:31,542 - WARNING - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
2025-11-29 17:28:33,442 - INFO - Effective context tokens (per-sample): 278 | Compression ratio: 3.60x
2025-11-29 17:28:33,442 - INFO - Target tokens per sample: 1000
2025-11-29 17:32:03,145 - INFO - Epoch 1 Step 10 (Global: 10): loss=2.1274, ppl=8.39, grad_norm=4.28, lr=1.09e-06, throughput=2202 tok/s
2025-11-29 17:35:19,689 - INFO - Epoch 1 Step 20 (Global: 20): loss=2.1621, ppl=8.69, grad_norm=2.50, lr=1.17e-06, throughput=2442 tok/s
2025-11-29 17:38:36,106 - INFO - Epoch 1 Step 30 (Global: 30): loss=2.2290, ppl=9.29, grad_norm=1.88, lr=1.26e-06, throughput=2444 tok/s
2025-11-29 17:41:49,451 - INFO - Epoch 1 Step 40 (Global: 40): loss=2.1152, ppl=8.29, grad_norm=1.58, lr=1.35e-06, throughput=2483 tok/s
2025-11-29 17:45:05,099 - INFO - Epoch 1 Step 50 (Global: 50): loss=2.0094, ppl=7.46, grad_norm=1.31, lr=1.43e-06, throughput=2453 tok/s
2025-11-29 17:48:21,988 - INFO - Epoch 1 Step 60 (Global: 60): loss=2.1351, ppl=8.46, grad_norm=1.66, lr=1.52e-06, throughput=2438 tok/s
2025-11-29 17:51:35,495 - INFO - Epoch 1 Step 70 (Global: 70): loss=1.8202, ppl=6.17, grad_norm=1.53, lr=1.61e-06, throughput=2481 tok/s
2025-11-29 17:54:49,728 - INFO - Epoch 1 Step 80 (Global: 80): loss=2.0637, ppl=7.87, grad_norm=1.28, lr=1.69e-06, throughput=2471 tok/s
2025-11-29 17:58:04,895 - INFO - Epoch 1 Step 90 (Global: 90): loss=2.0585, ppl=7.83, grad_norm=1.36, lr=1.78e-06, throughput=2459 tok/s
2025-11-29 18:01:18,818 - INFO - Epoch 1 Step 100 (Global: 100): loss=1.9769, ppl=7.22, grad_norm=1.55, lr=1.86e-06, throughput=2475 tok/s
2025-11-29 18:04:33,974 - INFO - Epoch 1 Step 110 (Global: 110): loss=1.7442, ppl=5.72, grad_norm=1.52, lr=1.95e-06, throughput=2460 tok/s
2025-11-29 18:07:50,669 - INFO - Epoch 1 Step 120 (Global: 120): loss=2.0509, ppl=7.77, grad_norm=1.41, lr=2.04e-06, throughput=2440 tok/s
2025-11-29 18:11:06,472 - INFO - Epoch 1 Step 130 (Global: 130): loss=2.0371, ppl=7.67, grad_norm=1.34, lr=2.12e-06, throughput=2451 tok/s
2025-11-29 18:14:22,123 - INFO - Epoch 1 Step 140 (Global: 140): loss=1.9153, ppl=6.79, grad_norm=1.33, lr=2.21e-06, throughput=2454 tok/s
2025-11-29 18:17:37,740 - INFO - Epoch 1 Step 150 (Global: 150): loss=1.9368, ppl=6.94, grad_norm=1.39, lr=2.30e-06, throughput=2454 tok/s
2025-11-29 18:20:53,718 - INFO - Epoch 1 Step 160 (Global: 160): loss=1.8910, ppl=6.63, grad_norm=1.56, lr=2.38e-06, throughput=2449 tok/s
2025-11-29 18:24:09,301 - INFO - Epoch 1 Step 170 (Global: 170): loss=1.7459, ppl=5.73, grad_norm=1.34, lr=2.47e-06, throughput=2454 tok/s
2025-11-29 18:27:24,667 - INFO - Epoch 1 Step 180 (Global: 180): loss=2.1782, ppl=8.83, grad_norm=1.45, lr=2.56e-06, throughput=2457 tok/s
2025-11-29 18:30:40,193 - INFO - Epoch 1 Step 190 (Global: 190): loss=2.0716, ppl=7.94, grad_norm=1.69, lr=2.64e-06, throughput=2455 tok/s
2025-11-29 18:33:56,601 - INFO - Epoch 1 Step 200 (Global: 200): loss=1.9874, ppl=7.30, grad_norm=1.84, lr=2.73e-06, throughput=2444 tok/s
2025-11-29 18:37:11,208 - INFO - Epoch 1 Step 210 (Global: 210): loss=1.9282, ppl=6.88, grad_norm=1.23, lr=2.82e-06, throughput=2467 tok/s
2025-11-29 18:40:26,396 - INFO - Epoch 1 Step 220 (Global: 220): loss=1.8215, ppl=6.18, grad_norm=1.49, lr=2.90e-06, throughput=2459 tok/s
2025-11-29 18:43:42,953 - INFO - Epoch 1 Step 230 (Global: 230): loss=1.8632, ppl=6.44, grad_norm=1.57, lr=2.99e-06, throughput=2442 tok/s
2025-11-29 18:46:58,794 - INFO - Epoch 1 Step 240 (Global: 240): loss=1.7691, ppl=5.87, grad_norm=1.43, lr=3.07e-06, throughput=2451 tok/s
2025-11-29 18:50:16,651 - INFO - Epoch 1 Step 250 (Global: 250): loss=1.6357, ppl=5.13, grad_norm=1.59, lr=3.16e-06, throughput=2426 tok/s
2025-11-29 18:53:37,352 - INFO - Epoch 1 Step 260 (Global: 260): loss=1.7422, ppl=5.71, grad_norm=1.55, lr=3.25e-06, throughput=2392 tok/s
2025-11-29 18:56:52,501 - INFO - Epoch 1 Step 270 (Global: 270): loss=1.6829, ppl=5.38, grad_norm=1.48, lr=3.33e-06, throughput=2460 tok/s
2025-11-29 19:00:07,919 - INFO - Epoch 1 Step 280 (Global: 280): loss=1.8002, ppl=6.05, grad_norm=1.43, lr=3.42e-06, throughput=2456 tok/s
2025-11-29 19:03:24,530 - INFO - Epoch 1 Step 290 (Global: 290): loss=1.8497, ppl=6.36, grad_norm=1.38, lr=3.51e-06, throughput=2441 tok/s
2025-11-29 19:06:40,739 - INFO - Epoch 1 Step 300 (Global: 300): loss=1.8971, ppl=6.67, grad_norm=1.57, lr=3.59e-06, throughput=2446 tok/s
2025-11-29 19:09:56,630 - INFO - Epoch 1 Step 310 (Global: 310): loss=1.8019, ppl=6.06, grad_norm=2.02, lr=3.68e-06, throughput=2450 tok/s
2025-11-29 19:13:11,772 - INFO - Epoch 1 Step 320 (Global: 320): loss=1.7653, ppl=5.84, grad_norm=1.38, lr=3.77e-06, throughput=2460 tok/s
2025-11-29 19:16:26,584 - INFO - Epoch 1 Step 330 (Global: 330): loss=1.9078, ppl=6.74, grad_norm=1.65, lr=3.85e-06, throughput=2464 tok/s
2025-11-29 19:19:41,570 - INFO - Epoch 1 Step 340 (Global: 340): loss=1.8745, ppl=6.52, grad_norm=1.54, lr=3.94e-06, throughput=2462 tok/s
2025-11-29 19:22:56,655 - INFO - Epoch 1 Step 350 (Global: 350): loss=1.8377, ppl=6.28, grad_norm=1.70, lr=4.03e-06, throughput=2460 tok/s
2025-11-29 19:26:12,797 - INFO - Epoch 1 Step 360 (Global: 360): loss=1.7660, ppl=5.85, grad_norm=1.95, lr=4.11e-06, throughput=2447 tok/s
2025-11-29 19:29:34,126 - INFO - Epoch 1 Step 370 (Global: 370): loss=1.9695, ppl=7.17, grad_norm=1.48, lr=4.20e-06, throughput=2384 tok/s
2025-11-29 19:32:54,351 - INFO - Epoch 1 Step 380 (Global: 380): loss=1.9194, ppl=6.82, grad_norm=1.68, lr=4.29e-06, throughput=2397 tok/s
2025-11-29 19:36:14,003 - INFO - Epoch 1 Step 390 (Global: 390): loss=1.8333, ppl=6.25, grad_norm=1.49, lr=4.37e-06, throughput=2404 tok/s
2025-11-29 19:39:37,183 - INFO - Epoch 1 Step 400 (Global: 400): loss=1.8403, ppl=6.30, grad_norm=1.64, lr=4.46e-06, throughput=2362 tok/s
2025-11-29 19:42:57,050 - INFO - Epoch 1 Step 410 (Global: 410): loss=1.8325, ppl=6.25, grad_norm=1.84, lr=4.54e-06, throughput=2402 tok/s
2025-11-29 19:46:17,156 - INFO - Epoch 1 Step 420 (Global: 420): loss=1.9927, ppl=7.33, grad_norm=1.73, lr=4.63e-06, throughput=2399 tok/s
2025-11-29 19:49:36,220 - INFO - Epoch 1 Step 430 (Global: 430): loss=2.0399, ppl=7.69, grad_norm=1.88, lr=4.72e-06, throughput=2411 tok/s
2025-11-29 19:52:55,853 - INFO - Epoch 1 Step 440 (Global: 440): loss=1.8625, ppl=6.44, grad_norm=1.68, lr=4.80e-06, throughput=2404 tok/s
2025-11-29 19:56:14,473 - INFO - Epoch 1 Step 450 (Global: 450): loss=1.7772, ppl=5.91, grad_norm=1.49, lr=4.89e-06, throughput=2417 tok/s
2025-11-29 19:59:29,905 - INFO - Epoch 1 Step 460 (Global: 460): loss=1.7686, ppl=5.86, grad_norm=1.40, lr=4.98e-06, throughput=2456 tok/s
2025-11-29 20:02:46,388 - INFO - Epoch 1 Step 470 (Global: 470): loss=2.0577, ppl=7.83, grad_norm=1.50, lr=5.06e-06, throughput=2443 tok/s
2025-11-29 20:06:03,143 - INFO - Epoch 1 Step 480 (Global: 480): loss=1.7704, ppl=5.87, grad_norm=2.86, lr=5.15e-06, throughput=2440 tok/s
2025-11-29 20:09:19,821 - INFO - Epoch 1 Step 490 (Global: 490): loss=1.6852, ppl=5.39, grad_norm=1.73, lr=5.24e-06, throughput=2441 tok/s
2025-11-29 20:12:35,593 - INFO - Epoch 1 Step 500 (Global: 500): loss=1.9528, ppl=7.05, grad_norm=1.52, lr=5.32e-06, throughput=2452 tok/s
2025-11-29 20:15:55,726 - INFO - Epoch 1 Step 510 (Global: 510): loss=2.0123, ppl=7.48, grad_norm=1.83, lr=5.41e-06, throughput=2398 tok/s
2025-11-29 20:19:10,944 - INFO - Epoch 1 Step 520 (Global: 520): loss=1.7731, ppl=5.89, grad_norm=1.62, lr=5.50e-06, throughput=2459 tok/s
2025-11-29 20:22:27,744 - INFO - Epoch 1 Step 530 (Global: 530): loss=1.8692, ppl=6.48, grad_norm=3.67, lr=5.58e-06, throughput=2439 tok/s
2025-11-29 20:25:44,038 - INFO - Epoch 1 Step 540 (Global: 540): loss=1.8334, ppl=6.26, grad_norm=1.62, lr=5.67e-06, throughput=2445 tok/s
2025-11-29 20:29:02,329 - INFO - Epoch 1 Step 550 (Global: 550): loss=1.7739, ppl=5.89, grad_norm=1.44, lr=5.76e-06, throughput=2421 tok/s
2025-11-29 20:32:20,038 - INFO - Epoch 1 Step 560 (Global: 560): loss=1.6316, ppl=5.11, grad_norm=1.71, lr=5.84e-06, throughput=2428 tok/s
2025-11-29 20:35:36,668 - INFO - Epoch 1 Step 570 (Global: 570): loss=1.7324, ppl=5.65, grad_norm=1.76, lr=5.93e-06, throughput=2441 tok/s
2025-11-29 20:38:55,037 - INFO - Epoch 1 Step 580 (Global: 580): loss=1.6880, ppl=5.41, grad_norm=1.58, lr=6.01e-06, throughput=2420 tok/s
2025-11-29 20:42:12,568 - INFO - Epoch 1 Step 590 (Global: 590): loss=1.7183, ppl=5.57, grad_norm=1.53, lr=6.10e-06, throughput=2430 tok/s
2025-11-29 20:45:27,524 - INFO - Epoch 1 Step 600 (Global: 600): loss=1.8531, ppl=6.38, grad_norm=1.55, lr=6.19e-06, throughput=2462 tok/s
2025-11-29 20:48:44,312 - INFO - Epoch 1 Step 610 (Global: 610): loss=1.7971, ppl=6.03, grad_norm=1.48, lr=6.27e-06, throughput=2439 tok/s
2025-11-29 20:52:01,970 - INFO - Epoch 1 Step 620 (Global: 620): loss=1.8944, ppl=6.65, grad_norm=2.53, lr=6.36e-06, throughput=2428 tok/s
2025-11-29 20:55:20,120 - INFO - Epoch 1 Step 630 (Global: 630): loss=1.9766, ppl=7.22, grad_norm=2.09, lr=6.45e-06, throughput=2422 tok/s
2025-11-29 20:58:39,425 - INFO - Epoch 1 Step 640 (Global: 640): loss=1.7785, ppl=5.92, grad_norm=1.97, lr=6.53e-06, throughput=2408 tok/s
2025-11-29 21:01:57,232 - INFO - Epoch 1 Step 650 (Global: 650): loss=1.8729, ppl=6.51, grad_norm=1.38, lr=6.62e-06, throughput=2427 tok/s
2025-11-29 21:05:12,010 - INFO - Epoch 1 Step 660 (Global: 660): loss=1.4382, ppl=4.21, grad_norm=1.34, lr=6.71e-06, throughput=2464 tok/s
2025-11-29 21:08:26,720 - INFO - Epoch 1 Step 670 (Global: 670): loss=1.8316, ppl=6.24, grad_norm=1.82, lr=6.79e-06, throughput=2465 tok/s
2025-11-29 21:11:42,431 - INFO - Epoch 1 Step 680 (Global: 680): loss=1.9135, ppl=6.78, grad_norm=1.62, lr=6.88e-06, throughput=2453 tok/s
2025-11-29 21:14:57,390 - INFO - Epoch 1 Step 690 (Global: 690): loss=1.6908, ppl=5.42, grad_norm=1.92, lr=6.97e-06, throughput=2462 tok/s
2025-11-29 21:18:12,475 - INFO - Epoch 1 Step 700 (Global: 700): loss=1.9608, ppl=7.11, grad_norm=5.19, lr=7.05e-06, throughput=2460 tok/s
2025-11-29 21:21:35,260 - INFO - Epoch 1 Step 710 (Global: 710): loss=1.7545, ppl=5.78, grad_norm=1.97, lr=7.14e-06, throughput=2367 tok/s
2025-11-29 21:25:23,224 - INFO - Epoch 1 Step 720 (Global: 720): loss=1.9789, ppl=7.23, grad_norm=1.41, lr=7.22e-06, throughput=2106 tok/s
2025-11-29 21:28:40,224 - INFO - Epoch 1 Step 730 (Global: 730): loss=2.0159, ppl=7.51, grad_norm=1.84, lr=7.31e-06, throughput=2437 tok/s
2025-11-29 21:31:56,201 - INFO - Epoch 1 Step 740 (Global: 740): loss=1.7262, ppl=5.62, grad_norm=1.48, lr=7.40e-06, throughput=2449 tok/s
2025-11-29 21:35:11,259 - INFO - Epoch 1 Step 750 (Global: 750): loss=1.6315, ppl=5.11, grad_norm=1.46, lr=7.48e-06, throughput=2461 tok/s
2025-11-29 21:38:26,370 - INFO - Epoch 1 Step 760 (Global: 760): loss=1.8630, ppl=6.44, grad_norm=1.80, lr=7.57e-06, throughput=2460 tok/s
2025-11-29 21:41:41,079 - INFO - Epoch 1 Step 770 (Global: 770): loss=1.7153, ppl=5.56, grad_norm=2.30, lr=7.66e-06, throughput=2465 tok/s
2025-11-29 21:44:57,333 - INFO - Epoch 1 Step 780 (Global: 780): loss=1.6793, ppl=5.36, grad_norm=1.40, lr=7.74e-06, throughput=2446 tok/s
2025-11-29 21:48:15,633 - INFO - Epoch 1 Step 790 (Global: 790): loss=1.5759, ppl=4.84, grad_norm=1.86, lr=7.83e-06, throughput=2421 tok/s
2025-11-29 21:51:33,630 - INFO - Epoch 1 Step 800 (Global: 800): loss=1.8361, ppl=6.27, grad_norm=1.50, lr=7.92e-06, throughput=2424 tok/s
2025-11-29 21:54:51,352 - INFO - Epoch 1 Step 810 (Global: 810): loss=1.8479, ppl=6.35, grad_norm=1.29, lr=8.00e-06, throughput=2428 tok/s
2025-11-29 21:58:05,269 - INFO - Epoch 1 Step 820 (Global: 820): loss=1.6783, ppl=5.36, grad_norm=1.62, lr=8.09e-06, throughput=2475 tok/s
2025-11-29 22:01:19,014 - INFO - Epoch 1 Step 830 (Global: 830): loss=1.8596, ppl=6.42, grad_norm=1.94, lr=8.18e-06, throughput=2478 tok/s
2025-11-29 22:04:34,847 - INFO - Epoch 1 Step 840 (Global: 840): loss=1.8260, ppl=6.21, grad_norm=1.35, lr=8.26e-06, throughput=2451 tok/s
2025-11-29 22:07:54,126 - INFO - Epoch 1 Step 850 (Global: 850): loss=2.1386, ppl=8.49, grad_norm=1.43, lr=8.35e-06, throughput=2409 tok/s
2025-11-29 22:11:12,373 - INFO - Epoch 1 Step 860 (Global: 860): loss=1.8304, ppl=6.24, grad_norm=1.44, lr=8.44e-06, throughput=2421 tok/s
2025-11-29 22:14:30,616 - INFO - Epoch 1 Step 870 (Global: 870): loss=1.5464, ppl=4.69, grad_norm=1.41, lr=8.52e-06, throughput=2421 tok/s
2025-11-29 22:17:49,176 - INFO - Epoch 1 Step 880 (Global: 880): loss=1.8838, ppl=6.58, grad_norm=1.76, lr=8.61e-06, throughput=2417 tok/s
2025-11-29 22:21:07,074 - INFO - Epoch 1 Step 890 (Global: 890): loss=1.6332, ppl=5.12, grad_norm=1.46, lr=8.69e-06, throughput=2426 tok/s
2025-11-29 22:24:25,489 - INFO - Epoch 1 Step 900 (Global: 900): loss=1.6734, ppl=5.33, grad_norm=1.43, lr=8.78e-06, throughput=2419 tok/s
2025-11-29 22:27:43,375 - INFO - Epoch 1 Step 910 (Global: 910): loss=1.8110, ppl=6.12, grad_norm=1.66, lr=8.87e-06, throughput=2426 tok/s
2025-11-29 22:31:02,136 - INFO - Epoch 1 Step 920 (Global: 920): loss=1.7746, ppl=5.90, grad_norm=1.67, lr=8.95e-06, throughput=2415 tok/s
2025-11-29 22:34:16,476 - INFO - Epoch 1 Step 930 (Global: 930): loss=1.9614, ppl=7.11, grad_norm=1.73, lr=9.04e-06, throughput=2470 tok/s
2025-11-29 22:37:31,373 - INFO - Epoch 1 Step 940 (Global: 940): loss=1.6942, ppl=5.44, grad_norm=2.14, lr=9.13e-06, throughput=2463 tok/s
2025-11-29 22:40:46,436 - INFO - Epoch 1 Step 950 (Global: 950): loss=1.7677, ppl=5.86, grad_norm=1.68, lr=9.21e-06, throughput=2461 tok/s
2025-11-29 22:44:13,931 - INFO - Epoch 1 Step 960 (Global: 960): loss=1.7584, ppl=5.80, grad_norm=1.52, lr=9.30e-06, throughput=2313 tok/s
2025-11-29 22:47:35,533 - INFO - Epoch 1 Step 970 (Global: 970): loss=1.8420, ppl=6.31, grad_norm=1.36, lr=9.39e-06, throughput=2381 tok/s
2025-11-29 22:50:54,468 - INFO - Epoch 1 Step 980 (Global: 980): loss=1.7629, ppl=5.83, grad_norm=1.93, lr=9.47e-06, throughput=2413 tok/s
2025-11-29 22:54:12,613 - INFO - Epoch 1 Step 990 (Global: 990): loss=1.7409, ppl=5.70, grad_norm=1.52, lr=9.56e-06, throughput=2422 tok/s
2025-11-29 22:57:31,552 - INFO - Epoch 1 Step 1000 (Global: 1000): loss=1.8265, ppl=6.21, grad_norm=1.59, lr=9.65e-06, throughput=2413 tok/s
2025-11-29 23:00:50,379 - INFO - Epoch 1 Step 1010 (Global: 1010): loss=2.0798, ppl=8.00, grad_norm=1.71, lr=9.73e-06, throughput=2414 tok/s
2025-11-29 23:04:08,891 - INFO - Epoch 1 Step 1020 (Global: 1020): loss=2.0901, ppl=8.09, grad_norm=1.51, lr=9.82e-06, throughput=2418 tok/s
2025-11-29 23:07:27,092 - INFO - Epoch 1 Step 1030 (Global: 1030): loss=1.8234, ppl=6.19, grad_norm=2.28, lr=9.90e-06, throughput=2422 tok/s
2025-11-29 23:10:44,417 - INFO - Epoch 1 Step 1040 (Global: 1040): loss=1.7015, ppl=5.48, grad_norm=1.59, lr=9.99e-06, throughput=2433 tok/s
2025-11-29 23:14:01,499 - INFO - Epoch 1 Step 1050 (Global: 1050): loss=1.7392, ppl=5.69, grad_norm=1.52, lr=1.00e-05, throughput=2436 tok/s
2025-11-29 23:17:18,971 - INFO - Epoch 1 Step 1060 (Global: 1060): loss=1.8136, ppl=6.13, grad_norm=1.31, lr=1.00e-05, throughput=2431 tok/s
2025-11-29 23:20:42,899 - INFO - Epoch 1 Step 1070 (Global: 1070): loss=1.7334, ppl=5.66, grad_norm=1.69, lr=1.00e-05, throughput=2354 tok/s
2025-11-29 23:24:12,000 - INFO - Epoch 1 Step 1080 (Global: 1080): loss=2.0142, ppl=7.49, grad_norm=1.48, lr=1.00e-05, throughput=2296 tok/s
2025-11-29 23:27:37,832 - INFO - Epoch 1 Step 1090 (Global: 1090): loss=1.7976, ppl=6.04, grad_norm=1.37, lr=1.00e-05, throughput=2332 tok/s
2025-11-29 23:30:57,560 - INFO - Epoch 1 Step 1100 (Global: 1100): loss=1.8070, ppl=6.09, grad_norm=1.41, lr=1.00e-05, throughput=2403 tok/s
2025-11-29 23:34:17,316 - INFO - Epoch 1 Step 1110 (Global: 1110): loss=1.6815, ppl=5.37, grad_norm=1.41, lr=1.00e-05, throughput=2403 tok/s
2025-11-29 23:37:36,445 - INFO - Epoch 1 Step 1120 (Global: 1120): loss=1.9539, ppl=7.06, grad_norm=1.73, lr=1.00e-05, throughput=2411 tok/s
2025-11-29 23:40:54,997 - INFO - Epoch 1 Step 1130 (Global: 1130): loss=1.7741, ppl=5.89, grad_norm=1.60, lr=1.00e-05, throughput=2418 tok/s
2025-11-29 23:44:13,529 - INFO - Epoch 1 Step 1140 (Global: 1140): loss=1.7837, ppl=5.95, grad_norm=3.17, lr=1.00e-05, throughput=2418 tok/s
2025-11-29 23:47:32,999 - INFO - Epoch 1 Step 1150 (Global: 1150): loss=1.8191, ppl=6.17, grad_norm=1.63, lr=1.00e-05, throughput=2406 tok/s
2025-11-29 23:50:52,607 - INFO - Epoch 1 Step 1160 (Global: 1160): loss=1.8457, ppl=6.33, grad_norm=2.22, lr=1.00e-05, throughput=2405 tok/s
2025-11-29 23:54:12,274 - INFO - Epoch 1 Step 1170 (Global: 1170): loss=1.6885, ppl=5.41, grad_norm=1.38, lr=1.00e-05, throughput=2404 tok/s
2025-11-29 23:57:32,137 - INFO - Epoch 1 Step 1180 (Global: 1180): loss=1.7916, ppl=6.00, grad_norm=1.18, lr=9.99e-06, throughput=2402 tok/s
2025-11-30 00:00:50,940 - INFO - Epoch 1 Step 1190 (Global: 1190): loss=1.9267, ppl=6.87, grad_norm=1.61, lr=9.99e-06, throughput=2414 tok/s
2025-11-30 00:04:09,204 - INFO - Epoch 1 Step 1200 (Global: 1200): loss=1.5930, ppl=4.92, grad_norm=1.45, lr=9.99e-06, throughput=2421 tok/s
2025-11-30 00:07:28,664 - INFO - Epoch 1 Step 1210 (Global: 1210): loss=1.6547, ppl=5.23, grad_norm=1.42, lr=9.99e-06, throughput=2407 tok/s
2025-11-30 00:10:50,021 - INFO - Epoch 1 Step 1220 (Global: 1220): loss=1.8876, ppl=6.60, grad_norm=1.89, lr=9.99e-06, throughput=2384 tok/s
2025-11-30 00:14:11,518 - INFO - Epoch 1 Step 1230 (Global: 1230): loss=1.7697, ppl=5.87, grad_norm=1.44, lr=9.99e-06, throughput=2382 tok/s
2025-11-30 00:17:33,488 - INFO - Epoch 1 Step 1240 (Global: 1240): loss=1.5586, ppl=4.75, grad_norm=1.51, lr=9.99e-06, throughput=2377 tok/s
2025-11-30 00:20:51,907 - INFO - Epoch 1 Step 1250 (Global: 1250): loss=1.7873, ppl=5.97, grad_norm=1.69, lr=9.99e-06, throughput=2419 tok/s
2025-11-30 00:24:13,360 - INFO - Epoch 1 Step 1260 (Global: 1260): loss=1.8446, ppl=6.33, grad_norm=1.44, lr=9.99e-06, throughput=2383 tok/s
2025-11-30 00:27:32,114 - INFO - Epoch 1 Step 1270 (Global: 1270): loss=1.8665, ppl=6.47, grad_norm=1.62, lr=9.99e-06, throughput=2415 tok/s
2025-11-30 00:30:51,253 - INFO - Epoch 1 Step 1280 (Global: 1280): loss=1.8033, ppl=6.07, grad_norm=1.84, lr=9.98e-06, throughput=2410 tok/s
2025-11-30 00:34:09,438 - INFO - Epoch 1 Step 1290 (Global: 1290): loss=1.7300, ppl=5.64, grad_norm=1.44, lr=9.98e-06, throughput=2422 tok/s
2025-11-30 00:37:26,921 - INFO - Epoch 1 Step 1300 (Global: 1300): loss=1.6709, ppl=5.32, grad_norm=1.41, lr=9.98e-06, throughput=2431 tok/s
2025-11-30 00:40:44,789 - INFO - Epoch 1 Step 1310 (Global: 1310): loss=1.5126, ppl=4.54, grad_norm=1.61, lr=9.98e-06, throughput=2426 tok/s
2025-11-30 00:44:01,153 - INFO - Epoch 1 Step 1320 (Global: 1320): loss=1.9810, ppl=7.25, grad_norm=1.50, lr=9.98e-06, throughput=2444 tok/s
2025-11-30 00:47:19,075 - INFO - Epoch 1 Step 1330 (Global: 1330): loss=2.0326, ppl=7.63, grad_norm=1.48, lr=9.98e-06, throughput=2425 tok/s
2025-11-30 00:50:36,947 - INFO - Epoch 1 Step 1340 (Global: 1340): loss=1.7493, ppl=5.75, grad_norm=1.69, lr=9.97e-06, throughput=2426 tok/s
2025-11-30 00:53:54,446 - INFO - Epoch 1 Step 1350 (Global: 1350): loss=1.9526, ppl=7.05, grad_norm=1.82, lr=9.97e-06, throughput=2430 tok/s
2025-11-30 00:57:11,774 - INFO - Epoch 1 Step 1360 (Global: 1360): loss=1.9097, ppl=6.75, grad_norm=1.48, lr=9.97e-06, throughput=2433 tok/s
2025-11-30 01:00:29,017 - INFO - Epoch 1 Step 1370 (Global: 1370): loss=1.9810, ppl=7.25, grad_norm=1.57, lr=9.97e-06, throughput=2434 tok/s
2025-11-30 01:03:47,022 - INFO - Epoch 1 Step 1380 (Global: 1380): loss=1.6891, ppl=5.41, grad_norm=1.43, lr=9.97e-06, throughput=2424 tok/s
2025-11-30 01:07:05,230 - INFO - Epoch 1 Step 1390 (Global: 1390): loss=1.8835, ppl=6.58, grad_norm=1.33, lr=9.97e-06, throughput=2422 tok/s
2025-11-30 01:10:23,109 - INFO - Epoch 1 Step 1400 (Global: 1400): loss=1.9553, ppl=7.07, grad_norm=1.38, lr=9.96e-06, throughput=2426 tok/s
2025-11-30 01:13:40,804 - INFO - Epoch 1 Step 1410 (Global: 1410): loss=1.7404, ppl=5.70, grad_norm=1.80, lr=9.96e-06, throughput=2428 tok/s
2025-11-30 01:16:58,485 - INFO - Epoch 1 Step 1420 (Global: 1420): loss=1.7527, ppl=5.77, grad_norm=1.53, lr=9.96e-06, throughput=2428 tok/s
2025-11-30 01:20:15,228 - INFO - Epoch 1 Step 1430 (Global: 1430): loss=1.6409, ppl=5.16, grad_norm=2.12, lr=9.96e-06, throughput=2440 tok/s
2025-11-30 01:23:30,640 - INFO - Epoch 1 Step 1440 (Global: 1440): loss=1.8774, ppl=6.54, grad_norm=1.55, lr=9.96e-06, throughput=2456 tok/s
2025-11-30 01:26:48,390 - INFO - Epoch 1 Step 1450 (Global: 1450): loss=2.0454, ppl=7.73, grad_norm=1.23, lr=9.95e-06, throughput=2427 tok/s
2025-11-30 01:30:07,212 - INFO - Epoch 1 Step 1460 (Global: 1460): loss=1.6271, ppl=5.09, grad_norm=1.20, lr=9.95e-06, throughput=2414 tok/s
| 2025-11-30 01:33:26,160 - INFO - Epoch 1 Step 1470 (Global: 1470): loss=1.8970, ppl=6.67, grad_norm=2.05, lr=9.95e-06, throughput=2413 tok/s | |
| 2025-11-30 01:36:45,309 - INFO - Epoch 1 Step 1480 (Global: 1480): loss=1.8250, ppl=6.20, grad_norm=1.51, lr=9.95e-06, throughput=2410 tok/s | |
| 2025-11-30 01:40:03,574 - INFO - Epoch 1 Step 1490 (Global: 1490): loss=1.7238, ppl=5.61, grad_norm=1.37, lr=9.94e-06, throughput=2421 tok/s | |
| 2025-11-30 01:43:23,750 - INFO - Epoch 1 Step 1500 (Global: 1500): loss=1.8704, ppl=6.49, grad_norm=1.34, lr=9.94e-06, throughput=2398 tok/s | |
| 2025-11-30 01:46:42,099 - INFO - Epoch 1 Step 1510 (Global: 1510): loss=1.8585, ppl=6.41, grad_norm=1.23, lr=9.94e-06, throughput=2420 tok/s | |
| 2025-11-30 01:50:05,609 - INFO - Epoch 1 Step 1520 (Global: 1520): loss=1.8039, ppl=6.07, grad_norm=1.58, lr=9.94e-06, throughput=2359 tok/s | |
| 2025-11-30 01:53:23,892 - INFO - Epoch 1 Step 1530 (Global: 1530): loss=1.6359, ppl=5.13, grad_norm=5.31, lr=9.93e-06, throughput=2421 tok/s | |
| 2025-11-30 01:56:42,155 - INFO - Epoch 1 Step 1540 (Global: 1540): loss=1.3949, ppl=4.03, grad_norm=1.46, lr=9.93e-06, throughput=2421 tok/s | |
| 2025-11-30 02:00:01,279 - INFO - Epoch 1 Step 1550 (Global: 1550): loss=1.5629, ppl=4.77, grad_norm=1.37, lr=9.93e-06, throughput=2411 tok/s | |
| 2025-11-30 02:03:19,092 - INFO - Epoch 1 Step 1560 (Global: 1560): loss=2.0189, ppl=7.53, grad_norm=1.27, lr=9.92e-06, throughput=2427 tok/s | |
| 2025-11-30 02:06:36,769 - INFO - Epoch 1 Step 1570 (Global: 1570): loss=1.8626, ppl=6.44, grad_norm=1.70, lr=9.92e-06, throughput=2428 tok/s | |
| 2025-11-30 02:09:55,230 - INFO - Epoch 1 Step 1580 (Global: 1580): loss=1.8385, ppl=6.29, grad_norm=2.77, lr=9.92e-06, throughput=2419 tok/s | |
| 2025-11-30 02:13:14,465 - INFO - Epoch 1 Step 1590 (Global: 1590): loss=1.7335, ppl=5.66, grad_norm=1.56, lr=9.92e-06, throughput=2409 tok/s | |
| 2025-11-30 02:16:33,428 - INFO - Epoch 1 Step 1600 (Global: 1600): loss=1.7289, ppl=5.63, grad_norm=1.45, lr=9.91e-06, throughput=2413 tok/s | |
| 2025-11-30 02:19:52,036 - INFO - Epoch 1 Step 1610 (Global: 1610): loss=1.8189, ppl=6.17, grad_norm=1.36, lr=9.91e-06, throughput=2417 tok/s | |
| 2025-11-30 02:23:12,592 - INFO - Epoch 1 Step 1620 (Global: 1620): loss=1.5153, ppl=4.55, grad_norm=1.74, lr=9.91e-06, throughput=2393 tok/s | |
| 2025-11-30 02:26:31,619 - INFO - Epoch 1 Step 1630 (Global: 1630): loss=1.8641, ppl=6.45, grad_norm=1.89, lr=9.90e-06, throughput=2412 tok/s | |
| 2025-11-30 02:29:50,178 - INFO - Epoch 1 Step 1640 (Global: 1640): loss=1.6574, ppl=5.25, grad_norm=1.38, lr=9.90e-06, throughput=2417 tok/s | |
| 2025-11-30 02:33:08,148 - INFO - Epoch 1 Step 1650 (Global: 1650): loss=1.9016, ppl=6.70, grad_norm=1.28, lr=9.90e-06, throughput=2425 tok/s | |
| 2025-11-30 02:36:25,174 - INFO - Epoch 1 Step 1660 (Global: 1660): loss=1.9225, ppl=6.84, grad_norm=1.59, lr=9.89e-06, throughput=2436 tok/s | |
| 2025-11-30 02:39:42,841 - INFO - Epoch 1 Step 1670 (Global: 1670): loss=1.9356, ppl=6.93, grad_norm=1.62, lr=9.89e-06, throughput=2428 tok/s | |
| 2025-11-30 02:42:58,844 - INFO - Epoch 1 Step 1680 (Global: 1680): loss=1.6339, ppl=5.12, grad_norm=1.67, lr=9.89e-06, throughput=2449 tok/s | |
| 2025-11-30 02:46:18,396 - INFO - Epoch 1 Step 1690 (Global: 1690): loss=1.7725, ppl=5.89, grad_norm=1.71, lr=9.88e-06, throughput=2405 tok/s | |
| 2025-11-30 02:49:37,848 - INFO - Epoch 1 Step 1700 (Global: 1700): loss=1.8128, ppl=6.13, grad_norm=1.55, lr=9.88e-06, throughput=2407 tok/s | |
| 2025-11-30 02:52:55,424 - INFO - Epoch 1 Step 1710 (Global: 1710): loss=1.8894, ppl=6.62, grad_norm=1.41, lr=9.87e-06, throughput=2429 tok/s | |
| 2025-11-30 02:56:13,827 - INFO - Epoch 1 Step 1720 (Global: 1720): loss=1.8961, ppl=6.66, grad_norm=1.46, lr=9.87e-06, throughput=2419 tok/s | |
| 2025-11-30 02:59:33,553 - INFO - Epoch 1 Step 1730 (Global: 1730): loss=1.6073, ppl=4.99, grad_norm=2.05, lr=9.87e-06, throughput=2403 tok/s | |
| 2025-11-30 03:02:51,950 - INFO - Epoch 1 Step 1740 (Global: 1740): loss=1.9096, ppl=6.75, grad_norm=1.45, lr=9.86e-06, throughput=2419 tok/s | |
| 2025-11-30 03:06:11,398 - INFO - Epoch 1 Step 1750 (Global: 1750): loss=1.6991, ppl=5.47, grad_norm=2.69, lr=9.86e-06, throughput=2407 tok/s | |
| 2025-11-30 03:09:30,758 - INFO - Epoch 1 Step 1760 (Global: 1760): loss=1.5453, ppl=4.69, grad_norm=2.14, lr=9.86e-06, throughput=2408 tok/s | |
| 2025-11-30 03:12:49,753 - INFO - Epoch 1 Step 1770 (Global: 1770): loss=1.7667, ppl=5.85, grad_norm=1.85, lr=9.85e-06, throughput=2412 tok/s | |
| 2025-11-30 03:16:08,827 - INFO - Epoch 1 Step 1780 (Global: 1780): loss=1.6637, ppl=5.28, grad_norm=1.97, lr=9.85e-06, throughput=2411 tok/s | |
| 2025-11-30 03:19:25,890 - INFO - Epoch 1 Step 1790 (Global: 1790): loss=1.7712, ppl=5.88, grad_norm=1.33, lr=9.84e-06, throughput=2436 tok/s | |
| 2025-11-30 03:22:38,832 - INFO - Epoch 1 Step 1800 (Global: 1800): loss=1.6283, ppl=5.10, grad_norm=1.29, lr=9.84e-06, throughput=2488 tok/s | |
| 2025-11-30 03:25:51,394 - INFO - Epoch 1 Step 1810 (Global: 1810): loss=1.8108, ppl=6.12, grad_norm=1.39, lr=9.83e-06, throughput=2493 tok/s | |
| 2025-11-30 03:29:03,604 - INFO - Epoch 1 Step 1820 (Global: 1820): loss=1.6857, ppl=5.40, grad_norm=1.69, lr=9.83e-06, throughput=2497 tok/s | |
| 2025-11-30 03:32:15,745 - INFO - Epoch 1 Step 1830 (Global: 1830): loss=1.5171, ppl=4.56, grad_norm=1.82, lr=9.83e-06, throughput=2498 tok/s | |
| 2025-11-30 03:35:27,694 - INFO - Epoch 1 Step 1840 (Global: 1840): loss=1.8735, ppl=6.51, grad_norm=1.41, lr=9.82e-06, throughput=2501 tok/s | |
| 2025-11-30 03:38:39,115 - INFO - Epoch 1 Step 1850 (Global: 1850): loss=1.8585, ppl=6.41, grad_norm=1.55, lr=9.82e-06, throughput=2508 tok/s | |
| 2025-11-30 03:41:50,687 - INFO - Epoch 1 Step 1860 (Global: 1860): loss=1.7668, ppl=5.85, grad_norm=2.22, lr=9.81e-06, throughput=2506 tok/s | |
| 2025-11-30 03:45:02,502 - INFO - Epoch 1 Step 1870 (Global: 1870): loss=1.7228, ppl=5.60, grad_norm=2.02, lr=9.81e-06, throughput=2502 tok/s | |
| 2025-11-30 03:48:14,161 - INFO - Epoch 1 Step 1880 (Global: 1880): loss=1.4673, ppl=4.34, grad_norm=2.52, lr=9.80e-06, throughput=2504 tok/s | |
| 2025-11-30 03:51:25,734 - INFO - Epoch 1 Step 1890 (Global: 1890): loss=1.7598, ppl=5.81, grad_norm=2.34, lr=9.80e-06, throughput=2506 tok/s | |
| 2025-11-30 03:54:37,302 - INFO - Epoch 1 Step 1900 (Global: 1900): loss=1.6353, ppl=5.13, grad_norm=1.45, lr=9.79e-06, throughput=2506 tok/s | |
| 2025-11-30 03:57:48,740 - INFO - Epoch 1 Step 1910 (Global: 1910): loss=1.8238, ppl=6.20, grad_norm=2.69, lr=9.79e-06, throughput=2507 tok/s | |
| 2025-11-30 04:01:00,363 - INFO - Epoch 1 Step 1920 (Global: 1920): loss=1.8414, ppl=6.31, grad_norm=1.54, lr=9.78e-06, throughput=2505 tok/s | |
| 2025-11-30 04:04:11,751 - INFO - Epoch 1 Step 1930 (Global: 1930): loss=1.5809, ppl=4.86, grad_norm=1.34, lr=9.78e-06, throughput=2508 tok/s | |
| 2025-11-30 04:07:23,702 - INFO - Epoch 1 Step 1940 (Global: 1940): loss=1.7965, ppl=6.03, grad_norm=9.44, lr=9.77e-06, throughput=2501 tok/s | |
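Occasional `grad_norm` spikes like the 9.44 at step 1940 exceed the configured `max_grad_norm=1.0` from the run arguments. That is consistent with the common convention of logging the *pre-clip* norm, which is what `torch.nn.utils.clip_grad_norm_` returns. A minimal pure-Python sketch of global-norm clipping (a stand-in for the actual training script, which is not shown here):

```python
import math

def clip_grad_norm(grads, max_norm):
    """Global-norm gradient clipping, mirroring torch.nn.utils.clip_grad_norm_.

    Returns the total norm *before* clipping plus the clipped gradients;
    logging the pre-clip norm is why a grad_norm column can show values
    well above max_grad_norm.
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total > max_norm:
        # torch uses a small epsilon in the denominator for numerical safety
        scale = max_norm / (total + 1e-6)
        grads = [[g * scale for g in vec] for vec in grads]
    return total, grads

# A gradient of (3, 4) has norm 5; clipping to 1.0 rescales it to unit norm.
norm, clipped = clip_grad_norm([[3.0, 4.0]], max_norm=1.0)
assert norm == 5.0
assert abs(math.sqrt(sum(g * g for g in clipped[0])) - 1.0) < 1e-5
```

Under this convention the spikes are harmless as long as they are rare: the update actually applied to the weights is always rescaled to norm ≤ 1.0.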
| 2025-11-30 04:10:35,199 - INFO - Epoch 1 Step 1950 (Global: 1950): loss=1.6406, ppl=5.16, grad_norm=1.49, lr=9.77e-06, throughput=2507 tok/s | |
| 2025-11-30 04:13:47,267 - INFO - Epoch 1 Step 1960 (Global: 1960): loss=1.6999, ppl=5.47, grad_norm=1.62, lr=9.76e-06, throughput=2499 tok/s | |
| 2025-11-30 04:16:59,318 - INFO - Epoch 1 Step 1970 (Global: 1970): loss=1.7475, ppl=5.74, grad_norm=1.97, lr=9.76e-06, throughput=2499 tok/s | |
| 2025-11-30 04:20:11,163 - INFO - Epoch 1 Step 1980 (Global: 1980): loss=1.7844, ppl=5.96, grad_norm=2.33, lr=9.75e-06, throughput=2502 tok/s | |
| 2025-11-30 04:23:23,242 - INFO - Epoch 1 Step 1990 (Global: 1990): loss=1.6242, ppl=5.07, grad_norm=1.56, lr=9.75e-06, throughput=2499 tok/s | |
| 2025-11-30 04:26:35,180 - INFO - Epoch 1 Step 2000 (Global: 2000): loss=1.8631, ppl=6.44, grad_norm=2.72, lr=9.74e-06, throughput=2501 tok/s | |
| 2025-11-30 04:26:35,181 - INFO - | |
| Running validation at step 2000... | |
| 2025-11-30 04:37:56,923 - INFO - Validation loss: 1.7794, perplexity: 5.93 | |
| 2025-11-30 04:37:56,923 - INFO - | |
| ====================================================================== | |
| 2025-11-30 04:37:56,924 - INFO - Qualitative Evaluation Samples: | |
| 2025-11-30 04:37:56,924 - INFO - ====================================================================== | |
| 2025-11-30 04:37:56,924 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-11-30 04:37:56,924 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-11-30 04:37:56,925 - INFO - Generated: ' to the previous year\'s album, The Black Album, which was described as "a more mature, more personal, and more reflective album" by the magazine. The Black Album was described as "a more mature, more ...' | |
| 2025-11-30 04:37:56,925 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' | |
| 2025-11-30 04:37:56,925 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-30 04:37:56,925 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-11-30 04:37:56,925 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-11-30 04:37:56,925 - INFO - Generated: 'aternally or culturally connected to the state of Michigan, and the Order of Angel was not a fraternal organization. The Order of Angel is not a fraternal organization, but it is a fraternal organizat...' | |
| 2025-11-30 04:37:56,926 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' | |
| 2025-11-30 04:37:56,926 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-30 04:37:56,926 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-11-30 04:37:56,926 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-11-30 04:37:56,927 - INFO - Generated: " be killed by Oga. Teimou's shadow is also the reason for the defeat of the Red Tails, as he is the only one who can defeat Teimou. Teimou's shadow is also the reason for the defeat of the Red Tails, ..." | |
| 2025-11-30 04:37:56,927 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." | |
| 2025-11-30 04:37:56,927 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-30 04:37:56,927 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-11-30 04:37:56,927 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-11-30 04:37:56,927 - INFO - Generated: ' | 0x01.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0....' | |
| 2025-11-30 04:37:56,928 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' | |
| 2025-11-30 04:37:56,928 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-30 04:37:56,928 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-11-30 04:37:56,928 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-11-30 04:37:56,929 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 151 ] |\n| Madden NFL 12 | August 30, 2011 | iOS | E...' | |
| 2025-11-30 04:37:56,929 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' | |
| 2025-11-30 04:37:56,930 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-30 04:37:56,931 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/qualitative_step_2000.jsonl | |
| 2025-11-30 04:38:27,028 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/best_checkpoint.pt | |
| 2025-11-30 04:38:27,033 - INFO - New best validation loss: 1.7794, perplexity: 5.93 | |
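Every logged loss/perplexity pair in this run, including the validation result above (loss 1.7794, perplexity 5.93), satisfies ppl = exp(loss), i.e. perplexity is the exponential of the mean token-level cross-entropy. A small sketch of that relationship and of token-weighted aggregation across batches (helper names are illustrative, not taken from the training code):

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is the exponential of the mean negative log-likelihood per token."""
    return math.exp(mean_nll)

def token_weighted_mean(batch_losses, token_counts):
    """Combine per-batch mean losses into a corpus-level mean, weighted by
    the number of loss-bearing tokens in each batch (batches differ in length)."""
    total = sum(loss * n for loss, n in zip(batch_losses, token_counts))
    return total / sum(token_counts)

# Reproduce two pairs straight from the log, to the printed precision:
assert round(perplexity(1.7794), 2) == 5.93   # validation at step 2000
assert round(perplexity(1.9267), 2) == 6.87   # training step 1190
```

The "New best validation loss" message then follows from comparing this corpus-level mean against the previous best before overwriting `best_checkpoint.pt`.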
| 2025-11-30 04:41:41,879 - INFO - Epoch 1 Step 2010 (Global: 2010): loss=1.5956, ppl=4.93, grad_norm=1.19, lr=9.74e-06, throughput=2464 tok/s | |
| 2025-11-30 04:44:56,913 - INFO - Epoch 1 Step 2020 (Global: 2020): loss=1.8956, ppl=6.66, grad_norm=1.51, lr=9.73e-06, throughput=2461 tok/s | |
| 2025-11-30 04:48:13,046 - INFO - Epoch 1 Step 2030 (Global: 2030): loss=1.6134, ppl=5.02, grad_norm=1.65, lr=9.73e-06, throughput=2447 tok/s | |
| 2025-11-30 04:51:29,901 - INFO - Epoch 1 Step 2040 (Global: 2040): loss=1.7184, ppl=5.58, grad_norm=1.12, lr=9.72e-06, throughput=2438 tok/s | |
| 2025-11-30 04:54:45,988 - INFO - Epoch 1 Step 2050 (Global: 2050): loss=2.0161, ppl=7.51, grad_norm=1.69, lr=9.72e-06, throughput=2448 tok/s | |
| 2025-11-30 04:58:00,483 - INFO - Epoch 1 Step 2060 (Global: 2060): loss=1.5670, ppl=4.79, grad_norm=3.20, lr=9.71e-06, throughput=2468 tok/s | |
| 2025-11-30 05:01:14,397 - INFO - Epoch 1 Step 2070 (Global: 2070): loss=1.9089, ppl=6.75, grad_norm=1.23, lr=9.71e-06, throughput=2475 tok/s | |
| 2025-11-30 05:04:28,778 - INFO - Epoch 1 Step 2080 (Global: 2080): loss=1.5894, ppl=4.90, grad_norm=3.81, lr=9.70e-06, throughput=2469 tok/s | |
| 2025-11-30 05:07:43,358 - INFO - Epoch 1 Step 2090 (Global: 2090): loss=1.5486, ppl=4.71, grad_norm=1.25, lr=9.69e-06, throughput=2467 tok/s | |
| 2025-11-30 05:10:57,109 - INFO - Epoch 1 Step 2100 (Global: 2100): loss=1.7420, ppl=5.71, grad_norm=1.48, lr=9.69e-06, throughput=2477 tok/s | |
| 2025-11-30 05:14:10,630 - INFO - Epoch 1 Step 2110 (Global: 2110): loss=1.7334, ppl=5.66, grad_norm=1.46, lr=9.68e-06, throughput=2480 tok/s | |
| 2025-11-30 05:17:24,453 - INFO - Epoch 1 Step 2120 (Global: 2120): loss=1.7418, ppl=5.71, grad_norm=2.80, lr=9.68e-06, throughput=2477 tok/s | |
| 2025-11-30 05:20:39,195 - INFO - Epoch 1 Step 2130 (Global: 2130): loss=1.8098, ppl=6.11, grad_norm=1.31, lr=9.67e-06, throughput=2465 tok/s | |
| 2025-11-30 05:23:53,576 - INFO - Epoch 1 Step 2140 (Global: 2140): loss=1.7792, ppl=5.93, grad_norm=1.16, lr=9.66e-06, throughput=2469 tok/s | |
| 2025-11-30 05:27:10,778 - INFO - Epoch 1 Step 2150 (Global: 2150): loss=1.7430, ppl=5.71, grad_norm=1.63, lr=9.66e-06, throughput=2434 tok/s | |
| 2025-11-30 05:30:24,487 - INFO - Epoch 1 Step 2160 (Global: 2160): loss=1.6752, ppl=5.34, grad_norm=1.67, lr=9.65e-06, throughput=2478 tok/s | |
| 2025-11-30 05:33:38,695 - INFO - Epoch 1 Step 2170 (Global: 2170): loss=1.6555, ppl=5.24, grad_norm=1.44, lr=9.65e-06, throughput=2472 tok/s | |
| 2025-11-30 05:36:53,385 - INFO - Epoch 1 Step 2180 (Global: 2180): loss=1.7146, ppl=5.55, grad_norm=1.20, lr=9.64e-06, throughput=2465 tok/s | |
| 2025-11-30 05:40:08,023 - INFO - Epoch 1 Step 2190 (Global: 2190): loss=1.7545, ppl=5.78, grad_norm=2.45, lr=9.63e-06, throughput=2466 tok/s | |
| 2025-11-30 05:43:22,538 - INFO - Epoch 1 Step 2200 (Global: 2200): loss=1.7484, ppl=5.75, grad_norm=2.17, lr=9.63e-06, throughput=2468 tok/s | |
| 2025-11-30 05:46:36,895 - INFO - Epoch 1 Step 2210 (Global: 2210): loss=1.6745, ppl=5.34, grad_norm=1.76, lr=9.62e-06, throughput=2470 tok/s | |
| 2025-11-30 05:49:50,378 - INFO - Epoch 1 Step 2220 (Global: 2220): loss=1.7212, ppl=5.59, grad_norm=1.61, lr=9.61e-06, throughput=2481 tok/s | |
| 2025-11-30 05:53:03,808 - INFO - Epoch 1 Step 2230 (Global: 2230): loss=1.7576, ppl=5.80, grad_norm=1.57, lr=9.61e-06, throughput=2482 tok/s | |
| 2025-11-30 05:56:17,349 - INFO - Epoch 1 Step 2240 (Global: 2240): loss=1.6814, ppl=5.37, grad_norm=2.58, lr=9.60e-06, throughput=2480 tok/s | |
| 2025-11-30 05:59:31,021 - INFO - Epoch 1 Step 2250 (Global: 2250): loss=1.6790, ppl=5.36, grad_norm=1.90, lr=9.60e-06, throughput=2478 tok/s | |
| 2025-11-30 06:02:44,873 - INFO - Epoch 1 Step 2260 (Global: 2260): loss=1.8200, ppl=6.17, grad_norm=1.41, lr=9.59e-06, throughput=2476 tok/s | |
| 2025-11-30 06:05:58,634 - INFO - Epoch 1 Step 2270 (Global: 2270): loss=1.7198, ppl=5.58, grad_norm=1.80, lr=9.58e-06, throughput=2477 tok/s | |
| 2025-11-30 06:09:12,766 - INFO - Epoch 1 Step 2280 (Global: 2280): loss=1.7209, ppl=5.59, grad_norm=1.28, lr=9.58e-06, throughput=2473 tok/s | |
| 2025-11-30 06:12:26,979 - INFO - Epoch 1 Step 2290 (Global: 2290): loss=1.7654, ppl=5.84, grad_norm=1.83, lr=9.57e-06, throughput=2472 tok/s | |
| 2025-11-30 06:15:41,018 - INFO - Epoch 1 Step 2300 (Global: 2300): loss=1.4186, ppl=4.13, grad_norm=1.39, lr=9.56e-06, throughput=2474 tok/s | |
| 2025-11-30 06:18:55,022 - INFO - Epoch 1 Step 2310 (Global: 2310): loss=1.7522, ppl=5.77, grad_norm=1.48, lr=9.55e-06, throughput=2474 tok/s | |
| 2025-11-30 06:22:08,606 - INFO - Epoch 1 Step 2320 (Global: 2320): loss=1.8613, ppl=6.43, grad_norm=1.23, lr=9.55e-06, throughput=2480 tok/s | |
| 2025-11-30 06:25:22,161 - INFO - Epoch 1 Step 2330 (Global: 2330): loss=1.9623, ppl=7.12, grad_norm=1.71, lr=9.54e-06, throughput=2480 tok/s | |
| 2025-11-30 06:28:35,844 - INFO - Epoch 1 Step 2340 (Global: 2340): loss=1.6644, ppl=5.28, grad_norm=1.57, lr=9.53e-06, throughput=2478 tok/s | |
| 2025-11-30 06:31:50,251 - INFO - Epoch 1 Step 2350 (Global: 2350): loss=1.7920, ppl=6.00, grad_norm=1.24, lr=9.53e-06, throughput=2469 tok/s | |
| 2025-11-30 06:35:04,195 - INFO - Epoch 1 Step 2360 (Global: 2360): loss=1.6308, ppl=5.11, grad_norm=1.58, lr=9.52e-06, throughput=2475 tok/s | |
| 2025-11-30 06:38:18,418 - INFO - Epoch 1 Step 2370 (Global: 2370): loss=1.7258, ppl=5.62, grad_norm=1.77, lr=9.51e-06, throughput=2471 tok/s | |
| 2025-11-30 06:41:32,568 - INFO - Epoch 1 Step 2380 (Global: 2380): loss=1.7960, ppl=6.03, grad_norm=1.33, lr=9.51e-06, throughput=2472 tok/s | |
| 2025-11-30 06:44:46,988 - INFO - Epoch 1 Step 2390 (Global: 2390): loss=1.6179, ppl=5.04, grad_norm=1.23, lr=9.50e-06, throughput=2469 tok/s | |
| 2025-11-30 06:48:01,829 - INFO - Epoch 1 Step 2400 (Global: 2400): loss=1.9549, ppl=7.06, grad_norm=2.53, lr=9.49e-06, throughput=2464 tok/s | |
| 2025-11-30 06:51:16,421 - INFO - Epoch 1 Step 2410 (Global: 2410): loss=1.7159, ppl=5.56, grad_norm=1.41, lr=9.48e-06, throughput=2467 tok/s | |
| 2025-11-30 06:54:30,154 - INFO - Epoch 1 Step 2420 (Global: 2420): loss=1.7437, ppl=5.72, grad_norm=1.31, lr=9.48e-06, throughput=2478 tok/s | |
| 2025-11-30 06:57:44,330 - INFO - Epoch 1 Step 2430 (Global: 2430): loss=1.6955, ppl=5.45, grad_norm=1.57, lr=9.47e-06, throughput=2472 tok/s | |
| 2025-11-30 07:00:58,394 - INFO - Epoch 1 Step 2440 (Global: 2440): loss=1.7419, ppl=5.71, grad_norm=3.39, lr=9.46e-06, throughput=2473 tok/s | |
| 2025-11-30 07:04:12,734 - INFO - Epoch 1 Step 2450 (Global: 2450): loss=1.6265, ppl=5.09, grad_norm=1.27, lr=9.45e-06, throughput=2470 tok/s | |
| 2025-11-30 07:07:26,978 - INFO - Epoch 1 Step 2460 (Global: 2460): loss=1.6483, ppl=5.20, grad_norm=1.20, lr=9.45e-06, throughput=2471 tok/s | |
| 2025-11-30 07:10:41,365 - INFO - Epoch 1 Step 2470 (Global: 2470): loss=1.9200, ppl=6.82, grad_norm=1.38, lr=9.44e-06, throughput=2469 tok/s | |
| 2025-11-30 07:13:56,028 - INFO - Epoch 1 Step 2480 (Global: 2480): loss=1.7728, ppl=5.89, grad_norm=1.36, lr=9.43e-06, throughput=2466 tok/s | |
| 2025-11-30 07:17:10,682 - INFO - Epoch 1 Step 2490 (Global: 2490): loss=1.7918, ppl=6.00, grad_norm=1.45, lr=9.42e-06, throughput=2466 tok/s | |
| 2025-11-30 07:20:25,309 - INFO - Epoch 1 Step 2500 (Global: 2500): loss=1.7238, ppl=5.61, grad_norm=1.53, lr=9.41e-06, throughput=2466 tok/s | |
| 2025-11-30 07:23:41,348 - INFO - Epoch 1 Step 2510 (Global: 2510): loss=1.7001, ppl=5.47, grad_norm=2.02, lr=9.41e-06, throughput=2449 tok/s | |
| 2025-11-30 07:26:55,685 - INFO - Epoch 1 Step 2520 (Global: 2520): loss=1.8295, ppl=6.23, grad_norm=1.74, lr=9.40e-06, throughput=2470 tok/s | |
| 2025-11-30 07:30:10,362 - INFO - Epoch 1 Step 2530 (Global: 2530): loss=1.6847, ppl=5.39, grad_norm=1.16, lr=9.39e-06, throughput=2466 tok/s | |
| 2025-11-30 07:33:24,332 - INFO - Epoch 1 Step 2540 (Global: 2540): loss=1.8410, ppl=6.30, grad_norm=2.20, lr=9.38e-06, throughput=2475 tok/s | |
| 2025-11-30 07:36:37,922 - INFO - Epoch 1 Step 2550 (Global: 2550): loss=1.9096, ppl=6.75, grad_norm=3.33, lr=9.37e-06, throughput=2479 tok/s | |
| 2025-11-30 07:39:52,432 - INFO - Epoch 1 Step 2560 (Global: 2560): loss=1.6514, ppl=5.21, grad_norm=1.58, lr=9.37e-06, throughput=2468 tok/s | |
| 2025-11-30 07:43:06,608 - INFO - Epoch 1 Step 2570 (Global: 2570): loss=1.6908, ppl=5.42, grad_norm=1.59, lr=9.36e-06, throughput=2472 tok/s | |
| 2025-11-30 07:46:22,014 - INFO - Epoch 1 Step 2580 (Global: 2580): loss=1.7972, ppl=6.03, grad_norm=1.51, lr=9.35e-06, throughput=2456 tok/s | |
| 2025-11-30 07:49:37,253 - INFO - Epoch 1 Step 2590 (Global: 2590): loss=1.7236, ppl=5.60, grad_norm=2.36, lr=9.34e-06, throughput=2459 tok/s | |
| 2025-11-30 07:52:51,255 - INFO - Epoch 1 Step 2600 (Global: 2600): loss=1.7932, ppl=6.01, grad_norm=1.70, lr=9.33e-06, throughput=2474 tok/s | |
| 2025-11-30 07:56:13,489 - INFO - Epoch 1 Step 2610 (Global: 2610): loss=1.6732, ppl=5.33, grad_norm=1.50, lr=9.32e-06, throughput=2373 tok/s | |
| 2025-11-30 07:59:38,108 - INFO - Epoch 1 Step 2620 (Global: 2620): loss=1.7797, ppl=5.93, grad_norm=1.48, lr=9.32e-06, throughput=2346 tok/s | |
| 2025-11-30 08:03:02,811 - INFO - Epoch 1 Step 2630 (Global: 2630): loss=1.4352, ppl=4.20, grad_norm=1.28, lr=9.31e-06, throughput=2345 tok/s | |
| 2025-11-30 08:06:26,519 - INFO - Epoch 1 Step 2640 (Global: 2640): loss=1.8791, ppl=6.55, grad_norm=2.00, lr=9.30e-06, throughput=2356 tok/s | |
| 2025-11-30 08:09:51,007 - INFO - Epoch 1 Step 2650 (Global: 2650): loss=1.9281, ppl=6.88, grad_norm=1.54, lr=9.29e-06, throughput=2347 tok/s | |
| 2025-11-30 08:13:15,168 - INFO - Epoch 1 Step 2660 (Global: 2660): loss=1.5310, ppl=4.62, grad_norm=1.61, lr=9.28e-06, throughput=2351 tok/s | |
| 2025-11-30 08:16:41,291 - INFO - Epoch 1 Step 2670 (Global: 2670): loss=1.7084, ppl=5.52, grad_norm=1.60, lr=9.27e-06, throughput=2329 tok/s | |
| 2025-11-30 08:20:07,672 - INFO - Epoch 1 Step 2680 (Global: 2680): loss=1.9283, ppl=6.88, grad_norm=2.58, lr=9.26e-06, throughput=2326 tok/s | |
| 2025-11-30 08:23:35,150 - INFO - Epoch 1 Step 2690 (Global: 2690): loss=1.6566, ppl=5.24, grad_norm=1.23, lr=9.26e-06, throughput=2314 tok/s | |
| 2025-11-30 08:26:59,412 - INFO - Epoch 1 Step 2700 (Global: 2700): loss=1.8088, ppl=6.10, grad_norm=1.23, lr=9.25e-06, throughput=2350 tok/s | |
| 2025-11-30 08:30:24,105 - INFO - Epoch 1 Step 2710 (Global: 2710): loss=1.8665, ppl=6.47, grad_norm=1.91, lr=9.24e-06, throughput=2345 tok/s | |
| 2025-11-30 08:33:49,194 - INFO - Epoch 1 Step 2720 (Global: 2720): loss=1.8014, ppl=6.06, grad_norm=1.38, lr=9.23e-06, throughput=2340 tok/s | |
| 2025-11-30 08:37:13,295 - INFO - Epoch 1 Step 2730 (Global: 2730): loss=1.8135, ppl=6.13, grad_norm=1.29, lr=9.22e-06, throughput=2352 tok/s | |
| 2025-11-30 08:40:32,698 - INFO - Epoch 1 Step 2740 (Global: 2740): loss=1.7252, ppl=5.61, grad_norm=1.19, lr=9.21e-06, throughput=2407 tok/s | |
| 2025-11-30 08:43:58,283 - INFO - Epoch 1 Step 2750 (Global: 2750): loss=1.5970, ppl=4.94, grad_norm=1.74, lr=9.20e-06, throughput=2335 tok/s | |
| 2025-11-30 08:47:24,268 - INFO - Epoch 1 Step 2760 (Global: 2760): loss=1.7095, ppl=5.53, grad_norm=1.89, lr=9.19e-06, throughput=2330 tok/s | |
| 2025-11-30 08:50:50,271 - INFO - Epoch 1 Step 2770 (Global: 2770): loss=1.7099, ppl=5.53, grad_norm=1.32, lr=9.18e-06, throughput=2330 tok/s | |
| 2025-11-30 08:54:13,680 - INFO - Epoch 1 Step 2780 (Global: 2780): loss=1.6187, ppl=5.05, grad_norm=1.48, lr=9.17e-06, throughput=2360 tok/s | |
| 2025-11-30 08:57:38,034 - INFO - Epoch 1 Step 2790 (Global: 2790): loss=1.6265, ppl=5.09, grad_norm=1.40, lr=9.17e-06, throughput=2349 tok/s | |
| 2025-11-30 09:01:01,970 - INFO - Epoch 1 Step 2800 (Global: 2800): loss=1.7860, ppl=5.97, grad_norm=1.31, lr=9.16e-06, throughput=2354 tok/s | |
| 2025-11-30 09:04:16,339 - INFO - Epoch 1 Step 2810 (Global: 2810): loss=1.6245, ppl=5.08, grad_norm=1.92, lr=9.15e-06, throughput=2470 tok/s | |
| 2025-11-30 09:07:29,589 - INFO - Epoch 1 Step 2820 (Global: 2820): loss=1.8162, ppl=6.15, grad_norm=1.41, lr=9.14e-06, throughput=2484 tok/s | |
| 2025-11-30 09:10:44,626 - INFO - Epoch 1 Step 2830 (Global: 2830): loss=1.6665, ppl=5.29, grad_norm=1.62, lr=9.13e-06, throughput=2461 tok/s | |
| 2025-11-30 09:14:06,526 - INFO - Epoch 1 Step 2840 (Global: 2840): loss=1.8787, ppl=6.55, grad_norm=1.51, lr=9.12e-06, throughput=2377 tok/s | |
| 2025-11-30 09:17:28,948 - INFO - Epoch 1 Step 2850 (Global: 2850): loss=1.6063, ppl=4.98, grad_norm=2.11, lr=9.11e-06, throughput=2371 tok/s | |
| 2025-11-30 09:20:45,876 - INFO - Epoch 1 Step 2860 (Global: 2860): loss=1.7986, ppl=6.04, grad_norm=1.28, lr=9.10e-06, throughput=2437 tok/s | |
| 2025-11-30 09:24:06,768 - INFO - Epoch 1 Step 2870 (Global: 2870): loss=1.7140, ppl=5.55, grad_norm=1.48, lr=9.09e-06, throughput=2389 tok/s | |
| 2025-11-30 09:27:29,046 - INFO - Epoch 1 Step 2880 (Global: 2880): loss=1.6697, ppl=5.31, grad_norm=1.83, lr=9.08e-06, throughput=2373 tok/s | |
| 2025-11-30 09:30:50,838 - INFO - Epoch 1 Step 2890 (Global: 2890): loss=1.8233, ppl=6.19, grad_norm=1.59, lr=9.07e-06, throughput=2379 tok/s | |
| 2025-11-30 09:34:10,489 - INFO - Epoch 1 Step 2900 (Global: 2900): loss=1.7912, ppl=6.00, grad_norm=1.34, lr=9.06e-06, throughput=2404 tok/s | |
| 2025-11-30 09:37:32,516 - INFO - Epoch 1 Step 2910 (Global: 2910): loss=1.6956, ppl=5.45, grad_norm=1.41, lr=9.05e-06, throughput=2376 tok/s | |
| 2025-11-30 09:40:54,701 - INFO - Epoch 1 Step 2920 (Global: 2920): loss=1.6776, ppl=5.35, grad_norm=1.66, lr=9.04e-06, throughput=2374 tok/s | |
| 2025-11-30 09:44:15,244 - INFO - Epoch 1 Step 2930 (Global: 2930): loss=1.6974, ppl=5.46, grad_norm=1.55, lr=9.03e-06, throughput=2394 tok/s | |
| 2025-11-30 09:47:35,856 - INFO - Epoch 1 Step 2940 (Global: 2940): loss=1.6881, ppl=5.41, grad_norm=1.34, lr=9.02e-06, throughput=2393 tok/s | |
| 2025-11-30 09:50:54,127 - INFO - Epoch 1 Step 2950 (Global: 2950): loss=1.7985, ppl=6.04, grad_norm=1.47, lr=9.01e-06, throughput=2421 tok/s | |
| 2025-11-30 09:54:08,339 - INFO - Epoch 1 Step 2960 (Global: 2960): loss=1.6480, ppl=5.20, grad_norm=1.18, lr=9.00e-06, throughput=2472 tok/s | |
| 2025-11-30 09:57:22,398 - INFO - Epoch 1 Step 2970 (Global: 2970): loss=1.8078, ppl=6.10, grad_norm=1.43, lr=8.99e-06, throughput=2474 tok/s | |
| 2025-11-30 10:00:35,931 - INFO - Epoch 1 Step 2980 (Global: 2980): loss=1.6513, ppl=5.21, grad_norm=1.23, lr=8.98e-06, throughput=2480 tok/s | |
| 2025-11-30 10:03:50,395 - INFO - Epoch 1 Step 2990 (Global: 2990): loss=1.7329, ppl=5.66, grad_norm=1.52, lr=8.97e-06, throughput=2468 tok/s | |
| 2025-11-30 10:07:04,674 - INFO - Epoch 1 Step 3000 (Global: 3000): loss=1.5825, ppl=4.87, grad_norm=1.62, lr=8.96e-06, throughput=2471 tok/s | |
| 2025-11-30 10:10:19,387 - INFO - Epoch 1 Step 3010 (Global: 3010): loss=1.6771, ppl=5.35, grad_norm=2.50, lr=8.95e-06, throughput=2465 tok/s | |
| 2025-11-30 10:13:34,265 - INFO - Epoch 1 Step 3020 (Global: 3020): loss=1.8577, ppl=6.41, grad_norm=1.93, lr=8.94e-06, throughput=2463 tok/s | |
| 2025-11-30 10:16:48,005 - INFO - Epoch 1 Step 3030 (Global: 3030): loss=1.8153, ppl=6.14, grad_norm=1.70, lr=8.93e-06, throughput=2478 tok/s | |
| 2025-11-30 10:20:04,841 - INFO - Epoch 1 Step 3040 (Global: 3040): loss=1.5882, ppl=4.89, grad_norm=3.03, lr=8.92e-06, throughput=2439 tok/s | |
| 2025-11-30 10:23:21,298 - INFO - Epoch 1 Step 3050 (Global: 3050): loss=1.7574, ppl=5.80, grad_norm=2.39, lr=8.91e-06, throughput=2443 tok/s | |
| 2025-11-30 10:26:40,598 - INFO - Epoch 1 Step 3060 (Global: 3060): loss=1.7843, ppl=5.96, grad_norm=2.84, lr=8.90e-06, throughput=2408 tok/s | |
| 2025-11-30 10:30:01,510 - INFO - Epoch 1 Step 3070 (Global: 3070): loss=1.7588, ppl=5.81, grad_norm=1.80, lr=8.89e-06, throughput=2389 tok/s | |
| 2025-11-30 10:33:25,963 - INFO - Epoch 1 Step 3080 (Global: 3080): loss=1.6130, ppl=5.02, grad_norm=1.47, lr=8.88e-06, throughput=2348 tok/s | |
| 2025-11-30 10:36:47,362 - INFO - Epoch 1 Step 3090 (Global: 3090): loss=1.3802, ppl=3.98, grad_norm=1.57, lr=8.87e-06, throughput=2383 tok/s | |
| 2025-11-30 10:40:11,675 - INFO - Epoch 1 Step 3100 (Global: 3100): loss=1.6574, ppl=5.25, grad_norm=1.41, lr=8.86e-06, throughput=2349 tok/s | |
| 2025-11-30 10:43:32,407 - INFO - Epoch 1 Step 3110 (Global: 3110): loss=1.5625, ppl=4.77, grad_norm=1.24, lr=8.85e-06, throughput=2391 tok/s | |
| 2025-11-30 10:46:52,494 - INFO - Epoch 1 Step 3120 (Global: 3120): loss=1.7834, ppl=5.95, grad_norm=1.14, lr=8.84e-06, throughput=2399 tok/s | |
| 2025-11-30 10:50:15,771 - INFO - Epoch 1 Step 3130 (Global: 3130): loss=1.8128, ppl=6.13, grad_norm=1.43, lr=8.82e-06, throughput=2361 tok/s | |
| 2025-11-30 10:53:35,223 - INFO - Epoch 1 Step 3140 (Global: 3140): loss=1.7177, ppl=5.57, grad_norm=1.22, lr=8.81e-06, throughput=2407 tok/s | |
| 2025-11-30 10:56:55,028 - INFO - Epoch 1 Step 3150 (Global: 3150): loss=1.5238, ppl=4.59, grad_norm=1.27, lr=8.80e-06, throughput=2402 tok/s | |
| 2025-11-30 11:00:12,250 - INFO - Epoch 1 Step 3160 (Global: 3160): loss=1.5867, ppl=4.89, grad_norm=1.24, lr=8.79e-06, throughput=2434 tok/s | |
| 2025-11-30 11:03:30,939 - INFO - Epoch 1 Step 3170 (Global: 3170): loss=1.8491, ppl=6.35, grad_norm=1.32, lr=8.78e-06, throughput=2416 tok/s | |
| 2025-11-30 11:06:48,637 - INFO - Epoch 1 Step 3180 (Global: 3180): loss=1.9155, ppl=6.79, grad_norm=1.73, lr=8.77e-06, throughput=2428 tok/s | |
| 2025-11-30 11:10:08,653 - INFO - Epoch 1 Step 3190 (Global: 3190): loss=1.7998, ppl=6.05, grad_norm=2.62, lr=8.76e-06, throughput=2400 tok/s | |
| 2025-11-30 11:13:30,750 - INFO - Epoch 1 Step 3200 (Global: 3200): loss=1.6156, ppl=5.03, grad_norm=1.26, lr=8.75e-06, throughput=2375 tok/s | |
| 2025-11-30 11:16:49,286 - INFO - Epoch 1 Step 3210 (Global: 3210): loss=1.8901, ppl=6.62, grad_norm=1.34, lr=8.74e-06, throughput=2418 tok/s | |
| 2025-11-30 11:20:09,008 - INFO - Epoch 1 Step 3220 (Global: 3220): loss=1.7328, ppl=5.66, grad_norm=1.45, lr=8.73e-06, throughput=2403 tok/s | |
| 2025-11-30 11:23:28,693 - INFO - Epoch 1 Step 3230 (Global: 3230): loss=1.4371, ppl=4.21, grad_norm=1.62, lr=8.71e-06, throughput=2404 tok/s | |
| 2025-11-30 11:26:50,043 - INFO - Epoch 1 Step 3240 (Global: 3240): loss=1.8445, ppl=6.33, grad_norm=1.84, lr=8.70e-06, throughput=2384 tok/s | |
| 2025-11-30 11:30:14,972 - INFO - Epoch 1 Step 3250 (Global: 3250): loss=1.6049, ppl=4.98, grad_norm=1.20, lr=8.69e-06, throughput=2342 tok/s | |
| 2025-11-30 11:33:38,938 - INFO - Epoch 1 Step 3260 (Global: 3260): loss=1.6173, ppl=5.04, grad_norm=1.61, lr=8.68e-06, throughput=2353 tok/s | |
| 2025-11-30 11:36:58,707 - INFO - Epoch 1 Step 3270 (Global: 3270): loss=1.9331, ppl=6.91, grad_norm=1.42, lr=8.67e-06, throughput=2403 tok/s | |
| 2025-11-30 11:40:16,336 - INFO - Epoch 1 Step 3280 (Global: 3280): loss=1.8896, ppl=6.62, grad_norm=2.12, lr=8.66e-06, throughput=2429 tok/s | |
| 2025-11-30 11:43:33,269 - INFO - Epoch 1 Step 3290 (Global: 3290): loss=1.9655, ppl=7.14, grad_norm=2.17, lr=8.65e-06, throughput=2437 tok/s | |
| 2025-11-30 11:46:50,587 - INFO - Epoch 1 Step 3300 (Global: 3300): loss=2.1192, ppl=8.32, grad_norm=1.48, lr=8.63e-06, throughput=2433 tok/s | |
| 2025-11-30 11:50:13,357 - INFO - Epoch 1 Step 3310 (Global: 3310): loss=1.6930, ppl=5.44, grad_norm=1.59, lr=8.62e-06, throughput=2367 tok/s | |
| 2025-11-30 11:53:32,224 - INFO - Epoch 1 Step 3320 (Global: 3320): loss=1.7162, ppl=5.56, grad_norm=2.56, lr=8.61e-06, throughput=2414 tok/s | |
| 2025-11-30 11:56:52,010 - INFO - Epoch 1 Step 3330 (Global: 3330): loss=1.3997, ppl=4.05, grad_norm=1.66, lr=8.60e-06, throughput=2403 tok/s | |
| 2025-11-30 12:00:10,648 - INFO - Epoch 1 Step 3340 (Global: 3340): loss=2.1275, ppl=8.39, grad_norm=1.34, lr=8.59e-06, throughput=2416 tok/s | |
| 2025-11-30 12:03:27,243 - INFO - Epoch 1 Step 3350 (Global: 3350): loss=1.9013, ppl=6.69, grad_norm=1.46, lr=8.58e-06, throughput=2442 tok/s | |
| 2025-11-30 12:06:42,601 - INFO - Epoch 1 Step 3360 (Global: 3360): loss=1.5843, ppl=4.88, grad_norm=1.47, lr=8.57e-06, throughput=2457 tok/s | |
| 2025-11-30 12:09:57,793 - INFO - Epoch 1 Step 3370 (Global: 3370): loss=1.7385, ppl=5.69, grad_norm=1.41, lr=8.55e-06, throughput=2459 tok/s | |
| 2025-11-30 12:13:12,218 - INFO - Epoch 1 Step 3380 (Global: 3380): loss=1.7035, ppl=5.49, grad_norm=2.14, lr=8.54e-06, throughput=2469 tok/s | |
| 2025-11-30 12:16:28,077 - INFO - Epoch 1 Step 3390 (Global: 3390): loss=1.8487, ppl=6.35, grad_norm=1.92, lr=8.53e-06, throughput=2451 tok/s | |
| 2025-11-30 12:19:42,604 - INFO - Epoch 1 Step 3400 (Global: 3400): loss=1.5037, ppl=4.50, grad_norm=2.36, lr=8.52e-06, throughput=2468 tok/s | |
| 2025-11-30 12:22:57,937 - INFO - Epoch 1 Step 3410 (Global: 3410): loss=1.7774, ppl=5.91, grad_norm=1.14, lr=8.51e-06, throughput=2457 tok/s | |
| 2025-11-30 12:26:13,356 - INFO - Epoch 1 Step 3420 (Global: 3420): loss=1.8538, ppl=6.38, grad_norm=1.97, lr=8.49e-06, throughput=2456 tok/s | |
| 2025-11-30 12:29:28,895 - INFO - Epoch 1 Step 3430 (Global: 3430): loss=1.5249, ppl=4.59, grad_norm=1.46, lr=8.48e-06, throughput=2455 tok/s | |
| 2025-11-30 12:32:45,844 - INFO - Epoch 1 Step 3440 (Global: 3440): loss=1.6358, ppl=5.13, grad_norm=1.84, lr=8.47e-06, throughput=2437 tok/s | |
| 2025-11-30 12:36:01,689 - INFO - Epoch 1 Step 3450 (Global: 3450): loss=1.6329, ppl=5.12, grad_norm=1.29, lr=8.46e-06, throughput=2451 tok/s | |
| 2025-11-30 12:39:17,033 - INFO - Epoch 1 Step 3460 (Global: 3460): loss=1.8464, ppl=6.34, grad_norm=1.54, lr=8.45e-06, throughput=2457 tok/s | |
| 2025-11-30 12:42:34,137 - INFO - Epoch 1 Step 3470 (Global: 3470): loss=1.7598, ppl=5.81, grad_norm=1.88, lr=8.43e-06, throughput=2435 tok/s | |
| 2025-11-30 12:45:50,896 - INFO - Epoch 1 Step 3480 (Global: 3480): loss=1.5873, ppl=4.89, grad_norm=1.73, lr=8.42e-06, throughput=2440 tok/s | |
| 2025-11-30 12:49:08,850 - INFO - Epoch 1 Step 3490 (Global: 3490): loss=1.7584, ppl=5.80, grad_norm=1.38, lr=8.41e-06, throughput=2425 tok/s | |
| 2025-11-30 12:52:25,021 - INFO - Epoch 1 Step 3500 (Global: 3500): loss=1.9355, ppl=6.93, grad_norm=1.20, lr=8.40e-06, throughput=2447 tok/s | |
| 2025-11-30 12:55:39,641 - INFO - Epoch 1 Step 3510 (Global: 3510): loss=1.8492, ppl=6.35, grad_norm=1.74, lr=8.38e-06, throughput=2466 tok/s | |
| 2025-11-30 12:58:55,289 - INFO - Epoch 1 Step 3520 (Global: 3520): loss=1.8529, ppl=6.38, grad_norm=1.88, lr=8.37e-06, throughput=2453 tok/s | |
| 2025-11-30 13:02:11,954 - INFO - Epoch 1 Step 3530 (Global: 3530): loss=1.6198, ppl=5.05, grad_norm=1.37, lr=8.36e-06, throughput=2441 tok/s | |
| 2025-11-30 13:05:27,390 - INFO - Epoch 1 Step 3540 (Global: 3540): loss=1.7140, ppl=5.55, grad_norm=2.08, lr=8.35e-06, throughput=2456 tok/s | |
| 2025-11-30 13:08:41,402 - INFO - Epoch 1 Step 3550 (Global: 3550): loss=1.8852, ppl=6.59, grad_norm=1.98, lr=8.33e-06, throughput=2474 tok/s | |
| 2025-11-30 13:11:58,724 - INFO - Epoch 1 Step 3560 (Global: 3560): loss=1.9218, ppl=6.83, grad_norm=1.17, lr=8.32e-06, throughput=2433 tok/s | |
| 2025-11-30 13:15:18,520 - INFO - Epoch 1 Step 3570 (Global: 3570): loss=1.5662, ppl=4.79, grad_norm=1.55, lr=8.31e-06, throughput=2402 tok/s | |
| 2025-11-30 13:18:36,194 - INFO - Epoch 1 Step 3580 (Global: 3580): loss=1.8803, ppl=6.56, grad_norm=1.79, lr=8.30e-06, throughput=2428 tok/s | |
| 2025-11-30 13:21:53,545 - INFO - Epoch 1 Step 3590 (Global: 3590): loss=1.6153, ppl=5.03, grad_norm=2.23, lr=8.28e-06, throughput=2432 tok/s | |
| 2025-11-30 13:25:10,878 - INFO - Epoch 1 Step 3600 (Global: 3600): loss=1.5010, ppl=4.49, grad_norm=1.02, lr=8.27e-06, throughput=2432 tok/s | |
| 2025-11-30 13:28:28,945 - INFO - Epoch 1 Step 3610 (Global: 3610): loss=1.6819, ppl=5.38, grad_norm=1.91, lr=8.26e-06, throughput=2423 tok/s | |
| 2025-11-30 13:31:48,191 - INFO - Epoch 1 Step 3620 (Global: 3620): loss=1.5301, ppl=4.62, grad_norm=1.45, lr=8.25e-06, throughput=2409 tok/s | |
| 2025-11-30 13:35:07,555 - INFO - Epoch 1 Step 3630 (Global: 3630): loss=1.3520, ppl=3.87, grad_norm=1.60, lr=8.23e-06, throughput=2408 tok/s | |
| 2025-11-30 13:38:24,942 - INFO - Epoch 1 Step 3640 (Global: 3640): loss=1.5008, ppl=4.49, grad_norm=1.48, lr=8.22e-06, throughput=2432 tok/s | |
| 2025-11-30 13:41:41,774 - INFO - Epoch 1 Step 3650 (Global: 3650): loss=1.8512, ppl=6.37, grad_norm=1.21, lr=8.21e-06, throughput=2439 tok/s | |
| 2025-11-30 13:44:58,599 - INFO - Epoch 1 Step 3660 (Global: 3660): loss=1.7480, ppl=5.74, grad_norm=1.90, lr=8.20e-06, throughput=2439 tok/s | |
| 2025-11-30 13:48:16,370 - INFO - Epoch 1 Step 3670 (Global: 3670): loss=1.6294, ppl=5.10, grad_norm=1.48, lr=8.18e-06, throughput=2427 tok/s | |
| 2025-11-30 13:51:32,746 - INFO - Epoch 1 Step 3680 (Global: 3680): loss=1.2802, ppl=3.60, grad_norm=1.55, lr=8.17e-06, throughput=2444 tok/s | |
| 2025-11-30 13:54:49,497 - INFO - Epoch 1 Step 3690 (Global: 3690): loss=1.6859, ppl=5.40, grad_norm=1.41, lr=8.16e-06, throughput=2440 tok/s | |
| 2025-11-30 13:58:06,307 - INFO - Epoch 1 Step 3700 (Global: 3700): loss=1.6187, ppl=5.05, grad_norm=2.62, lr=8.14e-06, throughput=2439 tok/s | |
| 2025-11-30 14:01:22,013 - INFO - Epoch 1 Step 3710 (Global: 3710): loss=1.4438, ppl=4.24, grad_norm=2.17, lr=8.13e-06, throughput=2453 tok/s | |
| 2025-11-30 14:04:40,886 - INFO - Epoch 1 Step 3720 (Global: 3720): loss=1.5870, ppl=4.89, grad_norm=2.52, lr=8.12e-06, throughput=2414 tok/s | |
| 2025-11-30 14:07:59,264 - INFO - Epoch 1 Step 3730 (Global: 3730): loss=1.8718, ppl=6.50, grad_norm=1.46, lr=8.10e-06, throughput=2420 tok/s | |
| 2025-11-30 14:11:17,239 - INFO - Epoch 1 Step 3740 (Global: 3740): loss=1.8201, ppl=6.17, grad_norm=1.62, lr=8.09e-06, throughput=2425 tok/s | |
| 2025-11-30 14:14:36,538 - INFO - Epoch 1 Step 3750 (Global: 3750): loss=1.5511, ppl=4.72, grad_norm=1.27, lr=8.08e-06, throughput=2408 tok/s | |
| 2025-11-30 14:17:55,617 - INFO - Epoch 1 Step 3760 (Global: 3760): loss=1.7049, ppl=5.50, grad_norm=1.17, lr=8.06e-06, throughput=2411 tok/s | |
| 2025-11-30 14:21:13,138 - INFO - Epoch 1 Step 3770 (Global: 3770): loss=1.5745, ppl=4.83, grad_norm=1.20, lr=8.05e-06, throughput=2430 tok/s | |
| 2025-11-30 14:24:31,216 - INFO - Epoch 1 Step 3780 (Global: 3780): loss=1.7722, ppl=5.88, grad_norm=2.23, lr=8.04e-06, throughput=2423 tok/s | |
| 2025-11-30 14:27:47,417 - INFO - Epoch 1 Step 3790 (Global: 3790): loss=1.4312, ppl=4.18, grad_norm=1.66, lr=8.02e-06, throughput=2446 tok/s | |
| 2025-11-30 14:31:04,866 - INFO - Epoch 1 Step 3800 (Global: 3800): loss=1.8111, ppl=6.12, grad_norm=1.41, lr=8.01e-06, throughput=2431 tok/s | |
| 2025-11-30 14:34:22,190 - INFO - Epoch 1 Step 3810 (Global: 3810): loss=1.7404, ppl=5.70, grad_norm=1.48, lr=8.00e-06, throughput=2433 tok/s | |
| 2025-11-30 14:37:37,967 - INFO - Epoch 1 Step 3820 (Global: 3820): loss=1.7358, ppl=5.67, grad_norm=1.71, lr=7.98e-06, throughput=2452 tok/s | |
| 2025-11-30 14:40:53,974 - INFO - Epoch 1 Step 3830 (Global: 3830): loss=1.8289, ppl=6.23, grad_norm=1.41, lr=7.97e-06, throughput=2449 tok/s | |
| 2025-11-30 14:44:08,791 - INFO - Epoch 1 Step 3840 (Global: 3840): loss=1.6815, ppl=5.37, grad_norm=1.52, lr=7.96e-06, throughput=2464 tok/s | |
| 2025-11-30 14:47:23,975 - INFO - Epoch 1 Step 3850 (Global: 3850): loss=1.7579, ppl=5.80, grad_norm=1.88, lr=7.94e-06, throughput=2459 tok/s | |
| 2025-11-30 14:50:39,016 - INFO - Epoch 1 Step 3860 (Global: 3860): loss=1.4883, ppl=4.43, grad_norm=1.46, lr=7.93e-06, throughput=2461 tok/s | |
| 2025-11-30 14:53:53,247 - INFO - Epoch 1 Step 3870 (Global: 3870): loss=1.6564, ppl=5.24, grad_norm=1.38, lr=7.92e-06, throughput=2471 tok/s | |
| 2025-11-30 14:57:07,991 - INFO - Epoch 1 Step 3880 (Global: 3880): loss=1.4742, ppl=4.37, grad_norm=2.33, lr=7.90e-06, throughput=2465 tok/s | |
| 2025-11-30 15:00:23,330 - INFO - Epoch 1 Step 3890 (Global: 3890): loss=1.8121, ppl=6.12, grad_norm=4.31, lr=7.89e-06, throughput=2457 tok/s | |
| 2025-11-30 15:03:38,865 - INFO - Epoch 1 Step 3900 (Global: 3900): loss=1.5546, ppl=4.73, grad_norm=2.12, lr=7.88e-06, throughput=2455 tok/s | |
| 2025-11-30 15:06:54,353 - INFO - Epoch 1 Step 3910 (Global: 3910): loss=1.7023, ppl=5.49, grad_norm=1.75, lr=7.86e-06, throughput=2455 tok/s | |
| 2025-11-30 15:10:11,261 - INFO - Epoch 1 Step 3920 (Global: 3920): loss=1.6156, ppl=5.03, grad_norm=2.22, lr=7.85e-06, throughput=2438 tok/s | |
| 2025-11-30 15:13:29,275 - INFO - Epoch 1 Step 3930 (Global: 3930): loss=1.6811, ppl=5.37, grad_norm=1.28, lr=7.83e-06, throughput=2424 tok/s | |
| 2025-11-30 15:16:46,820 - INFO - Epoch 1 Step 3940 (Global: 3940): loss=1.6818, ppl=5.37, grad_norm=1.47, lr=7.82e-06, throughput=2430 tok/s | |
| 2025-11-30 15:20:07,457 - INFO - Epoch 1 Step 3950 (Global: 3950): loss=1.8988, ppl=6.68, grad_norm=1.77, lr=7.81e-06, throughput=2392 tok/s | |
| 2025-11-30 15:23:27,922 - INFO - Epoch 1 Step 3960 (Global: 3960): loss=1.9106, ppl=6.76, grad_norm=1.33, lr=7.79e-06, throughput=2394 tok/s | |
| 2025-11-30 15:26:49,396 - INFO - Epoch 1 Step 3970 (Global: 3970): loss=1.6992, ppl=5.47, grad_norm=1.07, lr=7.78e-06, throughput=2382 tok/s | |
| 2025-11-30 15:30:09,930 - INFO - Epoch 1 Step 3980 (Global: 3980): loss=1.7925, ppl=6.00, grad_norm=1.49, lr=7.77e-06, throughput=2394 tok/s | |
| 2025-11-30 15:33:32,080 - INFO - Epoch 1 Step 3990 (Global: 3990): loss=1.6309, ppl=5.11, grad_norm=1.85, lr=7.75e-06, throughput=2374 tok/s | |
| 2025-11-30 15:36:52,260 - INFO - Epoch 1 Step 4000 (Global: 4000): loss=1.5731, ppl=4.82, grad_norm=1.21, lr=7.74e-06, throughput=2398 tok/s | |
| 2025-11-30 15:36:52,261 - INFO - | |
| Running validation at step 4000... | |
| 2025-11-30 15:48:16,085 - INFO - Validation loss: 1.7207, perplexity: 5.59 | |
| 2025-11-30 15:48:16,085 - INFO - | |
| ====================================================================== | |
| 2025-11-30 15:48:16,086 - INFO - Qualitative Evaluation Samples: | |
| 2025-11-30 15:48:16,086 - INFO - ====================================================================== | |
| 2025-11-30 15:48:16,086 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-11-30 15:48:16,086 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-11-30 15:48:16,086 - INFO - Generated: ' to the band\'s previous album, The A.V. Club, and said that it was "a more mature, more complex, and more interesting record" than the band\'s previous album. He said that the album was "a more mature,...' | |
| 2025-11-30 15:48:16,087 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' | |
| 2025-11-30 15:48:16,087 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-30 15:48:16,087 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-11-30 15:48:16,087 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-11-30 15:48:16,088 - INFO - Generated: 'aternity, as the Order of Angel is a fraternal organization. The Order of Angel is a fraternal organization, and the Order of Angel is a fraternal organization. The Order of Angel is a fraternal organ...' | |
| 2025-11-30 15:48:16,088 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' | |
| 2025-11-30 15:48:16,088 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-30 15:48:16,088 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-11-30 15:48:16,088 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-11-30 15:48:16,088 - INFO - Generated: " be defeated by Oga and Miki. Oga and Miki then go on to become the strongest fighters in the Red Tails, and are later defeated by Teimou's shadow group, the Six Knights, who are led by the shadow of ..." | |
| 2025-11-30 15:48:16,089 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." | |
| 2025-11-30 15:48:16,089 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-30 15:48:16,089 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-11-30 15:48:16,089 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-11-30 15:48:16,089 - INFO - Generated: ' | 0x00 | 0x00 | 0x00 | U+0B01..0B03, 0B05..0B0C, 0B07..0B10, 0B13..0B28, 0B2A..0B30, 0B32..0B33, 0B36..0B39, 0B3C..0B43, 0B47..0B48, 0B4B..0B4D, 0B57, 0B5C..0B5D, 0B5F..0B61...' | |
| 2025-11-30 15:48:16,090 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' | |
| 2025-11-30 15:48:16,090 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-30 15:48:16,090 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-11-30 15:48:16,090 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-11-30 15:48:16,090 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 | August 30, 2011 | PlayStation 4 | EA Tiburon ...' | |
| 2025-11-30 15:48:16,090 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' | |
| 2025-11-30 15:48:16,091 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-30 15:48:16,091 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/qualitative_step_4000.jsonl | |
| 2025-11-30 15:50:25,324 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/best_checkpoint.pt | |
| 2025-11-30 15:50:25,365 - INFO - New best validation loss: 1.7207, perplexity: 5.59 | |
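The `ppl` values in these log lines are the exponential of the reported cross-entropy loss (in nats), as the step-4000 validation pair above shows (loss 1.7207 → perplexity 5.59). A minimal sketch of that relation, assuming the logger simply applies `exp` to the mean loss:

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy loss (nats)."""
    return math.exp(loss)

# Values taken from the log lines above.
print(round(perplexity(1.7207), 2))  # 5.59  (step-4000 validation)
print(round(perplexity(1.7843), 2))  # 5.96  (step-3060 training line)
```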
| 2025-11-30 15:53:43,802 - INFO - Epoch 1 Step 4010 (Global: 4010): loss=1.6228, ppl=5.07, grad_norm=1.44, lr=7.72e-06, throughput=2419 tok/s | |
| 2025-11-30 15:57:01,871 - INFO - Epoch 1 Step 4020 (Global: 4020): loss=1.9098, ppl=6.75, grad_norm=1.63, lr=7.71e-06, throughput=2423 tok/s | |
| 2025-11-30 16:00:20,650 - INFO - Epoch 1 Step 4030 (Global: 4030): loss=1.5825, ppl=4.87, grad_norm=1.23, lr=7.70e-06, throughput=2415 tok/s | |
| 2025-11-30 16:03:39,333 - INFO - Epoch 1 Step 4040 (Global: 4040): loss=1.7322, ppl=5.65, grad_norm=2.39, lr=7.68e-06, throughput=2416 tok/s | |
| 2025-11-30 16:06:55,485 - INFO - Epoch 1 Step 4050 (Global: 4050): loss=1.5071, ppl=4.51, grad_norm=1.56, lr=7.67e-06, throughput=2447 tok/s | |
| 2025-11-30 16:10:12,564 - INFO - Epoch 1 Step 4060 (Global: 4060): loss=1.8305, ppl=6.24, grad_norm=1.45, lr=7.65e-06, throughput=2436 tok/s | |
| 2025-11-30 16:13:29,054 - INFO - Epoch 1 Step 4070 (Global: 4070): loss=1.5693, ppl=4.80, grad_norm=2.05, lr=7.64e-06, throughput=2443 tok/s | |
| 2025-11-30 16:16:45,976 - INFO - Epoch 1 Step 4080 (Global: 4080): loss=1.7724, ppl=5.88, grad_norm=1.53, lr=7.62e-06, throughput=2438 tok/s | |
| 2025-11-30 16:20:03,391 - INFO - Epoch 1 Step 4090 (Global: 4090): loss=1.7391, ppl=5.69, grad_norm=1.86, lr=7.61e-06, throughput=2431 tok/s | |
| 2025-11-30 16:23:21,234 - INFO - Epoch 1 Step 4100 (Global: 4100): loss=1.6796, ppl=5.36, grad_norm=1.41, lr=7.60e-06, throughput=2426 tok/s | |
| 2025-11-30 16:26:38,993 - INFO - Epoch 1 Step 4110 (Global: 4110): loss=1.5941, ppl=4.92, grad_norm=1.05, lr=7.58e-06, throughput=2427 tok/s | |
| 2025-11-30 16:29:56,148 - INFO - Epoch 1 Step 4120 (Global: 4120): loss=1.8876, ppl=6.60, grad_norm=1.55, lr=7.57e-06, throughput=2435 tok/s | |
| 2025-11-30 16:33:13,223 - INFO - Epoch 1 Step 4130 (Global: 4130): loss=1.8152, ppl=6.14, grad_norm=1.35, lr=7.55e-06, throughput=2436 tok/s | |
| 2025-11-30 16:36:30,638 - INFO - Epoch 1 Step 4140 (Global: 4140): loss=1.7266, ppl=5.62, grad_norm=1.16, lr=7.54e-06, throughput=2431 tok/s | |
| 2025-11-30 16:39:48,219 - INFO - Epoch 1 Step 4150 (Global: 4150): loss=1.7379, ppl=5.69, grad_norm=1.50, lr=7.52e-06, throughput=2429 tok/s | |
| 2025-11-30 16:43:03,725 - INFO - Epoch 1 Step 4160 (Global: 4160): loss=1.5015, ppl=4.49, grad_norm=1.33, lr=7.51e-06, throughput=2455 tok/s | |
| 2025-11-30 16:46:20,311 - INFO - Epoch 1 Step 4170 (Global: 4170): loss=1.8840, ppl=6.58, grad_norm=1.34, lr=7.49e-06, throughput=2442 tok/s | |
| 2025-11-30 16:49:36,678 - INFO - Epoch 1 Step 4180 (Global: 4180): loss=1.7856, ppl=5.96, grad_norm=1.47, lr=7.48e-06, throughput=2444 tok/s | |
| 2025-11-30 16:52:52,534 - INFO - Epoch 1 Step 4190 (Global: 4190): loss=1.5090, ppl=4.52, grad_norm=1.36, lr=7.47e-06, throughput=2451 tok/s | |
| 2025-11-30 16:56:08,292 - INFO - Epoch 1 Step 4200 (Global: 4200): loss=1.6266, ppl=5.09, grad_norm=1.98, lr=7.45e-06, throughput=2452 tok/s | |
| 2025-11-30 16:59:23,552 - INFO - Epoch 1 Step 4210 (Global: 4210): loss=1.6514, ppl=5.21, grad_norm=2.08, lr=7.44e-06, throughput=2458 tok/s | |
| 2025-11-30 17:02:39,685 - INFO - Epoch 1 Step 4220 (Global: 4220): loss=1.8068, ppl=6.09, grad_norm=1.45, lr=7.42e-06, throughput=2448 tok/s | |
| 2025-11-30 17:05:56,449 - INFO - Epoch 1 Step 4230 (Global: 4230): loss=1.7355, ppl=5.67, grad_norm=2.27, lr=7.41e-06, throughput=2439 tok/s | |
| 2025-11-30 17:09:13,920 - INFO - Epoch 1 Step 4240 (Global: 4240): loss=1.7907, ppl=5.99, grad_norm=2.64, lr=7.39e-06, throughput=2431 tok/s | |
| 2025-11-30 17:12:31,038 - INFO - Epoch 1 Step 4250 (Global: 4250): loss=1.7490, ppl=5.75, grad_norm=2.66, lr=7.38e-06, throughput=2435 tok/s | |
| 2025-11-30 17:15:48,521 - INFO - Epoch 1 Step 4260 (Global: 4260): loss=1.7224, ppl=5.60, grad_norm=1.66, lr=7.36e-06, throughput=2431 tok/s | |
| 2025-11-30 17:19:07,123 - INFO - Epoch 1 Step 4270 (Global: 4270): loss=1.6473, ppl=5.19, grad_norm=1.84, lr=7.35e-06, throughput=2417 tok/s | |
| 2025-11-30 17:22:25,122 - INFO - Epoch 1 Step 4280 (Global: 4280): loss=1.5968, ppl=4.94, grad_norm=2.48, lr=7.33e-06, throughput=2424 tok/s | |
| 2025-11-30 17:25:44,120 - INFO - Epoch 1 Step 4290 (Global: 4290): loss=1.5973, ppl=4.94, grad_norm=1.26, lr=7.32e-06, throughput=2412 tok/s | |
| 2025-11-30 17:29:02,982 - INFO - Epoch 1 Step 4300 (Global: 4300): loss=1.4252, ppl=4.16, grad_norm=1.37, lr=7.30e-06, throughput=2414 tok/s | |
| 2025-11-30 17:32:23,422 - INFO - Epoch 1 Step 4310 (Global: 4310): loss=1.4065, ppl=4.08, grad_norm=1.38, lr=7.29e-06, throughput=2395 tok/s | |
| 2025-11-30 17:35:41,278 - INFO - Epoch 1 Step 4320 (Global: 4320): loss=1.7365, ppl=5.68, grad_norm=2.03, lr=7.27e-06, throughput=2426 tok/s | |
| 2025-11-30 17:38:58,862 - INFO - Epoch 1 Step 4330 (Global: 4330): loss=1.6298, ppl=5.10, grad_norm=1.22, lr=7.26e-06, throughput=2429 tok/s | |
| 2025-11-30 17:42:13,613 - INFO - Epoch 1 Step 4340 (Global: 4340): loss=1.5436, ppl=4.68, grad_norm=1.48, lr=7.24e-06, throughput=2465 tok/s | |
| 2025-11-30 17:45:30,508 - INFO - Epoch 1 Step 4350 (Global: 4350): loss=1.6504, ppl=5.21, grad_norm=1.16, lr=7.23e-06, throughput=2438 tok/s | |
| 2025-11-30 17:48:49,003 - INFO - Epoch 1 Step 4360 (Global: 4360): loss=1.4532, ppl=4.28, grad_norm=1.33, lr=7.21e-06, throughput=2418 tok/s | |
| 2025-11-30 17:52:07,388 - INFO - Epoch 1 Step 4370 (Global: 4370): loss=1.7841, ppl=5.95, grad_norm=1.11, lr=7.20e-06, throughput=2420 tok/s | |
| 2025-11-30 17:55:26,763 - INFO - Epoch 1 Step 4380 (Global: 4380): loss=1.8025, ppl=6.06, grad_norm=1.73, lr=7.18e-06, throughput=2408 tok/s | |
| 2025-11-30 17:58:44,367 - INFO - Epoch 1 Step 4390 (Global: 4390): loss=1.7768, ppl=5.91, grad_norm=2.94, lr=7.17e-06, throughput=2429 tok/s | |
| 2025-11-30 18:02:02,873 - INFO - Epoch 1 Step 4400 (Global: 4400): loss=1.5801, ppl=4.86, grad_norm=1.16, lr=7.15e-06, throughput=2418 tok/s | |
| 2025-11-30 18:05:21,820 - INFO - Epoch 1 Step 4410 (Global: 4410): loss=1.8255, ppl=6.21, grad_norm=1.36, lr=7.14e-06, throughput=2413 tok/s | |
| 2025-11-30 18:08:42,448 - INFO - Epoch 1 Step 4420 (Global: 4420): loss=1.8581, ppl=6.41, grad_norm=2.08, lr=7.12e-06, throughput=2393 tok/s | |
| 2025-11-30 18:11:59,713 - INFO - Epoch 1 Step 4430 (Global: 4430): loss=1.6683, ppl=5.30, grad_norm=1.25, lr=7.11e-06, throughput=2433 tok/s | |
| 2025-11-30 18:15:18,643 - INFO - Epoch 1 Step 4440 (Global: 4440): loss=1.9946, ppl=7.35, grad_norm=1.41, lr=7.09e-06, throughput=2413 tok/s | |
| 2025-11-30 18:18:39,691 - INFO - Epoch 1 Step 4450 (Global: 4450): loss=1.5769, ppl=4.84, grad_norm=1.73, lr=7.08e-06, throughput=2388 tok/s | |
| 2025-11-30 18:22:02,087 - INFO - Epoch 1 Step 4460 (Global: 4460): loss=1.8654, ppl=6.46, grad_norm=7.41, lr=7.06e-06, throughput=2372 tok/s | |
| 2025-11-30 18:25:23,643 - INFO - Epoch 1 Step 4470 (Global: 4470): loss=1.6603, ppl=5.26, grad_norm=2.25, lr=7.05e-06, throughput=2381 tok/s | |
| 2025-11-30 18:28:44,021 - INFO - Epoch 1 Step 4480 (Global: 4480): loss=1.8884, ppl=6.61, grad_norm=2.58, lr=7.03e-06, throughput=2395 tok/s | |
| 2025-11-30 18:32:02,491 - INFO - Epoch 1 Step 4490 (Global: 4490): loss=1.8129, ppl=6.13, grad_norm=1.32, lr=7.02e-06, throughput=2419 tok/s | |
| 2025-11-30 18:35:22,026 - INFO - Epoch 1 Step 4500 (Global: 4500): loss=1.8050, ppl=6.08, grad_norm=1.22, lr=7.00e-06, throughput=2406 tok/s | |
| 2025-11-30 18:38:40,050 - INFO - Epoch 1 Step 4510 (Global: 4510): loss=1.6610, ppl=5.26, grad_norm=1.47, lr=6.99e-06, throughput=2424 tok/s | |
| 2025-11-30 18:41:59,048 - INFO - Epoch 1 Step 4520 (Global: 4520): loss=1.6439, ppl=5.18, grad_norm=1.30, lr=6.97e-06, throughput=2412 tok/s | |
| 2025-11-30 18:45:16,252 - INFO - Epoch 1 Step 4530 (Global: 4530): loss=1.6243, ppl=5.08, grad_norm=1.55, lr=6.96e-06, throughput=2434 tok/s | |
| 2025-11-30 18:48:34,568 - INFO - Epoch 1 Step 4540 (Global: 4540): loss=1.5010, ppl=4.49, grad_norm=1.72, lr=6.94e-06, throughput=2420 tok/s | |
| 2025-11-30 18:51:52,848 - INFO - Epoch 1 Step 4550 (Global: 4550): loss=1.6771, ppl=5.35, grad_norm=1.23, lr=6.92e-06, throughput=2421 tok/s | |
| 2025-11-30 18:55:10,218 - INFO - Epoch 1 Step 4560 (Global: 4560): loss=1.6860, ppl=5.40, grad_norm=1.29, lr=6.91e-06, throughput=2432 tok/s | |
| 2025-11-30 18:58:29,449 - INFO - Epoch 1 Step 4570 (Global: 4570): loss=1.5308, ppl=4.62, grad_norm=1.23, lr=6.89e-06, throughput=2409 tok/s | |
| 2025-11-30 19:01:47,387 - INFO - Epoch 1 Step 4580 (Global: 4580): loss=1.5107, ppl=4.53, grad_norm=1.55, lr=6.88e-06, throughput=2425 tok/s | |
| 2025-11-30 19:05:05,097 - INFO - Epoch 1 Step 4590 (Global: 4590): loss=1.7332, ppl=5.66, grad_norm=1.66, lr=6.86e-06, throughput=2428 tok/s | |
| 2025-11-30 19:08:24,206 - INFO - Epoch 1 Step 4600 (Global: 4600): loss=1.7066, ppl=5.51, grad_norm=2.03, lr=6.85e-06, throughput=2411 tok/s | |
| 2025-11-30 19:11:43,168 - INFO - Epoch 1 Step 4610 (Global: 4610): loss=1.5658, ppl=4.79, grad_norm=2.12, lr=6.83e-06, throughput=2413 tok/s | |
| 2025-11-30 19:14:58,688 - INFO - Epoch 1 Step 4620 (Global: 4620): loss=1.9123, ppl=6.77, grad_norm=1.62, lr=6.82e-06, throughput=2455 tok/s | |
| 2025-11-30 19:18:14,802 - INFO - Epoch 1 Step 4630 (Global: 4630): loss=1.6624, ppl=5.27, grad_norm=1.41, lr=6.80e-06, throughput=2448 tok/s | |
| 2025-11-30 19:21:32,096 - INFO - Epoch 1 Step 4640 (Global: 4640): loss=1.8144, ppl=6.14, grad_norm=3.05, lr=6.78e-06, throughput=2433 tok/s | |
| 2025-11-30 19:24:46,990 - INFO - Epoch 1 Step 4650 (Global: 4650): loss=1.4991, ppl=4.48, grad_norm=1.39, lr=6.77e-06, throughput=2463 tok/s | |
| 2025-11-30 19:28:03,391 - INFO - Epoch 1 Step 4660 (Global: 4660): loss=1.7217, ppl=5.59, grad_norm=1.52, lr=6.75e-06, throughput=2444 tok/s | |
| 2025-11-30 19:31:18,862 - INFO - Epoch 1 Step 4670 (Global: 4670): loss=1.5655, ppl=4.79, grad_norm=2.12, lr=6.74e-06, throughput=2456 tok/s | |
| 2025-11-30 19:34:34,666 - INFO - Epoch 1 Step 4680 (Global: 4680): loss=1.4471, ppl=4.25, grad_norm=1.28, lr=6.72e-06, throughput=2451 tok/s | |
| 2025-11-30 19:37:49,813 - INFO - Epoch 1 Step 4690 (Global: 4690): loss=1.7331, ppl=5.66, grad_norm=1.61, lr=6.71e-06, throughput=2460 tok/s | |
| 2025-11-30 19:41:04,875 - INFO - Epoch 1 Step 4700 (Global: 4700): loss=1.6648, ppl=5.28, grad_norm=1.65, lr=6.69e-06, throughput=2461 tok/s | |
| 2025-11-30 19:44:19,485 - INFO - Epoch 1 Step 4710 (Global: 4710): loss=1.5953, ppl=4.93, grad_norm=1.27, lr=6.67e-06, throughput=2466 tok/s | |
| 2025-11-30 19:47:38,903 - INFO - Epoch 1 Step 4720 (Global: 4720): loss=1.7107, ppl=5.53, grad_norm=1.57, lr=6.66e-06, throughput=2407 tok/s | |
| 2025-11-30 19:51:00,503 - INFO - Epoch 1 Step 4730 (Global: 4730): loss=1.8327, ppl=6.25, grad_norm=1.57, lr=6.64e-06, throughput=2381 tok/s | |
| 2025-11-30 19:54:20,352 - INFO - Epoch 1 Step 4740 (Global: 4740): loss=1.8799, ppl=6.55, grad_norm=1.48, lr=6.63e-06, throughput=2402 tok/s | |
| 2025-11-30 19:57:39,123 - INFO - Epoch 1 Step 4750 (Global: 4750): loss=1.7334, ppl=5.66, grad_norm=1.20, lr=6.61e-06, throughput=2415 tok/s | |
| 2025-11-30 20:00:58,271 - INFO - Epoch 1 Step 4760 (Global: 4760): loss=1.5228, ppl=4.59, grad_norm=1.31, lr=6.60e-06, throughput=2410 tok/s | |
| 2025-11-30 20:04:17,772 - INFO - Epoch 1 Step 4770 (Global: 4770): loss=1.9155, ppl=6.79, grad_norm=1.16, lr=6.58e-06, throughput=2406 tok/s | |
| 2025-11-30 20:07:35,009 - INFO - Epoch 1 Step 4780 (Global: 4780): loss=1.6345, ppl=5.13, grad_norm=1.55, lr=6.56e-06, throughput=2434 tok/s | |
| 2025-11-30 20:10:52,837 - INFO - Epoch 1 Step 4790 (Global: 4790): loss=1.5824, ppl=4.87, grad_norm=1.63, lr=6.55e-06, throughput=2426 tok/s | |
| 2025-11-30 20:14:08,669 - INFO - Epoch 1 Step 4800 (Global: 4800): loss=1.7458, ppl=5.73, grad_norm=1.31, lr=6.53e-06, throughput=2451 tok/s | |
| 2025-11-30 20:17:25,756 - INFO - Epoch 1 Step 4810 (Global: 4810): loss=1.7278, ppl=5.63, grad_norm=2.16, lr=6.52e-06, throughput=2436 tok/s | |
| 2025-11-30 20:20:40,699 - INFO - Epoch 1 Step 4820 (Global: 4820): loss=1.6376, ppl=5.14, grad_norm=1.10, lr=6.50e-06, throughput=2462 tok/s | |
| 2025-11-30 20:23:55,675 - INFO - Epoch 1 Step 4830 (Global: 4830): loss=1.6128, ppl=5.02, grad_norm=1.76, lr=6.48e-06, throughput=2462 tok/s | |
| 2025-11-30 20:27:13,747 - INFO - Epoch 1 Step 4840 (Global: 4840): loss=1.7913, ppl=6.00, grad_norm=1.41, lr=6.47e-06, throughput=2423 tok/s | |
| 2025-11-30 20:30:29,734 - INFO - Epoch 1 Step 4850 (Global: 4850): loss=1.6913, ppl=5.43, grad_norm=1.24, lr=6.45e-06, throughput=2449 tok/s | |
| 2025-11-30 20:33:48,524 - INFO - Epoch 1 Step 4860 (Global: 4860): loss=1.7234, ppl=5.60, grad_norm=1.25, lr=6.44e-06, throughput=2415 tok/s | |
| 2025-11-30 20:37:07,469 - INFO - Epoch 1 Step 4870 (Global: 4870): loss=1.7941, ppl=6.01, grad_norm=1.49, lr=6.42e-06, throughput=2413 tok/s | |
| 2025-11-30 20:40:24,233 - INFO - Epoch 1 Step 4880 (Global: 4880): loss=1.4013, ppl=4.06, grad_norm=1.01, lr=6.40e-06, throughput=2439 tok/s | |
| 2025-11-30 20:43:43,032 - INFO - Epoch 1 Step 4890 (Global: 4890): loss=1.7143, ppl=5.55, grad_norm=1.49, lr=6.39e-06, throughput=2415 tok/s | |
| 2025-11-30 20:47:01,709 - INFO - Epoch 1 Step 4900 (Global: 4900): loss=1.6867, ppl=5.40, grad_norm=1.85, lr=6.37e-06, throughput=2416 tok/s | |
| 2025-11-30 20:50:20,216 - INFO - Epoch 1 Step 4910 (Global: 4910): loss=1.6948, ppl=5.45, grad_norm=1.40, lr=6.35e-06, throughput=2418 tok/s | |
| 2025-11-30 20:53:38,918 - INFO - Epoch 1 Step 4920 (Global: 4920): loss=1.6757, ppl=5.34, grad_norm=1.41, lr=6.34e-06, throughput=2416 tok/s | |
| 2025-11-30 20:56:56,039 - INFO - Epoch 1 Step 4930 (Global: 4930): loss=1.5461, ppl=4.69, grad_norm=1.14, lr=6.32e-06, throughput=2435 tok/s | |
| 2025-11-30 21:00:13,161 - INFO - Epoch 1 Step 4940 (Global: 4940): loss=1.7467, ppl=5.74, grad_norm=1.94, lr=6.31e-06, throughput=2435 tok/s | |
| 2025-11-30 21:03:33,179 - INFO - Epoch 1 Step 4950 (Global: 4950): loss=1.8238, ppl=6.20, grad_norm=1.70, lr=6.29e-06, throughput=2400 tok/s | |
| 2025-11-30 21:06:50,823 - INFO - Epoch 1 Step 4960 (Global: 4960): loss=1.7105, ppl=5.53, grad_norm=1.23, lr=6.27e-06, throughput=2429 tok/s | |
| 2025-11-30 21:10:09,468 - INFO - Epoch 1 Step 4970 (Global: 4970): loss=1.5022, ppl=4.49, grad_norm=2.19, lr=6.26e-06, throughput=2416 tok/s | |
| 2025-11-30 21:13:28,648 - INFO - Epoch 1 Step 4980 (Global: 4980): loss=1.6156, ppl=5.03, grad_norm=1.61, lr=6.24e-06, throughput=2410 tok/s | |
| 2025-11-30 21:16:48,673 - INFO - Epoch 1 Step 4990 (Global: 4990): loss=1.4454, ppl=4.24, grad_norm=1.59, lr=6.23e-06, throughput=2400 tok/s | |
| 2025-11-30 21:20:02,820 - INFO - Epoch 1 Step 5000 (Global: 5000): loss=1.5817, ppl=4.86, grad_norm=1.86, lr=6.21e-06, throughput=2472 tok/s | |
| 2025-11-30 21:23:17,584 - INFO - Epoch 1 Step 5010 (Global: 5010): loss=1.8172, ppl=6.15, grad_norm=1.65, lr=6.19e-06, throughput=2465 tok/s | |
| 2025-11-30 21:26:34,357 - INFO - Epoch 1 Step 5020 (Global: 5020): loss=1.6298, ppl=5.10, grad_norm=2.06, lr=6.18e-06, throughput=2439 tok/s | |
| 2025-11-30 21:29:53,371 - INFO - Epoch 1 Step 5030 (Global: 5030): loss=1.4160, ppl=4.12, grad_norm=1.88, lr=6.16e-06, throughput=2412 tok/s | |
| 2025-11-30 21:33:13,288 - INFO - Epoch 1 Step 5040 (Global: 5040): loss=1.6838, ppl=5.39, grad_norm=1.43, lr=6.14e-06, throughput=2401 tok/s | |
| 2025-11-30 21:36:31,825 - INFO - Epoch 1 Step 5050 (Global: 5050): loss=1.9101, ppl=6.75, grad_norm=1.18, lr=6.13e-06, throughput=2418 tok/s | |
| 2025-11-30 21:39:48,539 - INFO - Epoch 1 Step 5060 (Global: 5060): loss=1.4446, ppl=4.24, grad_norm=2.25, lr=6.11e-06, throughput=2440 tok/s | |
| 2025-11-30 21:43:05,968 - INFO - Epoch 1 Step 5070 (Global: 5070): loss=1.5332, ppl=4.63, grad_norm=1.36, lr=6.10e-06, throughput=2431 tok/s | |
| 2025-11-30 21:46:22,201 - INFO - Epoch 1 Step 5080 (Global: 5080): loss=1.8351, ppl=6.27, grad_norm=1.59, lr=6.08e-06, throughput=2446 tok/s | |
| 2025-11-30 21:49:39,174 - INFO - Epoch 1 Step 5090 (Global: 5090): loss=1.7404, ppl=5.70, grad_norm=1.59, lr=6.06e-06, throughput=2437 tok/s | |
| 2025-11-30 21:52:57,048 - INFO - Epoch 1 Step 5100 (Global: 5100): loss=1.3801, ppl=3.98, grad_norm=5.38, lr=6.05e-06, throughput=2426 tok/s | |
| 2025-11-30 21:56:13,956 - INFO - Epoch 1 Step 5110 (Global: 5110): loss=1.5754, ppl=4.83, grad_norm=1.73, lr=6.03e-06, throughput=2438 tok/s | |
| 2025-11-30 21:59:29,956 - INFO - Epoch 1 Step 5120 (Global: 5120): loss=1.6850, ppl=5.39, grad_norm=1.41, lr=6.01e-06, throughput=2449 tok/s | |
| 2025-11-30 22:02:47,261 - INFO - Epoch 1 Step 5130 (Global: 5130): loss=1.7147, ppl=5.55, grad_norm=1.80, lr=6.00e-06, throughput=2433 tok/s | |
| 2025-11-30 22:06:04,649 - INFO - Epoch 1 Step 5140 (Global: 5140): loss=1.6922, ppl=5.43, grad_norm=1.88, lr=5.98e-06, throughput=2432 tok/s | |
| 2025-11-30 22:09:22,685 - INFO - Epoch 1 Step 5150 (Global: 5150): loss=1.5069, ppl=4.51, grad_norm=4.97, lr=5.96e-06, throughput=2424 tok/s | |
| 2025-11-30 22:12:39,196 - INFO - Epoch 1 Step 5160 (Global: 5160): loss=1.6784, ppl=5.36, grad_norm=1.21, lr=5.95e-06, throughput=2443 tok/s | |
| 2025-11-30 22:15:57,708 - INFO - Epoch 1 Step 5170 (Global: 5170): loss=1.5542, ppl=4.73, grad_norm=1.09, lr=5.93e-06, throughput=2418 tok/s | |
| 2025-11-30 22:19:16,469 - INFO - Epoch 1 Step 5180 (Global: 5180): loss=1.5750, ppl=4.83, grad_norm=1.36, lr=5.91e-06, throughput=2415 tok/s | |
| 2025-11-30 22:22:34,916 - INFO - Epoch 1 Step 5190 (Global: 5190): loss=1.5882, ppl=4.90, grad_norm=1.28, lr=5.90e-06, throughput=2419 tok/s | |
| 2025-11-30 22:25:52,189 - INFO - Epoch 1 Step 5200 (Global: 5200): loss=1.6710, ppl=5.32, grad_norm=1.34, lr=5.88e-06, throughput=2433 tok/s | |
| 2025-11-30 22:29:11,120 - INFO - Epoch 1 Step 5210 (Global: 5210): loss=1.9247, ppl=6.85, grad_norm=1.61, lr=5.87e-06, throughput=2413 tok/s | |
| 2025-11-30 22:32:28,257 - INFO - Epoch 1 Step 5220 (Global: 5220): loss=1.7634, ppl=5.83, grad_norm=1.91, lr=5.85e-06, throughput=2435 tok/s | |
| 2025-11-30 22:35:45,810 - INFO - Epoch 1 Step 5230 (Global: 5230): loss=1.5574, ppl=4.75, grad_norm=1.45, lr=5.83e-06, throughput=2430 tok/s | |
| 2025-11-30 22:39:04,557 - INFO - Epoch 1 Step 5240 (Global: 5240): loss=1.5943, ppl=4.92, grad_norm=1.87, lr=5.82e-06, throughput=2415 tok/s | |
| 2025-11-30 22:42:20,740 - INFO - Epoch 1 Step 5250 (Global: 5250): loss=1.7339, ppl=5.66, grad_norm=1.52, lr=5.80e-06, throughput=2447 tok/s | |
| 2025-11-30 22:45:38,913 - INFO - Epoch 1 Step 5260 (Global: 5260): loss=1.7242, ppl=5.61, grad_norm=1.60, lr=5.78e-06, throughput=2422 tok/s | |
| 2025-11-30 22:48:58,278 - INFO - Epoch 1 Step 5270 (Global: 5270): loss=1.8037, ppl=6.07, grad_norm=1.32, lr=5.77e-06, throughput=2408 tok/s | |
| 2025-11-30 22:52:17,056 - INFO - Epoch 1 Step 5280 (Global: 5280): loss=1.8403, ppl=6.30, grad_norm=1.21, lr=5.75e-06, throughput=2415 tok/s | |
| 2025-11-30 22:55:37,005 - INFO - Epoch 1 Step 5290 (Global: 5290): loss=1.6767, ppl=5.35, grad_norm=1.97, lr=5.73e-06, throughput=2401 tok/s | |
| 2025-11-30 22:58:54,909 - INFO - Epoch 1 Step 5300 (Global: 5300): loss=1.4934, ppl=4.45, grad_norm=1.27, lr=5.72e-06, throughput=2425 tok/s | |
| 2025-11-30 23:02:13,399 - INFO - Epoch 1 Step 5310 (Global: 5310): loss=1.4806, ppl=4.40, grad_norm=1.14, lr=5.70e-06, throughput=2418 tok/s | |
| 2025-11-30 23:05:31,516 - INFO - Epoch 1 Step 5320 (Global: 5320): loss=1.7607, ppl=5.82, grad_norm=1.77, lr=5.68e-06, throughput=2423 tok/s | |
| 2025-11-30 23:08:49,054 - INFO - Epoch 1 Step 5330 (Global: 5330): loss=1.6282, ppl=5.09, grad_norm=1.22, lr=5.67e-06, throughput=2430 tok/s | |
| 2025-11-30 23:12:08,252 - INFO - Epoch 1 Step 5340 (Global: 5340): loss=1.5505, ppl=4.71, grad_norm=1.14, lr=5.65e-06, throughput=2410 tok/s | |
| 2025-11-30 23:15:26,765 - INFO - Epoch 1 Step 5350 (Global: 5350): loss=1.5006, ppl=4.48, grad_norm=1.38, lr=5.63e-06, throughput=2418 tok/s | |
| 2025-11-30 23:18:46,710 - INFO - Epoch 1 Step 5360 (Global: 5360): loss=1.7141, ppl=5.55, grad_norm=1.52, lr=5.62e-06, throughput=2401 tok/s | |
| 2025-11-30 23:22:06,098 - INFO - Epoch 1 Step 5370 (Global: 5370): loss=1.8627, ppl=6.44, grad_norm=1.20, lr=5.60e-06, throughput=2407 tok/s | |
| 2025-11-30 23:25:23,291 - INFO - Epoch 1 Step 5380 (Global: 5380): loss=1.6585, ppl=5.25, grad_norm=1.84, lr=5.58e-06, throughput=2434 tok/s | |
| 2025-11-30 23:28:42,571 - INFO - Epoch 1 Step 5390 (Global: 5390): loss=1.8375, ppl=6.28, grad_norm=3.52, lr=5.57e-06, throughput=2409 tok/s | |
| 2025-11-30 23:32:00,874 - INFO - Epoch 1 Step 5400 (Global: 5400): loss=1.4922, ppl=4.45, grad_norm=1.52, lr=5.55e-06, throughput=2421 tok/s | |
| 2025-11-30 23:35:19,208 - INFO - Epoch 1 Step 5410 (Global: 5410): loss=1.6122, ppl=5.01, grad_norm=1.41, lr=5.53e-06, throughput=2420 tok/s | |
| 2025-11-30 23:38:37,367 - INFO - Epoch 1 Step 5420 (Global: 5420): loss=1.5649, ppl=4.78, grad_norm=2.05, lr=5.52e-06, throughput=2422 tok/s | |
| 2025-11-30 23:41:55,181 - INFO - Epoch 1 Step 5430 (Global: 5430): loss=1.6781, ppl=5.36, grad_norm=2.00, lr=5.50e-06, throughput=2427 tok/s | |
| 2025-11-30 23:45:14,486 - INFO - Epoch 1 Step 5440 (Global: 5440): loss=1.9711, ppl=7.18, grad_norm=1.54, lr=5.48e-06, throughput=2408 tok/s | |
| 2025-11-30 23:48:33,077 - INFO - Epoch 1 Step 5450 (Global: 5450): loss=1.7762, ppl=5.91, grad_norm=1.45, lr=5.47e-06, throughput=2417 tok/s | |
| 2025-11-30 23:51:49,286 - INFO - Epoch 1 Step 5460 (Global: 5460): loss=1.7029, ppl=5.49, grad_norm=1.43, lr=5.45e-06, throughput=2446 tok/s | |
| 2025-11-30 23:55:07,967 - INFO - Epoch 1 Step 5470 (Global: 5470): loss=1.5467, ppl=4.70, grad_norm=1.89, lr=5.43e-06, throughput=2416 tok/s | |
| 2025-11-30 23:58:26,116 - INFO - Epoch 1 Step 5480 (Global: 5480): loss=1.6463, ppl=5.19, grad_norm=1.46, lr=5.42e-06, throughput=2422 tok/s | |
| 2025-12-01 00:01:44,646 - INFO - Epoch 1 Step 5490 (Global: 5490): loss=1.5322, ppl=4.63, grad_norm=1.81, lr=5.40e-06, throughput=2418 tok/s | |
| 2025-12-01 00:05:02,886 - INFO - Epoch 1 Step 5500 (Global: 5500): loss=1.7104, ppl=5.53, grad_norm=1.87, lr=5.38e-06, throughput=2421 tok/s | |
| 2025-12-01 00:08:21,817 - INFO - Epoch 1 Step 5510 (Global: 5510): loss=1.5022, ppl=4.49, grad_norm=1.77, lr=5.37e-06, throughput=2413 tok/s | |
| 2025-12-01 00:11:39,776 - INFO - Epoch 1 Step 5520 (Global: 5520): loss=1.6634, ppl=5.28, grad_norm=1.26, lr=5.35e-06, throughput=2425 tok/s | |
| 2025-12-01 00:14:58,414 - INFO - Epoch 1 Step 5530 (Global: 5530): loss=1.7628, ppl=5.83, grad_norm=1.19, lr=5.33e-06, throughput=2416 tok/s | |
| 2025-12-01 00:18:17,044 - INFO - Epoch 1 Step 5540 (Global: 5540): loss=1.5865, ppl=4.89, grad_norm=1.78, lr=5.32e-06, throughput=2417 tok/s | |
| 2025-12-01 00:21:35,632 - INFO - Epoch 1 Step 5550 (Global: 5550): loss=1.6297, ppl=5.10, grad_norm=2.47, lr=5.30e-06, throughput=2417 tok/s | |
| 2025-12-01 00:24:54,243 - INFO - Epoch 1 Step 5560 (Global: 5560): loss=1.6084, ppl=4.99, grad_norm=1.98, lr=5.28e-06, throughput=2417 tok/s | |
| 2025-12-01 00:28:12,166 - INFO - Epoch 1 Step 5570 (Global: 5570): loss=1.7628, ppl=5.83, grad_norm=1.18, lr=5.27e-06, throughput=2425 tok/s | |
| 2025-12-01 00:31:31,072 - INFO - Epoch 1 Step 5580 (Global: 5580): loss=1.6630, ppl=5.28, grad_norm=1.86, lr=5.25e-06, throughput=2413 tok/s | |
| 2025-12-01 00:34:49,896 - INFO - Epoch 1 Step 5590 (Global: 5590): loss=1.9998, ppl=7.39, grad_norm=1.84, lr=5.23e-06, throughput=2414 tok/s | |
| 2025-12-01 00:38:08,227 - INFO - Epoch 1 Step 5600 (Global: 5600): loss=1.7131, ppl=5.55, grad_norm=1.45, lr=5.22e-06, throughput=2420 tok/s | |
| 2025-12-01 00:41:27,097 - INFO - Epoch 1 Step 5610 (Global: 5610): loss=1.5279, ppl=4.61, grad_norm=1.75, lr=5.20e-06, throughput=2414 tok/s | |
| 2025-12-01 00:44:44,230 - INFO - Epoch 1 Step 5620 (Global: 5620): loss=1.7714, ppl=5.88, grad_norm=1.46, lr=5.18e-06, throughput=2435 tok/s | |
| 2025-12-01 00:48:00,222 - INFO - Epoch 1 Step 5630 (Global: 5630): loss=1.3825, ppl=3.98, grad_norm=4.88, lr=5.17e-06, throughput=2449 tok/s | |
| 2025-12-01 00:51:17,609 - INFO - Epoch 1 Step 5640 (Global: 5640): loss=1.4765, ppl=4.38, grad_norm=1.67, lr=5.15e-06, throughput=2432 tok/s | |
| 2025-12-01 00:54:34,314 - INFO - Epoch 1 Step 5650 (Global: 5650): loss=1.7451, ppl=5.73, grad_norm=1.09, lr=5.13e-06, throughput=2440 tok/s | |
| 2025-12-01 00:57:51,741 - INFO - Epoch 1 Step 5660 (Global: 5660): loss=1.6486, ppl=5.20, grad_norm=2.77, lr=5.12e-06, throughput=2431 tok/s | |
| 2025-12-01 01:01:07,974 - INFO - Epoch 1 Step 5670 (Global: 5670): loss=1.9114, ppl=6.76, grad_norm=1.51, lr=5.10e-06, throughput=2446 tok/s | |
| 2025-12-01 01:04:24,915 - INFO - Epoch 1 Step 5680 (Global: 5680): loss=1.7619, ppl=5.82, grad_norm=1.23, lr=5.08e-06, throughput=2437 tok/s | |
| 2025-12-01 01:07:41,261 - INFO - Epoch 1 Step 5690 (Global: 5690): loss=1.5228, ppl=4.59, grad_norm=2.03, lr=5.07e-06, throughput=2445 tok/s | |
| 2025-12-01 01:10:58,658 - INFO - Epoch 1 Step 5700 (Global: 5700): loss=1.5246, ppl=4.59, grad_norm=1.22, lr=5.05e-06, throughput=2432 tok/s | |
| 2025-12-01 01:14:17,371 - INFO - Epoch 1 Step 5710 (Global: 5710): loss=1.6234, ppl=5.07, grad_norm=1.33, lr=5.03e-06, throughput=2416 tok/s | |
| 2025-12-01 01:17:35,349 - INFO - Epoch 1 Step 5720 (Global: 5720): loss=1.5925, ppl=4.92, grad_norm=2.00, lr=5.02e-06, throughput=2425 tok/s | |
| 2025-12-01 01:20:53,278 - INFO - Epoch 1 Step 5730 (Global: 5730): loss=1.6906, ppl=5.42, grad_norm=1.11, lr=5.00e-06, throughput=2425 tok/s | |
| 2025-12-01 01:24:12,567 - INFO - Epoch 1 Step 5740 (Global: 5740): loss=1.6378, ppl=5.14, grad_norm=1.51, lr=4.98e-06, throughput=2409 tok/s | |
| 2025-12-01 01:27:31,076 - INFO - Epoch 1 Step 5750 (Global: 5750): loss=1.4816, ppl=4.40, grad_norm=1.66, lr=4.96e-06, throughput=2418 tok/s | |
| 2025-12-01 01:30:48,786 - INFO - Epoch 1 Step 5760 (Global: 5760): loss=1.6961, ppl=5.45, grad_norm=1.61, lr=4.95e-06, throughput=2428 tok/s | |
| 2025-12-01 01:34:07,185 - INFO - Epoch 1 Step 5770 (Global: 5770): loss=1.5618, ppl=4.77, grad_norm=1.54, lr=4.93e-06, throughput=2419 tok/s | |
| 2025-12-01 01:37:24,944 - INFO - Epoch 1 Step 5780 (Global: 5780): loss=1.5769, ppl=4.84, grad_norm=2.09, lr=4.91e-06, throughput=2427 tok/s | |
| 2025-12-01 01:40:43,729 - INFO - Epoch 1 Step 5790 (Global: 5790): loss=1.4895, ppl=4.43, grad_norm=1.65, lr=4.90e-06, throughput=2415 tok/s | |
| 2025-12-01 01:44:02,770 - INFO - Epoch 1 Step 5800 (Global: 5800): loss=1.6498, ppl=5.21, grad_norm=1.89, lr=4.88e-06, throughput=2412 tok/s | |
| 2025-12-01 01:47:21,160 - INFO - Epoch 1 Step 5810 (Global: 5810): loss=1.8091, ppl=6.10, grad_norm=1.80, lr=4.86e-06, throughput=2419 tok/s | |
| 2025-12-01 01:50:38,546 - INFO - Epoch 1 Step 5820 (Global: 5820): loss=1.4036, ppl=4.07, grad_norm=1.19, lr=4.85e-06, throughput=2432 tok/s | |
| 2025-12-01 01:53:56,105 - INFO - Epoch 1 Step 5830 (Global: 5830): loss=1.6654, ppl=5.29, grad_norm=1.60, lr=4.83e-06, throughput=2430 tok/s | |
| 2025-12-01 01:57:14,735 - INFO - Epoch 1 Step 5840 (Global: 5840): loss=1.6885, ppl=5.41, grad_norm=1.41, lr=4.81e-06, throughput=2417 tok/s | |
| 2025-12-01 02:00:32,503 - INFO - Epoch 1 Step 5850 (Global: 5850): loss=1.7107, ppl=5.53, grad_norm=1.34, lr=4.80e-06, throughput=2427 tok/s | |
| 2025-12-01 02:03:50,760 - INFO - Epoch 1 Step 5860 (Global: 5860): loss=1.6921, ppl=5.43, grad_norm=1.21, lr=4.78e-06, throughput=2421 tok/s | |
| 2025-12-01 02:07:09,986 - INFO - Epoch 1 Step 5870 (Global: 5870): loss=1.6316, ppl=5.11, grad_norm=1.48, lr=4.76e-06, throughput=2409 tok/s | |
| 2025-12-01 02:10:29,091 - INFO - Epoch 1 Step 5880 (Global: 5880): loss=1.4592, ppl=4.30, grad_norm=1.18, lr=4.75e-06, throughput=2411 tok/s | |
| 2025-12-01 02:13:46,128 - INFO - Epoch 1 Step 5890 (Global: 5890): loss=1.8452, ppl=6.33, grad_norm=1.56, lr=4.73e-06, throughput=2436 tok/s | |
| 2025-12-01 02:17:04,109 - INFO - Epoch 1 Step 5900 (Global: 5900): loss=1.7572, ppl=5.80, grad_norm=1.48, lr=4.71e-06, throughput=2425 tok/s | |
| 2025-12-01 02:20:21,424 - INFO - Epoch 1 Step 5910 (Global: 5910): loss=1.9067, ppl=6.73, grad_norm=1.43, lr=4.70e-06, throughput=2433 tok/s | |
| 2025-12-01 02:23:41,593 - INFO - Epoch 1 Step 5920 (Global: 5920): loss=1.6364, ppl=5.14, grad_norm=1.35, lr=4.68e-06, throughput=2398 tok/s | |
| 2025-12-01 02:26:59,166 - INFO - Epoch 1 Step 5930 (Global: 5930): loss=1.5303, ppl=4.62, grad_norm=1.49, lr=4.66e-06, throughput=2430 tok/s | |
| 2025-12-01 02:30:16,451 - INFO - Epoch 1 Step 5940 (Global: 5940): loss=1.5827, ppl=4.87, grad_norm=1.73, lr=4.65e-06, throughput=2433 tok/s | |
| 2025-12-01 02:33:34,697 - INFO - Epoch 1 Step 5950 (Global: 5950): loss=1.5060, ppl=4.51, grad_norm=1.33, lr=4.63e-06, throughput=2421 tok/s | |
| 2025-12-01 02:36:53,592 - INFO - Epoch 1 Step 5960 (Global: 5960): loss=1.4873, ppl=4.43, grad_norm=1.16, lr=4.61e-06, throughput=2413 tok/s | |
| 2025-12-01 02:40:11,218 - INFO - Epoch 1 Step 5970 (Global: 5970): loss=1.6470, ppl=5.19, grad_norm=2.56, lr=4.60e-06, throughput=2429 tok/s | |
| 2025-12-01 02:43:29,397 - INFO - Epoch 1 Step 5980 (Global: 5980): loss=1.5643, ppl=4.78, grad_norm=1.89, lr=4.58e-06, throughput=2422 tok/s | |
| 2025-12-01 02:46:48,039 - INFO - Epoch 1 Step 5990 (Global: 5990): loss=1.5419, ppl=4.67, grad_norm=1.76, lr=4.56e-06, throughput=2416 tok/s | |
| 2025-12-01 02:50:04,879 - INFO - Epoch 1 Step 6000 (Global: 6000): loss=1.6185, ppl=5.05, grad_norm=1.60, lr=4.55e-06, throughput=2439 tok/s | |
| 2025-12-01 02:50:04,880 - INFO - | |
| Running validation at step 6000... | |
| 2025-12-01 03:01:32,038 - INFO - Validation loss: 1.6570, perplexity: 5.24 | |
| 2025-12-01 03:01:32,038 - INFO - | |
| ====================================================================== | |
| 2025-12-01 03:01:32,038 - INFO - Qualitative Evaluation Samples: | |
| 2025-12-01 03:01:32,039 - INFO - ====================================================================== | |
| 2025-12-01 03:01:32,039 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-12-01 03:01:32,039 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-12-01 03:01:32,039 - INFO - Generated: ' to the band\'s previous work, saying that "it\'s not a bad album, but it\'s not a great one either. It\'s not a bad record, but it\'s not a great one either." He also said that "the band is still a great ...' | |
| 2025-12-01 03:01:32,040 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' | |
| 2025-12-01 03:01:32,040 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-01 03:01:32,040 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-12-01 03:01:32,040 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-12-01 03:01:32,040 - INFO - Generated: 'aternity and sorority life, but it has a long history of fraternity and sorority life among Native American students. The first fraternity and sorority on campus was the Alpha Kappa Alpha sorority, es...' | |
| 2025-12-01 03:01:32,041 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' | |
| 2025-12-01 03:01:32,041 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-01 03:01:32,041 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-12-01 03:01:32,041 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-12-01 03:01:32,042 - INFO - Generated: " be defeated by Oga's shadow. Oga is then revealed to be the Shingetsu's master, and is killed by the Shingetsu's master's shadow. Oga's shadow is then revealed to be the Shingetsu's master's shadow, ..." | |
| 2025-12-01 03:01:32,042 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." | |
| 2025-12-01 03:01:32,042 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-01 03:01:32,043 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-12-01 03:01:32,043 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-12-01 03:01:32,043 - INFO - Generated: ' | 0 | 0 | 0 | 0 |\n| 1.1 | U+0B01..U+0B3...' | |
| 2025-12-01 03:01:32,044 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' | |
| 2025-12-01 03:01:32,044 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-01 03:01:32,044 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-12-01 03:01:32,044 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-12-01 03:01:32,045 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 | August 30, 2011 | PlayStation 4 | EA Tiburon ...' | |
| 2025-12-01 03:01:32,045 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' | |
| 2025-12-01 03:01:32,045 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-01 03:01:32,046 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/qualitative_step_6000.jsonl | |
| 2025-12-01 03:03:02,031 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/best_checkpoint.pt | |
| 2025-12-01 03:03:02,045 - INFO - New best validation loss: 1.6570, perplexity: 5.24 | |
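The perplexity values reported throughout this log are consistent with perplexity = exp(loss), i.e. the exponential of the mean natural-log cross-entropy. A minimal sketch checking this against the validation line above (the variable names are illustrative, not from the training script):

```python
import math

# Validation above reports loss=1.6570, perplexity=5.24.
# Under the usual convention, perplexity = exp(mean cross-entropy loss).
val_loss = 1.6570
perplexity = math.exp(val_loss)
print(f"{perplexity:.2f}")  # → 5.24
```

The same relation holds for the per-step training lines (e.g. loss=1.5488 → ppl=4.71), so loss and ppl in this log carry the same information on different scales.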
| 2025-12-01 03:06:18,098 - INFO - Epoch 1 Step 6010 (Global: 6010): loss=1.5488, ppl=4.71, grad_norm=1.73, lr=4.53e-06, throughput=2449 tok/s | |
| 2025-12-01 03:09:35,373 - INFO - Epoch 1 Step 6020 (Global: 6020): loss=1.5946, ppl=4.93, grad_norm=1.38, lr=4.51e-06, throughput=2433 tok/s | |
| 2025-12-01 03:12:51,336 - INFO - Epoch 1 Step 6030 (Global: 6030): loss=1.8325, ppl=6.25, grad_norm=1.61, lr=4.50e-06, throughput=2449 tok/s | |
| 2025-12-01 03:16:09,392 - INFO - Epoch 1 Step 6040 (Global: 6040): loss=1.6135, ppl=5.02, grad_norm=1.12, lr=4.48e-06, throughput=2424 tok/s | |
| 2025-12-01 03:19:25,981 - INFO - Epoch 1 Step 6050 (Global: 6050): loss=1.6369, ppl=5.14, grad_norm=1.86, lr=4.46e-06, throughput=2442 tok/s | |
| 2025-12-01 03:22:43,938 - INFO - Epoch 1 Step 6060 (Global: 6060): loss=1.7540, ppl=5.78, grad_norm=1.97, lr=4.45e-06, throughput=2425 tok/s | |
| 2025-12-01 03:26:00,981 - INFO - Epoch 1 Step 6070 (Global: 6070): loss=1.9966, ppl=7.36, grad_norm=1.69, lr=4.43e-06, throughput=2436 tok/s | |
| 2025-12-01 03:29:19,503 - INFO - Epoch 1 Step 6080 (Global: 6080): loss=1.5633, ppl=4.77, grad_norm=1.71, lr=4.41e-06, throughput=2418 tok/s | |
| 2025-12-01 03:32:40,086 - INFO - Epoch 1 Step 6090 (Global: 6090): loss=1.6830, ppl=5.38, grad_norm=2.39, lr=4.40e-06, throughput=2393 tok/s | |
| 2025-12-01 03:35:57,697 - INFO - Epoch 1 Step 6100 (Global: 6100): loss=1.4622, ppl=4.32, grad_norm=1.35, lr=4.38e-06, throughput=2429 tok/s | |
| 2025-12-01 03:39:15,491 - INFO - Epoch 1 Step 6110 (Global: 6110): loss=1.3851, ppl=4.00, grad_norm=1.07, lr=4.36e-06, throughput=2427 tok/s | |
| 2025-12-01 03:42:32,673 - INFO - Epoch 1 Step 6120 (Global: 6120): loss=1.6662, ppl=5.29, grad_norm=1.30, lr=4.35e-06, throughput=2434 tok/s | |
| 2025-12-01 03:45:50,600 - INFO - Epoch 1 Step 6130 (Global: 6130): loss=1.6198, ppl=5.05, grad_norm=1.25, lr=4.33e-06, throughput=2425 tok/s | |
| 2025-12-01 03:49:09,215 - INFO - Epoch 1 Step 6140 (Global: 6140): loss=1.6180, ppl=5.04, grad_norm=1.45, lr=4.31e-06, throughput=2417 tok/s | |
| 2025-12-01 03:52:26,624 - INFO - Epoch 1 Step 6150 (Global: 6150): loss=1.8394, ppl=6.29, grad_norm=1.26, lr=4.30e-06, throughput=2432 tok/s | |
| 2025-12-01 03:55:44,565 - INFO - Epoch 1 Step 6160 (Global: 6160): loss=1.6231, ppl=5.07, grad_norm=2.05, lr=4.28e-06, throughput=2425 tok/s | |
| 2025-12-01 03:59:01,934 - INFO - Epoch 1 Step 6170 (Global: 6170): loss=1.8429, ppl=6.31, grad_norm=5.25, lr=4.26e-06, throughput=2432 tok/s | |
| 2025-12-01 04:02:18,954 - INFO - Epoch 1 Step 6180 (Global: 6180): loss=1.6689, ppl=5.31, grad_norm=1.96, lr=4.25e-06, throughput=2436 tok/s | |
| 2025-12-01 04:05:38,412 - INFO - Epoch 1 Step 6190 (Global: 6190): loss=1.5899, ppl=4.90, grad_norm=1.70, lr=4.23e-06, throughput=2407 tok/s | |
| 2025-12-01 04:09:01,365 - INFO - Epoch 1 Step 6200 (Global: 6200): loss=1.5817, ppl=4.86, grad_norm=1.34, lr=4.21e-06, throughput=2365 tok/s | |
| 2025-12-01 04:12:21,018 - INFO - Epoch 1 Step 6210 (Global: 6210): loss=1.5301, ppl=4.62, grad_norm=2.00, lr=4.20e-06, throughput=2404 tok/s | |
| 2025-12-01 04:15:39,996 - INFO - Epoch 1 Step 6220 (Global: 6220): loss=1.8052, ppl=6.08, grad_norm=1.79, lr=4.18e-06, throughput=2412 tok/s | |
| 2025-12-01 04:18:58,186 - INFO - Epoch 1 Step 6230 (Global: 6230): loss=1.5514, ppl=4.72, grad_norm=1.45, lr=4.16e-06, throughput=2422 tok/s | |
| 2025-12-01 04:22:16,729 - INFO - Epoch 1 Step 6240 (Global: 6240): loss=1.6003, ppl=4.95, grad_norm=1.30, lr=4.15e-06, throughput=2418 tok/s | |
| 2025-12-01 04:25:34,316 - INFO - Epoch 1 Step 6250 (Global: 6250): loss=1.4018, ppl=4.06, grad_norm=1.44, lr=4.13e-06, throughput=2429 tok/s | |
| 2025-12-01 04:28:54,361 - INFO - Epoch 1 Step 6260 (Global: 6260): loss=1.7588, ppl=5.81, grad_norm=1.69, lr=4.12e-06, throughput=2399 tok/s | |
| 2025-12-01 04:32:12,570 - INFO - Epoch 1 Step 6270 (Global: 6270): loss=1.6220, ppl=5.06, grad_norm=1.94, lr=4.10e-06, throughput=2422 tok/s | |
| 2025-12-01 04:35:32,318 - INFO - Epoch 1 Step 6280 (Global: 6280): loss=1.4585, ppl=4.30, grad_norm=1.03, lr=4.08e-06, throughput=2403 tok/s | |
| 2025-12-01 04:38:51,381 - INFO - Epoch 1 Step 6290 (Global: 6290): loss=1.5427, ppl=4.68, grad_norm=1.76, lr=4.07e-06, throughput=2411 tok/s | |
| 2025-12-01 04:42:09,039 - INFO - Epoch 1 Step 6300 (Global: 6300): loss=1.6273, ppl=5.09, grad_norm=1.42, lr=4.05e-06, throughput=2428 tok/s | |
| 2025-12-01 04:45:29,030 - INFO - Epoch 1 Step 6310 (Global: 6310): loss=1.4807, ppl=4.40, grad_norm=1.96, lr=4.03e-06, throughput=2400 tok/s | |
| 2025-12-01 04:48:47,153 - INFO - Epoch 1 Step 6320 (Global: 6320): loss=1.8204, ppl=6.17, grad_norm=2.36, lr=4.02e-06, throughput=2423 tok/s | |
| 2025-12-01 04:52:04,285 - INFO - Epoch 1 Step 6330 (Global: 6330): loss=1.5177, ppl=4.56, grad_norm=1.62, lr=4.00e-06, throughput=2435 tok/s | |
| 2025-12-01 04:55:21,256 - INFO - Epoch 1 Step 6340 (Global: 6340): loss=1.6562, ppl=5.24, grad_norm=1.58, lr=3.98e-06, throughput=2437 tok/s | |
| 2025-12-01 04:58:36,930 - INFO - Epoch 1 Step 6350 (Global: 6350): loss=1.5330, ppl=4.63, grad_norm=1.36, lr=3.97e-06, throughput=2453 tok/s | |
| 2025-12-01 05:01:53,434 - INFO - Epoch 1 Step 6360 (Global: 6360): loss=1.5252, ppl=4.60, grad_norm=1.35, lr=3.95e-06, throughput=2443 tok/s | |
| 2025-12-01 05:05:15,084 - INFO - Epoch 1 Step 6370 (Global: 6370): loss=1.5306, ppl=4.62, grad_norm=1.12, lr=3.93e-06, throughput=2380 tok/s | |
| 2025-12-01 05:08:36,694 - INFO - Epoch 1 Step 6380 (Global: 6380): loss=1.7318, ppl=5.65, grad_norm=1.76, lr=3.92e-06, throughput=2381 tok/s | |
| 2025-12-01 05:11:56,406 - INFO - Epoch 1 Step 6390 (Global: 6390): loss=1.6977, ppl=5.46, grad_norm=1.84, lr=3.90e-06, throughput=2403 tok/s | |
| 2025-12-01 05:15:14,735 - INFO - Epoch 1 Step 6400 (Global: 6400): loss=1.6708, ppl=5.32, grad_norm=1.20, lr=3.89e-06, throughput=2420 tok/s | |
| 2025-12-01 05:18:34,236 - INFO - Epoch 1 Step 6410 (Global: 6410): loss=1.6960, ppl=5.45, grad_norm=1.31, lr=3.87e-06, throughput=2406 tok/s | |
| 2025-12-01 05:21:52,053 - INFO - Epoch 1 Step 6420 (Global: 6420): loss=1.6209, ppl=5.06, grad_norm=1.35, lr=3.85e-06, throughput=2427 tok/s | |
| 2025-12-01 05:25:09,795 - INFO - Epoch 1 Step 6430 (Global: 6430): loss=1.8088, ppl=6.10, grad_norm=2.33, lr=3.84e-06, throughput=2427 tok/s | |
| 2025-12-01 05:28:28,948 - INFO - Epoch 1 Step 6440 (Global: 6440): loss=1.6045, ppl=4.98, grad_norm=1.46, lr=3.82e-06, throughput=2410 tok/s | |
| 2025-12-01 05:31:46,807 - INFO - Epoch 1 Step 6450 (Global: 6450): loss=1.4744, ppl=4.37, grad_norm=1.57, lr=3.80e-06, throughput=2426 tok/s | |
| 2025-12-01 05:35:05,204 - INFO - Epoch 1 Step 6460 (Global: 6460): loss=1.8764, ppl=6.53, grad_norm=3.69, lr=3.79e-06, throughput=2419 tok/s | |
| 2025-12-01 05:38:23,327 - INFO - Epoch 1 Step 6470 (Global: 6470): loss=1.5870, ppl=4.89, grad_norm=1.30, lr=3.77e-06, throughput=2423 tok/s | |
| 2025-12-01 05:41:40,909 - INFO - Epoch 1 Step 6480 (Global: 6480): loss=1.7459, ppl=5.73, grad_norm=1.27, lr=3.76e-06, throughput=2429 tok/s | |
| 2025-12-01 05:44:58,387 - INFO - Epoch 1 Step 6490 (Global: 6490): loss=1.4488, ppl=4.26, grad_norm=1.68, lr=3.74e-06, throughput=2431 tok/s | |
| 2025-12-01 05:48:16,874 - INFO - Epoch 1 Step 6500 (Global: 6500): loss=1.9047, ppl=6.72, grad_norm=3.11, lr=3.72e-06, throughput=2418 tok/s | |
| 2025-12-01 05:51:34,184 - INFO - Epoch 1 Step 6510 (Global: 6510): loss=1.4838, ppl=4.41, grad_norm=1.70, lr=3.71e-06, throughput=2433 tok/s | |
| 2025-12-01 05:54:53,394 - INFO - Epoch 1 Step 6520 (Global: 6520): loss=1.7208, ppl=5.59, grad_norm=1.58, lr=3.69e-06, throughput=2410 tok/s | |
| 2025-12-01 05:58:11,796 - INFO - Epoch 1 Step 6530 (Global: 6530): loss=1.7098, ppl=5.53, grad_norm=13.25, lr=3.67e-06, throughput=2419 tok/s | |
| 2025-12-01 06:01:28,879 - INFO - Epoch 1 Step 6540 (Global: 6540): loss=1.6913, ppl=5.43, grad_norm=1.66, lr=3.66e-06, throughput=2436 tok/s | |
| 2025-12-01 06:04:46,195 - INFO - Epoch 1 Step 6550 (Global: 6550): loss=1.6568, ppl=5.24, grad_norm=1.21, lr=3.64e-06, throughput=2433 tok/s | |
| 2025-12-01 06:08:03,425 - INFO - Epoch 1 Step 6560 (Global: 6560): loss=1.7655, ppl=5.84, grad_norm=1.34, lr=3.63e-06, throughput=2434 tok/s | |
| 2025-12-01 06:11:22,333 - INFO - Epoch 1 Step 6570 (Global: 6570): loss=1.3736, ppl=3.95, grad_norm=1.26, lr=3.61e-06, throughput=2413 tok/s | |
| 2025-12-01 06:14:39,016 - INFO - Epoch 1 Step 6580 (Global: 6580): loss=1.6107, ppl=5.01, grad_norm=1.43, lr=3.59e-06, throughput=2440 tok/s | |
| 2025-12-01 06:17:56,099 - INFO - Epoch 1 Step 6590 (Global: 6590): loss=1.5197, ppl=4.57, grad_norm=2.17, lr=3.58e-06, throughput=2436 tok/s | |
| 2025-12-01 06:21:12,502 - INFO - Epoch 1 Step 6600 (Global: 6600): loss=1.5674, ppl=4.79, grad_norm=2.05, lr=3.56e-06, throughput=2444 tok/s | |
| 2025-12-01 06:24:28,868 - INFO - Epoch 1 Step 6610 (Global: 6610): loss=1.6638, ppl=5.28, grad_norm=1.48, lr=3.55e-06, throughput=2444 tok/s | |
| 2025-12-01 06:27:45,267 - INFO - Epoch 1 Step 6620 (Global: 6620): loss=1.5763, ppl=4.84, grad_norm=1.91, lr=3.53e-06, throughput=2444 tok/s | |
| 2025-12-01 06:31:02,506 - INFO - Epoch 1 Step 6630 (Global: 6630): loss=1.5848, ppl=4.88, grad_norm=1.57, lr=3.51e-06, throughput=2434 tok/s | |
| 2025-12-01 06:34:18,511 - INFO - Epoch 1 Step 6640 (Global: 6640): loss=2.0233, ppl=7.56, grad_norm=2.66, lr=3.50e-06, throughput=2449 tok/s | |
| 2025-12-01 06:37:34,386 - INFO - Epoch 1 Step 6650 (Global: 6650): loss=1.4029, ppl=4.07, grad_norm=1.38, lr=3.48e-06, throughput=2451 tok/s | |
| 2025-12-01 06:40:51,733 - INFO - Epoch 1 Step 6660 (Global: 6660): loss=1.4329, ppl=4.19, grad_norm=1.68, lr=3.47e-06, throughput=2432 tok/s | |
| 2025-12-01 06:44:08,124 - INFO - Epoch 1 Step 6670 (Global: 6670): loss=1.7745, ppl=5.90, grad_norm=1.72, lr=3.45e-06, throughput=2444 tok/s | |
| 2025-12-01 06:47:25,179 - INFO - Epoch 1 Step 6680 (Global: 6680): loss=1.7237, ppl=5.61, grad_norm=1.51, lr=3.43e-06, throughput=2436 tok/s | |
| 2025-12-01 06:50:41,199 - INFO - Epoch 1 Step 6690 (Global: 6690): loss=1.6598, ppl=5.26, grad_norm=1.75, lr=3.42e-06, throughput=2449 tok/s | |
| 2025-12-01 06:53:57,375 - INFO - Epoch 1 Step 6700 (Global: 6700): loss=1.5999, ppl=4.95, grad_norm=1.09, lr=3.40e-06, throughput=2447 tok/s | |
| 2025-12-01 06:57:13,946 - INFO - Epoch 1 Step 6710 (Global: 6710): loss=1.6203, ppl=5.05, grad_norm=1.62, lr=3.39e-06, throughput=2442 tok/s | |
| 2025-12-01 07:00:30,736 - INFO - Epoch 1 Step 6720 (Global: 6720): loss=1.7222, ppl=5.60, grad_norm=1.65, lr=3.37e-06, throughput=2439 tok/s | |
| 2025-12-01 07:03:46,475 - INFO - Epoch 1 Step 6730 (Global: 6730): loss=1.5679, ppl=4.80, grad_norm=1.59, lr=3.35e-06, throughput=2452 tok/s | |
| 2025-12-01 07:07:02,421 - INFO - Epoch 1 Step 6740 (Global: 6740): loss=1.6420, ppl=5.17, grad_norm=1.45, lr=3.34e-06, throughput=2450 tok/s | |
| 2025-12-01 07:10:17,915 - INFO - Epoch 1 Step 6750 (Global: 6750): loss=1.7245, ppl=5.61, grad_norm=1.79, lr=3.32e-06, throughput=2455 tok/s | |
| 2025-12-01 07:13:32,991 - INFO - Epoch 1 Step 6760 (Global: 6760): loss=1.4684, ppl=4.34, grad_norm=1.13, lr=3.31e-06, throughput=2461 tok/s | |
| 2025-12-01 07:16:48,672 - INFO - Epoch 1 Step 6770 (Global: 6770): loss=1.5101, ppl=4.53, grad_norm=1.21, lr=3.29e-06, throughput=2453 tok/s | |
| 2025-12-01 07:20:04,221 - INFO - Epoch 1 Step 6780 (Global: 6780): loss=1.6372, ppl=5.14, grad_norm=1.54, lr=3.28e-06, throughput=2455 tok/s | |
| 2025-12-01 07:23:20,396 - INFO - Epoch 1 Step 6790 (Global: 6790): loss=1.5202, ppl=4.57, grad_norm=1.48, lr=3.26e-06, throughput=2447 tok/s | |
| 2025-12-01 07:26:40,815 - INFO - Epoch 1 Step 6800 (Global: 6800): loss=1.6703, ppl=5.31, grad_norm=1.76, lr=3.24e-06, throughput=2395 tok/s | |
| 2025-12-01 07:29:57,045 - INFO - Epoch 1 Step 6810 (Global: 6810): loss=1.7475, ppl=5.74, grad_norm=1.72, lr=3.23e-06, throughput=2446 tok/s | |
| 2025-12-01 07:33:15,107 - INFO - Epoch 1 Step 6820 (Global: 6820): loss=1.6985, ppl=5.47, grad_norm=1.46, lr=3.21e-06, throughput=2424 tok/s | |
| 2025-12-01 07:36:29,695 - INFO - Epoch 1 Step 6830 (Global: 6830): loss=1.4844, ppl=4.41, grad_norm=1.74, lr=3.20e-06, throughput=2467 tok/s | |
| 2025-12-01 07:39:44,547 - INFO - Epoch 1 Step 6840 (Global: 6840): loss=1.7370, ppl=5.68, grad_norm=1.73, lr=3.18e-06, throughput=2463 tok/s | |
| 2025-12-01 07:42:59,002 - INFO - Epoch 1 Step 6850 (Global: 6850): loss=1.5859, ppl=4.88, grad_norm=1.92, lr=3.17e-06, throughput=2468 tok/s | |
| 2025-12-01 07:46:14,812 - INFO - Epoch 1 Step 6860 (Global: 6860): loss=1.6567, ppl=5.24, grad_norm=1.35, lr=3.15e-06, throughput=2451 tok/s | |
| 2025-12-01 07:49:32,851 - INFO - Epoch 1 Step 6870 (Global: 6870): loss=1.5440, ppl=4.68, grad_norm=1.59, lr=3.13e-06, throughput=2424 tok/s | |
| 2025-12-01 07:52:50,434 - INFO - Epoch 1 Step 6880 (Global: 6880): loss=1.4144, ppl=4.11, grad_norm=1.20, lr=3.12e-06, throughput=2429 tok/s | |
| 2025-12-01 07:56:08,847 - INFO - Epoch 1 Step 6890 (Global: 6890): loss=1.6014, ppl=4.96, grad_norm=1.59, lr=3.10e-06, throughput=2419 tok/s | |
| 2025-12-01 07:59:26,826 - INFO - Epoch 1 Step 6900 (Global: 6900): loss=1.5809, ppl=4.86, grad_norm=2.61, lr=3.09e-06, throughput=2425 tok/s | |
| 2025-12-01 08:02:45,323 - INFO - Epoch 1 Step 6910 (Global: 6910): loss=1.7424, ppl=5.71, grad_norm=1.17, lr=3.07e-06, throughput=2418 tok/s | |
| 2025-12-01 08:06:04,576 - INFO - Epoch 1 Step 6920 (Global: 6920): loss=1.6672, ppl=5.30, grad_norm=1.84, lr=3.06e-06, throughput=2409 tok/s | |
| 2025-12-01 08:09:22,105 - INFO - Epoch 1 Step 6930 (Global: 6930): loss=1.6187, ppl=5.05, grad_norm=1.70, lr=3.04e-06, throughput=2430 tok/s | |
| 2025-12-01 08:12:39,182 - INFO - Epoch 1 Step 6940 (Global: 6940): loss=1.7753, ppl=5.90, grad_norm=1.74, lr=3.03e-06, throughput=2436 tok/s | |
| 2025-12-01 08:15:56,671 - INFO - Epoch 1 Step 6950 (Global: 6950): loss=1.5976, ppl=4.94, grad_norm=1.70, lr=3.01e-06, throughput=2431 tok/s | |
| 2025-12-01 08:19:14,147 - INFO - Epoch 1 Step 6960 (Global: 6960): loss=1.3463, ppl=3.84, grad_norm=1.52, lr=3.00e-06, throughput=2431 tok/s | |
| 2025-12-01 08:22:30,716 - INFO - Epoch 1 Step 6970 (Global: 6970): loss=1.6252, ppl=5.08, grad_norm=1.58, lr=2.98e-06, throughput=2442 tok/s | |
| 2025-12-01 08:25:46,566 - INFO - Epoch 1 Step 6980 (Global: 6980): loss=1.8987, ppl=6.68, grad_norm=1.25, lr=2.96e-06, throughput=2451 tok/s | |
| 2025-12-01 08:29:02,449 - INFO - Epoch 1 Step 6990 (Global: 6990): loss=1.6414, ppl=5.16, grad_norm=1.89, lr=2.95e-06, throughput=2450 tok/s | |
| 2025-12-01 08:32:18,531 - INFO - Epoch 1 Step 7000 (Global: 7000): loss=1.7902, ppl=5.99, grad_norm=4.12, lr=2.93e-06, throughput=2448 tok/s | |
| 2025-12-01 08:35:36,454 - INFO - Epoch 1 Step 7010 (Global: 7010): loss=1.5944, ppl=4.93, grad_norm=1.12, lr=2.92e-06, throughput=2425 tok/s | |
| 2025-12-01 08:38:55,225 - INFO - Epoch 1 Step 7020 (Global: 7020): loss=1.4426, ppl=4.23, grad_norm=1.70, lr=2.90e-06, throughput=2415 tok/s | |
| 2025-12-01 08:42:13,552 - INFO - Epoch 1 Step 7030 (Global: 7030): loss=1.6179, ppl=5.04, grad_norm=1.60, lr=2.89e-06, throughput=2420 tok/s | |
| 2025-12-01 08:45:31,059 - INFO - Epoch 1 Step 7040 (Global: 7040): loss=1.6082, ppl=4.99, grad_norm=1.34, lr=2.87e-06, throughput=2430 tok/s | |
| 2025-12-01 08:48:47,196 - INFO - Epoch 1 Step 7050 (Global: 7050): loss=1.3635, ppl=3.91, grad_norm=2.28, lr=2.86e-06, throughput=2447 tok/s | |
| 2025-12-01 08:52:04,991 - INFO - Epoch 1 Step 7060 (Global: 7060): loss=1.7816, ppl=5.94, grad_norm=1.38, lr=2.84e-06, throughput=2427 tok/s | |
| 2025-12-01 08:55:21,224 - INFO - Epoch 1 Step 7070 (Global: 7070): loss=1.7460, ppl=5.73, grad_norm=1.78, lr=2.83e-06, throughput=2446 tok/s | |
| 2025-12-01 08:58:37,577 - INFO - Epoch 1 Step 7080 (Global: 7080): loss=1.4046, ppl=4.07, grad_norm=1.10, lr=2.81e-06, throughput=2445 tok/s | |
| 2025-12-01 09:01:56,193 - INFO - Epoch 1 Step 7090 (Global: 7090): loss=1.6137, ppl=5.02, grad_norm=1.12, lr=2.80e-06, throughput=2417 tok/s | |
| 2025-12-01 09:05:13,316 - INFO - Epoch 1 Step 7100 (Global: 7100): loss=1.5901, ppl=4.90, grad_norm=1.41, lr=2.78e-06, throughput=2435 tok/s | |
| 2025-12-01 09:08:30,058 - INFO - Epoch 1 Step 7110 (Global: 7110): loss=1.5181, ppl=4.56, grad_norm=1.86, lr=2.77e-06, throughput=2440 tok/s | |
| 2025-12-01 09:11:45,929 - INFO - Epoch 1 Step 7120 (Global: 7120): loss=1.7180, ppl=5.57, grad_norm=1.42, lr=2.75e-06, throughput=2451 tok/s | |
| 2025-12-01 09:15:02,905 - INFO - Epoch 1 Step 7130 (Global: 7130): loss=1.5932, ppl=4.92, grad_norm=1.48, lr=2.74e-06, throughput=2437 tok/s | |
| 2025-12-01 09:18:20,756 - INFO - Epoch 1 Step 7140 (Global: 7140): loss=1.6345, ppl=5.13, grad_norm=3.52, lr=2.72e-06, throughput=2426 tok/s | |
| 2025-12-01 09:21:36,601 - INFO - Epoch 1 Step 7150 (Global: 7150): loss=1.7274, ppl=5.63, grad_norm=1.73, lr=2.71e-06, throughput=2451 tok/s | |
| 2025-12-01 09:24:53,119 - INFO - Epoch 1 Step 7160 (Global: 7160): loss=1.6876, ppl=5.41, grad_norm=1.51, lr=2.69e-06, throughput=2443 tok/s | |
| 2025-12-01 09:28:10,767 - INFO - Epoch 1 Step 7170 (Global: 7170): loss=1.6448, ppl=5.18, grad_norm=1.34, lr=2.68e-06, throughput=2429 tok/s | |
| 2025-12-01 09:31:28,164 - INFO - Epoch 1 Step 7180 (Global: 7180): loss=1.4794, ppl=4.39, grad_norm=1.45, lr=2.66e-06, throughput=2432 tok/s | |
| 2025-12-01 09:34:44,867 - INFO - Epoch 1 Step 7190 (Global: 7190): loss=1.6871, ppl=5.40, grad_norm=1.67, lr=2.65e-06, throughput=2440 tok/s | |
| 2025-12-01 09:38:01,015 - INFO - Epoch 1 Step 7200 (Global: 7200): loss=1.5984, ppl=4.95, grad_norm=1.28, lr=2.63e-06, throughput=2447 tok/s | |
| 2025-12-01 09:41:17,690 - INFO - Epoch 1 Step 7210 (Global: 7210): loss=1.7735, ppl=5.89, grad_norm=1.71, lr=2.62e-06, throughput=2441 tok/s | |
| 2025-12-01 09:44:33,906 - INFO - Epoch 1 Step 7220 (Global: 7220): loss=1.7317, ppl=5.65, grad_norm=1.67, lr=2.60e-06, throughput=2446 tok/s | |
| 2025-12-01 09:47:51,755 - INFO - Epoch 1 Step 7230 (Global: 7230): loss=1.5082, ppl=4.52, grad_norm=2.88, lr=2.59e-06, throughput=2426 tok/s | |
| 2025-12-01 09:51:06,758 - INFO - Epoch 1 Step 7240 (Global: 7240): loss=1.5095, ppl=4.52, grad_norm=1.62, lr=2.58e-06, throughput=2462 tok/s | |
| 2025-12-01 09:54:21,040 - INFO - Epoch 1 Step 7250 (Global: 7250): loss=1.7503, ppl=5.76, grad_norm=1.24, lr=2.56e-06, throughput=2471 tok/s | |
| 2025-12-01 09:57:36,233 - INFO - Epoch 1 Step 7260 (Global: 7260): loss=1.4999, ppl=4.48, grad_norm=1.25, lr=2.55e-06, throughput=2459 tok/s | |
| 2025-12-01 10:00:52,252 - INFO - Epoch 1 Step 7270 (Global: 7270): loss=1.7447, ppl=5.72, grad_norm=1.85, lr=2.53e-06, throughput=2449 tok/s | |
| 2025-12-01 10:04:08,160 - INFO - Epoch 1 Step 7280 (Global: 7280): loss=1.4662, ppl=4.33, grad_norm=1.17, lr=2.52e-06, throughput=2450 tok/s | |
| 2025-12-01 10:07:24,118 - INFO - Epoch 1 Step 7290 (Global: 7290): loss=1.4280, ppl=4.17, grad_norm=1.09, lr=2.50e-06, throughput=2450 tok/s | |
| 2025-12-01 10:10:39,749 - INFO - Epoch 1 Step 7300 (Global: 7300): loss=1.6672, ppl=5.30, grad_norm=2.12, lr=2.49e-06, throughput=2454 tok/s | |
| 2025-12-01 10:13:56,138 - INFO - Epoch 1 Step 7310 (Global: 7310): loss=1.8129, ppl=6.13, grad_norm=1.79, lr=2.47e-06, throughput=2444 tok/s | |
| 2025-12-01 10:17:12,718 - INFO - Epoch 1 Step 7320 (Global: 7320): loss=1.6953, ppl=5.45, grad_norm=1.47, lr=2.46e-06, throughput=2442 tok/s | |
| 2025-12-01 10:20:29,820 - INFO - Epoch 1 Step 7330 (Global: 7330): loss=1.7512, ppl=5.76, grad_norm=2.09, lr=2.44e-06, throughput=2435 tok/s | |
| 2025-12-01 10:23:45,870 - INFO - Epoch 1 Step 7340 (Global: 7340): loss=1.5066, ppl=4.51, grad_norm=1.09, lr=2.43e-06, throughput=2448 tok/s | |
| 2025-12-01 10:27:01,874 - INFO - Epoch 1 Step 7350 (Global: 7350): loss=1.5882, ppl=4.89, grad_norm=1.38, lr=2.42e-06, throughput=2449 tok/s | |
| 2025-12-01 10:30:18,827 - INFO - Epoch 1 Step 7360 (Global: 7360): loss=1.6115, ppl=5.01, grad_norm=1.41, lr=2.40e-06, throughput=2437 tok/s | |
| 2025-12-01 10:33:36,350 - INFO - Epoch 1 Step 7370 (Global: 7370): loss=1.7321, ppl=5.65, grad_norm=1.32, lr=2.39e-06, throughput=2430 tok/s | |
| 2025-12-01 10:36:54,718 - INFO - Epoch 1 Step 7380 (Global: 7380): loss=1.4615, ppl=4.31, grad_norm=1.38, lr=2.37e-06, throughput=2420 tok/s | |
| 2025-12-01 10:40:11,672 - INFO - Epoch 1 Step 7390 (Global: 7390): loss=1.8455, ppl=6.33, grad_norm=1.76, lr=2.36e-06, throughput=2437 tok/s | |
| 2025-12-01 10:43:27,590 - INFO - Epoch 1 Step 7400 (Global: 7400): loss=1.5447, ppl=4.69, grad_norm=1.09, lr=2.34e-06, throughput=2450 tok/s | |
| 2025-12-01 10:46:43,938 - INFO - Epoch 1 Step 7410 (Global: 7410): loss=1.6086, ppl=5.00, grad_norm=2.39, lr=2.33e-06, throughput=2445 tok/s | |
| 2025-12-01 10:50:01,159 - INFO - Epoch 1 Step 7420 (Global: 7420): loss=1.7147, ppl=5.55, grad_norm=1.84, lr=2.32e-06, throughput=2434 tok/s | |
| 2025-12-01 10:53:23,237 - INFO - Epoch 1 Step 7430 (Global: 7430): loss=1.9956, ppl=7.36, grad_norm=1.60, lr=2.30e-06, throughput=2375 tok/s | |
| 2025-12-01 10:56:45,124 - INFO - Epoch 1 Step 7440 (Global: 7440): loss=1.6376, ppl=5.14, grad_norm=1.38, lr=2.29e-06, throughput=2378 tok/s | |
| 2025-12-01 11:00:05,476 - INFO - Epoch 1 Step 7450 (Global: 7450): loss=1.6525, ppl=5.22, grad_norm=1.63, lr=2.27e-06, throughput=2396 tok/s | |
| 2025-12-01 11:03:25,672 - INFO - Epoch 1 Step 7460 (Global: 7460): loss=1.6644, ppl=5.28, grad_norm=1.39, lr=2.26e-06, throughput=2398 tok/s | |
| 2025-12-01 11:06:43,248 - INFO - Epoch 1 Step 7470 (Global: 7470): loss=1.7297, ppl=5.64, grad_norm=1.20, lr=2.25e-06, throughput=2429 tok/s | |
| 2025-12-01 11:10:03,391 - INFO - Epoch 1 Step 7480 (Global: 7480): loss=1.5191, ppl=4.57, grad_norm=1.90, lr=2.23e-06, throughput=2398 tok/s | |
| 2025-12-01 11:13:21,278 - INFO - Epoch 1 Step 7490 (Global: 7490): loss=1.8294, ppl=6.23, grad_norm=1.33, lr=2.22e-06, throughput=2426 tok/s | |
| 2025-12-01 11:16:39,054 - INFO - Epoch 1 Step 7500 (Global: 7500): loss=1.6158, ppl=5.03, grad_norm=2.42, lr=2.20e-06, throughput=2427 tok/s | |
| 2025-12-01 11:19:56,993 - INFO - Epoch 1 Step 7510 (Global: 7510): loss=1.7329, ppl=5.66, grad_norm=1.37, lr=2.19e-06, throughput=2425 tok/s | |
| 2025-12-01 11:23:15,881 - INFO - Epoch 1 Step 7520 (Global: 7520): loss=1.4360, ppl=4.20, grad_norm=1.33, lr=2.18e-06, throughput=2413 tok/s | |
| 2025-12-01 11:26:34,193 - INFO - Epoch 1 Step 7530 (Global: 7530): loss=1.6190, ppl=5.05, grad_norm=1.97, lr=2.16e-06, throughput=2420 tok/s | |
| 2025-12-01 11:29:51,602 - INFO - Epoch 1 Step 7540 (Global: 7540): loss=1.3339, ppl=3.80, grad_norm=1.33, lr=2.15e-06, throughput=2432 tok/s | |
| 2025-12-01 11:33:08,509 - INFO - Epoch 1 Step 7550 (Global: 7550): loss=1.4767, ppl=4.38, grad_norm=1.35, lr=2.14e-06, throughput=2438 tok/s | |
| 2025-12-01 11:36:26,072 - INFO - Epoch 1 Step 7560 (Global: 7560): loss=1.7047, ppl=5.50, grad_norm=1.59, lr=2.12e-06, throughput=2430 tok/s | |
| 2025-12-01 11:39:43,284 - INFO - Epoch 1 Step 7570 (Global: 7570): loss=1.6304, ppl=5.11, grad_norm=1.57, lr=2.11e-06, throughput=2434 tok/s | |
| 2025-12-01 11:43:01,541 - INFO - Epoch 1 Step 7580 (Global: 7580): loss=1.5135, ppl=4.54, grad_norm=1.58, lr=2.09e-06, throughput=2421 tok/s | |
| 2025-12-01 11:46:18,675 - INFO - Epoch 1 Step 7590 (Global: 7590): loss=1.7279, ppl=5.63, grad_norm=1.23, lr=2.08e-06, throughput=2435 tok/s | |
| 2025-12-01 11:49:34,078 - INFO - Epoch 1 Step 7600 (Global: 7600): loss=1.3945, ppl=4.03, grad_norm=1.47, lr=2.07e-06, throughput=2456 tok/s | |
| 2025-12-01 11:52:50,793 - INFO - Epoch 1 Step 7610 (Global: 7610): loss=1.6581, ppl=5.25, grad_norm=1.60, lr=2.05e-06, throughput=2440 tok/s | |
| 2025-12-01 11:56:06,451 - INFO - Epoch 1 Step 7620 (Global: 7620): loss=1.4529, ppl=4.28, grad_norm=1.16, lr=2.04e-06, throughput=2453 tok/s | |
| 2025-12-01 11:59:22,207 - INFO - Epoch 1 Step 7630 (Global: 7630): loss=1.5709, ppl=4.81, grad_norm=2.17, lr=2.03e-06, throughput=2452 tok/s | |
| 2025-12-01 12:02:39,098 - INFO - Epoch 1 Step 7640 (Global: 7640): loss=1.4771, ppl=4.38, grad_norm=1.80, lr=2.01e-06, throughput=2438 tok/s | |
| 2025-12-01 12:05:55,505 - INFO - Epoch 1 Step 7650 (Global: 7650): loss=1.4946, ppl=4.46, grad_norm=1.03, lr=2.00e-06, throughput=2444 tok/s | |
| 2025-12-01 12:09:11,996 - INFO - Epoch 1 Step 7660 (Global: 7660): loss=1.6902, ppl=5.42, grad_norm=1.45, lr=1.99e-06, throughput=2443 tok/s | |
| 2025-12-01 12:12:27,966 - INFO - Epoch 1 Step 7670 (Global: 7670): loss=1.6660, ppl=5.29, grad_norm=1.22, lr=1.97e-06, throughput=2449 tok/s | |
| 2025-12-01 12:15:50,843 - INFO - Epoch 1 Step 7680 (Global: 7680): loss=1.5924, ppl=4.92, grad_norm=1.97, lr=1.96e-06, throughput=2366 tok/s | |
| 2025-12-01 12:20:11,188 - INFO - Epoch 1 Step 7690 (Global: 7690): loss=1.7560, ppl=5.79, grad_norm=2.34, lr=1.95e-06, throughput=1844 tok/s | |
| 2025-12-01 12:24:37,838 - INFO - Epoch 1 Step 7700 (Global: 7700): loss=1.6172, ppl=5.04, grad_norm=1.87, lr=1.93e-06, throughput=1800 tok/s | |
| 2025-12-01 12:28:32,702 - INFO - Epoch 1 Step 7710 (Global: 7710): loss=1.5709, ppl=4.81, grad_norm=1.89, lr=1.92e-06, throughput=2044 tok/s | |
| 2025-12-01 12:31:55,045 - INFO - Epoch 1 Step 7720 (Global: 7720): loss=1.7900, ppl=5.99, grad_norm=1.98, lr=1.91e-06, throughput=2372 tok/s | |
| 2025-12-01 12:35:14,978 - INFO - Epoch 1 Step 7730 (Global: 7730): loss=1.5563, ppl=4.74, grad_norm=1.31, lr=1.89e-06, throughput=2401 tok/s | |
| 2025-12-01 12:38:36,682 - INFO - Epoch 1 Step 7740 (Global: 7740): loss=1.5242, ppl=4.59, grad_norm=1.37, lr=1.88e-06, throughput=2380 tok/s | |
| 2025-12-01 12:41:57,291 - INFO - Epoch 1 Step 7750 (Global: 7750): loss=1.3372, ppl=3.81, grad_norm=1.32, lr=1.87e-06, throughput=2393 tok/s | |
| 2025-12-01 12:45:16,131 - INFO - Epoch 1 Step 7760 (Global: 7760): loss=1.6983, ppl=5.46, grad_norm=1.47, lr=1.85e-06, throughput=2414 tok/s | |
| 2025-12-01 12:48:35,814 - INFO - Epoch 1 Step 7770 (Global: 7770): loss=1.6480, ppl=5.20, grad_norm=1.81, lr=1.84e-06, throughput=2404 tok/s | |
| 2025-12-01 12:51:56,161 - INFO - Epoch 1 Step 7780 (Global: 7780): loss=1.6394, ppl=5.15, grad_norm=1.47, lr=1.83e-06, throughput=2396 tok/s | |
| 2025-12-01 12:55:15,446 - INFO - Epoch 1 Step 7790 (Global: 7790): loss=1.6327, ppl=5.12, grad_norm=1.19, lr=1.82e-06, throughput=2409 tok/s | |
| 2025-12-01 12:58:36,159 - INFO - Epoch 1 Step 7800 (Global: 7800): loss=1.7058, ppl=5.51, grad_norm=1.58, lr=1.80e-06, throughput=2391 tok/s | |
| 2025-12-01 13:01:57,795 - INFO - Epoch 1 Step 7810 (Global: 7810): loss=1.7617, ppl=5.82, grad_norm=1.24, lr=1.79e-06, throughput=2381 tok/s | |
| 2025-12-01 13:05:16,009 - INFO - Epoch 1 Step 7820 (Global: 7820): loss=1.6258, ppl=5.08, grad_norm=1.12, lr=1.78e-06, throughput=2422 tok/s | |
| 2025-12-01 13:08:38,803 - INFO - Epoch 1 Step 7830 (Global: 7830): loss=1.6032, ppl=4.97, grad_norm=1.61, lr=1.76e-06, throughput=2367 tok/s | |
| 2025-12-01 13:12:03,475 - INFO - Epoch 1 Step 7840 (Global: 7840): loss=1.8764, ppl=6.53, grad_norm=1.18, lr=1.75e-06, throughput=2345 tok/s | |
| 2025-12-01 13:15:27,129 - INFO - Epoch 1 Step 7850 (Global: 7850): loss=1.5173, ppl=4.56, grad_norm=1.15, lr=1.74e-06, throughput=2357 tok/s | |
| 2025-12-01 13:18:49,948 - INFO - Epoch 1 Step 7860 (Global: 7860): loss=1.3444, ppl=3.84, grad_norm=1.57, lr=1.73e-06, throughput=2367 tok/s | |
| 2025-12-01 13:22:13,004 - INFO - Epoch 1 Step 7870 (Global: 7870): loss=1.6548, ppl=5.23, grad_norm=1.38, lr=1.71e-06, throughput=2364 tok/s | |
| 2025-12-01 13:25:36,926 - INFO - Epoch 1 Step 7880 (Global: 7880): loss=1.5831, ppl=4.87, grad_norm=1.62, lr=1.70e-06, throughput=2354 tok/s | |
| 2025-12-01 13:28:58,281 - INFO - Epoch 1 Step 7890 (Global: 7890): loss=1.5432, ppl=4.68, grad_norm=1.97, lr=1.69e-06, throughput=2384 tok/s | |
| 2025-12-01 13:32:23,431 - INFO - Epoch 1 Step 7900 (Global: 7900): loss=1.3669, ppl=3.92, grad_norm=1.27, lr=1.68e-06, throughput=2340 tok/s | |
| 2025-12-01 13:35:52,416 - INFO - Epoch 1 Step 7910 (Global: 7910): loss=1.6902, ppl=5.42, grad_norm=1.61, lr=1.66e-06, throughput=2297 tok/s | |
| 2025-12-01 13:39:17,320 - INFO - Epoch 1 Step 7920 (Global: 7920): loss=1.4296, ppl=4.18, grad_norm=1.55, lr=1.65e-06, throughput=2343 tok/s | |
| 2025-12-01 13:42:37,788 - INFO - Epoch 1 Step 7930 (Global: 7930): loss=1.6865, ppl=5.40, grad_norm=1.18, lr=1.64e-06, throughput=2394 tok/s | |
| 2025-12-01 13:45:57,930 - INFO - Epoch 1 Step 7940 (Global: 7940): loss=1.4530, ppl=4.28, grad_norm=1.45, lr=1.63e-06, throughput=2398 tok/s | |
| 2025-12-01 13:49:19,675 - INFO - Epoch 1 Step 7950 (Global: 7950): loss=1.5327, ppl=4.63, grad_norm=1.22, lr=1.61e-06, throughput=2379 tok/s | |
| 2025-12-01 13:52:40,104 - INFO - Epoch 1 Step 7960 (Global: 7960): loss=1.5915, ppl=4.91, grad_norm=1.09, lr=1.60e-06, throughput=2395 tok/s | |
| 2025-12-01 13:56:00,630 - INFO - Epoch 1 Step 7970 (Global: 7970): loss=1.7796, ppl=5.93, grad_norm=1.77, lr=1.59e-06, throughput=2394 tok/s | |
| 2025-12-01 13:59:20,126 - INFO - Epoch 1 Step 7980 (Global: 7980): loss=1.5383, ppl=4.66, grad_norm=1.46, lr=1.58e-06, throughput=2406 tok/s | |
| 2025-12-01 14:02:39,271 - INFO - Epoch 1 Step 7990 (Global: 7990): loss=1.7803, ppl=5.93, grad_norm=1.43, lr=1.56e-06, throughput=2410 tok/s | |
| 2025-12-01 14:05:57,758 - INFO - Epoch 1 Step 8000 (Global: 8000): loss=1.8287, ppl=6.23, grad_norm=1.24, lr=1.55e-06, throughput=2418 tok/s | |
| 2025-12-01 14:05:57,758 - INFO - | |
| Running validation at step 8000... | |
| 2025-12-01 14:17:24,653 - INFO - Validation loss: 1.6254, perplexity: 5.08 | |
| 2025-12-01 14:17:24,654 - INFO - | |
| ====================================================================== | |
| 2025-12-01 14:17:24,654 - INFO - Qualitative Evaluation Samples: | |
| 2025-12-01 14:17:24,654 - INFO - ====================================================================== | |
| 2025-12-01 14:17:24,654 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-12-01 14:17:24,654 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-12-01 14:17:24,655 - INFO - Generated: ' to the band\'s previous work, stating that "the band\'s sound has evolved, but it\'s still the same band." In a review for The A.V. Club, Fitzmaurice gave the album three stars out of five and said that...' | |
| 2025-12-01 14:17:24,655 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' | |
| 2025-12-01 14:17:24,655 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-01 14:17:24,655 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-12-01 14:17:24,655 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-12-01 14:17:24,656 - INFO - Generated: 'aternity, and the Order of the Arrow has been a major part of the Native American culture since its founding. The Order of the Arrow has been a part of the Native American culture since the 1920s. The...' | |
| 2025-12-01 14:17:24,656 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' | |
| 2025-12-01 14:17:24,656 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-01 14:17:24,656 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-12-01 14:17:24,656 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-12-01 14:17:24,656 - INFO - Generated: " be defeated by Oga and Mikii. Oga is then sent to the Shingetsu Temple to be trained by the Shingetsu's headmaster, Kira, who is revealed to be a demon. Kira is revealed to be a demon who was once a ..." | |
| 2025-12-01 14:17:24,657 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." | |
| 2025-12-01 14:17:24,657 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-01 14:17:24,657 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-12-01 14:17:24,657 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-12-01 14:17:24,658 - INFO - Generated: '-01 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x...' | |
| 2025-12-01 14:17:24,658 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' | |
| 2025-12-01 14:17:24,658 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-01 14:17:24,658 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-12-01 14:17:24,658 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-12-01 14:17:24,658 - INFO - Generated: '1 | Windows | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' | |
| 2025-12-01 14:17:24,659 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' | |
| 2025-12-01 14:17:24,659 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-01 14:17:24,660 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/qualitative_step_8000.jsonl | |
| 2025-12-01 14:18:55,333 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/best_checkpoint.pt | |
| 2025-12-01 14:18:55,350 - INFO - New best validation loss: 1.6254, perplexity: 5.08 | |
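Note on reading these validation lines: the trainer reports both loss and perplexity, and the two are redundant — perplexity is simply the exponential of the mean cross-entropy loss. A minimal sketch confirming the numbers above (the function name `perplexity` is illustrative, not from the training script):

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity is exp(mean cross-entropy loss), as logged by the trainer."""
    return math.exp(loss)

# The best validation line above logs loss=1.6254, perplexity=5.08.
print(round(perplexity(1.6254), 2))  # 5.08
```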
| 2025-12-01 14:22:50,312 - INFO - Epoch 1 Step 8010 (Global: 8010): loss=1.6227, ppl=5.07, grad_norm=2.11, lr=1.54e-06, throughput=2043 tok/s | |
| 2025-12-01 14:26:39,227 - INFO - Epoch 1 Step 8020 (Global: 8020): loss=1.5752, ppl=4.83, grad_norm=2.11, lr=1.53e-06, throughput=2097 tok/s | |
| 2025-12-01 14:29:58,332 - INFO - Epoch 1 Step 8030 (Global: 8030): loss=1.7933, ppl=6.01, grad_norm=2.56, lr=1.52e-06, throughput=2411 tok/s | |
| 2025-12-01 14:33:17,870 - INFO - Epoch 1 Step 8040 (Global: 8040): loss=1.7807, ppl=5.93, grad_norm=1.51, lr=1.50e-06, throughput=2406 tok/s | |
| 2025-12-01 14:36:37,206 - INFO - Epoch 1 Step 8050 (Global: 8050): loss=1.6245, ppl=5.08, grad_norm=1.80, lr=1.49e-06, throughput=2408 tok/s | |
| 2025-12-01 14:39:55,921 - INFO - Epoch 1 Step 8060 (Global: 8060): loss=1.4992, ppl=4.48, grad_norm=1.47, lr=1.48e-06, throughput=2416 tok/s | |
| 2025-12-01 14:43:13,418 - INFO - Epoch 1 Step 8070 (Global: 8070): loss=1.7170, ppl=5.57, grad_norm=2.02, lr=1.47e-06, throughput=2430 tok/s | |
| 2025-12-01 14:46:31,284 - INFO - Epoch 1 Step 8080 (Global: 8080): loss=1.5054, ppl=4.51, grad_norm=1.23, lr=1.46e-06, throughput=2426 tok/s | |
| 2025-12-01 14:49:48,097 - INFO - Epoch 1 Step 8090 (Global: 8090): loss=1.5796, ppl=4.85, grad_norm=1.84, lr=1.44e-06, throughput=2439 tok/s | |
| 2025-12-01 14:53:04,842 - INFO - Epoch 1 Step 8100 (Global: 8100): loss=1.5479, ppl=4.70, grad_norm=1.81, lr=1.43e-06, throughput=2440 tok/s | |
| 2025-12-01 14:56:21,436 - INFO - Epoch 1 Step 8110 (Global: 8110): loss=1.6681, ppl=5.30, grad_norm=1.63, lr=1.42e-06, throughput=2442 tok/s | |
| 2025-12-01 14:59:38,382 - INFO - Epoch 1 Step 8120 (Global: 8120): loss=1.5763, ppl=4.84, grad_norm=1.20, lr=1.41e-06, throughput=2437 tok/s | |
| 2025-12-01 15:02:55,956 - INFO - Epoch 1 Step 8130 (Global: 8130): loss=1.7818, ppl=5.94, grad_norm=1.25, lr=1.40e-06, throughput=2429 tok/s | |
| 2025-12-01 15:06:14,377 - INFO - Epoch 1 Step 8140 (Global: 8140): loss=1.8462, ppl=6.34, grad_norm=2.38, lr=1.39e-06, throughput=2419 tok/s | |
| 2025-12-01 15:09:33,213 - INFO - Epoch 1 Step 8150 (Global: 8150): loss=1.6910, ppl=5.43, grad_norm=2.09, lr=1.37e-06, throughput=2414 tok/s | |
| 2025-12-01 15:12:50,366 - INFO - Epoch 1 Step 8160 (Global: 8160): loss=1.5390, ppl=4.66, grad_norm=1.62, lr=1.36e-06, throughput=2435 tok/s | |
| 2025-12-01 15:16:06,927 - INFO - Epoch 1 Step 8170 (Global: 8170): loss=1.6400, ppl=5.16, grad_norm=1.44, lr=1.35e-06, throughput=2442 tok/s | |
| 2025-12-01 15:19:22,536 - INFO - Epoch 1 Step 8180 (Global: 8180): loss=1.5863, ppl=4.89, grad_norm=1.59, lr=1.34e-06, throughput=2454 tok/s | |
| 2025-12-01 15:22:41,061 - INFO - Epoch 1 Step 8190 (Global: 8190): loss=1.7635, ppl=5.83, grad_norm=1.21, lr=1.33e-06, throughput=2418 tok/s | |
| 2025-12-01 15:25:57,304 - INFO - Epoch 1 Step 8200 (Global: 8200): loss=1.5522, ppl=4.72, grad_norm=1.57, lr=1.32e-06, throughput=2446 tok/s | |
| 2025-12-01 15:29:13,150 - INFO - Epoch 1 Step 8210 (Global: 8210): loss=2.0363, ppl=7.66, grad_norm=1.70, lr=1.31e-06, throughput=2451 tok/s | |
| 2025-12-01 15:32:28,979 - INFO - Epoch 1 Step 8220 (Global: 8220): loss=1.8159, ppl=6.15, grad_norm=1.54, lr=1.29e-06, throughput=2451 tok/s | |
| 2025-12-01 15:35:46,249 - INFO - Epoch 1 Step 8230 (Global: 8230): loss=1.6016, ppl=4.96, grad_norm=1.57, lr=1.28e-06, throughput=2433 tok/s | |
| 2025-12-01 15:39:01,949 - INFO - Epoch 1 Step 8240 (Global: 8240): loss=1.5602, ppl=4.76, grad_norm=3.06, lr=1.27e-06, throughput=2453 tok/s | |
| 2025-12-01 15:42:17,112 - INFO - Epoch 1 Step 8250 (Global: 8250): loss=1.3583, ppl=3.89, grad_norm=1.52, lr=1.26e-06, throughput=2460 tok/s | |
| 2025-12-01 15:45:38,226 - INFO - Epoch 1 Step 8260 (Global: 8260): loss=1.6559, ppl=5.24, grad_norm=1.61, lr=1.25e-06, throughput=2387 tok/s | |
| 2025-12-01 15:49:00,931 - INFO - Epoch 1 Step 8270 (Global: 8270): loss=1.6401, ppl=5.16, grad_norm=2.06, lr=1.24e-06, throughput=2368 tok/s | |
| 2025-12-01 15:52:22,384 - INFO - Epoch 1 Step 8280 (Global: 8280): loss=1.6941, ppl=5.44, grad_norm=1.41, lr=1.23e-06, throughput=2383 tok/s | |
| 2025-12-01 15:56:27,814 - INFO - Epoch 1 Step 8290 (Global: 8290): loss=1.7131, ppl=5.55, grad_norm=2.23, lr=1.22e-06, throughput=1956 tok/s | |
| 2025-12-01 16:00:10,310 - INFO - Epoch 1 Step 8300 (Global: 8300): loss=1.5775, ppl=4.84, grad_norm=1.34, lr=1.21e-06, throughput=2157 tok/s | |
| 2025-12-01 16:03:30,953 - INFO - Epoch 1 Step 8310 (Global: 8310): loss=1.7646, ppl=5.84, grad_norm=1.36, lr=1.20e-06, throughput=2392 tok/s | |
| 2025-12-01 16:06:58,987 - INFO - Epoch 1 Step 8320 (Global: 8320): loss=1.5964, ppl=4.94, grad_norm=1.66, lr=1.18e-06, throughput=2307 tok/s | |
| 2025-12-01 16:10:16,798 - INFO - Epoch 1 Step 8330 (Global: 8330): loss=1.5409, ppl=4.67, grad_norm=1.45, lr=1.17e-06, throughput=2427 tok/s | |
| 2025-12-01 16:13:42,481 - INFO - Epoch 1 Step 8340 (Global: 8340): loss=1.6201, ppl=5.05, grad_norm=1.33, lr=1.16e-06, throughput=2334 tok/s | |
| 2025-12-01 16:17:12,951 - INFO - Epoch 1 Step 8350 (Global: 8350): loss=1.9470, ppl=7.01, grad_norm=1.48, lr=1.15e-06, throughput=2281 tok/s | |
| 2025-12-01 16:20:41,151 - INFO - Epoch 1 Step 8360 (Global: 8360): loss=1.5845, ppl=4.88, grad_norm=1.52, lr=1.14e-06, throughput=2305 tok/s | |
| 2025-12-01 16:24:05,390 - INFO - Epoch 1 Step 8370 (Global: 8370): loss=1.5293, ppl=4.61, grad_norm=1.06, lr=1.13e-06, throughput=2350 tok/s | |
| 2025-12-01 16:27:23,738 - INFO - Epoch 1 Step 8380 (Global: 8380): loss=1.6682, ppl=5.30, grad_norm=1.19, lr=1.12e-06, throughput=2420 tok/s | |
| 2025-12-01 16:30:42,624 - INFO - Epoch 1 Step 8390 (Global: 8390): loss=1.6543, ppl=5.23, grad_norm=1.16, lr=1.11e-06, throughput=2413 tok/s | |
| 2025-12-01 16:34:00,907 - INFO - Epoch 1 Step 8400 (Global: 8400): loss=1.4275, ppl=4.17, grad_norm=1.98, lr=1.10e-06, throughput=2421 tok/s | |
| 2025-12-01 16:37:21,592 - INFO - Epoch 1 Step 8410 (Global: 8410): loss=1.3930, ppl=4.03, grad_norm=1.48, lr=1.09e-06, throughput=2392 tok/s | |
| 2025-12-01 16:40:49,381 - INFO - Epoch 1 Step 8420 (Global: 8420): loss=1.3900, ppl=4.01, grad_norm=1.30, lr=1.08e-06, throughput=2310 tok/s | |
| 2025-12-01 16:44:10,281 - INFO - Epoch 1 Step 8430 (Global: 8430): loss=1.5997, ppl=4.95, grad_norm=1.19, lr=1.07e-06, throughput=2389 tok/s | |
| 2025-12-01 16:47:36,163 - INFO - Epoch 1 Step 8440 (Global: 8440): loss=1.6555, ppl=5.24, grad_norm=1.77, lr=1.06e-06, throughput=2331 tok/s | |
| 2025-12-01 16:51:01,750 - INFO - Epoch 1 Step 8450 (Global: 8450): loss=1.5921, ppl=4.91, grad_norm=1.80, lr=1.05e-06, throughput=2335 tok/s | |
| 2025-12-01 16:54:24,869 - INFO - Epoch 1 Step 8460 (Global: 8460): loss=1.4737, ppl=4.37, grad_norm=1.16, lr=1.04e-06, throughput=2363 tok/s | |
| 2025-12-01 16:57:47,573 - INFO - Epoch 1 Step 8470 (Global: 8470): loss=1.3246, ppl=3.76, grad_norm=2.05, lr=1.03e-06, throughput=2368 tok/s | |
| 2025-12-01 17:01:04,875 - INFO - Epoch 1 Step 8480 (Global: 8480): loss=1.6079, ppl=4.99, grad_norm=2.59, lr=1.02e-06, throughput=2433 tok/s | |
| 2025-12-01 17:04:25,237 - INFO - Epoch 1 Step 8490 (Global: 8490): loss=1.5575, ppl=4.75, grad_norm=1.13, lr=1.01e-06, throughput=2396 tok/s | |
| 2025-12-01 17:07:49,159 - INFO - Epoch 1 Step 8500 (Global: 8500): loss=1.5097, ppl=4.53, grad_norm=2.09, lr=9.96e-07, throughput=2354 tok/s | |
| 2025-12-01 17:11:09,914 - INFO - Epoch 1 Step 8510 (Global: 8510): loss=1.6541, ppl=5.23, grad_norm=1.49, lr=9.86e-07, throughput=2391 tok/s | |
| 2025-12-01 17:14:27,992 - INFO - Epoch 1 Step 8520 (Global: 8520): loss=1.5418, ppl=4.67, grad_norm=1.25, lr=9.76e-07, throughput=2423 tok/s | |
| 2025-12-01 17:17:47,101 - INFO - Epoch 1 Step 8530 (Global: 8530): loss=1.7606, ppl=5.82, grad_norm=1.30, lr=9.67e-07, throughput=2411 tok/s | |
| 2025-12-01 17:21:08,544 - INFO - Epoch 1 Step 8540 (Global: 8540): loss=1.6161, ppl=5.03, grad_norm=1.51, lr=9.57e-07, throughput=2383 tok/s | |
| 2025-12-01 17:24:39,163 - INFO - Epoch 1 Step 8550 (Global: 8550): loss=1.7924, ppl=6.00, grad_norm=1.59, lr=9.47e-07, throughput=2279 tok/s | |
| 2025-12-01 17:28:01,837 - INFO - Epoch 1 Step 8560 (Global: 8560): loss=1.5989, ppl=4.95, grad_norm=1.49, lr=9.37e-07, throughput=2368 tok/s | |
| 2025-12-01 17:31:25,968 - INFO - Epoch 1 Step 8570 (Global: 8570): loss=1.5445, ppl=4.69, grad_norm=1.52, lr=9.27e-07, throughput=2351 tok/s | |
| 2025-12-01 17:35:03,162 - INFO - Epoch 1 Step 8580 (Global: 8580): loss=1.8391, ppl=6.29, grad_norm=1.45, lr=9.18e-07, throughput=2210 tok/s | |
| 2025-12-01 17:39:03,657 - INFO - Epoch 1 Step 8590 (Global: 8590): loss=1.3833, ppl=3.99, grad_norm=1.42, lr=9.08e-07, throughput=1996 tok/s | |
| 2025-12-01 17:42:29,117 - INFO - Epoch 1 Step 8600 (Global: 8600): loss=1.3888, ppl=4.01, grad_norm=1.29, lr=8.98e-07, throughput=2336 tok/s | |
| 2025-12-01 17:45:54,711 - INFO - Epoch 1 Step 8610 (Global: 8610): loss=1.4811, ppl=4.40, grad_norm=1.41, lr=8.89e-07, throughput=2335 tok/s | |
| 2025-12-01 17:49:19,340 - INFO - Epoch 1 Step 8620 (Global: 8620): loss=1.4318, ppl=4.19, grad_norm=1.51, lr=8.79e-07, throughput=2346 tok/s | |
| 2025-12-01 17:52:43,794 - INFO - Epoch 1 Step 8630 (Global: 8630): loss=1.7773, ppl=5.91, grad_norm=2.86, lr=8.70e-07, throughput=2348 tok/s | |
| 2025-12-01 17:56:07,071 - INFO - Epoch 1 Step 8640 (Global: 8640): loss=1.2849, ppl=3.61, grad_norm=1.47, lr=8.60e-07, throughput=2361 tok/s | |
| 2025-12-01 17:59:31,211 - INFO - Epoch 1 Step 8650 (Global: 8650): loss=1.6420, ppl=5.17, grad_norm=1.46, lr=8.51e-07, throughput=2351 tok/s | |
| 2025-12-01 18:02:55,362 - INFO - Epoch 1 Step 8660 (Global: 8660): loss=1.4026, ppl=4.07, grad_norm=1.19, lr=8.42e-07, throughput=2351 tok/s | |
| 2025-12-01 18:06:19,513 - INFO - Epoch 1 Step 8670 (Global: 8670): loss=1.4544, ppl=4.28, grad_norm=1.30, lr=8.32e-07, throughput=2351 tok/s | |
| 2025-12-01 18:09:49,850 - INFO - Epoch 1 Step 8680 (Global: 8680): loss=1.6840, ppl=5.39, grad_norm=1.70, lr=8.23e-07, throughput=2282 tok/s | |
| 2025-12-01 18:13:16,099 - INFO - Epoch 1 Step 8690 (Global: 8690): loss=1.4607, ppl=4.31, grad_norm=1.88, lr=8.14e-07, throughput=2327 tok/s | |
| 2025-12-01 18:16:49,445 - INFO - Epoch 1 Step 8700 (Global: 8700): loss=1.8332, ppl=6.25, grad_norm=1.62, lr=8.05e-07, throughput=2250 tok/s | |
| 2025-12-01 18:20:13,456 - INFO - Epoch 1 Step 8710 (Global: 8710): loss=1.4456, ppl=4.24, grad_norm=1.41, lr=7.96e-07, throughput=2353 tok/s | |
| 2025-12-01 18:23:29,732 - INFO - Epoch 1 Step 8720 (Global: 8720): loss=1.6157, ppl=5.03, grad_norm=1.33, lr=7.87e-07, throughput=2446 tok/s | |
| 2025-12-01 18:26:46,747 - INFO - Epoch 1 Step 8730 (Global: 8730): loss=1.5819, ppl=4.86, grad_norm=1.80, lr=7.78e-07, throughput=2436 tok/s | |
| 2025-12-01 18:30:02,796 - INFO - Epoch 1 Step 8740 (Global: 8740): loss=1.3701, ppl=3.94, grad_norm=1.23, lr=7.69e-07, throughput=2448 tok/s | |
| 2025-12-01 18:33:18,949 - INFO - Epoch 1 Step 8750 (Global: 8750): loss=1.6424, ppl=5.17, grad_norm=1.32, lr=7.60e-07, throughput=2447 tok/s | |
| 2025-12-01 18:36:35,958 - INFO - Epoch 1 Step 8760 (Global: 8760): loss=1.7062, ppl=5.51, grad_norm=1.98, lr=7.51e-07, throughput=2436 tok/s | |
| 2025-12-01 18:39:52,147 - INFO - Epoch 1 Step 8770 (Global: 8770): loss=1.5845, ppl=4.88, grad_norm=1.71, lr=7.42e-07, throughput=2447 tok/s | |
| 2025-12-01 18:43:08,370 - INFO - Epoch 1 Step 8780 (Global: 8780): loss=1.6048, ppl=4.98, grad_norm=1.20, lr=7.33e-07, throughput=2446 tok/s | |
| 2025-12-01 18:46:25,119 - INFO - Epoch 1 Step 8790 (Global: 8790): loss=1.5998, ppl=4.95, grad_norm=1.62, lr=7.25e-07, throughput=2440 tok/s | |
| 2025-12-01 18:49:42,003 - INFO - Epoch 1 Step 8800 (Global: 8800): loss=1.3825, ppl=3.98, grad_norm=1.20, lr=7.16e-07, throughput=2438 tok/s | |
| 2025-12-01 18:52:58,861 - INFO - Epoch 1 Step 8810 (Global: 8810): loss=1.8522, ppl=6.37, grad_norm=1.84, lr=7.07e-07, throughput=2438 tok/s | |
| 2025-12-01 18:56:16,205 - INFO - Epoch 1 Step 8820 (Global: 8820): loss=1.5724, ppl=4.82, grad_norm=1.17, lr=6.99e-07, throughput=2432 tok/s | |
| 2025-12-01 18:59:33,068 - INFO - Epoch 1 Step 8830 (Global: 8830): loss=1.5657, ppl=4.79, grad_norm=1.98, lr=6.90e-07, throughput=2438 tok/s | |
| 2025-12-01 19:02:50,206 - INFO - Epoch 1 Step 8840 (Global: 8840): loss=1.5638, ppl=4.78, grad_norm=1.68, lr=6.82e-07, throughput=2435 tok/s | |
| 2025-12-01 19:06:06,817 - INFO - Epoch 1 Step 8850 (Global: 8850): loss=1.7025, ppl=5.49, grad_norm=1.23, lr=6.74e-07, throughput=2441 tok/s | |
| 2025-12-01 19:09:25,057 - INFO - Epoch 1 Step 8860 (Global: 8860): loss=1.5102, ppl=4.53, grad_norm=0.98, lr=6.65e-07, throughput=2421 tok/s | |
| 2025-12-01 19:12:42,647 - INFO - Epoch 1 Step 8870 (Global: 8870): loss=1.8143, ppl=6.14, grad_norm=1.56, lr=6.57e-07, throughput=2429 tok/s | |
| 2025-12-01 19:16:02,693 - INFO - Epoch 1 Step 8880 (Global: 8880): loss=1.5773, ppl=4.84, grad_norm=2.02, lr=6.49e-07, throughput=2399 tok/s | |
| 2025-12-01 19:20:01,790 - INFO - Epoch 1 Step 8890 (Global: 8890): loss=1.6667, ppl=5.29, grad_norm=1.22, lr=6.40e-07, throughput=2008 tok/s | |
| 2025-12-01 19:25:32,049 - INFO - Epoch 1 Step 8900 (Global: 8900): loss=1.6401, ppl=5.16, grad_norm=1.15, lr=6.32e-07, throughput=1453 tok/s | |
| 2025-12-01 19:28:51,893 - INFO - Epoch 1 Step 8910 (Global: 8910): loss=1.7216, ppl=5.59, grad_norm=1.48, lr=6.24e-07, throughput=2402 tok/s | |
| 2025-12-01 19:32:11,804 - INFO - Epoch 1 Step 8920 (Global: 8920): loss=1.7024, ppl=5.49, grad_norm=1.40, lr=6.16e-07, throughput=2401 tok/s | |
| 2025-12-01 19:35:33,062 - INFO - Epoch 1 Step 8930 (Global: 8930): loss=1.5431, ppl=4.68, grad_norm=1.76, lr=6.08e-07, throughput=2385 tok/s | |
| 2025-12-01 19:38:51,297 - INFO - Epoch 1 Step 8940 (Global: 8940): loss=1.5899, ppl=4.90, grad_norm=1.35, lr=6.00e-07, throughput=2421 tok/s | |
| 2025-12-01 19:42:18,139 - INFO - Epoch 1 Step 8950 (Global: 8950): loss=1.3355, ppl=3.80, grad_norm=3.58, lr=5.92e-07, throughput=2321 tok/s | |
| 2025-12-01 19:45:41,868 - INFO - Epoch 1 Step 8960 (Global: 8960): loss=1.5199, ppl=4.57, grad_norm=1.73, lr=5.84e-07, throughput=2356 tok/s | |
| 2025-12-01 19:49:10,963 - INFO - Epoch 1 Step 8970 (Global: 8970): loss=1.6107, ppl=5.01, grad_norm=1.48, lr=5.76e-07, throughput=2296 tok/s | |
| 2025-12-01 19:52:39,260 - INFO - Epoch 1 Step 8980 (Global: 8980): loss=1.6176, ppl=5.04, grad_norm=1.85, lr=5.68e-07, throughput=2304 tok/s | |
| 2025-12-01 19:56:11,474 - INFO - Epoch 1 Step 8990 (Global: 8990): loss=1.3896, ppl=4.01, grad_norm=1.45, lr=5.61e-07, throughput=2262 tok/s | |
| 2025-12-01 19:59:32,376 - INFO - Epoch 1 Step 9000 (Global: 9000): loss=1.3188, ppl=3.74, grad_norm=1.12, lr=5.53e-07, throughput=2389 tok/s | |
| 2025-12-01 20:02:53,099 - INFO - Epoch 1 Step 9010 (Global: 9010): loss=1.7371, ppl=5.68, grad_norm=1.40, lr=5.45e-07, throughput=2391 tok/s | |
| 2025-12-01 20:06:10,593 - INFO - Epoch 1 Step 9020 (Global: 9020): loss=1.5907, ppl=4.91, grad_norm=1.27, lr=5.38e-07, throughput=2430 tok/s | |
| 2025-12-01 20:09:27,542 - INFO - Epoch 1 Step 9030 (Global: 9030): loss=1.5531, ppl=4.73, grad_norm=1.14, lr=5.30e-07, throughput=2437 tok/s | |
| 2025-12-01 20:12:44,946 - INFO - Epoch 1 Step 9040 (Global: 9040): loss=1.6484, ppl=5.20, grad_norm=2.53, lr=5.23e-07, throughput=2432 tok/s | |
| 2025-12-01 20:16:01,843 - INFO - Epoch 1 Step 9050 (Global: 9050): loss=1.8454, ppl=6.33, grad_norm=1.41, lr=5.15e-07, throughput=2438 tok/s | |
| 2025-12-01 20:19:18,564 - INFO - Epoch 1 Step 9060 (Global: 9060): loss=1.6863, ppl=5.40, grad_norm=1.82, lr=5.08e-07, throughput=2440 tok/s | |
| 2025-12-01 20:22:35,407 - INFO - Epoch 1 Step 9070 (Global: 9070): loss=1.5629, ppl=4.77, grad_norm=1.35, lr=5.01e-07, throughput=2439 tok/s | |
| 2025-12-01 20:25:52,634 - INFO - Epoch 1 Step 9080 (Global: 9080): loss=1.4419, ppl=4.23, grad_norm=1.25, lr=4.93e-07, throughput=2434 tok/s | |
| 2025-12-01 20:29:10,229 - INFO - Epoch 1 Step 9090 (Global: 9090): loss=1.4758, ppl=4.37, grad_norm=1.16, lr=4.86e-07, throughput=2429 tok/s | |
| 2025-12-01 20:32:28,021 - INFO - Epoch 1 Step 9100 (Global: 9100): loss=1.6185, ppl=5.05, grad_norm=1.30, lr=4.79e-07, throughput=2427 tok/s | |
| 2025-12-01 20:35:47,188 - INFO - Epoch 1 Step 9110 (Global: 9110): loss=1.6468, ppl=5.19, grad_norm=1.44, lr=4.72e-07, throughput=2410 tok/s | |
| 2025-12-01 20:39:06,203 - INFO - Epoch 1 Step 9120 (Global: 9120): loss=1.3987, ppl=4.05, grad_norm=1.41, lr=4.65e-07, throughput=2412 tok/s | |
| 2025-12-01 20:42:25,144 - INFO - Epoch 1 Step 9130 (Global: 9130): loss=1.4421, ppl=4.23, grad_norm=1.97, lr=4.58e-07, throughput=2413 tok/s | |
| 2025-12-01 20:45:43,649 - INFO - Epoch 1 Step 9140 (Global: 9140): loss=1.5405, ppl=4.67, grad_norm=1.21, lr=4.51e-07, throughput=2418 tok/s | |
| 2025-12-01 20:49:02,945 - INFO - Epoch 1 Step 9150 (Global: 9150): loss=1.4185, ppl=4.13, grad_norm=1.45, lr=4.44e-07, throughput=2409 tok/s | |
| 2025-12-01 20:52:21,961 - INFO - Epoch 1 Step 9160 (Global: 9160): loss=1.7985, ppl=6.04, grad_norm=1.64, lr=4.37e-07, throughput=2412 tok/s | |
| 2025-12-01 20:55:41,215 - INFO - Epoch 1 Step 9170 (Global: 9170): loss=1.5585, ppl=4.75, grad_norm=1.93, lr=4.30e-07, throughput=2409 tok/s | |
| 2025-12-01 20:59:00,720 - INFO - Epoch 1 Step 9180 (Global: 9180): loss=1.4662, ppl=4.33, grad_norm=1.11, lr=4.23e-07, throughput=2406 tok/s | |
| 2025-12-01 21:02:19,990 - INFO - Epoch 1 Step 9190 (Global: 9190): loss=1.8128, ppl=6.13, grad_norm=1.33, lr=4.17e-07, throughput=2409 tok/s | |
| 2025-12-01 21:05:39,513 - INFO - Epoch 1 Step 9200 (Global: 9200): loss=1.7695, ppl=5.87, grad_norm=3.88, lr=4.10e-07, throughput=2406 tok/s | |
| 2025-12-01 21:08:58,549 - INFO - Epoch 1 Step 9210 (Global: 9210): loss=1.8198, ppl=6.17, grad_norm=1.30, lr=4.03e-07, throughput=2412 tok/s | |
| 2025-12-01 21:12:18,400 - INFO - Epoch 1 Step 9220 (Global: 9220): loss=1.7082, ppl=5.52, grad_norm=1.98, lr=3.97e-07, throughput=2402 tok/s | |
| 2025-12-01 21:17:53,548 - INFO - Epoch 1 Step 9230 (Global: 9230): loss=1.3893, ppl=4.01, grad_norm=1.23, lr=3.90e-07, throughput=1432 tok/s | |
| 2025-12-01 21:21:16,332 - INFO - Epoch 1 Step 9240 (Global: 9240): loss=1.5647, ppl=4.78, grad_norm=1.61, lr=3.84e-07, throughput=2367 tok/s | |
| 2025-12-01 21:24:36,111 - INFO - Epoch 1 Step 9250 (Global: 9250): loss=1.5539, ppl=4.73, grad_norm=1.50, lr=3.77e-07, throughput=2403 tok/s | |
| 2025-12-01 21:27:55,574 - INFO - Epoch 1 Step 9260 (Global: 9260): loss=1.5684, ppl=4.80, grad_norm=1.48, lr=3.71e-07, throughput=2406 tok/s | |
| 2025-12-01 21:31:16,254 - INFO - Epoch 1 Step 9270 (Global: 9270): loss=1.5511, ppl=4.72, grad_norm=1.32, lr=3.65e-07, throughput=2392 tok/s | |
| 2025-12-01 21:34:35,786 - INFO - Epoch 1 Step 9280 (Global: 9280): loss=1.5747, ppl=4.83, grad_norm=1.11, lr=3.58e-07, throughput=2406 tok/s | |
| 2025-12-01 21:37:55,568 - INFO - Epoch 1 Step 9290 (Global: 9290): loss=1.4973, ppl=4.47, grad_norm=2.52, lr=3.52e-07, throughput=2403 tok/s | |
| 2025-12-01 21:41:15,235 - INFO - Epoch 1 Step 9300 (Global: 9300): loss=1.4671, ppl=4.34, grad_norm=1.53, lr=3.46e-07, throughput=2404 tok/s | |
| 2025-12-01 21:44:35,247 - INFO - Epoch 1 Step 9310 (Global: 9310): loss=1.3856, ppl=4.00, grad_norm=1.77, lr=3.40e-07, throughput=2400 tok/s | |
| 2025-12-01 21:47:54,297 - INFO - Epoch 1 Step 9320 (Global: 9320): loss=1.5386, ppl=4.66, grad_norm=1.20, lr=3.34e-07, throughput=2411 tok/s | |
| 2025-12-01 21:51:14,159 - INFO - Epoch 1 Step 9330 (Global: 9330): loss=1.6950, ppl=5.45, grad_norm=1.35, lr=3.28e-07, throughput=2402 tok/s | |
| 2025-12-01 21:54:34,356 - INFO - Epoch 1 Step 9340 (Global: 9340): loss=1.5751, ppl=4.83, grad_norm=1.48, lr=3.22e-07, throughput=2398 tok/s | |
| 2025-12-01 21:57:54,306 - INFO - Epoch 1 Step 9350 (Global: 9350): loss=1.5364, ppl=4.65, grad_norm=1.57, lr=3.16e-07, throughput=2401 tok/s | |
| 2025-12-01 22:01:15,454 - INFO - Epoch 1 Step 9360 (Global: 9360): loss=1.3444, ppl=3.84, grad_norm=1.22, lr=3.10e-07, throughput=2386 tok/s | |
| 2025-12-01 22:04:33,558 - INFO - Epoch 1 Step 9370 (Global: 9370): loss=1.5099, ppl=4.53, grad_norm=1.37, lr=3.05e-07, throughput=2423 tok/s | |
| 2025-12-01 22:07:53,799 - INFO - Epoch 1 Step 9380 (Global: 9380): loss=1.7279, ppl=5.63, grad_norm=1.63, lr=2.99e-07, throughput=2397 tok/s | |
| 2025-12-01 22:11:13,893 - INFO - Epoch 1 Step 9390 (Global: 9390): loss=1.3389, ppl=3.81, grad_norm=1.31, lr=2.93e-07, throughput=2399 tok/s | |
| 2025-12-01 22:14:33,237 - INFO - Epoch 1 Step 9400 (Global: 9400): loss=1.5063, ppl=4.51, grad_norm=1.25, lr=2.88e-07, throughput=2408 tok/s | |
| 2025-12-01 22:17:52,847 - INFO - Epoch 1 Step 9410 (Global: 9410): loss=1.6514, ppl=5.21, grad_norm=1.31, lr=2.82e-07, throughput=2405 tok/s | |
| 2025-12-01 22:21:12,073 - INFO - Epoch 1 Step 9420 (Global: 9420): loss=1.4929, ppl=4.45, grad_norm=2.52, lr=2.76e-07, throughput=2409 tok/s | |
| 2025-12-01 22:24:30,660 - INFO - Epoch 1 Step 9430 (Global: 9430): loss=1.7168, ppl=5.57, grad_norm=1.37, lr=2.71e-07, throughput=2417 tok/s | |
| 2025-12-01 22:27:49,977 - INFO - Epoch 1 Step 9440 (Global: 9440): loss=1.5179, ppl=4.56, grad_norm=1.27, lr=2.66e-07, throughput=2408 tok/s | |
| 2025-12-01 22:31:08,573 - INFO - Epoch 1 Step 9450 (Global: 9450): loss=1.5960, ppl=4.93, grad_norm=1.38, lr=2.60e-07, throughput=2417 tok/s | |
| 2025-12-01 22:34:27,457 - INFO - Epoch 1 Step 9460 (Global: 9460): loss=1.8515, ppl=6.37, grad_norm=1.21, lr=2.55e-07, throughput=2413 tok/s | |
| 2025-12-01 22:37:47,467 - INFO - Epoch 1 Step 9470 (Global: 9470): loss=1.6755, ppl=5.34, grad_norm=1.11, lr=2.50e-07, throughput=2400 tok/s | |
| 2025-12-01 22:41:06,931 - INFO - Epoch 1 Step 9480 (Global: 9480): loss=1.6719, ppl=5.32, grad_norm=1.20, lr=2.44e-07, throughput=2406 tok/s | |
| 2025-12-01 22:44:27,210 - INFO - Epoch 1 Step 9490 (Global: 9490): loss=1.8204, ppl=6.17, grad_norm=1.48, lr=2.39e-07, throughput=2397 tok/s | |
| 2025-12-01 22:47:47,190 - INFO - Epoch 1 Step 9500 (Global: 9500): loss=1.5407, ppl=4.67, grad_norm=1.30, lr=2.34e-07, throughput=2400 tok/s | |
| 2025-12-01 22:51:06,856 - INFO - Epoch 1 Step 9510 (Global: 9510): loss=1.6991, ppl=5.47, grad_norm=4.19, lr=2.29e-07, throughput=2404 tok/s | |
| 2025-12-01 22:54:26,582 - INFO - Epoch 1 Step 9520 (Global: 9520): loss=1.8507, ppl=6.36, grad_norm=1.22, lr=2.24e-07, throughput=2403 tok/s | |
| 2025-12-01 22:57:47,290 - INFO - Epoch 1 Step 9530 (Global: 9530): loss=1.6330, ppl=5.12, grad_norm=1.44, lr=2.19e-07, throughput=2392 tok/s | |
| 2025-12-01 23:01:06,320 - INFO - Epoch 1 Step 9540 (Global: 9540): loss=1.6966, ppl=5.46, grad_norm=1.41, lr=2.14e-07, throughput=2412 tok/s | |
| 2025-12-01 23:04:25,837 - INFO - Epoch 1 Step 9550 (Global: 9550): loss=1.5937, ppl=4.92, grad_norm=1.84, lr=2.10e-07, throughput=2406 tok/s | |
| 2025-12-01 23:07:45,057 - INFO - Epoch 1 Step 9560 (Global: 9560): loss=1.6616, ppl=5.27, grad_norm=1.23, lr=2.05e-07, throughput=2409 tok/s | |
| 2025-12-01 23:11:05,443 - INFO - Epoch 1 Step 9570 (Global: 9570): loss=1.9132, ppl=6.77, grad_norm=2.17, lr=2.00e-07, throughput=2395 tok/s | |
| 2025-12-01 23:14:25,082 - INFO - Epoch 1 Step 9580 (Global: 9580): loss=1.5264, ppl=4.60, grad_norm=2.97, lr=1.95e-07, throughput=2404 tok/s | |
| 2025-12-01 23:17:44,601 - INFO - Epoch 1 Step 9590 (Global: 9590): loss=1.6374, ppl=5.14, grad_norm=1.13, lr=1.91e-07, throughput=2406 tok/s | |
| 2025-12-01 23:21:04,449 - INFO - Epoch 1 Step 9600 (Global: 9600): loss=1.5726, ppl=4.82, grad_norm=1.17, lr=1.86e-07, throughput=2402 tok/s | |
| 2025-12-01 23:24:23,543 - INFO - Epoch 1 Step 9610 (Global: 9610): loss=1.6539, ppl=5.23, grad_norm=1.55, lr=1.82e-07, throughput=2411 tok/s | |
| 2025-12-01 23:27:42,399 - INFO - Epoch 1 Step 9620 (Global: 9620): loss=1.8480, ppl=6.35, grad_norm=1.73, lr=1.77e-07, throughput=2414 tok/s | |
| 2025-12-01 23:31:01,819 - INFO - Epoch 1 Step 9630 (Global: 9630): loss=1.8712, ppl=6.50, grad_norm=1.67, lr=1.73e-07, throughput=2407 tok/s | |
| 2025-12-01 23:34:20,611 - INFO - Epoch 1 Step 9640 (Global: 9640): loss=1.7248, ppl=5.61, grad_norm=1.53, lr=1.68e-07, throughput=2415 tok/s | |
| 2025-12-01 23:37:39,803 - INFO - Epoch 1 Step 9650 (Global: 9650): loss=1.5941, ppl=4.92, grad_norm=2.08, lr=1.64e-07, throughput=2410 tok/s | |
| 2025-12-01 23:40:59,352 - INFO - Epoch 1 Step 9660 (Global: 9660): loss=1.7921, ppl=6.00, grad_norm=1.29, lr=1.60e-07, throughput=2405 tok/s | |
| 2025-12-01 23:44:19,047 - INFO - Epoch 1 Step 9670 (Global: 9670): loss=1.7397, ppl=5.70, grad_norm=1.86, lr=1.56e-07, throughput=2404 tok/s | |
| 2025-12-01 23:47:39,225 - INFO - Epoch 1 Step 9680 (Global: 9680): loss=1.4493, ppl=4.26, grad_norm=2.22, lr=1.52e-07, throughput=2398 tok/s | |
| 2025-12-01 23:50:57,998 - INFO - Epoch 1 Step 9690 (Global: 9690): loss=1.5719, ppl=4.82, grad_norm=1.59, lr=1.48e-07, throughput=2415 tok/s | |
| 2025-12-01 23:54:16,875 - INFO - Epoch 1 Step 9700 (Global: 9700): loss=1.7437, ppl=5.72, grad_norm=1.48, lr=1.44e-07, throughput=2414 tok/s | |
| 2025-12-01 23:57:36,769 - INFO - Epoch 1 Step 9710 (Global: 9710): loss=1.7706, ppl=5.87, grad_norm=2.22, lr=1.40e-07, throughput=2401 tok/s | |
| 2025-12-02 00:00:56,598 - INFO - Epoch 1 Step 9720 (Global: 9720): loss=1.5012, ppl=4.49, grad_norm=1.63, lr=1.36e-07, throughput=2402 tok/s | |
| 2025-12-02 00:04:15,632 - INFO - Epoch 1 Step 9730 (Global: 9730): loss=1.7994, ppl=6.05, grad_norm=1.24, lr=1.32e-07, throughput=2412 tok/s | |
| 2025-12-02 00:07:35,849 - INFO - Epoch 1 Step 9740 (Global: 9740): loss=1.5662, ppl=4.79, grad_norm=1.33, lr=1.28e-07, throughput=2397 tok/s | |
| 2025-12-02 00:10:55,869 - INFO - Epoch 1 Step 9750 (Global: 9750): loss=1.5248, ppl=4.59, grad_norm=1.42, lr=1.24e-07, throughput=2400 tok/s | |
| 2025-12-02 00:14:14,727 - INFO - Epoch 1 Step 9760 (Global: 9760): loss=1.6835, ppl=5.38, grad_norm=1.45, lr=1.21e-07, throughput=2414 tok/s | |
| 2025-12-02 00:17:34,865 - INFO - Epoch 1 Step 9770 (Global: 9770): loss=1.5989, ppl=4.95, grad_norm=1.29, lr=1.17e-07, throughput=2398 tok/s | |
| 2025-12-02 00:20:54,711 - INFO - Epoch 1 Step 9780 (Global: 9780): loss=1.7228, ppl=5.60, grad_norm=1.97, lr=1.13e-07, throughput=2402 tok/s | |
| 2025-12-02 00:24:13,993 - INFO - Epoch 1 Step 9790 (Global: 9790): loss=1.7456, ppl=5.73, grad_norm=1.67, lr=1.10e-07, throughput=2409 tok/s | |
| 2025-12-02 00:27:34,057 - INFO - Epoch 1 Step 9800 (Global: 9800): loss=1.7351, ppl=5.67, grad_norm=1.77, lr=1.06e-07, throughput=2399 tok/s | |
| 2025-12-02 00:30:53,092 - INFO - Epoch 1 Step 9810 (Global: 9810): loss=1.5327, ppl=4.63, grad_norm=1.53, lr=1.03e-07, throughput=2412 tok/s | |
| 2025-12-02 00:34:11,622 - INFO - Epoch 1 Step 9820 (Global: 9820): loss=1.7539, ppl=5.78, grad_norm=1.80, lr=9.97e-08, throughput=2418 tok/s | |
| 2025-12-02 00:37:32,006 - INFO - Epoch 1 Step 9830 (Global: 9830): loss=1.5025, ppl=4.49, grad_norm=2.12, lr=9.64e-08, throughput=2395 tok/s | |
| 2025-12-02 00:40:51,305 - INFO - Epoch 1 Step 9840 (Global: 9840): loss=1.6638, ppl=5.28, grad_norm=1.62, lr=9.32e-08, throughput=2408 tok/s | |
| 2025-12-02 00:44:10,787 - INFO - Epoch 1 Step 9850 (Global: 9850): loss=1.6013, ppl=4.96, grad_norm=1.23, lr=9.00e-08, throughput=2406 tok/s | |
| 2025-12-02 00:47:29,218 - INFO - Epoch 1 Step 9860 (Global: 9860): loss=1.6424, ppl=5.17, grad_norm=1.47, lr=8.68e-08, throughput=2419 tok/s | |
| 2025-12-02 00:50:48,596 - INFO - Epoch 1 Step 9870 (Global: 9870): loss=1.3467, ppl=3.84, grad_norm=1.91, lr=8.37e-08, throughput=2408 tok/s | |
| 2025-12-02 00:54:07,590 - INFO - Epoch 1 Step 9880 (Global: 9880): loss=1.5090, ppl=4.52, grad_norm=1.23, lr=8.07e-08, throughput=2412 tok/s | |
| 2025-12-02 00:57:26,295 - INFO - Epoch 1 Step 9890 (Global: 9890): loss=1.5690, ppl=4.80, grad_norm=1.58, lr=7.77e-08, throughput=2416 tok/s | |
| 2025-12-02 01:00:45,723 - INFO - Epoch 1 Step 9900 (Global: 9900): loss=1.3660, ppl=3.92, grad_norm=1.45, lr=7.48e-08, throughput=2407 tok/s | |
| 2025-12-02 01:04:05,052 - INFO - Epoch 1 Step 9910 (Global: 9910): loss=1.5030, ppl=4.50, grad_norm=1.48, lr=7.20e-08, throughput=2408 tok/s | |
| 2025-12-02 01:07:22,040 - INFO - Epoch 1 Step 9920 (Global: 9920): loss=1.4839, ppl=4.41, grad_norm=1.39, lr=6.92e-08, throughput=2437 tok/s | |
| 2025-12-02 01:10:43,454 - INFO - Epoch 1 Step 9930 (Global: 9930): loss=1.6976, ppl=5.46, grad_norm=1.44, lr=6.64e-08, throughput=2383 tok/s | |
| 2025-12-02 01:14:02,990 - INFO - Epoch 1 Step 9940 (Global: 9940): loss=1.4906, ppl=4.44, grad_norm=1.27, lr=6.37e-08, throughput=2406 tok/s | |
| 2025-12-02 01:17:22,399 - INFO - Epoch 1 Step 9950 (Global: 9950): loss=1.6766, ppl=5.35, grad_norm=1.38, lr=6.11e-08, throughput=2407 tok/s | |
| 2025-12-02 01:20:42,108 - INFO - Epoch 1 Step 9960 (Global: 9960): loss=1.7471, ppl=5.74, grad_norm=1.12, lr=5.85e-08, throughput=2404 tok/s | |
| 2025-12-02 01:24:01,540 - INFO - Epoch 1 Step 9970 (Global: 9970): loss=1.7476, ppl=5.74, grad_norm=1.30, lr=5.60e-08, throughput=2407 tok/s | |
| 2025-12-02 01:27:21,919 - INFO - Epoch 1 Step 9980 (Global: 9980): loss=1.6836, ppl=5.39, grad_norm=1.25, lr=5.35e-08, throughput=2395 tok/s | |
| 2025-12-02 01:30:41,852 - INFO - Epoch 1 Step 9990 (Global: 9990): loss=1.5744, ppl=4.83, grad_norm=1.45, lr=5.11e-08, throughput=2401 tok/s | |
| 2025-12-02 01:34:01,107 - INFO - Epoch 1 Step 10000 (Global: 10000): loss=1.7636, ppl=5.83, grad_norm=2.38, lr=4.87e-08, throughput=2409 tok/s | |
| 2025-12-02 01:34:01,107 - INFO - | |
| Running validation at step 10000... | |
| 2025-12-02 01:45:28,849 - INFO - Validation loss: 1.6221, perplexity: 5.06 | |
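The loss/perplexity pairs throughout this log are consistent with perplexity being the exponential of the mean token-level cross-entropy (in nats). A minimal check against two values that appear above:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Perplexity as the exponential of the mean cross-entropy loss (nats)."""
    return math.exp(mean_ce_loss)

# Spot-check pairs logged above:
print(round(perplexity(1.6221), 2))  # 5.06  (validation at step 10000)
print(round(perplexity(1.3142), 2))  # 3.72  (training step 10390)
```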
| 2025-12-02 01:45:28,849 - INFO - | |
| ====================================================================== | |
| 2025-12-02 01:45:28,850 - INFO - Qualitative Evaluation Samples: | |
| 2025-12-02 01:45:28,850 - INFO - ====================================================================== | |
| 2025-12-02 01:45:28,850 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-12-02 01:45:28,850 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-12-02 01:45:28,851 - INFO - Generated: ' to the band\'s previous work, stating that "the band\'s sound is more mature and more consistent than ever, and the album is a testament to that." In a review for The A.V. Club, Fitsmaurice gave the al...' | |
| 2025-12-02 01:45:28,851 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' | |
| 2025-12-02 01:45:28,851 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-02 01:45:28,851 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-12-02 01:45:28,851 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-12-02 01:45:28,852 - INFO - Generated: 'aternity and sorority life. The Order of the Arrow was founded in 1920 by a group of fraternity and sorority members who were concerned about the lack of Native American involvement in the organizatio...' | |
| 2025-12-02 01:45:28,852 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' | |
| 2025-12-02 01:45:28,852 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-02 01:45:28,852 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-12-02 01:45:28,853 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-12-02 01:45:28,853 - INFO - Generated: ' be defeated by Oga and Mikii. Oga is later seen in the manga series, where he is revealed to be the son of a priest who was killed by the Shingetsu Temple. He is later revealed to be the son of a pri...' | |
| 2025-12-02 01:45:28,853 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." | |
| 2025-12-02 01:45:28,853 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-02 01:45:28,854 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-12-02 01:45:28,854 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-12-02 01:45:28,854 - INFO - Generated: '-01 | 0x0000 0x0001 0x0002 0x0003 0x0004 0x0005 0x0006 0x0007 0x0008 0x0009 0x000A 0x000B 0x000C 0x000D 0x000E 0x000F 0x0010 0x0011 0x0012 0x0013 0x0014 0x0015 0x0016 0x0017 0x0018 0x0019 0x001A 0x001...' | |
| 2025-12-02 01:45:28,854 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' | |
| 2025-12-02 01:45:28,854 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-02 01:45:28,854 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-12-02 01:45:28,855 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-12-02 01:45:28,855 - INFO - Generated: '1 | Windows | EA Tiburon ...' | |
| 2025-12-02 01:45:28,856 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' | |
| 2025-12-02 01:45:28,856 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-02 01:45:28,858 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/qualitative_step_10000.jsonl | |
| 2025-12-02 01:46:59,682 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/best_checkpoint.pt | |
| 2025-12-02 01:46:59,692 - INFO - New best validation loss: 1.6221, perplexity: 5.06 | |
| 2025-12-02 01:50:18,049 - INFO - Epoch 1 Step 10010 (Global: 10010): loss=1.7448, ppl=5.72, grad_norm=1.45, lr=4.64e-08, throughput=2420 tok/s | |
| 2025-12-02 01:53:34,268 - INFO - Epoch 1 Step 10020 (Global: 10020): loss=1.6747, ppl=5.34, grad_norm=1.41, lr=4.42e-08, throughput=2446 tok/s | |
| 2025-12-02 01:56:52,856 - INFO - Epoch 1 Step 10030 (Global: 10030): loss=1.6581, ppl=5.25, grad_norm=4.25, lr=4.20e-08, throughput=2417 tok/s | |
| 2025-12-02 02:00:11,496 - INFO - Epoch 1 Step 10040 (Global: 10040): loss=1.8670, ppl=6.47, grad_norm=2.44, lr=3.98e-08, throughput=2416 tok/s | |
| 2025-12-02 02:03:29,871 - INFO - Epoch 1 Step 10050 (Global: 10050): loss=1.7136, ppl=5.55, grad_norm=1.26, lr=3.78e-08, throughput=2420 tok/s | |
| 2025-12-02 02:06:49,475 - INFO - Epoch 1 Step 10060 (Global: 10060): loss=1.4865, ppl=4.42, grad_norm=1.35, lr=3.57e-08, throughput=2405 tok/s | |
| 2025-12-02 02:10:08,411 - INFO - Epoch 1 Step 10070 (Global: 10070): loss=1.6521, ppl=5.22, grad_norm=2.23, lr=3.38e-08, throughput=2413 tok/s | |
| 2025-12-02 02:13:27,247 - INFO - Epoch 1 Step 10080 (Global: 10080): loss=1.5917, ppl=4.91, grad_norm=1.31, lr=3.18e-08, throughput=2414 tok/s | |
| 2025-12-02 02:16:46,228 - INFO - Epoch 1 Step 10090 (Global: 10090): loss=1.6375, ppl=5.14, grad_norm=1.57, lr=3.00e-08, throughput=2412 tok/s | |
| 2025-12-02 02:20:04,218 - INFO - Epoch 1 Step 10100 (Global: 10100): loss=1.5952, ppl=4.93, grad_norm=1.47, lr=2.82e-08, throughput=2424 tok/s | |
| 2025-12-02 02:23:23,854 - INFO - Epoch 1 Step 10110 (Global: 10110): loss=1.6344, ppl=5.13, grad_norm=1.34, lr=2.64e-08, throughput=2404 tok/s | |
| 2025-12-02 02:26:41,240 - INFO - Epoch 1 Step 10120 (Global: 10120): loss=1.5422, ppl=4.67, grad_norm=2.11, lr=2.47e-08, throughput=2432 tok/s | |
| 2025-12-02 02:30:00,603 - INFO - Epoch 1 Step 10130 (Global: 10130): loss=1.9132, ppl=6.77, grad_norm=1.30, lr=2.31e-08, throughput=2408 tok/s | |
| 2025-12-02 02:33:18,334 - INFO - Epoch 1 Step 10140 (Global: 10140): loss=1.5186, ppl=4.57, grad_norm=8.75, lr=2.15e-08, throughput=2428 tok/s | |
| 2025-12-02 02:36:37,271 - INFO - Epoch 1 Step 10150 (Global: 10150): loss=1.7339, ppl=5.66, grad_norm=1.41, lr=2.00e-08, throughput=2413 tok/s | |
| 2025-12-02 02:39:55,647 - INFO - Epoch 1 Step 10160 (Global: 10160): loss=1.7815, ppl=5.94, grad_norm=1.42, lr=1.85e-08, throughput=2420 tok/s | |
| 2025-12-02 02:43:14,222 - INFO - Epoch 1 Step 10170 (Global: 10170): loss=1.7469, ppl=5.74, grad_norm=1.34, lr=1.71e-08, throughput=2417 tok/s | |
| 2025-12-02 02:46:33,374 - INFO - Epoch 1 Step 10180 (Global: 10180): loss=1.5860, ppl=4.88, grad_norm=2.22, lr=1.58e-08, throughput=2410 tok/s | |
| 2025-12-02 02:49:52,054 - INFO - Epoch 1 Step 10190 (Global: 10190): loss=1.7610, ppl=5.82, grad_norm=1.27, lr=1.45e-08, throughput=2416 tok/s | |
| 2025-12-02 02:53:11,065 - INFO - Epoch 1 Step 10200 (Global: 10200): loss=1.7640, ppl=5.84, grad_norm=1.21, lr=1.32e-08, throughput=2412 tok/s | |
| 2025-12-02 02:56:29,148 - INFO - Epoch 1 Step 10210 (Global: 10210): loss=1.4805, ppl=4.40, grad_norm=1.55, lr=1.20e-08, throughput=2423 tok/s | |
| 2025-12-02 02:59:47,130 - INFO - Epoch 1 Step 10220 (Global: 10220): loss=1.5491, ppl=4.71, grad_norm=1.98, lr=1.09e-08, throughput=2424 tok/s | |
| 2025-12-02 03:03:06,538 - INFO - Epoch 1 Step 10230 (Global: 10230): loss=1.6250, ppl=5.08, grad_norm=2.36, lr=9.81e-09, throughput=2407 tok/s | |
| 2025-12-02 03:06:24,475 - INFO - Epoch 1 Step 10240 (Global: 10240): loss=1.6811, ppl=5.37, grad_norm=1.14, lr=8.79e-09, throughput=2425 tok/s | |
| 2025-12-02 03:09:43,044 - INFO - Epoch 1 Step 10250 (Global: 10250): loss=1.6413, ppl=5.16, grad_norm=1.64, lr=7.83e-09, throughput=2417 tok/s | |
| 2025-12-02 03:13:01,492 - INFO - Epoch 1 Step 10260 (Global: 10260): loss=1.8152, ppl=6.14, grad_norm=1.34, lr=6.92e-09, throughput=2419 tok/s | |
| 2025-12-02 03:16:19,451 - INFO - Epoch 1 Step 10270 (Global: 10270): loss=1.7363, ppl=5.68, grad_norm=1.46, lr=6.06e-09, throughput=2425 tok/s | |
| 2025-12-02 03:19:38,704 - INFO - Epoch 1 Step 10280 (Global: 10280): loss=1.7360, ppl=5.67, grad_norm=1.21, lr=5.27e-09, throughput=2409 tok/s | |
| 2025-12-02 03:22:56,677 - INFO - Epoch 1 Step 10290 (Global: 10290): loss=1.7065, ppl=5.51, grad_norm=1.31, lr=4.53e-09, throughput=2425 tok/s | |
| 2025-12-02 03:26:14,594 - INFO - Epoch 1 Step 10300 (Global: 10300): loss=1.7015, ppl=5.48, grad_norm=1.43, lr=3.84e-09, throughput=2425 tok/s | |
| 2025-12-02 03:29:33,664 - INFO - Epoch 1 Step 10310 (Global: 10310): loss=1.7053, ppl=5.50, grad_norm=2.22, lr=3.21e-09, throughput=2411 tok/s | |
| 2025-12-02 03:32:52,519 - INFO - Epoch 1 Step 10320 (Global: 10320): loss=1.6265, ppl=5.09, grad_norm=1.35, lr=2.64e-09, throughput=2414 tok/s | |
| 2025-12-02 03:36:11,298 - INFO - Epoch 1 Step 10330 (Global: 10330): loss=1.7688, ppl=5.86, grad_norm=1.65, lr=2.12e-09, throughput=2415 tok/s | |
| 2025-12-02 03:39:29,368 - INFO - Epoch 1 Step 10340 (Global: 10340): loss=1.6135, ppl=5.02, grad_norm=1.12, lr=1.66e-09, throughput=2423 tok/s | |
| 2025-12-02 03:42:47,216 - INFO - Epoch 1 Step 10350 (Global: 10350): loss=1.6621, ppl=5.27, grad_norm=1.26, lr=1.26e-09, throughput=2426 tok/s | |
| 2025-12-02 03:46:06,244 - INFO - Epoch 1 Step 10360 (Global: 10360): loss=1.6736, ppl=5.33, grad_norm=1.34, lr=9.12e-10, throughput=2412 tok/s | |
| 2025-12-02 03:49:24,567 - INFO - Epoch 1 Step 10370 (Global: 10370): loss=1.5200, ppl=4.57, grad_norm=1.51, lr=6.20e-10, throughput=2420 tok/s | |
| 2025-12-02 03:52:43,106 - INFO - Epoch 1 Step 10380 (Global: 10380): loss=1.5091, ppl=4.52, grad_norm=1.69, lr=3.84e-10, throughput=2418 tok/s | |
| 2025-12-02 03:55:59,682 - INFO - Epoch 1 Step 10390 (Global: 10390): loss=1.3142, ppl=3.72, grad_norm=1.20, lr=2.05e-10, throughput=2442 tok/s | |
| 2025-12-02 03:59:17,809 - INFO - Epoch 1 Step 10400 (Global: 10400): loss=1.5836, ppl=4.87, grad_norm=2.14, lr=8.11e-11, throughput=2423 tok/s | |
| 2025-12-02 04:02:35,344 - INFO - Epoch 1 Step 10410 (Global: 10410): loss=1.6481, ppl=5.20, grad_norm=1.15, lr=1.38e-11, throughput=2430 tok/s | |
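The logged learning rates match cosine decay with linear warmup. The exact parameters are an inference, not taken from the trainer's code: peak lr 1e-5 (the run's `encoder_lr`), ~10417 total optimizer steps, and `warmup_ratio=0.1` (1042 warmup steps). Under those assumptions the schedule reproduces the logged values:

```python
import math

# Assumed schedule parameters, inferred from the run args and the log:
PEAK_LR, TOTAL_STEPS, WARMUP_STEPS = 1e-5, 10417, 1042

def lr_at(step: int) -> float:
    """Cosine decay with linear warmup (a sketch, not the trainer's code)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(f"{lr_at(9000):.2e}")    # 5.53e-07, matching the step-9000 line
print(f"{lr_at(10410):.2e}")   # 1.38e-11, matching the step-10410 line
```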
| 2025-12-02 04:04:45,901 - INFO - Flushing 3 remainder batches from gradient accumulation | |
| 2025-12-02 04:04:45,903 - INFO - Rescaling gradients by 1.33x (compensating for 3/4 batches) | |
| 2025-12-02 04:04:46,366 - INFO - Remainder batch: loss=1.7005, ppl=5.48, grad_norm=2.16 | |
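The "Rescaling gradients by 1.33x" line reflects a standard fix for gradient accumulation at epoch end: only 3 of the 4 micro-batches in the window were seen, so the accumulated gradient is scaled by 4/3 before the optimizer step. A toy sketch of the idea (an assumption about the mechanism, not the trainer's actual code):

```python
def remainder_rescale(full_accum_steps: int, done_micro_batches: int) -> float:
    """Factor restoring the expected gradient magnitude for a partially
    filled gradient-accumulation window."""
    if not 0 < done_micro_batches <= full_accum_steps:
        raise ValueError("done_micro_batches must be in (0, full_accum_steps]")
    return full_accum_steps / done_micro_batches

# Toy scalar gradients: the trainer divides each by the full window (4),
# so with only 3 micro-batches the sum understates the per-sample mean.
grads = [0.9, 1.1, 1.0]
accumulated = sum(g / 4 for g in grads)
corrected = accumulated * remainder_rescale(4, len(grads))

print(f"{remainder_rescale(4, 3):.2f}")  # 1.33, as logged
print(f"{corrected:.4f}")                # 1.0000, the true mean of grads
```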
| 2025-12-02 04:04:46,392 - INFO - Epoch 1 training: loss=1.6889, ppl=5.41, grad_norm=1.70, throughput=2370 tok/s (210981.2s total) | |
| 2025-12-02 04:04:46,400 - INFO - | |
| Running final validation... | |
| 2025-12-02 04:16:10,516 - INFO - Validation loss: 1.6221, perplexity: 5.06 | |
| 2025-12-02 04:16:10,516 - INFO - | |
| ====================================================================== | |
| 2025-12-02 04:16:10,517 - INFO - Qualitative Evaluation Samples: | |
| 2025-12-02 04:16:10,517 - INFO - ====================================================================== | |
| 2025-12-02 04:16:10,517 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-12-02 04:16:10,517 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-12-02 04:16:10,517 - INFO - Generated: ' to the band\'s previous work, stating that "the band\'s sound is more mature and more consistent than ever, and the album is a testament to that." In a review for The A.V. Club, Fitsmaurice stated that...' | |
| 2025-12-02 04:16:10,518 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' | |
| 2025-12-02 04:16:10,518 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-02 04:16:10,518 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-12-02 04:16:10,518 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-12-02 04:16:10,518 - INFO - Generated: 'aternity and sorority life. The Order of the Arrow was founded in 1920 by a group of fraternity and sorority members who were concerned about the lack of Native American involvement in the organizatio...' | |
| 2025-12-02 04:16:10,518 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' | |
| 2025-12-02 04:16:10,518 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-02 04:16:10,519 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-12-02 04:16:10,519 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-12-02 04:16:10,519 - INFO - Generated: " be defeated by Oga. He is later seen in the Shingetsu Temple, where he is defeated by Teimou's shadow. He is later seen in the Shingetsu Temple, where he is defeated by Teimou's shadow. He is later s..." | |
| 2025-12-02 04:16:10,519 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." | |
| 2025-12-02 04:16:10,519 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-02 04:16:10,520 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-12-02 04:16:10,520 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-12-02 04:16:10,520 - INFO - Generated: '-01 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x0000 0x0001 | 0x...' | |
| 2025-12-02 04:16:10,520 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' | |
| 2025-12-02 04:16:10,520 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-02 04:16:10,520 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-12-02 04:16:10,521 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-12-02 04:16:10,521 - INFO - Generated: '1 | Xbox 360 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 | August 30, 2011 | P...' | |
| 2025-12-02 04:16:10,521 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' | |
| 2025-12-02 04:16:10,521 - INFO - ---------------------------------------------------------------------- | |
| 2025-12-02 04:16:10,522 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/qualitative_step_10417.jsonl | |
| 2025-12-02 04:17:47,213 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554/best_checkpoint.pt | |
| 2025-12-02 04:17:47,224 - INFO - New best validation loss: 1.6221, perplexity: 5.06 | |
| 2025-12-02 04:17:47,225 - INFO - | |
| Training complete! | |
| 2025-12-02 04:17:47,226 - INFO - Final checkpoint is best, created symlink to save space (~2GB saved) | |
| 2025-12-02 04:17:47,226 - INFO - Best validation loss: 1.6221, perplexity: 5.06 | |
| 2025-12-02 04:17:47,226 - INFO - Checkpoints saved to outputs/production_vision_base_reconstruction_20251120_220510_lm_20251129_171554 | |
| 2025-12-02 04:17:47,989 - INFO - W&B run finished | |
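The "created symlink to save space" message at the end corresponds to a common checkpointing trick: when the final checkpoint is the same as the best one, point a symlink at `best_checkpoint.pt` rather than writing a second multi-GB copy. A sketch under that assumption (the names and layout here are illustrative, not the trainer's actual code):

```python
import os
import tempfile

def link_final_to_best(out_dir: str) -> str:
    """Create final_checkpoint.pt as a relative symlink to best_checkpoint.pt."""
    best = os.path.join(out_dir, "best_checkpoint.pt")
    final = os.path.join(out_dir, "final_checkpoint.pt")
    os.symlink(os.path.basename(best), final)  # relative link inside out_dir
    return final

with tempfile.TemporaryDirectory() as out_dir:
    best_path = os.path.join(out_dir, "best_checkpoint.pt")
    with open(best_path, "wb") as f:
        f.write(b"state-dict bytes")           # stand-in for the ~2GB checkpoint
    final_path = link_final_to_best(out_dir)
    is_link = os.path.islink(final_path)
    with open(final_path, "rb") as f:
        same_bytes = f.read() == b"state-dict bytes"

print(is_link, same_bytes)  # the link resolves to the same bytes, no copy
```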