diff --git "a/meanpool_w4s4_h0_recon/train.log" "b/meanpool_w4s4_h0_recon/train.log" new file mode 100644--- /dev/null +++ "b/meanpool_w4s4_h0_recon/train.log" @@ -0,0 +1,2095 @@ +2025-11-15 01:13:59,141 - INFO - Starting training with args: Namespace(regime='meanpool', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_meanpool_w4_s4_reconstruction_20251115_011352', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='small', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt=None, train_encoder=False, encoder_lr=1e-05, compression_window_size=4, compression_stride=4, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251115_011352', batch_size=4, gradient_accumulation_steps=12, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=500, initial_validation=True, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint=None, init_from_checkpoint=None, aux_loss_weight=0.5, num_workers=8, prefetch_factor=64, seed=None, eval_seed=42, device='cuda', compile=True) +2025-11-15 01:13:59,141 - INFO - Auto-generated W&B run name: production_meanpool_w4_s4_reconstruction_20251115_011352 +2025-11-15 01:14:00,358 - INFO - Initialized W&B run: vision-compression-2/production_meanpool_w4_s4_reconstruction_20251115_011352 (ID: gcdpauol) +2025-11-15 01:14:00,358 - INFO - Loading model and tokenizer... +2025-11-15 01:14:09,744 - INFO - Compiling model with torch.compile... 
+2025-11-15 01:14:09,745 - INFO - Note: First forward pass will compile (may take several minutes) +2025-11-15 01:14:10,674 - INFO - Created Mean Pool Compression trainer +2025-11-15 01:14:10,674 - INFO - Compression: 1000 → 251 tokens +2025-11-15 01:14:10,674 - INFO - Training objective: reconstruction +2025-11-15 01:14:10,705 - INFO - Logged parameter counts to W&B: total=3,336,107,520, trainable=2,934,737,920, encoder=0, decoder=2,934,737,920 +2025-11-15 01:14:10,706 - INFO - Loading training data from data/training/splits_510k/train.jsonl +2025-11-15 01:16:52,917 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl +2025-11-15 01:16:52,918 - INFO - Meanpool regime: using full 1000-token context +2025-11-15 01:16:52,919 - INFO - Loading validation data from data/training/splits_510k/val.jsonl +2025-11-15 01:16:56,278 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl +2025-11-15 01:16:56,279 - INFO - Validation meanpool regime: using full 1000-token context +2025-11-15 01:16:56,320 - INFO - Created AdamW optimizer with lr=0.0001, fused=True +2025-11-15 01:16:56,322 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 +2025-11-15 01:16:56,322 - INFO - Starting training loop... +2025-11-15 01:16:56,322 - INFO - +====================================================================== +2025-11-15 01:16:56,323 - INFO - Running initial validation (before any training)... +2025-11-15 01:16:56,323 - INFO - ====================================================================== +2025-11-15 01:24:54,483 - DEBUG - Building prefix dict from the default dictionary ... +2025-11-15 01:24:54,483 - DEBUG - Loading model from cache /tmp/jieba.cache +2025-11-15 01:24:55,148 - DEBUG - Loading model cost 0.665 seconds. +2025-11-15 01:24:55,149 - DEBUG - Prefix dict has been built successfully. 
+2025-11-15 01:24:58,118 - INFO - Validation loss: 2.3058, perplexity: 10.03 +2025-11-15 01:24:58,119 - INFO - Qualitative metrics (n=5): +2025-11-15 01:24:58,119 - INFO - BLEU: 0.0000 +2025-11-15 01:24:58,119 - INFO - METEOR: 0.0000 +2025-11-15 01:24:58,119 - INFO - Edit Distance: 0.9677 +2025-11-15 01:24:58,119 - INFO - F-measure: 0.0000 +2025-11-15 01:24:58,120 - INFO - +====================================================================== +2025-11-15 01:24:58,120 - INFO - Qualitative Evaluation Samples: +2025-11-15 01:24:58,120 - INFO - ====================================================================== +2025-11-15 01:24:58,120 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-15 01:24:58,121 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 01:24:58,121 - INFO - Generated: '和txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt:txt...' +2025-11-15 01:24:58,121 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-15 01:24:58,121 - INFO - ---------------------------------------------------------------------- +2025-11-15 01:24:58,121 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-15 01:24:58,121 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 01:24:58,121 - INFO - Generated: ' 1. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2' +2025-11-15 01:24:58,122 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 01:24:58,122 - INFO - ---------------------------------------------------------------------- +2025-11-15 01:24:58,122 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-15 01:24:58,122 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 01:24:58,122 - INFO - Generated: '((()))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))' +2025-11-15 01:24:58,122 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-15 01:24:58,122 - INFO - ---------------------------------------------------------------------- +2025-11-15 01:24:58,122 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-15 01:24:58,122 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 01:24:58,122 - INFO - Generated: ' 1.' +2025-11-15 01:24:58,122 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-15 01:24:58,123 - INFO - ---------------------------------------------------------------------- +2025-11-15 01:24:58,123 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-15 01:24:58,123 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 01:24:58,123 - INFO - Generated: '的,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,可,' +2025-11-15 01:24:58,123 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 01:24:58,123 - INFO - ---------------------------------------------------------------------- +2025-11-15 01:24:58,124 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_0.jsonl +2025-11-15 01:24:59,177 - INFO - Initial validation - Loss: 2.3058, Perplexity: 10.03 +2025-11-15 01:24:59,177 - INFO - ====================================================================== + +2025-11-15 01:24:59,178 - INFO - +====================================================================== +2025-11-15 01:24:59,178 - INFO - Epoch 1/1 +2025-11-15 01:24:59,178 - INFO - ====================================================================== +2025-11-15 01:25:25,612 - INFO - Effective context tokens (per-sample): 252 | Compression ratio: 3.97x +2025-11-15 01:25:25,612 - INFO - Target tokens per sample: 1000 +2025-11-15 01:27:52,476 - INFO - Epoch 1 Step 10 (Global: 10): loss=2.0210, ppl=7.55, grad_norm=1.70, lr=1.09e-05, throughput=2770 tok/s +2025-11-15 01:30:26,486 - INFO - Epoch 1 Step 20 (Global: 20): loss=1.9081, ppl=6.74, grad_norm=1.26, lr=1.17e-05, throughput=3117 tok/s +2025-11-15 01:33:00,829 - INFO - Epoch 1 Step 30 (Global: 30): loss=1.9272, ppl=6.87, grad_norm=1.23, lr=1.26e-05, throughput=3110 tok/s +2025-11-15 01:35:26,647 - INFO - Epoch 1 Step 40 (Global: 40): 
loss=2.0000, ppl=7.39, grad_norm=1.23, lr=1.35e-05, throughput=3292 tok/s +2025-11-15 01:38:01,266 - INFO - Epoch 1 Step 50 (Global: 50): loss=1.9403, ppl=6.96, grad_norm=1.17, lr=1.43e-05, throughput=3104 tok/s +2025-11-15 01:40:36,440 - INFO - Epoch 1 Step 60 (Global: 60): loss=1.8706, ppl=6.49, grad_norm=1.20, lr=1.52e-05, throughput=3093 tok/s +2025-11-15 01:43:10,909 - INFO - Epoch 1 Step 70 (Global: 70): loss=2.1164, ppl=8.30, grad_norm=1.21, lr=1.61e-05, throughput=3107 tok/s +2025-11-15 01:45:36,493 - INFO - Epoch 1 Step 80 (Global: 80): loss=1.8546, ppl=6.39, grad_norm=1.20, lr=1.69e-05, throughput=3297 tok/s +2025-11-15 01:48:11,128 - INFO - Epoch 1 Step 90 (Global: 90): loss=1.7642, ppl=5.84, grad_norm=1.17, lr=1.78e-05, throughput=3104 tok/s +2025-11-15 01:50:46,834 - INFO - Epoch 1 Step 100 (Global: 100): loss=1.7561, ppl=5.79, grad_norm=1.16, lr=1.86e-05, throughput=3083 tok/s +2025-11-15 01:53:12,505 - INFO - Epoch 1 Step 110 (Global: 110): loss=2.0663, ppl=7.90, grad_norm=1.16, lr=1.95e-05, throughput=3295 tok/s +2025-11-15 01:55:46,269 - INFO - Epoch 1 Step 120 (Global: 120): loss=1.9796, ppl=7.24, grad_norm=1.29, lr=2.04e-05, throughput=3122 tok/s +2025-11-15 01:58:20,872 - INFO - Epoch 1 Step 130 (Global: 130): loss=1.7681, ppl=5.86, grad_norm=1.17, lr=2.12e-05, throughput=3108 tok/s +2025-11-15 02:00:54,894 - INFO - Epoch 1 Step 140 (Global: 140): loss=1.8134, ppl=6.13, grad_norm=1.38, lr=2.21e-05, throughput=3117 tok/s +2025-11-15 02:03:21,802 - INFO - Epoch 1 Step 150 (Global: 150): loss=1.7852, ppl=5.96, grad_norm=1.31, lr=2.30e-05, throughput=3267 tok/s +2025-11-15 02:05:56,027 - INFO - Epoch 1 Step 160 (Global: 160): loss=1.8179, ppl=6.16, grad_norm=1.38, lr=2.38e-05, throughput=3112 tok/s +2025-11-15 02:08:30,948 - INFO - Epoch 1 Step 170 (Global: 170): loss=1.4751, ppl=4.37, grad_norm=11.00, lr=2.47e-05, throughput=3098 tok/s +2025-11-15 02:10:57,832 - INFO - Epoch 1 Step 180 (Global: 180): loss=0.7235, ppl=2.06, grad_norm=5.75, 
lr=2.56e-05, throughput=3268 tok/s +2025-11-15 02:13:33,702 - INFO - Epoch 1 Step 190 (Global: 190): loss=0.5043, ppl=1.66, grad_norm=3.38, lr=2.64e-05, throughput=3080 tok/s +2025-11-15 02:16:09,299 - INFO - Epoch 1 Step 200 (Global: 200): loss=0.4170, ppl=1.52, grad_norm=1.91, lr=2.73e-05, throughput=3085 tok/s +2025-11-15 02:18:45,719 - INFO - Epoch 1 Step 210 (Global: 210): loss=0.3129, ppl=1.37, grad_norm=2.39, lr=2.82e-05, throughput=3069 tok/s +2025-11-15 02:21:12,350 - INFO - Epoch 1 Step 220 (Global: 220): loss=0.2991, ppl=1.35, grad_norm=1.69, lr=2.90e-05, throughput=3274 tok/s +2025-11-15 02:23:47,278 - INFO - Epoch 1 Step 230 (Global: 230): loss=0.2964, ppl=1.34, grad_norm=2.25, lr=2.99e-05, throughput=3098 tok/s +2025-11-15 02:26:21,887 - INFO - Epoch 1 Step 240 (Global: 240): loss=0.2632, ppl=1.30, grad_norm=1.94, lr=3.07e-05, throughput=3105 tok/s +2025-11-15 02:28:56,224 - INFO - Epoch 1 Step 250 (Global: 250): loss=0.2310, ppl=1.26, grad_norm=2.08, lr=3.16e-05, throughput=3110 tok/s +2025-11-15 02:31:21,875 - INFO - Epoch 1 Step 260 (Global: 260): loss=0.2316, ppl=1.26, grad_norm=1.43, lr=3.25e-05, throughput=3296 tok/s +2025-11-15 02:33:57,398 - INFO - Epoch 1 Step 270 (Global: 270): loss=0.2173, ppl=1.24, grad_norm=1.46, lr=3.33e-05, throughput=3086 tok/s +2025-11-15 02:36:32,672 - INFO - Epoch 1 Step 280 (Global: 280): loss=0.2097, ppl=1.23, grad_norm=1.80, lr=3.42e-05, throughput=3091 tok/s +2025-11-15 02:39:01,388 - INFO - Epoch 1 Step 290 (Global: 290): loss=0.1954, ppl=1.22, grad_norm=2.16, lr=3.51e-05, throughput=3228 tok/s +2025-11-15 02:41:37,270 - INFO - Epoch 1 Step 300 (Global: 300): loss=0.1992, ppl=1.22, grad_norm=2.39, lr=3.59e-05, throughput=3079 tok/s +2025-11-15 02:44:11,678 - INFO - Epoch 1 Step 310 (Global: 310): loss=0.1623, ppl=1.18, grad_norm=1.80, lr=3.68e-05, throughput=3109 tok/s +2025-11-15 02:46:45,850 - INFO - Epoch 1 Step 320 (Global: 320): loss=0.1755, ppl=1.19, grad_norm=1.58, lr=3.77e-05, throughput=3113 tok/s 
+2025-11-15 02:49:11,077 - INFO - Epoch 1 Step 330 (Global: 330): loss=0.1762, ppl=1.19, grad_norm=1.88, lr=3.85e-05, throughput=3305 tok/s +2025-11-15 02:51:45,127 - INFO - Epoch 1 Step 340 (Global: 340): loss=0.1788, ppl=1.20, grad_norm=1.73, lr=3.94e-05, throughput=3116 tok/s +2025-11-15 02:54:18,855 - INFO - Epoch 1 Step 350 (Global: 350): loss=0.1561, ppl=1.17, grad_norm=1.23, lr=4.03e-05, throughput=3122 tok/s +2025-11-15 02:56:44,106 - INFO - Epoch 1 Step 360 (Global: 360): loss=0.1557, ppl=1.17, grad_norm=1.16, lr=4.11e-05, throughput=3305 tok/s +2025-11-15 02:59:18,730 - INFO - Epoch 1 Step 370 (Global: 370): loss=0.1505, ppl=1.16, grad_norm=1.48, lr=4.20e-05, throughput=3104 tok/s +2025-11-15 03:01:52,834 - INFO - Epoch 1 Step 380 (Global: 380): loss=0.1427, ppl=1.15, grad_norm=1.22, lr=4.29e-05, throughput=3115 tok/s +2025-11-15 03:04:27,090 - INFO - Epoch 1 Step 390 (Global: 390): loss=0.1521, ppl=1.16, grad_norm=1.36, lr=4.37e-05, throughput=3112 tok/s +2025-11-15 03:06:53,527 - INFO - Epoch 1 Step 400 (Global: 400): loss=0.1367, ppl=1.15, grad_norm=1.28, lr=4.46e-05, throughput=3278 tok/s +2025-11-15 03:09:27,635 - INFO - Epoch 1 Step 410 (Global: 410): loss=0.1311, ppl=1.14, grad_norm=1.27, lr=4.54e-05, throughput=3115 tok/s +2025-11-15 03:12:02,245 - INFO - Epoch 1 Step 420 (Global: 420): loss=0.1423, ppl=1.15, grad_norm=1.00, lr=4.63e-05, throughput=3105 tok/s +2025-11-15 03:14:28,430 - INFO - Epoch 1 Step 430 (Global: 430): loss=0.1262, ppl=1.13, grad_norm=1.12, lr=4.72e-05, throughput=3284 tok/s +2025-11-15 03:17:04,568 - INFO - Epoch 1 Step 440 (Global: 440): loss=0.1391, ppl=1.15, grad_norm=1.15, lr=4.80e-05, throughput=3074 tok/s +2025-11-15 03:19:39,953 - INFO - Epoch 1 Step 450 (Global: 450): loss=0.1211, ppl=1.13, grad_norm=1.65, lr=4.89e-05, throughput=3089 tok/s +2025-11-15 03:22:13,913 - INFO - Epoch 1 Step 460 (Global: 460): loss=0.1300, ppl=1.14, grad_norm=1.34, lr=4.98e-05, throughput=3118 tok/s +2025-11-15 03:24:38,938 - INFO - Epoch 
1 Step 470 (Global: 470): loss=0.1275, ppl=1.14, grad_norm=0.89, lr=5.06e-05, throughput=3310 tok/s +2025-11-15 03:27:12,663 - INFO - Epoch 1 Step 480 (Global: 480): loss=0.1209, ppl=1.13, grad_norm=0.96, lr=5.15e-05, throughput=3123 tok/s +2025-11-15 03:29:45,674 - INFO - Epoch 1 Step 490 (Global: 490): loss=0.1210, ppl=1.13, grad_norm=1.28, lr=5.24e-05, throughput=3137 tok/s +2025-11-15 03:32:19,851 - INFO - Epoch 1 Step 500 (Global: 500): loss=0.1127, ppl=1.12, grad_norm=0.89, lr=5.32e-05, throughput=3113 tok/s +2025-11-15 03:32:19,854 - INFO - +Running validation at step 500... +2025-11-15 03:39:55,806 - INFO - Validation loss: 0.1209, perplexity: 1.13 +2025-11-15 03:39:55,807 - INFO - Qualitative metrics (n=5): +2025-11-15 03:39:55,807 - INFO - BLEU: 0.6448 +2025-11-15 03:39:55,807 - INFO - METEOR: 0.8752 +2025-11-15 03:39:55,807 - INFO - Edit Distance: 0.1727 +2025-11-15 03:39:55,807 - INFO - F-measure: 0.8515 +2025-11-15 03:39:55,807 - INFO - +====================================================================== +2025-11-15 03:39:55,807 - INFO - Qualitative Evaluation Samples: +2025-11-15 03:39:55,808 - INFO - ====================================================================== +2025-11-15 03:39:55,808 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-15 03:39:55,808 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 03:39:55,808 - INFO - Generated: ' gave it four Q\'s out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking you\'s-were it. But things\'s not...' +2025-11-15 03:39:55,808 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-15 03:39:55,808 - INFO - ---------------------------------------------------------------------- +2025-11-15 03:39:55,808 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-15 03:39:55,808 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 03:39:55,808 - INFO - Generated: 'ire, was Squeachou Aba, Lebanese a student-led American Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROTC; and the...' +2025-11-15 03:39:55,808 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 03:39:55,808 - INFO - ---------------------------------------------------------------------- +2025-11-15 03:39:55,808 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-15 03:39:55,809 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 03:39:55,809 - INFO - Generated: ' meeting at the Layheaded. Hismia weapon of choice is a giant, ax he has and the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and bo...' +2025-11-15 03:39:55,809 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-15 03:39:55,809 - INFO - ---------------------------------------------------------------------- +2025-11-15 03:39:55,809 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-15 03:39:55,809 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 03:39:55,809 - INFO - Generated: ' Oriya (unicode block)\nA Oriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01....' +2025-11-15 03:39:55,809 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-15 03:39:55,809 - INFO - ---------------------------------------------------------------------- +2025-11-15 03:39:55,809 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-15 03:39:55,809 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 03:39:55,810 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Max Windows | Redwood Shishores ...' +2025-11-15 03:39:55,810 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 03:39:55,810 - INFO - ---------------------------------------------------------------------- +2025-11-15 03:39:55,811 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_500.jsonl +2025-11-15 03:40:31,870 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-15 03:40:31,875 - INFO - New best validation loss: 0.1209, perplexity: 1.13 +2025-11-15 03:42:57,533 - INFO - Epoch 1 Step 510 (Global: 510): loss=0.1033, ppl=1.11, grad_norm=1.04, lr=5.41e-05, throughput=3296 tok/s +2025-11-15 03:45:31,943 - INFO - Epoch 1 Step 520 (Global: 520): loss=0.1213, ppl=1.13, grad_norm=0.94, lr=5.50e-05, throughput=3109 tok/s +2025-11-15 03:48:05,651 - INFO - Epoch 1 Step 530 (Global: 530): loss=0.1185, ppl=1.13, grad_norm=0.98, lr=5.58e-05, throughput=3123 tok/s +2025-11-15 03:50:40,174 - INFO - Epoch 1 Step 540 (Global: 540): loss=0.1174, ppl=1.12, grad_norm=0.88, lr=5.67e-05, throughput=3106 tok/s +2025-11-15 03:53:05,716 - INFO - Epoch 1 Step 550 (Global: 550): loss=0.1135, ppl=1.12, grad_norm=0.81, lr=5.76e-05, 
throughput=3298 tok/s +2025-11-15 03:55:39,540 - INFO - Epoch 1 Step 560 (Global: 560): loss=0.1094, ppl=1.12, grad_norm=0.82, lr=5.84e-05, throughput=3121 tok/s +2025-11-15 03:58:13,842 - INFO - Epoch 1 Step 570 (Global: 570): loss=0.1132, ppl=1.12, grad_norm=0.81, lr=5.93e-05, throughput=3111 tok/s +2025-11-15 04:00:38,850 - INFO - Epoch 1 Step 580 (Global: 580): loss=0.1092, ppl=1.12, grad_norm=1.17, lr=6.01e-05, throughput=3310 tok/s +2025-11-15 04:03:12,531 - INFO - Epoch 1 Step 590 (Global: 590): loss=0.1022, ppl=1.11, grad_norm=0.80, lr=6.10e-05, throughput=3123 tok/s +2025-11-15 04:05:47,105 - INFO - Epoch 1 Step 600 (Global: 600): loss=0.1006, ppl=1.11, grad_norm=0.97, lr=6.19e-05, throughput=3105 tok/s +2025-11-15 04:08:13,738 - INFO - Epoch 1 Step 610 (Global: 610): loss=0.0952, ppl=1.10, grad_norm=0.81, lr=6.27e-05, throughput=3274 tok/s +2025-11-15 04:10:48,475 - INFO - Epoch 1 Step 620 (Global: 620): loss=0.1015, ppl=1.11, grad_norm=0.85, lr=6.36e-05, throughput=3102 tok/s +2025-11-15 04:13:22,663 - INFO - Epoch 1 Step 630 (Global: 630): loss=0.1073, ppl=1.11, grad_norm=0.86, lr=6.45e-05, throughput=3113 tok/s +2025-11-15 04:15:58,633 - INFO - Epoch 1 Step 640 (Global: 640): loss=0.0976, ppl=1.10, grad_norm=0.70, lr=6.53e-05, throughput=3078 tok/s +2025-11-15 04:18:24,364 - INFO - Epoch 1 Step 650 (Global: 650): loss=0.0909, ppl=1.10, grad_norm=0.68, lr=6.62e-05, throughput=3294 tok/s +2025-11-15 04:20:59,124 - INFO - Epoch 1 Step 660 (Global: 660): loss=0.1129, ppl=1.12, grad_norm=0.86, lr=6.71e-05, throughput=3102 tok/s +2025-11-15 04:23:33,303 - INFO - Epoch 1 Step 670 (Global: 670): loss=0.1076, ppl=1.11, grad_norm=0.78, lr=6.79e-05, throughput=3113 tok/s +2025-11-15 04:26:08,665 - INFO - Epoch 1 Step 680 (Global: 680): loss=0.1030, ppl=1.11, grad_norm=0.73, lr=6.88e-05, throughput=3090 tok/s +2025-11-15 04:28:34,092 - INFO - Epoch 1 Step 690 (Global: 690): loss=0.1001, ppl=1.11, grad_norm=0.74, lr=6.97e-05, throughput=3301 tok/s +2025-11-15 
04:31:08,153 - INFO - Epoch 1 Step 700 (Global: 700): loss=0.0963, ppl=1.10, grad_norm=0.79, lr=7.05e-05, throughput=3116 tok/s +2025-11-15 04:33:42,564 - INFO - Epoch 1 Step 710 (Global: 710): loss=0.0899, ppl=1.09, grad_norm=0.74, lr=7.14e-05, throughput=3109 tok/s +2025-11-15 04:36:08,416 - INFO - Epoch 1 Step 720 (Global: 720): loss=0.1006, ppl=1.11, grad_norm=0.77, lr=7.22e-05, throughput=3291 tok/s +2025-11-15 04:38:42,300 - INFO - Epoch 1 Step 730 (Global: 730): loss=0.0940, ppl=1.10, grad_norm=0.70, lr=7.31e-05, throughput=3119 tok/s +2025-11-15 04:41:17,026 - INFO - Epoch 1 Step 740 (Global: 740): loss=0.1051, ppl=1.11, grad_norm=0.78, lr=7.40e-05, throughput=3102 tok/s +2025-11-15 04:43:50,733 - INFO - Epoch 1 Step 750 (Global: 750): loss=0.0854, ppl=1.09, grad_norm=0.62, lr=7.48e-05, throughput=3123 tok/s +2025-11-15 04:46:15,928 - INFO - Epoch 1 Step 760 (Global: 760): loss=0.1041, ppl=1.11, grad_norm=0.77, lr=7.57e-05, throughput=3306 tok/s +2025-11-15 04:48:50,486 - INFO - Epoch 1 Step 770 (Global: 770): loss=0.1026, ppl=1.11, grad_norm=0.73, lr=7.66e-05, throughput=3106 tok/s +2025-11-15 04:51:25,055 - INFO - Epoch 1 Step 780 (Global: 780): loss=0.0850, ppl=1.09, grad_norm=0.61, lr=7.74e-05, throughput=3105 tok/s +2025-11-15 04:53:50,531 - INFO - Epoch 1 Step 790 (Global: 790): loss=0.0873, ppl=1.09, grad_norm=0.60, lr=7.83e-05, throughput=3300 tok/s +2025-11-15 04:56:24,945 - INFO - Epoch 1 Step 800 (Global: 800): loss=0.0927, ppl=1.10, grad_norm=0.62, lr=7.92e-05, throughput=3109 tok/s +2025-11-15 04:58:58,771 - INFO - Epoch 1 Step 810 (Global: 810): loss=0.0893, ppl=1.09, grad_norm=0.55, lr=8.00e-05, throughput=3120 tok/s +2025-11-15 05:01:32,294 - INFO - Epoch 1 Step 820 (Global: 820): loss=0.0897, ppl=1.09, grad_norm=0.66, lr=8.09e-05, throughput=3127 tok/s +2025-11-15 05:03:56,791 - INFO - Epoch 1 Step 830 (Global: 830): loss=0.0862, ppl=1.09, grad_norm=0.67, lr=8.18e-05, throughput=3322 tok/s +2025-11-15 05:06:30,718 - INFO - Epoch 1 Step 840 
(Global: 840): loss=0.0945, ppl=1.10, grad_norm=0.63, lr=8.26e-05, throughput=3118 tok/s +2025-11-15 05:09:05,941 - INFO - Epoch 1 Step 850 (Global: 850): loss=0.0915, ppl=1.10, grad_norm=0.65, lr=8.35e-05, throughput=3092 tok/s +2025-11-15 05:11:32,711 - INFO - Epoch 1 Step 860 (Global: 860): loss=0.0767, ppl=1.08, grad_norm=0.55, lr=8.44e-05, throughput=3270 tok/s +2025-11-15 05:14:09,716 - INFO - Epoch 1 Step 870 (Global: 870): loss=0.0804, ppl=1.08, grad_norm=0.58, lr=8.52e-05, throughput=3057 tok/s +2025-11-15 05:16:48,518 - INFO - Epoch 1 Step 880 (Global: 880): loss=0.0847, ppl=1.09, grad_norm=0.71, lr=8.61e-05, throughput=3023 tok/s +2025-11-15 05:19:27,094 - INFO - Epoch 1 Step 890 (Global: 890): loss=0.0856, ppl=1.09, grad_norm=0.59, lr=8.69e-05, throughput=3027 tok/s +2025-11-15 05:21:52,991 - INFO - Epoch 1 Step 900 (Global: 900): loss=0.0871, ppl=1.09, grad_norm=0.65, lr=8.78e-05, throughput=3290 tok/s +2025-11-15 05:24:27,717 - INFO - Epoch 1 Step 910 (Global: 910): loss=0.0945, ppl=1.10, grad_norm=0.83, lr=8.87e-05, throughput=3102 tok/s +2025-11-15 05:27:03,123 - INFO - Epoch 1 Step 920 (Global: 920): loss=0.0929, ppl=1.10, grad_norm=0.64, lr=8.95e-05, throughput=3089 tok/s +2025-11-15 05:29:29,669 - INFO - Epoch 1 Step 930 (Global: 930): loss=0.0843, ppl=1.09, grad_norm=0.58, lr=9.04e-05, throughput=3276 tok/s +2025-11-15 05:32:04,356 - INFO - Epoch 1 Step 940 (Global: 940): loss=0.0882, ppl=1.09, grad_norm=0.75, lr=9.13e-05, throughput=3103 tok/s +2025-11-15 05:34:40,626 - INFO - Epoch 1 Step 950 (Global: 950): loss=0.0902, ppl=1.09, grad_norm=0.64, lr=9.21e-05, throughput=3072 tok/s +2025-11-15 05:37:17,672 - INFO - Epoch 1 Step 960 (Global: 960): loss=0.0915, ppl=1.10, grad_norm=0.66, lr=9.30e-05, throughput=3056 tok/s +2025-11-15 05:39:44,689 - INFO - Epoch 1 Step 970 (Global: 970): loss=0.0862, ppl=1.09, grad_norm=0.62, lr=9.39e-05, throughput=3265 tok/s +2025-11-15 05:42:19,659 - INFO - Epoch 1 Step 980 (Global: 980): loss=0.0861, ppl=1.09, 
grad_norm=0.59, lr=9.47e-05, throughput=3097 tok/s +2025-11-15 05:44:53,959 - INFO - Epoch 1 Step 990 (Global: 990): loss=0.0673, ppl=1.07, grad_norm=0.56, lr=9.56e-05, throughput=3111 tok/s +2025-11-15 05:47:19,470 - INFO - Epoch 1 Step 1000 (Global: 1000): loss=0.0916, ppl=1.10, grad_norm=0.67, lr=9.65e-05, throughput=3299 tok/s +2025-11-15 05:47:19,472 - INFO - +Running validation at step 1000... +2025-11-15 05:54:53,953 - INFO - Validation loss: 0.0817, perplexity: 1.09 +2025-11-15 05:54:53,953 - INFO - Qualitative metrics (n=5): +2025-11-15 05:54:53,954 - INFO - BLEU: 0.7085 +2025-11-15 05:54:53,954 - INFO - METEOR: 0.8905 +2025-11-15 05:54:53,954 - INFO - Edit Distance: 0.1507 +2025-11-15 05:54:53,954 - INFO - F-measure: 0.8793 +2025-11-15 05:54:53,954 - INFO - +====================================================================== +2025-11-15 05:54:53,955 - INFO - Qualitative Evaluation Samples: +2025-11-15 05:54:53,955 - INFO - ====================================================================== +2025-11-15 05:54:53,955 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-15 05:54:53,955 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 05:54:53,955 - INFO - Generated: ' gave it fourQ stars out of five and said that "the album [perhaps] seemingly ill-sounding songs of sequencing if they make sense to lure their wish into thinking it audience-was-you\'s. But itere\'s no...' +2025-11-15 05:54:53,955 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-15 05:54:53,955 - INFO - ---------------------------------------------------------------------- +2025-11-15 05:54:53,955 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-15 05:54:53,956 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 05:54:53,956 - INFO - Generated: 'ire, was S-Chou Abne, a Lebaneseakra-led American student who the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 05:54:53,956 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 05:54:53,956 - INFO - ---------------------------------------------------------------------- +2025-11-15 05:54:53,956 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-15 05:54:53,956 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 05:54:53,956 - INFO - Generated: ' lay at the Meeting. His headmia weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beelax stops the battle a...' +2025-11-15 05:54:53,956 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-15 05:54:53,956 - INFO - ---------------------------------------------------------------------- +2025-11-15 05:54:53,956 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-15 05:54:53,956 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 05:54:53,957 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-15 05:54:53,957 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-15 05:54:53,957 - INFO - ---------------------------------------------------------------------- +2025-11-15 05:54:53,957 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-15 05:54:53,957 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 05:54:53,957 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Red Shoothores ...' +2025-11-15 05:54:53,957 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 05:54:53,957 - INFO - ---------------------------------------------------------------------- +2025-11-15 05:54:53,958 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_1000.jsonl +2025-11-15 05:55:35,065 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-15 05:55:35,075 - INFO - New best validation loss: 0.0817, perplexity: 1.09 +2025-11-15 05:58:09,520 - INFO - Epoch 1 Step 1010 (Global: 1010): loss=0.0769, ppl=1.08, grad_norm=0.63, lr=9.73e-05, throughput=3108 tok/s +2025-11-15 06:00:44,116 - INFO - Epoch 1 Step 1020 (Global: 1020): loss=0.0649, ppl=1.07, grad_norm=0.57, lr=9.82e-05, throughput=3105 tok/s +2025-11-15 06:03:19,487 - INFO - Epoch 1 Step 1030 (Global: 1030): loss=0.0804, ppl=1.08, grad_norm=0.56, lr=9.90e-05, throughput=3089 tok/s +2025-11-15 06:05:45,688 - INFO - Epoch 1 Step 1040 (Global: 1040): loss=0.0954, ppl=1.10, grad_norm=0.64, lr=9.99e-05, throughput=3283 tok/s +2025-11-15 06:08:21,846 - INFO - Epoch 1 Step 1050 (Global: 1050): loss=0.0806, ppl=1.08, grad_norm=0.54, lr=1.00e-04, 
throughput=3074 tok/s +2025-11-15 06:10:59,824 - INFO - Epoch 1 Step 1060 (Global: 1060): loss=0.0954, ppl=1.10, grad_norm=0.60, lr=1.00e-04, throughput=3038 tok/s +2025-11-15 06:13:28,656 - INFO - Epoch 1 Step 1070 (Global: 1070): loss=0.0742, ppl=1.08, grad_norm=0.52, lr=1.00e-04, throughput=3225 tok/s +2025-11-15 06:16:06,505 - INFO - Epoch 1 Step 1080 (Global: 1080): loss=0.0798, ppl=1.08, grad_norm=0.48, lr=1.00e-04, throughput=3041 tok/s +2025-11-15 06:18:42,439 - INFO - Epoch 1 Step 1090 (Global: 1090): loss=0.0737, ppl=1.08, grad_norm=0.50, lr=1.00e-04, throughput=3078 tok/s +2025-11-15 06:21:18,241 - INFO - Epoch 1 Step 1100 (Global: 1100): loss=0.0823, ppl=1.09, grad_norm=0.55, lr=1.00e-04, throughput=3081 tok/s +2025-11-15 06:23:45,021 - INFO - Epoch 1 Step 1110 (Global: 1110): loss=0.0796, ppl=1.08, grad_norm=0.50, lr=1.00e-04, throughput=3270 tok/s +2025-11-15 06:26:18,796 - INFO - Epoch 1 Step 1120 (Global: 1120): loss=0.0808, ppl=1.08, grad_norm=0.50, lr=1.00e-04, throughput=3122 tok/s +2025-11-15 06:28:53,648 - INFO - Epoch 1 Step 1130 (Global: 1130): loss=0.0771, ppl=1.08, grad_norm=0.47, lr=1.00e-04, throughput=3100 tok/s +2025-11-15 06:31:19,800 - INFO - Epoch 1 Step 1140 (Global: 1140): loss=0.0818, ppl=1.09, grad_norm=0.51, lr=1.00e-04, throughput=3284 tok/s +2025-11-15 06:33:54,857 - INFO - Epoch 1 Step 1150 (Global: 1150): loss=0.0777, ppl=1.08, grad_norm=0.50, lr=1.00e-04, throughput=3096 tok/s +2025-11-15 06:36:28,996 - INFO - Epoch 1 Step 1160 (Global: 1160): loss=0.0843, ppl=1.09, grad_norm=0.48, lr=1.00e-04, throughput=3114 tok/s +2025-11-15 06:39:03,661 - INFO - Epoch 1 Step 1170 (Global: 1170): loss=0.0734, ppl=1.08, grad_norm=0.52, lr=1.00e-04, throughput=3104 tok/s +2025-11-15 06:41:29,903 - INFO - Epoch 1 Step 1180 (Global: 1180): loss=0.0749, ppl=1.08, grad_norm=0.45, lr=9.99e-05, throughput=3282 tok/s +2025-11-15 06:44:04,327 - INFO - Epoch 1 Step 1190 (Global: 1190): loss=0.0709, ppl=1.07, grad_norm=0.46, lr=9.99e-05, 
throughput=3108 tok/s +2025-11-15 06:46:38,278 - INFO - Epoch 1 Step 1200 (Global: 1200): loss=0.0771, ppl=1.08, grad_norm=0.46, lr=9.99e-05, throughput=3118 tok/s +2025-11-15 06:49:03,879 - INFO - Epoch 1 Step 1210 (Global: 1210): loss=0.0758, ppl=1.08, grad_norm=0.44, lr=9.99e-05, throughput=3297 tok/s +2025-11-15 06:51:38,509 - INFO - Epoch 1 Step 1220 (Global: 1220): loss=0.0659, ppl=1.07, grad_norm=0.43, lr=9.99e-05, throughput=3104 tok/s +2025-11-15 06:54:12,245 - INFO - Epoch 1 Step 1230 (Global: 1230): loss=0.0692, ppl=1.07, grad_norm=0.45, lr=9.99e-05, throughput=3122 tok/s +2025-11-15 06:56:47,451 - INFO - Epoch 1 Step 1240 (Global: 1240): loss=0.0758, ppl=1.08, grad_norm=0.47, lr=9.99e-05, throughput=3093 tok/s +2025-11-15 06:59:13,159 - INFO - Epoch 1 Step 1250 (Global: 1250): loss=0.0692, ppl=1.07, grad_norm=0.47, lr=9.99e-05, throughput=3294 tok/s +2025-11-15 07:01:48,352 - INFO - Epoch 1 Step 1260 (Global: 1260): loss=0.0643, ppl=1.07, grad_norm=0.41, lr=9.99e-05, throughput=3093 tok/s +2025-11-15 07:04:22,271 - INFO - Epoch 1 Step 1270 (Global: 1270): loss=0.0667, ppl=1.07, grad_norm=0.43, lr=9.99e-05, throughput=3119 tok/s +2025-11-15 07:06:57,404 - INFO - Epoch 1 Step 1280 (Global: 1280): loss=0.0769, ppl=1.08, grad_norm=0.46, lr=9.98e-05, throughput=3094 tok/s +2025-11-15 07:09:23,773 - INFO - Epoch 1 Step 1290 (Global: 1290): loss=0.0756, ppl=1.08, grad_norm=0.43, lr=9.98e-05, throughput=3280 tok/s +2025-11-15 07:11:59,235 - INFO - Epoch 1 Step 1300 (Global: 1300): loss=0.0704, ppl=1.07, grad_norm=0.44, lr=9.98e-05, throughput=3088 tok/s +2025-11-15 07:14:37,742 - INFO - Epoch 1 Step 1310 (Global: 1310): loss=0.0593, ppl=1.06, grad_norm=0.47, lr=9.98e-05, throughput=3028 tok/s +2025-11-15 07:17:06,205 - INFO - Epoch 1 Step 1320 (Global: 1320): loss=0.0661, ppl=1.07, grad_norm=0.45, lr=9.98e-05, throughput=3233 tok/s +2025-11-15 07:19:42,177 - INFO - Epoch 1 Step 1330 (Global: 1330): loss=0.0771, ppl=1.08, grad_norm=0.69, lr=9.98e-05, 
throughput=3078 tok/s +2025-11-15 07:22:16,796 - INFO - Epoch 1 Step 1340 (Global: 1340): loss=0.0597, ppl=1.06, grad_norm=0.49, lr=9.97e-05, throughput=3104 tok/s +2025-11-15 07:24:42,272 - INFO - Epoch 1 Step 1350 (Global: 1350): loss=0.0673, ppl=1.07, grad_norm=0.49, lr=9.97e-05, throughput=3300 tok/s +2025-11-15 07:27:21,144 - INFO - Epoch 1 Step 1360 (Global: 1360): loss=0.0643, ppl=1.07, grad_norm=0.39, lr=9.97e-05, throughput=3021 tok/s +2025-11-15 07:30:02,178 - INFO - Epoch 1 Step 1370 (Global: 1370): loss=0.0689, ppl=1.07, grad_norm=0.42, lr=9.97e-05, throughput=2981 tok/s +2025-11-15 07:32:39,040 - INFO - Epoch 1 Step 1380 (Global: 1380): loss=0.0650, ppl=1.07, grad_norm=0.42, lr=9.97e-05, throughput=3060 tok/s +2025-11-15 07:35:04,963 - INFO - Epoch 1 Step 1390 (Global: 1390): loss=0.0674, ppl=1.07, grad_norm=0.40, lr=9.97e-05, throughput=3289 tok/s +2025-11-15 07:37:39,794 - INFO - Epoch 1 Step 1400 (Global: 1400): loss=0.0635, ppl=1.07, grad_norm=0.40, lr=9.96e-05, throughput=3100 tok/s +2025-11-15 07:40:15,269 - INFO - Epoch 1 Step 1410 (Global: 1410): loss=0.0662, ppl=1.07, grad_norm=0.50, lr=9.96e-05, throughput=3087 tok/s +2025-11-15 07:42:42,044 - INFO - Epoch 1 Step 1420 (Global: 1420): loss=0.0813, ppl=1.08, grad_norm=0.44, lr=9.96e-05, throughput=3270 tok/s +2025-11-15 07:45:17,198 - INFO - Epoch 1 Step 1430 (Global: 1430): loss=0.0658, ppl=1.07, grad_norm=0.41, lr=9.96e-05, throughput=3094 tok/s +2025-11-15 07:47:52,039 - INFO - Epoch 1 Step 1440 (Global: 1440): loss=0.0634, ppl=1.07, grad_norm=0.38, lr=9.96e-05, throughput=3100 tok/s +2025-11-15 07:50:27,384 - INFO - Epoch 1 Step 1450 (Global: 1450): loss=0.0654, ppl=1.07, grad_norm=0.42, lr=9.95e-05, throughput=3090 tok/s +2025-11-15 07:52:53,694 - INFO - Epoch 1 Step 1460 (Global: 1460): loss=0.0651, ppl=1.07, grad_norm=0.37, lr=9.95e-05, throughput=3281 tok/s +2025-11-15 07:55:29,446 - INFO - Epoch 1 Step 1470 (Global: 1470): loss=0.0613, ppl=1.06, grad_norm=0.61, lr=9.95e-05, 
throughput=3082 tok/s +2025-11-15 07:58:04,081 - INFO - Epoch 1 Step 1480 (Global: 1480): loss=0.0713, ppl=1.07, grad_norm=0.49, lr=9.95e-05, throughput=3104 tok/s +2025-11-15 08:00:29,835 - INFO - Epoch 1 Step 1490 (Global: 1490): loss=0.0636, ppl=1.07, grad_norm=0.48, lr=9.94e-05, throughput=3293 tok/s +2025-11-15 08:03:04,915 - INFO - Epoch 1 Step 1500 (Global: 1500): loss=0.0640, ppl=1.07, grad_norm=0.45, lr=9.94e-05, throughput=3095 tok/s +2025-11-15 08:03:04,918 - INFO - +Running validation at step 1500... +2025-11-15 08:10:44,286 - INFO - Validation loss: 0.0648, perplexity: 1.07 +2025-11-15 08:10:44,287 - INFO - Qualitative metrics (n=5): +2025-11-15 08:10:44,287 - INFO - BLEU: 0.7865 +2025-11-15 08:10:44,287 - INFO - METEOR: 0.9201 +2025-11-15 08:10:44,287 - INFO - Edit Distance: 0.1154 +2025-11-15 08:10:44,287 - INFO - F-measure: 0.8996 +2025-11-15 08:10:44,287 - INFO - +====================================================================== +2025-11-15 08:10:44,287 - INFO - Qualitative Evaluation Samples: +2025-11-15 08:10:44,288 - INFO - ====================================================================== +2025-11-15 08:10:44,288 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-15 08:10:44,288 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 08:10:44,288 - INFO - Generated: ' gave it fourQ stars out of five and said that "the album [Perhaps\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s you-was. But itere: not...' +2025-11-15 08:10:44,288 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-15 08:10:44,288 - INFO - ---------------------------------------------------------------------- +2025-11-15 08:10:44,288 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-15 08:10:44,288 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 08:10:44,288 - INFO - Generated: 'ire, was S-Chou Abneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 08:10:44,288 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 08:10:44,289 - INFO - ---------------------------------------------------------------------- +2025-11-15 08:10:44,289 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-15 08:10:44,289 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 08:10:44,289 - INFO - Generated: ' lay at the Meeting. His headedmia weapon of choice is a giant, ax and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-15 08:10:44,289 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-15 08:10:44,289 - INFO - ---------------------------------------------------------------------- +2025-11-15 08:10:44,289 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-15 08:10:44,289 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 08:10:44,289 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-15 08:10:44,289 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-15 08:10:44,290 - INFO - ---------------------------------------------------------------------- +2025-11-15 08:10:44,290 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-15 08:10:44,290 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 08:10:44,290 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Max Shredis Wood | [o...' +2025-11-15 08:10:44,290 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 08:10:44,290 - INFO - ---------------------------------------------------------------------- +2025-11-15 08:10:44,291 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_1500.jsonl +2025-11-15 08:11:24,908 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-15 08:11:24,923 - INFO - New best validation loss: 0.0648, perplexity: 1.07 +2025-11-15 08:14:00,875 - INFO - Epoch 1 Step 1510 (Global: 1510): loss=0.0665, ppl=1.07, grad_norm=0.43, lr=9.94e-05, throughput=3078 tok/s +2025-11-15 08:16:36,944 - INFO - Epoch 1 Step 1520 (Global: 1520): loss=0.0755, ppl=1.08, grad_norm=0.46, lr=9.94e-05, throughput=3076 tok/s +2025-11-15 08:19:04,125 - INFO - Epoch 1 Step 1530 (Global: 1530): loss=0.0648, ppl=1.07, grad_norm=0.38, lr=9.93e-05, throughput=3261 tok/s +2025-11-15 08:21:38,926 - INFO - Epoch 1 Step 1540 (Global: 1540): loss=0.0690, ppl=1.07, grad_norm=0.42, lr=9.93e-05, throughput=3101 tok/s +2025-11-15 08:24:13,954 - INFO - Epoch 1 Step 1550 (Global: 1550): loss=0.0628, ppl=1.06, grad_norm=0.41, lr=9.93e-05, 
throughput=3096 tok/s +2025-11-15 08:26:48,015 - INFO - Epoch 1 Step 1560 (Global: 1560): loss=0.0572, ppl=1.06, grad_norm=0.41, lr=9.92e-05, throughput=3116 tok/s +2025-11-15 08:29:13,232 - INFO - Epoch 1 Step 1570 (Global: 1570): loss=0.0627, ppl=1.06, grad_norm=0.39, lr=9.92e-05, throughput=3305 tok/s +2025-11-15 08:31:48,350 - INFO - Epoch 1 Step 1580 (Global: 1580): loss=0.0624, ppl=1.06, grad_norm=0.56, lr=9.92e-05, throughput=3094 tok/s +2025-11-15 08:34:22,984 - INFO - Epoch 1 Step 1590 (Global: 1590): loss=0.0658, ppl=1.07, grad_norm=0.51, lr=9.92e-05, throughput=3104 tok/s +2025-11-15 08:36:48,680 - INFO - Epoch 1 Step 1600 (Global: 1600): loss=0.0609, ppl=1.06, grad_norm=0.38, lr=9.91e-05, throughput=3295 tok/s +2025-11-15 08:39:23,305 - INFO - Epoch 1 Step 1610 (Global: 1610): loss=0.0582, ppl=1.06, grad_norm=0.36, lr=9.91e-05, throughput=3104 tok/s +2025-11-15 08:41:57,806 - INFO - Epoch 1 Step 1620 (Global: 1620): loss=0.0605, ppl=1.06, grad_norm=0.38, lr=9.91e-05, throughput=3107 tok/s +2025-11-15 08:44:32,495 - INFO - Epoch 1 Step 1630 (Global: 1630): loss=0.0577, ppl=1.06, grad_norm=0.38, lr=9.90e-05, throughput=3103 tok/s +2025-11-15 08:46:57,677 - INFO - Epoch 1 Step 1640 (Global: 1640): loss=0.0658, ppl=1.07, grad_norm=0.63, lr=9.90e-05, throughput=3306 tok/s +2025-11-15 08:49:32,033 - INFO - Epoch 1 Step 1650 (Global: 1650): loss=0.0829, ppl=1.09, grad_norm=0.61, lr=9.90e-05, throughput=3110 tok/s +2025-11-15 08:52:06,910 - INFO - Epoch 1 Step 1660 (Global: 1660): loss=0.0759, ppl=1.08, grad_norm=0.44, lr=9.89e-05, throughput=3099 tok/s +2025-11-15 08:54:33,216 - INFO - Epoch 1 Step 1670 (Global: 1670): loss=0.0666, ppl=1.07, grad_norm=0.44, lr=9.89e-05, throughput=3281 tok/s +2025-11-15 08:57:06,928 - INFO - Epoch 1 Step 1680 (Global: 1680): loss=0.0615, ppl=1.06, grad_norm=0.40, lr=9.89e-05, throughput=3123 tok/s +2025-11-15 08:59:41,875 - INFO - Epoch 1 Step 1690 (Global: 1690): loss=0.0617, ppl=1.06, grad_norm=0.39, lr=9.88e-05, 
throughput=3098 tok/s +2025-11-15 09:02:16,989 - INFO - Epoch 1 Step 1700 (Global: 1700): loss=0.0705, ppl=1.07, grad_norm=0.71, lr=9.88e-05, throughput=3095 tok/s +2025-11-15 09:04:44,238 - INFO - Epoch 1 Step 1710 (Global: 1710): loss=0.0695, ppl=1.07, grad_norm=0.41, lr=9.87e-05, throughput=3260 tok/s +2025-11-15 09:07:20,487 - INFO - Epoch 1 Step 1720 (Global: 1720): loss=0.0643, ppl=1.07, grad_norm=0.42, lr=9.87e-05, throughput=3072 tok/s +2025-11-15 09:09:56,511 - INFO - Epoch 1 Step 1730 (Global: 1730): loss=0.0540, ppl=1.06, grad_norm=0.35, lr=9.87e-05, throughput=3077 tok/s +2025-11-15 09:12:23,460 - INFO - Epoch 1 Step 1740 (Global: 1740): loss=0.0660, ppl=1.07, grad_norm=0.38, lr=9.86e-05, throughput=3267 tok/s +2025-11-15 09:14:58,855 - INFO - Epoch 1 Step 1750 (Global: 1750): loss=0.0614, ppl=1.06, grad_norm=0.49, lr=9.86e-05, throughput=3089 tok/s +2025-11-15 09:17:36,706 - INFO - Epoch 1 Step 1760 (Global: 1760): loss=0.0648, ppl=1.07, grad_norm=0.42, lr=9.86e-05, throughput=3041 tok/s +2025-11-15 09:20:13,235 - INFO - Epoch 1 Step 1770 (Global: 1770): loss=0.0700, ppl=1.07, grad_norm=0.56, lr=9.85e-05, throughput=3067 tok/s +2025-11-15 09:22:40,054 - INFO - Epoch 1 Step 1780 (Global: 1780): loss=0.0681, ppl=1.07, grad_norm=0.43, lr=9.85e-05, throughput=3269 tok/s +2025-11-15 09:25:16,390 - INFO - Epoch 1 Step 1790 (Global: 1790): loss=0.0460, ppl=1.05, grad_norm=0.39, lr=9.84e-05, throughput=3070 tok/s +2025-11-15 09:27:50,959 - INFO - Epoch 1 Step 1800 (Global: 1800): loss=0.0602, ppl=1.06, grad_norm=0.34, lr=9.84e-05, throughput=3105 tok/s +2025-11-15 09:30:17,281 - INFO - Epoch 1 Step 1810 (Global: 1810): loss=0.0603, ppl=1.06, grad_norm=0.38, lr=9.83e-05, throughput=3281 tok/s +2025-11-15 09:32:52,276 - INFO - Epoch 1 Step 1820 (Global: 1820): loss=0.0603, ppl=1.06, grad_norm=0.45, lr=9.83e-05, throughput=3097 tok/s +2025-11-15 09:35:28,544 - INFO - Epoch 1 Step 1830 (Global: 1830): loss=0.0600, ppl=1.06, grad_norm=0.37, lr=9.83e-05, 
throughput=3075 tok/s +2025-11-15 09:38:03,586 - INFO - Epoch 1 Step 1840 (Global: 1840): loss=0.0551, ppl=1.06, grad_norm=0.35, lr=9.82e-05, throughput=3096 tok/s +2025-11-15 09:40:28,623 - INFO - Epoch 1 Step 1850 (Global: 1850): loss=0.0540, ppl=1.06, grad_norm=0.37, lr=9.82e-05, throughput=3310 tok/s +2025-11-15 09:43:03,115 - INFO - Epoch 1 Step 1860 (Global: 1860): loss=0.0587, ppl=1.06, grad_norm=0.37, lr=9.81e-05, throughput=3107 tok/s +2025-11-15 09:45:37,101 - INFO - Epoch 1 Step 1870 (Global: 1870): loss=0.0608, ppl=1.06, grad_norm=0.42, lr=9.81e-05, throughput=3117 tok/s +2025-11-15 09:48:02,164 - INFO - Epoch 1 Step 1880 (Global: 1880): loss=0.0487, ppl=1.05, grad_norm=0.35, lr=9.80e-05, throughput=3309 tok/s +2025-11-15 09:50:36,934 - INFO - Epoch 1 Step 1890 (Global: 1890): loss=0.0665, ppl=1.07, grad_norm=0.40, lr=9.80e-05, throughput=3101 tok/s +2025-11-15 09:53:10,451 - INFO - Epoch 1 Step 1900 (Global: 1900): loss=0.0598, ppl=1.06, grad_norm=0.36, lr=9.79e-05, throughput=3127 tok/s +2025-11-15 09:55:35,743 - INFO - Epoch 1 Step 1910 (Global: 1910): loss=0.0579, ppl=1.06, grad_norm=0.34, lr=9.79e-05, throughput=3304 tok/s +2025-11-15 09:58:10,417 - INFO - Epoch 1 Step 1920 (Global: 1920): loss=0.0576, ppl=1.06, grad_norm=0.41, lr=9.78e-05, throughput=3103 tok/s +2025-11-15 10:00:44,477 - INFO - Epoch 1 Step 1930 (Global: 1930): loss=0.0522, ppl=1.05, grad_norm=0.41, lr=9.78e-05, throughput=3116 tok/s +2025-11-15 10:03:19,369 - INFO - Epoch 1 Step 1940 (Global: 1940): loss=0.0534, ppl=1.05, grad_norm=0.34, lr=9.77e-05, throughput=3099 tok/s +2025-11-15 10:05:45,773 - INFO - Epoch 1 Step 1950 (Global: 1950): loss=0.0630, ppl=1.07, grad_norm=0.37, lr=9.77e-05, throughput=3279 tok/s +2025-11-15 10:08:20,986 - INFO - Epoch 1 Step 1960 (Global: 1960): loss=0.0601, ppl=1.06, grad_norm=0.36, lr=9.76e-05, throughput=3093 tok/s +2025-11-15 10:10:55,801 - INFO - Epoch 1 Step 1970 (Global: 1970): loss=0.0554, ppl=1.06, grad_norm=0.35, lr=9.76e-05, 
throughput=3101 tok/s +2025-11-15 10:13:21,854 - INFO - Epoch 1 Step 1980 (Global: 1980): loss=0.0564, ppl=1.06, grad_norm=0.37, lr=9.75e-05, throughput=3287 tok/s +2025-11-15 10:15:58,824 - INFO - Epoch 1 Step 1990 (Global: 1990): loss=0.0495, ppl=1.05, grad_norm=0.46, lr=9.75e-05, throughput=3058 tok/s +2025-11-15 10:18:34,267 - INFO - Epoch 1 Step 2000 (Global: 2000): loss=0.0627, ppl=1.06, grad_norm=0.46, lr=9.74e-05, throughput=3088 tok/s +2025-11-15 10:18:34,270 - INFO - +Running validation at step 2000... +2025-11-15 10:26:09,105 - INFO - Validation loss: 0.0573, perplexity: 1.06 +2025-11-15 10:26:09,106 - INFO - Qualitative metrics (n=5): +2025-11-15 10:26:09,106 - INFO - BLEU: 0.8152 +2025-11-15 10:26:09,106 - INFO - METEOR: 0.9350 +2025-11-15 10:26:09,107 - INFO - Edit Distance: 0.1131 +2025-11-15 10:26:09,107 - INFO - F-measure: 0.9192 +2025-11-15 10:26:09,107 - INFO - +====================================================================== +2025-11-15 10:26:09,107 - INFO - Qualitative Evaluation Samples: +2025-11-15 10:26:09,107 - INFO - ====================================================================== +2025-11-15 10:26:09,107 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-15 10:26:09,107 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 10:26:09,107 - INFO - Generated: ' gave it fourQ out of five stars and said that "the album [Perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s you-was-but it.ere: thi...' +2025-11-15 10:26:09,107 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-15 10:26:09,108 - INFO - ---------------------------------------------------------------------- +2025-11-15 10:26:09,108 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-15 10:26:09,108 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 10:26:09,108 - INFO - Generated: 'ire, was Sabeou-Ch Abakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army RO...' +2025-11-15 10:26:09,108 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 10:26:09,108 - INFO - ---------------------------------------------------------------------- +2025-11-15 10:26:09,108 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-15 10:26:09,109 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 10:26:09,109 - INFO - Generated: ' meeting at the Layheaded. His mia weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax defea...' +2025-11-15 10:26:09,109 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-15 10:26:09,109 - INFO - ---------------------------------------------------------------------- +2025-11-15 10:26:09,109 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-15 10:26:09,109 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 10:26:09,109 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-15 10:26:09,110 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-15 10:26:09,110 - INFO - ---------------------------------------------------------------------- +2025-11-15 10:26:09,110 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-15 10:26:09,110 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 10:26:09,110 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows Max | Redwood Shisores ...' +2025-11-15 10:26:09,110 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 10:26:09,110 - INFO - ---------------------------------------------------------------------- +2025-11-15 10:26:09,111 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_2000.jsonl +2025-11-15 10:26:50,512 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-15 10:26:50,524 - INFO - New best validation loss: 0.0573, perplexity: 1.06 +2025-11-15 10:29:25,271 - INFO - Epoch 1 Step 2010 (Global: 2010): loss=0.0512, ppl=1.05, grad_norm=0.32, lr=9.74e-05, throughput=3102 tok/s +2025-11-15 10:31:52,061 - INFO - Epoch 1 Step 2020 (Global: 2020): loss=0.0566, ppl=1.06, grad_norm=0.37, lr=9.73e-05, throughput=3270 tok/s +2025-11-15 10:34:27,358 - INFO - Epoch 1 Step 2030 (Global: 2030): loss=0.0572, ppl=1.06, grad_norm=0.39, lr=9.73e-05, throughput=3091 tok/s +2025-11-15 10:37:02,272 - INFO - Epoch 1 Step 2040 (Global: 2040): loss=0.0457, ppl=1.05, grad_norm=0.32, lr=9.72e-05, throughput=3099 tok/s +2025-11-15 10:39:28,378 - INFO - Epoch 1 Step 2050 (Global: 2050): loss=0.0570, ppl=1.06, grad_norm=0.36, lr=9.72e-05, 
throughput=3285 tok/s +2025-11-15 10:42:02,973 - INFO - Epoch 1 Step 2060 (Global: 2060): loss=0.0529, ppl=1.05, grad_norm=0.35, lr=9.71e-05, throughput=3105 tok/s +2025-11-15 10:44:37,063 - INFO - Epoch 1 Step 2070 (Global: 2070): loss=0.0567, ppl=1.06, grad_norm=0.34, lr=9.71e-05, throughput=3115 tok/s +2025-11-15 10:47:11,606 - INFO - Epoch 1 Step 2080 (Global: 2080): loss=0.0562, ppl=1.06, grad_norm=0.40, lr=9.70e-05, throughput=3106 tok/s +2025-11-15 10:49:37,538 - INFO - Epoch 1 Step 2090 (Global: 2090): loss=0.0564, ppl=1.06, grad_norm=0.39, lr=9.69e-05, throughput=3289 tok/s +2025-11-15 10:52:12,624 - INFO - Epoch 1 Step 2100 (Global: 2100): loss=0.0690, ppl=1.07, grad_norm=0.39, lr=9.69e-05, throughput=3095 tok/s +2025-11-15 10:54:48,206 - INFO - Epoch 1 Step 2110 (Global: 2110): loss=0.0621, ppl=1.06, grad_norm=0.38, lr=9.68e-05, throughput=3085 tok/s +2025-11-15 10:57:15,757 - INFO - Epoch 1 Step 2120 (Global: 2120): loss=0.0600, ppl=1.06, grad_norm=0.38, lr=9.68e-05, throughput=3253 tok/s +2025-11-15 10:59:51,280 - INFO - Epoch 1 Step 2130 (Global: 2130): loss=0.0568, ppl=1.06, grad_norm=0.35, lr=9.67e-05, throughput=3086 tok/s +2025-11-15 11:02:27,352 - INFO - Epoch 1 Step 2140 (Global: 2140): loss=0.0569, ppl=1.06, grad_norm=0.35, lr=9.66e-05, throughput=3076 tok/s +2025-11-15 11:05:04,177 - INFO - Epoch 1 Step 2150 (Global: 2150): loss=0.0522, ppl=1.05, grad_norm=0.32, lr=9.66e-05, throughput=3061 tok/s +2025-11-15 11:07:30,414 - INFO - Epoch 1 Step 2160 (Global: 2160): loss=0.0618, ppl=1.06, grad_norm=0.39, lr=9.65e-05, throughput=3282 tok/s +2025-11-15 11:10:05,969 - INFO - Epoch 1 Step 2170 (Global: 2170): loss=0.0502, ppl=1.05, grad_norm=0.38, lr=9.65e-05, throughput=3086 tok/s +2025-11-15 11:12:41,342 - INFO - Epoch 1 Step 2180 (Global: 2180): loss=0.0537, ppl=1.06, grad_norm=0.35, lr=9.64e-05, throughput=3089 tok/s +2025-11-15 11:15:08,712 - INFO - Epoch 1 Step 2190 (Global: 2190): loss=0.0591, ppl=1.06, grad_norm=0.34, lr=9.63e-05, 
throughput=3257 tok/s +2025-11-15 11:17:44,795 - INFO - Epoch 1 Step 2200 (Global: 2200): loss=0.0491, ppl=1.05, grad_norm=0.34, lr=9.63e-05, throughput=3075 tok/s +2025-11-15 11:20:20,747 - INFO - Epoch 1 Step 2210 (Global: 2210): loss=0.0641, ppl=1.07, grad_norm=0.56, lr=9.62e-05, throughput=3078 tok/s +2025-11-15 11:22:56,500 - INFO - Epoch 1 Step 2220 (Global: 2220): loss=0.0558, ppl=1.06, grad_norm=0.37, lr=9.61e-05, throughput=3082 tok/s +2025-11-15 11:25:23,003 - INFO - Epoch 1 Step 2230 (Global: 2230): loss=0.0529, ppl=1.05, grad_norm=0.33, lr=9.61e-05, throughput=3276 tok/s +2025-11-15 11:27:59,406 - INFO - Epoch 1 Step 2240 (Global: 2240): loss=0.0431, ppl=1.04, grad_norm=0.33, lr=9.60e-05, throughput=3069 tok/s +2025-11-15 11:30:34,484 - INFO - Epoch 1 Step 2250 (Global: 2250): loss=0.0591, ppl=1.06, grad_norm=0.41, lr=9.60e-05, throughput=3095 tok/s +2025-11-15 11:32:59,878 - INFO - Epoch 1 Step 2260 (Global: 2260): loss=0.0519, ppl=1.05, grad_norm=0.33, lr=9.59e-05, throughput=3301 tok/s +2025-11-15 11:35:34,468 - INFO - Epoch 1 Step 2270 (Global: 2270): loss=0.0559, ppl=1.06, grad_norm=0.34, lr=9.58e-05, throughput=3105 tok/s +2025-11-15 11:38:09,187 - INFO - Epoch 1 Step 2280 (Global: 2280): loss=0.0531, ppl=1.05, grad_norm=0.34, lr=9.58e-05, throughput=3102 tok/s +2025-11-15 11:40:43,927 - INFO - Epoch 1 Step 2290 (Global: 2290): loss=0.0459, ppl=1.05, grad_norm=0.36, lr=9.57e-05, throughput=3102 tok/s +2025-11-15 11:43:09,672 - INFO - Epoch 1 Step 2300 (Global: 2300): loss=0.0590, ppl=1.06, grad_norm=0.34, lr=9.56e-05, throughput=3293 tok/s +2025-11-15 11:45:44,420 - INFO - Epoch 1 Step 2310 (Global: 2310): loss=0.0527, ppl=1.05, grad_norm=0.34, lr=9.55e-05, throughput=3102 tok/s +2025-11-15 11:48:19,778 - INFO - Epoch 1 Step 2320 (Global: 2320): loss=0.0502, ppl=1.05, grad_norm=0.35, lr=9.55e-05, throughput=3090 tok/s +2025-11-15 11:50:43,963 - INFO - Epoch 1 Step 2330 (Global: 2330): loss=0.0438, ppl=1.04, grad_norm=0.33, lr=9.54e-05, 
throughput=3329 tok/s +2025-11-15 11:53:15,404 - INFO - Epoch 1 Step 2340 (Global: 2340): loss=0.0500, ppl=1.05, grad_norm=0.33, lr=9.53e-05, throughput=3170 tok/s +2025-11-15 11:55:47,486 - INFO - Epoch 1 Step 2350 (Global: 2350): loss=0.0476, ppl=1.05, grad_norm=0.31, lr=9.53e-05, throughput=3156 tok/s +2025-11-15 11:58:19,240 - INFO - Epoch 1 Step 2360 (Global: 2360): loss=0.0475, ppl=1.05, grad_norm=0.32, lr=9.52e-05, throughput=3163 tok/s +2025-11-15 12:00:42,359 - INFO - Epoch 1 Step 2370 (Global: 2370): loss=0.0464, ppl=1.05, grad_norm=0.31, lr=9.51e-05, throughput=3354 tok/s +2025-11-15 12:03:15,869 - INFO - Epoch 1 Step 2380 (Global: 2380): loss=0.0586, ppl=1.06, grad_norm=0.39, lr=9.51e-05, throughput=3127 tok/s +2025-11-15 12:05:47,738 - INFO - Epoch 1 Step 2390 (Global: 2390): loss=0.0542, ppl=1.06, grad_norm=0.35, lr=9.50e-05, throughput=3161 tok/s +2025-11-15 12:08:10,651 - INFO - Epoch 1 Step 2400 (Global: 2400): loss=0.0516, ppl=1.05, grad_norm=0.34, lr=9.49e-05, throughput=3359 tok/s +2025-11-15 12:10:42,743 - INFO - Epoch 1 Step 2410 (Global: 2410): loss=0.0541, ppl=1.06, grad_norm=0.36, lr=9.48e-05, throughput=3156 tok/s +2025-11-15 12:13:16,146 - INFO - Epoch 1 Step 2420 (Global: 2420): loss=0.0585, ppl=1.06, grad_norm=0.38, lr=9.48e-05, throughput=3129 tok/s +2025-11-15 12:15:50,046 - INFO - Epoch 1 Step 2430 (Global: 2430): loss=0.0517, ppl=1.05, grad_norm=0.38, lr=9.47e-05, throughput=3119 tok/s +2025-11-15 12:18:12,554 - INFO - Epoch 1 Step 2440 (Global: 2440): loss=0.0519, ppl=1.05, grad_norm=0.40, lr=9.46e-05, throughput=3368 tok/s +2025-11-15 12:20:44,985 - INFO - Epoch 1 Step 2450 (Global: 2450): loss=0.0490, ppl=1.05, grad_norm=0.32, lr=9.45e-05, throughput=3149 tok/s +2025-11-15 12:23:16,115 - INFO - Epoch 1 Step 2460 (Global: 2460): loss=0.0538, ppl=1.06, grad_norm=0.35, lr=9.45e-05, throughput=3176 tok/s +2025-11-15 12:25:38,847 - INFO - Epoch 1 Step 2470 (Global: 2470): loss=0.0530, ppl=1.05, grad_norm=0.35, lr=9.44e-05, 
throughput=3363 tok/s +2025-11-15 12:28:10,125 - INFO - Epoch 1 Step 2480 (Global: 2480): loss=0.0577, ppl=1.06, grad_norm=0.38, lr=9.43e-05, throughput=3173 tok/s +2025-11-15 12:30:42,002 - INFO - Epoch 1 Step 2490 (Global: 2490): loss=0.0491, ppl=1.05, grad_norm=0.33, lr=9.42e-05, throughput=3160 tok/s +2025-11-15 12:33:14,086 - INFO - Epoch 1 Step 2500 (Global: 2500): loss=0.0548, ppl=1.06, grad_norm=0.32, lr=9.41e-05, throughput=3156 tok/s +2025-11-15 12:33:14,087 - INFO - +Running validation at step 2500... +2025-11-15 12:40:37,772 - INFO - Validation loss: 0.0517, perplexity: 1.05 +2025-11-15 12:40:37,774 - INFO - Qualitative metrics (n=5): +2025-11-15 12:40:37,774 - INFO - BLEU: 0.8210 +2025-11-15 12:40:37,774 - INFO - METEOR: 0.9412 +2025-11-15 12:40:37,774 - INFO - Edit Distance: 0.0746 +2025-11-15 12:40:37,774 - INFO - F-measure: 0.9144 +2025-11-15 12:40:37,774 - INFO - +====================================================================== +2025-11-15 12:40:37,774 - INFO - Qualitative Evaluation Samples: +2025-11-15 12:40:37,774 - INFO - ====================================================================== +2025-11-15 12:40:37,775 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-15 12:40:37,775 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 12:40:37,775 - INFO - Generated: 'Q gave it four out of five stars and said that "the album [Perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it was you-w-as-butere. It\'s...' +2025-11-15 12:40:37,775 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-15 12:40:37,775 - INFO - ---------------------------------------------------------------------- +2025-11-15 12:40:37,775 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-15 12:40:37,775 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 12:40:37,775 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 12:40:37,776 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 12:40:37,776 - INFO - ---------------------------------------------------------------------- +2025-11-15 12:40:37,776 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-15 12:40:37,776 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 12:40:37,776 - INFO - Generated: ' at the meeting Layheaded. His mia weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-15 12:40:37,776 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-15 12:40:37,777 - INFO - ---------------------------------------------------------------------- +2025-11-15 12:40:37,777 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-15 12:40:37,777 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 12:40:37,777 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-15 12:40:37,777 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-15 12:40:37,777 - INFO - ---------------------------------------------------------------------- +2025-11-15 12:40:37,777 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-15 12:40:37,778 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 12:40:37,778 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 12:40:37,778 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 12:40:37,778 - INFO - ---------------------------------------------------------------------- +2025-11-15 12:40:37,779 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_2500.jsonl +2025-11-15 12:41:15,542 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-15 12:41:15,551 - INFO - New best validation loss: 0.0517, perplexity: 1.05 +2025-11-15 12:43:37,914 - INFO - Epoch 1 Step 2510 (Global: 2510): loss=0.0524, ppl=1.05, grad_norm=0.57, lr=9.41e-05, throughput=3372 tok/s +2025-11-15 12:46:09,870 - INFO - Epoch 1 Step 2520 (Global: 2520): loss=0.0508, ppl=1.05, grad_norm=0.36, lr=9.40e-05, throughput=3159 tok/s +2025-11-15 12:48:40,993 - INFO - Epoch 1 Step 2530 (Global: 2530): loss=0.0543, ppl=1.06, grad_norm=0.36, lr=9.39e-05, throughput=3176 tok/s +2025-11-15 12:51:12,540 - INFO - Epoch 1 Step 2540 (Global: 2540): loss=0.0511, ppl=1.05, grad_norm=0.32, lr=9.38e-05, throughput=3167 tok/s +2025-11-15 12:53:37,085 - INFO - Epoch 1 Step 2550 (Global: 2550): loss=0.0407, ppl=1.04, grad_norm=0.30, lr=9.37e-05, 
throughput=3321 tok/s +2025-11-15 12:56:10,069 - INFO - Epoch 1 Step 2560 (Global: 2560): loss=0.0546, ppl=1.06, grad_norm=0.35, lr=9.37e-05, throughput=3138 tok/s +2025-11-15 12:58:43,605 - INFO - Epoch 1 Step 2570 (Global: 2570): loss=0.0471, ppl=1.05, grad_norm=0.32, lr=9.36e-05, throughput=3126 tok/s +2025-11-15 13:01:09,837 - INFO - Epoch 1 Step 2580 (Global: 2580): loss=0.0444, ppl=1.05, grad_norm=0.29, lr=9.35e-05, throughput=3283 tok/s +2025-11-15 13:03:44,835 - INFO - Epoch 1 Step 2590 (Global: 2590): loss=0.0480, ppl=1.05, grad_norm=0.33, lr=9.34e-05, throughput=3097 tok/s +2025-11-15 13:06:19,041 - INFO - Epoch 1 Step 2600 (Global: 2600): loss=0.0434, ppl=1.04, grad_norm=0.30, lr=9.33e-05, throughput=3113 tok/s +2025-11-15 13:08:52,477 - INFO - Epoch 1 Step 2610 (Global: 2610): loss=0.0449, ppl=1.05, grad_norm=0.31, lr=9.32e-05, throughput=3128 tok/s +2025-11-15 13:11:17,654 - INFO - Epoch 1 Step 2620 (Global: 2620): loss=0.0508, ppl=1.05, grad_norm=0.31, lr=9.32e-05, throughput=3306 tok/s +2025-11-15 13:13:50,048 - INFO - Epoch 1 Step 2630 (Global: 2630): loss=0.0486, ppl=1.05, grad_norm=0.31, lr=9.31e-05, throughput=3150 tok/s +2025-11-15 13:16:23,679 - INFO - Epoch 1 Step 2640 (Global: 2640): loss=0.0474, ppl=1.05, grad_norm=0.32, lr=9.30e-05, throughput=3124 tok/s +2025-11-15 13:18:48,785 - INFO - Epoch 1 Step 2650 (Global: 2650): loss=0.0469, ppl=1.05, grad_norm=0.34, lr=9.29e-05, throughput=3308 tok/s +2025-11-15 13:21:25,087 - INFO - Epoch 1 Step 2660 (Global: 2660): loss=0.0546, ppl=1.06, grad_norm=0.35, lr=9.28e-05, throughput=3071 tok/s +2025-11-15 13:23:59,732 - INFO - Epoch 1 Step 2670 (Global: 2670): loss=0.0531, ppl=1.05, grad_norm=0.37, lr=9.27e-05, throughput=3104 tok/s +2025-11-15 13:26:35,255 - INFO - Epoch 1 Step 2680 (Global: 2680): loss=0.0522, ppl=1.05, grad_norm=0.32, lr=9.26e-05, throughput=3086 tok/s +2025-11-15 13:29:01,145 - INFO - Epoch 1 Step 2690 (Global: 2690): loss=0.0499, ppl=1.05, grad_norm=0.32, lr=9.26e-05, 
throughput=3290 tok/s +2025-11-15 13:31:34,138 - INFO - Epoch 1 Step 2700 (Global: 2700): loss=0.0538, ppl=1.06, grad_norm=0.34, lr=9.25e-05, throughput=3137 tok/s +2025-11-15 13:34:07,876 - INFO - Epoch 1 Step 2710 (Global: 2710): loss=0.0457, ppl=1.05, grad_norm=0.44, lr=9.24e-05, throughput=3122 tok/s +2025-11-15 13:36:32,920 - INFO - Epoch 1 Step 2720 (Global: 2720): loss=0.0425, ppl=1.04, grad_norm=0.29, lr=9.23e-05, throughput=3309 tok/s +2025-11-15 13:39:07,566 - INFO - Epoch 1 Step 2730 (Global: 2730): loss=0.0553, ppl=1.06, grad_norm=0.34, lr=9.22e-05, throughput=3104 tok/s +2025-11-15 13:41:41,721 - INFO - Epoch 1 Step 2740 (Global: 2740): loss=0.0420, ppl=1.04, grad_norm=0.29, lr=9.21e-05, throughput=3114 tok/s +2025-11-15 13:44:15,615 - INFO - Epoch 1 Step 2750 (Global: 2750): loss=0.0400, ppl=1.04, grad_norm=0.29, lr=9.20e-05, throughput=3119 tok/s +2025-11-15 13:46:40,963 - INFO - Epoch 1 Step 2760 (Global: 2760): loss=0.0529, ppl=1.05, grad_norm=0.32, lr=9.19e-05, throughput=3303 tok/s +2025-11-15 13:49:17,461 - INFO - Epoch 1 Step 2770 (Global: 2770): loss=0.0537, ppl=1.06, grad_norm=0.35, lr=9.18e-05, throughput=3067 tok/s +2025-11-15 13:51:52,113 - INFO - Epoch 1 Step 2780 (Global: 2780): loss=0.0444, ppl=1.05, grad_norm=0.34, lr=9.17e-05, throughput=3104 tok/s +2025-11-15 13:54:17,683 - INFO - Epoch 1 Step 2790 (Global: 2790): loss=0.0486, ppl=1.05, grad_norm=0.31, lr=9.17e-05, throughput=3297 tok/s +2025-11-15 13:56:52,749 - INFO - Epoch 1 Step 2800 (Global: 2800): loss=0.0653, ppl=1.07, grad_norm=0.42, lr=9.16e-05, throughput=3096 tok/s +2025-11-15 13:59:27,999 - INFO - Epoch 1 Step 2810 (Global: 2810): loss=0.0468, ppl=1.05, grad_norm=0.33, lr=9.15e-05, throughput=3092 tok/s +2025-11-15 14:02:02,441 - INFO - Epoch 1 Step 2820 (Global: 2820): loss=0.0501, ppl=1.05, grad_norm=0.31, lr=9.14e-05, throughput=3108 tok/s +2025-11-15 14:04:27,950 - INFO - Epoch 1 Step 2830 (Global: 2830): loss=0.0549, ppl=1.06, grad_norm=0.38, lr=9.13e-05, 
throughput=3299 tok/s +2025-11-15 14:07:02,113 - INFO - Epoch 1 Step 2840 (Global: 2840): loss=0.0512, ppl=1.05, grad_norm=0.33, lr=9.12e-05, throughput=3114 tok/s +2025-11-15 14:09:37,051 - INFO - Epoch 1 Step 2850 (Global: 2850): loss=0.0550, ppl=1.06, grad_norm=0.33, lr=9.11e-05, throughput=3098 tok/s +2025-11-15 14:12:03,162 - INFO - Epoch 1 Step 2860 (Global: 2860): loss=0.0532, ppl=1.05, grad_norm=0.33, lr=9.10e-05, throughput=3285 tok/s +2025-11-15 14:14:41,001 - INFO - Epoch 1 Step 2870 (Global: 2870): loss=0.0507, ppl=1.05, grad_norm=0.31, lr=9.09e-05, throughput=3041 tok/s +2025-11-15 14:17:18,656 - INFO - Epoch 1 Step 2880 (Global: 2880): loss=0.0436, ppl=1.04, grad_norm=0.32, lr=9.08e-05, throughput=3045 tok/s +2025-11-15 14:19:57,283 - INFO - Epoch 1 Step 2890 (Global: 2890): loss=0.0474, ppl=1.05, grad_norm=0.34, lr=9.07e-05, throughput=3026 tok/s +2025-11-15 14:22:21,498 - INFO - Epoch 1 Step 2900 (Global: 2900): loss=0.0540, ppl=1.06, grad_norm=0.35, lr=9.06e-05, throughput=3328 tok/s +2025-11-15 14:24:56,727 - INFO - Epoch 1 Step 2910 (Global: 2910): loss=0.0478, ppl=1.05, grad_norm=0.60, lr=9.05e-05, throughput=3092 tok/s +2025-11-15 14:27:33,078 - INFO - Epoch 1 Step 2920 (Global: 2920): loss=0.0467, ppl=1.05, grad_norm=0.30, lr=9.04e-05, throughput=3070 tok/s +2025-11-15 14:29:57,972 - INFO - Epoch 1 Step 2930 (Global: 2930): loss=0.0402, ppl=1.04, grad_norm=0.28, lr=9.03e-05, throughput=3313 tok/s +2025-11-15 14:32:33,719 - INFO - Epoch 1 Step 2940 (Global: 2940): loss=0.0511, ppl=1.05, grad_norm=0.33, lr=9.02e-05, throughput=3082 tok/s +2025-11-15 14:35:09,134 - INFO - Epoch 1 Step 2950 (Global: 2950): loss=0.0405, ppl=1.04, grad_norm=0.33, lr=9.01e-05, throughput=3089 tok/s +2025-11-15 14:37:46,174 - INFO - Epoch 1 Step 2960 (Global: 2960): loss=0.0415, ppl=1.04, grad_norm=0.30, lr=9.00e-05, throughput=3057 tok/s +2025-11-15 14:40:15,189 - INFO - Epoch 1 Step 2970 (Global: 2970): loss=0.0439, ppl=1.04, grad_norm=0.32, lr=8.99e-05, 
throughput=3221 tok/s +2025-11-15 14:42:51,241 - INFO - Epoch 1 Step 2980 (Global: 2980): loss=0.0513, ppl=1.05, grad_norm=0.32, lr=8.98e-05, throughput=3076 tok/s +2025-11-15 14:45:28,792 - INFO - Epoch 1 Step 2990 (Global: 2990): loss=0.0440, ppl=1.05, grad_norm=0.29, lr=8.97e-05, throughput=3047 tok/s +2025-11-15 14:47:53,787 - INFO - Epoch 1 Step 3000 (Global: 3000): loss=0.0470, ppl=1.05, grad_norm=0.32, lr=8.96e-05, throughput=3311 tok/s +2025-11-15 14:47:53,789 - INFO - +Running validation at step 3000... +2025-11-15 14:55:36,952 - INFO - Validation loss: 0.0479, perplexity: 1.05 +2025-11-15 14:55:36,953 - INFO - Qualitative metrics (n=5): +2025-11-15 14:55:36,953 - INFO - BLEU: 0.8054 +2025-11-15 14:55:36,953 - INFO - METEOR: 0.9248 +2025-11-15 14:55:36,953 - INFO - Edit Distance: 0.1029 +2025-11-15 14:55:36,953 - INFO - F-measure: 0.9052 +2025-11-15 14:55:36,953 - INFO - +====================================================================== +2025-11-15 14:55:36,953 - INFO - Qualitative Evaluation Samples: +2025-11-15 14:55:36,953 - INFO - ====================================================================== +2025-11-15 14:55:36,954 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-15 14:55:36,954 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 14:55:36,954 - INFO - Generated: 'Q gave it four stars out of five and said that "the album [Perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s you-was. But itere: thi...' +2025-11-15 14:55:36,954 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-15 14:55:36,954 - INFO - ---------------------------------------------------------------------- +2025-11-15 14:55:36,954 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-15 14:55:36,954 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 14:55:36,954 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 14:55:36,954 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 14:55:36,955 - INFO - ---------------------------------------------------------------------- +2025-11-15 14:55:36,955 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-15 14:55:36,955 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 14:55:36,955 - INFO - Generated: ' at the meeting Layheaded. Hismia is weapon of choice, a giant ax and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and de...' +2025-11-15 14:55:36,955 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-15 14:55:36,955 - INFO - ---------------------------------------------------------------------- +2025-11-15 14:55:36,955 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-15 14:55:36,955 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 14:55:36,955 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-15 14:55:36,955 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-15 14:55:36,956 - INFO - ---------------------------------------------------------------------- +2025-11-15 14:55:36,956 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-15 14:55:36,956 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 14:55:36,956 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 14:55:36,956 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 14:55:36,956 - INFO - ---------------------------------------------------------------------- +2025-11-15 14:55:36,957 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_3000.jsonl +2025-11-15 14:56:17,371 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-15 14:56:17,386 - INFO - New best validation loss: 0.0479, perplexity: 1.05 +2025-11-15 14:58:52,907 - INFO - Epoch 1 Step 3010 (Global: 3010): loss=0.0518, ppl=1.05, grad_norm=0.37, lr=8.95e-05, throughput=3087 tok/s +2025-11-15 15:01:30,020 - INFO - Epoch 1 Step 3020 (Global: 3020): loss=0.0456, ppl=1.05, grad_norm=0.40, lr=8.94e-05, throughput=3055 tok/s +2025-11-15 15:04:06,306 - INFO - Epoch 1 Step 3030 (Global: 3030): loss=0.0467, ppl=1.05, grad_norm=0.33, lr=8.93e-05, throughput=3071 tok/s +2025-11-15 15:06:32,959 - INFO - Epoch 1 Step 3040 (Global: 3040): loss=0.0426, ppl=1.04, grad_norm=0.36, lr=8.92e-05, throughput=3273 tok/s +2025-11-15 15:09:10,138 - INFO - Epoch 1 Step 3050 (Global: 3050): loss=0.0485, ppl=1.05, grad_norm=0.31, lr=8.91e-05, 
throughput=3054 tok/s +2025-11-15 15:11:51,754 - INFO - Epoch 1 Step 3060 (Global: 3060): loss=0.0405, ppl=1.04, grad_norm=0.29, lr=8.90e-05, throughput=2970 tok/s +2025-11-15 15:14:21,186 - INFO - Epoch 1 Step 3070 (Global: 3070): loss=0.0489, ppl=1.05, grad_norm=0.36, lr=8.89e-05, throughput=3212 tok/s +2025-11-15 15:17:00,266 - INFO - Epoch 1 Step 3080 (Global: 3080): loss=0.0516, ppl=1.05, grad_norm=0.33, lr=8.88e-05, throughput=3021 tok/s +2025-11-15 15:19:39,839 - INFO - Epoch 1 Step 3090 (Global: 3090): loss=0.0449, ppl=1.05, grad_norm=0.32, lr=8.87e-05, throughput=3008 tok/s +2025-11-15 15:22:21,772 - INFO - Epoch 1 Step 3100 (Global: 3100): loss=0.0486, ppl=1.05, grad_norm=0.41, lr=8.86e-05, throughput=2964 tok/s +2025-11-15 15:24:52,726 - INFO - Epoch 1 Step 3110 (Global: 3110): loss=0.0445, ppl=1.05, grad_norm=0.30, lr=8.85e-05, throughput=3180 tok/s +2025-11-15 15:27:30,267 - INFO - Epoch 1 Step 3120 (Global: 3120): loss=0.0499, ppl=1.05, grad_norm=0.31, lr=8.84e-05, throughput=3047 tok/s +2025-11-15 15:30:08,331 - INFO - Epoch 1 Step 3130 (Global: 3130): loss=0.0571, ppl=1.06, grad_norm=0.34, lr=8.82e-05, throughput=3037 tok/s +2025-11-15 15:32:39,641 - INFO - Epoch 1 Step 3140 (Global: 3140): loss=0.0483, ppl=1.05, grad_norm=0.32, lr=8.81e-05, throughput=3172 tok/s +2025-11-15 15:35:18,491 - INFO - Epoch 1 Step 3150 (Global: 3150): loss=0.0438, ppl=1.04, grad_norm=0.30, lr=8.80e-05, throughput=3022 tok/s +2025-11-15 15:37:57,043 - INFO - Epoch 1 Step 3160 (Global: 3160): loss=0.0529, ppl=1.05, grad_norm=0.40, lr=8.79e-05, throughput=3027 tok/s +2025-11-15 15:40:35,925 - INFO - Epoch 1 Step 3170 (Global: 3170): loss=0.0494, ppl=1.05, grad_norm=0.32, lr=8.78e-05, throughput=3021 tok/s +2025-11-15 15:43:09,169 - INFO - Epoch 1 Step 3180 (Global: 3180): loss=0.0433, ppl=1.04, grad_norm=0.34, lr=8.77e-05, throughput=3132 tok/s +2025-11-15 19:15:20,492 - INFO - Epoch 1 Step 3190 (Global: 3190): loss=0.0469, ppl=1.05, grad_norm=0.35, lr=8.76e-05, 
throughput=38 tok/s +2025-11-15 19:17:57,186 - INFO - Epoch 1 Step 3200 (Global: 3200): loss=0.0476, ppl=1.05, grad_norm=0.36, lr=8.75e-05, throughput=3063 tok/s +2025-11-15 19:20:23,170 - INFO - Epoch 1 Step 3210 (Global: 3210): loss=0.0437, ppl=1.04, grad_norm=0.29, lr=8.74e-05, throughput=3288 tok/s +2025-11-15 19:23:03,782 - INFO - Epoch 1 Step 3220 (Global: 3220): loss=0.0513, ppl=1.05, grad_norm=0.37, lr=8.73e-05, throughput=2989 tok/s +2025-11-15 19:25:37,674 - INFO - Epoch 1 Step 3230 (Global: 3230): loss=0.0471, ppl=1.05, grad_norm=0.30, lr=8.71e-05, throughput=3119 tok/s +2025-11-15 19:28:08,988 - INFO - Epoch 1 Step 3240 (Global: 3240): loss=0.0459, ppl=1.05, grad_norm=0.31, lr=8.70e-05, throughput=3172 tok/s +2025-11-15 19:30:30,737 - INFO - Epoch 1 Step 3250 (Global: 3250): loss=0.0533, ppl=1.05, grad_norm=0.41, lr=8.69e-05, throughput=3386 tok/s +2025-11-15 19:33:01,712 - INFO - Epoch 1 Step 3260 (Global: 3260): loss=0.0423, ppl=1.04, grad_norm=0.29, lr=8.68e-05, throughput=3179 tok/s +2025-11-15 19:35:33,435 - INFO - Epoch 1 Step 3270 (Global: 3270): loss=0.0503, ppl=1.05, grad_norm=0.34, lr=8.67e-05, throughput=3164 tok/s +2025-11-15 19:37:55,803 - INFO - Epoch 1 Step 3280 (Global: 3280): loss=0.0472, ppl=1.05, grad_norm=0.32, lr=8.66e-05, throughput=3372 tok/s +2025-11-15 19:40:27,352 - INFO - Epoch 1 Step 3290 (Global: 3290): loss=0.0421, ppl=1.04, grad_norm=0.35, lr=8.65e-05, throughput=3167 tok/s +2025-11-15 19:42:58,992 - INFO - Epoch 1 Step 3300 (Global: 3300): loss=0.0455, ppl=1.05, grad_norm=0.31, lr=8.63e-05, throughput=3165 tok/s +2025-11-15 19:45:30,289 - INFO - Epoch 1 Step 3310 (Global: 3310): loss=0.0493, ppl=1.05, grad_norm=0.37, lr=8.62e-05, throughput=3173 tok/s +2025-11-15 19:47:52,797 - INFO - Epoch 1 Step 3320 (Global: 3320): loss=0.0445, ppl=1.05, grad_norm=0.29, lr=8.61e-05, throughput=3368 tok/s +2025-11-15 19:50:24,568 - INFO - Epoch 1 Step 3330 (Global: 3330): loss=0.0471, ppl=1.05, grad_norm=0.31, lr=8.60e-05, 
throughput=3163 tok/s +2025-11-15 19:52:56,058 - INFO - Epoch 1 Step 3340 (Global: 3340): loss=0.0486, ppl=1.05, grad_norm=0.31, lr=8.59e-05, throughput=3169 tok/s +2025-11-15 19:55:19,065 - INFO - Epoch 1 Step 3350 (Global: 3350): loss=0.0527, ppl=1.05, grad_norm=0.32, lr=8.58e-05, throughput=3357 tok/s +2025-11-15 19:57:51,035 - INFO - Epoch 1 Step 3360 (Global: 3360): loss=0.0451, ppl=1.05, grad_norm=0.29, lr=8.57e-05, throughput=3159 tok/s +2025-11-15 20:00:23,403 - INFO - Epoch 1 Step 3370 (Global: 3370): loss=0.0493, ppl=1.05, grad_norm=0.53, lr=8.55e-05, throughput=3150 tok/s +2025-11-15 20:02:55,540 - INFO - Epoch 1 Step 3380 (Global: 3380): loss=0.0513, ppl=1.05, grad_norm=0.34, lr=8.54e-05, throughput=3155 tok/s +2025-11-15 20:05:17,605 - INFO - Epoch 1 Step 3390 (Global: 3390): loss=0.0445, ppl=1.05, grad_norm=0.31, lr=8.53e-05, throughput=3379 tok/s +2025-11-15 20:07:50,166 - INFO - Epoch 1 Step 3400 (Global: 3400): loss=0.0772, ppl=1.08, grad_norm=1.37, lr=8.52e-05, throughput=3146 tok/s +2025-11-15 20:10:22,512 - INFO - Epoch 1 Step 3410 (Global: 3410): loss=0.0490, ppl=1.05, grad_norm=0.34, lr=8.51e-05, throughput=3151 tok/s +2025-11-15 20:12:47,086 - INFO - Epoch 1 Step 3420 (Global: 3420): loss=0.0433, ppl=1.04, grad_norm=0.30, lr=8.49e-05, throughput=3320 tok/s +2025-11-15 20:15:21,335 - INFO - Epoch 1 Step 3430 (Global: 3430): loss=0.0430, ppl=1.04, grad_norm=0.29, lr=8.48e-05, throughput=3112 tok/s +2025-11-15 20:17:56,880 - INFO - Epoch 1 Step 3440 (Global: 3440): loss=0.0474, ppl=1.05, grad_norm=0.33, lr=8.47e-05, throughput=3086 tok/s +2025-11-15 20:20:31,035 - INFO - Epoch 1 Step 3450 (Global: 3450): loss=0.0446, ppl=1.05, grad_norm=0.30, lr=8.46e-05, throughput=3114 tok/s +2025-11-15 20:22:57,079 - INFO - Epoch 1 Step 3460 (Global: 3460): loss=0.0412, ppl=1.04, grad_norm=0.29, lr=8.45e-05, throughput=3287 tok/s +2025-11-15 20:25:31,403 - INFO - Epoch 1 Step 3470 (Global: 3470): loss=0.0383, ppl=1.04, grad_norm=0.27, lr=8.43e-05, 
throughput=3110 tok/s +2025-11-15 20:28:06,769 - INFO - Epoch 1 Step 3480 (Global: 3480): loss=0.0495, ppl=1.05, grad_norm=0.30, lr=8.42e-05, throughput=3090 tok/s +2025-11-15 20:30:34,617 - INFO - Epoch 1 Step 3490 (Global: 3490): loss=0.0496, ppl=1.05, grad_norm=0.33, lr=8.41e-05, throughput=3247 tok/s +2025-11-15 20:33:09,885 - INFO - Epoch 1 Step 3500 (Global: 3500): loss=0.0403, ppl=1.04, grad_norm=0.32, lr=8.40e-05, throughput=3091 tok/s +2025-11-15 20:33:09,887 - INFO - +Running validation at step 3500... +2025-11-15 20:40:51,072 - INFO - Validation loss: 0.0455, perplexity: 1.05 +2025-11-15 20:40:51,072 - INFO - Qualitative metrics (n=5): +2025-11-15 20:40:51,073 - INFO - BLEU: 0.8372 +2025-11-15 20:40:51,073 - INFO - METEOR: 0.9526 +2025-11-15 20:40:51,073 - INFO - Edit Distance: 0.0775 +2025-11-15 20:40:51,073 - INFO - F-measure: 0.9301 +2025-11-15 20:40:51,073 - INFO - +====================================================================== +2025-11-15 20:40:51,073 - INFO - Qualitative Evaluation Samples: +2025-11-15 20:40:51,073 - INFO - ====================================================================== +2025-11-15 20:40:51,073 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-15 20:40:51,073 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 20:40:51,074 - INFO - Generated: 'Q gave it four stars out of five and said that "the album [Perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s you-wasere. But it\'s no...' +2025-11-15 20:40:51,074 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-15 20:40:51,074 - INFO - ---------------------------------------------------------------------- +2025-11-15 20:40:51,074 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-15 20:40:51,074 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 20:40:51,074 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 20:40:51,074 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 20:40:51,074 - INFO - ---------------------------------------------------------------------- +2025-11-15 20:40:51,074 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-15 20:40:51,074 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 20:40:51,075 - INFO - Generated: ' at the meeting Laymia. His headed weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-15 20:40:51,075 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-15 20:40:51,075 - INFO - ---------------------------------------------------------------------- +2025-11-15 20:40:51,075 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-15 20:40:51,075 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 20:40:51,075 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-15 20:40:51,075 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-15 20:40:51,075 - INFO - ---------------------------------------------------------------------- +2025-11-15 20:40:51,075 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-15 20:40:51,075 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 20:40:51,076 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 20:40:51,076 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 20:40:51,076 - INFO - ---------------------------------------------------------------------- +2025-11-15 20:40:51,077 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_3500.jsonl +2025-11-15 20:41:28,147 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-15 20:41:28,158 - INFO - New best validation loss: 0.0455, perplexity: 1.05 +2025-11-15 20:44:03,618 - INFO - Epoch 1 Step 3510 (Global: 3510): loss=0.0450, ppl=1.05, grad_norm=0.32, lr=8.38e-05, throughput=3088 tok/s +2025-11-15 20:46:40,617 - INFO - Epoch 1 Step 3520 (Global: 3520): loss=0.0370, ppl=1.04, grad_norm=0.29, lr=8.37e-05, throughput=3057 tok/s +2025-11-15 20:49:07,610 - INFO - Epoch 1 Step 3530 (Global: 3530): loss=0.0447, ppl=1.05, grad_norm=0.30, lr=8.36e-05, throughput=3266 tok/s +2025-11-15 20:51:44,785 - INFO - Epoch 1 Step 3540 (Global: 3540): loss=0.0437, ppl=1.04, grad_norm=0.31, lr=8.35e-05, throughput=3054 tok/s +2025-11-15 20:54:21,255 - INFO - Epoch 1 Step 3550 (Global: 3550): loss=0.0473, ppl=1.05, grad_norm=0.32, lr=8.33e-05, 
throughput=3068 tok/s +2025-11-15 20:56:48,770 - INFO - Epoch 1 Step 3560 (Global: 3560): loss=0.0438, ppl=1.04, grad_norm=0.30, lr=8.32e-05, throughput=3254 tok/s +2025-11-15 20:59:23,403 - INFO - Epoch 1 Step 3570 (Global: 3570): loss=0.0468, ppl=1.05, grad_norm=0.33, lr=8.31e-05, throughput=3104 tok/s +2025-11-15 21:01:57,724 - INFO - Epoch 1 Step 3580 (Global: 3580): loss=0.0427, ppl=1.04, grad_norm=0.29, lr=8.30e-05, throughput=3110 tok/s +2025-11-15 21:04:31,321 - INFO - Epoch 1 Step 3590 (Global: 3590): loss=0.0518, ppl=1.05, grad_norm=0.37, lr=8.28e-05, throughput=3125 tok/s +2025-11-15 21:06:56,437 - INFO - Epoch 1 Step 3600 (Global: 3600): loss=0.0432, ppl=1.04, grad_norm=0.29, lr=8.27e-05, throughput=3308 tok/s +2025-11-15 21:09:28,535 - INFO - Epoch 1 Step 3610 (Global: 3610): loss=0.0416, ppl=1.04, grad_norm=0.28, lr=8.26e-05, throughput=3156 tok/s +2025-11-15 21:12:02,252 - INFO - Epoch 1 Step 3620 (Global: 3620): loss=0.0402, ppl=1.04, grad_norm=0.28, lr=8.25e-05, throughput=3123 tok/s +2025-11-15 21:14:27,713 - INFO - Epoch 1 Step 3630 (Global: 3630): loss=0.0423, ppl=1.04, grad_norm=0.31, lr=8.23e-05, throughput=3300 tok/s +2025-11-15 21:17:06,799 - INFO - Epoch 1 Step 3640 (Global: 3640): loss=0.0463, ppl=1.05, grad_norm=0.34, lr=8.22e-05, throughput=3017 tok/s +2025-11-15 21:19:49,647 - INFO - Epoch 1 Step 3650 (Global: 3650): loss=0.0452, ppl=1.05, grad_norm=0.34, lr=8.21e-05, throughput=2948 tok/s +2025-11-15 21:22:24,392 - INFO - Epoch 1 Step 3660 (Global: 3660): loss=0.0459, ppl=1.05, grad_norm=0.31, lr=8.20e-05, throughput=3102 tok/s +2025-11-15 21:24:50,523 - INFO - Epoch 1 Step 3670 (Global: 3670): loss=0.0419, ppl=1.04, grad_norm=0.29, lr=8.18e-05, throughput=3285 tok/s +2025-11-15 21:27:25,134 - INFO - Epoch 1 Step 3680 (Global: 3680): loss=0.0463, ppl=1.05, grad_norm=0.30, lr=8.17e-05, throughput=3105 tok/s +2025-11-15 21:30:00,963 - INFO - Epoch 1 Step 3690 (Global: 3690): loss=0.0468, ppl=1.05, grad_norm=0.39, lr=8.16e-05, 
throughput=3080 tok/s +2025-11-15 21:32:27,827 - INFO - Epoch 1 Step 3700 (Global: 3700): loss=0.0437, ppl=1.04, grad_norm=0.30, lr=8.14e-05, throughput=3268 tok/s +2025-11-15 21:35:02,493 - INFO - Epoch 1 Step 3710 (Global: 3710): loss=0.0416, ppl=1.04, grad_norm=0.28, lr=8.13e-05, throughput=3104 tok/s +2025-11-15 21:37:37,789 - INFO - Epoch 1 Step 3720 (Global: 3720): loss=0.0474, ppl=1.05, grad_norm=0.30, lr=8.12e-05, throughput=3091 tok/s +2025-11-15 21:40:13,576 - INFO - Epoch 1 Step 3730 (Global: 3730): loss=0.0408, ppl=1.04, grad_norm=0.30, lr=8.10e-05, throughput=3081 tok/s +2025-11-15 21:42:39,536 - INFO - Epoch 1 Step 3740 (Global: 3740): loss=0.0496, ppl=1.05, grad_norm=0.30, lr=8.09e-05, throughput=3289 tok/s +2025-11-15 21:45:21,732 - INFO - Epoch 1 Step 3750 (Global: 3750): loss=0.0470, ppl=1.05, grad_norm=0.34, lr=8.08e-05, throughput=2959 tok/s +2025-11-15 21:48:52,990 - INFO - Epoch 1 Step 3760 (Global: 3760): loss=0.0503, ppl=1.05, grad_norm=0.35, lr=8.06e-05, throughput=2272 tok/s +2025-11-15 21:52:45,243 - INFO - Epoch 1 Step 3770 (Global: 3770): loss=0.0421, ppl=1.04, grad_norm=0.29, lr=8.05e-05, throughput=2067 tok/s +2025-11-15 21:57:05,817 - INFO - Epoch 1 Step 3780 (Global: 3780): loss=0.0396, ppl=1.04, grad_norm=0.28, lr=8.04e-05, throughput=1842 tok/s +2025-11-15 22:00:59,363 - INFO - Epoch 1 Step 3790 (Global: 3790): loss=0.0444, ppl=1.05, grad_norm=0.30, lr=8.02e-05, throughput=2055 tok/s +2025-11-15 22:04:44,719 - INFO - Epoch 1 Step 3800 (Global: 3800): loss=0.0390, ppl=1.04, grad_norm=0.27, lr=8.01e-05, throughput=2130 tok/s +2025-11-15 22:08:04,229 - INFO - Epoch 1 Step 3810 (Global: 3810): loss=0.0442, ppl=1.05, grad_norm=0.28, lr=8.00e-05, throughput=2406 tok/s +2025-11-15 22:11:48,267 - INFO - Epoch 1 Step 3820 (Global: 3820): loss=0.0427, ppl=1.04, grad_norm=0.34, lr=7.98e-05, throughput=2143 tok/s +2025-11-15 22:15:28,111 - INFO - Epoch 1 Step 3830 (Global: 3830): loss=0.0407, ppl=1.04, grad_norm=0.28, lr=7.97e-05, 
throughput=2184 tok/s +2025-11-15 22:18:57,975 - INFO - Epoch 1 Step 3840 (Global: 3840): loss=0.0425, ppl=1.04, grad_norm=0.29, lr=7.96e-05, throughput=2287 tok/s +2025-11-15 22:23:04,926 - INFO - Epoch 1 Step 3850 (Global: 3850): loss=0.0445, ppl=1.05, grad_norm=0.29, lr=7.94e-05, throughput=1944 tok/s +2025-11-15 22:27:28,041 - INFO - Epoch 1 Step 3860 (Global: 3860): loss=0.0515, ppl=1.05, grad_norm=0.33, lr=7.93e-05, throughput=1824 tok/s +2025-11-15 22:31:14,169 - INFO - Epoch 1 Step 3870 (Global: 3870): loss=0.0397, ppl=1.04, grad_norm=0.27, lr=7.92e-05, throughput=2123 tok/s +2025-11-15 22:35:31,782 - INFO - Epoch 1 Step 3880 (Global: 3880): loss=0.0348, ppl=1.04, grad_norm=0.29, lr=7.90e-05, throughput=1863 tok/s +2025-11-15 22:39:36,270 - INFO - Epoch 1 Step 3890 (Global: 3890): loss=0.0379, ppl=1.04, grad_norm=0.29, lr=7.89e-05, throughput=1963 tok/s +2025-11-15 22:44:34,344 - INFO - Epoch 1 Step 3900 (Global: 3900): loss=0.0403, ppl=1.04, grad_norm=0.30, lr=7.88e-05, throughput=1610 tok/s +2025-11-15 22:50:22,281 - INFO - Epoch 1 Step 3910 (Global: 3910): loss=0.0420, ppl=1.04, grad_norm=0.30, lr=7.86e-05, throughput=1380 tok/s +2025-11-15 22:56:40,629 - INFO - Epoch 1 Step 3920 (Global: 3920): loss=0.0445, ppl=1.05, grad_norm=0.38, lr=7.85e-05, throughput=1269 tok/s +2025-11-15 23:02:33,778 - INFO - Epoch 1 Step 3930 (Global: 3930): loss=0.0508, ppl=1.05, grad_norm=0.34, lr=7.83e-05, throughput=1359 tok/s +2025-11-15 23:05:54,639 - INFO - Epoch 1 Step 3940 (Global: 3940): loss=0.0345, ppl=1.04, grad_norm=0.27, lr=7.82e-05, throughput=2390 tok/s +2025-11-15 23:08:32,355 - INFO - Epoch 1 Step 3950 (Global: 3950): loss=0.0381, ppl=1.04, grad_norm=0.33, lr=7.81e-05, throughput=3044 tok/s +2025-11-15 23:11:29,038 - INFO - Epoch 1 Step 3960 (Global: 3960): loss=0.0399, ppl=1.04, grad_norm=0.31, lr=7.79e-05, throughput=2717 tok/s +2025-11-15 23:14:18,996 - INFO - Epoch 1 Step 3970 (Global: 3970): loss=0.0490, ppl=1.05, grad_norm=0.31, lr=7.78e-05, 
throughput=2824 tok/s +2025-11-15 23:16:53,484 - INFO - Epoch 1 Step 3980 (Global: 3980): loss=0.0465, ppl=1.05, grad_norm=0.29, lr=7.77e-05, throughput=3107 tok/s +2025-11-15 23:19:21,837 - INFO - Epoch 1 Step 3990 (Global: 3990): loss=0.0476, ppl=1.05, grad_norm=0.34, lr=7.75e-05, throughput=3236 tok/s +2025-11-15 23:22:01,297 - INFO - Epoch 1 Step 4000 (Global: 4000): loss=0.0390, ppl=1.04, grad_norm=0.27, lr=7.74e-05, throughput=3010 tok/s +2025-11-15 23:22:01,299 - INFO - +Running validation at step 4000... +2025-11-15 23:29:44,288 - INFO - Validation loss: 0.0430, perplexity: 1.04 +2025-11-15 23:29:44,289 - INFO - Qualitative metrics (n=5): +2025-11-15 23:29:44,289 - INFO - BLEU: 0.8642 +2025-11-15 23:29:44,289 - INFO - METEOR: 0.9534 +2025-11-15 23:29:44,289 - INFO - Edit Distance: 0.0599 +2025-11-15 23:29:44,289 - INFO - F-measure: 0.9360 +2025-11-15 23:29:44,289 - INFO - +====================================================================== +2025-11-15 23:29:44,289 - INFO - Qualitative Evaluation Samples: +2025-11-15 23:29:44,290 - INFO - ====================================================================== +2025-11-15 23:29:44,290 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-15 23:29:44,290 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 23:29:44,290 - INFO - Generated: 'Q gave it four out of five stars and said that "the album [Perhaps\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s you-wasere. But it\'s no...' +2025-11-15 23:29:44,290 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-15 23:29:44,290 - INFO - ---------------------------------------------------------------------- +2025-11-15 23:29:44,290 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-15 23:29:44,290 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 23:29:44,290 - INFO - Generated: ', Sire was Aboune-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 23:29:44,290 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-15 23:29:44,290 - INFO - ---------------------------------------------------------------------- +2025-11-15 23:29:44,290 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-15 23:29:44,291 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 23:29:44,291 - INFO - Generated: ' at the meeting Laymia. His headed weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-15 23:29:44,291 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-15 23:29:44,291 - INFO - ---------------------------------------------------------------------- +2025-11-15 23:29:44,291 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-15 23:29:44,291 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 23:29:44,292 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-15 23:29:44,292 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-15 23:29:44,292 - INFO - ---------------------------------------------------------------------- +2025-11-15 23:29:44,292 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-15 23:29:44,292 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-15 23:29:44,292 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 23:29:44,292 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-15 23:29:44,292 - INFO - ---------------------------------------------------------------------- +2025-11-15 23:29:44,293 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_4000.jsonl +2025-11-15 23:30:25,219 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-15 23:30:25,231 - INFO - New best validation loss: 0.0430, perplexity: 1.04 +2025-11-15 23:32:58,962 - INFO - Epoch 1 Step 4010 (Global: 4010): loss=0.0360, ppl=1.04, grad_norm=0.34, lr=7.72e-05, throughput=3123 tok/s +2025-11-15 23:35:22,116 - INFO - Epoch 1 Step 4020 (Global: 4020): loss=0.0462, ppl=1.05, grad_norm=0.32, lr=7.71e-05, throughput=3353 tok/s +2025-11-15 23:37:54,834 - INFO - Epoch 1 Step 4030 (Global: 4030): loss=0.0417, ppl=1.04, grad_norm=0.29, lr=7.70e-05, throughput=3143 tok/s +2025-11-15 23:40:28,508 - INFO - Epoch 1 Step 4040 (Global: 4040): loss=0.0466, ppl=1.05, grad_norm=0.32, lr=7.68e-05, throughput=3124 tok/s +2025-11-15 23:43:00,694 - INFO - Epoch 1 Step 4050 (Global: 4050): loss=0.0416, ppl=1.04, grad_norm=0.28, lr=7.67e-05, 
throughput=3154 tok/s +2025-11-15 23:45:24,590 - INFO - Epoch 1 Step 4060 (Global: 4060): loss=0.0471, ppl=1.05, grad_norm=0.35, lr=7.65e-05, throughput=3336 tok/s +2025-11-15 23:47:57,862 - INFO - Epoch 1 Step 4070 (Global: 4070): loss=0.0441, ppl=1.05, grad_norm=0.40, lr=7.64e-05, throughput=3132 tok/s +2025-11-15 23:50:30,410 - INFO - Epoch 1 Step 4080 (Global: 4080): loss=0.0447, ppl=1.05, grad_norm=0.33, lr=7.62e-05, throughput=3147 tok/s +2025-11-15 23:52:54,780 - INFO - Epoch 1 Step 4090 (Global: 4090): loss=0.0413, ppl=1.04, grad_norm=0.30, lr=7.61e-05, throughput=3325 tok/s +2025-11-15 23:55:28,183 - INFO - Epoch 1 Step 4100 (Global: 4100): loss=0.0386, ppl=1.04, grad_norm=0.30, lr=7.60e-05, throughput=3129 tok/s +2025-11-15 23:58:00,290 - INFO - Epoch 1 Step 4110 (Global: 4110): loss=0.0397, ppl=1.04, grad_norm=0.29, lr=7.58e-05, throughput=3156 tok/s +2025-11-16 00:00:34,418 - INFO - Epoch 1 Step 4120 (Global: 4120): loss=0.0405, ppl=1.04, grad_norm=0.31, lr=7.57e-05, throughput=3114 tok/s +2025-11-16 00:03:00,648 - INFO - Epoch 1 Step 4130 (Global: 4130): loss=0.0480, ppl=1.05, grad_norm=0.31, lr=7.55e-05, throughput=3283 tok/s +2025-11-16 00:05:34,036 - INFO - Epoch 1 Step 4140 (Global: 4140): loss=0.0391, ppl=1.04, grad_norm=0.33, lr=7.54e-05, throughput=3129 tok/s +2025-11-16 00:08:08,861 - INFO - Epoch 1 Step 4150 (Global: 4150): loss=0.0383, ppl=1.04, grad_norm=0.30, lr=7.52e-05, throughput=3100 tok/s +2025-11-16 00:10:34,068 - INFO - Epoch 1 Step 4160 (Global: 4160): loss=0.0413, ppl=1.04, grad_norm=0.28, lr=7.51e-05, throughput=3306 tok/s +2025-11-16 00:13:10,371 - INFO - Epoch 1 Step 4170 (Global: 4170): loss=0.0479, ppl=1.05, grad_norm=0.30, lr=7.49e-05, throughput=3071 tok/s +2025-11-16 00:15:47,034 - INFO - Epoch 1 Step 4180 (Global: 4180): loss=0.0376, ppl=1.04, grad_norm=0.27, lr=7.48e-05, throughput=3064 tok/s +2025-11-16 00:18:22,020 - INFO - Epoch 1 Step 4190 (Global: 4190): loss=0.0440, ppl=1.05, grad_norm=0.29, lr=7.47e-05, 
throughput=3097 tok/s +2025-11-16 00:20:48,671 - INFO - Epoch 1 Step 4200 (Global: 4200): loss=0.0383, ppl=1.04, grad_norm=0.29, lr=7.45e-05, throughput=3273 tok/s +2025-11-16 00:23:24,239 - INFO - Epoch 1 Step 4210 (Global: 4210): loss=0.0527, ppl=1.05, grad_norm=0.32, lr=7.44e-05, throughput=3086 tok/s +2025-11-16 00:26:01,005 - INFO - Epoch 1 Step 4220 (Global: 4220): loss=0.0424, ppl=1.04, grad_norm=0.30, lr=7.42e-05, throughput=3062 tok/s +2025-11-16 00:28:28,020 - INFO - Epoch 1 Step 4230 (Global: 4230): loss=0.0412, ppl=1.04, grad_norm=0.31, lr=7.41e-05, throughput=3265 tok/s +2025-11-16 00:31:01,544 - INFO - Epoch 1 Step 4240 (Global: 4240): loss=0.0372, ppl=1.04, grad_norm=0.28, lr=7.39e-05, throughput=3127 tok/s +2025-11-16 00:33:36,494 - INFO - Epoch 1 Step 4250 (Global: 4250): loss=0.0396, ppl=1.04, grad_norm=0.31, lr=7.38e-05, throughput=3098 tok/s +2025-11-16 00:36:10,275 - INFO - Epoch 1 Step 4260 (Global: 4260): loss=0.0389, ppl=1.04, grad_norm=0.28, lr=7.36e-05, throughput=3121 tok/s +2025-11-16 00:38:35,526 - INFO - Epoch 1 Step 4270 (Global: 4270): loss=0.0459, ppl=1.05, grad_norm=0.31, lr=7.35e-05, throughput=3305 tok/s +2025-11-16 00:41:08,936 - INFO - Epoch 1 Step 4280 (Global: 4280): loss=0.0331, ppl=1.03, grad_norm=0.26, lr=7.33e-05, throughput=3129 tok/s +2025-11-16 00:43:43,197 - INFO - Epoch 1 Step 4290 (Global: 4290): loss=0.0426, ppl=1.04, grad_norm=0.30, lr=7.32e-05, throughput=3112 tok/s +2025-11-16 00:46:07,459 - INFO - Epoch 1 Step 4300 (Global: 4300): loss=0.0391, ppl=1.04, grad_norm=0.29, lr=7.30e-05, throughput=3327 tok/s +2025-11-16 00:48:40,898 - INFO - Epoch 1 Step 4310 (Global: 4310): loss=0.0340, ppl=1.03, grad_norm=0.27, lr=7.29e-05, throughput=3128 tok/s +2025-11-16 00:51:14,705 - INFO - Epoch 1 Step 4320 (Global: 4320): loss=0.0463, ppl=1.05, grad_norm=0.38, lr=7.27e-05, throughput=3121 tok/s +2025-11-16 00:53:47,974 - INFO - Epoch 1 Step 4330 (Global: 4330): loss=0.0453, ppl=1.05, grad_norm=0.30, lr=7.26e-05, 
throughput=3132 tok/s +2025-11-16 00:56:12,168 - INFO - Epoch 1 Step 4340 (Global: 4340): loss=0.0358, ppl=1.04, grad_norm=0.29, lr=7.24e-05, throughput=3329 tok/s +2025-11-16 00:58:45,294 - INFO - Epoch 1 Step 4350 (Global: 4350): loss=0.0435, ppl=1.04, grad_norm=0.28, lr=7.23e-05, throughput=3135 tok/s +2025-11-16 01:01:18,780 - INFO - Epoch 1 Step 4360 (Global: 4360): loss=0.0466, ppl=1.05, grad_norm=0.32, lr=7.21e-05, throughput=3127 tok/s +2025-11-16 01:03:44,627 - INFO - Epoch 1 Step 4370 (Global: 4370): loss=0.0412, ppl=1.04, grad_norm=0.32, lr=7.20e-05, throughput=3291 tok/s +2025-11-16 01:06:19,211 - INFO - Epoch 1 Step 4380 (Global: 4380): loss=0.0426, ppl=1.04, grad_norm=0.29, lr=7.18e-05, throughput=3105 tok/s +2025-11-16 01:08:53,207 - INFO - Epoch 1 Step 4390 (Global: 4390): loss=0.0403, ppl=1.04, grad_norm=0.30, lr=7.17e-05, throughput=3117 tok/s +2025-11-16 01:11:28,311 - INFO - Epoch 1 Step 4400 (Global: 4400): loss=0.0373, ppl=1.04, grad_norm=0.31, lr=7.15e-05, throughput=3095 tok/s +2025-11-16 01:13:55,702 - INFO - Epoch 1 Step 4410 (Global: 4410): loss=0.0373, ppl=1.04, grad_norm=0.27, lr=7.14e-05, throughput=3257 tok/s +2025-11-16 01:16:29,885 - INFO - Epoch 1 Step 4420 (Global: 4420): loss=0.0394, ppl=1.04, grad_norm=0.28, lr=7.12e-05, throughput=3113 tok/s +2025-11-16 01:19:03,726 - INFO - Epoch 1 Step 4430 (Global: 4430): loss=0.0461, ppl=1.05, grad_norm=0.31, lr=7.11e-05, throughput=3120 tok/s +2025-11-16 01:21:28,511 - INFO - Epoch 1 Step 4440 (Global: 4440): loss=0.0415, ppl=1.04, grad_norm=0.28, lr=7.09e-05, throughput=3315 tok/s +2025-11-16 01:24:02,493 - INFO - Epoch 1 Step 4450 (Global: 4450): loss=0.0453, ppl=1.05, grad_norm=0.31, lr=7.08e-05, throughput=3117 tok/s +2025-11-16 01:26:36,713 - INFO - Epoch 1 Step 4460 (Global: 4460): loss=0.0415, ppl=1.04, grad_norm=0.30, lr=7.06e-05, throughput=3112 tok/s +2025-11-16 01:29:11,634 - INFO - Epoch 1 Step 4470 (Global: 4470): loss=0.0421, ppl=1.04, grad_norm=0.29, lr=7.05e-05, 
throughput=3098 tok/s +2025-11-16 01:31:35,999 - INFO - Epoch 1 Step 4480 (Global: 4480): loss=0.0462, ppl=1.05, grad_norm=0.29, lr=7.03e-05, throughput=3325 tok/s +2025-11-16 01:34:10,249 - INFO - Epoch 1 Step 4490 (Global: 4490): loss=0.0416, ppl=1.04, grad_norm=0.29, lr=7.02e-05, throughput=3112 tok/s +2025-11-16 01:36:43,766 - INFO - Epoch 1 Step 4500 (Global: 4500): loss=0.0350, ppl=1.04, grad_norm=0.26, lr=7.00e-05, throughput=3127 tok/s +2025-11-16 01:36:43,767 - INFO - +Running validation at step 4500... +2025-11-16 01:44:24,850 - INFO - Validation loss: 0.0412, perplexity: 1.04 +2025-11-16 01:44:24,850 - INFO - Qualitative metrics (n=5): +2025-11-16 01:44:24,850 - INFO - BLEU: 0.8440 +2025-11-16 01:44:24,851 - INFO - METEOR: 0.9486 +2025-11-16 01:44:24,851 - INFO - Edit Distance: 0.0710 +2025-11-16 01:44:24,851 - INFO - F-measure: 0.9285 +2025-11-16 01:44:24,851 - INFO - +====================================================================== +2025-11-16 01:44:24,851 - INFO - Qualitative Evaluation Samples: +2025-11-16 01:44:24,851 - INFO - ====================================================================== +2025-11-16 01:44:24,851 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-16 01:44:24,851 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 01:44:24,851 - INFO - Generated: 'Q gave it four stars out of five and said that "the album [Perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s you-wasere. But it\'s no...' +2025-11-16 01:44:24,851 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-16 01:44:24,852 - INFO - ---------------------------------------------------------------------- +2025-11-16 01:44:24,852 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-16 01:44:24,852 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 01:44:24,852 - INFO - Generated: ' Sire, was Aboune-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 01:44:24,852 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 01:44:24,852 - INFO - ---------------------------------------------------------------------- +2025-11-16 01:44:24,852 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-16 01:44:24,852 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 01:44:24,852 - INFO - Generated: ' at the meeting Laymia. His headed weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-16 01:44:24,852 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-16 01:44:24,852 - INFO - ---------------------------------------------------------------------- +2025-11-16 01:44:24,853 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-16 01:44:24,853 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 01:44:24,853 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-16 01:44:24,853 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-16 01:44:24,853 - INFO - ---------------------------------------------------------------------- +2025-11-16 01:44:24,853 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-16 01:44:24,853 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 01:44:24,853 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 01:44:24,853 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 01:44:24,854 - INFO - ---------------------------------------------------------------------- +2025-11-16 01:44:24,855 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_4500.jsonl +2025-11-16 01:45:13,299 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-16 01:45:13,309 - INFO - New best validation loss: 0.0412, perplexity: 1.04 +2025-11-16 01:47:38,745 - INFO - Epoch 1 Step 4510 (Global: 4510): loss=0.0425, ppl=1.04, grad_norm=0.30, lr=6.99e-05, throughput=3301 tok/s +2025-11-16 01:50:12,975 - INFO - Epoch 1 Step 4520 (Global: 4520): loss=0.0412, ppl=1.04, grad_norm=0.29, lr=6.97e-05, throughput=3112 tok/s +2025-11-16 01:52:47,122 - INFO - Epoch 1 Step 4530 (Global: 4530): loss=0.0406, ppl=1.04, grad_norm=0.31, lr=6.96e-05, throughput=3114 tok/s +2025-11-16 01:55:20,759 - INFO - Epoch 1 Step 4540 (Global: 4540): loss=0.0420, ppl=1.04, grad_norm=0.30, lr=6.94e-05, throughput=3124 tok/s +2025-11-16 01:57:45,810 - INFO - Epoch 1 Step 4550 (Global: 4550): loss=0.0419, ppl=1.04, grad_norm=0.28, lr=6.92e-05, 
throughput=3309 tok/s +2025-11-16 02:00:20,563 - INFO - Epoch 1 Step 4560 (Global: 4560): loss=0.0409, ppl=1.04, grad_norm=0.29, lr=6.91e-05, throughput=3102 tok/s +2025-11-16 02:02:55,616 - INFO - Epoch 1 Step 4570 (Global: 4570): loss=0.0417, ppl=1.04, grad_norm=0.28, lr=6.89e-05, throughput=3096 tok/s +2025-11-16 02:05:23,520 - INFO - Epoch 1 Step 4580 (Global: 4580): loss=0.0411, ppl=1.04, grad_norm=0.28, lr=6.88e-05, throughput=3245 tok/s +2025-11-16 02:07:57,458 - INFO - Epoch 1 Step 4590 (Global: 4590): loss=0.0437, ppl=1.04, grad_norm=0.30, lr=6.86e-05, throughput=3118 tok/s +2025-11-16 02:10:32,281 - INFO - Epoch 1 Step 4600 (Global: 4600): loss=0.0402, ppl=1.04, grad_norm=0.29, lr=6.85e-05, throughput=3100 tok/s +2025-11-16 02:13:07,058 - INFO - Epoch 1 Step 4610 (Global: 4610): loss=0.0397, ppl=1.04, grad_norm=0.28, lr=6.83e-05, throughput=3101 tok/s +2025-11-16 02:15:32,995 - INFO - Epoch 1 Step 4620 (Global: 4620): loss=0.0377, ppl=1.04, grad_norm=0.27, lr=6.82e-05, throughput=3289 tok/s +2025-11-16 02:18:06,354 - INFO - Epoch 1 Step 4630 (Global: 4630): loss=0.0395, ppl=1.04, grad_norm=0.29, lr=6.80e-05, throughput=3130 tok/s +2025-11-16 02:20:39,762 - INFO - Epoch 1 Step 4640 (Global: 4640): loss=0.0453, ppl=1.05, grad_norm=0.30, lr=6.78e-05, throughput=3129 tok/s +2025-11-16 02:23:03,238 - INFO - Epoch 1 Step 4650 (Global: 4650): loss=0.0355, ppl=1.04, grad_norm=0.27, lr=6.77e-05, throughput=3346 tok/s +2025-11-16 02:25:35,679 - INFO - Epoch 1 Step 4660 (Global: 4660): loss=0.0428, ppl=1.04, grad_norm=0.28, lr=6.75e-05, throughput=3149 tok/s +2025-11-16 02:28:08,264 - INFO - Epoch 1 Step 4670 (Global: 4670): loss=0.0470, ppl=1.05, grad_norm=0.32, lr=6.74e-05, throughput=3146 tok/s +2025-11-16 02:30:41,494 - INFO - Epoch 1 Step 4680 (Global: 4680): loss=0.0414, ppl=1.04, grad_norm=0.28, lr=6.72e-05, throughput=3133 tok/s +2025-11-16 02:33:05,057 - INFO - Epoch 1 Step 4690 (Global: 4690): loss=0.0405, ppl=1.04, grad_norm=0.29, lr=6.71e-05, 
throughput=3344 tok/s +2025-11-16 02:35:38,202 - INFO - Epoch 1 Step 4700 (Global: 4700): loss=0.0408, ppl=1.04, grad_norm=0.31, lr=6.69e-05, throughput=3134 tok/s +2025-11-16 02:38:10,693 - INFO - Epoch 1 Step 4710 (Global: 4710): loss=0.0423, ppl=1.04, grad_norm=0.31, lr=6.67e-05, throughput=3148 tok/s +2025-11-16 02:40:34,059 - INFO - Epoch 1 Step 4720 (Global: 4720): loss=0.0400, ppl=1.04, grad_norm=0.29, lr=6.66e-05, throughput=3348 tok/s +2025-11-16 02:43:06,221 - INFO - Epoch 1 Step 4730 (Global: 4730): loss=0.0382, ppl=1.04, grad_norm=0.28, lr=6.64e-05, throughput=3155 tok/s +2025-11-16 02:45:38,590 - INFO - Epoch 1 Step 4740 (Global: 4740): loss=0.0484, ppl=1.05, grad_norm=0.31, lr=6.63e-05, throughput=3150 tok/s +2025-11-16 02:48:11,251 - INFO - Epoch 1 Step 4750 (Global: 4750): loss=0.0366, ppl=1.04, grad_norm=0.29, lr=6.61e-05, throughput=3144 tok/s +2025-11-16 02:50:34,745 - INFO - Epoch 1 Step 4760 (Global: 4760): loss=0.0357, ppl=1.04, grad_norm=0.28, lr=6.60e-05, throughput=3345 tok/s +2025-11-16 02:53:07,137 - INFO - Epoch 1 Step 4770 (Global: 4770): loss=0.0342, ppl=1.03, grad_norm=0.27, lr=6.58e-05, throughput=3150 tok/s +2025-11-16 02:55:39,683 - INFO - Epoch 1 Step 4780 (Global: 4780): loss=0.0481, ppl=1.05, grad_norm=0.32, lr=6.56e-05, throughput=3147 tok/s +2025-11-16 02:58:05,031 - INFO - Epoch 1 Step 4790 (Global: 4790): loss=0.0390, ppl=1.04, grad_norm=0.29, lr=6.55e-05, throughput=3303 tok/s +2025-11-16 03:00:40,283 - INFO - Epoch 1 Step 4800 (Global: 4800): loss=0.0348, ppl=1.04, grad_norm=0.27, lr=6.53e-05, throughput=3092 tok/s +2025-11-16 03:03:15,771 - INFO - Epoch 1 Step 4810 (Global: 4810): loss=0.0381, ppl=1.04, grad_norm=0.28, lr=6.52e-05, throughput=3087 tok/s +2025-11-16 03:05:50,363 - INFO - Epoch 1 Step 4820 (Global: 4820): loss=0.0429, ppl=1.04, grad_norm=0.31, lr=6.50e-05, throughput=3105 tok/s +2025-11-16 03:08:18,438 - INFO - Epoch 1 Step 4830 (Global: 4830): loss=0.0426, ppl=1.04, grad_norm=0.33, lr=6.48e-05, 
throughput=3242 tok/s +2025-11-16 03:10:53,676 - INFO - Epoch 1 Step 4840 (Global: 4840): loss=0.0365, ppl=1.04, grad_norm=0.29, lr=6.47e-05, throughput=3092 tok/s +2025-11-16 03:13:26,753 - INFO - Epoch 1 Step 4850 (Global: 4850): loss=0.0362, ppl=1.04, grad_norm=0.28, lr=6.45e-05, throughput=3136 tok/s +2025-11-16 03:15:51,488 - INFO - Epoch 1 Step 4860 (Global: 4860): loss=0.0367, ppl=1.04, grad_norm=0.27, lr=6.44e-05, throughput=3316 tok/s +2025-11-16 03:18:24,784 - INFO - Epoch 1 Step 4870 (Global: 4870): loss=0.0389, ppl=1.04, grad_norm=0.28, lr=6.42e-05, throughput=3131 tok/s +2025-11-16 03:20:57,710 - INFO - Epoch 1 Step 4880 (Global: 4880): loss=0.0334, ppl=1.03, grad_norm=0.26, lr=6.40e-05, throughput=3139 tok/s +2025-11-16 03:23:30,624 - INFO - Epoch 1 Step 4890 (Global: 4890): loss=0.0437, ppl=1.04, grad_norm=0.35, lr=6.39e-05, throughput=3139 tok/s +2025-11-16 03:25:54,393 - INFO - Epoch 1 Step 4900 (Global: 4900): loss=0.0409, ppl=1.04, grad_norm=0.28, lr=6.37e-05, throughput=3342 tok/s +2025-11-16 03:28:26,675 - INFO - Epoch 1 Step 4910 (Global: 4910): loss=0.0386, ppl=1.04, grad_norm=0.28, lr=6.35e-05, throughput=3152 tok/s +2025-11-16 03:30:59,286 - INFO - Epoch 1 Step 4920 (Global: 4920): loss=0.0345, ppl=1.04, grad_norm=0.26, lr=6.34e-05, throughput=3145 tok/s +2025-11-16 03:33:22,545 - INFO - Epoch 1 Step 4930 (Global: 4930): loss=0.0422, ppl=1.04, grad_norm=0.30, lr=6.32e-05, throughput=3351 tok/s +2025-11-16 03:35:55,191 - INFO - Epoch 1 Step 4940 (Global: 4940): loss=0.0306, ppl=1.03, grad_norm=0.25, lr=6.31e-05, throughput=3145 tok/s +2025-11-16 03:38:27,477 - INFO - Epoch 1 Step 4950 (Global: 4950): loss=0.0429, ppl=1.04, grad_norm=0.30, lr=6.29e-05, throughput=3152 tok/s +2025-11-16 03:40:59,538 - INFO - Epoch 1 Step 4960 (Global: 4960): loss=0.0367, ppl=1.04, grad_norm=0.30, lr=6.27e-05, throughput=3157 tok/s +2025-11-16 03:43:22,889 - INFO - Epoch 1 Step 4970 (Global: 4970): loss=0.0352, ppl=1.04, grad_norm=0.27, lr=6.26e-05, 
throughput=3348 tok/s +2025-11-16 03:45:55,032 - INFO - Epoch 1 Step 4980 (Global: 4980): loss=0.0362, ppl=1.04, grad_norm=0.27, lr=6.24e-05, throughput=3155 tok/s +2025-11-16 03:48:26,778 - INFO - Epoch 1 Step 4990 (Global: 4990): loss=0.0487, ppl=1.05, grad_norm=0.33, lr=6.23e-05, throughput=3163 tok/s +2025-11-16 03:50:49,794 - INFO - Epoch 1 Step 5000 (Global: 5000): loss=0.0399, ppl=1.04, grad_norm=0.27, lr=6.21e-05, throughput=3356 tok/s +2025-11-16 03:50:49,796 - INFO - +Running validation at step 5000... +2025-11-16 03:58:13,760 - INFO - Validation loss: 0.0395, perplexity: 1.04 +2025-11-16 03:58:13,761 - INFO - Qualitative metrics (n=5): +2025-11-16 03:58:13,761 - INFO - BLEU: 0.8584 +2025-11-16 03:58:13,761 - INFO - METEOR: 0.9456 +2025-11-16 03:58:13,761 - INFO - Edit Distance: 0.0646 +2025-11-16 03:58:13,761 - INFO - F-measure: 0.9261 +2025-11-16 03:58:13,762 - INFO - +====================================================================== +2025-11-16 03:58:13,762 - INFO - Qualitative Evaluation Samples: +2025-11-16 03:58:13,762 - INFO - ====================================================================== +2025-11-16 03:58:13,762 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-16 03:58:13,762 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 03:58:13,762 - INFO - Generated: 'Q gave it four out of five stars and said that "The album [perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s you-wasere. But it\'s no...' +2025-11-16 03:58:13,762 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-16 03:58:13,762 - INFO - ---------------------------------------------------------------------- +2025-11-16 03:58:13,762 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-16 03:58:13,762 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 03:58:13,763 - INFO - Generated: ', Sire was Aboune-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 03:58:13,763 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 03:58:13,763 - INFO - ---------------------------------------------------------------------- +2025-11-16 03:58:13,763 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-16 03:58:13,763 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 03:58:13,763 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-16 03:58:13,763 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-16 03:58:13,763 - INFO - ---------------------------------------------------------------------- +2025-11-16 03:58:13,763 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-16 03:58:13,763 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 03:58:13,764 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-16 03:58:13,764 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-16 03:58:13,764 - INFO - ---------------------------------------------------------------------- +2025-11-16 03:58:13,765 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-16 03:58:13,765 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 03:58:13,765 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 03:58:13,765 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 03:58:13,765 - INFO - ---------------------------------------------------------------------- +2025-11-16 03:58:13,767 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_5000.jsonl +2025-11-16 03:58:54,274 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-16 03:58:54,282 - INFO - New best validation loss: 0.0395, perplexity: 1.04 +2025-11-16 04:01:27,052 - INFO - Epoch 1 Step 5010 (Global: 5010): loss=0.0389, ppl=1.04, grad_norm=0.28, lr=6.19e-05, throughput=3142 tok/s +2025-11-16 04:04:00,781 - INFO - Epoch 1 Step 5020 (Global: 5020): loss=0.0342, ppl=1.03, grad_norm=0.27, lr=6.18e-05, throughput=3122 tok/s +2025-11-16 04:06:33,856 - INFO - Epoch 1 Step 5030 (Global: 5030): loss=0.0407, ppl=1.04, grad_norm=0.31, lr=6.16e-05, throughput=3136 tok/s +2025-11-16 04:08:57,680 - INFO - Epoch 1 Step 5040 (Global: 5040): loss=0.0307, ppl=1.03, grad_norm=0.25, lr=6.14e-05, throughput=3337 tok/s +2025-11-16 04:11:30,022 - INFO - Epoch 1 Step 5050 (Global: 5050): loss=0.0383, ppl=1.04, grad_norm=0.28, lr=6.13e-05, 
throughput=3151 tok/s +2025-11-16 04:14:02,940 - INFO - Epoch 1 Step 5060 (Global: 5060): loss=0.0383, ppl=1.04, grad_norm=0.28, lr=6.11e-05, throughput=3139 tok/s +2025-11-16 04:16:26,408 - INFO - Epoch 1 Step 5070 (Global: 5070): loss=0.0405, ppl=1.04, grad_norm=0.28, lr=6.10e-05, throughput=3346 tok/s +2025-11-16 04:18:58,790 - INFO - Epoch 1 Step 5080 (Global: 5080): loss=0.0398, ppl=1.04, grad_norm=0.29, lr=6.08e-05, throughput=3150 tok/s +2025-11-16 04:21:30,984 - INFO - Epoch 1 Step 5090 (Global: 5090): loss=0.0420, ppl=1.04, grad_norm=0.30, lr=6.06e-05, throughput=3154 tok/s +2025-11-16 04:24:03,251 - INFO - Epoch 1 Step 5100 (Global: 5100): loss=0.0353, ppl=1.04, grad_norm=0.28, lr=6.05e-05, throughput=3152 tok/s +2025-11-16 04:26:27,360 - INFO - Epoch 1 Step 5110 (Global: 5110): loss=0.0373, ppl=1.04, grad_norm=0.27, lr=6.03e-05, throughput=3331 tok/s +2025-11-16 04:28:59,636 - INFO - Epoch 1 Step 5120 (Global: 5120): loss=0.0397, ppl=1.04, grad_norm=0.27, lr=6.01e-05, throughput=3152 tok/s +2025-11-16 04:31:32,313 - INFO - Epoch 1 Step 5130 (Global: 5130): loss=0.0381, ppl=1.04, grad_norm=0.29, lr=6.00e-05, throughput=3144 tok/s +2025-11-16 04:33:55,546 - INFO - Epoch 1 Step 5140 (Global: 5140): loss=0.0399, ppl=1.04, grad_norm=0.29, lr=5.98e-05, throughput=3351 tok/s +2025-11-16 04:36:28,481 - INFO - Epoch 1 Step 5150 (Global: 5150): loss=0.0403, ppl=1.04, grad_norm=0.30, lr=5.96e-05, throughput=3139 tok/s +2025-11-16 04:39:00,675 - INFO - Epoch 1 Step 5160 (Global: 5160): loss=0.0383, ppl=1.04, grad_norm=0.28, lr=5.95e-05, throughput=3154 tok/s +2025-11-16 04:41:32,820 - INFO - Epoch 1 Step 5170 (Global: 5170): loss=0.0417, ppl=1.04, grad_norm=0.29, lr=5.93e-05, throughput=3155 tok/s +2025-11-16 04:43:55,676 - INFO - Epoch 1 Step 5180 (Global: 5180): loss=0.0358, ppl=1.04, grad_norm=0.27, lr=5.91e-05, throughput=3360 tok/s +2025-11-16 04:46:27,561 - INFO - Epoch 1 Step 5190 (Global: 5190): loss=0.0405, ppl=1.04, grad_norm=0.33, lr=5.90e-05, 
throughput=3160 tok/s +2025-11-16 04:48:59,346 - INFO - Epoch 1 Step 5200 (Global: 5200): loss=0.0347, ppl=1.04, grad_norm=0.29, lr=5.88e-05, throughput=3162 tok/s +2025-11-16 04:51:22,271 - INFO - Epoch 1 Step 5210 (Global: 5210): loss=0.0327, ppl=1.03, grad_norm=0.26, lr=5.87e-05, throughput=3358 tok/s +2025-11-16 04:53:54,285 - INFO - Epoch 1 Step 5220 (Global: 5220): loss=0.0392, ppl=1.04, grad_norm=0.32, lr=5.85e-05, throughput=3158 tok/s +2025-11-16 04:56:26,798 - INFO - Epoch 1 Step 5230 (Global: 5230): loss=0.0380, ppl=1.04, grad_norm=0.28, lr=5.83e-05, throughput=3147 tok/s +2025-11-16 04:58:59,018 - INFO - Epoch 1 Step 5240 (Global: 5240): loss=0.0471, ppl=1.05, grad_norm=0.31, lr=5.82e-05, throughput=3153 tok/s +2025-11-16 05:01:21,938 - INFO - Epoch 1 Step 5250 (Global: 5250): loss=0.0351, ppl=1.04, grad_norm=0.27, lr=5.80e-05, throughput=3359 tok/s +2025-11-16 05:03:54,030 - INFO - Epoch 1 Step 5260 (Global: 5260): loss=0.0423, ppl=1.04, grad_norm=0.30, lr=5.78e-05, throughput=3156 tok/s +2025-11-16 05:06:26,565 - INFO - Epoch 1 Step 5270 (Global: 5270): loss=0.0414, ppl=1.04, grad_norm=0.32, lr=5.77e-05, throughput=3147 tok/s +2025-11-16 05:08:50,070 - INFO - Epoch 1 Step 5280 (Global: 5280): loss=0.0418, ppl=1.04, grad_norm=0.29, lr=5.75e-05, throughput=3345 tok/s +2025-11-16 05:11:24,819 - INFO - Epoch 1 Step 5290 (Global: 5290): loss=0.0379, ppl=1.04, grad_norm=0.28, lr=5.73e-05, throughput=3102 tok/s +2025-11-16 05:13:58,124 - INFO - Epoch 1 Step 5300 (Global: 5300): loss=0.0380, ppl=1.04, grad_norm=0.29, lr=5.72e-05, throughput=3131 tok/s +2025-11-16 05:16:31,758 - INFO - Epoch 1 Step 5310 (Global: 5310): loss=0.0401, ppl=1.04, grad_norm=0.28, lr=5.70e-05, throughput=3124 tok/s +2025-11-16 05:18:54,738 - INFO - Epoch 1 Step 5320 (Global: 5320): loss=0.0425, ppl=1.04, grad_norm=0.30, lr=5.68e-05, throughput=3357 tok/s +2025-11-16 05:21:26,449 - INFO - Epoch 1 Step 5330 (Global: 5330): loss=0.0342, ppl=1.03, grad_norm=0.26, lr=5.67e-05, 
throughput=3164 tok/s +2025-11-16 05:23:59,080 - INFO - Epoch 1 Step 5340 (Global: 5340): loss=0.0400, ppl=1.04, grad_norm=0.29, lr=5.65e-05, throughput=3145 tok/s +2025-11-16 05:26:22,562 - INFO - Epoch 1 Step 5350 (Global: 5350): loss=0.0386, ppl=1.04, grad_norm=0.29, lr=5.63e-05, throughput=3345 tok/s +2025-11-16 05:28:54,594 - INFO - Epoch 1 Step 5360 (Global: 5360): loss=0.0345, ppl=1.04, grad_norm=0.29, lr=5.62e-05, throughput=3157 tok/s +2025-11-16 05:31:27,062 - INFO - Epoch 1 Step 5370 (Global: 5370): loss=0.0384, ppl=1.04, grad_norm=0.28, lr=5.60e-05, throughput=3148 tok/s +2025-11-16 05:33:59,391 - INFO - Epoch 1 Step 5380 (Global: 5380): loss=0.0417, ppl=1.04, grad_norm=0.28, lr=5.58e-05, throughput=3151 tok/s +2025-11-16 05:36:22,616 - INFO - Epoch 1 Step 5390 (Global: 5390): loss=0.0460, ppl=1.05, grad_norm=0.30, lr=5.57e-05, throughput=3351 tok/s +2025-11-16 05:38:55,243 - INFO - Epoch 1 Step 5400 (Global: 5400): loss=0.0357, ppl=1.04, grad_norm=0.27, lr=5.55e-05, throughput=3145 tok/s +2025-11-16 05:41:27,688 - INFO - Epoch 1 Step 5410 (Global: 5410): loss=0.0362, ppl=1.04, grad_norm=0.27, lr=5.53e-05, throughput=3149 tok/s +2025-11-16 05:43:50,846 - INFO - Epoch 1 Step 5420 (Global: 5420): loss=0.0374, ppl=1.04, grad_norm=0.28, lr=5.52e-05, throughput=3353 tok/s +2025-11-16 05:46:23,196 - INFO - Epoch 1 Step 5430 (Global: 5430): loss=0.0392, ppl=1.04, grad_norm=0.28, lr=5.50e-05, throughput=3151 tok/s +2025-11-16 05:48:55,426 - INFO - Epoch 1 Step 5440 (Global: 5440): loss=0.0419, ppl=1.04, grad_norm=0.29, lr=5.48e-05, throughput=3153 tok/s +2025-11-16 05:51:27,848 - INFO - Epoch 1 Step 5450 (Global: 5450): loss=0.0409, ppl=1.04, grad_norm=0.29, lr=5.47e-05, throughput=3149 tok/s +2025-11-16 05:53:51,328 - INFO - Epoch 1 Step 5460 (Global: 5460): loss=0.0391, ppl=1.04, grad_norm=0.26, lr=5.45e-05, throughput=3345 tok/s +2025-11-16 05:56:23,533 - INFO - Epoch 1 Step 5470 (Global: 5470): loss=0.0408, ppl=1.04, grad_norm=0.29, lr=5.43e-05, 
throughput=3154 tok/s +2025-11-16 05:58:55,692 - INFO - Epoch 1 Step 5480 (Global: 5480): loss=0.0367, ppl=1.04, grad_norm=0.26, lr=5.42e-05, throughput=3155 tok/s +2025-11-16 06:01:19,550 - INFO - Epoch 1 Step 5490 (Global: 5490): loss=0.0308, ppl=1.03, grad_norm=0.28, lr=5.40e-05, throughput=3337 tok/s +2025-11-16 06:03:51,854 - INFO - Epoch 1 Step 5500 (Global: 5500): loss=0.0395, ppl=1.04, grad_norm=0.29, lr=5.38e-05, throughput=3152 tok/s +2025-11-16 06:03:51,856 - INFO - +Running validation at step 5500... +2025-11-16 06:11:18,014 - INFO - Validation loss: 0.0383, perplexity: 1.04 +2025-11-16 06:11:18,015 - INFO - Qualitative metrics (n=5): +2025-11-16 06:11:18,015 - INFO - BLEU: 0.8797 +2025-11-16 06:11:18,015 - INFO - METEOR: 0.9585 +2025-11-16 06:11:18,015 - INFO - Edit Distance: 0.0596 +2025-11-16 06:11:18,015 - INFO - F-measure: 0.9420 +2025-11-16 06:11:18,015 - INFO - +====================================================================== +2025-11-16 06:11:18,015 - INFO - Qualitative Evaluation Samples: +2025-11-16 06:11:18,015 - INFO - ====================================================================== +2025-11-16 06:11:18,015 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-16 06:11:18,016 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 06:11:18,016 - INFO - Generated: 'Q gave it four stars out of five and said that "the album [Perhaps\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-16 06:11:18,016 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-16 06:11:18,016 - INFO - ---------------------------------------------------------------------- +2025-11-16 06:11:18,016 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-16 06:11:18,016 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 06:11:18,016 - INFO - Generated: 'ire, was S-Choune Abakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 06:11:18,016 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 06:11:18,016 - INFO - ---------------------------------------------------------------------- +2025-11-16 06:11:18,016 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-16 06:11:18,016 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 06:11:18,016 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-16 06:11:18,017 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-16 06:11:18,017 - INFO - ---------------------------------------------------------------------- +2025-11-16 06:11:18,017 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-16 06:11:18,017 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 06:11:18,017 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-16 06:11:18,017 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-16 06:11:18,017 - INFO - ---------------------------------------------------------------------- +2025-11-16 06:11:18,017 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-16 06:11:18,017 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 06:11:18,017 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 06:11:18,018 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 06:11:18,018 - INFO - ---------------------------------------------------------------------- +2025-11-16 06:11:18,019 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_5500.jsonl +2025-11-16 06:11:58,840 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-16 06:11:58,850 - INFO - New best validation loss: 0.0383, perplexity: 1.04 +2025-11-16 06:14:32,287 - INFO - Epoch 1 Step 5510 (Global: 5510): loss=0.0430, ppl=1.04, grad_norm=0.31, lr=5.37e-05, throughput=3129 tok/s +2025-11-16 06:17:04,321 - INFO - Epoch 1 Step 5520 (Global: 5520): loss=0.0359, ppl=1.04, grad_norm=0.27, lr=5.35e-05, throughput=3157 tok/s +2025-11-16 06:19:27,500 - INFO - Epoch 1 Step 5530 (Global: 5530): loss=0.0361, ppl=1.04, grad_norm=0.27, lr=5.33e-05, throughput=3353 tok/s +2025-11-16 06:22:00,286 - INFO - Epoch 1 Step 5540 (Global: 5540): loss=0.0401, ppl=1.04, grad_norm=0.28, lr=5.32e-05, throughput=3142 tok/s +2025-11-16 06:24:32,521 - INFO - Epoch 1 Step 5550 (Global: 5550): loss=0.0353, ppl=1.04, grad_norm=0.29, lr=5.30e-05, 
throughput=3153 tok/s +2025-11-16 06:27:04,481 - INFO - Epoch 1 Step 5560 (Global: 5560): loss=0.0363, ppl=1.04, grad_norm=0.29, lr=5.28e-05, throughput=3159 tok/s +2025-11-16 06:29:28,913 - INFO - Epoch 1 Step 5570 (Global: 5570): loss=0.0314, ppl=1.03, grad_norm=0.26, lr=5.27e-05, throughput=3323 tok/s +2025-11-16 06:32:01,464 - INFO - Epoch 1 Step 5580 (Global: 5580): loss=0.0349, ppl=1.04, grad_norm=0.29, lr=5.25e-05, throughput=3147 tok/s +2025-11-16 06:34:34,429 - INFO - Epoch 1 Step 5590 (Global: 5590): loss=0.0421, ppl=1.04, grad_norm=0.29, lr=5.23e-05, throughput=3138 tok/s +2025-11-16 06:36:58,702 - INFO - Epoch 1 Step 5600 (Global: 5600): loss=0.0424, ppl=1.04, grad_norm=0.49, lr=5.22e-05, throughput=3327 tok/s +2025-11-16 06:39:31,717 - INFO - Epoch 1 Step 5610 (Global: 5610): loss=0.0363, ppl=1.04, grad_norm=0.28, lr=5.20e-05, throughput=3137 tok/s +2025-11-16 06:42:04,541 - INFO - Epoch 1 Step 5620 (Global: 5620): loss=0.0371, ppl=1.04, grad_norm=0.29, lr=5.18e-05, throughput=3141 tok/s +2025-11-16 06:44:27,969 - INFO - Epoch 1 Step 5630 (Global: 5630): loss=0.0428, ppl=1.04, grad_norm=0.30, lr=5.17e-05, throughput=3347 tok/s +2025-11-16 06:47:00,700 - INFO - Epoch 1 Step 5640 (Global: 5640): loss=0.0368, ppl=1.04, grad_norm=0.27, lr=5.15e-05, throughput=3143 tok/s +2025-11-16 06:49:32,777 - INFO - Epoch 1 Step 5650 (Global: 5650): loss=0.0337, ppl=1.03, grad_norm=0.28, lr=5.13e-05, throughput=3156 tok/s +2025-11-16 06:52:04,960 - INFO - Epoch 1 Step 5660 (Global: 5660): loss=0.0434, ppl=1.04, grad_norm=0.30, lr=5.12e-05, throughput=3154 tok/s +2025-11-16 06:54:27,945 - INFO - Epoch 1 Step 5670 (Global: 5670): loss=0.0362, ppl=1.04, grad_norm=0.27, lr=5.10e-05, throughput=3357 tok/s +2025-11-16 06:57:00,820 - INFO - Epoch 1 Step 5680 (Global: 5680): loss=0.0469, ppl=1.05, grad_norm=0.34, lr=5.08e-05, throughput=3140 tok/s +2025-11-16 06:59:33,025 - INFO - Epoch 1 Step 5690 (Global: 5690): loss=0.0350, ppl=1.04, grad_norm=0.29, lr=5.07e-05, 
throughput=3154 tok/s +2025-11-16 07:02:05,538 - INFO - Epoch 1 Step 5700 (Global: 5700): loss=0.0335, ppl=1.03, grad_norm=0.28, lr=5.05e-05, throughput=3147 tok/s +2025-11-16 07:04:29,123 - INFO - Epoch 1 Step 5710 (Global: 5710): loss=0.0391, ppl=1.04, grad_norm=0.29, lr=5.03e-05, throughput=3343 tok/s +2025-11-16 07:07:01,798 - INFO - Epoch 1 Step 5720 (Global: 5720): loss=0.0364, ppl=1.04, grad_norm=0.29, lr=5.02e-05, throughput=3144 tok/s +2025-11-16 07:09:34,382 - INFO - Epoch 1 Step 5730 (Global: 5730): loss=0.0371, ppl=1.04, grad_norm=0.28, lr=5.00e-05, throughput=3146 tok/s +2025-11-16 07:11:57,749 - INFO - Epoch 1 Step 5740 (Global: 5740): loss=0.0340, ppl=1.03, grad_norm=0.26, lr=4.98e-05, throughput=3348 tok/s +2025-11-16 07:14:31,696 - INFO - Epoch 1 Step 5750 (Global: 5750): loss=0.0391, ppl=1.04, grad_norm=0.28, lr=4.96e-05, throughput=3118 tok/s +2025-11-16 07:17:03,782 - INFO - Epoch 1 Step 5760 (Global: 5760): loss=0.0397, ppl=1.04, grad_norm=0.28, lr=4.95e-05, throughput=3156 tok/s +2025-11-16 07:19:26,968 - INFO - Epoch 1 Step 5770 (Global: 5770): loss=0.0394, ppl=1.04, grad_norm=0.28, lr=4.93e-05, throughput=3352 tok/s +2025-11-16 07:21:59,148 - INFO - Epoch 1 Step 5780 (Global: 5780): loss=0.0427, ppl=1.04, grad_norm=0.30, lr=4.91e-05, throughput=3154 tok/s +2025-11-16 07:24:31,072 - INFO - Epoch 1 Step 5790 (Global: 5790): loss=0.0376, ppl=1.04, grad_norm=0.29, lr=4.90e-05, throughput=3160 tok/s +2025-11-16 07:27:02,664 - INFO - Epoch 1 Step 5800 (Global: 5800): loss=0.0345, ppl=1.04, grad_norm=0.27, lr=4.88e-05, throughput=3166 tok/s +2025-11-16 07:29:25,542 - INFO - Epoch 1 Step 5810 (Global: 5810): loss=0.0373, ppl=1.04, grad_norm=0.27, lr=4.86e-05, throughput=3360 tok/s +2025-11-16 07:31:57,244 - INFO - Epoch 1 Step 5820 (Global: 5820): loss=0.0345, ppl=1.04, grad_norm=0.27, lr=4.85e-05, throughput=3164 tok/s +2025-11-16 07:34:29,333 - INFO - Epoch 1 Step 5830 (Global: 5830): loss=0.0394, ppl=1.04, grad_norm=0.29, lr=4.83e-05, 
throughput=3156 tok/s +2025-11-16 07:37:01,396 - INFO - Epoch 1 Step 5840 (Global: 5840): loss=0.0352, ppl=1.04, grad_norm=0.27, lr=4.81e-05, throughput=3157 tok/s +2025-11-16 07:39:24,461 - INFO - Epoch 1 Step 5850 (Global: 5850): loss=0.0352, ppl=1.04, grad_norm=0.26, lr=4.80e-05, throughput=3355 tok/s +2025-11-16 07:41:56,701 - INFO - Epoch 1 Step 5860 (Global: 5860): loss=0.0398, ppl=1.04, grad_norm=0.30, lr=4.78e-05, throughput=3153 tok/s +2025-11-16 07:44:28,562 - INFO - Epoch 1 Step 5870 (Global: 5870): loss=0.0384, ppl=1.04, grad_norm=0.28, lr=4.76e-05, throughput=3161 tok/s +2025-11-16 07:46:51,211 - INFO - Epoch 1 Step 5880 (Global: 5880): loss=0.0394, ppl=1.04, grad_norm=0.30, lr=4.75e-05, throughput=3365 tok/s +2025-11-16 07:49:23,165 - INFO - Epoch 1 Step 5890 (Global: 5890): loss=0.0350, ppl=1.04, grad_norm=0.27, lr=4.73e-05, throughput=3159 tok/s +2025-11-16 07:51:55,276 - INFO - Epoch 1 Step 5900 (Global: 5900): loss=0.0333, ppl=1.03, grad_norm=0.28, lr=4.71e-05, throughput=3156 tok/s +2025-11-16 07:54:18,001 - INFO - Epoch 1 Step 5910 (Global: 5910): loss=0.0360, ppl=1.04, grad_norm=0.27, lr=4.70e-05, throughput=3363 tok/s +2025-11-16 07:56:50,551 - INFO - Epoch 1 Step 5920 (Global: 5920): loss=0.0426, ppl=1.04, grad_norm=0.31, lr=4.68e-05, throughput=3147 tok/s +2025-11-16 07:59:22,681 - INFO - Epoch 1 Step 5930 (Global: 5930): loss=0.0354, ppl=1.04, grad_norm=0.28, lr=4.66e-05, throughput=3155 tok/s +2025-11-16 08:01:54,590 - INFO - Epoch 1 Step 5940 (Global: 5940): loss=0.0344, ppl=1.03, grad_norm=0.27, lr=4.65e-05, throughput=3160 tok/s +2025-11-16 08:04:17,631 - INFO - Epoch 1 Step 5950 (Global: 5950): loss=0.0358, ppl=1.04, grad_norm=0.30, lr=4.63e-05, throughput=3356 tok/s +2025-11-16 08:06:49,950 - INFO - Epoch 1 Step 5960 (Global: 5960): loss=0.0366, ppl=1.04, grad_norm=0.29, lr=4.61e-05, throughput=3151 tok/s +2025-11-16 08:09:22,353 - INFO - Epoch 1 Step 5970 (Global: 5970): loss=0.0343, ppl=1.03, grad_norm=0.28, lr=4.60e-05, 
throughput=3150 tok/s +2025-11-16 08:11:55,315 - INFO - Epoch 1 Step 5980 (Global: 5980): loss=0.0448, ppl=1.05, grad_norm=0.30, lr=4.58e-05, throughput=3138 tok/s +2025-11-16 08:14:18,679 - INFO - Epoch 1 Step 5990 (Global: 5990): loss=0.0356, ppl=1.04, grad_norm=0.32, lr=4.56e-05, throughput=3348 tok/s +2025-11-16 08:16:50,569 - INFO - Epoch 1 Step 6000 (Global: 6000): loss=0.0433, ppl=1.04, grad_norm=0.37, lr=4.55e-05, throughput=3160 tok/s +2025-11-16 08:16:50,570 - INFO - +Running validation at step 6000... +2025-11-16 08:24:13,707 - INFO - Validation loss: 0.0372, perplexity: 1.04 +2025-11-16 08:24:13,707 - INFO - Qualitative metrics (n=5): +2025-11-16 08:24:13,707 - INFO - BLEU: 0.8772 +2025-11-16 08:24:13,707 - INFO - METEOR: 0.9568 +2025-11-16 08:24:13,707 - INFO - Edit Distance: 0.0555 +2025-11-16 08:24:13,708 - INFO - F-measure: 0.9382 +2025-11-16 08:24:13,708 - INFO - +====================================================================== +2025-11-16 08:24:13,708 - INFO - Qualitative Evaluation Samples: +2025-11-16 08:24:13,708 - INFO - ====================================================================== +2025-11-16 08:24:13,708 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-16 08:24:13,708 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 08:24:13,708 - INFO - Generated: 'Q gave it four stars out of five and said that "the album [Perhaps\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-16 08:24:13,708 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-16 08:24:13,708 - INFO - ---------------------------------------------------------------------- +2025-11-16 08:24:13,709 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-16 08:24:13,709 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 08:24:13,709 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 08:24:13,709 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 08:24:13,709 - INFO - ---------------------------------------------------------------------- +2025-11-16 08:24:13,709 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-16 08:24:13,709 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 08:24:13,709 - INFO - Generated: ' meeting at the Laymia. His headed weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-16 08:24:13,709 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-16 08:24:13,709 - INFO - ---------------------------------------------------------------------- +2025-11-16 08:24:13,709 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-16 08:24:13,710 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 08:24:13,710 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-16 08:24:13,710 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-16 08:24:13,710 - INFO - ---------------------------------------------------------------------- +2025-11-16 08:24:13,710 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-16 08:24:13,710 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 08:24:13,710 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 08:24:13,710 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 08:24:13,710 - INFO - ---------------------------------------------------------------------- +2025-11-16 08:24:13,711 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_6000.jsonl +2025-11-16 08:24:54,765 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-16 08:24:54,775 - INFO - New best validation loss: 0.0372, perplexity: 1.04 +2025-11-16 08:27:26,966 - INFO - Epoch 1 Step 6010 (Global: 6010): loss=0.0373, ppl=1.04, grad_norm=0.28, lr=4.53e-05, throughput=3154 tok/s +2025-11-16 08:29:49,685 - INFO - Epoch 1 Step 6020 (Global: 6020): loss=0.0357, ppl=1.04, grad_norm=0.28, lr=4.51e-05, throughput=3363 tok/s +2025-11-16 08:32:21,839 - INFO - Epoch 1 Step 6030 (Global: 6030): loss=0.0354, ppl=1.04, grad_norm=0.29, lr=4.50e-05, throughput=3155 tok/s +2025-11-16 08:34:54,142 - INFO - Epoch 1 Step 6040 (Global: 6040): loss=0.0350, ppl=1.04, grad_norm=0.27, lr=4.48e-05, throughput=3152 tok/s +2025-11-16 08:37:28,673 - INFO - Epoch 1 Step 6050 (Global: 6050): loss=0.0297, ppl=1.03, grad_norm=0.27, lr=4.46e-05, 
throughput=3106 tok/s +2025-11-16 08:39:52,686 - INFO - Epoch 1 Step 6060 (Global: 6060): loss=0.0347, ppl=1.04, grad_norm=0.28, lr=4.45e-05, throughput=3333 tok/s +2025-11-16 08:42:24,871 - INFO - Epoch 1 Step 6070 (Global: 6070): loss=0.0352, ppl=1.04, grad_norm=0.27, lr=4.43e-05, throughput=3154 tok/s +2025-11-16 08:44:57,785 - INFO - Epoch 1 Step 6080 (Global: 6080): loss=0.0310, ppl=1.03, grad_norm=0.26, lr=4.41e-05, throughput=3139 tok/s +2025-11-16 08:47:21,590 - INFO - Epoch 1 Step 6090 (Global: 6090): loss=0.0328, ppl=1.03, grad_norm=0.29, lr=4.40e-05, throughput=3338 tok/s +2025-11-16 08:49:54,320 - INFO - Epoch 1 Step 6100 (Global: 6100): loss=0.0396, ppl=1.04, grad_norm=0.29, lr=4.38e-05, throughput=3143 tok/s +2025-11-16 08:52:27,770 - INFO - Epoch 1 Step 6110 (Global: 6110): loss=0.0356, ppl=1.04, grad_norm=0.28, lr=4.36e-05, throughput=3128 tok/s +2025-11-16 08:55:01,154 - INFO - Epoch 1 Step 6120 (Global: 6120): loss=0.0462, ppl=1.05, grad_norm=0.30, lr=4.35e-05, throughput=3129 tok/s +2025-11-16 08:57:26,341 - INFO - Epoch 1 Step 6130 (Global: 6130): loss=0.0325, ppl=1.03, grad_norm=0.25, lr=4.33e-05, throughput=3306 tok/s +2025-11-16 08:59:58,956 - INFO - Epoch 1 Step 6140 (Global: 6140): loss=0.0427, ppl=1.04, grad_norm=0.29, lr=4.31e-05, throughput=3145 tok/s +2025-11-16 09:02:33,462 - INFO - Epoch 1 Step 6150 (Global: 6150): loss=0.0401, ppl=1.04, grad_norm=0.29, lr=4.30e-05, throughput=3107 tok/s +2025-11-16 09:04:58,671 - INFO - Epoch 1 Step 6160 (Global: 6160): loss=0.0338, ppl=1.03, grad_norm=0.26, lr=4.28e-05, throughput=3306 tok/s +2025-11-16 09:07:31,953 - INFO - Epoch 1 Step 6170 (Global: 6170): loss=0.0347, ppl=1.04, grad_norm=0.27, lr=4.26e-05, throughput=3132 tok/s +2025-11-16 09:10:05,669 - INFO - Epoch 1 Step 6180 (Global: 6180): loss=0.0356, ppl=1.04, grad_norm=0.28, lr=4.25e-05, throughput=3123 tok/s +2025-11-16 09:12:39,449 - INFO - Epoch 1 Step 6190 (Global: 6190): loss=0.0395, ppl=1.04, grad_norm=0.29, lr=4.23e-05, 
throughput=3121 tok/s +2025-11-16 09:15:05,223 - INFO - Epoch 1 Step 6200 (Global: 6200): loss=0.0417, ppl=1.04, grad_norm=0.29, lr=4.21e-05, throughput=3293 tok/s +2025-11-16 09:17:40,480 - INFO - Epoch 1 Step 6210 (Global: 6210): loss=0.0351, ppl=1.04, grad_norm=0.29, lr=4.20e-05, throughput=3092 tok/s +2025-11-16 09:20:14,802 - INFO - Epoch 1 Step 6220 (Global: 6220): loss=0.0341, ppl=1.03, grad_norm=0.27, lr=4.18e-05, throughput=3114 tok/s +2025-11-16 09:22:41,049 - INFO - Epoch 1 Step 6230 (Global: 6230): loss=0.0351, ppl=1.04, grad_norm=0.27, lr=4.16e-05, throughput=3282 tok/s +2025-11-16 09:25:15,585 - INFO - Epoch 1 Step 6240 (Global: 6240): loss=0.0382, ppl=1.04, grad_norm=0.27, lr=4.15e-05, throughput=3106 tok/s +2025-11-16 09:27:50,153 - INFO - Epoch 1 Step 6250 (Global: 6250): loss=0.0363, ppl=1.04, grad_norm=0.33, lr=4.13e-05, throughput=3105 tok/s +2025-11-16 09:30:23,614 - INFO - Epoch 1 Step 6260 (Global: 6260): loss=0.0306, ppl=1.03, grad_norm=0.25, lr=4.12e-05, throughput=3128 tok/s +2025-11-16 09:32:48,797 - INFO - Epoch 1 Step 6270 (Global: 6270): loss=0.0348, ppl=1.04, grad_norm=0.28, lr=4.10e-05, throughput=3306 tok/s +2025-11-16 09:35:22,358 - INFO - Epoch 1 Step 6280 (Global: 6280): loss=0.0392, ppl=1.04, grad_norm=0.29, lr=4.08e-05, throughput=3126 tok/s +2025-11-16 09:37:58,828 - INFO - Epoch 1 Step 6290 (Global: 6290): loss=0.0362, ppl=1.04, grad_norm=0.30, lr=4.07e-05, throughput=3068 tok/s +2025-11-16 09:40:26,759 - INFO - Epoch 1 Step 6300 (Global: 6300): loss=0.0301, ppl=1.03, grad_norm=0.24, lr=4.05e-05, throughput=3245 tok/s +2025-11-16 09:42:59,945 - INFO - Epoch 1 Step 6310 (Global: 6310): loss=0.0368, ppl=1.04, grad_norm=0.31, lr=4.03e-05, throughput=3134 tok/s +2025-11-16 09:45:32,951 - INFO - Epoch 1 Step 6320 (Global: 6320): loss=0.0362, ppl=1.04, grad_norm=0.28, lr=4.02e-05, throughput=3137 tok/s +2025-11-16 09:48:07,381 - INFO - Epoch 1 Step 6330 (Global: 6330): loss=0.0319, ppl=1.03, grad_norm=0.26, lr=4.00e-05, 
throughput=3108 tok/s +2025-11-16 09:50:32,568 - INFO - Epoch 1 Step 6340 (Global: 6340): loss=0.0360, ppl=1.04, grad_norm=0.28, lr=3.98e-05, throughput=3306 tok/s +2025-11-16 09:53:06,166 - INFO - Epoch 1 Step 6350 (Global: 6350): loss=0.0354, ppl=1.04, grad_norm=0.28, lr=3.97e-05, throughput=3125 tok/s +2025-11-16 09:55:39,642 - INFO - Epoch 1 Step 6360 (Global: 6360): loss=0.0297, ppl=1.03, grad_norm=0.26, lr=3.95e-05, throughput=3128 tok/s +2025-11-16 09:58:04,327 - INFO - Epoch 1 Step 6370 (Global: 6370): loss=0.0354, ppl=1.04, grad_norm=0.27, lr=3.93e-05, throughput=3318 tok/s +2025-11-16 10:00:37,782 - INFO - Epoch 1 Step 6380 (Global: 6380): loss=0.0348, ppl=1.04, grad_norm=0.26, lr=3.92e-05, throughput=3128 tok/s +2025-11-16 10:03:11,811 - INFO - Epoch 1 Step 6390 (Global: 6390): loss=0.0391, ppl=1.04, grad_norm=0.41, lr=3.90e-05, throughput=3116 tok/s +2025-11-16 10:05:47,906 - INFO - Epoch 1 Step 6400 (Global: 6400): loss=0.0322, ppl=1.03, grad_norm=0.26, lr=3.89e-05, throughput=3075 tok/s +2025-11-16 10:08:14,023 - INFO - Epoch 1 Step 6410 (Global: 6410): loss=0.0342, ppl=1.03, grad_norm=0.27, lr=3.87e-05, throughput=3285 tok/s +2025-11-16 10:10:49,114 - INFO - Epoch 1 Step 6420 (Global: 6420): loss=0.0392, ppl=1.04, grad_norm=0.29, lr=3.85e-05, throughput=3095 tok/s +2025-11-16 10:13:23,207 - INFO - Epoch 1 Step 6430 (Global: 6430): loss=0.0399, ppl=1.04, grad_norm=0.29, lr=3.84e-05, throughput=3115 tok/s +2025-11-16 10:15:47,411 - INFO - Epoch 1 Step 6440 (Global: 6440): loss=0.0418, ppl=1.04, grad_norm=0.30, lr=3.82e-05, throughput=3329 tok/s +2025-11-16 10:18:20,311 - INFO - Epoch 1 Step 6450 (Global: 6450): loss=0.0422, ppl=1.04, grad_norm=0.30, lr=3.80e-05, throughput=3139 tok/s +2025-11-16 10:20:54,162 - INFO - Epoch 1 Step 6460 (Global: 6460): loss=0.0328, ppl=1.03, grad_norm=0.26, lr=3.79e-05, throughput=3120 tok/s +2025-11-16 10:23:27,452 - INFO - Epoch 1 Step 6470 (Global: 6470): loss=0.0387, ppl=1.04, grad_norm=0.29, lr=3.77e-05, 
throughput=3131 tok/s +2025-11-16 10:25:50,987 - INFO - Epoch 1 Step 6480 (Global: 6480): loss=0.0358, ppl=1.04, grad_norm=0.28, lr=3.76e-05, throughput=3344 tok/s +2025-11-16 10:28:24,205 - INFO - Epoch 1 Step 6490 (Global: 6490): loss=0.0335, ppl=1.03, grad_norm=0.27, lr=3.74e-05, throughput=3133 tok/s +2025-11-16 10:30:56,766 - INFO - Epoch 1 Step 6500 (Global: 6500): loss=0.0383, ppl=1.04, grad_norm=0.29, lr=3.72e-05, throughput=3146 tok/s +2025-11-16 10:30:56,768 - INFO - +Running validation at step 6500... +2025-11-16 10:38:31,993 - INFO - Validation loss: 0.0365, perplexity: 1.04 +2025-11-16 10:38:31,993 - INFO - Qualitative metrics (n=5): +2025-11-16 10:38:31,993 - INFO - BLEU: 0.8866 +2025-11-16 10:38:31,994 - INFO - METEOR: 0.9630 +2025-11-16 10:38:31,994 - INFO - Edit Distance: 0.0485 +2025-11-16 10:38:31,994 - INFO - F-measure: 0.9459 +2025-11-16 10:38:31,994 - INFO - +====================================================================== +2025-11-16 10:38:31,995 - INFO - Qualitative Evaluation Samples: +2025-11-16 10:38:31,995 - INFO - ====================================================================== +2025-11-16 10:38:31,996 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-16 10:38:31,996 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 10:38:31,996 - INFO - Generated: 'Q gave it four stars out of five and said that "the album [Perhaps\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-16 10:38:31,997 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-16 10:38:31,997 - INFO - ---------------------------------------------------------------------- +2025-11-16 10:38:31,998 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-16 10:38:31,998 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 10:38:31,999 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 10:38:31,999 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 10:38:32,000 - INFO - ---------------------------------------------------------------------- +2025-11-16 10:38:32,000 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-16 10:38:32,000 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 10:38:32,000 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-16 10:38:32,001 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-16 10:38:32,001 - INFO - ---------------------------------------------------------------------- +2025-11-16 10:38:32,001 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-16 10:38:32,002 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 10:38:32,002 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-16 10:38:32,002 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-16 10:38:32,003 - INFO - ---------------------------------------------------------------------- +2025-11-16 10:38:32,003 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-16 10:38:32,003 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 10:38:32,004 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 10:38:32,004 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 10:38:32,004 - INFO - ---------------------------------------------------------------------- +2025-11-16 10:38:32,005 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_6500.jsonl +2025-11-16 10:39:08,852 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-16 10:39:08,861 - INFO - New best validation loss: 0.0365, perplexity: 1.04 +2025-11-16 10:41:32,489 - INFO - Epoch 1 Step 6510 (Global: 6510): loss=0.0388, ppl=1.04, grad_norm=0.28, lr=3.71e-05, throughput=3342 tok/s +2025-11-16 10:44:05,452 - INFO - Epoch 1 Step 6520 (Global: 6520): loss=0.0412, ppl=1.04, grad_norm=0.30, lr=3.69e-05, throughput=3138 tok/s +2025-11-16 10:46:42,093 - INFO - Epoch 1 Step 6530 (Global: 6530): loss=0.0319, ppl=1.03, grad_norm=0.25, lr=3.67e-05, throughput=3064 tok/s +2025-11-16 10:49:20,534 - INFO - Epoch 1 Step 6540 (Global: 6540): loss=0.0361, ppl=1.04, grad_norm=0.28, lr=3.66e-05, throughput=3030 tok/s +2025-11-16 10:51:47,287 - INFO - Epoch 1 Step 6550 (Global: 6550): loss=0.0423, ppl=1.04, grad_norm=0.31, lr=3.64e-05, 
throughput=3271 tok/s +2025-11-16 10:54:20,608 - INFO - Epoch 1 Step 6560 (Global: 6560): loss=0.0375, ppl=1.04, grad_norm=0.28, lr=3.63e-05, throughput=3131 tok/s +2025-11-16 10:56:53,263 - INFO - Epoch 1 Step 6570 (Global: 6570): loss=0.0355, ppl=1.04, grad_norm=0.26, lr=3.61e-05, throughput=3144 tok/s +2025-11-16 10:59:17,120 - INFO - Epoch 1 Step 6580 (Global: 6580): loss=0.0336, ppl=1.03, grad_norm=0.27, lr=3.59e-05, throughput=3337 tok/s +2025-11-16 11:01:50,763 - INFO - Epoch 1 Step 6590 (Global: 6590): loss=0.0302, ppl=1.03, grad_norm=0.26, lr=3.58e-05, throughput=3124 tok/s +2025-11-16 11:04:24,069 - INFO - Epoch 1 Step 6600 (Global: 6600): loss=0.0336, ppl=1.03, grad_norm=0.26, lr=3.56e-05, throughput=3131 tok/s +2025-11-16 11:06:58,507 - INFO - Epoch 1 Step 6610 (Global: 6610): loss=0.0287, ppl=1.03, grad_norm=0.25, lr=3.55e-05, throughput=3108 tok/s +2025-11-16 11:09:25,194 - INFO - Epoch 1 Step 6620 (Global: 6620): loss=0.0305, ppl=1.03, grad_norm=0.26, lr=3.53e-05, throughput=3272 tok/s +2025-11-16 11:12:04,840 - INFO - Epoch 1 Step 6630 (Global: 6630): loss=0.0351, ppl=1.04, grad_norm=0.27, lr=3.51e-05, throughput=3007 tok/s +2025-11-16 11:14:39,909 - INFO - Epoch 1 Step 6640 (Global: 6640): loss=0.0363, ppl=1.04, grad_norm=0.28, lr=3.50e-05, throughput=3095 tok/s +2025-11-16 11:17:04,606 - INFO - Epoch 1 Step 6650 (Global: 6650): loss=0.0451, ppl=1.05, grad_norm=0.30, lr=3.48e-05, throughput=3317 tok/s +2025-11-16 11:19:40,321 - INFO - Epoch 1 Step 6660 (Global: 6660): loss=0.0379, ppl=1.04, grad_norm=0.28, lr=3.47e-05, throughput=3083 tok/s +2025-11-16 11:22:13,764 - INFO - Epoch 1 Step 6670 (Global: 6670): loss=0.0356, ppl=1.04, grad_norm=0.29, lr=3.45e-05, throughput=3128 tok/s +2025-11-16 11:24:47,087 - INFO - Epoch 1 Step 6680 (Global: 6680): loss=0.0289, ppl=1.03, grad_norm=0.27, lr=3.43e-05, throughput=3131 tok/s +2025-11-16 11:27:11,035 - INFO - Epoch 1 Step 6690 (Global: 6690): loss=0.0393, ppl=1.04, grad_norm=0.29, lr=3.42e-05, 
throughput=3335 tok/s +2025-11-16 11:29:50,218 - INFO - Epoch 1 Step 6700 (Global: 6700): loss=0.0347, ppl=1.04, grad_norm=0.27, lr=3.40e-05, throughput=3015 tok/s +2025-11-16 11:32:26,561 - INFO - Epoch 1 Step 6710 (Global: 6710): loss=0.0373, ppl=1.04, grad_norm=0.29, lr=3.39e-05, throughput=3070 tok/s +2025-11-16 11:34:52,382 - INFO - Epoch 1 Step 6720 (Global: 6720): loss=0.0296, ppl=1.03, grad_norm=0.27, lr=3.37e-05, throughput=3292 tok/s +2025-11-16 11:37:26,261 - INFO - Epoch 1 Step 6730 (Global: 6730): loss=0.0362, ppl=1.04, grad_norm=0.27, lr=3.35e-05, throughput=3119 tok/s +2025-11-16 11:39:59,775 - INFO - Epoch 1 Step 6740 (Global: 6740): loss=0.0330, ppl=1.03, grad_norm=0.28, lr=3.34e-05, throughput=3127 tok/s +2025-11-16 11:42:33,308 - INFO - Epoch 1 Step 6750 (Global: 6750): loss=0.0330, ppl=1.03, grad_norm=0.26, lr=3.32e-05, throughput=3126 tok/s +2025-11-16 11:44:58,610 - INFO - Epoch 1 Step 6760 (Global: 6760): loss=0.0301, ppl=1.03, grad_norm=0.26, lr=3.31e-05, throughput=3304 tok/s +2025-11-16 11:47:33,011 - INFO - Epoch 1 Step 6770 (Global: 6770): loss=0.0397, ppl=1.04, grad_norm=0.29, lr=3.29e-05, throughput=3109 tok/s +2025-11-16 11:50:07,170 - INFO - Epoch 1 Step 6780 (Global: 6780): loss=0.0391, ppl=1.04, grad_norm=0.28, lr=3.28e-05, throughput=3114 tok/s +2025-11-16 11:52:33,044 - INFO - Epoch 1 Step 6790 (Global: 6790): loss=0.0373, ppl=1.04, grad_norm=0.28, lr=3.26e-05, throughput=3291 tok/s +2025-11-16 11:55:05,775 - INFO - Epoch 1 Step 6800 (Global: 6800): loss=0.0400, ppl=1.04, grad_norm=0.29, lr=3.24e-05, throughput=3143 tok/s +2025-11-16 11:57:38,959 - INFO - Epoch 1 Step 6810 (Global: 6810): loss=0.0362, ppl=1.04, grad_norm=0.30, lr=3.23e-05, throughput=3134 tok/s +2025-11-16 12:00:12,329 - INFO - Epoch 1 Step 6820 (Global: 6820): loss=0.0286, ppl=1.03, grad_norm=0.25, lr=3.21e-05, throughput=3130 tok/s +2025-11-16 12:02:37,366 - INFO - Epoch 1 Step 6830 (Global: 6830): loss=0.0465, ppl=1.05, grad_norm=0.31, lr=3.20e-05, 
throughput=3310 tok/s +2025-11-16 12:05:13,277 - INFO - Epoch 1 Step 6840 (Global: 6840): loss=0.0319, ppl=1.03, grad_norm=0.27, lr=3.18e-05, throughput=3079 tok/s +2025-11-16 12:07:49,330 - INFO - Epoch 1 Step 6850 (Global: 6850): loss=0.0339, ppl=1.03, grad_norm=0.27, lr=3.17e-05, throughput=3076 tok/s +2025-11-16 12:10:14,721 - INFO - Epoch 1 Step 6860 (Global: 6860): loss=0.0268, ppl=1.03, grad_norm=0.28, lr=3.15e-05, throughput=3302 tok/s +2025-11-16 12:12:50,332 - INFO - Epoch 1 Step 6870 (Global: 6870): loss=0.0336, ppl=1.03, grad_norm=0.29, lr=3.13e-05, throughput=3085 tok/s +2025-11-16 12:15:25,614 - INFO - Epoch 1 Step 6880 (Global: 6880): loss=0.0322, ppl=1.03, grad_norm=0.26, lr=3.12e-05, throughput=3091 tok/s +2025-11-16 12:18:01,672 - INFO - Epoch 1 Step 6890 (Global: 6890): loss=0.0349, ppl=1.04, grad_norm=0.28, lr=3.10e-05, throughput=3079 tok/s +2025-11-16 12:20:25,594 - INFO - Epoch 1 Step 6900 (Global: 6900): loss=0.0335, ppl=1.03, grad_norm=0.27, lr=3.09e-05, throughput=3335 tok/s +2025-11-16 12:22:59,358 - INFO - Epoch 1 Step 6910 (Global: 6910): loss=0.0419, ppl=1.04, grad_norm=0.30, lr=3.07e-05, throughput=3122 tok/s +2025-11-16 12:25:33,868 - INFO - Epoch 1 Step 6920 (Global: 6920): loss=0.0361, ppl=1.04, grad_norm=0.28, lr=3.06e-05, throughput=3107 tok/s +2025-11-16 12:28:07,072 - INFO - Epoch 1 Step 6930 (Global: 6930): loss=0.0328, ppl=1.03, grad_norm=0.28, lr=3.04e-05, throughput=3133 tok/s +2025-11-16 12:30:30,439 - INFO - Epoch 1 Step 6940 (Global: 6940): loss=0.0365, ppl=1.04, grad_norm=0.28, lr=3.03e-05, throughput=3348 tok/s +2025-11-16 12:33:03,271 - INFO - Epoch 1 Step 6950 (Global: 6950): loss=0.0353, ppl=1.04, grad_norm=0.27, lr=3.01e-05, throughput=3141 tok/s +2025-11-16 12:35:37,012 - INFO - Epoch 1 Step 6960 (Global: 6960): loss=0.0327, ppl=1.03, grad_norm=0.26, lr=3.00e-05, throughput=3122 tok/s +2025-11-16 12:38:01,529 - INFO - Epoch 1 Step 6970 (Global: 6970): loss=0.0384, ppl=1.04, grad_norm=0.27, lr=2.98e-05, 
throughput=3321 tok/s +2025-11-16 12:40:35,239 - INFO - Epoch 1 Step 6980 (Global: 6980): loss=0.0337, ppl=1.03, grad_norm=0.26, lr=2.96e-05, throughput=3123 tok/s +2025-11-16 12:43:08,377 - INFO - Epoch 1 Step 6990 (Global: 6990): loss=0.0333, ppl=1.03, grad_norm=0.27, lr=2.95e-05, throughput=3134 tok/s +2025-11-16 12:45:32,612 - INFO - Epoch 1 Step 7000 (Global: 7000): loss=0.0314, ppl=1.03, grad_norm=0.26, lr=2.93e-05, throughput=3328 tok/s +2025-11-16 12:45:32,614 - INFO - +Running validation at step 7000... +2025-11-16 12:53:18,746 - INFO - Validation loss: 0.0360, perplexity: 1.04 +2025-11-16 12:53:18,747 - INFO - Qualitative metrics (n=5): +2025-11-16 12:53:18,747 - INFO - BLEU: 0.8842 +2025-11-16 12:53:18,747 - INFO - METEOR: 0.9632 +2025-11-16 12:53:18,747 - INFO - Edit Distance: 0.0551 +2025-11-16 12:53:18,747 - INFO - F-measure: 0.9454 +2025-11-16 12:53:18,747 - INFO - +====================================================================== +2025-11-16 12:53:18,748 - INFO - Qualitative Evaluation Samples: +2025-11-16 12:53:18,748 - INFO - ====================================================================== +2025-11-16 12:53:18,748 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-16 12:53:18,748 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 12:53:18,748 - INFO - Generated: 'Q gave it four stars out of five and said that "The album [perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s you-wasere. But it\'s no...' +2025-11-16 12:53:18,748 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-16 12:53:18,748 - INFO - ---------------------------------------------------------------------- +2025-11-16 12:53:18,748 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-16 12:53:18,748 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 12:53:18,749 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 12:53:18,749 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 12:53:18,749 - INFO - ---------------------------------------------------------------------- +2025-11-16 12:53:18,749 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-16 12:53:18,749 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 12:53:18,749 - INFO - Generated: ' the meeting at Laymia. His headed weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-16 12:53:18,749 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-16 12:53:18,749 - INFO - ---------------------------------------------------------------------- +2025-11-16 12:53:18,749 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-16 12:53:18,749 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 12:53:18,749 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-16 12:53:18,750 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-16 12:53:18,750 - INFO - ---------------------------------------------------------------------- +2025-11-16 12:53:18,750 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-16 12:53:18,750 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 12:53:18,750 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 12:53:18,750 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 12:53:18,750 - INFO - ---------------------------------------------------------------------- +2025-11-16 12:53:18,751 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_7000.jsonl +2025-11-16 12:54:00,224 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-16 12:54:00,235 - INFO - New best validation loss: 0.0360, perplexity: 1.04 +2025-11-16 12:56:25,744 - INFO - Epoch 1 Step 7010 (Global: 7010): loss=0.0390, ppl=1.04, grad_norm=0.31, lr=2.92e-05, throughput=3299 tok/s +2025-11-16 12:59:02,519 - INFO - Epoch 1 Step 7020 (Global: 7020): loss=0.0400, ppl=1.04, grad_norm=0.30, lr=2.90e-05, throughput=3062 tok/s +2025-11-16 13:01:37,827 - INFO - Epoch 1 Step 7030 (Global: 7030): loss=0.0379, ppl=1.04, grad_norm=0.28, lr=2.89e-05, throughput=3091 tok/s +2025-11-16 13:04:04,940 - INFO - Epoch 1 Step 7040 (Global: 7040): loss=0.0363, ppl=1.04, grad_norm=0.28, lr=2.87e-05, throughput=3263 tok/s +2025-11-16 13:06:40,444 - INFO - Epoch 1 Step 7050 (Global: 7050): loss=0.0348, ppl=1.04, grad_norm=0.27, lr=2.86e-05, 
throughput=3087 tok/s +2025-11-16 13:09:16,357 - INFO - Epoch 1 Step 7060 (Global: 7060): loss=0.0414, ppl=1.04, grad_norm=0.29, lr=2.84e-05, throughput=3079 tok/s +2025-11-16 13:12:11,877 - INFO - Epoch 1 Step 7070 (Global: 7070): loss=0.0412, ppl=1.04, grad_norm=0.30, lr=2.83e-05, throughput=2735 tok/s +2025-11-16 13:14:48,518 - INFO - Epoch 1 Step 7080 (Global: 7080): loss=0.0331, ppl=1.03, grad_norm=0.28, lr=2.81e-05, throughput=3064 tok/s +2025-11-16 13:17:36,287 - INFO - Epoch 1 Step 7090 (Global: 7090): loss=0.0375, ppl=1.04, grad_norm=0.28, lr=2.80e-05, throughput=2861 tok/s +2025-11-16 13:20:18,958 - INFO - Epoch 1 Step 7100 (Global: 7100): loss=0.0335, ppl=1.03, grad_norm=0.27, lr=2.78e-05, throughput=2951 tok/s +2025-11-16 13:22:51,971 - INFO - Epoch 1 Step 7110 (Global: 7110): loss=0.0373, ppl=1.04, grad_norm=0.27, lr=2.77e-05, throughput=3137 tok/s +2025-11-16 13:25:33,795 - INFO - Epoch 1 Step 7120 (Global: 7120): loss=0.0307, ppl=1.03, grad_norm=0.26, lr=2.75e-05, throughput=2966 tok/s +2025-11-16 13:28:17,984 - INFO - Epoch 1 Step 7130 (Global: 7130): loss=0.0422, ppl=1.04, grad_norm=0.30, lr=2.74e-05, throughput=2924 tok/s +2025-11-16 13:30:56,997 - INFO - Epoch 1 Step 7140 (Global: 7140): loss=0.0381, ppl=1.04, grad_norm=0.29, lr=2.72e-05, throughput=3019 tok/s +2025-11-16 13:33:23,587 - INFO - Epoch 1 Step 7150 (Global: 7150): loss=0.0351, ppl=1.04, grad_norm=0.29, lr=2.71e-05, throughput=3275 tok/s +2025-11-16 13:36:00,362 - INFO - Epoch 1 Step 7160 (Global: 7160): loss=0.0359, ppl=1.04, grad_norm=0.27, lr=2.69e-05, throughput=3062 tok/s +2025-11-16 13:38:39,143 - INFO - Epoch 1 Step 7170 (Global: 7170): loss=0.0423, ppl=1.04, grad_norm=0.30, lr=2.68e-05, throughput=3023 tok/s +2025-11-16 13:41:14,647 - INFO - Epoch 1 Step 7180 (Global: 7180): loss=0.0356, ppl=1.04, grad_norm=0.31, lr=2.66e-05, throughput=3087 tok/s +2025-11-16 13:43:54,687 - INFO - Epoch 1 Step 7190 (Global: 7190): loss=0.0376, ppl=1.04, grad_norm=0.28, lr=2.65e-05, 
throughput=2999 tok/s +2025-11-16 13:46:27,781 - INFO - Epoch 1 Step 7200 (Global: 7200): loss=0.0364, ppl=1.04, grad_norm=0.28, lr=2.63e-05, throughput=3135 tok/s +2025-11-16 13:49:01,025 - INFO - Epoch 1 Step 7210 (Global: 7210): loss=0.0367, ppl=1.04, grad_norm=0.28, lr=2.62e-05, throughput=3132 tok/s +2025-11-16 13:51:28,666 - INFO - Epoch 1 Step 7220 (Global: 7220): loss=0.0365, ppl=1.04, grad_norm=0.30, lr=2.60e-05, throughput=3251 tok/s +2025-11-16 13:54:15,571 - INFO - Epoch 1 Step 7230 (Global: 7230): loss=0.0355, ppl=1.04, grad_norm=0.27, lr=2.59e-05, throughput=2876 tok/s +2025-11-16 13:57:02,831 - INFO - Epoch 1 Step 7240 (Global: 7240): loss=0.0309, ppl=1.03, grad_norm=0.26, lr=2.58e-05, throughput=2870 tok/s +2025-11-16 13:59:44,669 - INFO - Epoch 1 Step 7250 (Global: 7250): loss=0.0381, ppl=1.04, grad_norm=0.31, lr=2.56e-05, throughput=2966 tok/s +2025-11-16 14:02:39,821 - INFO - Epoch 1 Step 7260 (Global: 7260): loss=0.0391, ppl=1.04, grad_norm=0.29, lr=2.55e-05, throughput=2741 tok/s +2025-11-16 14:05:33,276 - INFO - Epoch 1 Step 7270 (Global: 7270): loss=0.0374, ppl=1.04, grad_norm=0.30, lr=2.53e-05, throughput=2767 tok/s +2025-11-16 14:08:25,958 - INFO - Epoch 1 Step 7280 (Global: 7280): loss=0.0312, ppl=1.03, grad_norm=0.27, lr=2.52e-05, throughput=2780 tok/s +2025-11-16 14:11:16,271 - INFO - Epoch 1 Step 7290 (Global: 7290): loss=0.0341, ppl=1.03, grad_norm=0.26, lr=2.50e-05, throughput=2818 tok/s +2025-11-16 14:14:31,018 - INFO - Epoch 1 Step 7300 (Global: 7300): loss=0.0280, ppl=1.03, grad_norm=0.25, lr=2.49e-05, throughput=2465 tok/s +2025-11-16 14:17:08,134 - INFO - Epoch 1 Step 7310 (Global: 7310): loss=0.0367, ppl=1.04, grad_norm=0.28, lr=2.47e-05, throughput=3055 tok/s +2025-11-16 14:19:34,781 - INFO - Epoch 1 Step 7320 (Global: 7320): loss=0.0336, ppl=1.03, grad_norm=0.31, lr=2.46e-05, throughput=3273 tok/s +2025-11-16 14:22:09,325 - INFO - Epoch 1 Step 7330 (Global: 7330): loss=0.0273, ppl=1.03, grad_norm=0.26, lr=2.44e-05, 
throughput=3106 tok/s +2025-11-16 14:24:46,040 - INFO - Epoch 1 Step 7340 (Global: 7340): loss=0.0378, ppl=1.04, grad_norm=0.28, lr=2.43e-05, throughput=3063 tok/s +2025-11-16 14:27:28,887 - INFO - Epoch 1 Step 7350 (Global: 7350): loss=0.0343, ppl=1.03, grad_norm=0.30, lr=2.42e-05, throughput=2948 tok/s +2025-11-16 14:30:05,525 - INFO - Epoch 1 Step 7360 (Global: 7360): loss=0.0393, ppl=1.04, grad_norm=0.29, lr=2.40e-05, throughput=3064 tok/s +2025-11-16 14:32:57,247 - INFO - Epoch 1 Step 7370 (Global: 7370): loss=0.0333, ppl=1.03, grad_norm=0.27, lr=2.39e-05, throughput=2795 tok/s +2025-11-16 14:35:54,073 - INFO - Epoch 1 Step 7380 (Global: 7380): loss=0.0384, ppl=1.04, grad_norm=0.29, lr=2.37e-05, throughput=2715 tok/s +2025-11-16 14:38:38,846 - INFO - Epoch 1 Step 7390 (Global: 7390): loss=0.0340, ppl=1.03, grad_norm=0.27, lr=2.36e-05, throughput=2913 tok/s +2025-11-16 14:41:33,206 - INFO - Epoch 1 Step 7400 (Global: 7400): loss=0.0365, ppl=1.04, grad_norm=0.28, lr=2.34e-05, throughput=2753 tok/s +2025-11-16 14:44:14,992 - INFO - Epoch 1 Step 7410 (Global: 7410): loss=0.0343, ppl=1.03, grad_norm=0.29, lr=2.33e-05, throughput=2967 tok/s +2025-11-16 14:47:04,510 - INFO - Epoch 1 Step 7420 (Global: 7420): loss=0.0343, ppl=1.03, grad_norm=0.28, lr=2.32e-05, throughput=2832 tok/s +2025-11-16 14:49:50,177 - INFO - Epoch 1 Step 7430 (Global: 7430): loss=0.0371, ppl=1.04, grad_norm=0.28, lr=2.30e-05, throughput=2897 tok/s +2025-11-16 14:52:41,650 - INFO - Epoch 1 Step 7440 (Global: 7440): loss=0.0338, ppl=1.03, grad_norm=0.28, lr=2.29e-05, throughput=2799 tok/s +2025-11-16 14:55:35,710 - INFO - Epoch 1 Step 7450 (Global: 7450): loss=0.0358, ppl=1.04, grad_norm=0.26, lr=2.27e-05, throughput=2758 tok/s +2025-11-16 14:58:16,519 - INFO - Epoch 1 Step 7460 (Global: 7460): loss=0.0334, ppl=1.03, grad_norm=0.26, lr=2.26e-05, throughput=2985 tok/s +2025-11-16 15:01:10,581 - INFO - Epoch 1 Step 7470 (Global: 7470): loss=0.0382, ppl=1.04, grad_norm=1.58, lr=2.25e-05, 
throughput=2758 tok/s +2025-11-16 15:03:55,981 - INFO - Epoch 1 Step 7480 (Global: 7480): loss=0.0340, ppl=1.03, grad_norm=0.27, lr=2.23e-05, throughput=2902 tok/s +2025-11-16 15:06:46,692 - INFO - Epoch 1 Step 7490 (Global: 7490): loss=0.0353, ppl=1.04, grad_norm=0.28, lr=2.22e-05, throughput=2812 tok/s +2025-11-16 15:09:31,988 - INFO - Epoch 1 Step 7500 (Global: 7500): loss=0.0336, ppl=1.03, grad_norm=0.28, lr=2.20e-05, throughput=2904 tok/s +2025-11-16 15:09:31,991 - INFO - +Running validation at step 7500... +2025-11-16 15:18:35,175 - INFO - Validation loss: 0.0357, perplexity: 1.04 +2025-11-16 15:18:35,176 - INFO - Qualitative metrics (n=5): +2025-11-16 15:18:35,176 - INFO - BLEU: 0.8770 +2025-11-16 15:18:35,176 - INFO - METEOR: 0.9607 +2025-11-16 15:18:35,176 - INFO - Edit Distance: 0.0551 +2025-11-16 15:18:35,176 - INFO - F-measure: 0.9426 +2025-11-16 15:18:35,176 - INFO - +====================================================================== +2025-11-16 15:18:35,177 - INFO - Qualitative Evaluation Samples: +2025-11-16 15:18:35,177 - INFO - ====================================================================== +2025-11-16 15:18:35,179 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-16 15:18:35,179 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 15:18:35,179 - INFO - Generated: 'Q gave it four out of five stars and said that "the album [Perhaps\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-16 15:18:35,179 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-16 15:18:35,180 - INFO - ---------------------------------------------------------------------- +2025-11-16 15:18:35,180 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-16 15:18:35,181 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 15:18:35,181 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 15:18:35,181 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 15:18:35,182 - INFO - ---------------------------------------------------------------------- +2025-11-16 15:18:35,182 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-16 15:18:35,182 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 15:18:35,183 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-16 15:18:35,183 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-16 15:18:35,183 - INFO - ---------------------------------------------------------------------- +2025-11-16 15:18:35,183 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-16 15:18:35,183 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 15:18:35,184 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-16 15:18:35,184 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-16 15:18:35,184 - INFO - ---------------------------------------------------------------------- +2025-11-16 15:18:35,184 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-16 15:18:35,185 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 15:18:35,185 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 15:18:35,185 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 15:18:35,186 - INFO - ---------------------------------------------------------------------- +2025-11-16 15:18:35,188 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_7500.jsonl +2025-11-16 15:20:19,339 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-16 15:20:19,352 - INFO - New best validation loss: 0.0357, perplexity: 1.04 +2025-11-16 15:23:10,775 - INFO - Epoch 1 Step 7510 (Global: 7510): loss=0.0381, ppl=1.04, grad_norm=0.29, lr=2.19e-05, throughput=2800 tok/s +2025-11-16 15:26:05,543 - INFO - Epoch 1 Step 7520 (Global: 7520): loss=0.0348, ppl=1.04, grad_norm=0.27, lr=2.18e-05, throughput=2747 tok/s +2025-11-16 15:28:48,489 - INFO - Epoch 1 Step 7530 (Global: 7530): loss=0.0341, ppl=1.03, grad_norm=0.27, lr=2.16e-05, throughput=2946 tok/s +2025-11-16 15:31:37,793 - INFO - Epoch 1 Step 7540 (Global: 7540): loss=0.0351, ppl=1.04, grad_norm=0.29, lr=2.15e-05, throughput=2835 tok/s +2025-11-16 15:34:29,906 - INFO - Epoch 1 Step 7550 (Global: 7550): loss=0.0305, ppl=1.03, grad_norm=0.26, lr=2.14e-05, 
throughput=2789 tok/s +2025-11-16 15:37:23,125 - INFO - Epoch 1 Step 7560 (Global: 7560): loss=0.0404, ppl=1.04, grad_norm=0.45, lr=2.12e-05, throughput=2771 tok/s +2025-11-16 15:40:01,243 - INFO - Epoch 1 Step 7570 (Global: 7570): loss=0.0364, ppl=1.04, grad_norm=0.28, lr=2.11e-05, throughput=3036 tok/s +2025-11-16 15:42:49,714 - INFO - Epoch 1 Step 7580 (Global: 7580): loss=0.0317, ppl=1.03, grad_norm=0.27, lr=2.09e-05, throughput=2849 tok/s +2025-11-16 15:45:41,273 - INFO - Epoch 1 Step 7590 (Global: 7590): loss=0.0403, ppl=1.04, grad_norm=0.32, lr=2.08e-05, throughput=2798 tok/s +2025-11-16 15:48:20,241 - INFO - Epoch 1 Step 7600 (Global: 7600): loss=0.0300, ppl=1.03, grad_norm=0.25, lr=2.07e-05, throughput=3020 tok/s +2025-11-16 15:51:12,978 - INFO - Epoch 1 Step 7610 (Global: 7610): loss=0.0391, ppl=1.04, grad_norm=0.29, lr=2.05e-05, throughput=2779 tok/s +2025-11-16 15:54:07,035 - INFO - Epoch 1 Step 7620 (Global: 7620): loss=0.0315, ppl=1.03, grad_norm=0.27, lr=2.04e-05, throughput=2758 tok/s +2025-11-16 15:56:57,947 - INFO - Epoch 1 Step 7630 (Global: 7630): loss=0.0337, ppl=1.03, grad_norm=0.27, lr=2.03e-05, throughput=2809 tok/s +2025-11-16 15:59:36,549 - INFO - Epoch 1 Step 7640 (Global: 7640): loss=0.0365, ppl=1.04, grad_norm=0.28, lr=2.01e-05, throughput=3027 tok/s +2025-11-16 16:02:31,961 - INFO - Epoch 1 Step 7650 (Global: 7650): loss=0.0389, ppl=1.04, grad_norm=0.28, lr=2.00e-05, throughput=2736 tok/s +2025-11-16 16:05:30,947 - INFO - Epoch 1 Step 7660 (Global: 7660): loss=0.0368, ppl=1.04, grad_norm=0.30, lr=1.99e-05, throughput=2682 tok/s +2025-11-16 16:08:13,409 - INFO - Epoch 1 Step 7670 (Global: 7670): loss=0.0360, ppl=1.04, grad_norm=0.28, lr=1.97e-05, throughput=2955 tok/s +2025-11-16 16:11:02,112 - INFO - Epoch 1 Step 7680 (Global: 7680): loss=0.0326, ppl=1.03, grad_norm=0.27, lr=1.96e-05, throughput=2845 tok/s +2025-11-16 16:13:53,039 - INFO - Epoch 1 Step 7690 (Global: 7690): loss=0.0397, ppl=1.04, grad_norm=0.30, lr=1.95e-05, 
throughput=2808 tok/s +2025-11-16 16:16:44,192 - INFO - Epoch 1 Step 7700 (Global: 7700): loss=0.0323, ppl=1.03, grad_norm=0.26, lr=1.93e-05, throughput=2805 tok/s +2025-11-16 16:19:24,752 - INFO - Epoch 1 Step 7710 (Global: 7710): loss=0.0315, ppl=1.03, grad_norm=0.29, lr=1.92e-05, throughput=2990 tok/s +2025-11-16 16:22:19,524 - INFO - Epoch 1 Step 7720 (Global: 7720): loss=0.0393, ppl=1.04, grad_norm=0.29, lr=1.91e-05, throughput=2746 tok/s +2025-11-16 16:25:13,586 - INFO - Epoch 1 Step 7730 (Global: 7730): loss=0.0309, ppl=1.03, grad_norm=0.26, lr=1.89e-05, throughput=2758 tok/s +2025-11-16 16:27:57,797 - INFO - Epoch 1 Step 7740 (Global: 7740): loss=0.0344, ppl=1.03, grad_norm=0.30, lr=1.88e-05, throughput=2923 tok/s +2025-11-16 16:30:51,893 - INFO - Epoch 1 Step 7750 (Global: 7750): loss=0.0370, ppl=1.04, grad_norm=0.28, lr=1.87e-05, throughput=2757 tok/s +2025-11-16 16:33:43,654 - INFO - Epoch 1 Step 7760 (Global: 7760): loss=0.0392, ppl=1.04, grad_norm=0.29, lr=1.85e-05, throughput=2795 tok/s +2025-11-16 16:36:36,074 - INFO - Epoch 1 Step 7770 (Global: 7770): loss=0.0356, ppl=1.04, grad_norm=0.29, lr=1.84e-05, throughput=2784 tok/s +2025-11-16 16:39:19,507 - INFO - Epoch 1 Step 7780 (Global: 7780): loss=0.0484, ppl=1.05, grad_norm=0.32, lr=1.83e-05, throughput=2937 tok/s +2025-11-16 16:42:13,573 - INFO - Epoch 1 Step 7790 (Global: 7790): loss=0.0407, ppl=1.04, grad_norm=0.30, lr=1.82e-05, throughput=2758 tok/s +2025-11-16 16:45:10,993 - INFO - Epoch 1 Step 7800 (Global: 7800): loss=0.0327, ppl=1.03, grad_norm=0.26, lr=1.80e-05, throughput=2706 tok/s +2025-11-16 16:47:48,693 - INFO - Epoch 1 Step 7810 (Global: 7810): loss=0.0300, ppl=1.03, grad_norm=0.27, lr=1.79e-05, throughput=3044 tok/s +2025-11-16 16:50:35,985 - INFO - Epoch 1 Step 7820 (Global: 7820): loss=0.0444, ppl=1.05, grad_norm=0.32, lr=1.78e-05, throughput=2869 tok/s +2025-11-16 16:53:28,483 - INFO - Epoch 1 Step 7830 (Global: 7830): loss=0.0355, ppl=1.04, grad_norm=0.28, lr=1.76e-05, 
throughput=2783 tok/s +2025-11-16 16:56:19,366 - INFO - Epoch 1 Step 7840 (Global: 7840): loss=0.0370, ppl=1.04, grad_norm=0.28, lr=1.75e-05, throughput=2809 tok/s +2025-11-16 16:58:56,383 - INFO - Epoch 1 Step 7850 (Global: 7850): loss=0.0331, ppl=1.03, grad_norm=0.28, lr=1.74e-05, throughput=3057 tok/s +2025-11-16 17:01:45,557 - INFO - Epoch 1 Step 7860 (Global: 7860): loss=0.0354, ppl=1.04, grad_norm=0.28, lr=1.73e-05, throughput=2837 tok/s +2025-11-16 17:04:29,904 - INFO - Epoch 1 Step 7870 (Global: 7870): loss=0.0409, ppl=1.04, grad_norm=0.29, lr=1.71e-05, throughput=2921 tok/s +2025-11-16 17:07:06,249 - INFO - Epoch 1 Step 7880 (Global: 7880): loss=0.0369, ppl=1.04, grad_norm=0.30, lr=1.70e-05, throughput=3070 tok/s +2025-11-16 17:09:51,805 - INFO - Epoch 1 Step 7890 (Global: 7890): loss=0.0382, ppl=1.04, grad_norm=0.29, lr=1.69e-05, throughput=2899 tok/s +2025-11-16 17:12:36,689 - INFO - Epoch 1 Step 7900 (Global: 7900): loss=0.0376, ppl=1.04, grad_norm=0.28, lr=1.68e-05, throughput=2911 tok/s +2025-11-16 17:15:19,207 - INFO - Epoch 1 Step 7910 (Global: 7910): loss=0.0326, ppl=1.03, grad_norm=0.27, lr=1.66e-05, throughput=2954 tok/s +2025-11-16 17:17:50,040 - INFO - Epoch 1 Step 7920 (Global: 7920): loss=0.0322, ppl=1.03, grad_norm=0.27, lr=1.65e-05, throughput=3182 tok/s +2025-11-16 17:20:28,749 - INFO - Epoch 1 Step 7930 (Global: 7930): loss=0.0301, ppl=1.03, grad_norm=0.25, lr=1.64e-05, throughput=3024 tok/s +2025-11-16 17:23:07,954 - INFO - Epoch 1 Step 7940 (Global: 7940): loss=0.0332, ppl=1.03, grad_norm=0.27, lr=1.63e-05, throughput=3015 tok/s +2025-11-16 17:25:35,476 - INFO - Epoch 1 Step 7950 (Global: 7950): loss=0.0339, ppl=1.03, grad_norm=0.27, lr=1.61e-05, throughput=3254 tok/s +2025-11-16 17:28:10,851 - INFO - Epoch 1 Step 7960 (Global: 7960): loss=0.0290, ppl=1.03, grad_norm=0.25, lr=1.60e-05, throughput=3089 tok/s +2025-11-16 17:30:45,483 - INFO - Epoch 1 Step 7970 (Global: 7970): loss=0.0392, ppl=1.04, grad_norm=0.29, lr=1.59e-05, 
throughput=3104 tok/s +2025-11-16 17:33:24,865 - INFO - Epoch 1 Step 7980 (Global: 7980): loss=0.0298, ppl=1.03, grad_norm=0.26, lr=1.58e-05, throughput=3012 tok/s +2025-11-16 17:35:49,173 - INFO - Epoch 1 Step 7990 (Global: 7990): loss=0.0335, ppl=1.03, grad_norm=0.27, lr=1.56e-05, throughput=3326 tok/s +2025-11-16 17:38:21,441 - INFO - Epoch 1 Step 8000 (Global: 8000): loss=0.0318, ppl=1.03, grad_norm=0.28, lr=1.55e-05, throughput=3152 tok/s +2025-11-16 17:38:21,443 - INFO - +Running validation at step 8000... +2025-11-16 17:45:43,081 - INFO - Validation loss: 0.0355, perplexity: 1.04 +2025-11-16 17:45:43,082 - INFO - Qualitative metrics (n=5): +2025-11-16 17:45:43,082 - INFO - BLEU: 0.8666 +2025-11-16 17:45:43,082 - INFO - METEOR: 0.9554 +2025-11-16 17:45:43,082 - INFO - Edit Distance: 0.0588 +2025-11-16 17:45:43,082 - INFO - F-measure: 0.9356 +2025-11-16 17:45:43,082 - INFO - +====================================================================== +2025-11-16 17:45:43,082 - INFO - Qualitative Evaluation Samples: +2025-11-16 17:45:43,083 - INFO - ====================================================================== +2025-11-16 17:45:43,083 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-16 17:45:43,083 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 17:45:43,083 - INFO - Generated: 'Q gave it four out of five stars and said that "The album [perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-16 17:45:43,083 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-16 17:45:43,083 - INFO - ---------------------------------------------------------------------- +2025-11-16 17:45:43,083 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-16 17:45:43,083 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 17:45:43,083 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 17:45:43,083 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 17:45:43,083 - INFO - ---------------------------------------------------------------------- +2025-11-16 17:45:43,083 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-16 17:45:43,084 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 17:45:43,084 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-16 17:45:43,084 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-16 17:45:43,084 - INFO - ---------------------------------------------------------------------- +2025-11-16 17:45:43,084 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-16 17:45:43,084 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 17:45:43,084 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-16 17:45:43,084 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-16 17:45:43,084 - INFO - ---------------------------------------------------------------------- +2025-11-16 17:45:43,084 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-16 17:45:43,085 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 17:45:43,085 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 17:45:43,085 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 17:45:43,085 - INFO - ---------------------------------------------------------------------- +2025-11-16 17:45:43,086 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_8000.jsonl +2025-11-16 17:46:21,267 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-16 17:46:21,276 - INFO - New best validation loss: 0.0355, perplexity: 1.04 +2025-11-16 17:48:54,389 - INFO - Epoch 1 Step 8010 (Global: 8010): loss=0.0318, ppl=1.03, grad_norm=0.28, lr=1.54e-05, throughput=3135 tok/s +2025-11-16 17:51:19,512 - INFO - Epoch 1 Step 8020 (Global: 8020): loss=0.0275, ppl=1.03, grad_norm=0.25, lr=1.53e-05, throughput=3308 tok/s +2025-11-16 17:53:53,403 - INFO - Epoch 1 Step 8030 (Global: 8030): loss=0.0340, ppl=1.03, grad_norm=0.28, lr=1.52e-05, throughput=3119 tok/s +2025-11-16 17:56:25,627 - INFO - Epoch 1 Step 8040 (Global: 8040): loss=0.0342, ppl=1.03, grad_norm=0.27, lr=1.50e-05, throughput=3153 tok/s +2025-11-16 17:58:57,622 - INFO - Epoch 1 Step 8050 (Global: 8050): loss=0.0341, ppl=1.03, grad_norm=0.33, lr=1.49e-05, 
throughput=3158 tok/s +2025-11-16 18:01:20,515 - INFO - Epoch 1 Step 8060 (Global: 8060): loss=0.0474, ppl=1.05, grad_norm=0.34, lr=1.48e-05, throughput=3359 tok/s +2025-11-16 18:03:53,754 - INFO - Epoch 1 Step 8070 (Global: 8070): loss=0.0363, ppl=1.04, grad_norm=0.28, lr=1.47e-05, throughput=3132 tok/s +2025-11-16 18:06:26,468 - INFO - Epoch 1 Step 8080 (Global: 8080): loss=0.0355, ppl=1.04, grad_norm=0.28, lr=1.46e-05, throughput=3143 tok/s +2025-11-16 18:08:49,299 - INFO - Epoch 1 Step 8090 (Global: 8090): loss=0.0334, ppl=1.03, grad_norm=0.31, lr=1.44e-05, throughput=3361 tok/s +2025-11-16 18:11:21,389 - INFO - Epoch 1 Step 8100 (Global: 8100): loss=0.0333, ppl=1.03, grad_norm=0.26, lr=1.43e-05, throughput=3156 tok/s +2025-11-16 18:13:53,587 - INFO - Epoch 1 Step 8110 (Global: 8110): loss=0.0311, ppl=1.03, grad_norm=0.26, lr=1.42e-05, throughput=3154 tok/s +2025-11-16 18:16:26,314 - INFO - Epoch 1 Step 8120 (Global: 8120): loss=0.0320, ppl=1.03, grad_norm=0.27, lr=1.41e-05, throughput=3143 tok/s +2025-11-16 18:18:50,101 - INFO - Epoch 1 Step 8130 (Global: 8130): loss=0.0332, ppl=1.03, grad_norm=0.29, lr=1.40e-05, throughput=3338 tok/s +2025-11-16 18:21:22,118 - INFO - Epoch 1 Step 8140 (Global: 8140): loss=0.0375, ppl=1.04, grad_norm=0.29, lr=1.39e-05, throughput=3158 tok/s +2025-11-16 18:23:53,981 - INFO - Epoch 1 Step 8150 (Global: 8150): loss=0.0341, ppl=1.03, grad_norm=0.29, lr=1.37e-05, throughput=3161 tok/s +2025-11-16 18:26:16,713 - INFO - Epoch 1 Step 8160 (Global: 8160): loss=0.0399, ppl=1.04, grad_norm=0.29, lr=1.36e-05, throughput=3363 tok/s +2025-11-16 18:28:48,520 - INFO - Epoch 1 Step 8170 (Global: 8170): loss=0.0305, ppl=1.03, grad_norm=0.26, lr=1.35e-05, throughput=3162 tok/s +2025-11-16 18:31:20,018 - INFO - Epoch 1 Step 8180 (Global: 8180): loss=0.0423, ppl=1.04, grad_norm=0.30, lr=1.34e-05, throughput=3168 tok/s +2025-11-16 18:33:51,693 - INFO - Epoch 1 Step 8190 (Global: 8190): loss=0.0328, ppl=1.03, grad_norm=0.36, lr=1.33e-05, 
throughput=3165 tok/s +2025-11-16 18:36:14,468 - INFO - Epoch 1 Step 8200 (Global: 8200): loss=0.0311, ppl=1.03, grad_norm=0.25, lr=1.32e-05, throughput=3362 tok/s +2025-11-16 18:38:45,975 - INFO - Epoch 1 Step 8210 (Global: 8210): loss=0.0352, ppl=1.04, grad_norm=0.28, lr=1.31e-05, throughput=3168 tok/s +2025-11-16 18:41:17,858 - INFO - Epoch 1 Step 8220 (Global: 8220): loss=0.0379, ppl=1.04, grad_norm=0.28, lr=1.29e-05, throughput=3160 tok/s +2025-11-16 18:43:40,400 - INFO - Epoch 1 Step 8230 (Global: 8230): loss=0.0328, ppl=1.03, grad_norm=0.26, lr=1.28e-05, throughput=3367 tok/s +2025-11-16 18:46:12,145 - INFO - Epoch 1 Step 8240 (Global: 8240): loss=0.0316, ppl=1.03, grad_norm=0.32, lr=1.27e-05, throughput=3163 tok/s +2025-11-16 18:48:44,472 - INFO - Epoch 1 Step 8250 (Global: 8250): loss=0.0379, ppl=1.04, grad_norm=0.36, lr=1.26e-05, throughput=3151 tok/s +2025-11-16 18:51:16,755 - INFO - Epoch 1 Step 8260 (Global: 8260): loss=0.0335, ppl=1.03, grad_norm=0.27, lr=1.25e-05, throughput=3152 tok/s +2025-11-16 18:53:39,894 - INFO - Epoch 1 Step 8270 (Global: 8270): loss=0.0382, ppl=1.04, grad_norm=0.28, lr=1.24e-05, throughput=3353 tok/s +2025-11-16 18:56:11,323 - INFO - Epoch 1 Step 8280 (Global: 8280): loss=0.0371, ppl=1.04, grad_norm=0.30, lr=1.23e-05, throughput=3170 tok/s +2025-11-16 18:58:43,572 - INFO - Epoch 1 Step 8290 (Global: 8290): loss=0.0342, ppl=1.03, grad_norm=0.26, lr=1.22e-05, throughput=3153 tok/s +2025-11-16 19:01:05,998 - INFO - Epoch 1 Step 8300 (Global: 8300): loss=0.0415, ppl=1.04, grad_norm=0.30, lr=1.21e-05, throughput=3370 tok/s +2025-11-16 19:03:37,378 - INFO - Epoch 1 Step 8310 (Global: 8310): loss=0.0328, ppl=1.03, grad_norm=0.25, lr=1.20e-05, throughput=3171 tok/s +2025-11-16 19:06:08,557 - INFO - Epoch 1 Step 8320 (Global: 8320): loss=0.0355, ppl=1.04, grad_norm=0.31, lr=1.18e-05, throughput=3175 tok/s +2025-11-16 19:08:39,767 - INFO - Epoch 1 Step 8330 (Global: 8330): loss=0.0360, ppl=1.04, grad_norm=0.27, lr=1.17e-05, 
throughput=3174 tok/s +2025-11-16 19:11:02,581 - INFO - Epoch 1 Step 8340 (Global: 8340): loss=0.0345, ppl=1.04, grad_norm=0.27, lr=1.16e-05, throughput=3361 tok/s +2025-11-16 19:13:34,982 - INFO - Epoch 1 Step 8350 (Global: 8350): loss=0.0336, ppl=1.03, grad_norm=0.26, lr=1.15e-05, throughput=3150 tok/s +2025-11-16 19:16:06,309 - INFO - Epoch 1 Step 8360 (Global: 8360): loss=0.0353, ppl=1.04, grad_norm=0.28, lr=1.14e-05, throughput=3172 tok/s +2025-11-16 19:18:28,614 - INFO - Epoch 1 Step 8370 (Global: 8370): loss=0.0329, ppl=1.03, grad_norm=0.26, lr=1.13e-05, throughput=3373 tok/s +2025-11-16 19:21:00,401 - INFO - Epoch 1 Step 8380 (Global: 8380): loss=0.0317, ppl=1.03, grad_norm=0.25, lr=1.12e-05, throughput=3162 tok/s +2025-11-16 19:23:31,922 - INFO - Epoch 1 Step 8390 (Global: 8390): loss=0.0415, ppl=1.04, grad_norm=0.30, lr=1.11e-05, throughput=3168 tok/s +2025-11-16 19:26:03,892 - INFO - Epoch 1 Step 8400 (Global: 8400): loss=0.0378, ppl=1.04, grad_norm=0.29, lr=1.10e-05, throughput=3159 tok/s +2025-11-16 19:28:27,258 - INFO - Epoch 1 Step 8410 (Global: 8410): loss=0.0349, ppl=1.04, grad_norm=0.28, lr=1.09e-05, throughput=3348 tok/s +2025-11-16 19:30:59,578 - INFO - Epoch 1 Step 8420 (Global: 8420): loss=0.0383, ppl=1.04, grad_norm=0.28, lr=1.08e-05, throughput=3151 tok/s +2025-11-16 19:33:31,900 - INFO - Epoch 1 Step 8430 (Global: 8430): loss=0.0347, ppl=1.04, grad_norm=0.26, lr=1.07e-05, throughput=3151 tok/s +2025-11-16 19:35:54,716 - INFO - Epoch 1 Step 8440 (Global: 8440): loss=0.0320, ppl=1.03, grad_norm=0.27, lr=1.06e-05, throughput=3361 tok/s +2025-11-16 19:38:27,084 - INFO - Epoch 1 Step 8450 (Global: 8450): loss=0.0347, ppl=1.04, grad_norm=0.28, lr=1.05e-05, throughput=3150 tok/s +2025-11-16 19:40:58,985 - INFO - Epoch 1 Step 8460 (Global: 8460): loss=0.0338, ppl=1.03, grad_norm=0.26, lr=1.04e-05, throughput=3160 tok/s +2025-11-16 19:43:30,933 - INFO - Epoch 1 Step 8470 (Global: 8470): loss=0.0393, ppl=1.04, grad_norm=0.29, lr=1.03e-05, 
throughput=3159 tok/s +2025-11-16 19:45:54,392 - INFO - Epoch 1 Step 8480 (Global: 8480): loss=0.0438, ppl=1.04, grad_norm=0.32, lr=1.02e-05, throughput=3346 tok/s +2025-11-16 19:48:27,468 - INFO - Epoch 1 Step 8490 (Global: 8490): loss=0.0390, ppl=1.04, grad_norm=0.29, lr=1.01e-05, throughput=3136 tok/s +2025-11-16 19:51:00,564 - INFO - Epoch 1 Step 8500 (Global: 8500): loss=0.0381, ppl=1.04, grad_norm=0.28, lr=9.96e-06, throughput=3135 tok/s +2025-11-16 19:51:00,565 - INFO - +Running validation at step 8500... +2025-11-16 19:58:35,835 - INFO - Validation loss: 0.0354, perplexity: 1.04 +2025-11-16 19:58:35,836 - INFO - Qualitative metrics (n=5): +2025-11-16 19:58:35,836 - INFO - BLEU: 0.8722 +2025-11-16 19:58:35,836 - INFO - METEOR: 0.9580 +2025-11-16 19:58:35,836 - INFO - Edit Distance: 0.0558 +2025-11-16 19:58:35,836 - INFO - F-measure: 0.9392 +2025-11-16 19:58:35,836 - INFO - +====================================================================== +2025-11-16 19:58:35,836 - INFO - Qualitative Evaluation Samples: +2025-11-16 19:58:35,836 - INFO - ====================================================================== +2025-11-16 19:58:35,836 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-16 19:58:35,837 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 19:58:35,837 - INFO - Generated: 'Q gave it four out of five stars and said that "The album [perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-16 19:58:35,837 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-16 19:58:35,837 - INFO - ---------------------------------------------------------------------- +2025-11-16 19:58:35,837 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-16 19:58:35,837 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 19:58:35,837 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 19:58:35,837 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 19:58:35,838 - INFO - ---------------------------------------------------------------------- +2025-11-16 19:58:35,838 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-16 19:58:35,838 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 19:58:35,838 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-16 19:58:35,838 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-16 19:58:35,838 - INFO - ---------------------------------------------------------------------- +2025-11-16 19:58:35,838 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-16 19:58:35,838 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 19:58:35,839 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-16 19:58:35,839 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-16 19:58:35,839 - INFO - ---------------------------------------------------------------------- +2025-11-16 19:58:35,840 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-16 19:58:35,840 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 19:58:35,841 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 19:58:35,841 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 19:58:35,842 - INFO - ---------------------------------------------------------------------- +2025-11-16 19:58:35,843 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_8500.jsonl +2025-11-16 19:59:21,705 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-16 19:59:21,716 - INFO - New best validation loss: 0.0354, perplexity: 1.04 +2025-11-16 20:01:55,600 - INFO - Epoch 1 Step 8510 (Global: 8510): loss=0.0312, ppl=1.03, grad_norm=0.27, lr=9.86e-06, throughput=3119 tok/s +2025-11-16 20:04:22,224 - INFO - Epoch 1 Step 8520 (Global: 8520): loss=0.0360, ppl=1.04, grad_norm=0.28, lr=9.76e-06, throughput=3274 tok/s +2025-11-16 20:06:56,835 - INFO - Epoch 1 Step 8530 (Global: 8530): loss=0.0283, ppl=1.03, grad_norm=0.29, lr=9.67e-06, throughput=3105 tok/s +2025-11-16 20:09:32,810 - INFO - Epoch 1 Step 8540 (Global: 8540): loss=0.0342, ppl=1.03, grad_norm=0.28, lr=9.57e-06, throughput=3077 tok/s +2025-11-16 20:11:59,737 - INFO - Epoch 1 Step 8550 (Global: 8550): loss=0.0360, ppl=1.04, grad_norm=0.28, lr=9.47e-06, 
throughput=3267 tok/s +2025-11-16 20:14:37,455 - INFO - Epoch 1 Step 8560 (Global: 8560): loss=0.0412, ppl=1.04, grad_norm=0.30, lr=9.37e-06, throughput=3043 tok/s +2025-11-16 20:17:12,051 - INFO - Epoch 1 Step 8570 (Global: 8570): loss=0.0346, ppl=1.04, grad_norm=0.27, lr=9.27e-06, throughput=3105 tok/s +2025-11-16 20:19:46,246 - INFO - Epoch 1 Step 8580 (Global: 8580): loss=0.0302, ppl=1.03, grad_norm=0.25, lr=9.18e-06, throughput=3113 tok/s +2025-11-16 20:22:11,465 - INFO - Epoch 1 Step 8590 (Global: 8590): loss=0.0334, ppl=1.03, grad_norm=0.27, lr=9.08e-06, throughput=3305 tok/s +2025-11-16 20:24:45,043 - INFO - Epoch 1 Step 8600 (Global: 8600): loss=0.0325, ppl=1.03, grad_norm=0.26, lr=8.98e-06, throughput=3126 tok/s +2025-11-16 20:27:18,529 - INFO - Epoch 1 Step 8610 (Global: 8610): loss=0.0359, ppl=1.04, grad_norm=0.29, lr=8.89e-06, throughput=3127 tok/s +2025-11-16 20:29:42,671 - INFO - Epoch 1 Step 8620 (Global: 8620): loss=0.0339, ppl=1.03, grad_norm=0.27, lr=8.79e-06, throughput=3330 tok/s +2025-11-16 20:32:15,565 - INFO - Epoch 1 Step 8630 (Global: 8630): loss=0.0349, ppl=1.04, grad_norm=0.28, lr=8.70e-06, throughput=3139 tok/s +2025-11-16 20:34:48,997 - INFO - Epoch 1 Step 8640 (Global: 8640): loss=0.0301, ppl=1.03, grad_norm=0.25, lr=8.60e-06, throughput=3128 tok/s +2025-11-16 20:37:12,911 - INFO - Epoch 1 Step 8650 (Global: 8650): loss=0.0372, ppl=1.04, grad_norm=0.29, lr=8.51e-06, throughput=3335 tok/s +2025-11-16 20:39:46,125 - INFO - Epoch 1 Step 8660 (Global: 8660): loss=0.0367, ppl=1.04, grad_norm=0.29, lr=8.42e-06, throughput=3133 tok/s +2025-11-16 20:42:18,614 - INFO - Epoch 1 Step 8670 (Global: 8670): loss=0.0362, ppl=1.04, grad_norm=0.28, lr=8.32e-06, throughput=3148 tok/s +2025-11-16 20:44:52,005 - INFO - Epoch 1 Step 8680 (Global: 8680): loss=0.0366, ppl=1.04, grad_norm=0.27, lr=8.23e-06, throughput=3129 tok/s +2025-11-16 20:47:16,343 - INFO - Epoch 1 Step 8690 (Global: 8690): loss=0.0379, ppl=1.04, grad_norm=0.28, lr=8.14e-06, 
throughput=3326 tok/s +2025-11-16 20:49:49,770 - INFO - Epoch 1 Step 8700 (Global: 8700): loss=0.0381, ppl=1.04, grad_norm=0.29, lr=8.05e-06, throughput=3129 tok/s +2025-11-16 20:52:23,335 - INFO - Epoch 1 Step 8710 (Global: 8710): loss=0.0405, ppl=1.04, grad_norm=0.29, lr=7.96e-06, throughput=3126 tok/s +2025-11-16 20:54:48,054 - INFO - Epoch 1 Step 8720 (Global: 8720): loss=0.0347, ppl=1.04, grad_norm=0.28, lr=7.87e-06, throughput=3317 tok/s +2025-11-16 20:57:20,394 - INFO - Epoch 1 Step 8730 (Global: 8730): loss=0.0430, ppl=1.04, grad_norm=0.29, lr=7.78e-06, throughput=3151 tok/s +2025-11-16 20:59:53,447 - INFO - Epoch 1 Step 8740 (Global: 8740): loss=0.0389, ppl=1.04, grad_norm=0.29, lr=7.69e-06, throughput=3136 tok/s +2025-11-16 21:02:27,806 - INFO - Epoch 1 Step 8750 (Global: 8750): loss=0.0312, ppl=1.03, grad_norm=0.26, lr=7.60e-06, throughput=3110 tok/s +2025-11-16 21:04:52,486 - INFO - Epoch 1 Step 8760 (Global: 8760): loss=0.0336, ppl=1.03, grad_norm=0.26, lr=7.51e-06, throughput=3318 tok/s +2025-11-16 21:07:27,411 - INFO - Epoch 1 Step 8770 (Global: 8770): loss=0.0318, ppl=1.03, grad_norm=0.26, lr=7.42e-06, throughput=3098 tok/s +2025-11-16 21:10:03,101 - INFO - Epoch 1 Step 8780 (Global: 8780): loss=0.0341, ppl=1.03, grad_norm=0.30, lr=7.33e-06, throughput=3083 tok/s +2025-11-16 21:12:29,596 - INFO - Epoch 1 Step 8790 (Global: 8790): loss=0.0432, ppl=1.04, grad_norm=0.29, lr=7.25e-06, throughput=3277 tok/s +2025-11-16 21:15:06,754 - INFO - Epoch 1 Step 8800 (Global: 8800): loss=0.0419, ppl=1.04, grad_norm=0.30, lr=7.16e-06, throughput=3054 tok/s +2025-11-16 21:17:41,451 - INFO - Epoch 1 Step 8810 (Global: 8810): loss=0.0315, ppl=1.03, grad_norm=0.26, lr=7.07e-06, throughput=3103 tok/s +2025-11-16 21:20:15,533 - INFO - Epoch 1 Step 8820 (Global: 8820): loss=0.0474, ppl=1.05, grad_norm=0.35, lr=6.99e-06, throughput=3115 tok/s +2025-11-16 21:22:39,066 - INFO - Epoch 1 Step 8830 (Global: 8830): loss=0.0342, ppl=1.03, grad_norm=0.29, lr=6.90e-06, 
throughput=3344 tok/s +2025-11-16 21:25:10,796 - INFO - Epoch 1 Step 8840 (Global: 8840): loss=0.0386, ppl=1.04, grad_norm=0.27, lr=6.82e-06, throughput=3164 tok/s +2025-11-16 21:27:42,980 - INFO - Epoch 1 Step 8850 (Global: 8850): loss=0.0358, ppl=1.04, grad_norm=0.28, lr=6.74e-06, throughput=3154 tok/s +2025-11-16 21:30:06,134 - INFO - Epoch 1 Step 8860 (Global: 8860): loss=0.0399, ppl=1.04, grad_norm=0.29, lr=6.65e-06, throughput=3353 tok/s +2025-11-16 21:32:38,720 - INFO - Epoch 1 Step 8870 (Global: 8870): loss=0.0374, ppl=1.04, grad_norm=0.28, lr=6.57e-06, throughput=3146 tok/s +2025-11-16 21:35:10,617 - INFO - Epoch 1 Step 8880 (Global: 8880): loss=0.0360, ppl=1.04, grad_norm=0.26, lr=6.49e-06, throughput=3160 tok/s +2025-11-16 21:37:43,105 - INFO - Epoch 1 Step 8890 (Global: 8890): loss=0.0357, ppl=1.04, grad_norm=0.26, lr=6.40e-06, throughput=3148 tok/s +2025-11-16 21:40:08,393 - INFO - Epoch 1 Step 8900 (Global: 8900): loss=0.0328, ppl=1.03, grad_norm=0.26, lr=6.32e-06, throughput=3304 tok/s +2025-11-16 21:42:46,377 - INFO - Epoch 1 Step 8910 (Global: 8910): loss=0.0354, ppl=1.04, grad_norm=0.29, lr=6.24e-06, throughput=3038 tok/s +2025-11-16 21:45:20,043 - INFO - Epoch 1 Step 8920 (Global: 8920): loss=0.0296, ppl=1.03, grad_norm=0.24, lr=6.16e-06, throughput=3124 tok/s +2025-11-16 21:47:51,863 - INFO - Epoch 1 Step 8930 (Global: 8930): loss=0.0326, ppl=1.03, grad_norm=0.25, lr=6.08e-06, throughput=3162 tok/s +2025-11-16 21:50:27,046 - INFO - Epoch 1 Step 8940 (Global: 8940): loss=0.0352, ppl=1.04, grad_norm=0.28, lr=6.00e-06, throughput=3096 tok/s +2025-11-16 21:53:03,319 - INFO - Epoch 1 Step 8950 (Global: 8950): loss=0.0389, ppl=1.04, grad_norm=0.28, lr=5.92e-06, throughput=3072 tok/s +2025-11-16 21:55:39,419 - INFO - Epoch 1 Step 8960 (Global: 8960): loss=0.0348, ppl=1.04, grad_norm=0.27, lr=5.84e-06, throughput=3075 tok/s +2025-11-16 21:58:06,303 - INFO - Epoch 1 Step 8970 (Global: 8970): loss=0.0394, ppl=1.04, grad_norm=0.29, lr=5.76e-06, 
throughput=3268 tok/s +2025-11-16 22:00:39,172 - INFO - Epoch 1 Step 8980 (Global: 8980): loss=0.0330, ppl=1.03, grad_norm=0.27, lr=5.68e-06, throughput=3140 tok/s +2025-11-16 22:03:11,294 - INFO - Epoch 1 Step 8990 (Global: 8990): loss=0.0329, ppl=1.03, grad_norm=0.27, lr=5.61e-06, throughput=3155 tok/s +2025-11-16 22:05:34,270 - INFO - Epoch 1 Step 9000 (Global: 9000): loss=0.0440, ppl=1.04, grad_norm=0.29, lr=5.53e-06, throughput=3357 tok/s +2025-11-16 22:05:34,271 - INFO - +Running validation at step 9000... +2025-11-16 22:12:58,113 - INFO - Validation loss: 0.0353, perplexity: 1.04 +2025-11-16 22:12:58,114 - INFO - Qualitative metrics (n=5): +2025-11-16 22:12:58,114 - INFO - BLEU: 0.8722 +2025-11-16 22:12:58,114 - INFO - METEOR: 0.9580 +2025-11-16 22:12:58,114 - INFO - Edit Distance: 0.0558 +2025-11-16 22:12:58,114 - INFO - F-measure: 0.9392 +2025-11-16 22:12:58,114 - INFO - +====================================================================== +2025-11-16 22:12:58,114 - INFO - Qualitative Evaluation Samples: +2025-11-16 22:12:58,114 - INFO - ====================================================================== +2025-11-16 22:12:58,114 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-16 22:12:58,114 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 22:12:58,115 - INFO - Generated: 'Q gave it four out of five stars and said that "The album [perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-16 22:12:58,115 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-16 22:12:58,115 - INFO - ---------------------------------------------------------------------- +2025-11-16 22:12:58,115 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-16 22:12:58,115 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 22:12:58,115 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 22:12:58,115 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-16 22:12:58,115 - INFO - ---------------------------------------------------------------------- +2025-11-16 22:12:58,115 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-16 22:12:58,115 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 22:12:58,116 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-16 22:12:58,116 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-16 22:12:58,116 - INFO - ---------------------------------------------------------------------- +2025-11-16 22:12:58,116 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-16 22:12:58,116 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 22:12:58,116 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-16 22:12:58,116 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-16 22:12:58,116 - INFO - ---------------------------------------------------------------------- +2025-11-16 22:12:58,116 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-16 22:12:58,116 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-16 22:12:58,116 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 22:12:58,117 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-16 22:12:58,117 - INFO - ---------------------------------------------------------------------- +2025-11-16 22:12:58,117 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_9000.jsonl +2025-11-16 22:13:46,984 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-16 22:13:46,992 - INFO - New best validation loss: 0.0353, perplexity: 1.04 +2025-11-16 22:16:10,310 - INFO - Epoch 1 Step 9010 (Global: 9010): loss=0.0364, ppl=1.04, grad_norm=0.30, lr=5.45e-06, throughput=3350 tok/s +2025-11-16 22:18:42,638 - INFO - Epoch 1 Step 9020 (Global: 9020): loss=0.0299, ppl=1.03, grad_norm=0.27, lr=5.38e-06, throughput=3151 tok/s +2025-11-16 22:21:14,673 - INFO - Epoch 1 Step 9030 (Global: 9030): loss=0.0378, ppl=1.04, grad_norm=0.32, lr=5.30e-06, throughput=3157 tok/s +2025-11-16 22:23:37,984 - INFO - Epoch 1 Step 9040 (Global: 9040): loss=0.0378, ppl=1.04, grad_norm=0.27, lr=5.23e-06, throughput=3349 tok/s +2025-11-16 22:26:10,272 - INFO - Epoch 1 Step 9050 (Global: 9050): loss=0.0367, ppl=1.04, grad_norm=0.30, lr=5.15e-06, 
throughput=3152 tok/s +2025-11-16 22:28:42,623 - INFO - Epoch 1 Step 9060 (Global: 9060): loss=0.0342, ppl=1.03, grad_norm=0.26, lr=5.08e-06, throughput=3151 tok/s +2025-11-16 22:31:15,502 - INFO - Epoch 1 Step 9070 (Global: 9070): loss=0.0293, ppl=1.03, grad_norm=0.27, lr=5.01e-06, throughput=3140 tok/s +2025-11-16 22:33:39,107 - INFO - Epoch 1 Step 9080 (Global: 9080): loss=0.0352, ppl=1.04, grad_norm=0.27, lr=4.93e-06, throughput=3343 tok/s +2025-11-16 22:36:11,576 - INFO - Epoch 1 Step 9090 (Global: 9090): loss=0.0380, ppl=1.04, grad_norm=0.28, lr=4.86e-06, throughput=3148 tok/s +2025-11-16 22:38:43,897 - INFO - Epoch 1 Step 9100 (Global: 9100): loss=0.0298, ppl=1.03, grad_norm=0.26, lr=4.79e-06, throughput=3151 tok/s +2025-11-16 22:41:06,687 - INFO - Epoch 1 Step 9110 (Global: 9110): loss=0.0323, ppl=1.03, grad_norm=0.27, lr=4.72e-06, throughput=3362 tok/s +2025-11-16 22:43:38,838 - INFO - Epoch 1 Step 9120 (Global: 9120): loss=0.0376, ppl=1.04, grad_norm=0.28, lr=4.65e-06, throughput=3155 tok/s +2025-11-16 22:46:11,704 - INFO - Epoch 1 Step 9130 (Global: 9130): loss=0.0311, ppl=1.03, grad_norm=0.26, lr=4.58e-06, throughput=3140 tok/s +2025-11-16 22:48:35,423 - INFO - Epoch 1 Step 9140 (Global: 9140): loss=0.0393, ppl=1.04, grad_norm=0.39, lr=4.51e-06, throughput=3340 tok/s +2025-11-16 22:51:08,151 - INFO - Epoch 1 Step 9150 (Global: 9150): loss=0.0347, ppl=1.04, grad_norm=0.27, lr=4.44e-06, throughput=3143 tok/s +2025-11-16 22:53:40,878 - INFO - Epoch 1 Step 9160 (Global: 9160): loss=0.0324, ppl=1.03, grad_norm=0.26, lr=4.37e-06, throughput=3143 tok/s +2025-11-16 22:56:13,389 - INFO - Epoch 1 Step 9170 (Global: 9170): loss=0.0331, ppl=1.03, grad_norm=0.27, lr=4.30e-06, throughput=3147 tok/s +2025-11-16 22:58:36,635 - INFO - Epoch 1 Step 9180 (Global: 9180): loss=0.0339, ppl=1.03, grad_norm=0.27, lr=4.23e-06, throughput=3351 tok/s +2025-11-16 23:01:08,952 - INFO - Epoch 1 Step 9190 (Global: 9190): loss=0.0377, ppl=1.04, grad_norm=0.29, lr=4.17e-06, 
throughput=3151 tok/s +2025-11-16 23:03:40,485 - INFO - Epoch 1 Step 9200 (Global: 9200): loss=0.0324, ppl=1.03, grad_norm=0.26, lr=4.10e-06, throughput=3168 tok/s +2025-11-16 23:06:03,139 - INFO - Epoch 1 Step 9210 (Global: 9210): loss=0.0362, ppl=1.04, grad_norm=0.29, lr=4.03e-06, throughput=3365 tok/s +2025-11-16 23:08:34,993 - INFO - Epoch 1 Step 9220 (Global: 9220): loss=0.0373, ppl=1.04, grad_norm=0.28, lr=3.97e-06, throughput=3161 tok/s +2025-11-16 23:11:07,968 - INFO - Epoch 1 Step 9230 (Global: 9230): loss=0.0304, ppl=1.03, grad_norm=0.27, lr=3.90e-06, throughput=3138 tok/s +2025-11-16 23:13:40,939 - INFO - Epoch 1 Step 9240 (Global: 9240): loss=0.0361, ppl=1.04, grad_norm=0.27, lr=3.84e-06, throughput=3138 tok/s +2025-11-16 23:16:04,757 - INFO - Epoch 1 Step 9250 (Global: 9250): loss=0.0386, ppl=1.04, grad_norm=0.30, lr=3.77e-06, throughput=3338 tok/s +2025-11-16 23:18:37,442 - INFO - Epoch 1 Step 9260 (Global: 9260): loss=0.0387, ppl=1.04, grad_norm=0.29, lr=3.71e-06, throughput=3144 tok/s +2025-11-16 23:21:09,894 - INFO - Epoch 1 Step 9270 (Global: 9270): loss=0.0371, ppl=1.04, grad_norm=0.29, lr=3.65e-06, throughput=3149 tok/s +2025-11-16 23:23:33,313 - INFO - Epoch 1 Step 9280 (Global: 9280): loss=0.0320, ppl=1.03, grad_norm=0.25, lr=3.58e-06, throughput=3347 tok/s +2025-11-16 23:26:06,610 - INFO - Epoch 1 Step 9290 (Global: 9290): loss=0.0379, ppl=1.04, grad_norm=0.29, lr=3.52e-06, throughput=3131 tok/s +2025-11-16 23:28:39,702 - INFO - Epoch 1 Step 9300 (Global: 9300): loss=0.0328, ppl=1.03, grad_norm=0.27, lr=3.46e-06, throughput=3135 tok/s +2025-11-16 23:31:12,211 - INFO - Epoch 1 Step 9310 (Global: 9310): loss=0.0289, ppl=1.03, grad_norm=0.26, lr=3.40e-06, throughput=3147 tok/s +2025-11-16 23:33:35,161 - INFO - Epoch 1 Step 9320 (Global: 9320): loss=0.0364, ppl=1.04, grad_norm=0.30, lr=3.34e-06, throughput=3358 tok/s +2025-11-16 23:36:07,708 - INFO - Epoch 1 Step 9330 (Global: 9330): loss=0.0356, ppl=1.04, grad_norm=0.30, lr=3.28e-06, 
throughput=3147 tok/s +2025-11-16 23:38:39,906 - INFO - Epoch 1 Step 9340 (Global: 9340): loss=0.0367, ppl=1.04, grad_norm=0.27, lr=3.22e-06, throughput=3154 tok/s +2025-11-16 23:41:03,486 - INFO - Epoch 1 Step 9350 (Global: 9350): loss=0.0405, ppl=1.04, grad_norm=0.34, lr=3.16e-06, throughput=3343 tok/s +2025-11-16 23:43:36,077 - INFO - Epoch 1 Step 9360 (Global: 9360): loss=0.0320, ppl=1.03, grad_norm=0.26, lr=3.10e-06, throughput=3146 tok/s +2025-11-16 23:46:08,488 - INFO - Epoch 1 Step 9370 (Global: 9370): loss=0.0341, ppl=1.03, grad_norm=0.27, lr=3.05e-06, throughput=3149 tok/s +2025-11-16 23:48:41,290 - INFO - Epoch 1 Step 9380 (Global: 9380): loss=0.0385, ppl=1.04, grad_norm=0.29, lr=2.99e-06, throughput=3141 tok/s +2025-11-16 23:51:04,724 - INFO - Epoch 1 Step 9390 (Global: 9390): loss=0.0386, ppl=1.04, grad_norm=0.28, lr=2.93e-06, throughput=3347 tok/s +2025-11-16 23:53:37,973 - INFO - Epoch 1 Step 9400 (Global: 9400): loss=0.0335, ppl=1.03, grad_norm=0.32, lr=2.88e-06, throughput=3132 tok/s +2025-11-16 23:56:09,897 - INFO - Epoch 1 Step 9410 (Global: 9410): loss=0.0406, ppl=1.04, grad_norm=0.29, lr=2.82e-06, throughput=3160 tok/s +2025-11-16 23:58:33,985 - INFO - Epoch 1 Step 9420 (Global: 9420): loss=0.0310, ppl=1.03, grad_norm=0.26, lr=2.76e-06, throughput=3331 tok/s +2025-11-17 00:01:07,755 - INFO - Epoch 1 Step 9430 (Global: 9430): loss=0.0317, ppl=1.03, grad_norm=0.26, lr=2.71e-06, throughput=3122 tok/s +2025-11-17 00:03:40,663 - INFO - Epoch 1 Step 9440 (Global: 9440): loss=0.0365, ppl=1.04, grad_norm=0.29, lr=2.66e-06, throughput=3139 tok/s +2025-11-17 00:06:12,569 - INFO - Epoch 1 Step 9450 (Global: 9450): loss=0.0359, ppl=1.04, grad_norm=0.28, lr=2.60e-06, throughput=3160 tok/s +2025-11-17 00:08:36,881 - INFO - Epoch 1 Step 9460 (Global: 9460): loss=0.0363, ppl=1.04, grad_norm=0.27, lr=2.55e-06, throughput=3326 tok/s +2025-11-17 00:11:09,517 - INFO - Epoch 1 Step 9470 (Global: 9470): loss=0.0298, ppl=1.03, grad_norm=0.25, lr=2.50e-06, 
throughput=3145 tok/s +2025-11-17 00:13:42,937 - INFO - Epoch 1 Step 9480 (Global: 9480): loss=0.0337, ppl=1.03, grad_norm=0.27, lr=2.44e-06, throughput=3129 tok/s +2025-11-17 00:16:09,178 - INFO - Epoch 1 Step 9490 (Global: 9490): loss=0.0405, ppl=1.04, grad_norm=0.29, lr=2.39e-06, throughput=3282 tok/s +2025-11-17 00:18:42,919 - INFO - Epoch 1 Step 9500 (Global: 9500): loss=0.0327, ppl=1.03, grad_norm=0.27, lr=2.34e-06, throughput=3122 tok/s +2025-11-17 00:18:42,920 - INFO - +Running validation at step 9500... +2025-11-17 00:26:14,057 - INFO - Validation loss: 0.0353, perplexity: 1.04 +2025-11-17 00:26:14,057 - INFO - Qualitative metrics (n=5): +2025-11-17 00:26:14,057 - INFO - BLEU: 0.8722 +2025-11-17 00:26:14,057 - INFO - METEOR: 0.9580 +2025-11-17 00:26:14,058 - INFO - Edit Distance: 0.0558 +2025-11-17 00:26:14,058 - INFO - F-measure: 0.9392 +2025-11-17 00:26:14,058 - INFO - +====================================================================== +2025-11-17 00:26:14,058 - INFO - Qualitative Evaluation Samples: +2025-11-17 00:26:14,058 - INFO - ====================================================================== +2025-11-17 00:26:14,058 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-17 00:26:14,058 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 00:26:14,058 - INFO - Generated: 'Q gave it four out of five stars and said that "The album [perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-17 00:26:14,059 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-17 00:26:14,059 - INFO - ---------------------------------------------------------------------- +2025-11-17 00:26:14,059 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-17 00:26:14,059 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 00:26:14,059 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-17 00:26:14,059 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-17 00:26:14,059 - INFO - ---------------------------------------------------------------------- +2025-11-17 00:26:14,059 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-17 00:26:14,059 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 00:26:14,059 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-17 00:26:14,059 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-17 00:26:14,060 - INFO - ---------------------------------------------------------------------- +2025-11-17 00:26:14,060 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-17 00:26:14,060 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 00:26:14,060 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-17 00:26:14,060 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-17 00:26:14,060 - INFO - ---------------------------------------------------------------------- +2025-11-17 00:26:14,060 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-17 00:26:14,060 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 00:26:14,060 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-17 00:26:14,060 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-17 00:26:14,061 - INFO - ---------------------------------------------------------------------- +2025-11-17 00:26:14,062 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_9500.jsonl +2025-11-17 00:27:02,615 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-17 00:27:02,624 - INFO - New best validation loss: 0.0353, perplexity: 1.04 +2025-11-17 00:29:37,760 - INFO - Epoch 1 Step 9510 (Global: 9510): loss=0.0334, ppl=1.03, grad_norm=0.27, lr=2.29e-06, throughput=3094 tok/s +2025-11-17 00:32:12,029 - INFO - Epoch 1 Step 9520 (Global: 9520): loss=0.0427, ppl=1.04, grad_norm=0.30, lr=2.24e-06, throughput=3112 tok/s +2025-11-17 00:34:36,124 - INFO - Epoch 1 Step 9530 (Global: 9530): loss=0.0373, ppl=1.04, grad_norm=0.29, lr=2.19e-06, throughput=3331 tok/s +2025-11-17 00:37:09,221 - INFO - Epoch 1 Step 9540 (Global: 9540): loss=0.0316, ppl=1.03, grad_norm=0.27, lr=2.14e-06, throughput=3135 tok/s +2025-11-17 00:39:42,939 - INFO - Epoch 1 Step 9550 (Global: 9550): loss=0.0320, ppl=1.03, grad_norm=0.27, lr=2.10e-06, 
throughput=3123 tok/s +2025-11-17 00:42:07,953 - INFO - Epoch 1 Step 9560 (Global: 9560): loss=0.0338, ppl=1.03, grad_norm=0.27, lr=2.05e-06, throughput=3310 tok/s +2025-11-17 00:44:41,908 - INFO - Epoch 1 Step 9570 (Global: 9570): loss=0.0381, ppl=1.04, grad_norm=0.30, lr=2.00e-06, throughput=3118 tok/s +2025-11-17 00:47:14,151 - INFO - Epoch 1 Step 9580 (Global: 9580): loss=0.0290, ppl=1.03, grad_norm=0.26, lr=1.95e-06, throughput=3153 tok/s +2025-11-17 00:49:46,936 - INFO - Epoch 1 Step 9590 (Global: 9590): loss=0.0325, ppl=1.03, grad_norm=0.27, lr=1.91e-06, throughput=3142 tok/s +2025-11-17 00:52:10,517 - INFO - Epoch 1 Step 9600 (Global: 9600): loss=0.0405, ppl=1.04, grad_norm=0.29, lr=1.86e-06, throughput=3343 tok/s +2025-11-17 00:54:42,778 - INFO - Epoch 1 Step 9610 (Global: 9610): loss=0.0282, ppl=1.03, grad_norm=0.24, lr=1.82e-06, throughput=3153 tok/s +2025-11-17 00:57:14,847 - INFO - Epoch 1 Step 9620 (Global: 9620): loss=0.0350, ppl=1.04, grad_norm=0.28, lr=1.77e-06, throughput=3157 tok/s +2025-11-17 00:59:38,267 - INFO - Epoch 1 Step 9630 (Global: 9630): loss=0.0307, ppl=1.03, grad_norm=0.25, lr=1.73e-06, throughput=3347 tok/s +2025-11-17 01:02:10,673 - INFO - Epoch 1 Step 9640 (Global: 9640): loss=0.0378, ppl=1.04, grad_norm=0.29, lr=1.68e-06, throughput=3150 tok/s +2025-11-17 01:04:43,780 - INFO - Epoch 1 Step 9650 (Global: 9650): loss=0.0421, ppl=1.04, grad_norm=0.29, lr=1.64e-06, throughput=3135 tok/s +2025-11-17 01:07:17,477 - INFO - Epoch 1 Step 9660 (Global: 9660): loss=0.0312, ppl=1.03, grad_norm=0.26, lr=1.60e-06, throughput=3123 tok/s +2025-11-17 01:09:41,055 - INFO - Epoch 1 Step 9670 (Global: 9670): loss=0.0364, ppl=1.04, grad_norm=0.36, lr=1.56e-06, throughput=3343 tok/s +2025-11-17 01:12:13,582 - INFO - Epoch 1 Step 9680 (Global: 9680): loss=0.0382, ppl=1.04, grad_norm=0.31, lr=1.52e-06, throughput=3147 tok/s +2025-11-17 01:14:47,350 - INFO - Epoch 1 Step 9690 (Global: 9690): loss=0.0342, ppl=1.03, grad_norm=0.27, lr=1.48e-06, 
throughput=3122 tok/s +2025-11-17 01:17:10,483 - INFO - Epoch 1 Step 9700 (Global: 9700): loss=0.0342, ppl=1.03, grad_norm=0.26, lr=1.44e-06, throughput=3354 tok/s +2025-11-17 01:19:42,566 - INFO - Epoch 1 Step 9710 (Global: 9710): loss=0.0343, ppl=1.03, grad_norm=0.27, lr=1.40e-06, throughput=3156 tok/s +2025-11-17 01:22:14,449 - INFO - Epoch 1 Step 9720 (Global: 9720): loss=0.0321, ppl=1.03, grad_norm=0.26, lr=1.36e-06, throughput=3160 tok/s +2025-11-17 01:24:46,077 - INFO - Epoch 1 Step 9730 (Global: 9730): loss=0.0372, ppl=1.04, grad_norm=0.28, lr=1.32e-06, throughput=3166 tok/s +2025-11-17 01:27:08,867 - INFO - Epoch 1 Step 9740 (Global: 9740): loss=0.0336, ppl=1.03, grad_norm=0.27, lr=1.28e-06, throughput=3362 tok/s +2025-11-17 01:29:41,214 - INFO - Epoch 1 Step 9750 (Global: 9750): loss=0.0350, ppl=1.04, grad_norm=0.27, lr=1.24e-06, throughput=3151 tok/s +2025-11-17 01:32:13,458 - INFO - Epoch 1 Step 9760 (Global: 9760): loss=0.0369, ppl=1.04, grad_norm=0.27, lr=1.21e-06, throughput=3153 tok/s +2025-11-17 01:34:36,278 - INFO - Epoch 1 Step 9770 (Global: 9770): loss=0.0316, ppl=1.03, grad_norm=0.26, lr=1.17e-06, throughput=3361 tok/s +2025-11-17 01:37:07,939 - INFO - Epoch 1 Step 9780 (Global: 9780): loss=0.0374, ppl=1.04, grad_norm=0.27, lr=1.13e-06, throughput=3165 tok/s +2025-11-17 01:39:40,201 - INFO - Epoch 1 Step 9790 (Global: 9790): loss=0.0345, ppl=1.04, grad_norm=0.27, lr=1.10e-06, throughput=3153 tok/s +2025-11-17 01:42:11,727 - INFO - Epoch 1 Step 9800 (Global: 9800): loss=0.0324, ppl=1.03, grad_norm=0.26, lr=1.06e-06, throughput=3168 tok/s +2025-11-17 01:44:34,917 - INFO - Epoch 1 Step 9810 (Global: 9810): loss=0.0366, ppl=1.04, grad_norm=0.29, lr=1.03e-06, throughput=3352 tok/s +2025-11-17 01:47:06,980 - INFO - Epoch 1 Step 9820 (Global: 9820): loss=0.0384, ppl=1.04, grad_norm=0.31, lr=9.97e-07, throughput=3157 tok/s +2025-11-17 01:49:38,858 - INFO - Epoch 1 Step 9830 (Global: 9830): loss=0.0357, ppl=1.04, grad_norm=0.28, lr=9.64e-07, 
throughput=3164 tok/s +2025-11-17 01:52:01,989 - INFO - Epoch 1 Step 9840 (Global: 9840): loss=0.0398, ppl=1.04, grad_norm=0.29, lr=9.32e-07, throughput=3354 tok/s +2025-11-17 01:54:33,793 - INFO - Epoch 1 Step 9850 (Global: 9850): loss=0.0368, ppl=1.04, grad_norm=0.29, lr=9.00e-07, throughput=3162 tok/s +2025-11-17 01:57:05,905 - INFO - Epoch 1 Step 9860 (Global: 9860): loss=0.0358, ppl=1.04, grad_norm=0.27, lr=8.68e-07, throughput=3156 tok/s +2025-11-17 01:59:38,001 - INFO - Epoch 1 Step 9870 (Global: 9870): loss=0.0342, ppl=1.03, grad_norm=0.26, lr=8.37e-07, throughput=3156 tok/s +2025-11-17 02:02:01,427 - INFO - Epoch 1 Step 9880 (Global: 9880): loss=0.0384, ppl=1.04, grad_norm=0.28, lr=8.07e-07, throughput=3347 tok/s +2025-11-17 02:04:33,898 - INFO - Epoch 1 Step 9890 (Global: 9890): loss=0.0339, ppl=1.03, grad_norm=0.27, lr=7.77e-07, throughput=3148 tok/s +2025-11-17 02:07:06,163 - INFO - Epoch 1 Step 9900 (Global: 9900): loss=0.0320, ppl=1.03, grad_norm=0.27, lr=7.48e-07, throughput=3152 tok/s +2025-11-17 02:09:29,348 - INFO - Epoch 1 Step 9910 (Global: 9910): loss=0.0341, ppl=1.03, grad_norm=0.27, lr=7.20e-07, throughput=3352 tok/s +2025-11-17 02:12:01,624 - INFO - Epoch 1 Step 9920 (Global: 9920): loss=0.0357, ppl=1.04, grad_norm=0.28, lr=6.92e-07, throughput=3152 tok/s +2025-11-17 02:14:34,610 - INFO - Epoch 1 Step 9930 (Global: 9930): loss=0.0368, ppl=1.04, grad_norm=0.28, lr=6.64e-07, throughput=3138 tok/s +2025-11-17 02:17:07,190 - INFO - Epoch 1 Step 9940 (Global: 9940): loss=0.0321, ppl=1.03, grad_norm=0.27, lr=6.37e-07, throughput=3146 tok/s +2025-11-17 02:19:29,928 - INFO - Epoch 1 Step 9950 (Global: 9950): loss=0.0323, ppl=1.03, grad_norm=0.27, lr=6.11e-07, throughput=3363 tok/s +2025-11-17 02:22:02,776 - INFO - Epoch 1 Step 9960 (Global: 9960): loss=0.0318, ppl=1.03, grad_norm=0.28, lr=5.85e-07, throughput=3140 tok/s +2025-11-17 02:24:35,463 - INFO - Epoch 1 Step 9970 (Global: 9970): loss=0.0349, ppl=1.04, grad_norm=0.29, lr=5.60e-07, 
throughput=3144 tok/s +2025-11-17 02:26:58,316 - INFO - Epoch 1 Step 9980 (Global: 9980): loss=0.0352, ppl=1.04, grad_norm=0.28, lr=5.35e-07, throughput=3360 tok/s +2025-11-17 02:29:29,984 - INFO - Epoch 1 Step 9990 (Global: 9990): loss=0.0452, ppl=1.05, grad_norm=0.30, lr=5.11e-07, throughput=3165 tok/s +2025-11-17 02:32:01,746 - INFO - Epoch 1 Step 10000 (Global: 10000): loss=0.0339, ppl=1.03, grad_norm=0.26, lr=4.87e-07, throughput=3163 tok/s +2025-11-17 02:32:01,748 - INFO - +Running validation at step 10000... +2025-11-17 02:39:26,566 - INFO - Validation loss: 0.0353, perplexity: 1.04 +2025-11-17 02:39:26,566 - INFO - Qualitative metrics (n=5): +2025-11-17 02:39:26,566 - INFO - BLEU: 0.8722 +2025-11-17 02:39:26,566 - INFO - METEOR: 0.9580 +2025-11-17 02:39:26,566 - INFO - Edit Distance: 0.0555 +2025-11-17 02:39:26,566 - INFO - F-measure: 0.9392 +2025-11-17 02:39:26,566 - INFO - +====================================================================== +2025-11-17 02:39:26,567 - INFO - Qualitative Evaluation Samples: +2025-11-17 02:39:26,567 - INFO - ====================================================================== +2025-11-17 02:39:26,567 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-17 02:39:26,567 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 02:39:26,567 - INFO - Generated: 'Q gave it four out of five stars and said that "the album [Perhaps\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-17 02:39:26,567 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 
+2025-11-17 02:39:26,567 - INFO - ---------------------------------------------------------------------- +2025-11-17 02:39:26,567 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-17 02:39:26,567 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 02:39:26,567 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-17 02:39:26,567 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-17 02:39:26,568 - INFO - ---------------------------------------------------------------------- +2025-11-17 02:39:26,568 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-17 02:39:26,568 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 02:39:26,568 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-17 02:39:26,568 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-17 02:39:26,568 - INFO - ---------------------------------------------------------------------- +2025-11-17 02:39:26,568 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-17 02:39:26,568 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 02:39:26,569 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. 
In its original incarnation, the code points U+0B01.....' +2025-11-17 02:39:26,569 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-17 02:39:26,569 - INFO - ---------------------------------------------------------------------- +2025-11-17 02:39:26,569 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-17 02:39:26,569 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 02:39:26,569 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-17 02:39:26,569 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' +2025-11-17 02:39:26,569 - INFO - ---------------------------------------------------------------------- +2025-11-17 02:39:26,570 - INFO - +Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_10000.jsonl +2025-11-17 02:40:14,335 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt +2025-11-17 02:40:14,345 - INFO - New best validation loss: 0.0353, perplexity: 1.04 +2025-11-17 02:42:47,039 - INFO - Epoch 1 Step 10010 (Global: 10010): loss=0.0356, ppl=1.04, grad_norm=0.29, lr=4.64e-07, throughput=3144 tok/s +2025-11-17 02:45:10,118 - INFO - Epoch 1 Step 10020 (Global: 10020): loss=0.0318, ppl=1.03, grad_norm=0.26, lr=4.42e-07, throughput=3355 tok/s +2025-11-17 02:47:42,302 - INFO - Epoch 1 Step 10030 (Global: 10030): loss=0.0393, ppl=1.04, grad_norm=0.29, lr=4.20e-07, throughput=3154 tok/s +2025-11-17 02:50:14,539 - INFO - Epoch 1 Step 10040 (Global: 10040): loss=0.0290, ppl=1.03, grad_norm=0.25, lr=3.98e-07, throughput=3153 tok/s +2025-11-17 02:52:38,320 - INFO - Epoch 1 Step 10050 (Global: 10050): loss=0.0346, ppl=1.04, grad_norm=0.29, 
lr=3.78e-07, throughput=3338 tok/s +2025-11-17 02:55:11,146 - INFO - Epoch 1 Step 10060 (Global: 10060): loss=0.0316, ppl=1.03, grad_norm=0.26, lr=3.57e-07, throughput=3141 tok/s +2025-11-17 02:57:43,718 - INFO - Epoch 1 Step 10070 (Global: 10070): loss=0.0381, ppl=1.04, grad_norm=0.30, lr=3.38e-07, throughput=3146 tok/s +2025-11-17 03:00:16,554 - INFO - Epoch 1 Step 10080 (Global: 10080): loss=0.0327, ppl=1.03, grad_norm=0.25, lr=3.18e-07, throughput=3141 tok/s +2025-11-17 03:02:39,341 - INFO - Epoch 1 Step 10090 (Global: 10090): loss=0.0382, ppl=1.04, grad_norm=0.31, lr=3.00e-07, throughput=3362 tok/s +2025-11-17 03:05:11,139 - INFO - Epoch 1 Step 10100 (Global: 10100): loss=0.0320, ppl=1.03, grad_norm=0.26, lr=2.82e-07, throughput=3162 tok/s +2025-11-17 03:07:43,146 - INFO - Epoch 1 Step 10110 (Global: 10110): loss=0.0340, ppl=1.03, grad_norm=0.27, lr=2.64e-07, throughput=3158 tok/s +2025-11-17 03:10:06,573 - INFO - Epoch 1 Step 10120 (Global: 10120): loss=0.0330, ppl=1.03, grad_norm=0.32, lr=2.47e-07, throughput=3347 tok/s +2025-11-17 03:12:38,427 - INFO - Epoch 1 Step 10130 (Global: 10130): loss=0.0351, ppl=1.04, grad_norm=0.31, lr=2.31e-07, throughput=3161 tok/s +2025-11-17 03:15:10,365 - INFO - Epoch 1 Step 10140 (Global: 10140): loss=0.0294, ppl=1.03, grad_norm=0.24, lr=2.15e-07, throughput=3159 tok/s +2025-11-17 03:17:42,393 - INFO - Epoch 1 Step 10150 (Global: 10150): loss=0.0331, ppl=1.03, grad_norm=0.28, lr=2.00e-07, throughput=3157 tok/s +2025-11-17 03:20:05,627 - INFO - Epoch 1 Step 10160 (Global: 10160): loss=0.0380, ppl=1.04, grad_norm=0.29, lr=1.85e-07, throughput=3351 tok/s +2025-11-17 03:22:37,858 - INFO - Epoch 1 Step 10170 (Global: 10170): loss=0.0357, ppl=1.04, grad_norm=0.28, lr=1.71e-07, throughput=3153 tok/s +2025-11-17 03:25:10,092 - INFO - Epoch 1 Step 10180 (Global: 10180): loss=0.0372, ppl=1.04, grad_norm=0.28, lr=1.58e-07, throughput=3153 tok/s +2025-11-17 03:27:33,417 - INFO - Epoch 1 Step 10190 (Global: 10190): loss=0.0347, ppl=1.04, 
grad_norm=0.27, lr=1.45e-07, throughput=3349 tok/s +2025-11-17 03:30:05,681 - INFO - Epoch 1 Step 10200 (Global: 10200): loss=0.0363, ppl=1.04, grad_norm=0.27, lr=1.32e-07, throughput=3152 tok/s +2025-11-17 03:32:38,283 - INFO - Epoch 1 Step 10210 (Global: 10210): loss=0.0336, ppl=1.03, grad_norm=0.28, lr=1.20e-07, throughput=3145 tok/s +2025-11-17 03:35:11,182 - INFO - Epoch 1 Step 10220 (Global: 10220): loss=0.0317, ppl=1.03, grad_norm=0.27, lr=1.09e-07, throughput=3139 tok/s +2025-11-17 03:37:34,692 - INFO - Epoch 1 Step 10230 (Global: 10230): loss=0.0359, ppl=1.04, grad_norm=0.29, lr=9.81e-08, throughput=3345 tok/s +2025-11-17 03:40:06,480 - INFO - Epoch 1 Step 10240 (Global: 10240): loss=0.0361, ppl=1.04, grad_norm=0.27, lr=8.79e-08, throughput=3162 tok/s +2025-11-17 03:42:38,660 - INFO - Epoch 1 Step 10250 (Global: 10250): loss=0.0273, ppl=1.03, grad_norm=0.24, lr=7.83e-08, throughput=3154 tok/s +2025-11-17 03:45:02,311 - INFO - Epoch 1 Step 10260 (Global: 10260): loss=0.0380, ppl=1.04, grad_norm=0.29, lr=6.92e-08, throughput=3341 tok/s +2025-11-17 03:47:34,472 - INFO - Epoch 1 Step 10270 (Global: 10270): loss=0.0339, ppl=1.03, grad_norm=0.27, lr=6.06e-08, throughput=3155 tok/s +2025-11-17 03:50:06,858 - INFO - Epoch 1 Step 10280 (Global: 10280): loss=0.0352, ppl=1.04, grad_norm=0.28, lr=5.27e-08, throughput=3150 tok/s +2025-11-17 03:52:39,357 - INFO - Epoch 1 Step 10290 (Global: 10290): loss=0.0410, ppl=1.04, grad_norm=0.30, lr=4.53e-08, throughput=3148 tok/s +2025-11-17 03:55:02,656 - INFO - Epoch 1 Step 10300 (Global: 10300): loss=0.0345, ppl=1.04, grad_norm=0.28, lr=3.84e-08, throughput=3350 tok/s +2025-11-17 03:57:34,968 - INFO - Epoch 1 Step 10310 (Global: 10310): loss=0.0275, ppl=1.03, grad_norm=0.26, lr=3.21e-08, throughput=3151 tok/s +2025-11-17 04:00:07,397 - INFO - Epoch 1 Step 10320 (Global: 10320): loss=0.0302, ppl=1.03, grad_norm=0.25, lr=2.64e-08, throughput=3149 tok/s +2025-11-17 04:02:30,653 - INFO - Epoch 1 Step 10330 (Global: 10330): 
loss=0.0288, ppl=1.03, grad_norm=0.25, lr=2.12e-08, throughput=3351 tok/s +2025-11-17 04:05:03,807 - INFO - Epoch 1 Step 10340 (Global: 10340): loss=0.0382, ppl=1.04, grad_norm=0.29, lr=1.66e-08, throughput=3134 tok/s +2025-11-17 04:07:35,762 - INFO - Epoch 1 Step 10350 (Global: 10350): loss=0.0361, ppl=1.04, grad_norm=0.28, lr=1.26e-08, throughput=3159 tok/s +2025-11-17 04:10:08,245 - INFO - Epoch 1 Step 10360 (Global: 10360): loss=0.0355, ppl=1.04, grad_norm=0.29, lr=9.12e-09, throughput=3148 tok/s +2025-11-17 04:12:32,714 - INFO - Epoch 1 Step 10370 (Global: 10370): loss=0.0354, ppl=1.04, grad_norm=0.27, lr=6.20e-09, throughput=3323 tok/s +2025-11-17 04:15:05,550 - INFO - Epoch 1 Step 10380 (Global: 10380): loss=0.0398, ppl=1.04, grad_norm=0.29, lr=3.84e-09, throughput=3141 tok/s +2025-11-17 04:17:37,475 - INFO - Epoch 1 Step 10390 (Global: 10390): loss=0.0272, ppl=1.03, grad_norm=0.28, lr=2.05e-09, throughput=3159 tok/s +2025-11-17 04:20:00,549 - INFO - Epoch 1 Step 10400 (Global: 10400): loss=0.0303, ppl=1.03, grad_norm=0.27, lr=8.11e-10, throughput=3355 tok/s +2025-11-17 04:22:32,383 - INFO - Epoch 1 Step 10410 (Global: 10410): loss=0.0314, ppl=1.03, grad_norm=0.26, lr=1.38e-10, throughput=3161 tok/s +2025-11-17 04:24:17,507 - INFO - Flushing 8 remainder batches from gradient accumulation +2025-11-17 04:24:17,511 - INFO - Rescaling gradients by 1.50x (compensating for 8/12 batches) +2025-11-17 04:24:17,709 - INFO - Remainder batch: loss=0.0319, ppl=1.03, grad_norm=0.31 +2025-11-17 04:24:17,718 - INFO - Epoch 1 training: loss=0.0815, ppl=1.08, grad_norm=0.41, throughput=2724 tok/s (183558.5s total) +2025-11-17 04:24:17,721 - INFO - +Running final validation... 
+2025-11-17 04:31:38,302 - INFO - Validation loss: 0.0353, perplexity: 1.04 +2025-11-17 04:31:38,302 - INFO - Qualitative metrics (n=5): +2025-11-17 04:31:38,302 - INFO - BLEU: 0.8722 +2025-11-17 04:31:38,303 - INFO - METEOR: 0.9580 +2025-11-17 04:31:38,303 - INFO - Edit Distance: 0.0558 +2025-11-17 04:31:38,303 - INFO - F-measure: 0.9392 +2025-11-17 04:31:38,303 - INFO - +====================================================================== +2025-11-17 04:31:38,303 - INFO - Qualitative Evaluation Samples: +2025-11-17 04:31:38,303 - INFO - ====================================================================== +2025-11-17 04:31:38,303 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-17 04:31:38,303 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 04:31:38,303 - INFO - Generated: 'Q gave it four out of five stars and said that "The album [perhaps]\'s seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-17 04:31:38,303 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-17 04:31:38,303 - INFO - ---------------------------------------------------------------------- +2025-11-17 04:31:38,304 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-17 04:31:38,304 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 04:31:38,304 - INFO - Generated: ', Sire was Abou-Chneakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-17 04:31:38,304 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. 
Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-17 04:31:38,304 - INFO - ---------------------------------------------------------------------- +2025-11-17 04:31:38,304 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-17 04:31:38,304 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 04:31:38,304 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and d...' +2025-11-17 04:31:38,304 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-17 04:31:38,304 - INFO - ---------------------------------------------------------------------- +2025-11-17 04:31:38,304 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-17 04:31:38,305 - INFO - Context: [Mean pooled from 1000 tokens] +2025-11-17 04:31:38,305 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-17 04:31:38,305 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
+2025-11-17 04:31:38,305 - INFO - ----------------------------------------------------------------------
+2025-11-17 04:31:38,305 - INFO - 
+Sample 5 (ID: sample_103176_chunk_4):
+2025-11-17 04:31:38,305 - INFO - Context: [Mean pooled from 1000 tokens]
+2025-11-17 04:31:38,305 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
+2025-11-17 04:31:38,305 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
+2025-11-17 04:31:38,306 - INFO - ----------------------------------------------------------------------
+2025-11-17 04:31:38,307 - INFO - 
+Qualitative samples saved to: outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/qualitative_step_10417.jsonl
+2025-11-17 04:32:22,821 - INFO - Saved checkpoint to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352/best_checkpoint.pt
+2025-11-17 04:32:22,837 - INFO - New best validation loss: 0.0353, perplexity: 1.04
+2025-11-17 04:32:22,839 - INFO - 
+Training complete!
+2025-11-17 04:32:22,840 - INFO - Final checkpoint is best, created symlink to save space (~2GB saved)
+2025-11-17 04:32:22,840 - INFO - Best validation loss: 0.0353, perplexity: 1.04
+2025-11-17 04:32:22,840 - INFO - Checkpoints saved to outputs/production_meanpool_w4_s4_reconstruction_20251115_011352
+2025-11-17 04:32:23,523 - INFO - W&B run finished