diff --git "a/conv1d_t250_h0_lm_recon-init/train.log" "b/conv1d_t250_h0_lm_recon-init/train.log" new file mode 100644--- /dev/null +++ "b/conv1d_t250_h0_lm_recon-init/train.log" @@ -0,0 +1,2181 @@ +2025-11-26 23:34:41,567 - INFO - Starting training with args: Namespace(regime='conv1d_residual', data_path='data/training/splits_510k/train_arrow', output_dir='outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434', objective='lm', val_data_path='data/training/splits_510k/val_arrow', max_samples=None, vision_mode='small', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt=None, train_encoder=False, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=250, conv_kernel=5, timestamp=None, batch_size=12, gradient_accumulation_steps=4, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=500, initial_validation=True, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint=None, resume=None, init_from_checkpoint='./outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt', allow_objective_switch=True, aux_loss_weight=0.5, num_workers=8, prefetch_factor=32, seed=None, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=False, compile_mode='default', use_optimized_model=True, use_encoder_checkpointing=True, use_decoder_checkpointing=True, use_8bit_optimizer=True) +2025-11-26 23:34:41,567 - INFO - Will initialize model from checkpoint: ./outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt +2025-11-26 23:34:41,567 - INFO - Auto-generated W&B run name: production_conv1d_residual_t250_k5_lm_20251126_233441 +2025-11-26 23:34:42,767 - INFO - Initialized W&B run: vision-compression-2/production_conv1d_residual_t250_k5_lm_20251126_233441 (ID: o01q8g0m) +2025-11-26 23:34:42,767 - INFO - Loading model and tokenizer... +2025-11-26 23:34:52,159 - INFO - Enabling decoder gradient checkpointing... +2025-11-26 23:34:52,165 - INFO - ✓ Decoder checkpointing enabled for 12 transformer layers +2025-11-26 23:34:52,165 - INFO - Expected: ~30-50% activation memory reduction, ~15-20% compute overhead +2025-11-26 23:34:52,250 - INFO - Created Conv1D Residual Pyramid Compression trainer +2025-11-26 23:34:52,250 - INFO - Architecture: Residual blocks with skip connections +2025-11-26 23:34:52,250 - INFO - Kernel size: 5 +2025-11-26 23:34:52,251 - INFO - Compression: 1000 → 251 tokens (4.00x) +2025-11-26 23:34:52,251 - INFO - Training objective: lm +2025-11-26 23:34:52,251 - INFO - +================================================================================ +2025-11-26 23:34:52,251 - INFO - TWO-STAGE TRAINING: Loading Stage 1 checkpoint +2025-11-26 23:34:52,251 - INFO - ================================================================================ +2025-11-26 23:34:52,251 - INFO - Peeking checkpoint metadata from outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt +2025-11-26 23:35:03,054 - WARNING - Checkpoint best_checkpoint.pt has no format_version field. Assuming compatibility with current version 2.0. This checkpoint was created before versioning was added. 
+2025-11-26 23:35:03,054 - INFO - Checkpoint metadata: epoch=0, batch_idx=107999, global_step=9000 +2025-11-26 23:35:03,054 - INFO - W&B run ID: 52qk5aob +2025-11-26 23:35:03,229 - INFO - ✓ Objective switch: reconstruction → lm (two-stage training) +2025-11-26 23:35:03,231 - INFO - Loading model weights for two-stage training from outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt +2025-11-26 23:35:12,335 - WARNING - Checkpoint best_checkpoint.pt has no format_version field. Assuming compatibility with current version 2.0. This checkpoint was created before versioning was added. +2025-11-26 23:35:12,351 - INFO - torch.compile mismatch: checkpoint=compiled, model=uncompiled. Normalizing keys by removing _orig_mod. prefix. +2025-11-26 23:35:12,439 - WARNING - Checkpoint architecture mismatch detected - loading with strict=False +2025-11-26 23:35:12,521 - INFO - Skipped 477 vision encoder parameters (model loaded without encoder) +2025-11-26 23:35:12,521 - INFO - ✓ Skipping optimizer/scheduler/RNG states (two-stage training) +2025-11-26 23:35:12,538 - INFO - +Stage 1 → Stage 2 Transition: +2025-11-26 23:35:12,539 - INFO - Stage 1 checkpoint: ./outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt +2025-11-26 23:35:12,539 - INFO - Stage 1 regime: conv1d_residual +2025-11-26 23:35:12,539 - INFO - Stage 1 objective: reconstruction +2025-11-26 23:35:12,540 - INFO - Stage 1 epoch: 0 +2025-11-26 23:35:12,540 - INFO - Stage 1 best_val_loss: 0.0005374805224200827 +2025-11-26 23:35:12,540 - INFO - Stage 1 W&B run: 52qk5aob +2025-11-26 23:35:12,540 - INFO - + Stage 2 regime: conv1d_residual ✓ MATCH +2025-11-26 23:35:12,540 - INFO - Stage 2 objective: lm (CHANGED from reconstruction) +2025-11-26 23:35:12,540 - INFO - +✓ Successfully loaded model weights from Stage 1 +2025-11-26 23:35:12,540 - INFO - ✓ Fresh optimizer will be created for Stage 2 +2025-11-26 23:35:12,540 - INFO - ✓ New W&B run will track Stage 2 +2025-11-26 23:35:12,540 - INFO - ================================================================================ + +2025-11-26 23:35:12,568 - INFO - Logged parameter counts to W&B: total=2,960,960,000, trainable=2,960,960,000, encoder=26,225,920, decoder=2,934,734,080 +2025-11-26 23:35:12,568 - INFO - Logged Stage 1 metadata to W&B config for tracking +2025-11-26 23:35:12,568 - INFO - Loading training data from data/training/splits_510k/train_arrow +2025-11-26 23:35:12,568 - INFO - Detected Arrow format: data/training/splits_510k/train_arrow +2025-11-26 23:35:12,569 - INFO - Loading Arrow dataset from data/training/splits_510k/train_arrow (memory-mapped) +2025-11-26 23:35:12,616 - INFO - Loaded 500,000 samples from data/training/splits_510k/train_arrow (memory-mapped) +2025-11-26 23:35:12,616 - INFO - Conv1d_residual regime: using full 1000-token context +2025-11-26 23:35:12,647 - INFO - Loading validation data from data/training/splits_510k/val_arrow +2025-11-26 23:35:12,647 - INFO - Detected Arrow format: data/training/splits_510k/val_arrow +2025-11-26 23:35:12,647 - INFO - Loading Arrow dataset from data/training/splits_510k/val_arrow (memory-mapped) +2025-11-26 23:35:12,653 - INFO - Loaded 10,000 samples from data/training/splits_510k/val_arrow (memory-mapped) +2025-11-26 23:35:12,654 - INFO - Validation conv1d_residual regime: using full 1000-token context +2025-11-26 23:35:14,858 - INFO - Created 8-bit AdamW optimizer (bitsandbytes): + Learning rate: 0.0001 + Memory savings: ~75% optimizer state (16.8GB for 
2.8B params) + Expected overhead: ~2-5% +2025-11-26 23:35:14,858 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 +2025-11-26 23:35:14,865 - INFO - Logged optimizer config to W&B: type=adamw_8bit, memory=5.52GB +2025-11-26 23:35:14,865 - INFO - Starting training loop... +2025-11-26 23:35:14,866 - INFO - +====================================================================== +2025-11-26 23:35:14,866 - INFO - Running initial validation (before any training)... +2025-11-26 23:35:14,866 - INFO - ====================================================================== +2025-11-26 23:39:53,853 - INFO - Validation loss: 8.4611, perplexity: 4727.13 +2025-11-26 23:39:53,854 - INFO - Qualitative metrics (n=5): +2025-11-26 23:39:53,854 - INFO - BLEU: 0.0640 +2025-11-26 23:39:53,854 - INFO - METEOR: 0.2188 +2025-11-26 23:39:53,854 - INFO - Edit Distance: 0.6955 +2025-11-26 23:39:53,854 - INFO - F-measure: 0.2088 +2025-11-26 23:39:53,854 - INFO - +====================================================================== +2025-11-26 23:39:53,854 - INFO - Qualitative Evaluation Samples: +2025-11-26 23:39:53,854 - INFO - ====================================================================== +2025-11-26 23:39:53,854 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-26 23:39:53,854 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-26 23:39:53,855 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' +2025-11-26 23:39:53,855 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-26 23:39:53,855 - INFO - ---------------------------------------------------------------------- +2025-11-26 23:39:53,855 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-26 23:39:53,855 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-26 23:39:53,855 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' +2025-11-26 23:39:53,855 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-26 23:39:53,855 - INFO - ---------------------------------------------------------------------- +2025-11-26 23:39:53,855 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-26 23:39:53,855 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-26 23:39:53,855 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' +2025-11-26 23:39:53,855 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." 
+2025-11-26 23:39:53,856 - INFO - ---------------------------------------------------------------------- +2025-11-26 23:39:53,856 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-26 23:39:53,856 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-26 23:39:53,856 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' +2025-11-26 23:39:53,856 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam          ...' +2025-11-26 23:39:53,856 - INFO - ---------------------------------------------------------------------- +2025-11-26 23:39:53,856 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-26 23:39:53,856 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-26 23:39:53,856 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores          ...' +2025-11-26 23:39:53,856 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12          ...' +2025-11-26 23:39:53,856 - INFO - ---------------------------------------------------------------------- +2025-11-26 23:39:53,857 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_0.jsonl +2025-11-26 23:39:54,637 - INFO - Initial validation - Loss: 8.4611, Perplexity: 4727.13 +2025-11-26 23:39:54,637 - INFO - ====================================================================== + +2025-11-26 23:39:56,250 - INFO - Cleared GPU memory cache after initial validation +2025-11-26 23:39:56,251 - INFO - +====================================================================== +2025-11-26 23:39:56,252 - INFO - Epoch 1/1 +2025-11-26 23:39:56,252 - INFO - ====================================================================== +2025-11-26 23:39:58,832 - WARNING - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
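Several of the numbers reported above (scheduler warmup/total steps, the perplexities, the 8-bit optimizer memory) and the per-sample compression ratio logged just below are simple derived quantities. A minimal sketch of the presumed arithmetic, with every constant taken from the Namespace and log lines in this file:

```python
import math

# Presumed reconstruction of the derived quantities reported in this log;
# all constants come from the run configuration and earlier log lines.
num_samples  = 500_000          # training samples
batch_size   = 12
grad_accum   = 4
warmup_ratio = 0.1
num_params   = 2_960_960_000    # total trainable parameters

# One optimizer step per (batch_size * grad_accum) samples.
total_steps  = math.ceil(num_samples / (batch_size * grad_accum))  # 10417
warmup_steps = int(warmup_ratio * total_steps)                     # 1041

# Perplexity is exp(mean loss): 8.4611 -> ~4727, 1.7498 -> ~5.75.
initial_ppl = math.exp(8.4611)

# 8-bit Adam keeps two 1-byte state tensors per parameter (~5.5 GiB here),
# versus ~8 bytes/param for fp32 Adam states -- roughly the 75% savings noted.
adam8bit_gib  = num_params * 2 / 2**30   # ~5.52
fp32_adam_gib = num_params * 8 / 2**30   # ~22.1

# Per-sample compression reported at the start of the epoch:
# 1000 source tokens -> 252 effective context tokens.
compression_ratio = 1000 / 252           # ~3.97

print(total_steps, warmup_steps, round(initial_ppl, 2),
      round(adam8bit_gib, 2), round(fp32_adam_gib, 2),
      round(compression_ratio, 2))
```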
+2025-11-26 23:39:59,786 - INFO - Effective context tokens (per-sample): 252 | Compression ratio: 3.97x +2025-11-26 23:39:59,787 - INFO - Target tokens per sample: 1000 +2025-11-26 23:41:26,164 - INFO - Epoch 1 Step 10 (Global: 10): loss=3.2581, ppl=26.00, grad_norm=13.25, lr=1.09e-05, throughput=5339 tok/s +2025-11-26 23:42:52,629 - INFO - Epoch 1 Step 20 (Global: 20): loss=2.2049, ppl=9.07, grad_norm=4.16, lr=1.17e-05, throughput=5551 tok/s +2025-11-26 23:44:18,512 - INFO - Epoch 1 Step 30 (Global: 30): loss=2.0476, ppl=7.75, grad_norm=2.14, lr=1.26e-05, throughput=5589 tok/s +2025-11-26 23:45:44,017 - INFO - Epoch 1 Step 40 (Global: 40): loss=2.0019, ppl=7.40, grad_norm=1.35, lr=1.35e-05, throughput=5614 tok/s +2025-11-26 23:47:09,895 - INFO - Epoch 1 Step 50 (Global: 50): loss=2.0969, ppl=8.14, grad_norm=1.31, lr=1.43e-05, throughput=5589 tok/s +2025-11-26 23:48:35,501 - INFO - Epoch 1 Step 60 (Global: 60): loss=1.7567, ppl=5.79, grad_norm=1.19, lr=1.52e-05, throughput=5607 tok/s +2025-11-26 23:50:01,011 - INFO - Epoch 1 Step 70 (Global: 70): loss=1.8842, ppl=6.58, grad_norm=1.28, lr=1.61e-05, throughput=5613 tok/s +2025-11-26 23:51:26,535 - INFO - Epoch 1 Step 80 (Global: 80): loss=1.7988, ppl=6.04, grad_norm=1.15, lr=1.69e-05, throughput=5613 tok/s +2025-11-26 23:52:52,442 - INFO - Epoch 1 Step 90 (Global: 90): loss=1.9452, ppl=7.00, grad_norm=1.13, lr=1.78e-05, throughput=5587 tok/s +2025-11-26 23:54:17,806 - INFO - Epoch 1 Step 100 (Global: 100): loss=1.9600, ppl=7.10, grad_norm=1.25, lr=1.86e-05, throughput=5623 tok/s +2025-11-26 23:55:43,343 - INFO - Epoch 1 Step 110 (Global: 110): loss=1.7015, ppl=5.48, grad_norm=1.21, lr=1.95e-05, throughput=5612 tok/s +2025-11-26 23:57:08,771 - INFO - Epoch 1 Step 120 (Global: 120): loss=1.8134, ppl=6.13, grad_norm=1.13, lr=2.04e-05, throughput=5619 tok/s +2025-11-26 23:58:33,966 - INFO - Epoch 1 Step 130 (Global: 130): loss=1.8466, ppl=6.34, grad_norm=1.21, lr=2.12e-05, throughput=5634 tok/s +2025-11-26 23:59:59,380 - INFO - Epoch 1 Step 140 (Global: 140): loss=1.7032, ppl=5.49, grad_norm=1.15, lr=2.21e-05, throughput=5620 tok/s +2025-11-27 00:01:25,269 - INFO - Epoch 1 Step 150 (Global: 150): loss=1.8833, ppl=6.58, grad_norm=1.16, lr=2.30e-05, throughput=5589 tok/s +2025-11-27 00:02:50,771 - INFO - Epoch 1 Step 160 (Global: 160): loss=1.6905, ppl=5.42, grad_norm=1.15, lr=2.38e-05, throughput=5614 tok/s +2025-11-27 00:04:16,426 - INFO - Epoch 1 Step 170 (Global: 170): loss=1.9039, ppl=6.71, grad_norm=1.19, lr=2.47e-05, throughput=5604 tok/s +2025-11-27 00:05:42,095 - INFO - Epoch 1 Step 180 (Global: 180): loss=1.7459, ppl=5.73, grad_norm=1.20, lr=2.56e-05, throughput=5603 tok/s +2025-11-27 00:07:07,502 - INFO - Epoch 1 Step 190 (Global: 190): loss=1.7477, ppl=5.74, grad_norm=1.20, lr=2.64e-05, throughput=5620 tok/s +2025-11-27 00:08:33,103 - INFO - Epoch 1 Step 200 (Global: 200): loss=1.9248, ppl=6.85, grad_norm=1.18, lr=2.73e-05, throughput=5607 tok/s +2025-11-27 00:09:58,350 - INFO - Epoch 1 Step 210 (Global: 210): loss=1.5613, ppl=4.77, grad_norm=1.14, lr=2.82e-05, throughput=5631 tok/s +2025-11-27 00:11:24,159 - INFO - Epoch 1 Step 220 (Global: 220): loss=1.6401, ppl=5.16, grad_norm=1.13, lr=2.90e-05, throughput=5594 tok/s +2025-11-27 00:12:49,873 - INFO - Epoch 1 Step 230 (Global: 230): loss=1.6986, ppl=5.47, grad_norm=1.29, lr=2.99e-05, throughput=5600 tok/s +2025-11-27 00:14:15,684 - INFO - Epoch 1 Step 240 (Global: 240): loss=1.6540, ppl=5.23, grad_norm=1.20, lr=3.07e-05, throughput=5594 tok/s +2025-11-27 00:15:41,496 - INFO - Epoch 1 
Step 250 (Global: 250): loss=1.6978, ppl=5.46, grad_norm=1.26, lr=3.16e-05, throughput=5594 tok/s +2025-11-27 00:17:07,135 - INFO - Epoch 1 Step 260 (Global: 260): loss=1.8512, ppl=6.37, grad_norm=1.15, lr=3.25e-05, throughput=5605 tok/s +2025-11-27 00:18:33,216 - INFO - Epoch 1 Step 270 (Global: 270): loss=1.6762, ppl=5.35, grad_norm=1.16, lr=3.33e-05, throughput=5576 tok/s +2025-11-27 00:19:58,999 - INFO - Epoch 1 Step 280 (Global: 280): loss=1.7982, ppl=6.04, grad_norm=1.26, lr=3.42e-05, throughput=5596 tok/s +2025-11-27 00:21:24,680 - INFO - Epoch 1 Step 290 (Global: 290): loss=1.9261, ppl=6.86, grad_norm=1.27, lr=3.51e-05, throughput=5602 tok/s +2025-11-27 00:22:50,772 - INFO - Epoch 1 Step 300 (Global: 300): loss=1.5297, ppl=4.62, grad_norm=1.34, lr=3.59e-05, throughput=5576 tok/s +2025-11-27 00:24:16,844 - INFO - Epoch 1 Step 310 (Global: 310): loss=1.9226, ppl=6.84, grad_norm=1.21, lr=3.68e-05, throughput=5577 tok/s +2025-11-27 00:25:42,164 - INFO - Epoch 1 Step 320 (Global: 320): loss=1.5975, ppl=4.94, grad_norm=1.29, lr=3.77e-05, throughput=5626 tok/s +2025-11-27 00:27:08,103 - INFO - Epoch 1 Step 330 (Global: 330): loss=1.8526, ppl=6.38, grad_norm=1.21, lr=3.85e-05, throughput=5585 tok/s +2025-11-27 00:28:33,464 - INFO - Epoch 1 Step 340 (Global: 340): loss=1.8262, ppl=6.21, grad_norm=1.20, lr=3.94e-05, throughput=5623 tok/s +2025-11-27 00:29:59,139 - INFO - Epoch 1 Step 350 (Global: 350): loss=1.9168, ppl=6.80, grad_norm=1.23, lr=4.03e-05, throughput=5603 tok/s +2025-11-27 00:31:24,704 - INFO - Epoch 1 Step 360 (Global: 360): loss=1.6487, ppl=5.20, grad_norm=1.12, lr=4.11e-05, throughput=5610 tok/s +2025-11-27 00:32:50,345 - INFO - Epoch 1 Step 370 (Global: 370): loss=1.6149, ppl=5.03, grad_norm=1.48, lr=4.20e-05, throughput=5605 tok/s +2025-11-27 00:34:15,698 - INFO - Epoch 1 Step 380 (Global: 380): loss=1.7152, ppl=5.56, grad_norm=1.16, lr=4.29e-05, throughput=5624 tok/s +2025-11-27 00:35:41,379 - INFO - Epoch 1 Step 390 (Global: 390): loss=1.8090, ppl=6.10, grad_norm=1.27, lr=4.37e-05, throughput=5602 tok/s +2025-11-27 00:37:06,875 - INFO - Epoch 1 Step 400 (Global: 400): loss=1.7375, ppl=5.68, grad_norm=1.31, lr=4.46e-05, throughput=5614 tok/s +2025-11-27 00:38:32,714 - INFO - Epoch 1 Step 410 (Global: 410): loss=1.8413, ppl=6.30, grad_norm=1.12, lr=4.54e-05, throughput=5592 tok/s +2025-11-27 00:39:58,057 - INFO - Epoch 1 Step 420 (Global: 420): loss=1.6614, ppl=5.27, grad_norm=1.20, lr=4.63e-05, throughput=5624 tok/s +2025-11-27 00:41:23,482 - INFO - Epoch 1 Step 430 (Global: 430): loss=1.7508, ppl=5.76, grad_norm=1.27, lr=4.72e-05, throughput=5619 tok/s +2025-11-27 00:42:48,912 - INFO - Epoch 1 Step 440 (Global: 440): loss=1.6524, ppl=5.22, grad_norm=1.18, lr=4.80e-05, throughput=5619 tok/s +2025-11-27 00:44:14,297 - INFO - Epoch 1 Step 450 (Global: 450): loss=1.8130, ppl=6.13, grad_norm=1.16, lr=4.89e-05, throughput=5622 tok/s +2025-11-27 00:45:39,918 - INFO - Epoch 1 Step 460 (Global: 460): loss=1.8107, ppl=6.11, grad_norm=1.22, lr=4.98e-05, throughput=5606 tok/s +2025-11-27 00:47:05,448 - INFO - Epoch 1 Step 470 (Global: 470): loss=1.7022, ppl=5.49, grad_norm=1.40, lr=5.06e-05, throughput=5612 tok/s +2025-11-27 00:48:30,844 - INFO - Epoch 1 Step 480 (Global: 480): loss=1.9155, ppl=6.79, grad_norm=1.21, lr=5.15e-05, throughput=5621 tok/s +2025-11-27 00:49:56,001 - INFO - Epoch 1 Step 490 (Global: 490): loss=1.7865, ppl=5.97, grad_norm=1.08, lr=5.24e-05, throughput=5637 tok/s +2025-11-27 00:51:21,738 - INFO - Epoch 1 Step 500 (Global: 500): loss=1.9165, ppl=6.80, 
grad_norm=1.17, lr=5.32e-05, throughput=5599 tok/s +2025-11-27 00:51:21,738 - INFO - +Running validation at step 500... +2025-11-27 00:55:58,304 - INFO - Validation loss: 1.7498, perplexity: 5.75 +2025-11-27 00:55:58,304 - INFO - Qualitative metrics (n=5): +2025-11-27 00:55:58,304 - INFO - BLEU: 0.1239 +2025-11-27 00:55:58,304 - INFO - METEOR: 0.1638 +2025-11-27 00:55:58,305 - INFO - Edit Distance: 0.6338 +2025-11-27 00:55:58,305 - INFO - F-measure: 0.1826 +2025-11-27 00:55:58,305 - INFO - +====================================================================== +2025-11-27 00:55:58,305 - INFO - Qualitative Evaluation Samples: +2025-11-27 00:55:58,305 - INFO - ====================================================================== +2025-11-27 00:55:58,305 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 00:55:58,305 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 00:55:58,305 - INFO - Generated: '\'s "The End" as a "sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet, sweet,...' +2025-11-27 00:55:58,305 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 00:55:58,306 - INFO - ---------------------------------------------------------------------- +2025-11-27 00:55:58,306 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 00:55:58,306 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 00:55:58,306 - INFO - Generated: " of the Order's members. The Order's members are not required to be members of the Order, but they are encouraged to be active in the Order's activities. The Order's members are not required to be mem..." +2025-11-27 00:55:58,306 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-27 00:55:58,306 - INFO - ---------------------------------------------------------------------- +2025-11-27 00:55:58,306 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 00:55:58,306 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 00:55:58,306 - INFO - Generated: " the Red Tails. They are joined by the Red Tails' leader, the Red Tails' leader, the Red Tails' leader, the Red Tails' leader, the Red Tails' leader, the Red Tails' leader, the Red Tails' leader, the ..." +2025-11-27 00:55:58,306 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-27 00:55:58,306 - INFO - ---------------------------------------------------------------------- +2025-11-27 00:55:58,307 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 00:55:58,307 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 00:55:58,307 - INFO - Generated: ' | U+0B01..U+0B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | 0B01..B0F | ...' 
+2025-11-27 00:55:58,307 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 00:55:58,307 - INFO - ---------------------------------------------------------------------- +2025-11-27 00:55:58,307 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 00:55:58,307 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 00:55:58,308 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 151 ] |\n| Madden NFL 12 ...' +2025-11-27 00:55:58,308 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 00:55:58,308 - INFO - ---------------------------------------------------------------------- +2025-11-27 00:55:58,308 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_500.jsonl +2025-11-27 00:56:22,122 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-27 00:56:22,126 - INFO - New best validation loss: 1.7498, perplexity: 5.75 +2025-11-27 00:57:48,706 - INFO - Epoch 1 Step 510 (Global: 510): loss=1.7486, ppl=5.75, grad_norm=1.27, lr=5.41e-05, throughput=5545 tok/s +2025-11-27 00:59:14,634 - INFO - Epoch 1 Step 520 (Global: 520): loss=1.8224, ppl=6.19, grad_norm=1.22, lr=5.50e-05, throughput=5586 tok/s +2025-11-27 01:00:40,510 - INFO - Epoch 1 Step 530 (Global: 530): loss=1.8280, ppl=6.22, grad_norm=1.27, lr=5.58e-05, throughput=5590 tok/s +2025-11-27 01:02:06,419 - INFO - Epoch 1 Step 540 (Global: 540): loss=1.8345, ppl=6.26, grad_norm=1.21, lr=5.67e-05, throughput=5587 tok/s +2025-11-27 01:03:32,243 - INFO - Epoch 1 Step 550 (Global: 550): loss=1.6076, ppl=4.99, grad_norm=1.13, lr=5.76e-05, throughput=5593 tok/s +2025-11-27 01:04:57,862 - INFO - Epoch 1 Step 560 (Global: 560): loss=1.5095, ppl=4.52, grad_norm=1.12, lr=5.84e-05, throughput=5606 tok/s +2025-11-27 01:06:23,429 - INFO - Epoch 1 Step 570 (Global: 570): loss=1.6967, ppl=5.46, grad_norm=1.12, lr=5.93e-05, throughput=5610 tok/s +2025-11-27 01:07:49,012 - INFO - Epoch 1 Step 580 (Global: 580): loss=1.8456, ppl=6.33, grad_norm=1.27, lr=6.01e-05, throughput=5609 tok/s +2025-11-27 01:09:14,898 - INFO - Epoch 1 Step 590 (Global: 590): loss=1.5354, ppl=4.64, grad_norm=1.09, lr=6.10e-05, throughput=5589 tok/s +2025-11-27 01:10:40,417 - INFO - Epoch 1 Step 600 (Global: 600): loss=1.6133, ppl=5.02, grad_norm=1.27, lr=6.19e-05, throughput=5613 tok/s +2025-11-27 01:12:06,094 - INFO - Epoch 1 Step 610 (Global: 610): loss=1.6709, ppl=5.32, grad_norm=1.20, lr=6.27e-05, throughput=5603 tok/s +2025-11-27 01:13:32,110 - INFO - Epoch 1 Step 620 (Global: 620): loss=1.7486, ppl=5.75, grad_norm=1.09, lr=6.36e-05, throughput=5580 tok/s +2025-11-27 01:14:57,756 - INFO - Epoch 1 Step 630 (Global: 630): loss=1.7044, ppl=5.50, grad_norm=1.07, lr=6.45e-05, throughput=5605 tok/s +2025-11-27 01:16:23,819 - INFO - Epoch 1 Step 640 (Global: 640): loss=1.7848, ppl=5.96, grad_norm=1.09, lr=6.53e-05, throughput=5577 tok/s +2025-11-27 01:17:49,725 - INFO - Epoch 1 Step 650 (Global: 650): loss=1.7075, ppl=5.51, grad_norm=1.12, lr=6.62e-05, throughput=5588 tok/s +2025-11-27 01:19:15,356 - INFO - Epoch 1 Step 660 (Global: 660): loss=1.7194, ppl=5.58, grad_norm=1.20, lr=6.71e-05, throughput=5605 tok/s +2025-11-27 01:20:41,031 - INFO - Epoch 1 Step 670 (Global: 670): loss=1.5254, ppl=4.60, grad_norm=1.12, 
lr=6.79e-05, throughput=5603 tok/s +2025-11-27 01:22:06,695 - INFO - Epoch 1 Step 680 (Global: 680): loss=1.8540, ppl=6.39, grad_norm=1.13, lr=6.88e-05, throughput=5603 tok/s +2025-11-27 01:23:32,780 - INFO - Epoch 1 Step 690 (Global: 690): loss=1.6380, ppl=5.14, grad_norm=1.23, lr=6.97e-05, throughput=5576 tok/s +2025-11-27 01:24:58,423 - INFO - Epoch 1 Step 700 (Global: 700): loss=1.6554, ppl=5.24, grad_norm=1.25, lr=7.05e-05, throughput=5605 tok/s +2025-11-27 01:26:24,322 - INFO - Epoch 1 Step 710 (Global: 710): loss=1.4811, ppl=4.40, grad_norm=1.16, lr=7.14e-05, throughput=5588 tok/s +2025-11-27 01:27:49,889 - INFO - Epoch 1 Step 720 (Global: 720): loss=1.5896, ppl=4.90, grad_norm=1.20, lr=7.22e-05, throughput=5610 tok/s +2025-11-27 01:29:15,489 - INFO - Epoch 1 Step 730 (Global: 730): loss=1.7253, ppl=5.61, grad_norm=1.25, lr=7.31e-05, throughput=5607 tok/s +2025-11-27 01:30:40,778 - INFO - Epoch 1 Step 740 (Global: 740): loss=1.6692, ppl=5.31, grad_norm=1.09, lr=7.40e-05, throughput=5628 tok/s +2025-11-27 01:32:06,069 - INFO - Epoch 1 Step 750 (Global: 750): loss=1.8594, ppl=6.42, grad_norm=1.31, lr=7.48e-05, throughput=5628 tok/s +2025-11-27 01:33:31,535 - INFO - Epoch 1 Step 760 (Global: 760): loss=1.7220, ppl=5.60, grad_norm=1.12, lr=7.57e-05, throughput=5616 tok/s +2025-11-27 01:34:56,884 - INFO - Epoch 1 Step 770 (Global: 770): loss=1.6750, ppl=5.34, grad_norm=1.21, lr=7.66e-05, throughput=5624 tok/s +2025-11-27 01:36:22,176 - INFO - Epoch 1 Step 780 (Global: 780): loss=1.8803, ppl=6.56, grad_norm=1.12, lr=7.74e-05, throughput=5628 tok/s +2025-11-27 01:37:47,885 - INFO - Epoch 1 Step 790 (Global: 790): loss=1.6459, ppl=5.19, grad_norm=1.16, lr=7.83e-05, throughput=5600 tok/s +2025-11-27 01:39:13,292 - INFO - Epoch 1 Step 800 (Global: 800): loss=1.5550, ppl=4.73, grad_norm=1.07, lr=7.92e-05, throughput=5620 tok/s +2025-11-27 01:40:38,675 - INFO - Epoch 1 Step 810 (Global: 810): loss=1.7781, ppl=5.92, grad_norm=1.06, lr=8.00e-05, throughput=5622 tok/s +2025-11-27 01:42:04,014 - INFO - Epoch 1 Step 820 (Global: 820): loss=1.7807, ppl=5.93, grad_norm=1.06, lr=8.09e-05, throughput=5625 tok/s +2025-11-27 01:43:29,520 - INFO - Epoch 1 Step 830 (Global: 830): loss=1.8052, ppl=6.08, grad_norm=1.14, lr=8.18e-05, throughput=5614 tok/s +2025-11-27 01:44:55,030 - INFO - Epoch 1 Step 840 (Global: 840): loss=1.7030, ppl=5.49, grad_norm=1.18, lr=8.26e-05, throughput=5613 tok/s +2025-11-27 01:46:20,355 - INFO - Epoch 1 Step 850 (Global: 850): loss=1.6080, ppl=4.99, grad_norm=1.16, lr=8.35e-05, throughput=5626 tok/s +2025-11-27 01:47:45,606 - INFO - Epoch 1 Step 860 (Global: 860): loss=1.6494, ppl=5.20, grad_norm=1.05, lr=8.44e-05, throughput=5630 tok/s +2025-11-27 01:49:11,169 - INFO - Epoch 1 Step 870 (Global: 870): loss=1.7102, ppl=5.53, grad_norm=1.12, lr=8.52e-05, throughput=5610 tok/s +2025-11-27 01:50:36,839 - INFO - Epoch 1 Step 880 (Global: 880): loss=1.9194, ppl=6.82, grad_norm=1.21, lr=8.61e-05, throughput=5603 tok/s +2025-11-27 01:52:02,156 - INFO - Epoch 1 Step 890 (Global: 890): loss=1.8884, ppl=6.61, grad_norm=1.04, lr=8.69e-05, throughput=5626 tok/s +2025-11-27 01:53:27,683 - INFO - Epoch 1 Step 900 (Global: 900): loss=1.7717, ppl=5.88, grad_norm=1.15, lr=8.78e-05, throughput=5612 tok/s +2025-11-27 01:54:53,036 - INFO - Epoch 1 Step 910 (Global: 910): loss=1.8638, ppl=6.45, grad_norm=1.20, lr=8.87e-05, throughput=5624 tok/s +2025-11-27 01:56:18,449 - INFO - Epoch 1 Step 920 (Global: 920): loss=1.7037, ppl=5.49, grad_norm=1.24, lr=8.95e-05, throughput=5620 tok/s +2025-11-27 
01:57:43,902 - INFO - Epoch 1 Step 930 (Global: 930): loss=1.9973, ppl=7.37, grad_norm=1.16, lr=9.04e-05, throughput=5617 tok/s +2025-11-27 01:59:09,283 - INFO - Epoch 1 Step 940 (Global: 940): loss=1.9470, ppl=7.01, grad_norm=1.16, lr=9.13e-05, throughput=5622 tok/s +2025-11-27 02:00:34,531 - INFO - Epoch 1 Step 950 (Global: 950): loss=1.7190, ppl=5.58, grad_norm=1.12, lr=9.21e-05, throughput=5631 tok/s +2025-11-27 02:01:59,976 - INFO - Epoch 1 Step 960 (Global: 960): loss=2.0378, ppl=7.67, grad_norm=1.09, lr=9.30e-05, throughput=5618 tok/s +2025-11-27 02:03:25,341 - INFO - Epoch 1 Step 970 (Global: 970): loss=1.6051, ppl=4.98, grad_norm=1.04, lr=9.39e-05, throughput=5623 tok/s +2025-11-27 02:04:50,846 - INFO - Epoch 1 Step 980 (Global: 980): loss=1.6885, ppl=5.41, grad_norm=1.12, lr=9.47e-05, throughput=5614 tok/s +2025-11-27 02:06:16,370 - INFO - Epoch 1 Step 990 (Global: 990): loss=1.7223, ppl=5.60, grad_norm=1.09, lr=9.56e-05, throughput=5612 tok/s +2025-11-27 02:07:42,211 - INFO - Epoch 1 Step 1000 (Global: 1000): loss=1.9137, ppl=6.78, grad_norm=1.12, lr=9.65e-05, throughput=5592 tok/s +2025-11-27 02:07:42,211 - INFO - +Running validation at step 1000... +2025-11-27 02:12:17,798 - INFO - Validation loss: 1.7953, perplexity: 6.02 +2025-11-27 02:12:17,798 - INFO - Qualitative metrics (n=5): +2025-11-27 02:12:17,798 - INFO - BLEU: 0.1441 +2025-11-27 02:12:17,798 - INFO - METEOR: 0.1890 +2025-11-27 02:12:17,798 - INFO - Edit Distance: 0.5849 +2025-11-27 02:12:17,798 - INFO - F-measure: 0.2413 +2025-11-27 02:12:17,799 - INFO - +====================================================================== +2025-11-27 02:12:17,799 - INFO - Qualitative Evaluation Samples: +2025-11-27 02:12:17,799 - INFO - ====================================================================== +2025-11-27 02:12:17,799 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 02:12:17,799 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 02:12:17,799 - INFO - Generated: '\'s "sadness, melancholy, and sadness" and "the quiet, gentle, and tender moments" of the album. She also praised the band\'s "soulful" and "emotional" performances, and said that "it\'s a real, real, re...' +2025-11-27 02:12:17,799 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 02:12:17,799 - INFO - ---------------------------------------------------------------------- +2025-11-27 02:12:17,800 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 02:12:17,800 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 02:12:17,800 - INFO - Generated: ' the 20th century. The Order of Angell was the first organization of its kind in the United States, and it was the first to be founded by a minority. The Order of Angell was the first to be founded by...' +2025-11-27 02:12:17,800 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' 
+2025-11-27 02:12:17,800 - INFO - ---------------------------------------------------------------------- +2025-11-27 02:12:17,800 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 02:12:17,800 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 02:12:17,800 - INFO - Generated: ' be the next Red Tailed Aesir. However, they are soon defeated by the Red Tailed Aesir, who are led by the god Odin. The Red Tailed Aesir are defeated by the gods and the gods are then defeated by the...' +2025-11-27 02:12:17,800 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-27 02:12:17,800 - INFO - ---------------------------------------------------------------------- +2025-11-27 02:12:17,800 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 02:12:17,801 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 02:12:17,801 - INFO - Generated: ' 1.0 (Unicode Consortium) |\n| 1.0.0 | U+0B2A0..0B2FF, U+0B2A0..0B2FF, U+0B2A0...' +2025-11-27 02:12:17,801 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 02:12:17,801 - INFO - ---------------------------------------------------------------------- +2025-11-27 02:12:17,801 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 02:12:17,801 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 02:12:17,801 - INFO - Generated: '1 | Xbox 360 | EA Tiburon | [ 151 ] |\n| Madden NFL 12 ...' +2025-11-27 02:12:17,801 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' 
+2025-11-27 02:12:17,801 - INFO - ---------------------------------------------------------------------- +2025-11-27 02:12:17,802 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_1000.jsonl +2025-11-27 02:13:44,200 - INFO - Epoch 1 Step 1010 (Global: 1010): loss=1.8608, ppl=6.43, grad_norm=1.12, lr=9.73e-05, throughput=5598 tok/s +2025-11-27 02:15:10,062 - INFO - Epoch 1 Step 1020 (Global: 1020): loss=1.6465, ppl=5.19, grad_norm=1.11, lr=9.82e-05, throughput=5590 tok/s +2025-11-27 02:16:35,892 - INFO - Epoch 1 Step 1030 (Global: 1030): loss=1.8002, ppl=6.05, grad_norm=1.08, lr=9.90e-05, throughput=5593 tok/s +2025-11-27 02:18:01,520 - INFO - Epoch 1 Step 1040 (Global: 1040): loss=2.0516, ppl=7.78, grad_norm=1.28, lr=9.99e-05, throughput=5606 tok/s +2025-11-27 02:19:27,295 - INFO - Epoch 1 Step 1050 (Global: 1050): loss=1.8300, ppl=6.23, grad_norm=1.08, lr=1.00e-04, throughput=5596 tok/s +2025-11-27 02:20:54,594 - INFO - Epoch 1 Step 1060 (Global: 1060): loss=1.7216, ppl=5.59, grad_norm=1.16, lr=1.00e-04, throughput=5498 tok/s +2025-11-27 02:22:20,333 - INFO - Epoch 1 Step 1070 (Global: 1070): loss=1.7234, ppl=5.60, grad_norm=1.08, lr=1.00e-04, throughput=5598 tok/s +2025-11-27 02:23:46,108 - INFO - Epoch 1 Step 1080 (Global: 1080): loss=1.7731, ppl=5.89, grad_norm=1.04, lr=1.00e-04, throughput=5596 tok/s +2025-11-27 02:25:11,613 - INFO - Epoch 1 Step 1090 (Global: 1090): loss=1.8047, ppl=6.08, grad_norm=1.15, lr=1.00e-04, throughput=5614 tok/s +2025-11-27 02:26:37,646 - INFO - Epoch 1 Step 1100 (Global: 1100): loss=1.8795, ppl=6.55, grad_norm=1.23, lr=1.00e-04, throughput=5579 tok/s +2025-11-27 02:28:03,458 - INFO - Epoch 1 Step 1110 (Global: 1110): loss=1.7477, ppl=5.74, grad_norm=1.02, lr=1.00e-04, throughput=5594 tok/s +2025-11-27 02:29:29,072 - INFO - Epoch 1 Step 1120 (Global: 1120): loss=1.8838, ppl=6.58, grad_norm=1.03, lr=1.00e-04, throughput=5607 tok/s +2025-11-27 02:30:54,657 - INFO - Epoch 1 Step 1130 (Global: 1130): loss=1.8636, ppl=6.45, grad_norm=1.00, lr=1.00e-04, throughput=5608 tok/s +2025-11-27 02:32:20,257 - INFO - Epoch 1 Step 1140 (Global: 1140): loss=1.7842, ppl=5.96, grad_norm=1.09, lr=1.00e-04, throughput=5608 tok/s +2025-11-27 02:33:45,723 - INFO - Epoch 1 Step 1150 (Global: 1150): loss=1.7212, ppl=5.59, grad_norm=0.96, lr=1.00e-04, throughput=5616 tok/s +2025-11-27 02:35:11,058 - INFO - Epoch 1 Step 1160 (Global: 1160): loss=1.9114, ppl=6.76, grad_norm=1.12, lr=1.00e-04, throughput=5625 tok/s +2025-11-27 02:36:36,485 - INFO - Epoch 1 Step 1170 (Global: 1170): loss=1.6535, ppl=5.23, grad_norm=1.08, lr=1.00e-04, throughput=5619 tok/s +2025-11-27 02:38:02,895 - INFO - Epoch 1 Step 1180 (Global: 1180): loss=1.8104, ppl=6.11, grad_norm=1.03, lr=9.99e-05, throughput=5555 tok/s +2025-11-27 02:39:28,697 - INFO - Epoch 1 Step 1190 (Global: 1190): loss=1.6208, ppl=5.06, grad_norm=0.95, lr=9.99e-05, throughput=5594 tok/s +2025-11-27 02:40:54,515 - INFO - Epoch 1 Step 1200 (Global: 1200): loss=1.5852, ppl=4.88, grad_norm=0.95, lr=9.99e-05, throughput=5593 tok/s +2025-11-27 02:42:20,088 - INFO - Epoch 1 Step 1210 (Global: 1210): loss=1.8648, ppl=6.45, grad_norm=0.97, lr=9.99e-05, throughput=5609 tok/s +2025-11-27 02:43:45,623 - INFO - Epoch 1 Step 1220 (Global: 1220): loss=1.8418, ppl=6.31, grad_norm=0.98, lr=9.99e-05, throughput=5612 tok/s +2025-11-27 02:45:10,729 - INFO - Epoch 1 Step 1230 (Global: 1230): loss=1.6930, ppl=5.44, grad_norm=0.96, lr=9.99e-05, throughput=5640 
tok/s +2025-11-27 02:46:36,071 - INFO - Epoch 1 Step 1240 (Global: 1240): loss=1.6674, ppl=5.30, grad_norm=1.11, lr=9.99e-05, throughput=5624 tok/s +2025-11-27 02:48:01,310 - INFO - Epoch 1 Step 1250 (Global: 1250): loss=1.8428, ppl=6.31, grad_norm=0.95, lr=9.99e-05, throughput=5631 tok/s +2025-11-27 02:49:26,467 - INFO - Epoch 1 Step 1260 (Global: 1260): loss=1.7930, ppl=6.01, grad_norm=1.01, lr=9.99e-05, throughput=5637 tok/s +2025-11-27 02:50:52,104 - INFO - Epoch 1 Step 1270 (Global: 1270): loss=1.7855, ppl=5.96, grad_norm=1.00, lr=9.99e-05, throughput=5605 tok/s +2025-11-27 02:52:17,282 - INFO - Epoch 1 Step 1280 (Global: 1280): loss=1.7215, ppl=5.59, grad_norm=0.99, lr=9.98e-05, throughput=5635 tok/s +2025-11-27 02:53:42,442 - INFO - Epoch 1 Step 1290 (Global: 1290): loss=1.8285, ppl=6.22, grad_norm=1.13, lr=9.98e-05, throughput=5637 tok/s +2025-11-27 02:55:07,671 - INFO - Epoch 1 Step 1300 (Global: 1300): loss=1.9187, ppl=6.81, grad_norm=1.10, lr=9.98e-05, throughput=5632 tok/s +2025-11-27 02:56:32,897 - INFO - Epoch 1 Step 1310 (Global: 1310): loss=1.8591, ppl=6.42, grad_norm=1.08, lr=9.98e-05, throughput=5632 tok/s +2025-11-27 02:57:57,958 - INFO - Epoch 1 Step 1320 (Global: 1320): loss=1.7866, ppl=5.97, grad_norm=0.95, lr=9.98e-05, throughput=5643 tok/s +2025-11-27 02:59:23,258 - INFO - Epoch 1 Step 1330 (Global: 1330): loss=1.4939, ppl=4.45, grad_norm=0.97, lr=9.98e-05, throughput=5627 tok/s +2025-11-27 03:00:48,790 - INFO - Epoch 1 Step 1340 (Global: 1340): loss=1.5878, ppl=4.89, grad_norm=1.10, lr=9.97e-05, throughput=5612 tok/s +2025-11-27 03:02:13,820 - INFO - Epoch 1 Step 1350 (Global: 1350): loss=1.7998, ppl=6.05, grad_norm=1.04, lr=9.97e-05, throughput=5645 tok/s +2025-11-27 03:03:38,804 - INFO - Epoch 1 Step 1360 (Global: 1360): loss=1.7197, ppl=5.58, grad_norm=1.01, lr=9.97e-05, throughput=5648 tok/s +2025-11-27 03:05:03,839 - INFO - Epoch 1 Step 1370 (Global: 1370): loss=1.7033, ppl=5.49, grad_norm=1.23, lr=9.97e-05, throughput=5645 tok/s +2025-11-27 03:06:28,931 - INFO - Epoch 1 Step 1380 (Global: 1380): loss=1.5837, ppl=4.87, grad_norm=0.97, lr=9.97e-05, throughput=5641 tok/s +2025-11-27 03:07:54,098 - INFO - Epoch 1 Step 1390 (Global: 1390): loss=1.9457, ppl=7.00, grad_norm=1.06, lr=9.97e-05, throughput=5636 tok/s +2025-11-27 03:09:19,038 - INFO - Epoch 1 Step 1400 (Global: 1400): loss=1.8463, ppl=6.34, grad_norm=0.97, lr=9.96e-05, throughput=5651 tok/s +2025-11-27 03:10:44,590 - INFO - Epoch 1 Step 1410 (Global: 1410): loss=1.8177, ppl=6.16, grad_norm=1.00, lr=9.96e-05, throughput=5611 tok/s +2025-11-27 03:12:10,027 - INFO - Epoch 1 Step 1420 (Global: 1420): loss=1.6030, ppl=4.97, grad_norm=0.93, lr=9.96e-05, throughput=5618 tok/s +2025-11-27 03:13:35,754 - INFO - Epoch 1 Step 1430 (Global: 1430): loss=2.1111, ppl=8.26, grad_norm=1.03, lr=9.96e-05, throughput=5599 tok/s +2025-11-27 03:15:00,839 - INFO - Epoch 1 Step 1440 (Global: 1440): loss=1.5421, ppl=4.67, grad_norm=0.91, lr=9.96e-05, throughput=5641 tok/s +2025-11-27 03:16:25,862 - INFO - Epoch 1 Step 1450 (Global: 1450): loss=1.6692, ppl=5.31, grad_norm=1.01, lr=9.95e-05, throughput=5646 tok/s +2025-11-27 03:17:51,426 - INFO - Epoch 1 Step 1460 (Global: 1460): loss=1.8710, ppl=6.49, grad_norm=1.04, lr=9.95e-05, throughput=5610 tok/s +2025-11-27 03:19:16,698 - INFO - Epoch 1 Step 1470 (Global: 1470): loss=2.0232, ppl=7.56, grad_norm=1.00, lr=9.95e-05, throughput=5629 tok/s +2025-11-27 03:20:41,715 - INFO - Epoch 1 Step 1480 (Global: 1480): loss=1.8384, ppl=6.29, grad_norm=0.98, lr=9.95e-05, throughput=5646 
tok/s +2025-11-27 03:22:07,100 - INFO - Epoch 1 Step 1490 (Global: 1490): loss=1.8091, ppl=6.11, grad_norm=0.98, lr=9.94e-05, throughput=5622 tok/s +2025-11-27 03:23:32,494 - INFO - Epoch 1 Step 1500 (Global: 1500): loss=1.8136, ppl=6.13, grad_norm=0.97, lr=9.94e-05, throughput=5621 tok/s +2025-11-27 03:23:32,494 - INFO - +Running validation at step 1500... +2025-11-27 03:28:06,378 - INFO - Validation loss: 1.7937, perplexity: 6.01 +2025-11-27 03:28:06,379 - INFO - Qualitative metrics (n=5): +2025-11-27 03:28:06,379 - INFO - BLEU: 0.1436 +2025-11-27 03:28:06,379 - INFO - METEOR: 0.1770 +2025-11-27 03:28:06,380 - INFO - Edit Distance: 0.6434 +2025-11-27 03:28:06,380 - INFO - F-measure: 0.2192 +2025-11-27 03:28:06,380 - INFO - +====================================================================== +2025-11-27 03:28:06,380 - INFO - Qualitative Evaluation Samples: +2025-11-27 03:28:06,380 - INFO - ====================================================================== +2025-11-27 03:28:06,381 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 03:28:06,381 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 03:28:06,381 - INFO - Generated: ' to Consequence, saying that "the album is a bit of a mess, but it\'s a lot of fun to listen to." She also praised the album\'s "sweet, sweet, sweet guitar work" and "the way the band\'s sound is so diff...' +2025-11-27 03:28:06,381 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 03:28:06,381 - INFO - ---------------------------------------------------------------------- +2025-11-27 03:28:06,382 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 03:28:06,382 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 03:28:06,382 - INFO - Generated: ' the other schools in the Big Ten Conference. The school\'s mascot, the Fighting Mighty Eagles, is a Native American mascot, and the school\'s colors are red, black, and white. The school\'s nickname, "T...' +2025-11-27 03:28:06,382 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-27 03:28:06,383 - INFO - ---------------------------------------------------------------------- +2025-11-27 03:28:06,383 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 03:28:06,383 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 03:28:06,383 - INFO - Generated: ' be killed. The other three are killed by the Red Tails, who are then killed by the Shingetsu Temple. The remaining two are killed by the Red Tails, who are then killed by the Shingetsu Temple.\nThe Re...' +2025-11-27 03:28:06,383 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." 
+2025-11-27 03:28:06,384 - INFO - ---------------------------------------------------------------------- +2025-11-27 03:28:06,384 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 03:28:06,384 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 03:28:06,384 - INFO - Generated: ' | 1.0.0 | U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0..0B3, U+0B0.....' +2025-11-27 03:28:06,384 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 03:28:06,385 - INFO - ---------------------------------------------------------------------- +2025-11-27 03:28:06,385 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 03:28:06,385 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 03:28:06,385 - INFO - Generated: '1 | Android + | [ 150 ] |\n| Madden NFL 12 | August 30, 2011 ...' +2025-11-27 03:28:06,385 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 03:28:06,386 - INFO - ---------------------------------------------------------------------- +2025-11-27 03:28:06,386 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_1500.jsonl +2025-11-27 03:29:32,611 - INFO - Epoch 1 Step 1510 (Global: 1510): loss=1.7648, ppl=5.84, grad_norm=1.01, lr=9.94e-05, throughput=5609 tok/s +2025-11-27 03:30:57,372 - INFO - Epoch 1 Step 1520 (Global: 1520): loss=1.7759, ppl=5.91, grad_norm=0.96, lr=9.94e-05, throughput=5663 tok/s +2025-11-27 03:32:22,318 - INFO - Epoch 1 Step 1530 (Global: 1530): loss=1.8285, ppl=6.22, grad_norm=0.89, lr=9.93e-05, throughput=5651 tok/s +2025-11-27 03:33:47,205 - INFO - Epoch 1 Step 1540 (Global: 1540): loss=1.8660, ppl=6.46, grad_norm=0.96, lr=9.93e-05, throughput=5655 tok/s +2025-11-27 03:35:12,063 - INFO - Epoch 1 Step 1550 (Global: 1550): loss=1.7674, ppl=5.86, grad_norm=0.99, lr=9.93e-05, throughput=5657 tok/s +2025-11-27 03:36:37,259 - INFO - Epoch 1 Step 1560 (Global: 1560): loss=1.7187, ppl=5.58, grad_norm=0.98, lr=9.92e-05, throughput=5634 tok/s +2025-11-27 03:38:02,642 - INFO - Epoch 1 Step 1570 (Global: 1570): loss=1.7255, ppl=5.62, grad_norm=0.91, lr=9.92e-05, throughput=5622 tok/s +2025-11-27 03:39:27,569 - INFO - Epoch 1 Step 1580 (Global: 1580): loss=1.6448, ppl=5.18, grad_norm=0.93, lr=9.92e-05, throughput=5652 tok/s +2025-11-27 03:40:52,567 - INFO - Epoch 1 Step 1590 (Global: 1590): loss=1.7610, ppl=5.82, grad_norm=0.97, lr=9.92e-05, throughput=5647 tok/s +2025-11-27 03:42:17,353 - INFO - Epoch 1 Step 1600 (Global: 1600): loss=1.7497, ppl=5.75, grad_norm=0.97, lr=9.91e-05, throughput=5661 tok/s +2025-11-27 03:43:42,316 - INFO - Epoch 1 Step 1610 (Global: 1610): loss=1.7360, ppl=5.67, grad_norm=0.89, lr=9.91e-05, throughput=5650 tok/s +2025-11-27 03:45:08,316 - INFO - Epoch 1 Step 1620 (Global: 1620): loss=1.8607, ppl=6.43, grad_norm=0.95, lr=9.91e-05, throughput=5581 tok/s +2025-11-27 03:46:33,296 - INFO - Epoch 1 Step 1630 (Global: 1630): loss=1.6987, ppl=5.47, grad_norm=1.07, lr=9.90e-05, throughput=5648 tok/s +2025-11-27 03:47:58,059 - INFO - Epoch 1 Step 1640 (Global: 1640): loss=1.8422, ppl=6.31, grad_norm=0.90, lr=9.90e-05, throughput=5663 tok/s +2025-11-27 03:49:22,967 - INFO - Epoch 1 Step 1650 (Global: 1650): loss=1.8512, ppl=6.37, 
grad_norm=1.03, lr=9.90e-05, throughput=5653 tok/s +2025-11-27 03:50:48,035 - INFO - Epoch 1 Step 1660 (Global: 1660): loss=1.6104, ppl=5.00, grad_norm=0.90, lr=9.89e-05, throughput=5643 tok/s +2025-11-27 03:52:13,034 - INFO - Epoch 1 Step 1670 (Global: 1670): loss=1.8563, ppl=6.40, grad_norm=0.96, lr=9.89e-05, throughput=5647 tok/s +2025-11-27 03:53:38,297 - INFO - Epoch 1 Step 1680 (Global: 1680): loss=1.7098, ppl=5.53, grad_norm=0.94, lr=9.89e-05, throughput=5630 tok/s +2025-11-27 03:55:03,382 - INFO - Epoch 1 Step 1690 (Global: 1690): loss=1.8319, ppl=6.25, grad_norm=1.05, lr=9.88e-05, throughput=5642 tok/s +2025-11-27 03:56:28,475 - INFO - Epoch 1 Step 1700 (Global: 1700): loss=1.9139, ppl=6.78, grad_norm=1.00, lr=9.88e-05, throughput=5641 tok/s +2025-11-27 03:57:53,697 - INFO - Epoch 1 Step 1710 (Global: 1710): loss=1.8407, ppl=6.30, grad_norm=0.95, lr=9.87e-05, throughput=5632 tok/s +2025-11-27 03:59:18,486 - INFO - Epoch 1 Step 1720 (Global: 1720): loss=1.7867, ppl=5.97, grad_norm=0.88, lr=9.87e-05, throughput=5661 tok/s +2025-11-27 04:00:43,525 - INFO - Epoch 1 Step 1730 (Global: 1730): loss=1.7267, ppl=5.62, grad_norm=0.89, lr=9.87e-05, throughput=5645 tok/s +2025-11-27 04:02:08,724 - INFO - Epoch 1 Step 1740 (Global: 1740): loss=1.7320, ppl=5.65, grad_norm=0.88, lr=9.86e-05, throughput=5634 tok/s +2025-11-27 04:03:33,835 - INFO - Epoch 1 Step 1750 (Global: 1750): loss=1.7386, ppl=5.69, grad_norm=0.90, lr=9.86e-05, throughput=5640 tok/s +2025-11-27 04:04:59,091 - INFO - Epoch 1 Step 1760 (Global: 1760): loss=2.0839, ppl=8.04, grad_norm=0.91, lr=9.86e-05, throughput=5630 tok/s +2025-11-27 04:06:24,351 - INFO - Epoch 1 Step 1770 (Global: 1770): loss=1.8038, ppl=6.07, grad_norm=0.91, lr=9.85e-05, throughput=5630 tok/s +2025-11-27 04:07:49,418 - INFO - Epoch 1 Step 1780 (Global: 1780): loss=1.8752, ppl=6.52, grad_norm=0.92, lr=9.85e-05, throughput=5643 tok/s +2025-11-27 04:09:14,475 - INFO - Epoch 1 Step 1790 (Global: 1790): loss=1.6062, ppl=4.98, grad_norm=0.87, lr=9.84e-05, throughput=5643 tok/s +2025-11-27 04:10:39,906 - INFO - Epoch 1 Step 1800 (Global: 1800): loss=1.7412, ppl=5.70, grad_norm=0.85, lr=9.84e-05, throughput=5619 tok/s +2025-11-27 04:12:05,095 - INFO - Epoch 1 Step 1810 (Global: 1810): loss=1.8034, ppl=6.07, grad_norm=0.92, lr=9.83e-05, throughput=5635 tok/s +2025-11-27 04:13:30,565 - INFO - Epoch 1 Step 1820 (Global: 1820): loss=1.7698, ppl=5.87, grad_norm=1.09, lr=9.83e-05, throughput=5616 tok/s +2025-11-27 04:14:55,667 - INFO - Epoch 1 Step 1830 (Global: 1830): loss=1.7667, ppl=5.85, grad_norm=0.91, lr=9.83e-05, throughput=5640 tok/s +2025-11-27 04:16:20,561 - INFO - Epoch 1 Step 1840 (Global: 1840): loss=1.7285, ppl=5.63, grad_norm=0.88, lr=9.82e-05, throughput=5654 tok/s +2025-11-27 04:17:45,470 - INFO - Epoch 1 Step 1850 (Global: 1850): loss=1.8020, ppl=6.06, grad_norm=1.16, lr=9.82e-05, throughput=5653 tok/s +2025-11-27 04:19:11,073 - INFO - Epoch 1 Step 1860 (Global: 1860): loss=1.5466, ppl=4.70, grad_norm=0.97, lr=9.81e-05, throughput=5607 tok/s +2025-11-27 04:20:36,210 - INFO - Epoch 1 Step 1870 (Global: 1870): loss=1.7657, ppl=5.85, grad_norm=0.99, lr=9.81e-05, throughput=5638 tok/s +2025-11-27 04:22:01,096 - INFO - Epoch 1 Step 1880 (Global: 1880): loss=1.7768, ppl=5.91, grad_norm=0.96, lr=9.80e-05, throughput=5655 tok/s +2025-11-27 04:23:26,123 - INFO - Epoch 1 Step 1890 (Global: 1890): loss=1.4695, ppl=4.35, grad_norm=0.93, lr=9.80e-05, throughput=5645 tok/s +2025-11-27 04:24:50,907 - INFO - Epoch 1 Step 1900 (Global: 1900): loss=1.7956, ppl=6.02, 
grad_norm=0.86, lr=9.79e-05, throughput=5661 tok/s +2025-11-27 04:26:16,021 - INFO - Epoch 1 Step 1910 (Global: 1910): loss=1.5956, ppl=4.93, grad_norm=0.83, lr=9.79e-05, throughput=5640 tok/s +2025-11-27 04:27:41,122 - INFO - Epoch 1 Step 1920 (Global: 1920): loss=1.7574, ppl=5.80, grad_norm=0.94, lr=9.78e-05, throughput=5640 tok/s +2025-11-27 04:29:06,269 - INFO - Epoch 1 Step 1930 (Global: 1930): loss=1.7379, ppl=5.69, grad_norm=0.95, lr=9.78e-05, throughput=5637 tok/s +2025-11-27 04:30:31,274 - INFO - Epoch 1 Step 1940 (Global: 1940): loss=1.8697, ppl=6.49, grad_norm=0.93, lr=9.77e-05, throughput=5647 tok/s +2025-11-27 04:31:56,672 - INFO - Epoch 1 Step 1950 (Global: 1950): loss=1.8281, ppl=6.22, grad_norm=0.93, lr=9.77e-05, throughput=5621 tok/s +2025-11-27 04:33:21,355 - INFO - Epoch 1 Step 1960 (Global: 1960): loss=1.7557, ppl=5.79, grad_norm=0.94, lr=9.76e-05, throughput=5668 tok/s +2025-11-27 04:34:45,758 - INFO - Epoch 1 Step 1970 (Global: 1970): loss=1.6411, ppl=5.16, grad_norm=0.95, lr=9.76e-05, throughput=5687 tok/s +2025-11-27 04:36:10,728 - INFO - Epoch 1 Step 1980 (Global: 1980): loss=1.7196, ppl=5.58, grad_norm=0.93, lr=9.75e-05, throughput=5649 tok/s +2025-11-27 04:37:35,836 - INFO - Epoch 1 Step 1990 (Global: 1990): loss=1.6182, ppl=5.04, grad_norm=0.92, lr=9.75e-05, throughput=5640 tok/s +2025-11-27 04:39:00,717 - INFO - Epoch 1 Step 2000 (Global: 2000): loss=2.0628, ppl=7.87, grad_norm=0.92, lr=9.74e-05, throughput=5655 tok/s +2025-11-27 04:39:00,717 - INFO - +Running validation at step 2000... +2025-11-27 04:43:33,783 - INFO - Validation loss: 1.7696, perplexity: 5.87 +2025-11-27 04:43:33,784 - INFO - Qualitative metrics (n=5): +2025-11-27 04:43:33,784 - INFO - BLEU: 0.1132 +2025-11-27 04:43:33,784 - INFO - METEOR: 0.1796 +2025-11-27 04:43:33,784 - INFO - Edit Distance: 0.6541 +2025-11-27 04:43:33,784 - INFO - F-measure: 0.2063 +2025-11-27 04:43:33,784 - INFO - +====================================================================== +2025-11-27 04:43:33,784 - INFO - Qualitative Evaluation Samples: +2025-11-27 04:43:33,785 - INFO - ====================================================================== +2025-11-27 04:43:33,785 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 04:43:33,785 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 04:43:33,785 - INFO - Generated: '\'s "The End" and "The End" with a "slightly more restrained" sound. She also gave the album a positive review, calling it "a great album, with a lot of great songs and a lot of great moments." She als...' +2025-11-27 04:43:33,785 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 04:43:33,785 - INFO - ---------------------------------------------------------------------- +2025-11-27 04:43:33,785 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 04:43:33,785 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 04:43:33,785 - INFO - Generated: 'arities in that it was not a religious institution. The Order of Angell was not a religious organization, but rather a social one. The Order of Angell was not a fraternity or sorority, but rather a so...' +2025-11-27 04:43:33,785 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. 
White Americans had been masquerading as Indian...' +2025-11-27 04:43:33,785 - INFO - ---------------------------------------------------------------------- +2025-11-27 04:43:33,786 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 04:43:33,786 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 04:43:33,786 - INFO - Generated: " be killed by the Red Tails. The group is led by the Red Tails' leader, Zeke, who is revealed to be a member of the Red Tails. The group is led by the Red Tails' leader, Zeke, who is revealed to be a ..." +2025-11-27 04:43:33,786 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-27 04:43:33,786 - INFO - ---------------------------------------------------------------------- +2025-11-27 04:43:33,786 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 04:43:33,786 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 04:43:33,786 - INFO - Generated: ' | 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0...' +2025-11-27 04:43:33,786 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 04:43:33,786 - INFO - ---------------------------------------------------------------------- +2025-11-27 04:43:33,787 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 04:43:33,787 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 04:43:33,787 - INFO - Generated: '1 | iOS | EA Tiburon | [ 151 ] |\n| Madden NFL 12 ...' +2025-11-27 04:43:33,787 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' 
+2025-11-27 04:43:33,787 - INFO - ---------------------------------------------------------------------- +2025-11-27 04:43:33,788 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_2000.jsonl +2025-11-27 04:44:59,378 - INFO - Epoch 1 Step 2010 (Global: 2010): loss=1.3592, ppl=3.89, grad_norm=0.78, lr=9.74e-05, throughput=5649 tok/s +2025-11-27 04:46:24,386 - INFO - Epoch 1 Step 2020 (Global: 2020): loss=1.9371, ppl=6.94, grad_norm=0.94, lr=9.73e-05, throughput=5647 tok/s +2025-11-27 04:47:49,433 - INFO - Epoch 1 Step 2030 (Global: 2030): loss=1.9148, ppl=6.79, grad_norm=0.89, lr=9.73e-05, throughput=5644 tok/s +2025-11-27 04:49:14,346 - INFO - Epoch 1 Step 2040 (Global: 2040): loss=1.6181, ppl=5.04, grad_norm=0.82, lr=9.72e-05, throughput=5653 tok/s +2025-11-27 04:50:39,312 - INFO - Epoch 1 Step 2050 (Global: 2050): loss=1.7998, ppl=6.05, grad_norm=0.89, lr=9.72e-05, throughput=5649 tok/s +2025-11-27 04:52:04,527 - INFO - Epoch 1 Step 2060 (Global: 2060): loss=1.6154, ppl=5.03, grad_norm=0.86, lr=9.71e-05, throughput=5633 tok/s +2025-11-27 04:53:29,751 - INFO - Epoch 1 Step 2070 (Global: 2070): loss=1.8775, ppl=6.54, grad_norm=0.91, lr=9.71e-05, throughput=5632 tok/s +2025-11-27 04:54:55,055 - INFO - Epoch 1 Step 2080 (Global: 2080): loss=1.6468, ppl=5.19, grad_norm=0.84, lr=9.70e-05, throughput=5627 tok/s +2025-11-27 04:56:19,850 - INFO - Epoch 1 Step 2090 (Global: 2090): loss=1.7388, ppl=5.69, grad_norm=0.88, lr=9.69e-05, throughput=5661 tok/s +2025-11-27 04:57:44,721 - INFO - Epoch 1 Step 2100 (Global: 2100): loss=1.9394, ppl=6.95, grad_norm=1.02, lr=9.69e-05, throughput=5656 tok/s +2025-11-27 04:59:09,674 - INFO - Epoch 1 Step 2110 (Global: 2110): loss=1.8386, ppl=6.29, grad_norm=0.90, lr=9.68e-05, throughput=5650 tok/s +2025-11-27 05:00:34,733 - INFO - Epoch 1 Step 2120 (Global: 2120): loss=1.7634, ppl=5.83, grad_norm=0.90, lr=9.68e-05, throughput=5643 tok/s +2025-11-27 05:01:59,849 - INFO - Epoch 1 Step 2130 (Global: 2130): loss=1.8470, ppl=6.34, grad_norm=0.96, lr=9.67e-05, throughput=5639 tok/s +2025-11-27 05:03:25,051 - INFO - Epoch 1 Step 2140 (Global: 2140): loss=1.6334, ppl=5.12, grad_norm=0.86, lr=9.66e-05, throughput=5634 tok/s +2025-11-27 05:04:50,222 - INFO - Epoch 1 Step 2150 (Global: 2150): loss=1.9296, ppl=6.89, grad_norm=0.94, lr=9.66e-05, throughput=5636 tok/s +2025-11-27 05:06:15,053 - INFO - Epoch 1 Step 2160 (Global: 2160): loss=1.6562, ppl=5.24, grad_norm=0.88, lr=9.65e-05, throughput=5658 tok/s +2025-11-27 05:07:40,484 - INFO - Epoch 1 Step 2170 (Global: 2170): loss=1.7720, ppl=5.88, grad_norm=0.88, lr=9.65e-05, throughput=5619 tok/s +2025-11-27 05:09:05,189 - INFO - Epoch 1 Step 2180 (Global: 2180): loss=1.8423, ppl=6.31, grad_norm=0.94, lr=9.64e-05, throughput=5667 tok/s +2025-11-27 05:10:30,865 - INFO - Epoch 1 Step 2190 (Global: 2190): loss=1.7566, ppl=5.79, grad_norm=0.89, lr=9.63e-05, throughput=5603 tok/s +2025-11-27 05:11:55,391 - INFO - Epoch 1 Step 2200 (Global: 2200): loss=1.6033, ppl=4.97, grad_norm=0.86, lr=9.63e-05, throughput=5679 tok/s +2025-11-27 05:13:20,985 - INFO - Epoch 1 Step 2210 (Global: 2210): loss=1.4135, ppl=4.11, grad_norm=0.81, lr=9.62e-05, throughput=5608 tok/s +2025-11-27 05:14:46,189 - INFO - Epoch 1 Step 2220 (Global: 2220): loss=1.7312, ppl=5.65, grad_norm=0.95, lr=9.61e-05, throughput=5634 tok/s +2025-11-27 05:16:11,471 - INFO - Epoch 1 Step 2230 (Global: 2230): loss=1.7292, ppl=5.64, grad_norm=0.93, lr=9.61e-05, throughput=5628 
tok/s +2025-11-27 05:17:36,755 - INFO - Epoch 1 Step 2240 (Global: 2240): loss=1.8738, ppl=6.51, grad_norm=0.86, lr=9.60e-05, throughput=5628 tok/s +2025-11-27 05:19:01,634 - INFO - Epoch 1 Step 2250 (Global: 2250): loss=1.8376, ppl=6.28, grad_norm=0.90, lr=9.60e-05, throughput=5655 tok/s +2025-11-27 05:20:26,460 - INFO - Epoch 1 Step 2260 (Global: 2260): loss=1.6970, ppl=5.46, grad_norm=0.91, lr=9.59e-05, throughput=5659 tok/s +2025-11-27 05:21:51,248 - INFO - Epoch 1 Step 2270 (Global: 2270): loss=1.7177, ppl=5.57, grad_norm=0.86, lr=9.58e-05, throughput=5661 tok/s +2025-11-27 05:23:16,473 - INFO - Epoch 1 Step 2280 (Global: 2280): loss=1.8572, ppl=6.41, grad_norm=0.97, lr=9.58e-05, throughput=5632 tok/s +2025-11-27 05:24:41,584 - INFO - Epoch 1 Step 2290 (Global: 2290): loss=1.5219, ppl=4.58, grad_norm=0.80, lr=9.57e-05, throughput=5640 tok/s +2025-11-27 05:26:07,146 - INFO - Epoch 1 Step 2300 (Global: 2300): loss=1.8717, ppl=6.50, grad_norm=0.92, lr=9.56e-05, throughput=5610 tok/s +2025-11-27 05:27:32,487 - INFO - Epoch 1 Step 2310 (Global: 2310): loss=1.6788, ppl=5.36, grad_norm=0.83, lr=9.55e-05, throughput=5625 tok/s +2025-11-27 05:28:57,121 - INFO - Epoch 1 Step 2320 (Global: 2320): loss=1.7029, ppl=5.49, grad_norm=0.96, lr=9.55e-05, throughput=5671 tok/s +2025-11-27 05:30:22,158 - INFO - Epoch 1 Step 2330 (Global: 2330): loss=1.6871, ppl=5.40, grad_norm=0.82, lr=9.54e-05, throughput=5645 tok/s +2025-11-27 05:31:47,441 - INFO - Epoch 1 Step 2340 (Global: 2340): loss=1.9555, ppl=7.07, grad_norm=0.95, lr=9.53e-05, throughput=5628 tok/s +2025-11-27 05:33:12,380 - INFO - Epoch 1 Step 2350 (Global: 2350): loss=1.7133, ppl=5.55, grad_norm=0.85, lr=9.53e-05, throughput=5651 tok/s +2025-11-27 05:34:37,224 - INFO - Epoch 1 Step 2360 (Global: 2360): loss=1.7317, ppl=5.65, grad_norm=0.87, lr=9.52e-05, throughput=5658 tok/s +2025-11-27 05:36:02,131 - INFO - Epoch 1 Step 2370 (Global: 2370): loss=1.7783, ppl=5.92, grad_norm=0.88, lr=9.51e-05, throughput=5653 tok/s +2025-11-27 05:37:27,035 - INFO - Epoch 1 Step 2380 (Global: 2380): loss=1.9261, ppl=6.86, grad_norm=0.84, lr=9.51e-05, throughput=5653 tok/s +2025-11-27 05:38:51,845 - INFO - Epoch 1 Step 2390 (Global: 2390): loss=1.6263, ppl=5.09, grad_norm=0.79, lr=9.50e-05, throughput=5660 tok/s +2025-11-27 05:40:16,747 - INFO - Epoch 1 Step 2400 (Global: 2400): loss=1.7586, ppl=5.80, grad_norm=0.89, lr=9.49e-05, throughput=5654 tok/s +2025-11-27 05:41:42,383 - INFO - Epoch 1 Step 2410 (Global: 2410): loss=1.7116, ppl=5.54, grad_norm=0.83, lr=9.48e-05, throughput=5605 tok/s +2025-11-27 05:43:07,504 - INFO - Epoch 1 Step 2420 (Global: 2420): loss=1.7020, ppl=5.48, grad_norm=0.85, lr=9.48e-05, throughput=5639 tok/s +2025-11-27 05:44:32,475 - INFO - Epoch 1 Step 2430 (Global: 2430): loss=1.7495, ppl=5.75, grad_norm=0.92, lr=9.47e-05, throughput=5649 tok/s +2025-11-27 05:45:57,372 - INFO - Epoch 1 Step 2440 (Global: 2440): loss=1.7939, ppl=6.01, grad_norm=0.87, lr=9.46e-05, throughput=5654 tok/s +2025-11-27 05:47:22,360 - INFO - Epoch 1 Step 2450 (Global: 2450): loss=1.4086, ppl=4.09, grad_norm=0.88, lr=9.45e-05, throughput=5648 tok/s +2025-11-27 05:48:47,418 - INFO - Epoch 1 Step 2460 (Global: 2460): loss=1.9233, ppl=6.84, grad_norm=0.89, lr=9.45e-05, throughput=5643 tok/s +2025-11-27 05:50:12,345 - INFO - Epoch 1 Step 2470 (Global: 2470): loss=1.9444, ppl=6.99, grad_norm=0.90, lr=9.44e-05, throughput=5652 tok/s +2025-11-27 05:51:37,278 - INFO - Epoch 1 Step 2480 (Global: 2480): loss=1.7070, ppl=5.51, grad_norm=0.82, lr=9.43e-05, throughput=5652 
tok/s +2025-11-27 05:53:02,415 - INFO - Epoch 1 Step 2490 (Global: 2490): loss=1.9535, ppl=7.05, grad_norm=0.89, lr=9.42e-05, throughput=5638 tok/s +2025-11-27 05:54:27,376 - INFO - Epoch 1 Step 2500 (Global: 2500): loss=1.5932, ppl=4.92, grad_norm=0.81, lr=9.41e-05, throughput=5650 tok/s +2025-11-27 05:54:27,376 - INFO - +Running validation at step 2500... +2025-11-27 05:59:00,228 - INFO - Validation loss: 1.7532, perplexity: 5.77 +2025-11-27 05:59:00,228 - INFO - Qualitative metrics (n=5): +2025-11-27 05:59:00,228 - INFO - BLEU: 0.1441 +2025-11-27 05:59:00,228 - INFO - METEOR: 0.2055 +2025-11-27 05:59:00,228 - INFO - Edit Distance: 0.6240 +2025-11-27 05:59:00,228 - INFO - F-measure: 0.2488 +2025-11-27 05:59:00,228 - INFO - +====================================================================== +2025-11-27 05:59:00,229 - INFO - Qualitative Evaluation Samples: +2025-11-27 05:59:00,229 - INFO - ====================================================================== +2025-11-27 05:59:00,229 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 05:59:00,229 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 05:59:00,229 - INFO - Generated: ' to the band\'s previous work, saying that "the band\'s new album is a triumph of the unexpected, and the unexpected is a triumph of the unexpected." He also said that "the band\'s new album is a triumph...' +2025-11-27 05:59:00,229 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 05:59:00,229 - INFO - ---------------------------------------------------------------------- +2025-11-27 05:59:00,229 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 05:59:00,229 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 05:59:00,230 - INFO - Generated: 'athletes, as the majority of them are of Native American descent. The Order of the Angell was founded in 2009 by a group of 12 students, and the first class was composed of 12 students. The Order of t...' +2025-11-27 05:59:00,230 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-27 05:59:00,230 - INFO - ---------------------------------------------------------------------- +2025-11-27 05:59:00,230 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 05:59:00,230 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 05:59:00,230 - INFO - Generated: ' kill Teimou. The other three are the ones who are killed by the Red Tails, and the last one is the one who is killed by the Black Jacks. The last one is the one who is killed by the Black Jacks.\nKiri...' +2025-11-27 05:59:00,230 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." 
+2025-11-27 05:59:00,230 - INFO - ---------------------------------------------------------------------- +2025-11-27 05:59:00,230 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 05:59:00,230 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 05:59:00,230 - INFO - Generated: '-01-01 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...' +2025-11-27 05:59:00,231 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 05:59:00,231 - INFO - ---------------------------------------------------------------------- +2025-11-27 05:59:00,231 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 05:59:00,231 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 05:59:00,231 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 05:59:00,231 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 05:59:00,231 - INFO - ---------------------------------------------------------------------- +2025-11-27 05:59:00,232 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_2500.jsonl +2025-11-27 06:00:26,015 - INFO - Epoch 1 Step 2510 (Global: 2510): loss=1.8641, ppl=6.45, grad_norm=0.84, lr=9.41e-05, throughput=5639 tok/s +2025-11-27 06:01:50,878 - INFO - Epoch 1 Step 2520 (Global: 2520): loss=1.6016, ppl=4.96, grad_norm=0.86, lr=9.40e-05, throughput=5656 tok/s +2025-11-27 06:03:15,781 - INFO - Epoch 1 Step 2530 (Global: 2530): loss=1.6091, ppl=5.00, grad_norm=0.85, lr=9.39e-05, throughput=5654 tok/s +2025-11-27 06:04:40,673 - INFO - Epoch 1 Step 2540 (Global: 2540): loss=1.8871, ppl=6.60, grad_norm=0.86, lr=9.38e-05, throughput=5654 tok/s +2025-11-27 06:06:05,529 - INFO - Epoch 1 Step 2550 (Global: 2550): loss=1.5200, ppl=4.57, grad_norm=0.81, lr=9.37e-05, throughput=5657 tok/s +2025-11-27 06:07:30,459 - INFO - Epoch 1 Step 2560 (Global: 2560): loss=1.8280, ppl=6.22, grad_norm=0.90, lr=9.37e-05, throughput=5652 tok/s +2025-11-27 06:08:55,271 - INFO - Epoch 1 Step 2570 (Global: 2570): loss=1.7949, ppl=6.02, grad_norm=0.89, lr=9.36e-05, throughput=5660 tok/s +2025-11-27 06:10:20,193 - INFO - Epoch 1 Step 2580 (Global: 2580): loss=1.7517, ppl=5.76, grad_norm=0.84, lr=9.35e-05, throughput=5652 tok/s +2025-11-27 06:11:45,238 - INFO - Epoch 1 Step 2590 (Global: 2590): loss=1.5632, ppl=4.77, grad_norm=0.83, lr=9.34e-05, throughput=5644 tok/s +2025-11-27 06:13:10,322 - INFO - Epoch 1 Step 2600 (Global: 2600): loss=1.8015, ppl=6.06, grad_norm=0.91, lr=9.33e-05, throughput=5642 tok/s +2025-11-27 06:14:35,700 - INFO - Epoch 1 Step 2610 (Global: 2610): loss=1.6334, ppl=5.12, grad_norm=0.84, lr=9.32e-05, throughput=5622 tok/s +2025-11-27 06:16:00,854 - INFO - Epoch 1 Step 2620 (Global: 2620): loss=1.7462, ppl=5.73, grad_norm=0.88, lr=9.32e-05, throughput=5637 tok/s +2025-11-27 06:17:25,942 - INFO - Epoch 1 Step 2630 (Global: 2630): loss=1.7869, ppl=5.97, grad_norm=0.84, lr=9.31e-05, throughput=5641 tok/s +2025-11-27 06:18:51,028 - INFO - Epoch 1 Step 2640 (Global: 2640): loss=1.8176, ppl=6.16, grad_norm=0.84, lr=9.30e-05, throughput=5641 tok/s +2025-11-27 06:20:16,193 - INFO - Epoch 1 Step 2650 (Global: 2650): loss=1.8459, 
ppl=6.33, grad_norm=0.82, lr=9.29e-05, throughput=5636 tok/s +2025-11-27 06:21:41,280 - INFO - Epoch 1 Step 2660 (Global: 2660): loss=1.8371, ppl=6.28, grad_norm=0.85, lr=9.28e-05, throughput=5641 tok/s +2025-11-27 06:23:06,411 - INFO - Epoch 1 Step 2670 (Global: 2670): loss=1.5263, ppl=4.60, grad_norm=0.84, lr=9.27e-05, throughput=5638 tok/s +2025-11-27 06:24:31,443 - INFO - Epoch 1 Step 2680 (Global: 2680): loss=1.6084, ppl=4.99, grad_norm=0.82, lr=9.26e-05, throughput=5645 tok/s +2025-11-27 06:25:56,876 - INFO - Epoch 1 Step 2690 (Global: 2690): loss=1.6086, ppl=5.00, grad_norm=0.93, lr=9.26e-05, throughput=5618 tok/s +2025-11-27 06:27:21,778 - INFO - Epoch 1 Step 2700 (Global: 2700): loss=1.6471, ppl=5.19, grad_norm=0.84, lr=9.25e-05, throughput=5654 tok/s +2025-11-27 06:28:46,539 - INFO - Epoch 1 Step 2710 (Global: 2710): loss=1.4742, ppl=4.37, grad_norm=0.82, lr=9.24e-05, throughput=5663 tok/s +2025-11-27 06:30:11,290 - INFO - Epoch 1 Step 2720 (Global: 2720): loss=1.8677, ppl=6.47, grad_norm=0.90, lr=9.23e-05, throughput=5664 tok/s +2025-11-27 06:31:36,419 - INFO - Epoch 1 Step 2730 (Global: 2730): loss=1.6772, ppl=5.35, grad_norm=0.83, lr=9.22e-05, throughput=5639 tok/s +2025-11-27 06:33:01,263 - INFO - Epoch 1 Step 2740 (Global: 2740): loss=1.6321, ppl=5.11, grad_norm=0.82, lr=9.21e-05, throughput=5658 tok/s +2025-11-27 06:34:26,102 - INFO - Epoch 1 Step 2750 (Global: 2750): loss=1.7205, ppl=5.59, grad_norm=0.89, lr=9.20e-05, throughput=5658 tok/s +2025-11-27 06:35:50,979 - INFO - Epoch 1 Step 2760 (Global: 2760): loss=1.9120, ppl=6.77, grad_norm=0.85, lr=9.19e-05, throughput=5655 tok/s +2025-11-27 06:37:15,775 - INFO - Epoch 1 Step 2770 (Global: 2770): loss=1.8299, ppl=6.23, grad_norm=0.84, lr=9.18e-05, throughput=5661 tok/s +2025-11-27 06:38:40,873 - INFO - Epoch 1 Step 2780 (Global: 2780): loss=1.8571, ppl=6.41, grad_norm=0.93, lr=9.17e-05, throughput=5641 tok/s +2025-11-27 06:40:05,763 - INFO - Epoch 1 Step 2790 (Global: 2790): loss=1.7779, ppl=5.92, grad_norm=0.88, lr=9.17e-05, throughput=5654 tok/s +2025-11-27 06:41:30,671 - INFO - Epoch 1 Step 2800 (Global: 2800): loss=1.5374, ppl=4.65, grad_norm=0.79, lr=9.16e-05, throughput=5653 tok/s +2025-11-27 06:42:55,404 - INFO - Epoch 1 Step 2810 (Global: 2810): loss=1.7023, ppl=5.49, grad_norm=0.87, lr=9.15e-05, throughput=5665 tok/s +2025-11-27 06:44:20,182 - INFO - Epoch 1 Step 2820 (Global: 2820): loss=1.9409, ppl=6.96, grad_norm=0.86, lr=9.14e-05, throughput=5662 tok/s +2025-11-27 06:45:44,956 - INFO - Epoch 1 Step 2830 (Global: 2830): loss=1.6332, ppl=5.12, grad_norm=0.84, lr=9.13e-05, throughput=5662 tok/s +2025-11-27 06:47:09,791 - INFO - Epoch 1 Step 2840 (Global: 2840): loss=1.7170, ppl=5.57, grad_norm=0.84, lr=9.12e-05, throughput=5658 tok/s +2025-11-27 06:48:34,983 - INFO - Epoch 1 Step 2850 (Global: 2850): loss=1.6712, ppl=5.32, grad_norm=0.80, lr=9.11e-05, throughput=5634 tok/s +2025-11-27 06:49:59,706 - INFO - Epoch 1 Step 2860 (Global: 2860): loss=1.7926, ppl=6.00, grad_norm=0.82, lr=9.10e-05, throughput=5666 tok/s +2025-11-27 06:51:24,953 - INFO - Epoch 1 Step 2870 (Global: 2870): loss=1.7081, ppl=5.52, grad_norm=0.82, lr=9.09e-05, throughput=5631 tok/s +2025-11-27 06:52:49,998 - INFO - Epoch 1 Step 2880 (Global: 2880): loss=2.1013, ppl=8.18, grad_norm=0.89, lr=9.08e-05, throughput=5644 tok/s +2025-11-27 06:54:15,149 - INFO - Epoch 1 Step 2890 (Global: 2890): loss=1.6653, ppl=5.29, grad_norm=0.83, lr=9.07e-05, throughput=5637 tok/s +2025-11-27 06:55:40,038 - INFO - Epoch 1 Step 2900 (Global: 2900): loss=1.7495, 
ppl=5.75, grad_norm=0.86, lr=9.06e-05, throughput=5655 tok/s +2025-11-27 06:57:04,790 - INFO - Epoch 1 Step 2910 (Global: 2910): loss=1.5687, ppl=4.80, grad_norm=0.80, lr=9.05e-05, throughput=5664 tok/s +2025-11-27 06:58:29,627 - INFO - Epoch 1 Step 2920 (Global: 2920): loss=1.6798, ppl=5.36, grad_norm=0.81, lr=9.04e-05, throughput=5658 tok/s +2025-11-27 06:59:54,635 - INFO - Epoch 1 Step 2930 (Global: 2930): loss=1.5883, ppl=4.90, grad_norm=0.82, lr=9.03e-05, throughput=5647 tok/s +2025-11-27 07:01:19,713 - INFO - Epoch 1 Step 2940 (Global: 2940): loss=1.7535, ppl=5.77, grad_norm=0.84, lr=9.02e-05, throughput=5642 tok/s +2025-11-27 07:02:44,859 - INFO - Epoch 1 Step 2950 (Global: 2950): loss=1.5798, ppl=4.85, grad_norm=0.82, lr=9.01e-05, throughput=5637 tok/s +2025-11-27 07:04:10,216 - INFO - Epoch 1 Step 2960 (Global: 2960): loss=1.6917, ppl=5.43, grad_norm=0.87, lr=9.00e-05, throughput=5624 tok/s +2025-11-27 07:05:35,465 - INFO - Epoch 1 Step 2970 (Global: 2970): loss=1.6831, ppl=5.38, grad_norm=0.83, lr=8.99e-05, throughput=5631 tok/s +2025-11-27 07:07:00,710 - INFO - Epoch 1 Step 2980 (Global: 2980): loss=1.5351, ppl=4.64, grad_norm=0.82, lr=8.98e-05, throughput=5631 tok/s +2025-11-27 07:08:25,756 - INFO - Epoch 1 Step 2990 (Global: 2990): loss=1.6279, ppl=5.09, grad_norm=0.86, lr=8.97e-05, throughput=5644 tok/s +2025-11-27 07:09:51,044 - INFO - Epoch 1 Step 3000 (Global: 3000): loss=1.8919, ppl=6.63, grad_norm=0.83, lr=8.96e-05, throughput=5628 tok/s +2025-11-27 07:09:51,044 - INFO - +Running validation at step 3000... +2025-11-27 07:14:25,768 - INFO - Validation loss: 1.7356, perplexity: 5.67 +2025-11-27 07:14:25,769 - INFO - Qualitative metrics (n=5): +2025-11-27 07:14:25,769 - INFO - BLEU: 0.1264 +2025-11-27 07:14:25,769 - INFO - METEOR: 0.2128 +2025-11-27 07:14:25,769 - INFO - Edit Distance: 0.6372 +2025-11-27 07:14:25,769 - INFO - F-measure: 0.2332 +2025-11-27 07:14:25,769 - INFO - +====================================================================== +2025-11-27 07:14:25,769 - INFO - Qualitative Evaluation Samples: +2025-11-27 07:14:25,769 - INFO - ====================================================================== +2025-11-27 07:14:25,769 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 07:14:25,769 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 07:14:25,770 - INFO - Generated: ' to the band\'s previous album, and said that the album "is a triumph, a triumph that will make you fall in love with Death Cab for Cutie." In a positive review, he said that the album "is a triumph, a...' +2025-11-27 07:14:25,770 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 07:14:25,770 - INFO - ---------------------------------------------------------------------- +2025-11-27 07:14:25,770 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 07:14:25,770 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 07:14:25,770 - INFO - Generated: 'ans in the United States. The Order of Angell was founded in 1906, and its members were mostly of African American descent. The Order of Angell was founded in 1906, and its members were mostly of Afri...' 
+2025-11-27 07:14:25,770 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-27 07:14:25,770 - INFO - ---------------------------------------------------------------------- +2025-11-27 07:14:25,770 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 07:14:25,770 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 07:14:25,770 - INFO - Generated: ' be killed by the Red Tails. The other three are killed by the Red Tails, and the last one is killed by the Red Tails.\nThe Red Tails (the Red Tails (the Red Tails (the Red Tails (the Red Tails (the Re...' +2025-11-27 07:14:25,771 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-27 07:14:25,771 - INFO - ---------------------------------------------------------------------- +2025-11-27 07:14:25,771 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 07:14:25,771 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 07:14:25,771 - INFO - Generated: ' | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 1988-01-01 | 19...' +2025-11-27 07:14:25,771 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 07:14:25,771 - INFO - ---------------------------------------------------------------------- +2025-11-27 07:14:25,771 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 07:14:25,771 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 07:14:25,771 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 151 ] |\n| Madden NFL 12 ...' +2025-11-27 07:14:25,771 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' 
+2025-11-27 07:14:25,771 - INFO - ---------------------------------------------------------------------- +2025-11-27 07:14:25,772 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_3000.jsonl +2025-11-27 07:14:53,437 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-27 07:14:53,443 - INFO - New best validation loss: 1.7356, perplexity: 5.67 +2025-11-27 07:16:18,932 - INFO - Epoch 1 Step 3010 (Global: 3010): loss=1.7128, ppl=5.54, grad_norm=0.79, lr=8.95e-05, throughput=5615 tok/s +2025-11-27 07:17:44,178 - INFO - Epoch 1 Step 3020 (Global: 3020): loss=1.7315, ppl=5.65, grad_norm=0.80, lr=8.94e-05, throughput=5631 tok/s +2025-11-27 07:19:09,460 - INFO - Epoch 1 Step 3030 (Global: 3030): loss=1.7072, ppl=5.51, grad_norm=0.85, lr=8.93e-05, throughput=5628 tok/s +2025-11-27 07:20:34,913 - INFO - Epoch 1 Step 3040 (Global: 3040): loss=1.7466, ppl=5.73, grad_norm=0.85, lr=8.92e-05, throughput=5617 tok/s +2025-11-27 07:21:59,990 - INFO - Epoch 1 Step 3050 (Global: 3050): loss=1.6557, ppl=5.24, grad_norm=0.77, lr=8.91e-05, throughput=5642 tok/s +2025-11-27 07:23:25,167 - INFO - Epoch 1 Step 3060 (Global: 3060): loss=1.7242, ppl=5.61, grad_norm=0.89, lr=8.90e-05, throughput=5635 tok/s +2025-11-27 07:24:50,157 - INFO - Epoch 1 Step 3070 (Global: 3070): loss=1.5747, ppl=4.83, grad_norm=0.81, lr=8.89e-05, throughput=5648 tok/s +2025-11-27 07:26:15,347 - INFO - Epoch 1 Step 3080 (Global: 3080): loss=1.4967, ppl=4.47, grad_norm=0.78, lr=8.88e-05, throughput=5635 tok/s +2025-11-27 07:27:40,465 - INFO - Epoch 1 Step 3090 (Global: 3090): loss=1.8103, ppl=6.11, grad_norm=0.79, lr=8.87e-05, throughput=5639 tok/s +2025-11-27 07:29:05,412 - INFO - Epoch 1 Step 3100 (Global: 3100): loss=1.8405, ppl=6.30, grad_norm=0.82, lr=8.86e-05, throughput=5651 tok/s +2025-11-27 07:30:30,311 - INFO - Epoch 1 Step 3110 (Global: 3110): loss=1.8821, ppl=6.57, grad_norm=0.86, lr=8.85e-05, throughput=5654 tok/s +2025-11-27 07:31:55,129 - INFO - Epoch 1 Step 3120 (Global: 3120): loss=1.7719, ppl=5.88, grad_norm=0.82, lr=8.84e-05, throughput=5659 tok/s +2025-11-27 07:33:19,997 - INFO - Epoch 1 Step 3130 (Global: 3130): loss=1.5521, ppl=4.72, grad_norm=0.82, lr=8.82e-05, throughput=5656 tok/s +2025-11-27 07:34:44,766 - INFO - Epoch 1 Step 3140 (Global: 3140): loss=1.6875, ppl=5.41, grad_norm=0.86, lr=8.81e-05, throughput=5663 tok/s +2025-11-27 07:36:09,695 - INFO - Epoch 1 Step 3150 (Global: 3150): loss=1.6506, ppl=5.21, grad_norm=0.79, lr=8.80e-05, throughput=5652 tok/s +2025-11-27 07:37:34,958 - INFO - Epoch 1 Step 3160 (Global: 3160): loss=1.6189, ppl=5.05, grad_norm=0.89, lr=8.79e-05, throughput=5630 tok/s +2025-11-27 07:38:59,866 - INFO - Epoch 1 Step 3170 (Global: 3170): loss=1.7415, ppl=5.71, grad_norm=0.84, lr=8.78e-05, throughput=5653 tok/s +2025-11-27 07:40:24,625 - INFO - Epoch 1 Step 3180 (Global: 3180): loss=1.7479, ppl=5.74, grad_norm=0.88, lr=8.77e-05, throughput=5663 tok/s +2025-11-27 07:41:49,731 - INFO - Epoch 1 Step 3190 (Global: 3190): loss=1.8141, ppl=6.14, grad_norm=0.81, lr=8.76e-05, throughput=5640 tok/s +2025-11-27 07:43:14,490 - INFO - Epoch 1 Step 3200 (Global: 3200): loss=1.6691, ppl=5.31, grad_norm=0.84, lr=8.75e-05, throughput=5663 tok/s +2025-11-27 07:44:39,296 - INFO - Epoch 1 Step 3210 (Global: 3210): loss=1.6761, ppl=5.34, grad_norm=0.77, lr=8.74e-05, throughput=5660 tok/s +2025-11-27 07:46:04,199 - 
INFO - Epoch 1 Step 3220 (Global: 3220): loss=1.7066, ppl=5.51, grad_norm=0.88, lr=8.73e-05, throughput=5654 tok/s +2025-11-27 07:47:29,072 - INFO - Epoch 1 Step 3230 (Global: 3230): loss=1.7021, ppl=5.49, grad_norm=0.78, lr=8.71e-05, throughput=5656 tok/s +2025-11-27 07:48:53,934 - INFO - Epoch 1 Step 3240 (Global: 3240): loss=1.9165, ppl=6.80, grad_norm=0.93, lr=8.70e-05, throughput=5656 tok/s +2025-11-27 07:50:18,909 - INFO - Epoch 1 Step 3250 (Global: 3250): loss=1.6807, ppl=5.37, grad_norm=0.82, lr=8.69e-05, throughput=5649 tok/s +2025-11-27 07:51:43,884 - INFO - Epoch 1 Step 3260 (Global: 3260): loss=1.5935, ppl=4.92, grad_norm=0.81, lr=8.68e-05, throughput=5649 tok/s +2025-11-27 07:53:08,853 - INFO - Epoch 1 Step 3270 (Global: 3270): loss=1.8483, ppl=6.35, grad_norm=0.89, lr=8.67e-05, throughput=5649 tok/s +2025-11-27 07:54:33,800 - INFO - Epoch 1 Step 3280 (Global: 3280): loss=1.7359, ppl=5.67, grad_norm=0.83, lr=8.66e-05, throughput=5651 tok/s +2025-11-27 07:55:59,040 - INFO - Epoch 1 Step 3290 (Global: 3290): loss=1.7124, ppl=5.54, grad_norm=0.83, lr=8.65e-05, throughput=5631 tok/s +2025-11-27 07:57:23,917 - INFO - Epoch 1 Step 3300 (Global: 3300): loss=1.7574, ppl=5.80, grad_norm=0.85, lr=8.63e-05, throughput=5655 tok/s +2025-11-27 07:58:48,807 - INFO - Epoch 1 Step 3310 (Global: 3310): loss=1.7083, ppl=5.52, grad_norm=0.86, lr=8.62e-05, throughput=5654 tok/s +2025-11-27 08:00:13,732 - INFO - Epoch 1 Step 3320 (Global: 3320): loss=1.5933, ppl=4.92, grad_norm=0.78, lr=8.61e-05, throughput=5652 tok/s +2025-11-27 08:01:38,535 - INFO - Epoch 1 Step 3330 (Global: 3330): loss=1.7795, ppl=5.93, grad_norm=0.81, lr=8.60e-05, throughput=5660 tok/s +2025-11-27 08:03:03,373 - INFO - Epoch 1 Step 3340 (Global: 3340): loss=1.5777, ppl=4.84, grad_norm=0.79, lr=8.59e-05, throughput=5658 tok/s +2025-11-27 08:04:28,362 - INFO - Epoch 1 Step 3350 (Global: 3350): loss=1.7569, ppl=5.79, grad_norm=0.81, lr=8.58e-05, throughput=5648 tok/s +2025-11-27 08:05:53,176 - INFO - Epoch 1 Step 3360 (Global: 3360): loss=1.5477, ppl=4.70, grad_norm=0.76, lr=8.57e-05, throughput=5660 tok/s +2025-11-27 08:07:18,268 - INFO - Epoch 1 Step 3370 (Global: 3370): loss=1.5622, ppl=4.77, grad_norm=0.87, lr=8.55e-05, throughput=5641 tok/s +2025-11-27 08:08:43,126 - INFO - Epoch 1 Step 3380 (Global: 3380): loss=1.6316, ppl=5.11, grad_norm=0.78, lr=8.54e-05, throughput=5657 tok/s +2025-11-27 08:10:08,194 - INFO - Epoch 1 Step 3390 (Global: 3390): loss=1.8853, ppl=6.59, grad_norm=0.85, lr=8.53e-05, throughput=5643 tok/s +2025-11-27 08:11:32,936 - INFO - Epoch 1 Step 3400 (Global: 3400): loss=1.6770, ppl=5.35, grad_norm=0.80, lr=8.52e-05, throughput=5664 tok/s +2025-11-27 08:12:57,627 - INFO - Epoch 1 Step 3410 (Global: 3410): loss=1.6487, ppl=5.20, grad_norm=0.84, lr=8.51e-05, throughput=5668 tok/s +2025-11-27 08:14:22,651 - INFO - Epoch 1 Step 3420 (Global: 3420): loss=1.7759, ppl=5.91, grad_norm=0.81, lr=8.49e-05, throughput=5646 tok/s +2025-11-27 08:15:47,727 - INFO - Epoch 1 Step 3430 (Global: 3430): loss=1.6266, ppl=5.09, grad_norm=0.82, lr=8.48e-05, throughput=5642 tok/s +2025-11-27 08:17:12,838 - INFO - Epoch 1 Step 3440 (Global: 3440): loss=1.9523, ppl=7.04, grad_norm=0.89, lr=8.47e-05, throughput=5640 tok/s +2025-11-27 08:18:37,959 - INFO - Epoch 1 Step 3450 (Global: 3450): loss=1.8866, ppl=6.60, grad_norm=0.82, lr=8.46e-05, throughput=5639 tok/s +2025-11-27 08:20:03,078 - INFO - Epoch 1 Step 3460 (Global: 3460): loss=1.6412, ppl=5.16, grad_norm=0.83, lr=8.45e-05, throughput=5639 tok/s +2025-11-27 08:21:28,503 - INFO 
- Epoch 1 Step 3470 (Global: 3470): loss=1.5101, ppl=4.53, grad_norm=0.84, lr=8.43e-05, throughput=5619 tok/s +2025-11-27 08:22:53,687 - INFO - Epoch 1 Step 3480 (Global: 3480): loss=1.6781, ppl=5.36, grad_norm=0.82, lr=8.42e-05, throughput=5635 tok/s +2025-11-27 08:24:19,135 - INFO - Epoch 1 Step 3490 (Global: 3490): loss=1.8415, ppl=6.31, grad_norm=0.88, lr=8.41e-05, throughput=5617 tok/s +2025-11-27 08:25:44,222 - INFO - Epoch 1 Step 3500 (Global: 3500): loss=1.5792, ppl=4.85, grad_norm=0.82, lr=8.40e-05, throughput=5641 tok/s +2025-11-27 08:25:44,222 - INFO - +Running validation at step 3500... +2025-11-27 08:30:17,066 - INFO - Validation loss: 1.7211, perplexity: 5.59 +2025-11-27 08:30:17,067 - INFO - Qualitative metrics (n=5): +2025-11-27 08:30:17,067 - INFO - BLEU: 0.1436 +2025-11-27 08:30:17,067 - INFO - METEOR: 0.1916 +2025-11-27 08:30:17,067 - INFO - Edit Distance: 0.6243 +2025-11-27 08:30:17,067 - INFO - F-measure: 0.2308 +2025-11-27 08:30:17,067 - INFO - +====================================================================== +2025-11-27 08:30:17,067 - INFO - Qualitative Evaluation Samples: +2025-11-27 08:30:17,068 - INFO - ====================================================================== +2025-11-27 08:30:17,068 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 08:30:17,068 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 08:30:17,068 - INFO - Generated: ' to the band\'s previous album, The Last Day of the World, saying that it "isn\'t a bad album, but it\'s not a great one either." In a 2016 interview with The Boston Globe, Gibbard said that he was "very...' +2025-11-27 08:30:17,068 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 08:30:17,068 - INFO - ---------------------------------------------------------------------- +2025-11-27 08:30:17,068 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 08:30:17,068 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 08:30:17,068 - INFO - Generated: 'aternity life. The first fraternity house was built in 1891, and the first fraternity house was built in 1892. The first fraternity house was built in 1892, and the first fraternity house was built in...' +2025-11-27 08:30:17,068 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-27 08:30:17,068 - INFO - ---------------------------------------------------------------------- +2025-11-27 08:30:17,069 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 08:30:17,069 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 08:30:17,069 - INFO - Generated: " be killed by the Red Tails. Teimou is killed by the Red Tails, but his body is later found by the Red Tails' leader, the Red Death, who is revealed to be the one who killed Teimou. The Red Death is r..." +2025-11-27 08:30:17,069 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." 
+2025-11-27 08:30:17,069 - INFO - ---------------------------------------------------------------------- +2025-11-27 08:30:17,069 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 08:30:17,069 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 08:30:17,069 - INFO - Generated: '-01-01 | 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0B0, 0B0B0..B0...' +2025-11-27 08:30:17,069 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 08:30:17,069 - INFO - ---------------------------------------------------------------------- +2025-11-27 08:30:17,069 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 08:30:17,069 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 08:30:17,070 - INFO - Generated: '1 | Windows | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 08:30:17,070 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 08:30:17,070 - INFO - ---------------------------------------------------------------------- +2025-11-27 08:30:17,071 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_3500.jsonl +2025-11-27 08:30:46,069 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-27 08:30:46,076 - INFO - New best validation loss: 1.7211, perplexity: 5.59 +2025-11-27 08:32:11,130 - INFO - Epoch 1 Step 3510 (Global: 3510): loss=1.4760, ppl=4.38, grad_norm=0.84, lr=8.38e-05, throughput=5644 tok/s +2025-11-27 08:33:36,087 - INFO - Epoch 1 Step 3520 (Global: 3520): loss=1.5697, ppl=4.81, grad_norm=0.79, lr=8.37e-05, throughput=5650 tok/s +2025-11-27 08:35:01,086 - INFO - Epoch 1 Step 3530 (Global: 3530): loss=1.8770, ppl=6.53, grad_norm=0.81, lr=8.36e-05, throughput=5647 tok/s +2025-11-27 08:36:25,917 - INFO - Epoch 1 Step 3540 (Global: 3540): loss=1.6864, ppl=5.40, grad_norm=0.88, lr=8.35e-05, throughput=5658 tok/s +2025-11-27 08:37:50,832 - INFO - Epoch 1 Step 3550 (Global: 3550): loss=1.6771, ppl=5.35, grad_norm=0.84, lr=8.33e-05, throughput=5653 tok/s +2025-11-27 08:39:15,999 - INFO - Epoch 1 Step 3560 (Global: 3560): loss=1.6093, ppl=5.00, grad_norm=0.82, lr=8.32e-05, throughput=5636 tok/s +2025-11-27 08:40:40,848 - INFO - Epoch 1 Step 3570 (Global: 3570): loss=1.9380, ppl=6.94, grad_norm=0.92, lr=8.31e-05, throughput=5657 tok/s +2025-11-27 08:42:05,774 - INFO - Epoch 1 Step 3580 (Global: 3580): loss=1.6315, ppl=5.11, grad_norm=0.79, lr=8.30e-05, throughput=5652 tok/s +2025-11-27 08:43:31,154 - INFO - Epoch 1 Step 3590 (Global: 3590): loss=1.8025, ppl=6.06, grad_norm=0.85, lr=8.28e-05, throughput=5622 tok/s +2025-11-27 08:44:55,995 - INFO - Epoch 1 Step 3600 (Global: 3600): loss=1.6613, ppl=5.27, grad_norm=0.79, lr=8.27e-05, throughput=5658 tok/s +2025-11-27 08:46:21,131 - INFO - Epoch 1 Step 3610 (Global: 3610): loss=1.5753, ppl=4.83, grad_norm=0.79, lr=8.26e-05, throughput=5638 tok/s +2025-11-27 08:47:46,144 - INFO - Epoch 1 Step 3620 (Global: 3620): loss=1.8807, ppl=6.56, grad_norm=0.85, lr=8.25e-05, throughput=5646 tok/s +2025-11-27 08:49:11,115 - INFO - Epoch 1 Step 3630 (Global: 3630): loss=1.5980, ppl=4.94, grad_norm=0.80, lr=8.23e-05, 
throughput=5649 tok/s +2025-11-27 08:50:36,015 - INFO - Epoch 1 Step 3640 (Global: 3640): loss=1.6296, ppl=5.10, grad_norm=0.86, lr=8.22e-05, throughput=5654 tok/s +2025-11-27 08:52:01,544 - INFO - Epoch 1 Step 3650 (Global: 3650): loss=1.7970, ppl=6.03, grad_norm=0.88, lr=8.21e-05, throughput=5612 tok/s +2025-11-27 08:53:26,828 - INFO - Epoch 1 Step 3660 (Global: 3660): loss=1.6159, ppl=5.03, grad_norm=0.77, lr=8.20e-05, throughput=5628 tok/s +2025-11-27 08:54:51,699 - INFO - Epoch 1 Step 3670 (Global: 3670): loss=1.7257, ppl=5.62, grad_norm=0.79, lr=8.18e-05, throughput=5656 tok/s +2025-11-27 08:56:16,610 - INFO - Epoch 1 Step 3680 (Global: 3680): loss=1.5703, ppl=4.81, grad_norm=0.78, lr=8.17e-05, throughput=5653 tok/s +2025-11-27 08:57:41,453 - INFO - Epoch 1 Step 3690 (Global: 3690): loss=1.8822, ppl=6.57, grad_norm=0.83, lr=8.16e-05, throughput=5658 tok/s +2025-11-27 08:59:06,336 - INFO - Epoch 1 Step 3700 (Global: 3700): loss=1.6138, ppl=5.02, grad_norm=0.82, lr=8.14e-05, throughput=5655 tok/s +2025-11-27 09:00:31,968 - INFO - Epoch 1 Step 3710 (Global: 3710): loss=1.8686, ppl=6.48, grad_norm=0.85, lr=8.13e-05, throughput=5605 tok/s +2025-11-27 09:01:56,807 - INFO - Epoch 1 Step 3720 (Global: 3720): loss=1.7154, ppl=5.56, grad_norm=0.88, lr=8.12e-05, throughput=5658 tok/s +2025-11-27 09:03:21,651 - INFO - Epoch 1 Step 3730 (Global: 3730): loss=1.7180, ppl=5.57, grad_norm=0.86, lr=8.10e-05, throughput=5657 tok/s +2025-11-27 09:04:46,959 - INFO - Epoch 1 Step 3740 (Global: 3740): loss=1.6646, ppl=5.28, grad_norm=0.82, lr=8.09e-05, throughput=5627 tok/s +2025-11-27 09:06:12,076 - INFO - Epoch 1 Step 3750 (Global: 3750): loss=1.6070, ppl=4.99, grad_norm=0.91, lr=8.08e-05, throughput=5639 tok/s +2025-11-27 09:07:37,219 - INFO - Epoch 1 Step 3760 (Global: 3760): loss=1.4842, ppl=4.41, grad_norm=0.76, lr=8.06e-05, throughput=5638 tok/s +2025-11-27 09:09:02,157 - INFO - Epoch 1 Step 3770 (Global: 3770): loss=2.0332, ppl=7.64, grad_norm=0.86, lr=8.05e-05, throughput=5651 tok/s +2025-11-27 09:10:27,526 - INFO - Epoch 1 Step 3780 (Global: 3780): loss=1.5973, ppl=4.94, grad_norm=0.75, lr=8.04e-05, throughput=5623 tok/s +2025-11-27 09:11:52,669 - INFO - Epoch 1 Step 3790 (Global: 3790): loss=1.6191, ppl=5.05, grad_norm=0.76, lr=8.02e-05, throughput=5638 tok/s +2025-11-27 09:13:17,809 - INFO - Epoch 1 Step 3800 (Global: 3800): loss=1.7164, ppl=5.56, grad_norm=0.78, lr=8.01e-05, throughput=5638 tok/s +2025-11-27 09:14:43,108 - INFO - Epoch 1 Step 3810 (Global: 3810): loss=1.6773, ppl=5.35, grad_norm=0.79, lr=8.00e-05, throughput=5627 tok/s +2025-11-27 09:16:08,737 - INFO - Epoch 1 Step 3820 (Global: 3820): loss=1.6293, ppl=5.10, grad_norm=0.87, lr=7.98e-05, throughput=5606 tok/s +2025-11-27 09:17:33,968 - INFO - Epoch 1 Step 3830 (Global: 3830): loss=1.7128, ppl=5.54, grad_norm=0.90, lr=7.97e-05, throughput=5632 tok/s +2025-11-27 09:18:59,499 - INFO - Epoch 1 Step 3840 (Global: 3840): loss=1.6750, ppl=5.34, grad_norm=0.81, lr=7.96e-05, throughput=5612 tok/s +2025-11-27 09:20:24,899 - INFO - Epoch 1 Step 3850 (Global: 3850): loss=1.8335, ppl=6.26, grad_norm=0.80, lr=7.94e-05, throughput=5621 tok/s +2025-11-27 09:21:49,998 - INFO - Epoch 1 Step 3860 (Global: 3860): loss=1.7068, ppl=5.51, grad_norm=0.88, lr=7.93e-05, throughput=5641 tok/s +2025-11-27 09:23:15,217 - INFO - Epoch 1 Step 3870 (Global: 3870): loss=1.6741, ppl=5.33, grad_norm=0.77, lr=7.92e-05, throughput=5633 tok/s +2025-11-27 09:24:40,294 - INFO - Epoch 1 Step 3880 (Global: 3880): loss=1.6882, ppl=5.41, grad_norm=0.82, lr=7.90e-05, 
throughput=5642 tok/s +2025-11-27 09:26:05,196 - INFO - Epoch 1 Step 3890 (Global: 3890): loss=1.6339, ppl=5.12, grad_norm=0.77, lr=7.89e-05, throughput=5654 tok/s +2025-11-27 09:27:30,359 - INFO - Epoch 1 Step 3900 (Global: 3900): loss=1.8215, ppl=6.18, grad_norm=0.88, lr=7.88e-05, throughput=5636 tok/s +2025-11-27 09:28:55,309 - INFO - Epoch 1 Step 3910 (Global: 3910): loss=1.6270, ppl=5.09, grad_norm=0.79, lr=7.86e-05, throughput=5650 tok/s +2025-11-27 09:30:20,300 - INFO - Epoch 1 Step 3920 (Global: 3920): loss=1.6861, ppl=5.40, grad_norm=0.78, lr=7.85e-05, throughput=5648 tok/s +2025-11-27 09:31:45,584 - INFO - Epoch 1 Step 3930 (Global: 3930): loss=1.5606, ppl=4.76, grad_norm=0.82, lr=7.83e-05, throughput=5628 tok/s +2025-11-27 09:33:10,719 - INFO - Epoch 1 Step 3940 (Global: 3940): loss=1.5101, ppl=4.53, grad_norm=0.79, lr=7.82e-05, throughput=5638 tok/s +2025-11-27 09:34:35,495 - INFO - Epoch 1 Step 3950 (Global: 3950): loss=1.6309, ppl=5.11, grad_norm=0.81, lr=7.81e-05, throughput=5662 tok/s +2025-11-27 09:36:00,534 - INFO - Epoch 1 Step 3960 (Global: 3960): loss=1.7346, ppl=5.67, grad_norm=0.82, lr=7.79e-05, throughput=5645 tok/s +2025-11-27 09:37:25,467 - INFO - Epoch 1 Step 3970 (Global: 3970): loss=1.5938, ppl=4.92, grad_norm=0.82, lr=7.78e-05, throughput=5652 tok/s +2025-11-27 09:38:50,335 - INFO - Epoch 1 Step 3980 (Global: 3980): loss=1.6834, ppl=5.38, grad_norm=0.86, lr=7.77e-05, throughput=5656 tok/s +2025-11-27 09:40:15,270 - INFO - Epoch 1 Step 3990 (Global: 3990): loss=1.8189, ppl=6.16, grad_norm=0.87, lr=7.75e-05, throughput=5651 tok/s +2025-11-27 09:41:40,317 - INFO - Epoch 1 Step 4000 (Global: 4000): loss=1.6709, ppl=5.32, grad_norm=0.76, lr=7.74e-05, throughput=5644 tok/s +2025-11-27 09:41:40,317 - INFO - +Running validation at step 4000... +2025-11-27 09:46:13,562 - INFO - Validation loss: 1.7024, perplexity: 5.49 +2025-11-27 09:46:13,562 - INFO - Qualitative metrics (n=5): +2025-11-27 09:46:13,562 - INFO - BLEU: 0.1762 +2025-11-27 09:46:13,562 - INFO - METEOR: 0.2400 +2025-11-27 09:46:13,562 - INFO - Edit Distance: 0.6356 +2025-11-27 09:46:13,563 - INFO - F-measure: 0.2839 +2025-11-27 09:46:13,563 - INFO - +====================================================================== +2025-11-27 09:46:13,563 - INFO - Qualitative Evaluation Samples: +2025-11-27 09:46:13,563 - INFO - ====================================================================== +2025-11-27 09:46:13,563 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 09:46:13,563 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 09:46:13,563 - INFO - Generated: ' to the band\'s previous work, saying that "it\'s a little more experimental, a little more experimental, a little more experimental." He also praised the album\'s production, saying that "it\'s a little ...' +2025-11-27 09:46:13,563 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 09:46:13,563 - INFO - ---------------------------------------------------------------------- +2025-11-27 09:46:13,564 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 09:46:13,564 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 09:46:13,564 - INFO - Generated: 'aternities. 
The Order of Angell was founded by a group of African-American students at the University of Michigan, and the fraternity was founded by a group of African-American students at the Univers...' +2025-11-27 09:46:13,564 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-27 09:46:13,564 - INFO - ---------------------------------------------------------------------- +2025-11-27 09:46:13,564 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 09:46:13,564 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 09:46:13,564 - INFO - Generated: ' be killed by the Red Tails. However, they are saved by the Red Tails, who use their powers to defeat the shadow group and save Teimou. Afterwards, they are reunited with their parents, who are reveal...' +2025-11-27 09:46:13,564 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-27 09:46:13,564 - INFO - ---------------------------------------------------------------------- +2025-11-27 09:46:13,564 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 09:46:13,564 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 09:46:13,564 - INFO - Generated: '-1 | 0.0.0.0 | U+0B01..0B03, 0B05..0B08, 0B13..0B28, 0B32..0B33, 0B36..0B39, 0B3C..0B43, 0B47..0B48, 0B4B..0B4D, 0B57, 0B5C..0B5D, 0B61, 0B6C..0B6D, 0B7E..0B7F, 0B81..0B82, 0B83..0B84, 0B85..0B86, 0B8...' +2025-11-27 09:46:13,565 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 09:46:13,565 - INFO - ---------------------------------------------------------------------- +2025-11-27 09:46:13,565 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 09:46:13,565 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 09:46:13,565 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 09:46:13,565 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' 
+2025-11-27 09:46:13,565 - INFO - ---------------------------------------------------------------------- +2025-11-27 09:46:13,566 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_4000.jsonl +2025-11-27 09:46:46,560 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-27 09:46:46,567 - INFO - New best validation loss: 1.7024, perplexity: 5.49 +2025-11-27 09:48:11,567 - INFO - Epoch 1 Step 4010 (Global: 4010): loss=1.5410, ppl=4.67, grad_norm=0.79, lr=7.72e-05, throughput=5648 tok/s +2025-11-27 09:49:36,487 - INFO - Epoch 1 Step 4020 (Global: 4020): loss=1.8588, ppl=6.42, grad_norm=0.85, lr=7.71e-05, throughput=5652 tok/s +2025-11-27 09:51:01,373 - INFO - Epoch 1 Step 4030 (Global: 4030): loss=1.8626, ppl=6.44, grad_norm=0.92, lr=7.70e-05, throughput=5655 tok/s +2025-11-27 09:52:26,281 - INFO - Epoch 1 Step 4040 (Global: 4040): loss=1.8493, ppl=6.36, grad_norm=0.80, lr=7.68e-05, throughput=5653 tok/s +2025-11-27 09:53:51,459 - INFO - Epoch 1 Step 4050 (Global: 4050): loss=1.7331, ppl=5.66, grad_norm=0.88, lr=7.67e-05, throughput=5635 tok/s +2025-11-27 09:55:16,461 - INFO - Epoch 1 Step 4060 (Global: 4060): loss=1.6579, ppl=5.25, grad_norm=0.84, lr=7.65e-05, throughput=5647 tok/s +2025-11-27 09:56:41,356 - INFO - Epoch 1 Step 4070 (Global: 4070): loss=1.6591, ppl=5.25, grad_norm=0.79, lr=7.64e-05, throughput=5654 tok/s +2025-11-27 09:58:06,438 - INFO - Epoch 1 Step 4080 (Global: 4080): loss=1.7917, ppl=6.00, grad_norm=0.84, lr=7.62e-05, throughput=5642 tok/s +2025-11-27 09:59:31,095 - INFO - Epoch 1 Step 4090 (Global: 4090): loss=1.8858, ppl=6.59, grad_norm=0.79, lr=7.61e-05, throughput=5670 tok/s +2025-11-27 10:00:56,201 - INFO - Epoch 1 Step 4100 (Global: 4100): loss=1.6623, ppl=5.27, grad_norm=0.84, lr=7.60e-05, throughput=5640 tok/s +2025-11-27 10:02:21,343 - INFO - Epoch 1 Step 4110 (Global: 4110): loss=1.6113, ppl=5.01, grad_norm=0.84, lr=7.58e-05, throughput=5638 tok/s +2025-11-27 10:03:46,723 - INFO - Epoch 1 Step 4120 (Global: 4120): loss=1.7571, ppl=5.80, grad_norm=0.86, lr=7.57e-05, throughput=5622 tok/s +2025-11-27 10:05:11,754 - INFO - Epoch 1 Step 4130 (Global: 4130): loss=1.8426, ppl=6.31, grad_norm=0.83, lr=7.55e-05, throughput=5645 tok/s +2025-11-27 10:06:36,597 - INFO - Epoch 1 Step 4140 (Global: 4140): loss=1.7074, ppl=5.51, grad_norm=0.86, lr=7.54e-05, throughput=5658 tok/s +2025-11-27 10:08:01,731 - INFO - Epoch 1 Step 4150 (Global: 4150): loss=1.9561, ppl=7.07, grad_norm=0.86, lr=7.52e-05, throughput=5638 tok/s +2025-11-27 10:09:27,244 - INFO - Epoch 1 Step 4160 (Global: 4160): loss=1.5626, ppl=4.77, grad_norm=0.83, lr=7.51e-05, throughput=5613 tok/s +2025-11-27 10:10:52,862 - INFO - Epoch 1 Step 4170 (Global: 4170): loss=1.8059, ppl=6.09, grad_norm=0.83, lr=7.49e-05, throughput=5606 tok/s +2025-11-27 10:12:18,546 - INFO - Epoch 1 Step 4180 (Global: 4180): loss=1.6394, ppl=5.15, grad_norm=0.76, lr=7.48e-05, throughput=5602 tok/s +2025-11-27 10:13:44,041 - INFO - Epoch 1 Step 4190 (Global: 4190): loss=1.5582, ppl=4.75, grad_norm=0.77, lr=7.47e-05, throughput=5614 tok/s +2025-11-27 10:15:09,740 - INFO - Epoch 1 Step 4200 (Global: 4200): loss=1.7008, ppl=5.48, grad_norm=0.87, lr=7.45e-05, throughput=5601 tok/s +2025-11-27 10:16:35,500 - INFO - Epoch 1 Step 4210 (Global: 4210): loss=1.6250, ppl=5.08, grad_norm=0.80, lr=7.44e-05, throughput=5597 tok/s +2025-11-27 10:18:01,027 - 
INFO - Epoch 1 Step 4220 (Global: 4220): loss=1.7011, ppl=5.48, grad_norm=0.82, lr=7.42e-05, throughput=5612 tok/s +2025-11-27 10:19:26,419 - INFO - Epoch 1 Step 4230 (Global: 4230): loss=1.7150, ppl=5.56, grad_norm=0.76, lr=7.41e-05, throughput=5621 tok/s +2025-11-27 10:20:52,234 - INFO - Epoch 1 Step 4240 (Global: 4240): loss=1.7175, ppl=5.57, grad_norm=0.87, lr=7.39e-05, throughput=5593 tok/s +2025-11-27 10:22:17,793 - INFO - Epoch 1 Step 4250 (Global: 4250): loss=1.4965, ppl=4.47, grad_norm=0.79, lr=7.38e-05, throughput=5610 tok/s +2025-11-27 10:23:43,525 - INFO - Epoch 1 Step 4260 (Global: 4260): loss=1.6031, ppl=4.97, grad_norm=0.89, lr=7.36e-05, throughput=5599 tok/s +2025-11-27 10:25:08,969 - INFO - Epoch 1 Step 4270 (Global: 4270): loss=1.4976, ppl=4.47, grad_norm=0.80, lr=7.35e-05, throughput=5618 tok/s +2025-11-27 10:26:34,446 - INFO - Epoch 1 Step 4280 (Global: 4280): loss=1.6918, ppl=5.43, grad_norm=0.80, lr=7.33e-05, throughput=5616 tok/s +2025-11-27 10:27:59,741 - INFO - Epoch 1 Step 4290 (Global: 4290): loss=1.8753, ppl=6.52, grad_norm=0.86, lr=7.32e-05, throughput=5628 tok/s +2025-11-27 10:29:25,184 - INFO - Epoch 1 Step 4300 (Global: 4300): loss=1.5425, ppl=4.68, grad_norm=0.78, lr=7.30e-05, throughput=5618 tok/s +2025-11-27 10:30:50,630 - INFO - Epoch 1 Step 4310 (Global: 4310): loss=1.8290, ppl=6.23, grad_norm=0.79, lr=7.29e-05, throughput=5618 tok/s +2025-11-27 10:32:16,113 - INFO - Epoch 1 Step 4320 (Global: 4320): loss=1.8017, ppl=6.06, grad_norm=0.80, lr=7.27e-05, throughput=5615 tok/s +2025-11-27 10:33:41,506 - INFO - Epoch 1 Step 4330 (Global: 4330): loss=1.5814, ppl=4.86, grad_norm=0.80, lr=7.26e-05, throughput=5621 tok/s +2025-11-27 10:35:06,924 - INFO - Epoch 1 Step 4340 (Global: 4340): loss=1.8023, ppl=6.06, grad_norm=0.82, lr=7.24e-05, throughput=5619 tok/s +2025-11-27 10:36:32,144 - INFO - Epoch 1 Step 4350 (Global: 4350): loss=1.6004, ppl=4.96, grad_norm=0.79, lr=7.23e-05, throughput=5633 tok/s +2025-11-27 10:37:58,061 - INFO - Epoch 1 Step 4360 (Global: 4360): loss=1.5419, ppl=4.67, grad_norm=0.86, lr=7.21e-05, throughput=5587 tok/s +2025-11-27 10:39:23,925 - INFO - Epoch 1 Step 4370 (Global: 4370): loss=1.6693, ppl=5.31, grad_norm=0.83, lr=7.20e-05, throughput=5590 tok/s +2025-11-27 10:40:49,292 - INFO - Epoch 1 Step 4380 (Global: 4380): loss=1.5466, ppl=4.70, grad_norm=0.80, lr=7.18e-05, throughput=5623 tok/s +2025-11-27 10:42:14,847 - INFO - Epoch 1 Step 4390 (Global: 4390): loss=1.6891, ppl=5.41, grad_norm=0.81, lr=7.17e-05, throughput=5610 tok/s +2025-11-27 10:43:40,121 - INFO - Epoch 1 Step 4400 (Global: 4400): loss=1.6083, ppl=4.99, grad_norm=0.84, lr=7.15e-05, throughput=5629 tok/s +2025-11-27 10:45:05,433 - INFO - Epoch 1 Step 4410 (Global: 4410): loss=1.6622, ppl=5.27, grad_norm=0.92, lr=7.14e-05, throughput=5626 tok/s +2025-11-27 10:46:30,699 - INFO - Epoch 1 Step 4420 (Global: 4420): loss=1.7171, ppl=5.57, grad_norm=0.86, lr=7.12e-05, throughput=5629 tok/s +2025-11-27 10:47:56,008 - INFO - Epoch 1 Step 4430 (Global: 4430): loss=1.4794, ppl=4.39, grad_norm=0.75, lr=7.11e-05, throughput=5627 tok/s +2025-11-27 10:49:21,546 - INFO - Epoch 1 Step 4440 (Global: 4440): loss=1.6340, ppl=5.12, grad_norm=0.81, lr=7.09e-05, throughput=5612 tok/s +2025-11-27 10:50:46,370 - INFO - Epoch 1 Step 4450 (Global: 4450): loss=1.8179, ppl=6.16, grad_norm=0.82, lr=7.08e-05, throughput=5659 tok/s +2025-11-27 10:52:11,601 - INFO - Epoch 1 Step 4460 (Global: 4460): loss=1.7701, ppl=5.87, grad_norm=0.83, lr=7.06e-05, throughput=5632 tok/s +2025-11-27 10:53:36,204 - INFO 
- Epoch 1 Step 4470 (Global: 4470): loss=1.6144, ppl=5.02, grad_norm=0.78, lr=7.05e-05, throughput=5674 tok/s +2025-11-27 10:55:01,155 - INFO - Epoch 1 Step 4480 (Global: 4480): loss=1.8918, ppl=6.63, grad_norm=0.83, lr=7.03e-05, throughput=5650 tok/s +2025-11-27 10:56:26,561 - INFO - Epoch 1 Step 4490 (Global: 4490): loss=1.4378, ppl=4.21, grad_norm=0.82, lr=7.02e-05, throughput=5620 tok/s +2025-11-27 10:57:51,985 - INFO - Epoch 1 Step 4500 (Global: 4500): loss=1.8419, ppl=6.31, grad_norm=0.83, lr=7.00e-05, throughput=5619 tok/s +2025-11-27 10:57:51,985 - INFO - +Running validation at step 4500... +2025-11-27 11:02:27,081 - INFO - Validation loss: 1.6849, perplexity: 5.39 +2025-11-27 11:02:27,082 - INFO - Qualitative metrics (n=5): +2025-11-27 11:02:27,082 - INFO - BLEU: 0.1436 +2025-11-27 11:02:27,082 - INFO - METEOR: 0.1948 +2025-11-27 11:02:27,082 - INFO - Edit Distance: 0.6115 +2025-11-27 11:02:27,082 - INFO - F-measure: 0.2269 +2025-11-27 11:02:27,082 - INFO - +====================================================================== +2025-11-27 11:02:27,083 - INFO - Qualitative Evaluation Samples: +2025-11-27 11:02:27,083 - INFO - ====================================================================== +2025-11-27 11:02:27,083 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 11:02:27,083 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 11:02:27,083 - INFO - Generated: ' to the band\'s previous work, saying that it "isn\'t a bad record, but it\'s not a great one, either. It\'s not a bad record, but it\'s not a great one, either. It\'s not a bad record, but it\'s not a great...' +2025-11-27 11:02:27,083 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 11:02:27,083 - INFO - ---------------------------------------------------------------------- +2025-11-27 11:02:27,083 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 11:02:27,083 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 11:02:27,083 - INFO - Generated: 'aternal organizations in the United States. The Order of Angell was founded in 1921, and the Order of Michigamua was founded in 1922. The Order of Michigamua was founded in 1923, and the Order of Ange...' +2025-11-27 11:02:27,083 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-27 11:02:27,083 - INFO - ---------------------------------------------------------------------- +2025-11-27 11:02:27,084 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 11:02:27,084 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 11:02:27,084 - INFO - Generated: ' be killed by Oga. They are later killed by the Red Tails, who are later killed by the Six Knights.\nMiki\nVoiced by: Yūki Kaji\nMiki is the second leader of the Teimou Academy. She is a young girl with ...' +2025-11-27 11:02:27,084 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." 
+2025-11-27 11:02:27,084 - INFO - ---------------------------------------------------------------------- +2025-11-27 11:02:27,084 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 11:02:27,084 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 11:02:27,084 - INFO - Generated: ' 1.0 | U+0B01..0B0F, 0B10..0B20, 0B30..0B40, 0B50..0B59, 0B60..0B69, 0B70..0B79, 0B80..0B89, 0B90..0B9F, 0BA0..0BA9, 0BB0..0BB9, 0BC0..0BCD, 0BD0..0BD9, 0BE0..0BE9, 0BF0..0BF9, 0C00..0C0FF, 0C10..0C1F...' +2025-11-27 11:02:27,084 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 11:02:27,084 - INFO - ---------------------------------------------------------------------- +2025-11-27 11:02:27,084 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 11:02:27,085 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 11:02:27,085 - INFO - Generated: '1 | iOS | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 11:02:27,085 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 11:02:27,085 - INFO - ---------------------------------------------------------------------- +2025-11-27 11:02:27,086 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_4500.jsonl +2025-11-27 11:02:56,715 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-27 11:02:56,722 - INFO - New best validation loss: 1.6849, perplexity: 5.39 +2025-11-27 11:04:22,047 - INFO - Epoch 1 Step 4510 (Global: 4510): loss=1.5375, ppl=4.65, grad_norm=0.80, lr=6.99e-05, throughput=5626 tok/s +2025-11-27 11:05:47,420 - INFO - Epoch 1 Step 4520 (Global: 4520): loss=1.6717, ppl=5.32, grad_norm=0.77, lr=6.97e-05, throughput=5622 tok/s +2025-11-27 11:07:12,946 - INFO - Epoch 1 Step 4530 (Global: 4530): loss=1.6209, ppl=5.06, grad_norm=0.86, lr=6.96e-05, throughput=5612 tok/s +2025-11-27 11:08:38,431 - INFO - Epoch 1 Step 4540 (Global: 4540): loss=1.6528, ppl=5.22, grad_norm=0.81, lr=6.94e-05, throughput=5615 tok/s +2025-11-27 11:10:03,902 - INFO - Epoch 1 Step 4550 (Global: 4550): loss=1.5125, ppl=4.54, grad_norm=0.72, lr=6.92e-05, throughput=5616 tok/s +2025-11-27 11:11:29,280 - INFO - Epoch 1 Step 4560 (Global: 4560): loss=1.5844, ppl=4.88, grad_norm=0.78, lr=6.91e-05, throughput=5622 tok/s +2025-11-27 11:12:54,505 - INFO - Epoch 1 Step 4570 (Global: 4570): loss=1.3709, ppl=3.94, grad_norm=0.77, lr=6.89e-05, throughput=5632 tok/s +2025-11-27 11:14:19,716 - INFO - Epoch 1 Step 4580 (Global: 4580): loss=1.5766, ppl=4.84, grad_norm=0.78, lr=6.88e-05, throughput=5633 tok/s +2025-11-27 11:15:45,762 - INFO - Epoch 1 Step 4590 (Global: 4590): loss=1.6419, ppl=5.17, grad_norm=0.79, lr=6.86e-05, throughput=5578 tok/s +2025-11-27 11:17:11,043 - INFO - Epoch 1 Step 4600 (Global: 4600): loss=1.7262, ppl=5.62, grad_norm=0.80, lr=6.85e-05, throughput=5629 tok/s +2025-11-27 11:18:36,617 - INFO - Epoch 1 Step 4610 (Global: 4610): loss=1.6232, ppl=5.07, grad_norm=0.83, lr=6.83e-05, throughput=5609 tok/s +2025-11-27 11:20:02,110 - INFO - Epoch 1 Step 4620 (Global: 4620): loss=1.4778, ppl=4.38, grad_norm=0.79, lr=6.82e-05, throughput=5615 tok/s +2025-11-27 11:21:27,398 - INFO - Epoch 1 Step 4630 (Global: 4630): loss=1.8449, ppl=6.33, grad_norm=1.28, lr=6.80e-05, 
throughput=5628 tok/s +2025-11-27 11:22:52,539 - INFO - Epoch 1 Step 4640 (Global: 4640): loss=1.5561, ppl=4.74, grad_norm=0.80, lr=6.78e-05, throughput=5638 tok/s +2025-11-27 11:24:17,212 - INFO - Epoch 1 Step 4650 (Global: 4650): loss=2.0052, ppl=7.43, grad_norm=0.84, lr=6.77e-05, throughput=5669 tok/s +2025-11-27 11:25:43,388 - INFO - Epoch 1 Step 4660 (Global: 4660): loss=1.5378, ppl=4.65, grad_norm=0.79, lr=6.75e-05, throughput=5570 tok/s +2025-11-27 11:27:08,603 - INFO - Epoch 1 Step 4670 (Global: 4670): loss=1.7270, ppl=5.62, grad_norm=0.84, lr=6.74e-05, throughput=5633 tok/s +2025-11-27 11:28:33,945 - INFO - Epoch 1 Step 4680 (Global: 4680): loss=1.5689, ppl=4.80, grad_norm=0.80, lr=6.72e-05, throughput=5624 tok/s +2025-11-27 11:29:59,170 - INFO - Epoch 1 Step 4690 (Global: 4690): loss=1.4991, ppl=4.48, grad_norm=0.82, lr=6.71e-05, throughput=5632 tok/s +2025-11-27 11:31:24,838 - INFO - Epoch 1 Step 4700 (Global: 4700): loss=1.6703, ppl=5.31, grad_norm=0.80, lr=6.69e-05, throughput=5603 tok/s +2025-11-27 11:32:50,475 - INFO - Epoch 1 Step 4710 (Global: 4710): loss=1.5117, ppl=4.53, grad_norm=0.77, lr=6.67e-05, throughput=5605 tok/s +2025-11-27 11:34:15,853 - INFO - Epoch 1 Step 4720 (Global: 4720): loss=1.4894, ppl=4.43, grad_norm=0.76, lr=6.66e-05, throughput=5622 tok/s +2025-11-27 11:35:41,754 - INFO - Epoch 1 Step 4730 (Global: 4730): loss=1.8549, ppl=6.39, grad_norm=0.79, lr=6.64e-05, throughput=5588 tok/s +2025-11-27 11:37:06,940 - INFO - Epoch 1 Step 4740 (Global: 4740): loss=1.7866, ppl=5.97, grad_norm=0.83, lr=6.63e-05, throughput=5635 tok/s +2025-11-27 11:38:32,089 - INFO - Epoch 1 Step 4750 (Global: 4750): loss=1.5290, ppl=4.61, grad_norm=0.75, lr=6.61e-05, throughput=5637 tok/s +2025-11-27 11:39:57,927 - INFO - Epoch 1 Step 4760 (Global: 4760): loss=1.6684, ppl=5.30, grad_norm=0.85, lr=6.60e-05, throughput=5592 tok/s +2025-11-27 11:41:23,846 - INFO - Epoch 1 Step 4770 (Global: 4770): loss=1.7047, ppl=5.50, grad_norm=0.79, lr=6.58e-05, throughput=5587 tok/s +2025-11-27 11:42:49,234 - INFO - Epoch 1 Step 4780 (Global: 4780): loss=1.8394, ppl=6.29, grad_norm=0.78, lr=6.56e-05, throughput=5621 tok/s +2025-11-27 11:44:14,478 - INFO - Epoch 1 Step 4790 (Global: 4790): loss=1.5939, ppl=4.92, grad_norm=1.08, lr=6.55e-05, throughput=5631 tok/s +2025-11-27 11:45:39,987 - INFO - Epoch 1 Step 4800 (Global: 4800): loss=1.6436, ppl=5.17, grad_norm=0.81, lr=6.53e-05, throughput=5614 tok/s +2025-11-27 11:47:05,166 - INFO - Epoch 1 Step 4810 (Global: 4810): loss=1.6661, ppl=5.29, grad_norm=0.80, lr=6.52e-05, throughput=5635 tok/s +2025-11-27 11:48:30,395 - INFO - Epoch 1 Step 4820 (Global: 4820): loss=1.3430, ppl=3.83, grad_norm=0.74, lr=6.50e-05, throughput=5632 tok/s +2025-11-27 11:49:55,660 - INFO - Epoch 1 Step 4830 (Global: 4830): loss=1.5553, ppl=4.74, grad_norm=0.76, lr=6.48e-05, throughput=5630 tok/s +2025-11-27 11:51:20,612 - INFO - Epoch 1 Step 4840 (Global: 4840): loss=1.7271, ppl=5.62, grad_norm=0.86, lr=6.47e-05, throughput=5650 tok/s +2025-11-27 11:52:45,919 - INFO - Epoch 1 Step 4850 (Global: 4850): loss=1.5672, ppl=4.79, grad_norm=0.76, lr=6.45e-05, throughput=5627 tok/s +2025-11-27 11:54:11,546 - INFO - Epoch 1 Step 4860 (Global: 4860): loss=1.5556, ppl=4.74, grad_norm=0.76, lr=6.44e-05, throughput=5606 tok/s +2025-11-27 11:55:36,599 - INFO - Epoch 1 Step 4870 (Global: 4870): loss=1.5565, ppl=4.74, grad_norm=0.82, lr=6.42e-05, throughput=5644 tok/s +2025-11-27 11:57:01,495 - INFO - Epoch 1 Step 4880 (Global: 4880): loss=1.4492, ppl=4.26, grad_norm=0.78, lr=6.40e-05, 
throughput=5654 tok/s +2025-11-27 11:58:26,459 - INFO - Epoch 1 Step 4890 (Global: 4890): loss=1.5993, ppl=4.95, grad_norm=0.76, lr=6.39e-05, throughput=5649 tok/s +2025-11-27 11:59:51,518 - INFO - Epoch 1 Step 4900 (Global: 4900): loss=1.6937, ppl=5.44, grad_norm=0.80, lr=6.37e-05, throughput=5643 tok/s +2025-11-27 12:01:17,125 - INFO - Epoch 1 Step 4910 (Global: 4910): loss=1.5328, ppl=4.63, grad_norm=0.81, lr=6.35e-05, throughput=5607 tok/s +2025-11-27 12:02:42,213 - INFO - Epoch 1 Step 4920 (Global: 4920): loss=1.5344, ppl=4.64, grad_norm=0.74, lr=6.34e-05, throughput=5641 tok/s +2025-11-27 12:04:07,532 - INFO - Epoch 1 Step 4930 (Global: 4930): loss=1.7482, ppl=5.74, grad_norm=0.80, lr=6.32e-05, throughput=5626 tok/s +2025-11-27 12:05:32,842 - INFO - Epoch 1 Step 4940 (Global: 4940): loss=1.7000, ppl=5.47, grad_norm=0.79, lr=6.31e-05, throughput=5627 tok/s +2025-11-27 12:06:58,124 - INFO - Epoch 1 Step 4950 (Global: 4950): loss=1.6824, ppl=5.38, grad_norm=1.41, lr=6.29e-05, throughput=5628 tok/s +2025-11-27 12:08:23,700 - INFO - Epoch 1 Step 4960 (Global: 4960): loss=1.6489, ppl=5.20, grad_norm=0.77, lr=6.27e-05, throughput=5609 tok/s +2025-11-27 12:09:49,543 - INFO - Epoch 1 Step 4970 (Global: 4970): loss=1.6733, ppl=5.33, grad_norm=0.80, lr=6.26e-05, throughput=5592 tok/s +2025-11-27 12:11:15,398 - INFO - Epoch 1 Step 4980 (Global: 4980): loss=1.4012, ppl=4.06, grad_norm=0.78, lr=6.24e-05, throughput=5591 tok/s +2025-11-27 12:12:40,872 - INFO - Epoch 1 Step 4990 (Global: 4990): loss=2.0313, ppl=7.62, grad_norm=0.82, lr=6.23e-05, throughput=5616 tok/s +2025-11-27 12:14:06,192 - INFO - Epoch 1 Step 5000 (Global: 5000): loss=1.7543, ppl=5.78, grad_norm=0.83, lr=6.21e-05, throughput=5626 tok/s +2025-11-27 12:14:06,193 - INFO - +Running validation at step 5000... +2025-11-27 12:18:43,131 - INFO - Validation loss: 1.6682, perplexity: 5.30 +2025-11-27 12:18:43,132 - INFO - Qualitative metrics (n=5): +2025-11-27 12:18:43,132 - INFO - BLEU: 0.1441 +2025-11-27 12:18:43,132 - INFO - METEOR: 0.2015 +2025-11-27 12:18:43,132 - INFO - Edit Distance: 0.6401 +2025-11-27 12:18:43,132 - INFO - F-measure: 0.2506 +2025-11-27 12:18:43,132 - INFO - +====================================================================== +2025-11-27 12:18:43,132 - INFO - Qualitative Evaluation Samples: +2025-11-27 12:18:43,132 - INFO - ====================================================================== +2025-11-27 12:18:43,132 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 12:18:43,133 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 12:18:43,133 - INFO - Generated: ' to the band\'s previous work, saying that "the band\'s new album is a more mature, more confident, and more confident album than their previous work, and it\'s a more mature, more confident album than t...' +2025-11-27 12:18:43,133 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 12:18:43,133 - INFO - ---------------------------------------------------------------------- +2025-11-27 12:18:43,133 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 12:18:43,133 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 12:18:43,133 - INFO - Generated: 'aternal organizations in the United States. 
The Order was not a fraternal organization, and its members were not required to be members of a fraternal organization. The Order was not a member of the N...' +2025-11-27 12:18:43,133 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-27 12:18:43,133 - INFO - ---------------------------------------------------------------------- +2025-11-27 12:18:43,133 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 12:18:43,133 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 12:18:43,133 - INFO - Generated: ' be killed by Oga. Teimou is killed by the Red Tails, who are led by the mysterious "Black Dragon" (see below) who is revealed to be Oga\'s father. The Black Dragon is revealed to be the same as the "B...' +2025-11-27 12:18:43,134 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-27 12:18:43,134 - INFO - ---------------------------------------------------------------------- +2025-11-27 12:18:43,134 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 12:18:43,134 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 12:18:43,134 - INFO - Generated: '-31 | 0x00B0..0x00B8 | U+0B0..0x00B8, U+0B0..0x00B8, U+0B0..0x00B8, U+0B0..0x00B8, U+0B0..0x00B8, U+0B0..0x00B8, U+0B0..0x00B8, U+0B0..0x00B8, U+0B0..0x00B8, U+0B0..0x00B8, U+0B0..0x00B8, U+0B0..0x00B...' +2025-11-27 12:18:43,134 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 12:18:43,134 - INFO - ---------------------------------------------------------------------- +2025-11-27 12:18:43,135 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 12:18:43,135 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 12:18:43,135 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 12:18:43,135 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' 
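Each validation pass also reports BLEU, METEOR, Edit Distance and F-measure over the n=5 qualitative samples. The log does not show how these are defined, so the sketch below is only one plausible reconstruction: averaged sentence BLEU and METEOR from NLTK, edit distance normalized by the longer string, and a bag-of-tokens F1. All names here are ours, not the trainer's, and METEOR additionally needs the WordNet data (nltk.download('wordnet')).

    from nltk.metrics.distance import edit_distance
    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
    from nltk.translate.meteor_score import meteor_score

    def qualitative_metrics(pairs):
        # pairs: list of (generated_text, ground_truth_text) tuples, e.g. n=5 per eval.
        smooth = SmoothingFunction().method1
        bleu = meteor = ned = f1 = 0.0
        for hyp, ref in pairs:
            h, r = hyp.split(), ref.split()
            bleu += sentence_bleu([r], h, smoothing_function=smooth)
            meteor += meteor_score([r], h)                       # expects token lists
            ned += edit_distance(hyp, ref) / max(len(hyp), len(ref), 1)
            overlap = len(set(h) & set(r))
            p, rec = overlap / max(len(h), 1), overlap / max(len(r), 1)
            f1 += 2 * p * rec / max(p + rec, 1e-9)
        n = len(pairs)
        return {"BLEU": bleu / n, "METEOR": meteor / n,
                "Edit Distance": ned / n, "F-measure": f1 / n}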
+2025-11-27 12:18:43,135 - INFO - ---------------------------------------------------------------------- +2025-11-27 12:18:43,136 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_5000.jsonl +2025-11-27 12:19:11,566 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-27 12:19:11,572 - INFO - New best validation loss: 1.6682, perplexity: 5.30 +2025-11-27 12:20:37,598 - INFO - Epoch 1 Step 5010 (Global: 5010): loss=1.7282, ppl=5.63, grad_norm=0.89, lr=6.19e-05, throughput=5580 tok/s +2025-11-27 12:22:03,872 - INFO - Epoch 1 Step 5020 (Global: 5020): loss=1.5059, ppl=4.51, grad_norm=0.75, lr=6.18e-05, throughput=5564 tok/s +2025-11-27 12:23:29,626 - INFO - Epoch 1 Step 5030 (Global: 5030): loss=1.5667, ppl=4.79, grad_norm=0.79, lr=6.16e-05, throughput=5598 tok/s +2025-11-27 12:24:54,747 - INFO - Epoch 1 Step 5040 (Global: 5040): loss=1.5323, ppl=4.63, grad_norm=0.73, lr=6.14e-05, throughput=5639 tok/s +2025-11-27 12:26:20,337 - INFO - Epoch 1 Step 5050 (Global: 5050): loss=1.7903, ppl=5.99, grad_norm=0.75, lr=6.13e-05, throughput=5608 tok/s +2025-11-27 12:27:45,928 - INFO - Epoch 1 Step 5060 (Global: 5060): loss=1.4847, ppl=4.41, grad_norm=0.73, lr=6.11e-05, throughput=5608 tok/s +2025-11-27 12:29:11,050 - INFO - Epoch 1 Step 5070 (Global: 5070): loss=1.6293, ppl=5.10, grad_norm=0.77, lr=6.10e-05, throughput=5639 tok/s +2025-11-27 12:30:36,384 - INFO - Epoch 1 Step 5080 (Global: 5080): loss=1.6346, ppl=5.13, grad_norm=0.80, lr=6.08e-05, throughput=5625 tok/s +2025-11-27 12:32:01,628 - INFO - Epoch 1 Step 5090 (Global: 5090): loss=1.7412, ppl=5.70, grad_norm=0.84, lr=6.06e-05, throughput=5631 tok/s +2025-11-27 12:33:26,978 - INFO - Epoch 1 Step 5100 (Global: 5100): loss=1.7834, ppl=5.95, grad_norm=0.79, lr=6.05e-05, throughput=5624 tok/s +2025-11-27 12:34:52,201 - INFO - Epoch 1 Step 5110 (Global: 5110): loss=1.7634, ppl=5.83, grad_norm=0.81, lr=6.03e-05, throughput=5632 tok/s +2025-11-27 12:36:17,344 - INFO - Epoch 1 Step 5120 (Global: 5120): loss=1.7700, ppl=5.87, grad_norm=0.84, lr=6.01e-05, throughput=5638 tok/s +2025-11-27 12:37:42,418 - INFO - Epoch 1 Step 5130 (Global: 5130): loss=1.8622, ppl=6.44, grad_norm=0.95, lr=6.00e-05, throughput=5642 tok/s +2025-11-27 12:39:07,694 - INFO - Epoch 1 Step 5140 (Global: 5140): loss=1.7537, ppl=5.78, grad_norm=0.84, lr=5.98e-05, throughput=5629 tok/s +2025-11-27 12:40:33,011 - INFO - Epoch 1 Step 5150 (Global: 5150): loss=1.5684, ppl=4.80, grad_norm=0.78, lr=5.96e-05, throughput=5626 tok/s +2025-11-27 12:41:58,061 - INFO - Epoch 1 Step 5160 (Global: 5160): loss=1.5824, ppl=4.87, grad_norm=0.75, lr=5.95e-05, throughput=5644 tok/s +2025-11-27 12:43:23,577 - INFO - Epoch 1 Step 5170 (Global: 5170): loss=1.6585, ppl=5.25, grad_norm=0.81, lr=5.93e-05, throughput=5613 tok/s +2025-11-27 12:44:49,108 - INFO - Epoch 1 Step 5180 (Global: 5180): loss=1.7695, ppl=5.87, grad_norm=0.83, lr=5.91e-05, throughput=5612 tok/s +2025-11-27 12:46:14,632 - INFO - Epoch 1 Step 5190 (Global: 5190): loss=1.3669, ppl=3.92, grad_norm=0.74, lr=5.90e-05, throughput=5613 tok/s +2025-11-27 12:47:39,643 - INFO - Epoch 1 Step 5200 (Global: 5200): loss=1.7681, ppl=5.86, grad_norm=0.82, lr=5.88e-05, throughput=5646 tok/s +2025-11-27 12:49:04,720 - INFO - Epoch 1 Step 5210 (Global: 5210): loss=1.5759, ppl=4.84, grad_norm=0.76, lr=5.87e-05, throughput=5642 tok/s +2025-11-27 12:50:29,871 - 
INFO - Epoch 1 Step 5220 (Global: 5220): loss=1.4823, ppl=4.40, grad_norm=0.74, lr=5.85e-05, throughput=5637 tok/s +2025-11-27 12:51:55,165 - INFO - Epoch 1 Step 5230 (Global: 5230): loss=1.6221, ppl=5.06, grad_norm=0.78, lr=5.83e-05, throughput=5628 tok/s +2025-11-27 12:53:20,488 - INFO - Epoch 1 Step 5240 (Global: 5240): loss=1.6249, ppl=5.08, grad_norm=0.75, lr=5.82e-05, throughput=5626 tok/s +2025-11-27 12:54:45,844 - INFO - Epoch 1 Step 5250 (Global: 5250): loss=1.3902, ppl=4.02, grad_norm=0.71, lr=5.80e-05, throughput=5624 tok/s +2025-11-27 12:56:11,148 - INFO - Epoch 1 Step 5260 (Global: 5260): loss=1.5380, ppl=4.66, grad_norm=0.75, lr=5.78e-05, throughput=5627 tok/s +2025-11-27 12:57:36,325 - INFO - Epoch 1 Step 5270 (Global: 5270): loss=1.4937, ppl=4.45, grad_norm=0.78, lr=5.77e-05, throughput=5635 tok/s +2025-11-27 12:59:01,515 - INFO - Epoch 1 Step 5280 (Global: 5280): loss=1.7029, ppl=5.49, grad_norm=0.79, lr=5.75e-05, throughput=5635 tok/s +2025-11-27 13:00:26,345 - INFO - Epoch 1 Step 5290 (Global: 5290): loss=1.7279, ppl=5.63, grad_norm=0.82, lr=5.73e-05, throughput=5658 tok/s +2025-11-27 13:01:51,937 - INFO - Epoch 1 Step 5300 (Global: 5300): loss=1.5873, ppl=4.89, grad_norm=0.76, lr=5.72e-05, throughput=5608 tok/s +2025-11-27 13:03:17,231 - INFO - Epoch 1 Step 5310 (Global: 5310): loss=1.7268, ppl=5.62, grad_norm=0.80, lr=5.70e-05, throughput=5628 tok/s +2025-11-27 13:04:42,519 - INFO - Epoch 1 Step 5320 (Global: 5320): loss=1.9441, ppl=6.99, grad_norm=0.80, lr=5.68e-05, throughput=5628 tok/s +2025-11-27 13:06:07,648 - INFO - Epoch 1 Step 5330 (Global: 5330): loss=1.5600, ppl=4.76, grad_norm=0.76, lr=5.67e-05, throughput=5639 tok/s +2025-11-27 13:07:33,002 - INFO - Epoch 1 Step 5340 (Global: 5340): loss=1.4850, ppl=4.42, grad_norm=0.75, lr=5.65e-05, throughput=5624 tok/s +2025-11-27 13:08:58,437 - INFO - Epoch 1 Step 5350 (Global: 5350): loss=1.5443, ppl=4.68, grad_norm=0.78, lr=5.63e-05, throughput=5618 tok/s +2025-11-27 13:10:23,726 - INFO - Epoch 1 Step 5360 (Global: 5360): loss=1.5161, ppl=4.55, grad_norm=0.77, lr=5.62e-05, throughput=5628 tok/s +2025-11-27 13:11:48,948 - INFO - Epoch 1 Step 5370 (Global: 5370): loss=1.6699, ppl=5.31, grad_norm=0.78, lr=5.60e-05, throughput=5632 tok/s +2025-11-27 13:13:14,293 - INFO - Epoch 1 Step 5380 (Global: 5380): loss=1.8139, ppl=6.13, grad_norm=0.80, lr=5.58e-05, throughput=5624 tok/s +2025-11-27 13:14:39,729 - INFO - Epoch 1 Step 5390 (Global: 5390): loss=1.4587, ppl=4.30, grad_norm=0.75, lr=5.57e-05, throughput=5618 tok/s +2025-11-27 13:16:04,992 - INFO - Epoch 1 Step 5400 (Global: 5400): loss=1.4960, ppl=4.46, grad_norm=0.80, lr=5.55e-05, throughput=5630 tok/s +2025-11-27 13:17:30,718 - INFO - Epoch 1 Step 5410 (Global: 5410): loss=1.5037, ppl=4.50, grad_norm=0.76, lr=5.53e-05, throughput=5599 tok/s +2025-11-27 13:18:56,143 - INFO - Epoch 1 Step 5420 (Global: 5420): loss=1.6249, ppl=5.08, grad_norm=0.79, lr=5.52e-05, throughput=5619 tok/s +2025-11-27 13:20:21,579 - INFO - Epoch 1 Step 5430 (Global: 5430): loss=1.6627, ppl=5.27, grad_norm=0.78, lr=5.50e-05, throughput=5618 tok/s +2025-11-27 13:21:47,386 - INFO - Epoch 1 Step 5440 (Global: 5440): loss=1.7231, ppl=5.60, grad_norm=0.77, lr=5.48e-05, throughput=5594 tok/s +2025-11-27 13:23:12,871 - INFO - Epoch 1 Step 5450 (Global: 5450): loss=1.6913, ppl=5.43, grad_norm=0.84, lr=5.47e-05, throughput=5615 tok/s +2025-11-27 13:24:38,461 - INFO - Epoch 1 Step 5460 (Global: 5460): loss=1.8435, ppl=6.32, grad_norm=0.88, lr=5.45e-05, throughput=5608 tok/s +2025-11-27 13:26:03,976 - INFO 
- Epoch 1 Step 5470 (Global: 5470): loss=1.6753, ppl=5.34, grad_norm=0.78, lr=5.43e-05, throughput=5613 tok/s +2025-11-27 13:27:29,203 - INFO - Epoch 1 Step 5480 (Global: 5480): loss=1.6687, ppl=5.31, grad_norm=0.80, lr=5.42e-05, throughput=5632 tok/s +2025-11-27 13:28:54,091 - INFO - Epoch 1 Step 5490 (Global: 5490): loss=1.6027, ppl=4.97, grad_norm=0.78, lr=5.40e-05, throughput=5655 tok/s +2025-11-27 13:30:19,432 - INFO - Epoch 1 Step 5500 (Global: 5500): loss=1.5602, ppl=4.76, grad_norm=0.73, lr=5.38e-05, throughput=5625 tok/s +2025-11-27 13:30:19,432 - INFO - +Running validation at step 5500... +2025-11-27 13:34:53,799 - INFO - Validation loss: 1.6515, perplexity: 5.21 +2025-11-27 13:34:53,799 - INFO - Qualitative metrics (n=5): +2025-11-27 13:34:53,799 - INFO - BLEU: 0.1526 +2025-11-27 13:34:53,799 - INFO - METEOR: 0.2375 +2025-11-27 13:34:53,799 - INFO - Edit Distance: 0.5524 +2025-11-27 13:34:53,799 - INFO - F-measure: 0.2554 +2025-11-27 13:34:53,800 - INFO - +====================================================================== +2025-11-27 13:34:53,800 - INFO - Qualitative Evaluation Samples: +2025-11-27 13:34:53,800 - INFO - ====================================================================== +2025-11-27 13:34:53,800 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-27 13:34:53,800 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 13:34:53,800 - INFO - Generated: ' to the work of the Beatles, the Rolling Stones, and the Doors, and said that the band "has a knack for making a song sound like it\'s about to end, and then it does." He also said that the album "isn\'...' +2025-11-27 13:34:53,800 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-27 13:34:53,800 - INFO - ---------------------------------------------------------------------- +2025-11-27 13:34:53,800 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-27 13:34:53,800 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 13:34:53,800 - INFO - Generated: 'aternal organizations in the United States. The Order of the Arrow, a Native American fraternal organization, was founded in 1920 and is the oldest Native American fraternal organization in the United...' +2025-11-27 13:34:53,800 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-27 13:34:53,800 - INFO - ---------------------------------------------------------------------- +2025-11-27 13:34:53,801 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-27 13:34:53,801 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 13:34:53,801 - INFO - Generated: " be killed by Oga. The other three are defeated by Oga and Miki, and are then killed by the Red Tails.\nMiki\nVoiced by: Yūko Hikasa\nA young woman who is the daughter of the Red Tails' leader, Miki. She..." +2025-11-27 13:34:53,801 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." 
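The throughput column sits around 5,600 tok/s throughout this stretch. That is consistent with the configuration echoed in the resume Namespace further below (batch_size=12, gradient_accumulation_steps=4, 1000 target tokens per sample) and the roughly 85-second spacing between consecutive 10-step log lines, assuming the counter includes every target token in an optimizer step (an assumption; the log does not define it). A rough check:

    # Back-of-the-envelope throughput check against the logged configuration.
    batch_size = 12                  # samples per micro-batch
    grad_accum = 4                   # micro-batches per optimizer step
    tokens_per_sample = 1000         # full-context target length in this regime
    secs_per_10_steps = 85.3         # typical gap between consecutive log lines above

    tokens_per_step = batch_size * grad_accum * tokens_per_sample       # 48,000
    print(f"{tokens_per_step / (secs_per_10_steps / 10):,.0f} tok/s")   # ~5,600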
+2025-11-27 13:34:53,802 - INFO - ---------------------------------------------------------------------- +2025-11-27 13:34:53,802 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-27 13:34:53,802 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 13:34:53,802 - INFO - Generated: '-31 | L2/00-035 | WG2/00-035 | ISO/IEC 10646-1:1999 ...' +2025-11-27 13:34:53,802 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-27 13:34:53,802 - INFO - ---------------------------------------------------------------------- +2025-11-27 13:34:53,802 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-27 13:34:53,802 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-27 13:34:53,802 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 13:34:53,802 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-27 13:34:53,802 - INFO - ---------------------------------------------------------------------- +2025-11-27 13:34:53,803 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_5500.jsonl +2025-11-27 13:35:22,653 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-27 13:35:22,658 - INFO - New best validation loss: 1.6515, perplexity: 5.21 +2025-11-27 13:36:47,868 - INFO - Epoch 1 Step 5510 (Global: 5510): loss=1.6821, ppl=5.38, grad_norm=0.84, lr=5.37e-05, throughput=5634 tok/s +2025-11-27 13:38:12,986 - INFO - Epoch 1 Step 5520 (Global: 5520): loss=1.8210, ppl=6.18, grad_norm=0.81, lr=5.35e-05, throughput=5639 tok/s +2025-11-28 01:02:28,152 - INFO - Starting training with args: Namespace(regime='conv1d_residual', data_path='data/training/splits_510k/train_arrow', output_dir='outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434', objective='lm', val_data_path='data/training/splits_510k/val_arrow', max_samples=None, vision_mode='small', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt=None, train_encoder=False, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=250, conv_kernel=5, timestamp='20251126_233434', batch_size=12, gradient_accumulation_steps=4, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=500, initial_validation=False, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name='production_conv1d_residual_t250_k5_lm_20251126_233441', resume_from_checkpoint='outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt', resume='outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt', init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=8, prefetch_factor=32, seed=None, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=False, compile_mode='default', use_optimized_model=True, use_encoder_checkpointing=True, use_decoder_checkpointing=True, 
use_8bit_optimizer=True) +2025-11-28 01:02:28,152 - INFO - Resuming training from checkpoint: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 01:02:28,152 - INFO - Continuing outputs in directory: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434 +2025-11-28 01:02:28,153 - INFO - Peeking checkpoint metadata from outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 01:02:32,482 - INFO - Checkpoint metadata: epoch=0, batch_idx=21999, global_step=5500 +2025-11-28 01:02:32,482 - INFO - W&B run ID: o01q8g0m +2025-11-28 01:02:32,527 - INFO - Checkpoint has WandB run ID: o01q8g0m +2025-11-28 01:02:32,528 - INFO - Creating fresh WandB run (not resuming to avoid stale data) +2025-11-28 01:02:33,858 - INFO - Initialized W&B run: vision-compression-2/production_conv1d_residual_t250_k5_lm_20251126_233441 (ID: tj169vxh) +2025-11-28 01:02:33,859 - INFO - Loading model and tokenizer... +2025-11-28 01:02:45,646 - INFO - Enabling decoder gradient checkpointing... +2025-11-28 01:02:45,653 - INFO - ✓ Decoder checkpointing enabled for 12 transformer layers +2025-11-28 01:02:45,653 - INFO - Expected: ~30-50% activation memory reduction, ~15-20% compute overhead +2025-11-28 01:02:45,695 - INFO - Created Conv1D Residual Pyramid Compression trainer +2025-11-28 01:02:45,695 - INFO - Architecture: Residual blocks with skip connections +2025-11-28 01:02:45,696 - INFO - Kernel size: 5 +2025-11-28 01:02:45,696 - INFO - Compression: 1000 → 251 tokens (4.00x) +2025-11-28 01:02:45,696 - INFO - Training objective: lm +2025-11-28 01:02:45,722 - INFO - Logged parameter counts to W&B: total=2,960,960,000, trainable=2,960,960,000, encoder=26,225,920, decoder=2,934,734,080 +2025-11-28 01:02:45,722 - INFO - Loading training data from data/training/splits_510k/train_arrow +2025-11-28 01:02:45,722 - INFO - Detected Arrow format: data/training/splits_510k/train_arrow +2025-11-28 01:02:45,722 - INFO - Loading Arrow dataset from data/training/splits_510k/train_arrow (memory-mapped) +2025-11-28 01:02:45,771 - INFO - Loaded 500,000 samples from data/training/splits_510k/train_arrow (memory-mapped) +2025-11-28 01:02:45,771 - INFO - Conv1d_residual regime: using full 1000-token context +2025-11-28 01:02:45,771 - INFO - Mid-epoch resume: skipping first 264000 samples at sampler level (batch 22000) +2025-11-28 01:02:45,867 - INFO - Loading validation data from data/training/splits_510k/val_arrow +2025-11-28 01:02:45,867 - INFO - Detected Arrow format: data/training/splits_510k/val_arrow +2025-11-28 01:02:45,867 - INFO - Loading Arrow dataset from data/training/splits_510k/val_arrow (memory-mapped) +2025-11-28 01:02:45,874 - INFO - Loaded 10,000 samples from data/training/splits_510k/val_arrow (memory-mapped) +2025-11-28 01:02:45,875 - INFO - Validation conv1d_residual regime: using full 1000-token context +2025-11-28 01:02:47,903 - INFO - Created 8-bit AdamW optimizer (bitsandbytes): + Learning rate: 0.0001 + Memory savings: ~75% optimizer state (16.8GB for 2.8B params) + Expected overhead: ~2-5% +2025-11-28 01:02:47,903 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 +2025-11-28 01:02:47,910 - INFO - Logged optimizer config to W&B: type=adamw_8bit, memory=5.52GB +2025-11-28 01:02:47,910 - INFO - Loading checkpoint state (model/optimizer/scheduler) from 
outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 01:02:54,802 - INFO - ✓ Successfully loaded optimizer state from checkpoint +2025-11-28 01:02:54,803 - INFO - ✓ Successfully loaded scheduler state from checkpoint +2025-11-28 01:02:54,804 - WARNING - Failed to restore RNG states: RNG state must be a torch.ByteTensor. Continuing with current RNG state. +2025-11-28 01:02:54,831 - INFO - Restored training state: epoch=0, batch_idx=21999, global_step=5500, best_val_loss=1.6515 +2025-11-28 01:02:54,832 - INFO - Resuming mid-epoch: will skip first 22000 batches of epoch 0 +2025-11-28 01:02:54,833 - INFO - Starting training loop... +2025-11-28 01:02:54,833 - INFO - +====================================================================== +2025-11-28 01:02:54,833 - INFO - Epoch 1/1 +2025-11-28 01:02:54,833 - INFO - ====================================================================== +2025-11-28 01:02:57,257 - WARNING - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`transformers. +2025-11-28 01:02:58,540 - INFO - Effective context tokens (per-sample): 252 | Compression ratio: 3.97x +2025-11-28 01:02:58,540 - INFO - Target tokens per sample: 1000 +2025-11-28 01:04:23,111 - INFO - Epoch 1 Step 10 (Global: 5510): loss=1.6819, ppl=5.38, grad_norm=0.84, lr=5.37e-05, throughput=5438 tok/s +2025-11-28 01:05:48,360 - INFO - Epoch 1 Step 20 (Global: 5520): loss=1.8208, ppl=6.18, grad_norm=0.81, lr=5.35e-05, throughput=5631 tok/s +2025-11-28 01:07:13,382 - INFO - Epoch 1 Step 30 (Global: 5530): loss=1.6166, ppl=5.04, grad_norm=0.81, lr=5.33e-05, throughput=5646 tok/s +2025-11-28 01:08:38,413 - INFO - Epoch 1 Step 40 (Global: 5540): loss=1.6930, ppl=5.44, grad_norm=0.77, lr=5.32e-05, throughput=5645 tok/s +2025-11-28 01:10:03,650 - INFO - Epoch 1 Step 50 (Global: 5550): loss=1.6239, ppl=5.07, grad_norm=0.81, lr=5.30e-05, throughput=5631 tok/s +2025-11-28 01:11:28,689 - INFO - Epoch 1 Step 60 (Global: 5560): loss=1.4698, ppl=4.35, grad_norm=0.82, lr=5.28e-05, throughput=5645 tok/s +2025-11-28 01:12:54,830 - INFO - Epoch 1 Step 70 (Global: 5570): loss=1.6767, ppl=5.35, grad_norm=0.80, lr=5.27e-05, throughput=5572 tok/s +2025-11-28 01:14:19,891 - INFO - Epoch 1 Step 80 (Global: 5580): loss=1.7531, ppl=5.77, grad_norm=0.82, lr=5.25e-05, throughput=5643 tok/s +2025-11-28 01:15:44,726 - INFO - Epoch 1 Step 90 (Global: 5590): loss=1.8729, ppl=6.51, grad_norm=0.79, lr=5.23e-05, throughput=5658 tok/s +2025-11-28 01:17:09,703 - INFO - Epoch 1 Step 100 (Global: 5600): loss=1.7258, ppl=5.62, grad_norm=0.82, lr=5.22e-05, throughput=5649 tok/s +2025-11-28 01:18:34,834 - INFO - Epoch 1 Step 110 (Global: 5610): loss=1.5598, ppl=4.76, grad_norm=0.76, lr=5.20e-05, throughput=5638 tok/s +2025-11-28 01:20:00,101 - INFO - Epoch 1 Step 120 (Global: 5620): loss=1.7469, ppl=5.74, grad_norm=0.80, lr=5.18e-05, throughput=5629 tok/s +2025-11-28 01:21:24,952 - INFO - Epoch 1 Step 130 (Global: 5630): loss=1.7851, ppl=5.96, grad_norm=0.85, lr=5.17e-05, throughput=5657 tok/s +2025-11-28 01:22:50,066 - INFO - Epoch 1 Step 140 (Global: 5640): loss=1.6530, ppl=5.22, grad_norm=0.80, lr=5.15e-05, throughput=5640 tok/s +2025-11-28 01:24:15,175 - INFO - Epoch 1 Step 150 (Global: 5650): loss=1.8055, ppl=6.08, grad_norm=0.79, lr=5.13e-05, throughput=5640 tok/s +2025-11-28 01:25:40,967 - INFO - Epoch 1 Step 160 (Global: 5660): loss=1.6525, ppl=5.22, grad_norm=0.75, lr=5.12e-05, throughput=5595 tok/s +2025-11-28 01:27:06,859 - INFO - 
Epoch 1 Step 170 (Global: 5670): loss=1.6167, ppl=5.04, grad_norm=0.78, lr=5.10e-05, throughput=5588 tok/s +2025-11-28 01:28:32,253 - INFO - Epoch 1 Step 180 (Global: 5680): loss=1.5240, ppl=4.59, grad_norm=0.75, lr=5.08e-05, throughput=5621 tok/s +2025-11-28 01:29:57,166 - INFO - Epoch 1 Step 190 (Global: 5690): loss=1.7962, ppl=6.03, grad_norm=0.92, lr=5.07e-05, throughput=5653 tok/s +2025-11-28 01:31:22,406 - INFO - Epoch 1 Step 200 (Global: 5700): loss=1.7226, ppl=5.60, grad_norm=0.78, lr=5.05e-05, throughput=5631 tok/s +2025-11-28 01:32:47,272 - INFO - Epoch 1 Step 210 (Global: 5710): loss=1.7015, ppl=5.48, grad_norm=0.84, lr=5.03e-05, throughput=5656 tok/s +2025-11-28 01:34:11,862 - INFO - Epoch 1 Step 220 (Global: 5720): loss=1.6128, ppl=5.02, grad_norm=0.76, lr=5.02e-05, throughput=5674 tok/s +2025-11-28 01:35:36,683 - INFO - Epoch 1 Step 230 (Global: 5730): loss=1.5893, ppl=4.90, grad_norm=0.79, lr=5.00e-05, throughput=5659 tok/s +2025-11-28 01:37:01,497 - INFO - Epoch 1 Step 240 (Global: 5740): loss=1.6976, ppl=5.46, grad_norm=0.79, lr=4.98e-05, throughput=5659 tok/s +2025-11-28 01:38:26,653 - INFO - Epoch 1 Step 250 (Global: 5750): loss=1.8091, ppl=6.10, grad_norm=0.81, lr=4.96e-05, throughput=5637 tok/s +2025-11-28 01:39:51,924 - INFO - Epoch 1 Step 260 (Global: 5760): loss=1.4536, ppl=4.28, grad_norm=0.82, lr=4.95e-05, throughput=5629 tok/s +2025-11-28 01:41:17,000 - INFO - Epoch 1 Step 270 (Global: 5770): loss=1.5417, ppl=4.67, grad_norm=0.77, lr=4.93e-05, throughput=5642 tok/s +2025-11-28 01:42:41,944 - INFO - Epoch 1 Step 280 (Global: 5780): loss=1.6960, ppl=5.45, grad_norm=0.79, lr=4.91e-05, throughput=5651 tok/s +2025-11-28 01:44:06,910 - INFO - Epoch 1 Step 290 (Global: 5790): loss=1.7846, ppl=5.96, grad_norm=0.80, lr=4.90e-05, throughput=5649 tok/s +2025-11-28 01:45:31,926 - INFO - Epoch 1 Step 300 (Global: 5800): loss=1.6149, ppl=5.03, grad_norm=0.78, lr=4.88e-05, throughput=5646 tok/s +2025-11-28 01:46:57,062 - INFO - Epoch 1 Step 310 (Global: 5810): loss=1.6272, ppl=5.09, grad_norm=0.80, lr=4.86e-05, throughput=5638 tok/s +2025-11-28 01:48:22,402 - INFO - Epoch 1 Step 320 (Global: 5820): loss=1.8867, ppl=6.60, grad_norm=0.80, lr=4.85e-05, throughput=5625 tok/s +2025-11-28 01:49:47,874 - INFO - Epoch 1 Step 330 (Global: 5830): loss=1.7923, ppl=6.00, grad_norm=0.85, lr=4.83e-05, throughput=5616 tok/s +2025-11-28 01:51:12,663 - INFO - Epoch 1 Step 340 (Global: 5840): loss=1.5136, ppl=4.54, grad_norm=0.75, lr=4.81e-05, throughput=5661 tok/s +2025-11-28 01:52:37,740 - INFO - Epoch 1 Step 350 (Global: 5850): loss=1.6447, ppl=5.18, grad_norm=0.78, lr=4.80e-05, throughput=5642 tok/s +2025-11-28 01:54:02,808 - INFO - Epoch 1 Step 360 (Global: 5860): loss=1.7315, ppl=5.65, grad_norm=0.81, lr=4.78e-05, throughput=5643 tok/s +2025-11-28 01:55:27,819 - INFO - Epoch 1 Step 370 (Global: 5870): loss=1.4273, ppl=4.17, grad_norm=0.76, lr=4.76e-05, throughput=5646 tok/s +2025-11-28 01:56:52,770 - INFO - Epoch 1 Step 380 (Global: 5880): loss=1.6348, ppl=5.13, grad_norm=0.96, lr=4.75e-05, throughput=5650 tok/s +2025-11-28 01:58:17,812 - INFO - Epoch 1 Step 390 (Global: 5890): loss=1.5046, ppl=4.50, grad_norm=0.78, lr=4.73e-05, throughput=5644 tok/s +2025-11-28 01:59:43,013 - INFO - Epoch 1 Step 400 (Global: 5900): loss=1.4846, ppl=4.41, grad_norm=0.74, lr=4.71e-05, throughput=5634 tok/s +2025-11-28 02:01:07,820 - INFO - Epoch 1 Step 410 (Global: 5910): loss=1.4305, ppl=4.18, grad_norm=0.73, lr=4.70e-05, throughput=5660 tok/s +2025-11-28 02:02:32,900 - INFO - Epoch 1 Step 420 (Global: 
5920): loss=1.6934, ppl=5.44, grad_norm=0.79, lr=4.68e-05, throughput=5642 tok/s +2025-11-28 02:03:57,887 - INFO - Epoch 1 Step 430 (Global: 5930): loss=1.5446, ppl=4.69, grad_norm=0.79, lr=4.66e-05, throughput=5648 tok/s +2025-11-28 02:05:22,946 - INFO - Epoch 1 Step 440 (Global: 5940): loss=1.7531, ppl=5.77, grad_norm=0.82, lr=4.65e-05, throughput=5643 tok/s +2025-11-28 02:06:48,040 - INFO - Epoch 1 Step 450 (Global: 5950): loss=1.7291, ppl=5.64, grad_norm=0.79, lr=4.63e-05, throughput=5641 tok/s +2025-11-28 02:08:13,082 - INFO - Epoch 1 Step 460 (Global: 5960): loss=1.6015, ppl=4.96, grad_norm=0.79, lr=4.61e-05, throughput=5644 tok/s +2025-11-28 02:09:38,075 - INFO - Epoch 1 Step 470 (Global: 5970): loss=1.4239, ppl=4.15, grad_norm=0.76, lr=4.60e-05, throughput=5648 tok/s +2025-11-28 02:11:03,267 - INFO - Epoch 1 Step 480 (Global: 5980): loss=1.6028, ppl=4.97, grad_norm=0.79, lr=4.58e-05, throughput=5634 tok/s +2025-11-28 02:12:28,633 - INFO - Epoch 1 Step 490 (Global: 5990): loss=1.6783, ppl=5.36, grad_norm=0.79, lr=4.56e-05, throughput=5623 tok/s +2025-11-28 02:13:53,526 - INFO - Epoch 1 Step 500 (Global: 6000): loss=1.6062, ppl=4.98, grad_norm=0.80, lr=4.55e-05, throughput=5654 tok/s +2025-11-28 02:13:53,527 - INFO - +Running validation at step 6000... +2025-11-28 02:18:28,178 - INFO - Validation loss: 1.6374, perplexity: 5.14 +2025-11-28 02:18:28,179 - INFO - Qualitative metrics (n=5): +2025-11-28 02:18:28,179 - INFO - BLEU: 0.1436 +2025-11-28 02:18:28,179 - INFO - METEOR: 0.2027 +2025-11-28 02:18:28,179 - INFO - Edit Distance: 0.6614 +2025-11-28 02:18:28,179 - INFO - F-measure: 0.2374 +2025-11-28 02:18:28,179 - INFO - +====================================================================== +2025-11-28 02:18:28,179 - INFO - Qualitative Evaluation Samples: +2025-11-28 02:18:28,179 - INFO - ====================================================================== +2025-11-28 02:18:28,179 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-28 02:18:28,179 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 02:18:28,179 - INFO - Generated: ' to the band\'s previous work, saying that "the album is a triumph, a triumph of the band\'s own making, a triumph of their own making, a triumph of their own making." He also said that the album "is a ...' +2025-11-28 02:18:28,179 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-28 02:18:28,180 - INFO - ---------------------------------------------------------------------- +2025-11-28 02:18:28,180 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-28 02:18:28,180 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 02:18:28,180 - INFO - Generated: 'aternal organizations in the United States. The Order of the Arrow, a Native American fraternal organization, was founded in 1922. The Order of the Arrow was founded by a group of Native American stud...' +2025-11-28 02:18:28,180 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' 
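The resume block above ties several counters together: the checkpoint records batch_idx=21999 and global_step=5500, the sampler skips 264,000 samples, and the fresh scheduler is created with warmup_steps=1041 and total_steps=10417. With 12 samples per micro-batch, 4 accumulation steps, 500,000 training samples and a 0.1 warmup ratio (all from the logged Namespace), those figures are mutually consistent. A small consistency sketch, with variable names of our own:

    import math

    batch_size, grad_accum = 12, 4
    last_batch_idx = 21999                                  # from the checkpoint metadata
    batches_to_skip = last_batch_idx + 1                    # 22000, as logged
    samples_to_skip = batches_to_skip * batch_size          # 264000, as logged
    global_step = batches_to_skip // grad_accum             # 5500, as logged

    train_samples, warmup_ratio = 500_000, 0.1
    total_steps = math.ceil(train_samples / (batch_size * grad_accum))   # 10417, as logged
    warmup_steps = int(warmup_ratio * total_steps)                       # 1041, as logged
    print(batches_to_skip, samples_to_skip, global_step, total_steps, warmup_steps)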
+2025-11-28 02:18:28,180 - INFO - ---------------------------------------------------------------------- +2025-11-28 02:18:28,180 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-28 02:18:28,180 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 02:18:28,180 - INFO - Generated: ' be killed by Oga. They are later defeated by the Red Tails and the Six Knights.\nMiki\nVoiced by: Yūko Hikasa\nMiki is the second-in-command of the Red Tails. She is a young girl with a large head and a...' +2025-11-28 02:18:28,181 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-28 02:18:28,181 - INFO - ---------------------------------------------------------------------- +2025-11-28 02:18:28,181 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-28 02:18:28,181 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 02:18:28,181 - INFO - Generated: '-1 | 0B01..0B0F..0B0A | - ISO/IEC 10646-1:1991: 1.1.1.1 - ISO/IEC 10646-2:1991: 1.1.1.2 - ISO/IEC 10646-3:1991: 1.1.1.3 - ISO/IEC 10646-4:1991: 1.1.1.4 - ISO/IEC 10646-5:1991: 1.1.1.5 - ISO/IEC 10646-...' +2025-11-28 02:18:28,181 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-28 02:18:28,181 - INFO - ---------------------------------------------------------------------- +2025-11-28 02:18:28,181 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-28 02:18:28,181 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 02:18:28,181 - INFO - Generated: '1 | Windows | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 02:18:28,181 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' 
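The startup banner reports "Compression: 1000 → 251 tokens (4.00x)" while the per-sample line reports "Effective context tokens: 252 | Compression ratio: 3.97x". The ratio in both cases is just the uncompressed context length divided by the compressed token count; the 4.00x headline presumably uses the configured compression_target of 250, while the effective figure counts the 252 positions actually emitted (this reading is inferred, not stated in the log):

    # Compression ratio = uncompressed context length / compressed token count.
    context_tokens = 1000
    for compressed in (250, 251, 252):      # nominal target vs. counts seen in the log
        print(compressed, round(context_tokens / compressed, 2))
    # 250 -> 4.0, 251 -> 3.98, 252 -> 3.97  (cf. the 4.00x banner and 3.97x effective)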
+2025-11-28 02:18:28,181 - INFO - ---------------------------------------------------------------------- +2025-11-28 02:18:28,182 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_6000.jsonl +2025-11-28 02:18:56,659 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 02:18:56,663 - INFO - New best validation loss: 1.6374, perplexity: 5.14 +2025-11-28 02:20:21,550 - INFO - Epoch 1 Step 510 (Global: 6010): loss=1.6004, ppl=4.96, grad_norm=0.80, lr=4.53e-05, throughput=5655 tok/s +2025-11-28 02:21:46,161 - INFO - Epoch 1 Step 520 (Global: 6020): loss=1.7686, ppl=5.86, grad_norm=0.79, lr=4.51e-05, throughput=5673 tok/s +2025-11-28 02:23:11,322 - INFO - Epoch 1 Step 530 (Global: 6030): loss=1.6654, ppl=5.29, grad_norm=0.79, lr=4.50e-05, throughput=5636 tok/s +2025-11-28 02:24:36,411 - INFO - Epoch 1 Step 540 (Global: 6040): loss=1.5818, ppl=4.86, grad_norm=0.73, lr=4.48e-05, throughput=5641 tok/s +2025-11-28 02:26:01,174 - INFO - Epoch 1 Step 550 (Global: 6050): loss=1.6534, ppl=5.22, grad_norm=0.75, lr=4.46e-05, throughput=5663 tok/s +2025-11-28 02:27:26,009 - INFO - Epoch 1 Step 560 (Global: 6060): loss=1.5633, ppl=4.77, grad_norm=0.75, lr=4.45e-05, throughput=5658 tok/s +2025-11-28 02:28:50,551 - INFO - Epoch 1 Step 570 (Global: 6070): loss=1.5646, ppl=4.78, grad_norm=0.75, lr=4.43e-05, throughput=5678 tok/s +2025-11-28 02:30:15,219 - INFO - Epoch 1 Step 580 (Global: 6080): loss=1.7416, ppl=5.71, grad_norm=0.79, lr=4.41e-05, throughput=5669 tok/s +2025-11-28 02:31:39,772 - INFO - Epoch 1 Step 590 (Global: 6090): loss=1.5515, ppl=4.72, grad_norm=0.76, lr=4.40e-05, throughput=5677 tok/s +2025-11-28 02:33:04,432 - INFO - Epoch 1 Step 600 (Global: 6100): loss=1.8500, ppl=6.36, grad_norm=0.80, lr=4.38e-05, throughput=5670 tok/s +2025-11-28 02:34:29,063 - INFO - Epoch 1 Step 610 (Global: 6110): loss=1.5535, ppl=4.73, grad_norm=0.81, lr=4.36e-05, throughput=5672 tok/s +2025-11-28 02:35:54,067 - INFO - Epoch 1 Step 620 (Global: 6120): loss=1.5267, ppl=4.60, grad_norm=0.72, lr=4.35e-05, throughput=5647 tok/s +2025-11-28 02:37:18,662 - INFO - Epoch 1 Step 630 (Global: 6130): loss=1.5684, ppl=4.80, grad_norm=0.79, lr=4.33e-05, throughput=5674 tok/s +2025-11-28 02:38:43,034 - INFO - Epoch 1 Step 640 (Global: 6140): loss=1.5505, ppl=4.71, grad_norm=0.76, lr=4.31e-05, throughput=5689 tok/s +2025-11-28 02:40:07,525 - INFO - Epoch 1 Step 650 (Global: 6150): loss=1.7531, ppl=5.77, grad_norm=0.82, lr=4.30e-05, throughput=5681 tok/s +2025-11-28 02:41:32,441 - INFO - Epoch 1 Step 660 (Global: 6160): loss=1.4145, ppl=4.11, grad_norm=0.72, lr=4.28e-05, throughput=5653 tok/s +2025-11-28 02:42:57,071 - INFO - Epoch 1 Step 670 (Global: 6170): loss=1.5032, ppl=4.50, grad_norm=0.75, lr=4.26e-05, throughput=5672 tok/s +2025-11-28 02:44:21,612 - INFO - Epoch 1 Step 680 (Global: 6180): loss=1.9136, ppl=6.78, grad_norm=0.79, lr=4.25e-05, throughput=5678 tok/s +2025-11-28 02:45:46,181 - INFO - Epoch 1 Step 690 (Global: 6190): loss=1.6502, ppl=5.21, grad_norm=0.85, lr=4.23e-05, throughput=5676 tok/s +2025-11-28 02:47:10,787 - INFO - Epoch 1 Step 700 (Global: 6200): loss=1.6498, ppl=5.21, grad_norm=0.78, lr=4.21e-05, throughput=5673 tok/s +2025-11-28 02:48:35,858 - INFO - Epoch 1 Step 710 (Global: 6210): loss=1.8072, ppl=6.09, grad_norm=0.82, lr=4.20e-05, throughput=5642 tok/s +2025-11-28 02:50:00,494 - INFO - Epoch 1 Step 720 
(Global: 6220): loss=1.5305, ppl=4.62, grad_norm=0.80, lr=4.18e-05, throughput=5671 tok/s +2025-11-28 02:51:25,218 - INFO - Epoch 1 Step 730 (Global: 6230): loss=1.4936, ppl=4.45, grad_norm=0.86, lr=4.16e-05, throughput=5666 tok/s +2025-11-28 02:52:49,926 - INFO - Epoch 1 Step 740 (Global: 6240): loss=1.6043, ppl=4.97, grad_norm=0.77, lr=4.15e-05, throughput=5667 tok/s +2025-11-28 02:54:14,560 - INFO - Epoch 1 Step 750 (Global: 6250): loss=1.5286, ppl=4.61, grad_norm=0.75, lr=4.13e-05, throughput=5672 tok/s +2025-11-28 02:55:39,393 - INFO - Epoch 1 Step 760 (Global: 6260): loss=1.6638, ppl=5.28, grad_norm=0.83, lr=4.12e-05, throughput=5658 tok/s +2025-11-28 02:57:04,242 - INFO - Epoch 1 Step 770 (Global: 6270): loss=1.5009, ppl=4.49, grad_norm=0.77, lr=4.10e-05, throughput=5657 tok/s +2025-11-28 02:58:28,853 - INFO - Epoch 1 Step 780 (Global: 6280): loss=1.9170, ppl=6.80, grad_norm=0.86, lr=4.08e-05, throughput=5673 tok/s +2025-11-28 02:59:53,558 - INFO - Epoch 1 Step 790 (Global: 6290): loss=1.7453, ppl=5.73, grad_norm=0.73, lr=4.07e-05, throughput=5667 tok/s +2025-11-28 03:01:18,214 - INFO - Epoch 1 Step 800 (Global: 6300): loss=1.4510, ppl=4.27, grad_norm=0.73, lr=4.05e-05, throughput=5670 tok/s +2025-11-28 03:02:42,833 - INFO - Epoch 1 Step 810 (Global: 6310): loss=1.5965, ppl=4.94, grad_norm=0.82, lr=4.03e-05, throughput=5673 tok/s +2025-11-28 03:04:07,594 - INFO - Epoch 1 Step 820 (Global: 6320): loss=1.4818, ppl=4.40, grad_norm=0.80, lr=4.02e-05, throughput=5663 tok/s +2025-11-28 03:05:32,337 - INFO - Epoch 1 Step 830 (Global: 6330): loss=1.9047, ppl=6.72, grad_norm=0.83, lr=4.00e-05, throughput=5664 tok/s +2025-11-28 03:06:57,416 - INFO - Epoch 1 Step 840 (Global: 6340): loss=1.6001, ppl=4.95, grad_norm=0.78, lr=3.98e-05, throughput=5642 tok/s +2025-11-28 03:08:22,064 - INFO - Epoch 1 Step 850 (Global: 6350): loss=1.6997, ppl=5.47, grad_norm=0.75, lr=3.97e-05, throughput=5671 tok/s +2025-11-28 03:09:46,785 - INFO - Epoch 1 Step 860 (Global: 6360): loss=1.4890, ppl=4.43, grad_norm=0.77, lr=3.95e-05, throughput=5666 tok/s +2025-11-28 03:11:11,792 - INFO - Epoch 1 Step 870 (Global: 6370): loss=1.4527, ppl=4.27, grad_norm=0.74, lr=3.93e-05, throughput=5647 tok/s +2025-11-28 03:12:36,970 - INFO - Epoch 1 Step 880 (Global: 6380): loss=1.6475, ppl=5.19, grad_norm=0.77, lr=3.92e-05, throughput=5635 tok/s +2025-11-28 03:14:01,651 - INFO - Epoch 1 Step 890 (Global: 6390): loss=1.7522, ppl=5.77, grad_norm=0.77, lr=3.90e-05, throughput=5668 tok/s +2025-11-28 03:15:26,347 - INFO - Epoch 1 Step 900 (Global: 6400): loss=1.5358, ppl=4.65, grad_norm=0.79, lr=3.89e-05, throughput=5667 tok/s +2025-11-28 03:16:51,173 - INFO - Epoch 1 Step 910 (Global: 6410): loss=1.9251, ppl=6.86, grad_norm=0.80, lr=3.87e-05, throughput=5659 tok/s +2025-11-28 03:18:16,142 - INFO - Epoch 1 Step 920 (Global: 6420): loss=1.5716, ppl=4.81, grad_norm=0.77, lr=3.85e-05, throughput=5649 tok/s +2025-11-28 03:19:41,015 - INFO - Epoch 1 Step 930 (Global: 6430): loss=1.5350, ppl=4.64, grad_norm=0.77, lr=3.84e-05, throughput=5656 tok/s +2025-11-28 03:21:06,214 - INFO - Epoch 1 Step 940 (Global: 6440): loss=1.5835, ppl=4.87, grad_norm=0.76, lr=3.82e-05, throughput=5634 tok/s +2025-11-28 03:22:31,075 - INFO - Epoch 1 Step 950 (Global: 6450): loss=1.5941, ppl=4.92, grad_norm=0.82, lr=3.80e-05, throughput=5656 tok/s +2025-11-28 03:23:55,953 - INFO - Epoch 1 Step 960 (Global: 6460): loss=1.4473, ppl=4.25, grad_norm=0.77, lr=3.79e-05, throughput=5655 tok/s +2025-11-28 03:25:20,825 - INFO - Epoch 1 Step 970 (Global: 6470): loss=1.4841, 
ppl=4.41, grad_norm=0.72, lr=3.77e-05, throughput=5656 tok/s +2025-11-28 03:26:45,604 - INFO - Epoch 1 Step 980 (Global: 6480): loss=1.7911, ppl=6.00, grad_norm=0.81, lr=3.76e-05, throughput=5662 tok/s +2025-11-28 03:28:10,434 - INFO - Epoch 1 Step 990 (Global: 6490): loss=1.6116, ppl=5.01, grad_norm=0.78, lr=3.74e-05, throughput=5658 tok/s +2025-11-28 03:29:35,533 - INFO - Epoch 1 Step 1000 (Global: 6500): loss=1.6097, ppl=5.00, grad_norm=0.76, lr=3.72e-05, throughput=5641 tok/s +2025-11-28 03:29:35,533 - INFO - +Running validation at step 6500... +2025-11-28 03:34:07,628 - INFO - Validation loss: 1.6251, perplexity: 5.08 +2025-11-28 03:34:07,628 - INFO - Qualitative metrics (n=5): +2025-11-28 03:34:07,628 - INFO - BLEU: 0.1569 +2025-11-28 03:34:07,629 - INFO - METEOR: 0.2121 +2025-11-28 03:34:07,629 - INFO - Edit Distance: 0.6195 +2025-11-28 03:34:07,629 - INFO - F-measure: 0.2595 +2025-11-28 03:34:07,629 - INFO - +====================================================================== +2025-11-28 03:34:07,629 - INFO - Qualitative Evaluation Samples: +2025-11-28 03:34:07,629 - INFO - ====================================================================== +2025-11-28 03:34:07,629 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-28 03:34:07,629 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 03:34:07,629 - INFO - Generated: ' to the work of the band\'s former bassist, and said that "the band\'s new album is a triumph, a triumph of the human spirit, a triumph of the human soul." Fitzmaurice also gave the album a positive rev...' +2025-11-28 03:34:07,629 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-28 03:34:07,629 - INFO - ---------------------------------------------------------------------- +2025-11-28 03:34:07,630 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-28 03:34:07,630 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 03:34:07,630 - INFO - Generated: 'aternal organizations in the United States. The Order of Angell was the first fraternal organization in the United States to be organized by a Native American. The Order of Angell was founded in 1924 ...' +2025-11-28 03:34:07,630 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-28 03:34:07,630 - INFO - ---------------------------------------------------------------------- +2025-11-28 03:34:07,630 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-28 03:34:07,630 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 03:34:07,630 - INFO - Generated: " be killed by Oga. They are defeated by Oga and Miki, and are then killed by the Red Tails.\nKiriya\nA young man who is the second in command of the Red Tails. He is a member of the Red Tails' inner cir..." +2025-11-28 03:34:07,630 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." 
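The learning-rate column decays smoothly from about 7.4e-05 at step 4220 to below 3e-05 by step 7000. The values are consistent with linear warmup followed by cosine decay to zero, using the warmup_steps=1041 and total_steps=10417 reported when the scheduler was created; the schedule type itself is never named in the log, so the formula below is inferred, and the function name is ours:

    import math

    def cosine_lr(step, base_lr=1e-4, warmup=1041, total=10417):
        # Linear warmup, then cosine decay to zero (inferred from the logged lr values).
        if step < warmup:
            return base_lr * step / warmup
        progress = (step - warmup) / (total - warmup)
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

    print(f"{cosine_lr(5500):.2e}")   # 5.38e-05, matching the step-5500 log line
    print(f"{cosine_lr(6500):.2e}")   # 3.72e-05, matching the step-6500 log line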
+2025-11-28 03:34:07,631 - INFO - ---------------------------------------------------------------------- +2025-11-28 03:34:07,631 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-28 03:34:07,631 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 03:34:07,631 - INFO - Generated: '/1992 | 0B01..0B0B0 | 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B0 0B01..0B0B...' +2025-11-28 03:34:07,631 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-28 03:34:07,631 - INFO - ---------------------------------------------------------------------- +2025-11-28 03:34:07,631 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-28 03:34:07,631 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 03:34:07,631 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 03:34:07,631 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 03:34:07,631 - INFO - ---------------------------------------------------------------------- +2025-11-28 03:34:07,632 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_6500.jsonl +2025-11-28 03:34:35,623 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 03:34:35,629 - INFO - New best validation loss: 1.6251, perplexity: 5.08 +2025-11-28 03:36:00,536 - INFO - Epoch 1 Step 1010 (Global: 6510): loss=1.8284, ppl=6.22, grad_norm=0.77, lr=3.71e-05, throughput=5654 tok/s +2025-11-28 03:37:25,223 - INFO - Epoch 1 Step 1020 (Global: 6520): loss=1.6899, ppl=5.42, grad_norm=0.77, lr=3.69e-05, throughput=5668 tok/s +2025-11-28 03:38:49,861 - INFO - Epoch 1 Step 1030 (Global: 6530): loss=1.5735, ppl=4.82, grad_norm=0.74, lr=3.67e-05, throughput=5671 tok/s +2025-11-28 03:40:14,515 - INFO - Epoch 1 Step 1040 (Global: 6540): loss=1.8676, ppl=6.47, grad_norm=0.84, lr=3.66e-05, throughput=5670 tok/s +2025-11-28 03:41:39,090 - INFO - Epoch 1 Step 1050 (Global: 6550): loss=1.7101, ppl=5.53, grad_norm=0.79, lr=3.64e-05, throughput=5675 tok/s +2025-11-28 03:43:03,898 - INFO - Epoch 1 Step 1060 (Global: 6560): loss=1.4696, ppl=4.35, grad_norm=0.78, lr=3.63e-05, throughput=5660 tok/s +2025-11-28 03:44:28,223 - INFO - Epoch 1 Step 1070 (Global: 6570): loss=1.6498, ppl=5.21, grad_norm=0.73, lr=3.61e-05, throughput=5692 tok/s +2025-11-28 03:45:52,663 - INFO - Epoch 1 Step 1080 (Global: 6580): loss=1.8141, ppl=6.14, grad_norm=0.81, lr=3.59e-05, throughput=5685 tok/s +2025-11-28 03:47:17,297 - INFO - Epoch 1 Step 1090 (Global: 6590): loss=1.5116, ppl=4.53, grad_norm=0.74, lr=3.58e-05, throughput=5672 tok/s +2025-11-28 03:48:41,772 - INFO - Epoch 1 Step 1100 (Global: 6600): loss=1.6768, ppl=5.35, grad_norm=0.77, lr=3.56e-05, throughput=5682 tok/s +2025-11-28 03:50:06,635 - INFO - Epoch 1 Step 1110 (Global: 6610): loss=1.7479, ppl=5.74, grad_norm=0.79, lr=3.55e-05, throughput=5656 tok/s +2025-11-28 03:51:31,264 - INFO - Epoch 1 Step 1120 (Global: 6620): loss=1.4734, ppl=4.36, grad_norm=0.77, lr=3.53e-05, throughput=5672 tok/s +2025-11-28 03:52:55,881 - INFO - Epoch 1 Step 1130 (Global: 6630): loss=1.4509, ppl=4.27, grad_norm=0.75, lr=3.51e-05, 
throughput=5673 tok/s +2025-11-28 03:54:20,395 - INFO - Epoch 1 Step 1140 (Global: 6640): loss=1.5179, ppl=4.56, grad_norm=0.76, lr=3.50e-05, throughput=5680 tok/s +2025-11-28 03:55:45,023 - INFO - Epoch 1 Step 1150 (Global: 6650): loss=1.3759, ppl=3.96, grad_norm=0.73, lr=3.48e-05, throughput=5672 tok/s +2025-11-28 03:57:09,713 - INFO - Epoch 1 Step 1160 (Global: 6660): loss=1.7613, ppl=5.82, grad_norm=0.79, lr=3.47e-05, throughput=5668 tok/s +2025-11-28 03:58:34,346 - INFO - Epoch 1 Step 1170 (Global: 6670): loss=1.6363, ppl=5.14, grad_norm=0.78, lr=3.45e-05, throughput=5672 tok/s +2025-11-28 03:59:59,021 - INFO - Epoch 1 Step 1180 (Global: 6680): loss=1.4351, ppl=4.20, grad_norm=0.75, lr=3.43e-05, throughput=5669 tok/s +2025-11-28 04:01:23,984 - INFO - Epoch 1 Step 1190 (Global: 6690): loss=1.6070, ppl=4.99, grad_norm=0.75, lr=3.42e-05, throughput=5650 tok/s +2025-11-28 04:02:48,582 - INFO - Epoch 1 Step 1200 (Global: 6700): loss=1.5150, ppl=4.55, grad_norm=0.86, lr=3.40e-05, throughput=5674 tok/s +2025-11-28 04:04:13,221 - INFO - Epoch 1 Step 1210 (Global: 6710): loss=1.4783, ppl=4.39, grad_norm=0.74, lr=3.39e-05, throughput=5671 tok/s +2025-11-28 04:05:37,746 - INFO - Epoch 1 Step 1220 (Global: 6720): loss=1.5743, ppl=4.83, grad_norm=0.91, lr=3.37e-05, throughput=5679 tok/s +2025-11-28 04:07:02,370 - INFO - Epoch 1 Step 1230 (Global: 6730): loss=1.5183, ppl=4.56, grad_norm=0.76, lr=3.35e-05, throughput=5672 tok/s +2025-11-28 04:08:26,973 - INFO - Epoch 1 Step 1240 (Global: 6740): loss=1.7367, ppl=5.68, grad_norm=0.78, lr=3.34e-05, throughput=5674 tok/s +2025-11-28 04:09:51,400 - INFO - Epoch 1 Step 1250 (Global: 6750): loss=1.5956, ppl=4.93, grad_norm=0.80, lr=3.32e-05, throughput=5685 tok/s +2025-11-28 04:11:16,061 - INFO - Epoch 1 Step 1260 (Global: 6760): loss=1.6118, ppl=5.01, grad_norm=0.77, lr=3.31e-05, throughput=5670 tok/s +2025-11-28 04:12:40,691 - INFO - Epoch 1 Step 1270 (Global: 6770): loss=1.6256, ppl=5.08, grad_norm=0.79, lr=3.29e-05, throughput=5672 tok/s +2025-11-28 04:14:05,574 - INFO - Epoch 1 Step 1280 (Global: 6780): loss=1.6605, ppl=5.26, grad_norm=0.81, lr=3.28e-05, throughput=5655 tok/s +2025-11-28 04:15:30,524 - INFO - Epoch 1 Step 1290 (Global: 6790): loss=1.3708, ppl=3.94, grad_norm=0.73, lr=3.26e-05, throughput=5650 tok/s +2025-11-28 04:16:55,233 - INFO - Epoch 1 Step 1300 (Global: 6800): loss=1.6200, ppl=5.05, grad_norm=0.78, lr=3.24e-05, throughput=5667 tok/s +2025-11-28 04:18:20,038 - INFO - Epoch 1 Step 1310 (Global: 6810): loss=1.6420, ppl=5.17, grad_norm=0.76, lr=3.23e-05, throughput=5660 tok/s +2025-11-28 04:19:44,584 - INFO - Epoch 1 Step 1320 (Global: 6820): loss=1.6756, ppl=5.34, grad_norm=0.79, lr=3.21e-05, throughput=5677 tok/s +2025-11-28 04:21:09,239 - INFO - Epoch 1 Step 1330 (Global: 6830): loss=1.5019, ppl=4.49, grad_norm=0.80, lr=3.20e-05, throughput=5670 tok/s +2025-11-28 04:22:33,759 - INFO - Epoch 1 Step 1340 (Global: 6840): loss=1.7043, ppl=5.50, grad_norm=0.77, lr=3.18e-05, throughput=5679 tok/s +2025-11-28 04:23:58,314 - INFO - Epoch 1 Step 1350 (Global: 6850): loss=1.6808, ppl=5.37, grad_norm=0.75, lr=3.17e-05, throughput=5677 tok/s +2025-11-28 04:25:23,340 - INFO - Epoch 1 Step 1360 (Global: 6860): loss=1.4378, ppl=4.21, grad_norm=0.79, lr=3.15e-05, throughput=5645 tok/s +2025-11-28 04:26:48,212 - INFO - Epoch 1 Step 1370 (Global: 6870): loss=1.8521, ppl=6.37, grad_norm=0.80, lr=3.13e-05, throughput=5656 tok/s +2025-11-28 04:28:12,770 - INFO - Epoch 1 Step 1380 (Global: 6880): loss=1.5025, ppl=4.49, grad_norm=0.74, lr=3.12e-05, 
throughput=5677 tok/s +2025-11-28 04:29:37,164 - INFO - Epoch 1 Step 1390 (Global: 6890): loss=1.5487, ppl=4.71, grad_norm=0.75, lr=3.10e-05, throughput=5688 tok/s +2025-11-28 04:31:01,847 - INFO - Epoch 1 Step 1400 (Global: 6900): loss=1.4810, ppl=4.40, grad_norm=0.75, lr=3.09e-05, throughput=5668 tok/s +2025-11-28 04:32:26,174 - INFO - Epoch 1 Step 1410 (Global: 6910): loss=1.6824, ppl=5.38, grad_norm=0.75, lr=3.07e-05, throughput=5692 tok/s +2025-11-28 04:33:50,467 - INFO - Epoch 1 Step 1420 (Global: 6920): loss=1.4513, ppl=4.27, grad_norm=0.73, lr=3.06e-05, throughput=5694 tok/s +2025-11-28 04:35:14,944 - INFO - Epoch 1 Step 1430 (Global: 6930): loss=1.6764, ppl=5.35, grad_norm=0.80, lr=3.04e-05, throughput=5682 tok/s +2025-11-28 04:36:40,302 - INFO - Epoch 1 Step 1440 (Global: 6940): loss=1.3506, ppl=3.86, grad_norm=0.78, lr=3.03e-05, throughput=5623 tok/s +2025-11-28 04:38:05,139 - INFO - Epoch 1 Step 1450 (Global: 6950): loss=1.4955, ppl=4.46, grad_norm=0.74, lr=3.01e-05, throughput=5658 tok/s +2025-11-28 04:39:29,821 - INFO - Epoch 1 Step 1460 (Global: 6960): loss=1.5399, ppl=4.66, grad_norm=0.74, lr=3.00e-05, throughput=5668 tok/s +2025-11-28 04:40:54,625 - INFO - Epoch 1 Step 1470 (Global: 6970): loss=1.5777, ppl=4.84, grad_norm=0.76, lr=2.98e-05, throughput=5660 tok/s +2025-11-28 04:42:19,146 - INFO - Epoch 1 Step 1480 (Global: 6980): loss=1.5542, ppl=4.73, grad_norm=0.75, lr=2.96e-05, throughput=5679 tok/s +2025-11-28 04:43:43,888 - INFO - Epoch 1 Step 1490 (Global: 6990): loss=1.4593, ppl=4.30, grad_norm=0.74, lr=2.95e-05, throughput=5664 tok/s +2025-11-28 04:45:08,731 - INFO - Epoch 1 Step 1500 (Global: 7000): loss=1.5786, ppl=4.85, grad_norm=0.75, lr=2.93e-05, throughput=5658 tok/s +2025-11-28 04:45:08,732 - INFO - +Running validation at step 7000... +2025-11-28 04:49:39,852 - INFO - Validation loss: 1.6156, perplexity: 5.03 +2025-11-28 04:49:39,852 - INFO - Qualitative metrics (n=5): +2025-11-28 04:49:39,853 - INFO - BLEU: 0.1568 +2025-11-28 04:49:39,853 - INFO - METEOR: 0.2378 +2025-11-28 04:49:39,853 - INFO - Edit Distance: 0.5587 +2025-11-28 04:49:39,853 - INFO - F-measure: 0.2618 +2025-11-28 04:49:39,853 - INFO - +====================================================================== +2025-11-28 04:49:39,853 - INFO - Qualitative Evaluation Samples: +2025-11-28 04:49:39,853 - INFO - ====================================================================== +2025-11-28 04:49:39,853 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-28 04:49:39,853 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 04:49:39,853 - INFO - Generated: ' to the band\'s previous work, saying that "the album is a triumph, a triumph of the band\'s ability to make music that is both accessible and accessible to the masses." In a review for The Guardian, Ma...' +2025-11-28 04:49:39,853 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-28 04:49:39,854 - INFO - ---------------------------------------------------------------------- +2025-11-28 04:49:39,854 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-28 04:49:39,854 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 04:49:39,854 - INFO - Generated: 'aternal organizations in the United States. 
The Order of Angell was the first fraternal organization in the United States to be founded by a Native American. The Order of Angell was founded in 1921 by...' +2025-11-28 04:49:39,854 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-28 04:49:39,854 - INFO - ---------------------------------------------------------------------- +2025-11-28 04:49:39,854 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-28 04:49:39,854 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 04:49:39,854 - INFO - Generated: " be killed by Oga. He is killed by Oga's son, Kiriya, who is later killed by Oga's son, Miki.\nKiriya\nVoiced by: Yūki Kōno\nKiriya is the son of Teimou and the younger brother of Miki. He is a young man..." +2025-11-28 04:49:39,854 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-28 04:49:39,855 - INFO - ---------------------------------------------------------------------- +2025-11-28 04:49:39,855 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-28 04:49:39,855 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 04:49:39,855 - INFO - Generated: '/1992 | L2/92/020 | L2/92/020 | L2/92/020 ...' +2025-11-28 04:49:39,855 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-28 04:49:39,855 - INFO - ---------------------------------------------------------------------- +2025-11-28 04:49:39,855 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-28 04:49:39,855 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 04:49:39,855 - INFO - Generated: '1 | Windows | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 04:49:39,856 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' 
+2025-11-28 04:49:39,856 - INFO - ---------------------------------------------------------------------- +2025-11-28 04:49:39,856 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_7000.jsonl +2025-11-28 04:50:08,637 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 04:50:08,643 - INFO - New best validation loss: 1.6156, perplexity: 5.03 +2025-11-28 04:51:33,830 - INFO - Epoch 1 Step 1510 (Global: 7010): loss=1.6974, ppl=5.46, grad_norm=0.77, lr=2.92e-05, throughput=5635 tok/s +2025-11-28 04:52:58,551 - INFO - Epoch 1 Step 1520 (Global: 7020): loss=1.5385, ppl=4.66, grad_norm=0.75, lr=2.90e-05, throughput=5666 tok/s +2025-11-28 04:54:23,070 - INFO - Epoch 1 Step 1530 (Global: 7030): loss=1.4567, ppl=4.29, grad_norm=0.73, lr=2.89e-05, throughput=5679 tok/s +2025-11-28 04:55:47,766 - INFO - Epoch 1 Step 1540 (Global: 7040): loss=1.3626, ppl=3.91, grad_norm=0.71, lr=2.87e-05, throughput=5667 tok/s +2025-11-28 04:57:12,522 - INFO - Epoch 1 Step 1550 (Global: 7050): loss=1.5398, ppl=4.66, grad_norm=0.74, lr=2.86e-05, throughput=5663 tok/s +2025-11-28 04:58:37,105 - INFO - Epoch 1 Step 1560 (Global: 7060): loss=1.5575, ppl=4.75, grad_norm=0.74, lr=2.84e-05, throughput=5675 tok/s +2025-11-28 05:00:01,916 - INFO - Epoch 1 Step 1570 (Global: 7070): loss=1.5494, ppl=4.71, grad_norm=0.73, lr=2.83e-05, throughput=5660 tok/s +2025-11-28 05:01:26,912 - INFO - Epoch 1 Step 1580 (Global: 7080): loss=1.4492, ppl=4.26, grad_norm=0.72, lr=2.81e-05, throughput=5647 tok/s +2025-11-28 05:02:51,692 - INFO - Epoch 1 Step 1590 (Global: 7090): loss=1.5504, ppl=4.71, grad_norm=0.82, lr=2.80e-05, throughput=5662 tok/s +2025-11-28 05:04:16,525 - INFO - Epoch 1 Step 1600 (Global: 7100): loss=1.5857, ppl=4.88, grad_norm=0.74, lr=2.78e-05, throughput=5658 tok/s +2025-11-28 05:05:41,691 - INFO - Epoch 1 Step 1610 (Global: 7110): loss=1.4733, ppl=4.36, grad_norm=0.82, lr=2.77e-05, throughput=5636 tok/s +2025-11-28 05:07:06,546 - INFO - Epoch 1 Step 1620 (Global: 7120): loss=1.6095, ppl=5.00, grad_norm=0.73, lr=2.75e-05, throughput=5657 tok/s +2025-11-28 05:08:31,217 - INFO - Epoch 1 Step 1630 (Global: 7130): loss=1.5634, ppl=4.77, grad_norm=0.77, lr=2.74e-05, throughput=5669 tok/s +2025-11-28 05:09:55,890 - INFO - Epoch 1 Step 1640 (Global: 7140): loss=1.5965, ppl=4.94, grad_norm=0.75, lr=2.72e-05, throughput=5669 tok/s +2025-11-28 05:11:20,655 - INFO - Epoch 1 Step 1650 (Global: 7150): loss=1.7301, ppl=5.64, grad_norm=0.80, lr=2.71e-05, throughput=5663 tok/s +2025-11-28 05:12:45,840 - INFO - Epoch 1 Step 1660 (Global: 7160): loss=1.5850, ppl=4.88, grad_norm=0.75, lr=2.69e-05, throughput=5635 tok/s +2025-11-28 05:14:10,763 - INFO - Epoch 1 Step 1670 (Global: 7170): loss=1.6716, ppl=5.32, grad_norm=0.81, lr=2.68e-05, throughput=5652 tok/s +2025-11-28 05:15:35,806 - INFO - Epoch 1 Step 1680 (Global: 7180): loss=1.5543, ppl=4.73, grad_norm=0.79, lr=2.66e-05, throughput=5644 tok/s +2025-11-28 05:17:00,577 - INFO - Epoch 1 Step 1690 (Global: 7190): loss=1.7007, ppl=5.48, grad_norm=0.76, lr=2.65e-05, throughput=5662 tok/s +2025-11-28 05:18:25,572 - INFO - Epoch 1 Step 1700 (Global: 7200): loss=1.3492, ppl=3.85, grad_norm=0.74, lr=2.63e-05, throughput=5647 tok/s +2025-11-28 05:19:50,597 - INFO - Epoch 1 Step 1710 (Global: 7210): loss=1.5655, ppl=4.78, grad_norm=0.80, lr=2.62e-05, throughput=5645 tok/s +2025-11-28 05:21:15,952 - 
INFO - Epoch 1 Step 1720 (Global: 7220): loss=1.6489, ppl=5.20, grad_norm=0.75, lr=2.60e-05, throughput=5624 tok/s +2025-11-28 05:22:41,019 - INFO - Epoch 1 Step 1730 (Global: 7230): loss=1.7518, ppl=5.77, grad_norm=0.80, lr=2.59e-05, throughput=5643 tok/s +2025-11-28 05:24:05,947 - INFO - Epoch 1 Step 1740 (Global: 7240): loss=1.5998, ppl=4.95, grad_norm=0.83, lr=2.58e-05, throughput=5652 tok/s +2025-11-28 05:25:30,744 - INFO - Epoch 1 Step 1750 (Global: 7250): loss=1.7544, ppl=5.78, grad_norm=0.82, lr=2.56e-05, throughput=5661 tok/s +2025-11-28 05:26:55,632 - INFO - Epoch 1 Step 1760 (Global: 7260): loss=1.6168, ppl=5.04, grad_norm=0.76, lr=2.55e-05, throughput=5655 tok/s +2025-11-28 05:28:20,822 - INFO - Epoch 1 Step 1770 (Global: 7270): loss=1.7272, ppl=5.62, grad_norm=0.79, lr=2.53e-05, throughput=5635 tok/s +2025-11-28 05:29:45,452 - INFO - Epoch 1 Step 1780 (Global: 7280): loss=1.6301, ppl=5.10, grad_norm=0.77, lr=2.52e-05, throughput=5672 tok/s +2025-11-28 05:31:10,095 - INFO - Epoch 1 Step 1790 (Global: 7290): loss=1.6319, ppl=5.11, grad_norm=0.76, lr=2.50e-05, throughput=5671 tok/s +2025-11-28 05:32:34,971 - INFO - Epoch 1 Step 1800 (Global: 7300): loss=1.8109, ppl=6.12, grad_norm=0.82, lr=2.49e-05, throughput=5655 tok/s +2025-11-28 05:33:59,588 - INFO - Epoch 1 Step 1810 (Global: 7310): loss=1.5481, ppl=4.70, grad_norm=0.73, lr=2.47e-05, throughput=5673 tok/s +2025-11-28 05:35:24,361 - INFO - Epoch 1 Step 1820 (Global: 7320): loss=1.7480, ppl=5.74, grad_norm=0.86, lr=2.46e-05, throughput=5662 tok/s +2025-11-28 05:36:49,075 - INFO - Epoch 1 Step 1830 (Global: 7330): loss=1.7024, ppl=5.49, grad_norm=0.77, lr=2.44e-05, throughput=5666 tok/s +2025-11-28 05:38:14,048 - INFO - Epoch 1 Step 1840 (Global: 7340): loss=1.6974, ppl=5.46, grad_norm=0.78, lr=2.43e-05, throughput=5649 tok/s +2025-11-28 05:39:38,948 - INFO - Epoch 1 Step 1850 (Global: 7350): loss=1.5352, ppl=4.64, grad_norm=0.75, lr=2.42e-05, throughput=5654 tok/s +2025-11-28 05:41:03,507 - INFO - Epoch 1 Step 1860 (Global: 7360): loss=1.5635, ppl=4.78, grad_norm=0.75, lr=2.40e-05, throughput=5677 tok/s +2025-11-28 05:42:28,044 - INFO - Epoch 1 Step 1870 (Global: 7370): loss=1.4522, ppl=4.27, grad_norm=0.75, lr=2.39e-05, throughput=5678 tok/s +2025-11-28 05:43:52,786 - INFO - Epoch 1 Step 1880 (Global: 7380): loss=1.6246, ppl=5.08, grad_norm=0.76, lr=2.37e-05, throughput=5664 tok/s +2025-11-28 05:45:17,232 - INFO - Epoch 1 Step 1890 (Global: 7390): loss=1.7400, ppl=5.70, grad_norm=0.83, lr=2.36e-05, throughput=5684 tok/s +2025-11-28 05:46:42,220 - INFO - Epoch 1 Step 1900 (Global: 7400): loss=1.5291, ppl=4.61, grad_norm=0.75, lr=2.34e-05, throughput=5648 tok/s +2025-11-28 05:48:06,726 - INFO - Epoch 1 Step 1910 (Global: 7410): loss=1.7244, ppl=5.61, grad_norm=0.77, lr=2.33e-05, throughput=5680 tok/s +2025-11-28 05:49:31,618 - INFO - Epoch 1 Step 1920 (Global: 7420): loss=1.6798, ppl=5.36, grad_norm=0.77, lr=2.32e-05, throughput=5654 tok/s +2025-11-28 05:50:56,253 - INFO - Epoch 1 Step 1930 (Global: 7430): loss=1.5961, ppl=4.93, grad_norm=0.78, lr=2.30e-05, throughput=5671 tok/s +2025-11-28 05:52:20,808 - INFO - Epoch 1 Step 1940 (Global: 7440): loss=1.3380, ppl=3.81, grad_norm=0.71, lr=2.29e-05, throughput=5677 tok/s +2025-11-28 05:53:45,379 - INFO - Epoch 1 Step 1950 (Global: 7450): loss=1.6587, ppl=5.25, grad_norm=0.73, lr=2.27e-05, throughput=5676 tok/s +2025-11-28 05:55:10,075 - INFO - Epoch 1 Step 1960 (Global: 7460): loss=1.7494, ppl=5.75, grad_norm=0.76, lr=2.26e-05, throughput=5667 tok/s +2025-11-28 05:56:34,666 - INFO 
- Epoch 1 Step 1970 (Global: 7470): loss=1.5330, ppl=4.63, grad_norm=0.75, lr=2.25e-05, throughput=5674 tok/s +2025-11-28 05:57:59,736 - INFO - Epoch 1 Step 1980 (Global: 7480): loss=1.8465, ppl=6.34, grad_norm=0.80, lr=2.23e-05, throughput=5642 tok/s +2025-11-28 05:59:24,535 - INFO - Epoch 1 Step 1990 (Global: 7490): loss=1.7256, ppl=5.62, grad_norm=0.79, lr=2.22e-05, throughput=5660 tok/s +2025-11-28 06:00:49,129 - INFO - Epoch 1 Step 2000 (Global: 7500): loss=1.5428, ppl=4.68, grad_norm=0.82, lr=2.20e-05, throughput=5674 tok/s +2025-11-28 06:00:49,129 - INFO - +Running validation at step 7500... +2025-11-28 06:05:20,706 - INFO - Validation loss: 1.6090, perplexity: 5.00 +2025-11-28 06:05:20,707 - INFO - Qualitative metrics (n=5): +2025-11-28 06:05:20,707 - INFO - BLEU: 0.1683 +2025-11-28 06:05:20,707 - INFO - METEOR: 0.2366 +2025-11-28 06:05:20,707 - INFO - Edit Distance: 0.5922 +2025-11-28 06:05:20,707 - INFO - F-measure: 0.2649 +2025-11-28 06:05:20,707 - INFO - +====================================================================== +2025-11-28 06:05:20,707 - INFO - Qualitative Evaluation Samples: +2025-11-28 06:05:20,707 - INFO - ====================================================================== +2025-11-28 06:05:20,707 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-28 06:05:20,708 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 06:05:20,708 - INFO - Generated: ' to the band\'s previous work, saying that "the band\'s sound is more mature, more confident, and more confident than ever." Fitzmaurice also gave the album a positive review, saying that "the band\'s so...' +2025-11-28 06:05:20,708 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-28 06:05:20,708 - INFO - ---------------------------------------------------------------------- +2025-11-28 06:05:20,708 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-28 06:05:20,708 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 06:05:20,708 - INFO - Generated: 'aternal organizations in the United States. The Order of Angell was the first fraternal organization in the United States to be founded by a Native American. The Order of Angell was founded in 1921 by...' +2025-11-28 06:05:20,708 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-28 06:05:20,708 - INFO - ---------------------------------------------------------------------- +2025-11-28 06:05:20,708 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-28 06:05:20,708 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 06:05:20,708 - INFO - Generated: ' be killed by Oga. They are later defeated by the Red Tails and the Six Knights, and are later killed by the Red Tails and the Six Knights.\nKiriya\nVoiced by: Yūki Koyama\nKiriya is the second-in-comman...' +2025-11-28 06:05:20,709 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." 
+2025-11-28 06:05:20,709 - INFO - ---------------------------------------------------------------------- +2025-11-28 06:05:20,709 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-28 06:05:20,709 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 06:05:20,709 - INFO - Generated: '/1992 | L2/92/020 | L2/92/020 | L2/92/020 ...' +2025-11-28 06:05:20,709 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-28 06:05:20,709 - INFO - ---------------------------------------------------------------------- +2025-11-28 06:05:20,709 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-28 06:05:20,709 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 06:05:20,709 - INFO - Generated: '1 | Windows | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 06:05:20,709 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 06:05:20,710 - INFO - ---------------------------------------------------------------------- +2025-11-28 06:05:20,710 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_7500.jsonl +2025-11-28 06:05:47,916 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 06:05:47,922 - INFO - New best validation loss: 1.6090, perplexity: 5.00 +2025-11-28 06:07:12,692 - INFO - Epoch 1 Step 2010 (Global: 7510): loss=1.5731, ppl=4.82, grad_norm=0.75, lr=2.19e-05, throughput=5663 tok/s +2025-11-28 06:08:37,687 - INFO - Epoch 1 Step 2020 (Global: 7520): loss=1.6756, ppl=5.34, grad_norm=0.80, lr=2.18e-05, throughput=5647 tok/s +2025-11-28 06:10:02,416 - INFO - Epoch 1 Step 2030 (Global: 7530): loss=1.5993, ppl=4.95, grad_norm=0.78, lr=2.16e-05, throughput=5665 tok/s +2025-11-28 06:11:27,267 - INFO - Epoch 1 Step 2040 (Global: 7540): loss=1.4716, ppl=4.36, grad_norm=0.79, lr=2.15e-05, throughput=5657 tok/s +2025-11-28 06:12:51,843 - INFO - Epoch 1 Step 2050 (Global: 7550): loss=1.7631, ppl=5.83, grad_norm=0.79, lr=2.14e-05, throughput=5675 tok/s +2025-11-28 06:14:16,578 - INFO - Epoch 1 Step 2060 (Global: 7560): loss=1.4452, ppl=4.24, grad_norm=0.74, lr=2.12e-05, throughput=5665 tok/s +2025-11-28 06:15:41,367 - INFO - Epoch 1 Step 2070 (Global: 7570): loss=1.5091, ppl=4.52, grad_norm=0.75, lr=2.11e-05, throughput=5661 tok/s +2025-11-28 06:17:06,314 - INFO - Epoch 1 Step 2080 (Global: 7580): loss=1.6072, ppl=4.99, grad_norm=0.77, lr=2.09e-05, throughput=5651 tok/s +2025-11-28 06:18:31,196 - INFO - Epoch 1 Step 2090 (Global: 7590): loss=1.8080, ppl=6.10, grad_norm=0.79, lr=2.08e-05, throughput=5655 tok/s +2025-11-28 06:19:56,083 - INFO - Epoch 1 Step 2100 (Global: 7600): loss=1.5733, ppl=4.82, grad_norm=0.78, lr=2.07e-05, throughput=5655 tok/s +2025-11-28 06:21:21,273 - INFO - Epoch 1 Step 2110 (Global: 7610): loss=1.6920, ppl=5.43, grad_norm=0.78, lr=2.05e-05, throughput=5635 tok/s +2025-11-28 06:22:46,164 - INFO - Epoch 1 Step 2120 (Global: 7620): loss=1.4740, ppl=4.37, grad_norm=0.76, lr=2.04e-05, throughput=5654 tok/s +2025-11-28 06:24:11,003 - INFO - Epoch 1 Step 2130 (Global: 7630): loss=1.5767, ppl=4.84, grad_norm=0.74, lr=2.03e-05, throughput=5658 tok/s +2025-11-28 06:25:35,670 - INFO - Epoch 1 Step 2140 (Global: 7640): loss=1.5167, ppl=4.56, grad_norm=0.75, lr=2.01e-05, throughput=5669 tok/s 
+2025-11-28 06:27:00,229 - INFO - Epoch 1 Step 2150 (Global: 7650): loss=1.6419, ppl=5.17, grad_norm=0.80, lr=2.00e-05, throughput=5677 tok/s +2025-11-28 06:28:24,705 - INFO - Epoch 1 Step 2160 (Global: 7660): loss=1.6489, ppl=5.20, grad_norm=0.77, lr=1.99e-05, throughput=5682 tok/s +2025-11-28 06:29:49,648 - INFO - Epoch 1 Step 2170 (Global: 7670): loss=1.5404, ppl=4.67, grad_norm=0.74, lr=1.97e-05, throughput=5651 tok/s +2025-11-28 06:31:14,496 - INFO - Epoch 1 Step 2180 (Global: 7680): loss=1.7255, ppl=5.62, grad_norm=0.76, lr=1.96e-05, throughput=5657 tok/s +2025-11-28 06:32:39,477 - INFO - Epoch 1 Step 2190 (Global: 7690): loss=1.7661, ppl=5.85, grad_norm=0.79, lr=1.95e-05, throughput=5648 tok/s +2025-11-28 06:34:04,229 - INFO - Epoch 1 Step 2200 (Global: 7700): loss=1.2984, ppl=3.66, grad_norm=0.74, lr=1.93e-05, throughput=5664 tok/s +2025-11-28 06:35:29,174 - INFO - Epoch 1 Step 2210 (Global: 7710): loss=1.5755, ppl=4.83, grad_norm=0.79, lr=1.92e-05, throughput=5651 tok/s +2025-11-28 06:36:53,946 - INFO - Epoch 1 Step 2220 (Global: 7720): loss=1.4550, ppl=4.28, grad_norm=0.74, lr=1.91e-05, throughput=5662 tok/s +2025-11-28 06:38:18,919 - INFO - Epoch 1 Step 2230 (Global: 7730): loss=1.5032, ppl=4.50, grad_norm=0.74, lr=1.89e-05, throughput=5649 tok/s +2025-11-28 06:39:43,253 - INFO - Epoch 1 Step 2240 (Global: 7740): loss=1.6561, ppl=5.24, grad_norm=0.74, lr=1.88e-05, throughput=5692 tok/s +2025-11-28 06:41:07,691 - INFO - Epoch 1 Step 2250 (Global: 7750): loss=1.7406, ppl=5.70, grad_norm=0.80, lr=1.87e-05, throughput=5685 tok/s +2025-11-28 06:42:32,237 - INFO - Epoch 1 Step 2260 (Global: 7760): loss=1.6480, ppl=5.20, grad_norm=0.74, lr=1.85e-05, throughput=5677 tok/s +2025-11-28 06:43:56,646 - INFO - Epoch 1 Step 2270 (Global: 7770): loss=1.6158, ppl=5.03, grad_norm=0.76, lr=1.84e-05, throughput=5687 tok/s +2025-11-28 06:45:20,915 - INFO - Epoch 1 Step 2280 (Global: 7780): loss=1.6433, ppl=5.17, grad_norm=0.76, lr=1.83e-05, throughput=5696 tok/s +2025-11-28 06:46:45,532 - INFO - Epoch 1 Step 2290 (Global: 7790): loss=1.6304, ppl=5.11, grad_norm=0.75, lr=1.82e-05, throughput=5673 tok/s +2025-11-28 06:48:10,018 - INFO - Epoch 1 Step 2300 (Global: 7800): loss=1.6279, ppl=5.09, grad_norm=0.77, lr=1.80e-05, throughput=5681 tok/s +2025-11-28 06:49:34,854 - INFO - Epoch 1 Step 2310 (Global: 7810): loss=1.7752, ppl=5.90, grad_norm=0.79, lr=1.79e-05, throughput=5658 tok/s +2025-11-28 06:50:59,166 - INFO - Epoch 1 Step 2320 (Global: 7820): loss=1.6045, ppl=4.98, grad_norm=0.74, lr=1.78e-05, throughput=5693 tok/s +2025-11-28 06:52:23,711 - INFO - Epoch 1 Step 2330 (Global: 7830): loss=1.5410, ppl=4.67, grad_norm=0.77, lr=1.76e-05, throughput=5678 tok/s +2025-11-28 06:53:48,119 - INFO - Epoch 1 Step 2340 (Global: 7840): loss=1.7407, ppl=5.70, grad_norm=0.80, lr=1.75e-05, throughput=5687 tok/s +2025-11-28 06:55:12,565 - INFO - Epoch 1 Step 2350 (Global: 7850): loss=1.9615, ppl=7.11, grad_norm=0.82, lr=1.74e-05, throughput=5684 tok/s +2025-11-28 06:56:36,993 - INFO - Epoch 1 Step 2360 (Global: 7860): loss=1.6857, ppl=5.40, grad_norm=0.73, lr=1.73e-05, throughput=5685 tok/s +2025-11-28 06:58:01,420 - INFO - Epoch 1 Step 2370 (Global: 7870): loss=1.4669, ppl=4.34, grad_norm=0.77, lr=1.71e-05, throughput=5685 tok/s +2025-11-28 06:59:26,105 - INFO - Epoch 1 Step 2380 (Global: 7880): loss=1.5201, ppl=4.57, grad_norm=0.77, lr=1.70e-05, throughput=5668 tok/s +2025-11-28 07:00:50,309 - INFO - Epoch 1 Step 2390 (Global: 7890): loss=1.4843, ppl=4.41, grad_norm=0.74, lr=1.69e-05, throughput=5700 tok/s 
+2025-11-28 07:02:14,771 - INFO - Epoch 1 Step 2400 (Global: 7900): loss=1.5001, ppl=4.48, grad_norm=0.74, lr=1.68e-05, throughput=5683 tok/s +2025-11-28 07:03:39,354 - INFO - Epoch 1 Step 2410 (Global: 7910): loss=1.5912, ppl=4.91, grad_norm=0.77, lr=1.66e-05, throughput=5675 tok/s +2025-11-28 07:05:04,015 - INFO - Epoch 1 Step 2420 (Global: 7920): loss=1.4482, ppl=4.26, grad_norm=0.70, lr=1.65e-05, throughput=5670 tok/s +2025-11-28 07:06:28,695 - INFO - Epoch 1 Step 2430 (Global: 7930): loss=1.6838, ppl=5.39, grad_norm=0.78, lr=1.64e-05, throughput=5668 tok/s +2025-11-28 07:07:53,244 - INFO - Epoch 1 Step 2440 (Global: 7940): loss=1.6363, ppl=5.14, grad_norm=0.80, lr=1.63e-05, throughput=5677 tok/s +2025-11-28 07:09:17,862 - INFO - Epoch 1 Step 2450 (Global: 7950): loss=1.6648, ppl=5.28, grad_norm=0.78, lr=1.61e-05, throughput=5673 tok/s +2025-11-28 07:10:42,804 - INFO - Epoch 1 Step 2460 (Global: 7960): loss=1.6843, ppl=5.39, grad_norm=0.77, lr=1.60e-05, throughput=5651 tok/s +2025-11-28 07:12:07,195 - INFO - Epoch 1 Step 2470 (Global: 7970): loss=1.6894, ppl=5.42, grad_norm=0.76, lr=1.59e-05, throughput=5688 tok/s +2025-11-28 07:13:31,956 - INFO - Epoch 1 Step 2480 (Global: 7980): loss=1.3996, ppl=4.05, grad_norm=0.73, lr=1.58e-05, throughput=5663 tok/s +2025-11-28 07:14:56,442 - INFO - Epoch 1 Step 2490 (Global: 7990): loss=1.4051, ppl=4.08, grad_norm=0.70, lr=1.56e-05, throughput=5681 tok/s +2025-11-28 07:16:20,906 - INFO - Epoch 1 Step 2500 (Global: 8000): loss=1.5883, ppl=4.90, grad_norm=0.77, lr=1.55e-05, throughput=5683 tok/s +2025-11-28 07:16:20,906 - INFO - +Running validation at step 8000... +2025-11-28 07:20:51,236 - INFO - Validation loss: 1.6049, perplexity: 4.98 +2025-11-28 07:20:51,236 - INFO - Qualitative metrics (n=5): +2025-11-28 07:20:51,237 - INFO - BLEU: 0.1841 +2025-11-28 07:20:51,237 - INFO - METEOR: 0.2495 +2025-11-28 07:20:51,237 - INFO - Edit Distance: 0.5969 +2025-11-28 07:20:51,237 - INFO - F-measure: 0.2835 +2025-11-28 07:20:51,237 - INFO - +====================================================================== +2025-11-28 07:20:51,238 - INFO - Qualitative Evaluation Samples: +2025-11-28 07:20:51,238 - INFO - ====================================================================== +2025-11-28 07:20:51,238 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-28 07:20:51,238 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 07:20:51,238 - INFO - Generated: ' to the band\'s previous work, saying that "the album is a little more subdued, but it\'s still a very strong album, and it\'s a very good record." In a mixed review, The Boston Globe gave the album a po...' +2025-11-28 07:20:51,239 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-28 07:20:51,239 - INFO - ---------------------------------------------------------------------- +2025-11-28 07:20:51,239 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-28 07:20:51,239 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 07:20:51,239 - INFO - Generated: 'aternal organizations in the United States. The Order of Angell was the first fraternal organization in the United States to be founded by a Native American. The Order of Angell was founded in 1921 by...' 
+2025-11-28 07:20:51,240 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-28 07:20:51,240 - INFO - ---------------------------------------------------------------------- +2025-11-28 07:20:51,240 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-28 07:20:51,240 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 07:20:51,240 - INFO - Generated: " be killed by Oga. They are defeated by Oga and Miki, and are then killed by Oga's henchmen.\nKiriya\nVoiced by: Yūki Kaji\nA young man who is the second in command of the Red Tails. He is a member of th..." +2025-11-28 07:20:51,241 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-28 07:20:51,241 - INFO - ---------------------------------------------------------------------- +2025-11-28 07:20:51,241 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-28 07:20:51,241 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 07:20:51,241 - INFO - Generated: '-31-1991 | L2/91-202 | ISO/IEC 10646-1:1991 ...' +2025-11-28 07:20:51,242 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-28 07:20:51,242 - INFO - ---------------------------------------------------------------------- +2025-11-28 07:20:51,242 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-28 07:20:51,242 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 07:20:51,242 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 07:20:51,243 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' 
+2025-11-28 07:20:51,243 - INFO - ---------------------------------------------------------------------- +2025-11-28 07:20:51,243 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_8000.jsonl +2025-11-28 07:21:19,130 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 07:21:19,136 - INFO - New best validation loss: 1.6049, perplexity: 4.98 +2025-11-28 07:22:43,982 - INFO - Epoch 1 Step 2510 (Global: 8010): loss=1.4336, ppl=4.19, grad_norm=0.76, lr=1.54e-05, throughput=5658 tok/s +2025-11-28 07:24:08,621 - INFO - Epoch 1 Step 2520 (Global: 8020): loss=1.8063, ppl=6.09, grad_norm=0.80, lr=1.53e-05, throughput=5671 tok/s +2025-11-28 07:25:33,328 - INFO - Epoch 1 Step 2530 (Global: 8030): loss=1.5947, ppl=4.93, grad_norm=0.76, lr=1.52e-05, throughput=5667 tok/s +2025-11-28 07:26:58,005 - INFO - Epoch 1 Step 2540 (Global: 8040): loss=1.8084, ppl=6.10, grad_norm=0.82, lr=1.50e-05, throughput=5669 tok/s +2025-11-28 07:28:22,422 - INFO - Epoch 1 Step 2550 (Global: 8050): loss=1.5500, ppl=4.71, grad_norm=0.75, lr=1.49e-05, throughput=5686 tok/s +2025-11-28 07:29:47,005 - INFO - Epoch 1 Step 2560 (Global: 8060): loss=1.5026, ppl=4.49, grad_norm=0.76, lr=1.48e-05, throughput=5675 tok/s +2025-11-28 07:31:11,417 - INFO - Epoch 1 Step 2570 (Global: 8070): loss=1.6647, ppl=5.28, grad_norm=0.80, lr=1.47e-05, throughput=5686 tok/s +2025-11-28 07:32:36,000 - INFO - Epoch 1 Step 2580 (Global: 8080): loss=1.8135, ppl=6.13, grad_norm=0.79, lr=1.46e-05, throughput=5675 tok/s +2025-11-28 07:34:00,771 - INFO - Epoch 1 Step 2590 (Global: 8090): loss=1.6123, ppl=5.01, grad_norm=0.75, lr=1.44e-05, throughput=5662 tok/s +2025-11-28 07:35:25,415 - INFO - Epoch 1 Step 2600 (Global: 8100): loss=1.5629, ppl=4.77, grad_norm=0.73, lr=1.43e-05, throughput=5671 tok/s +2025-11-28 07:36:49,823 - INFO - Epoch 1 Step 2610 (Global: 8110): loss=1.5981, ppl=4.94, grad_norm=0.76, lr=1.42e-05, throughput=5687 tok/s +2025-11-28 07:38:14,336 - INFO - Epoch 1 Step 2620 (Global: 8120): loss=1.6246, ppl=5.08, grad_norm=0.73, lr=1.41e-05, throughput=5680 tok/s +2025-11-28 07:39:39,050 - INFO - Epoch 1 Step 2630 (Global: 8130): loss=1.5946, ppl=4.93, grad_norm=0.76, lr=1.40e-05, throughput=5666 tok/s +2025-11-28 07:41:03,486 - INFO - Epoch 1 Step 2640 (Global: 8140): loss=1.6403, ppl=5.16, grad_norm=0.78, lr=1.39e-05, throughput=5685 tok/s +2025-11-28 07:42:28,076 - INFO - Epoch 1 Step 2650 (Global: 8150): loss=1.5262, ppl=4.60, grad_norm=0.77, lr=1.37e-05, throughput=5674 tok/s +2025-11-28 07:43:52,699 - INFO - Epoch 1 Step 2660 (Global: 8160): loss=1.6016, ppl=4.96, grad_norm=0.77, lr=1.36e-05, throughput=5678 tok/s +2025-11-28 07:45:17,150 - INFO - Epoch 1 Step 2670 (Global: 8170): loss=1.6920, ppl=5.43, grad_norm=0.78, lr=1.35e-05, throughput=5684 tok/s +2025-11-28 07:46:41,783 - INFO - Epoch 1 Step 2680 (Global: 8180): loss=1.6024, ppl=4.96, grad_norm=0.73, lr=1.34e-05, throughput=5672 tok/s +2025-11-28 07:48:06,569 - INFO - Epoch 1 Step 2690 (Global: 8190): loss=1.3677, ppl=3.93, grad_norm=0.73, lr=1.33e-05, throughput=5661 tok/s +2025-11-28 07:49:30,831 - INFO - Epoch 1 Step 2700 (Global: 8200): loss=1.6902, ppl=5.42, grad_norm=0.78, lr=1.32e-05, throughput=5697 tok/s +2025-11-28 07:50:55,244 - INFO - Epoch 1 Step 2710 (Global: 8210): loss=1.6840, ppl=5.39, grad_norm=0.80, lr=1.31e-05, throughput=5686 tok/s +2025-11-28 07:52:19,893 - 
INFO - Epoch 1 Step 2720 (Global: 8220): loss=1.5339, ppl=4.64, grad_norm=0.76, lr=1.29e-05, throughput=5671 tok/s +2025-11-28 07:53:44,435 - INFO - Epoch 1 Step 2730 (Global: 8230): loss=1.4867, ppl=4.42, grad_norm=0.73, lr=1.28e-05, throughput=5678 tok/s +2025-11-28 07:55:09,095 - INFO - Epoch 1 Step 2740 (Global: 8240): loss=1.7848, ppl=5.96, grad_norm=0.78, lr=1.27e-05, throughput=5670 tok/s +2025-11-28 07:56:33,597 - INFO - Epoch 1 Step 2750 (Global: 8250): loss=1.6535, ppl=5.23, grad_norm=0.78, lr=1.26e-05, throughput=5680 tok/s +2025-11-28 07:57:58,279 - INFO - Epoch 1 Step 2760 (Global: 8260): loss=1.6487, ppl=5.20, grad_norm=0.79, lr=1.25e-05, throughput=5668 tok/s +2025-11-28 07:59:23,209 - INFO - Epoch 1 Step 2770 (Global: 8270): loss=1.4810, ppl=4.40, grad_norm=0.72, lr=1.24e-05, throughput=5652 tok/s +2025-11-28 08:00:47,845 - INFO - Epoch 1 Step 2780 (Global: 8280): loss=1.4734, ppl=4.36, grad_norm=0.79, lr=1.23e-05, throughput=5671 tok/s +2025-11-28 08:02:12,490 - INFO - Epoch 1 Step 2790 (Global: 8290): loss=1.5257, ppl=4.60, grad_norm=0.73, lr=1.22e-05, throughput=5671 tok/s +2025-11-28 08:03:37,236 - INFO - Epoch 1 Step 2800 (Global: 8300): loss=1.6865, ppl=5.40, grad_norm=0.73, lr=1.21e-05, throughput=5664 tok/s +2025-11-28 08:05:01,871 - INFO - Epoch 1 Step 2810 (Global: 8310): loss=1.2388, ppl=3.45, grad_norm=0.71, lr=1.20e-05, throughput=5671 tok/s +2025-11-28 08:06:26,610 - INFO - Epoch 1 Step 2820 (Global: 8320): loss=1.7221, ppl=5.60, grad_norm=0.78, lr=1.18e-05, throughput=5664 tok/s +2025-11-28 08:07:51,309 - INFO - Epoch 1 Step 2830 (Global: 8330): loss=1.3595, ppl=3.89, grad_norm=0.71, lr=1.17e-05, throughput=5667 tok/s +2025-11-28 08:09:16,006 - INFO - Epoch 1 Step 2840 (Global: 8340): loss=1.7031, ppl=5.49, grad_norm=0.79, lr=1.16e-05, throughput=5667 tok/s +2025-11-28 08:10:40,848 - INFO - Epoch 1 Step 2850 (Global: 8350): loss=1.5210, ppl=4.58, grad_norm=0.79, lr=1.15e-05, throughput=5658 tok/s +2025-11-28 08:12:05,391 - INFO - Epoch 1 Step 2860 (Global: 8360): loss=1.4264, ppl=4.16, grad_norm=0.70, lr=1.14e-05, throughput=5678 tok/s +2025-11-28 08:13:30,134 - INFO - Epoch 1 Step 2870 (Global: 8370): loss=1.6745, ppl=5.34, grad_norm=0.79, lr=1.13e-05, throughput=5664 tok/s +2025-11-28 08:14:54,627 - INFO - Epoch 1 Step 2880 (Global: 8380): loss=1.5173, ppl=4.56, grad_norm=0.77, lr=1.12e-05, throughput=5681 tok/s +2025-11-28 08:16:20,146 - INFO - Epoch 1 Step 2890 (Global: 8390): loss=1.3791, ppl=3.97, grad_norm=0.71, lr=1.11e-05, throughput=5613 tok/s +2025-11-28 08:17:44,932 - INFO - Epoch 1 Step 2900 (Global: 8400): loss=1.4420, ppl=4.23, grad_norm=0.73, lr=1.10e-05, throughput=5661 tok/s +2025-11-28 08:19:10,107 - INFO - Epoch 1 Step 2910 (Global: 8410): loss=1.6013, ppl=4.96, grad_norm=0.77, lr=1.09e-05, throughput=5635 tok/s +2025-11-28 08:20:34,790 - INFO - Epoch 1 Step 2920 (Global: 8420): loss=1.6694, ppl=5.31, grad_norm=0.77, lr=1.08e-05, throughput=5668 tok/s +2025-11-28 08:21:59,376 - INFO - Epoch 1 Step 2930 (Global: 8430): loss=1.6365, ppl=5.14, grad_norm=0.80, lr=1.07e-05, throughput=5675 tok/s +2025-11-28 08:23:24,305 - INFO - Epoch 1 Step 2940 (Global: 8440): loss=1.5726, ppl=4.82, grad_norm=0.79, lr=1.06e-05, throughput=5652 tok/s +2025-11-28 08:24:48,886 - INFO - Epoch 1 Step 2950 (Global: 8450): loss=1.6415, ppl=5.16, grad_norm=0.77, lr=1.05e-05, throughput=5675 tok/s +2025-11-28 08:26:13,541 - INFO - Epoch 1 Step 2960 (Global: 8460): loss=1.3656, ppl=3.92, grad_norm=0.71, lr=1.04e-05, throughput=5670 tok/s +2025-11-28 08:27:38,181 - INFO 
- Epoch 1 Step 2970 (Global: 8470): loss=1.6214, ppl=5.06, grad_norm=0.85, lr=1.03e-05, throughput=5671 tok/s +2025-11-28 08:29:03,203 - INFO - Epoch 1 Step 2980 (Global: 8480): loss=1.8468, ppl=6.34, grad_norm=0.80, lr=1.02e-05, throughput=5646 tok/s +2025-11-28 08:30:27,699 - INFO - Epoch 1 Step 2990 (Global: 8490): loss=1.8094, ppl=6.11, grad_norm=0.85, lr=1.01e-05, throughput=5681 tok/s +2025-11-28 08:31:52,075 - INFO - Epoch 1 Step 3000 (Global: 8500): loss=1.5575, ppl=4.75, grad_norm=0.77, lr=9.96e-06, throughput=5689 tok/s +2025-11-28 08:31:52,075 - INFO - +Running validation at step 8500... +2025-11-28 08:36:23,040 - INFO - Validation loss: 1.6027, perplexity: 4.97 +2025-11-28 08:36:23,041 - INFO - Qualitative metrics (n=5): +2025-11-28 08:36:23,041 - INFO - BLEU: 0.1632 +2025-11-28 08:36:23,041 - INFO - METEOR: 0.2313 +2025-11-28 08:36:23,041 - INFO - Edit Distance: 0.5916 +2025-11-28 08:36:23,041 - INFO - F-measure: 0.2591 +2025-11-28 08:36:23,041 - INFO - +====================================================================== +2025-11-28 08:36:23,041 - INFO - Qualitative Evaluation Samples: +2025-11-28 08:36:23,041 - INFO - ====================================================================== +2025-11-28 08:36:23,041 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-28 08:36:23,042 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 08:36:23,042 - INFO - Generated: ' to the band\'s previous work, saying that "the album is a little more subdued, but it\'s still a lot of fun." In a review for The A.V. Club, Fitzmaurice said that "the album is a little more subdued, b...' +2025-11-28 08:36:23,042 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-28 08:36:23,042 - INFO - ---------------------------------------------------------------------- +2025-11-28 08:36:23,042 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-28 08:36:23,042 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 08:36:23,042 - INFO - Generated: 'aternal organizations in the United States. The Order of Angell was the first fraternal organization in the United States to be founded by a Native American. The Order of Angell was founded in 1921 by...' +2025-11-28 08:36:23,042 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-28 08:36:23,042 - INFO - ---------------------------------------------------------------------- +2025-11-28 08:36:23,042 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-28 08:36:23,042 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 08:36:23,042 - INFO - Generated: ' be killed by Oga. He is killed by Oga and Miki, who then take his body to the Shingetsu Temple.\nKiriya\nVoiced by: Yūki Kaji\nKiriya is the second leader of the Red Tails. He is a tall, muscular man wi...' +2025-11-28 08:36:23,043 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." 
+2025-11-28 08:36:23,043 - INFO - ---------------------------------------------------------------------- +2025-11-28 08:36:23,043 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-28 08:36:23,043 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 08:36:23,043 - INFO - Generated: '-31-1991 | L2/91-202 | ISO/IEC 10646-1:1991 ...' +2025-11-28 08:36:23,043 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-28 08:36:23,043 - INFO - ---------------------------------------------------------------------- +2025-11-28 08:36:23,044 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-28 08:36:23,044 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 08:36:23,044 - INFO - Generated: '1 | Windows | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 08:36:23,044 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 08:36:23,044 - INFO - ---------------------------------------------------------------------- +2025-11-28 08:36:23,044 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_8500.jsonl +2025-11-28 08:36:49,598 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 08:36:49,604 - INFO - New best validation loss: 1.6027, perplexity: 4.97 +2025-11-28 08:38:14,351 - INFO - Epoch 1 Step 3010 (Global: 8510): loss=1.4187, ppl=4.13, grad_norm=0.78, lr=9.86e-06, throughput=5665 tok/s +2025-11-28 08:39:38,906 - INFO - Epoch 1 Step 3020 (Global: 8520): loss=1.6613, ppl=5.27, grad_norm=0.77, lr=9.76e-06, throughput=5677 tok/s +2025-11-28 08:41:03,746 - INFO - Epoch 1 Step 3030 (Global: 8530): loss=1.6001, ppl=4.95, grad_norm=0.77, lr=9.67e-06, throughput=5658 tok/s +2025-11-28 08:42:28,458 - INFO - Epoch 1 Step 3040 (Global: 8540): loss=1.5442, ppl=4.68, grad_norm=0.75, lr=9.57e-06, throughput=5666 tok/s +2025-11-28 08:43:52,880 - INFO - Epoch 1 Step 3050 (Global: 8550): loss=1.7491, ppl=5.75, grad_norm=0.80, lr=9.47e-06, throughput=5686 tok/s +2025-11-28 08:45:17,546 - INFO - Epoch 1 Step 3060 (Global: 8560): loss=1.7288, ppl=5.63, grad_norm=0.79, lr=9.37e-06, throughput=5669 tok/s +2025-11-28 08:46:42,322 - INFO - Epoch 1 Step 3070 (Global: 8570): loss=1.8678, ppl=6.47, grad_norm=0.81, lr=9.27e-06, throughput=5662 tok/s +2025-11-28 08:48:07,033 - INFO - Epoch 1 Step 3080 (Global: 8580): loss=1.4425, ppl=4.23, grad_norm=0.76, lr=9.18e-06, throughput=5666 tok/s +2025-11-28 08:49:32,208 - INFO - Epoch 1 Step 3090 (Global: 8590): loss=1.5418, ppl=4.67, grad_norm=0.75, lr=9.08e-06, throughput=5636 tok/s +2025-11-28 08:50:57,288 - INFO - Epoch 1 Step 3100 (Global: 8600): loss=1.5573, ppl=4.75, grad_norm=0.75, lr=8.98e-06, throughput=5642 tok/s +2025-11-28 08:52:22,244 - INFO - Epoch 1 Step 3110 (Global: 8610): loss=1.7945, ppl=6.02, grad_norm=0.81, lr=8.89e-06, throughput=5650 tok/s +2025-11-28 08:53:46,873 - INFO - Epoch 1 Step 3120 (Global: 8620): loss=1.3542, ppl=3.87, grad_norm=0.71, lr=8.79e-06, throughput=5672 tok/s +2025-11-28 08:55:11,854 - INFO - Epoch 1 Step 3130 (Global: 8630): loss=1.2891, ppl=3.63, grad_norm=0.68, lr=8.70e-06, throughput=5648 tok/s +2025-11-28 08:56:36,827 - INFO - Epoch 1 Step 3140 (Global: 8640): loss=1.6717, ppl=5.32, grad_norm=0.77, lr=8.60e-06, throughput=5649 tok/s 
+2025-11-28 08:58:01,875 - INFO - Epoch 1 Step 3150 (Global: 8650): loss=1.6594, ppl=5.26, grad_norm=0.79, lr=8.51e-06, throughput=5644 tok/s +2025-11-28 08:59:26,478 - INFO - Epoch 1 Step 3160 (Global: 8660): loss=1.6828, ppl=5.38, grad_norm=0.77, lr=8.42e-06, throughput=5674 tok/s +2025-11-28 09:00:51,500 - INFO - Epoch 1 Step 3170 (Global: 8670): loss=1.5444, ppl=4.69, grad_norm=0.75, lr=8.32e-06, throughput=5646 tok/s +2025-11-28 09:02:16,719 - INFO - Epoch 1 Step 3180 (Global: 8680): loss=1.5827, ppl=4.87, grad_norm=0.75, lr=8.23e-06, throughput=5633 tok/s +2025-11-28 09:03:41,870 - INFO - Epoch 1 Step 3190 (Global: 8690): loss=1.6414, ppl=5.16, grad_norm=0.76, lr=8.14e-06, throughput=5637 tok/s +2025-11-28 09:05:06,720 - INFO - Epoch 1 Step 3200 (Global: 8700): loss=1.5945, ppl=4.93, grad_norm=0.76, lr=8.05e-06, throughput=5657 tok/s +2025-11-28 09:06:31,574 - INFO - Epoch 1 Step 3210 (Global: 8710): loss=1.7471, ppl=5.74, grad_norm=0.77, lr=7.96e-06, throughput=5657 tok/s +2025-11-28 09:07:56,438 - INFO - Epoch 1 Step 3220 (Global: 8720): loss=1.7478, ppl=5.74, grad_norm=0.80, lr=7.87e-06, throughput=5656 tok/s +2025-11-28 09:09:21,074 - INFO - Epoch 1 Step 3230 (Global: 8730): loss=1.5348, ppl=4.64, grad_norm=0.76, lr=7.78e-06, throughput=5671 tok/s +2025-11-28 09:10:46,155 - INFO - Epoch 1 Step 3240 (Global: 8740): loss=1.8409, ppl=6.30, grad_norm=0.79, lr=7.69e-06, throughput=5642 tok/s +2025-11-28 09:12:11,053 - INFO - Epoch 1 Step 3250 (Global: 8750): loss=1.5750, ppl=4.83, grad_norm=0.77, lr=7.60e-06, throughput=5654 tok/s +2025-11-28 09:13:36,165 - INFO - Epoch 1 Step 3260 (Global: 8760): loss=1.7165, ppl=5.57, grad_norm=0.76, lr=7.51e-06, throughput=5640 tok/s +2025-11-28 09:15:01,255 - INFO - Epoch 1 Step 3270 (Global: 8770): loss=1.5568, ppl=4.74, grad_norm=0.76, lr=7.42e-06, throughput=5641 tok/s +2025-11-28 09:16:26,541 - INFO - Epoch 1 Step 3280 (Global: 8780): loss=1.8576, ppl=6.41, grad_norm=0.82, lr=7.33e-06, throughput=5628 tok/s +2025-11-28 09:17:51,741 - INFO - Epoch 1 Step 3290 (Global: 8790): loss=1.5820, ppl=4.86, grad_norm=0.74, lr=7.25e-06, throughput=5634 tok/s +2025-11-28 09:19:17,476 - INFO - Epoch 1 Step 3300 (Global: 8800): loss=1.7322, ppl=5.65, grad_norm=0.78, lr=7.16e-06, throughput=5599 tok/s +2025-11-28 09:20:42,685 - INFO - Epoch 1 Step 3310 (Global: 8810): loss=1.4969, ppl=4.47, grad_norm=0.74, lr=7.07e-06, throughput=5633 tok/s +2025-11-28 09:22:08,100 - INFO - Epoch 1 Step 3320 (Global: 8820): loss=1.5077, ppl=4.52, grad_norm=0.74, lr=6.99e-06, throughput=5620 tok/s +2025-11-28 09:23:33,607 - INFO - Epoch 1 Step 3330 (Global: 8830): loss=1.6482, ppl=5.20, grad_norm=0.78, lr=6.90e-06, throughput=5614 tok/s +2025-11-28 09:24:58,724 - INFO - Epoch 1 Step 3340 (Global: 8840): loss=1.4448, ppl=4.24, grad_norm=0.77, lr=6.82e-06, throughput=5639 tok/s +2025-11-28 09:26:23,825 - INFO - Epoch 1 Step 3350 (Global: 8850): loss=1.6326, ppl=5.12, grad_norm=0.77, lr=6.74e-06, throughput=5640 tok/s +2025-11-28 09:27:48,789 - INFO - Epoch 1 Step 3360 (Global: 8860): loss=1.6206, ppl=5.06, grad_norm=0.77, lr=6.65e-06, throughput=5650 tok/s +2025-11-28 09:29:13,564 - INFO - Epoch 1 Step 3370 (Global: 8870): loss=1.6195, ppl=5.05, grad_norm=0.76, lr=6.57e-06, throughput=5662 tok/s +2025-11-28 09:30:38,484 - INFO - Epoch 1 Step 3380 (Global: 8880): loss=1.5038, ppl=4.50, grad_norm=0.74, lr=6.49e-06, throughput=5652 tok/s +2025-11-28 09:32:03,390 - INFO - Epoch 1 Step 3390 (Global: 8890): loss=1.6076, ppl=4.99, grad_norm=0.76, lr=6.40e-06, throughput=5653 tok/s 
+2025-11-28 09:33:28,084 - INFO - Epoch 1 Step 3400 (Global: 8900): loss=1.6713, ppl=5.32, grad_norm=0.79, lr=6.32e-06, throughput=5668 tok/s +2025-11-28 09:34:52,761 - INFO - Epoch 1 Step 3410 (Global: 8910): loss=1.3674, ppl=3.92, grad_norm=0.69, lr=6.24e-06, throughput=5669 tok/s +2025-11-28 09:36:17,818 - INFO - Epoch 1 Step 3420 (Global: 8920): loss=1.5803, ppl=4.86, grad_norm=0.75, lr=6.16e-06, throughput=5643 tok/s +2025-11-28 09:37:42,515 - INFO - Epoch 1 Step 3430 (Global: 8930): loss=1.6556, ppl=5.24, grad_norm=0.75, lr=6.08e-06, throughput=5667 tok/s +2025-11-28 09:39:07,624 - INFO - Epoch 1 Step 3440 (Global: 8940): loss=1.5519, ppl=4.72, grad_norm=0.74, lr=6.00e-06, throughput=5640 tok/s +2025-11-28 09:40:32,237 - INFO - Epoch 1 Step 3450 (Global: 8950): loss=1.5044, ppl=4.50, grad_norm=0.75, lr=5.92e-06, throughput=5673 tok/s +2025-11-28 09:41:56,606 - INFO - Epoch 1 Step 3460 (Global: 8960): loss=1.5959, ppl=4.93, grad_norm=0.77, lr=5.84e-06, throughput=5689 tok/s +2025-11-28 09:43:20,962 - INFO - Epoch 1 Step 3470 (Global: 8970): loss=1.5297, ppl=4.62, grad_norm=0.75, lr=5.76e-06, throughput=5690 tok/s +2025-11-28 09:44:45,532 - INFO - Epoch 1 Step 3480 (Global: 8980): loss=1.7617, ppl=5.82, grad_norm=0.77, lr=5.68e-06, throughput=5676 tok/s +2025-11-28 09:46:10,045 - INFO - Epoch 1 Step 3490 (Global: 8990): loss=1.7558, ppl=5.79, grad_norm=0.75, lr=5.61e-06, throughput=5680 tok/s +2025-11-28 09:47:34,542 - INFO - Epoch 1 Step 3500 (Global: 9000): loss=1.6662, ppl=5.29, grad_norm=0.75, lr=5.53e-06, throughput=5681 tok/s +2025-11-28 09:47:34,543 - INFO - +Running validation at step 9000... +2025-11-28 09:52:04,330 - INFO - Validation loss: 1.6017, perplexity: 4.96 +2025-11-28 09:52:04,330 - INFO - Qualitative metrics (n=5): +2025-11-28 09:52:04,331 - INFO - BLEU: 0.1560 +2025-11-28 09:52:04,331 - INFO - METEOR: 0.2298 +2025-11-28 09:52:04,331 - INFO - Edit Distance: 0.5558 +2025-11-28 09:52:04,331 - INFO - F-measure: 0.2608 +2025-11-28 09:52:04,331 - INFO - +====================================================================== +2025-11-28 09:52:04,331 - INFO - Qualitative Evaluation Samples: +2025-11-28 09:52:04,331 - INFO - ====================================================================== +2025-11-28 09:52:04,331 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-28 09:52:04,331 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 09:52:04,332 - INFO - Generated: ' to the band\'s previous work, saying that "the album is a little more subdued, but it\'s still a lot of fun." In a review for The Boston Globe, critic David Browne said that "the band\'s best work is on...' +2025-11-28 09:52:04,332 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-28 09:52:04,332 - INFO - ---------------------------------------------------------------------- +2025-11-28 09:52:04,332 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-28 09:52:04,332 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 09:52:04,332 - INFO - Generated: 'aternal organizations in the United States. The Order of Angell was the first fraternal organization in the United States to be founded by a Native American. The Order of Angell was founded in 1921 by...' 
+2025-11-28 09:52:04,332 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-28 09:52:04,332 - INFO - ---------------------------------------------------------------------- +2025-11-28 09:52:04,332 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-28 09:52:04,332 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 09:52:04,332 - INFO - Generated: ' be killed by Oga. They are later defeated by the Red Tails and the Six Knights, and are later killed by the Red Tails.\nKiriya\nVoiced by: Yūki Kaji\nKiriya is the second leader of the Red Tails. He is ...' +2025-11-28 09:52:04,333 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-28 09:52:04,333 - INFO - ---------------------------------------------------------------------- +2025-11-28 09:52:04,333 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-28 09:52:04,333 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 09:52:04,333 - INFO - Generated: '-31-1991 | L2/91-202 | ISO/IEC 10646-1:1991 ...' +2025-11-28 09:52:04,333 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-28 09:52:04,333 - INFO - ---------------------------------------------------------------------- +2025-11-28 09:52:04,333 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-28 09:52:04,333 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 09:52:04,333 - INFO - Generated: '1 | Windows | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 09:52:04,334 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' 
+2025-11-28 09:52:04,334 - INFO - ---------------------------------------------------------------------- +2025-11-28 09:52:04,334 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_9000.jsonl +2025-11-28 09:52:32,807 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 09:52:32,812 - INFO - New best validation loss: 1.6017, perplexity: 4.96 +2025-11-28 09:53:57,274 - INFO - Epoch 1 Step 3510 (Global: 9010): loss=1.7479, ppl=5.74, grad_norm=0.80, lr=5.45e-06, throughput=5684 tok/s +2025-11-28 09:55:21,851 - INFO - Epoch 1 Step 3520 (Global: 9020): loss=1.4919, ppl=4.45, grad_norm=0.73, lr=5.38e-06, throughput=5675 tok/s +2025-11-28 09:56:46,062 - INFO - Epoch 1 Step 3530 (Global: 9030): loss=1.4703, ppl=4.35, grad_norm=0.79, lr=5.30e-06, throughput=5700 tok/s +2025-11-28 09:58:10,467 - INFO - Epoch 1 Step 3540 (Global: 9040): loss=1.6436, ppl=5.17, grad_norm=0.73, lr=5.23e-06, throughput=5687 tok/s +2025-11-28 09:59:35,180 - INFO - Epoch 1 Step 3550 (Global: 9050): loss=1.4273, ppl=4.17, grad_norm=0.72, lr=5.15e-06, throughput=5666 tok/s +2025-11-28 10:00:59,542 - INFO - Epoch 1 Step 3560 (Global: 9060): loss=1.5043, ppl=4.50, grad_norm=0.74, lr=5.08e-06, throughput=5690 tok/s +2025-11-28 10:02:24,256 - INFO - Epoch 1 Step 3570 (Global: 9070): loss=1.4290, ppl=4.17, grad_norm=0.74, lr=5.01e-06, throughput=5666 tok/s +2025-11-28 10:03:48,772 - INFO - Epoch 1 Step 3580 (Global: 9080): loss=1.6169, ppl=5.04, grad_norm=0.78, lr=4.93e-06, throughput=5679 tok/s +2025-11-28 10:05:13,312 - INFO - Epoch 1 Step 3590 (Global: 9090): loss=1.8172, ppl=6.15, grad_norm=0.82, lr=4.86e-06, throughput=5678 tok/s +2025-11-28 10:06:38,019 - INFO - Epoch 1 Step 3600 (Global: 9100): loss=1.7927, ppl=6.01, grad_norm=0.79, lr=4.79e-06, throughput=5667 tok/s +2025-11-28 10:08:02,711 - INFO - Epoch 1 Step 3610 (Global: 9110): loss=1.5276, ppl=4.61, grad_norm=0.75, lr=4.72e-06, throughput=5668 tok/s +2025-11-28 10:09:27,681 - INFO - Epoch 1 Step 3620 (Global: 9120): loss=1.5525, ppl=4.72, grad_norm=0.73, lr=4.65e-06, throughput=5649 tok/s +2025-11-28 10:10:52,907 - INFO - Epoch 1 Step 3630 (Global: 9130): loss=1.5363, ppl=4.65, grad_norm=0.76, lr=4.58e-06, throughput=5632 tok/s +2025-11-28 10:12:17,812 - INFO - Epoch 1 Step 3640 (Global: 9140): loss=1.6751, ppl=5.34, grad_norm=0.79, lr=4.51e-06, throughput=5653 tok/s +2025-11-28 10:13:42,865 - INFO - Epoch 1 Step 3650 (Global: 9150): loss=1.6145, ppl=5.03, grad_norm=0.76, lr=4.44e-06, throughput=5644 tok/s +2025-11-28 10:15:07,655 - INFO - Epoch 1 Step 3660 (Global: 9160): loss=1.9156, ppl=6.79, grad_norm=0.83, lr=4.37e-06, throughput=5661 tok/s +2025-11-28 10:16:32,639 - INFO - Epoch 1 Step 3670 (Global: 9170): loss=1.5918, ppl=4.91, grad_norm=0.78, lr=4.30e-06, throughput=5648 tok/s +2025-11-28 10:17:57,312 - INFO - Epoch 1 Step 3680 (Global: 9180): loss=1.5087, ppl=4.52, grad_norm=0.73, lr=4.23e-06, throughput=5669 tok/s +2025-11-28 10:19:22,371 - INFO - Epoch 1 Step 3690 (Global: 9190): loss=1.5716, ppl=4.81, grad_norm=0.74, lr=4.17e-06, throughput=5643 tok/s +2025-11-28 10:20:46,836 - INFO - Epoch 1 Step 3700 (Global: 9200): loss=1.6374, ppl=5.14, grad_norm=0.75, lr=4.10e-06, throughput=5683 tok/s +2025-11-28 10:22:12,096 - INFO - Epoch 1 Step 3710 (Global: 9210): loss=1.6932, ppl=5.44, grad_norm=0.76, lr=4.03e-06, throughput=5630 tok/s +2025-11-28 10:23:36,841 - 
INFO - Epoch 1 Step 3720 (Global: 9220): loss=1.6426, ppl=5.17, grad_norm=0.76, lr=3.97e-06, throughput=5664 tok/s +2025-11-28 10:25:01,936 - INFO - Epoch 1 Step 3730 (Global: 9230): loss=1.6891, ppl=5.41, grad_norm=0.79, lr=3.90e-06, throughput=5641 tok/s +2025-11-28 10:26:26,587 - INFO - Epoch 1 Step 3740 (Global: 9240): loss=1.5904, ppl=4.91, grad_norm=0.76, lr=3.84e-06, throughput=5670 tok/s +2025-11-28 10:27:51,076 - INFO - Epoch 1 Step 3750 (Global: 9250): loss=1.5901, ppl=4.90, grad_norm=0.79, lr=3.77e-06, throughput=5681 tok/s +2025-11-28 10:29:15,853 - INFO - Epoch 1 Step 3760 (Global: 9260): loss=1.4075, ppl=4.09, grad_norm=0.73, lr=3.71e-06, throughput=5662 tok/s +2025-11-28 10:30:40,305 - INFO - Epoch 1 Step 3770 (Global: 9270): loss=1.8265, ppl=6.21, grad_norm=0.81, lr=3.65e-06, throughput=5684 tok/s +2025-11-28 10:32:04,740 - INFO - Epoch 1 Step 3780 (Global: 9280): loss=1.4950, ppl=4.46, grad_norm=0.74, lr=3.58e-06, throughput=5685 tok/s +2025-11-28 10:33:29,295 - INFO - Epoch 1 Step 3790 (Global: 9290): loss=1.5474, ppl=4.70, grad_norm=0.74, lr=3.52e-06, throughput=5677 tok/s +2025-11-28 10:34:53,727 - INFO - Epoch 1 Step 3800 (Global: 9300): loss=1.6580, ppl=5.25, grad_norm=0.77, lr=3.46e-06, throughput=5685 tok/s +2025-11-28 10:36:18,012 - INFO - Epoch 1 Step 3810 (Global: 9310): loss=1.6311, ppl=5.11, grad_norm=0.79, lr=3.40e-06, throughput=5695 tok/s +2025-11-28 10:37:42,467 - INFO - Epoch 1 Step 3820 (Global: 9320): loss=1.4414, ppl=4.23, grad_norm=0.73, lr=3.34e-06, throughput=5684 tok/s +2025-11-28 10:39:06,783 - INFO - Epoch 1 Step 3830 (Global: 9330): loss=1.5719, ppl=4.82, grad_norm=0.75, lr=3.28e-06, throughput=5693 tok/s +2025-11-28 10:40:31,423 - INFO - Epoch 1 Step 3840 (Global: 9340): loss=1.6969, ppl=5.46, grad_norm=0.79, lr=3.22e-06, throughput=5671 tok/s +2025-11-28 10:41:55,826 - INFO - Epoch 1 Step 3850 (Global: 9350): loss=1.7189, ppl=5.58, grad_norm=0.79, lr=3.16e-06, throughput=5687 tok/s +2025-11-28 10:43:20,215 - INFO - Epoch 1 Step 3860 (Global: 9360): loss=1.6230, ppl=5.07, grad_norm=0.75, lr=3.10e-06, throughput=5688 tok/s +2025-11-28 10:44:44,551 - INFO - Epoch 1 Step 3870 (Global: 9370): loss=1.7922, ppl=6.00, grad_norm=0.84, lr=3.05e-06, throughput=5692 tok/s +2025-11-28 10:46:08,992 - INFO - Epoch 1 Step 3880 (Global: 9380): loss=1.2449, ppl=3.47, grad_norm=0.71, lr=2.99e-06, throughput=5685 tok/s +2025-11-28 10:47:33,221 - INFO - Epoch 1 Step 3890 (Global: 9390): loss=1.7139, ppl=5.55, grad_norm=0.75, lr=2.93e-06, throughput=5699 tok/s +2025-11-28 10:48:57,814 - INFO - Epoch 1 Step 3900 (Global: 9400): loss=1.3910, ppl=4.02, grad_norm=0.70, lr=2.88e-06, throughput=5674 tok/s +2025-11-28 10:50:22,480 - INFO - Epoch 1 Step 3910 (Global: 9410): loss=1.4412, ppl=4.23, grad_norm=0.72, lr=2.82e-06, throughput=5669 tok/s +2025-11-28 10:51:46,726 - INFO - Epoch 1 Step 3920 (Global: 9420): loss=1.8589, ppl=6.42, grad_norm=0.80, lr=2.76e-06, throughput=5698 tok/s +2025-11-28 10:53:11,110 - INFO - Epoch 1 Step 3930 (Global: 9430): loss=1.4295, ppl=4.18, grad_norm=0.74, lr=2.71e-06, throughput=5688 tok/s +2025-11-28 10:54:35,693 - INFO - Epoch 1 Step 3940 (Global: 9440): loss=1.6040, ppl=4.97, grad_norm=0.76, lr=2.66e-06, throughput=5675 tok/s +2025-11-28 10:56:00,085 - INFO - Epoch 1 Step 3950 (Global: 9450): loss=1.6071, ppl=4.99, grad_norm=0.77, lr=2.60e-06, throughput=5688 tok/s +2025-11-28 10:57:24,539 - INFO - Epoch 1 Step 3960 (Global: 9460): loss=1.6609, ppl=5.26, grad_norm=0.77, lr=2.55e-06, throughput=5684 tok/s +2025-11-28 10:58:49,001 - INFO 
- Epoch 1 Step 3970 (Global: 9470): loss=1.6574, ppl=5.25, grad_norm=0.79, lr=2.50e-06, throughput=5683 tok/s +2025-11-28 11:00:13,878 - INFO - Epoch 1 Step 3980 (Global: 9480): loss=1.3884, ppl=4.01, grad_norm=0.73, lr=2.44e-06, throughput=5655 tok/s +2025-11-28 11:01:38,663 - INFO - Epoch 1 Step 3990 (Global: 9490): loss=1.4893, ppl=4.43, grad_norm=0.75, lr=2.39e-06, throughput=5661 tok/s +2025-11-28 11:03:03,164 - INFO - Epoch 1 Step 4000 (Global: 9500): loss=1.3582, ppl=3.89, grad_norm=0.71, lr=2.34e-06, throughput=5680 tok/s +2025-11-28 11:03:03,165 - INFO - +Running validation at step 9500... +2025-11-28 11:07:35,899 - INFO - Validation loss: 1.6014, perplexity: 4.96 +2025-11-28 11:07:35,899 - INFO - Qualitative metrics (n=5): +2025-11-28 11:07:35,899 - INFO - BLEU: 0.1770 +2025-11-28 11:07:35,900 - INFO - METEOR: 0.2487 +2025-11-28 11:07:35,900 - INFO - Edit Distance: 0.6014 +2025-11-28 11:07:35,900 - INFO - F-measure: 0.2794 +2025-11-28 11:07:35,900 - INFO - +====================================================================== +2025-11-28 11:07:35,900 - INFO - Qualitative Evaluation Samples: +2025-11-28 11:07:35,900 - INFO - ====================================================================== +2025-11-28 11:07:35,900 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-28 11:07:35,900 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 11:07:35,900 - INFO - Generated: ' to the band\'s previous work, saying that "the album is a little more subdued, but it\'s still a lot of fun." In a review for The A.V. Club, Fitzmaurice gave the album a B and said that "the band\'s bes...' +2025-11-28 11:07:35,900 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-28 11:07:35,900 - INFO - ---------------------------------------------------------------------- +2025-11-28 11:07:35,900 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-28 11:07:35,900 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 11:07:35,901 - INFO - Generated: 'aternal organizations in the United States. The Order of Angell was the first fraternal organization in the United States to be founded by a Native American. The Order of Angell was founded in 1921 by...' +2025-11-28 11:07:35,901 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-28 11:07:35,901 - INFO - ---------------------------------------------------------------------- +2025-11-28 11:07:35,901 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-28 11:07:35,901 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 11:07:35,901 - INFO - Generated: ' be killed by Oga. They are later defeated by the Red Tails and the Six Knights, and are later killed by the Red Tails.\nKiriya\nVoiced by: Yūki Kaji\nKiriya is the second leader of the Red Tails. He is ...' +2025-11-28 11:07:35,901 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." 
+2025-11-28 11:07:35,901 - INFO - ---------------------------------------------------------------------- +2025-11-28 11:07:35,901 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-28 11:07:35,901 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 11:07:35,901 - INFO - Generated: '-31-1991 | L2/91-202 | ISO/IEC 10646-1:1991 ...' +2025-11-28 11:07:35,902 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-28 11:07:35,902 - INFO - ---------------------------------------------------------------------- +2025-11-28 11:07:35,902 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-28 11:07:35,902 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 11:07:35,902 - INFO - Generated: '1 | Windows | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 11:07:35,902 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 11:07:35,902 - INFO - ---------------------------------------------------------------------- +2025-11-28 11:07:35,903 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_9500.jsonl +2025-11-28 11:08:03,885 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 11:08:03,891 - INFO - New best validation loss: 1.6014, perplexity: 4.96 +2025-11-28 11:09:28,869 - INFO - Epoch 1 Step 4010 (Global: 9510): loss=1.5133, ppl=4.54, grad_norm=0.73, lr=2.29e-06, throughput=5649 tok/s +2025-11-28 11:10:53,630 - INFO - Epoch 1 Step 4020 (Global: 9520): loss=1.4090, ppl=4.09, grad_norm=0.70, lr=2.24e-06, throughput=5663 tok/s +2025-11-28 11:12:18,564 - INFO - Epoch 1 Step 4030 (Global: 9530): loss=1.6628, ppl=5.27, grad_norm=0.78, lr=2.19e-06, throughput=5652 tok/s +2025-11-28 11:13:43,182 - INFO - Epoch 1 Step 4040 (Global: 9540): loss=1.6354, ppl=5.13, grad_norm=0.77, lr=2.14e-06, throughput=5673 tok/s +2025-11-28 11:15:08,195 - INFO - Epoch 1 Step 4050 (Global: 9550): loss=1.6688, ppl=5.31, grad_norm=0.78, lr=2.10e-06, throughput=5646 tok/s +2025-11-28 11:16:33,233 - INFO - Epoch 1 Step 4060 (Global: 9560): loss=1.7151, ppl=5.56, grad_norm=0.80, lr=2.05e-06, throughput=5645 tok/s +2025-11-28 11:17:58,276 - INFO - Epoch 1 Step 4070 (Global: 9570): loss=1.4685, ppl=4.34, grad_norm=0.72, lr=2.00e-06, throughput=5644 tok/s +2025-11-28 11:19:23,232 - INFO - Epoch 1 Step 4080 (Global: 9580): loss=1.6856, ppl=5.40, grad_norm=0.74, lr=1.95e-06, throughput=5650 tok/s +2025-11-28 11:20:48,481 - INFO - Epoch 1 Step 4090 (Global: 9590): loss=1.7986, ppl=6.04, grad_norm=0.80, lr=1.91e-06, throughput=5631 tok/s +2025-11-28 11:22:13,882 - INFO - Epoch 1 Step 4100 (Global: 9600): loss=1.3607, ppl=3.90, grad_norm=0.72, lr=1.86e-06, throughput=5621 tok/s +2025-11-28 11:23:39,090 - INFO - Epoch 1 Step 4110 (Global: 9610): loss=1.5714, ppl=4.81, grad_norm=0.75, lr=1.82e-06, throughput=5633 tok/s +2025-11-28 11:25:04,228 - INFO - Epoch 1 Step 4120 (Global: 9620): loss=1.3840, ppl=3.99, grad_norm=0.80, lr=1.77e-06, throughput=5638 tok/s +2025-11-28 11:26:29,629 - INFO - Epoch 1 Step 4130 (Global: 9630): loss=1.4596, ppl=4.30, grad_norm=0.73, lr=1.73e-06, throughput=5621 tok/s +2025-11-28 11:27:54,410 - INFO - Epoch 1 Step 4140 (Global: 9640): loss=1.6271, ppl=5.09, grad_norm=0.79, lr=1.68e-06, throughput=5662 tok/s 
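The per-step metrics above pair each loss with a ppl value, and the pairs are consistent with perplexity simply being the exponential of the mean token-level cross-entropy loss (e.g. exp(1.6014) ≈ 4.96). A minimal sketch under that assumption, checked against values that appear in this log:

import math

# Perplexity as exp(mean cross-entropy loss); matches the logged pairs,
# e.g. loss=1.6014 -> ppl=4.96, loss=1.6271 -> ppl=5.09, loss=1.2449 -> ppl=3.47.
def perplexity(mean_ce_loss: float) -> float:
    return math.exp(mean_ce_loss)

for loss in (1.6014, 1.6271, 1.2449):
    print(f"loss={loss:.4f} -> ppl={perplexity(loss):.2f}")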
+2025-11-28 11:29:19,365 - INFO - Epoch 1 Step 4150 (Global: 9650): loss=1.6344, ppl=5.13, grad_norm=0.77, lr=1.64e-06, throughput=5650 tok/s +2025-11-28 11:30:44,103 - INFO - Epoch 1 Step 4160 (Global: 9660): loss=1.4925, ppl=4.45, grad_norm=0.74, lr=1.60e-06, throughput=5665 tok/s +2025-11-28 11:32:08,642 - INFO - Epoch 1 Step 4170 (Global: 9670): loss=1.4936, ppl=4.45, grad_norm=0.74, lr=1.56e-06, throughput=5678 tok/s +2025-11-28 11:33:33,206 - INFO - Epoch 1 Step 4180 (Global: 9680): loss=1.5287, ppl=4.61, grad_norm=0.80, lr=1.52e-06, throughput=5676 tok/s +2025-11-28 11:34:59,858 - INFO - Epoch 1 Step 4190 (Global: 9690): loss=1.7590, ppl=5.81, grad_norm=0.78, lr=1.48e-06, throughput=5539 tok/s +2025-11-28 11:36:26,726 - INFO - Epoch 1 Step 4200 (Global: 9700): loss=1.5380, ppl=4.66, grad_norm=0.76, lr=1.44e-06, throughput=5526 tok/s +2025-11-28 11:37:54,809 - INFO - Epoch 1 Step 4210 (Global: 9710): loss=1.4023, ppl=4.06, grad_norm=0.71, lr=1.40e-06, throughput=5449 tok/s +2025-11-28 11:39:20,190 - INFO - Epoch 1 Step 4220 (Global: 9720): loss=1.5357, ppl=4.64, grad_norm=0.72, lr=1.36e-06, throughput=5622 tok/s +2025-11-28 11:40:44,666 - INFO - Epoch 1 Step 4230 (Global: 9730): loss=1.4111, ppl=4.10, grad_norm=0.73, lr=1.32e-06, throughput=5682 tok/s +2025-11-28 11:42:09,017 - INFO - Epoch 1 Step 4240 (Global: 9740): loss=1.6822, ppl=5.38, grad_norm=0.74, lr=1.28e-06, throughput=5691 tok/s +2025-11-28 11:43:33,676 - INFO - Epoch 1 Step 4250 (Global: 9750): loss=1.8612, ppl=6.43, grad_norm=0.80, lr=1.24e-06, throughput=5670 tok/s +2025-11-28 11:44:59,386 - INFO - Epoch 1 Step 4260 (Global: 9760): loss=1.5981, ppl=4.94, grad_norm=0.76, lr=1.21e-06, throughput=5600 tok/s +2025-11-28 11:46:25,559 - INFO - Epoch 1 Step 4270 (Global: 9770): loss=1.7516, ppl=5.76, grad_norm=0.82, lr=1.17e-06, throughput=5570 tok/s +2025-11-28 11:47:50,748 - INFO - Epoch 1 Step 4280 (Global: 9780): loss=1.5829, ppl=4.87, grad_norm=0.79, lr=1.13e-06, throughput=5635 tok/s +2025-11-28 11:49:15,326 - INFO - Epoch 1 Step 4290 (Global: 9790): loss=1.4124, ppl=4.11, grad_norm=0.73, lr=1.10e-06, throughput=5675 tok/s +2025-11-28 11:50:40,050 - INFO - Epoch 1 Step 4300 (Global: 9800): loss=1.4743, ppl=4.37, grad_norm=0.71, lr=1.06e-06, throughput=5666 tok/s +2025-11-28 11:52:04,760 - INFO - Epoch 1 Step 4310 (Global: 9810): loss=1.6973, ppl=5.46, grad_norm=0.84, lr=1.03e-06, throughput=5666 tok/s +2025-11-28 11:53:29,475 - INFO - Epoch 1 Step 4320 (Global: 9820): loss=1.5389, ppl=4.66, grad_norm=0.75, lr=9.97e-07, throughput=5666 tok/s +2025-11-28 11:54:54,418 - INFO - Epoch 1 Step 4330 (Global: 9830): loss=1.3710, ppl=3.94, grad_norm=0.72, lr=9.64e-07, throughput=5651 tok/s +2025-11-28 11:56:19,266 - INFO - Epoch 1 Step 4340 (Global: 9840): loss=1.6642, ppl=5.28, grad_norm=0.75, lr=9.32e-07, throughput=5657 tok/s +2025-11-28 11:57:43,871 - INFO - Epoch 1 Step 4350 (Global: 9850): loss=1.5470, ppl=4.70, grad_norm=0.74, lr=9.00e-07, throughput=5674 tok/s +2025-11-28 11:59:08,601 - INFO - Epoch 1 Step 4360 (Global: 9860): loss=1.6789, ppl=5.36, grad_norm=0.78, lr=8.68e-07, throughput=5665 tok/s +2025-11-28 12:00:33,877 - INFO - Epoch 1 Step 4370 (Global: 9870): loss=1.3344, ppl=3.80, grad_norm=0.70, lr=8.37e-07, throughput=5629 tok/s +2025-11-28 12:01:58,512 - INFO - Epoch 1 Step 4380 (Global: 9880): loss=1.6765, ppl=5.35, grad_norm=0.79, lr=8.07e-07, throughput=5671 tok/s +2025-11-28 12:03:23,161 - INFO - Epoch 1 Step 4390 (Global: 9890): loss=1.5256, ppl=4.60, grad_norm=0.78, lr=7.77e-07, throughput=5671 tok/s 
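The lr column above decays smoothly from a few 1e-6 toward zero as the epoch closes, which is consistent with a cosine decay after an initial warmup. The sketch below reproduces that tail shape; peak_lr, warmup_steps and total_steps are illustrative assumptions rather than values read from this log.

import math

# Hypothetical linear-warmup + cosine-decay schedule; constants are assumptions
# chosen to illustrate the tail behaviour seen above (lr near 2.3e-06 around
# global step 9500, approaching ~0 by the last optimizer step).
def lr_at(step: int, peak_lr: float = 1e-4, warmup_steps: int = 1000,
          total_steps: int = 10417) -> float:
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

print(f"{lr_at(9500):.2e}")   # ~2.3e-06
print(f"{lr_at(10400):.2e}")  # ~8e-10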
+2025-11-28 12:04:47,792 - INFO - Epoch 1 Step 4400 (Global: 9900): loss=1.4876, ppl=4.43, grad_norm=0.72, lr=7.48e-07, throughput=5672 tok/s +2025-11-28 12:06:12,703 - INFO - Epoch 1 Step 4410 (Global: 9910): loss=1.4796, ppl=4.39, grad_norm=0.76, lr=7.20e-07, throughput=5653 tok/s +2025-11-28 12:07:37,362 - INFO - Epoch 1 Step 4420 (Global: 9920): loss=1.7554, ppl=5.79, grad_norm=0.77, lr=6.92e-07, throughput=5670 tok/s +2025-11-28 12:09:01,964 - INFO - Epoch 1 Step 4430 (Global: 9930): loss=1.6160, ppl=5.03, grad_norm=0.73, lr=6.64e-07, throughput=5674 tok/s +2025-11-28 12:10:26,735 - INFO - Epoch 1 Step 4440 (Global: 9940): loss=1.5442, ppl=4.68, grad_norm=0.77, lr=6.37e-07, throughput=5662 tok/s +2025-11-28 12:11:51,662 - INFO - Epoch 1 Step 4450 (Global: 9950): loss=1.5761, ppl=4.84, grad_norm=0.77, lr=6.11e-07, throughput=5652 tok/s +2025-11-28 12:13:16,908 - INFO - Epoch 1 Step 4460 (Global: 9960): loss=1.4371, ppl=4.21, grad_norm=0.75, lr=5.85e-07, throughput=5631 tok/s +2025-11-28 12:14:41,729 - INFO - Epoch 1 Step 4470 (Global: 9970): loss=1.5296, ppl=4.62, grad_norm=0.74, lr=5.60e-07, throughput=5659 tok/s +2025-11-28 12:16:06,524 - INFO - Epoch 1 Step 4480 (Global: 9980): loss=1.6002, ppl=4.95, grad_norm=0.77, lr=5.35e-07, throughput=5661 tok/s +2025-11-28 12:17:31,457 - INFO - Epoch 1 Step 4490 (Global: 9990): loss=1.5823, ppl=4.87, grad_norm=0.80, lr=5.11e-07, throughput=5652 tok/s +2025-11-28 12:18:56,396 - INFO - Epoch 1 Step 4500 (Global: 10000): loss=1.5347, ppl=4.64, grad_norm=0.75, lr=4.87e-07, throughput=5651 tok/s +2025-11-28 12:18:56,397 - INFO - +Running validation at step 10000... +2025-11-28 12:23:33,088 - INFO - Validation loss: 1.6014, perplexity: 4.96 +2025-11-28 12:23:33,088 - INFO - Qualitative metrics (n=5): +2025-11-28 12:23:33,089 - INFO - BLEU: 0.1702 +2025-11-28 12:23:33,089 - INFO - METEOR: 0.2380 +2025-11-28 12:23:33,089 - INFO - Edit Distance: 0.5967 +2025-11-28 12:23:33,089 - INFO - F-measure: 0.2747 +2025-11-28 12:23:33,089 - INFO - +====================================================================== +2025-11-28 12:23:33,089 - INFO - Qualitative Evaluation Samples: +2025-11-28 12:23:33,089 - INFO - ====================================================================== +2025-11-28 12:23:33,089 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-28 12:23:33,089 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 12:23:33,089 - INFO - Generated: ' to the band\'s previous work, saying that "the album is a little more subdued, but it\'s still a lot of fun." In a review for The A.V. Club, Fitzmaurice said that "the album is a little more subdued, b...' +2025-11-28 12:23:33,090 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-28 12:23:33,090 - INFO - ---------------------------------------------------------------------- +2025-11-28 12:23:33,090 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-28 12:23:33,090 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 12:23:33,090 - INFO - Generated: 'aternal organizations in the United States. The Order of Angell was the first fraternal organization in the United States to be founded by a Native American. The Order of Angell was founded in 1921 by...' 
+2025-11-28 12:23:33,090 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-28 12:23:33,090 - INFO - ---------------------------------------------------------------------- +2025-11-28 12:23:33,090 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-28 12:23:33,090 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 12:23:33,090 - INFO - Generated: ' be killed by Oga. They are later defeated by the Red Tails and the Six Knights, and are later killed by the Red Tails.\nKiriya\nVoiced by: Yūki Kaji\nKiriya is the second leader of the Red Tails. He is ...' +2025-11-28 12:23:33,091 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-28 12:23:33,091 - INFO - ---------------------------------------------------------------------- +2025-11-28 12:23:33,091 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-28 12:23:33,091 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 12:23:33,091 - INFO - Generated: '-31-1991 | L2/91-202 | ISO/IEC 10646-1:1991 ...' +2025-11-28 12:23:33,091 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' +2025-11-28 12:23:33,092 - INFO - ---------------------------------------------------------------------- +2025-11-28 12:23:33,092 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-28 12:23:33,092 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 12:23:33,092 - INFO - Generated: '1 | Windows | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 12:23:33,092 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' 
+2025-11-28 12:23:33,092 - INFO - ---------------------------------------------------------------------- +2025-11-28 12:23:33,093 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_10000.jsonl +2025-11-28 12:24:05,787 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/best_checkpoint.pt +2025-11-28 12:24:05,801 - INFO - New best validation loss: 1.6014, perplexity: 4.96 +2025-11-28 12:25:33,930 - INFO - Epoch 1 Step 4510 (Global: 10010): loss=1.4145, ppl=4.11, grad_norm=0.73, lr=4.64e-07, throughput=5448 tok/s +2025-11-28 12:27:01,727 - INFO - Epoch 1 Step 4520 (Global: 10020): loss=1.5182, ppl=4.56, grad_norm=0.76, lr=4.42e-07, throughput=5467 tok/s +2025-11-28 12:28:26,769 - INFO - Epoch 1 Step 4530 (Global: 10030): loss=1.6123, ppl=5.01, grad_norm=0.77, lr=4.20e-07, throughput=5644 tok/s +2025-11-28 12:29:51,452 - INFO - Epoch 1 Step 4540 (Global: 10040): loss=1.7191, ppl=5.58, grad_norm=0.76, lr=3.98e-07, throughput=5668 tok/s +2025-11-28 12:31:16,234 - INFO - Epoch 1 Step 4550 (Global: 10050): loss=1.7749, ppl=5.90, grad_norm=0.78, lr=3.78e-07, throughput=5662 tok/s +2025-11-28 12:32:41,050 - INFO - Epoch 1 Step 4560 (Global: 10060): loss=1.4756, ppl=4.37, grad_norm=0.76, lr=3.57e-07, throughput=5659 tok/s +2025-11-28 12:34:06,283 - INFO - Epoch 1 Step 4570 (Global: 10070): loss=1.8277, ppl=6.22, grad_norm=0.78, lr=3.38e-07, throughput=5632 tok/s +2025-11-28 12:35:31,034 - INFO - Epoch 1 Step 4580 (Global: 10080): loss=1.6386, ppl=5.15, grad_norm=0.79, lr=3.18e-07, throughput=5664 tok/s +2025-11-28 12:36:56,525 - INFO - Epoch 1 Step 4590 (Global: 10090): loss=1.6154, ppl=5.03, grad_norm=0.76, lr=3.00e-07, throughput=5615 tok/s +2025-11-28 12:38:21,776 - INFO - Epoch 1 Step 4600 (Global: 10100): loss=1.5086, ppl=4.52, grad_norm=0.75, lr=2.82e-07, throughput=5631 tok/s +2025-11-28 12:39:46,991 - INFO - Epoch 1 Step 4610 (Global: 10110): loss=1.7789, ppl=5.92, grad_norm=0.77, lr=2.64e-07, throughput=5633 tok/s +2025-11-28 12:41:12,044 - INFO - Epoch 1 Step 4620 (Global: 10120): loss=1.4558, ppl=4.29, grad_norm=0.73, lr=2.47e-07, throughput=5644 tok/s +2025-11-28 12:42:36,795 - INFO - Epoch 1 Step 4630 (Global: 10130): loss=1.7452, ppl=5.73, grad_norm=0.78, lr=2.31e-07, throughput=5664 tok/s +2025-11-28 12:44:01,388 - INFO - Epoch 1 Step 4640 (Global: 10140): loss=1.6046, ppl=4.98, grad_norm=0.75, lr=2.15e-07, throughput=5674 tok/s +2025-11-28 12:45:26,452 - INFO - Epoch 1 Step 4650 (Global: 10150): loss=1.6884, ppl=5.41, grad_norm=0.84, lr=2.00e-07, throughput=5643 tok/s +2025-11-28 12:46:52,079 - INFO - Epoch 1 Step 4660 (Global: 10160): loss=1.7378, ppl=5.68, grad_norm=0.77, lr=1.85e-07, throughput=5606 tok/s +2025-11-28 12:48:17,023 - INFO - Epoch 1 Step 4670 (Global: 10170): loss=1.5648, ppl=4.78, grad_norm=0.77, lr=1.71e-07, throughput=5651 tok/s +2025-11-28 12:49:41,955 - INFO - Epoch 1 Step 4680 (Global: 10180): loss=1.5089, ppl=4.52, grad_norm=0.76, lr=1.58e-07, throughput=5652 tok/s +2025-11-28 12:51:06,575 - INFO - Epoch 1 Step 4690 (Global: 10190): loss=1.3930, ppl=4.03, grad_norm=0.70, lr=1.45e-07, throughput=5672 tok/s +2025-11-28 12:52:31,503 - INFO - Epoch 1 Step 4700 (Global: 10200): loss=1.6868, ppl=5.40, grad_norm=0.75, lr=1.32e-07, throughput=5652 tok/s +2025-11-28 12:53:56,368 - INFO - Epoch 1 Step 4710 (Global: 10210): loss=1.6864, ppl=5.40, grad_norm=0.85, lr=1.20e-07, throughput=5656 tok/s 
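Each validation pass above is followed by "Saved checkpoint ... best_checkpoint.pt" and a "New best validation loss" line, i.e. the checkpoint file is rewritten whenever validation loss improves. A minimal sketch of that save-on-improvement pattern, assuming a PyTorch-style state dict (the actual checkpoint contents are not shown in this log):

import torch

# Save-on-improvement pattern suggested by the "New best validation loss" lines;
# the state-dict layout below is an illustrative assumption.
def maybe_save_best(model, optimizer, global_step, val_loss, best_val_loss,
                    path="best_checkpoint.pt"):
    if val_loss < best_val_loss:
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "global_step": global_step,
                "best_val_loss": val_loss,
            },
            path,
        )
        return val_loss  # new best
    return best_val_loss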
+2025-11-28 12:55:21,374 - INFO - Epoch 1 Step 4720 (Global: 10220): loss=1.6572, ppl=5.24, grad_norm=0.75, lr=1.09e-07, throughput=5647 tok/s +2025-11-28 12:56:46,826 - INFO - Epoch 1 Step 4730 (Global: 10230): loss=1.5832, ppl=4.87, grad_norm=0.77, lr=9.81e-08, throughput=5617 tok/s +2025-11-28 12:58:12,268 - INFO - Epoch 1 Step 4740 (Global: 10240): loss=1.5179, ppl=4.56, grad_norm=0.73, lr=8.79e-08, throughput=5618 tok/s +2025-11-28 12:59:37,177 - INFO - Epoch 1 Step 4750 (Global: 10250): loss=1.5245, ppl=4.59, grad_norm=0.77, lr=7.83e-08, throughput=5653 tok/s +2025-11-28 13:01:01,939 - INFO - Epoch 1 Step 4760 (Global: 10260): loss=1.6453, ppl=5.18, grad_norm=0.77, lr=6.92e-08, throughput=5663 tok/s +2025-11-28 13:02:26,866 - INFO - Epoch 1 Step 4770 (Global: 10270): loss=1.4429, ppl=4.23, grad_norm=0.73, lr=6.06e-08, throughput=5652 tok/s +2025-11-28 13:03:51,658 - INFO - Epoch 1 Step 4780 (Global: 10280): loss=1.5380, ppl=4.66, grad_norm=0.75, lr=5.27e-08, throughput=5661 tok/s +2025-11-28 13:05:16,308 - INFO - Epoch 1 Step 4790 (Global: 10290): loss=1.4771, ppl=4.38, grad_norm=0.78, lr=4.53e-08, throughput=5670 tok/s +2025-11-28 13:06:40,982 - INFO - Epoch 1 Step 4800 (Global: 10300): loss=1.5862, ppl=4.88, grad_norm=0.78, lr=3.84e-08, throughput=5669 tok/s +2025-11-28 13:08:05,603 - INFO - Epoch 1 Step 4810 (Global: 10310): loss=1.4942, ppl=4.46, grad_norm=0.76, lr=3.21e-08, throughput=5672 tok/s +2025-11-28 13:09:30,432 - INFO - Epoch 1 Step 4820 (Global: 10320): loss=1.6741, ppl=5.33, grad_norm=0.80, lr=2.64e-08, throughput=5658 tok/s +2025-11-28 13:10:55,359 - INFO - Epoch 1 Step 4830 (Global: 10330): loss=1.5357, ppl=4.64, grad_norm=0.75, lr=2.12e-08, throughput=5652 tok/s +2025-11-28 13:12:19,976 - INFO - Epoch 1 Step 4840 (Global: 10340): loss=1.6457, ppl=5.18, grad_norm=0.76, lr=1.66e-08, throughput=5673 tok/s +2025-11-28 13:13:45,223 - INFO - Epoch 1 Step 4850 (Global: 10350): loss=1.7993, ppl=6.05, grad_norm=0.79, lr=1.26e-08, throughput=5631 tok/s +2025-11-28 13:15:10,285 - INFO - Epoch 1 Step 4860 (Global: 10360): loss=1.5727, ppl=4.82, grad_norm=0.73, lr=9.12e-09, throughput=5643 tok/s +2025-11-28 13:16:35,260 - INFO - Epoch 1 Step 4870 (Global: 10370): loss=1.4821, ppl=4.40, grad_norm=0.78, lr=6.20e-09, throughput=5649 tok/s +2025-11-28 13:18:00,115 - INFO - Epoch 1 Step 4880 (Global: 10380): loss=1.5294, ppl=4.62, grad_norm=0.74, lr=3.84e-09, throughput=5657 tok/s +2025-11-28 13:19:24,691 - INFO - Epoch 1 Step 4890 (Global: 10390): loss=1.6053, ppl=4.98, grad_norm=0.73, lr=2.05e-09, throughput=5675 tok/s +2025-11-28 13:20:49,860 - INFO - Epoch 1 Step 4900 (Global: 10400): loss=1.5075, ppl=4.52, grad_norm=0.73, lr=8.11e-10, throughput=5636 tok/s +2025-11-28 13:22:14,705 - INFO - Epoch 1 Step 4910 (Global: 10410): loss=1.7548, ppl=5.78, grad_norm=0.80, lr=1.38e-10, throughput=5657 tok/s +2025-11-28 13:23:11,460 - INFO - Flushing 3 remainder batches from gradient accumulation +2025-11-28 13:23:11,461 - INFO - Rescaling gradients by 1.33x (compensating for 3/4 batches) +2025-11-28 13:23:11,734 - INFO - Remainder batch: loss=1.8518, ppl=6.37, grad_norm=1.01 +2025-11-28 13:23:11,742 - INFO - Epoch 1 training: loss=1.5991, ppl=4.95, grad_norm=0.77, throughput=5313 tok/s (44416.9s total) +2025-11-28 13:23:11,748 - INFO - +Running final validation... 
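The "Flushing 3 remainder batches" and "Rescaling gradients by 1.33x (compensating for 3/4 batches)" lines above describe how an incomplete gradient-accumulation window is handled at the end of the epoch: each micro-batch contributed loss/4 to the gradient, so with only 3 of 4 micro-batches accumulated the gradient is scaled by 4/3 before the final optimizer step. A minimal sketch of that logic; the model/optimizer/batch interfaces are assumptions (HF-style forward call), not the training script's actual API.

def train_epoch_with_remainder_flush(model, optimizer, batches, accum_steps: int = 4):
    """Gradient accumulation with a remainder flush: if the data runs out
    mid-window, rescale the accumulated gradient by accum_steps / pending
    (4/3 ~ 1.33 for 3 of 4 micro-batches) so the update magnitude matches
    a full window. Interfaces here are illustrative."""
    pending = 0
    for batch in batches:
        loss = model(**batch).loss / accum_steps  # each backward adds 1/accum_steps
        loss.backward()
        pending += 1
        if pending == accum_steps:
            optimizer.step()
            optimizer.zero_grad()
            pending = 0
    if pending:  # remainder flush at epoch end
        scale = accum_steps / pending
        for p in model.parameters():
            if p.grad is not None:
                p.grad.mul_(scale)
        optimizer.step()
        optimizer.zero_grad()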
+2025-11-28 13:27:43,570 - INFO - Validation loss: 1.6014, perplexity: 4.96 +2025-11-28 13:27:43,571 - INFO - Qualitative metrics (n=5): +2025-11-28 13:27:43,571 - INFO - BLEU: 0.1627 +2025-11-28 13:27:43,571 - INFO - METEOR: 0.2303 +2025-11-28 13:27:43,571 - INFO - Edit Distance: 0.5645 +2025-11-28 13:27:43,571 - INFO - F-measure: 0.2493 +2025-11-28 13:27:43,571 - INFO - +====================================================================== +2025-11-28 13:27:43,571 - INFO - Qualitative Evaluation Samples: +2025-11-28 13:27:43,571 - INFO - ====================================================================== +2025-11-28 13:27:43,572 - INFO - +Sample 1 (ID: sample_141920_chunk_1): +2025-11-28 13:27:43,572 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 13:27:43,572 - INFO - Generated: ' to the band\'s previous work, saying that "the album is a little more subdued, but it\'s still a lot of fun." In a review for The A.V. Club, Fitzmaurice gave the album a B and said that "the band\'s bes...' +2025-11-28 13:27:43,572 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...' +2025-11-28 13:27:43,572 - INFO - ---------------------------------------------------------------------- +2025-11-28 13:27:43,572 - INFO - +Sample 2 (ID: sample_170543_chunk_2): +2025-11-28 13:27:43,572 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 13:27:43,572 - INFO - Generated: 'aternal organizations in the United States. The Order of Angell was the first fraternal organization in the United States to be founded by a Native American. The Order of Angell was founded in 1920 by...' +2025-11-28 13:27:43,572 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...' +2025-11-28 13:27:43,572 - INFO - ---------------------------------------------------------------------- +2025-11-28 13:27:43,573 - INFO - +Sample 3 (ID: sample_107152_chunk_9): +2025-11-28 13:27:43,573 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 13:27:43,573 - INFO - Generated: ' be killed by Oga. They are later defeated by the Red Tails and the Six Knights, and are later killed by the Red Tails.\nMiki\nVoiced by: Sayaka Kinoshita\nMiki is the second leader of the Red Tails. She...' +2025-11-28 13:27:43,573 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..." +2025-11-28 13:27:43,573 - INFO - ---------------------------------------------------------------------- +2025-11-28 13:27:43,573 - INFO - +Sample 4 (ID: sample_069148_chunk_0): +2025-11-28 13:27:43,573 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 13:27:43,573 - INFO - Generated: '-31-1991 | L2/91-202 | ISO/IEC 10646-1:1991 ...' +2025-11-28 13:27:43,573 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...' 
+2025-11-28 13:27:43,573 - INFO - ---------------------------------------------------------------------- +2025-11-28 13:27:43,573 - INFO - +Sample 5 (ID: sample_103176_chunk_4): +2025-11-28 13:27:43,573 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] +2025-11-28 13:27:43,574 - INFO - Generated: '1 | Windows | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 13:27:43,574 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...' +2025-11-28 13:27:43,574 - INFO - ---------------------------------------------------------------------- +2025-11-28 13:27:43,574 - INFO - +Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/qualitative_step_10417.jsonl +2025-11-28 13:27:44,144 - INFO - +Training complete! +2025-11-28 13:28:05,512 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/final_checkpoint.pt +2025-11-28 13:28:05,516 - INFO - Final checkpoint saved to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434/final_checkpoint.pt +2025-11-28 13:28:05,516 - INFO - Best validation loss: 1.6014, perplexity: 4.96 +2025-11-28 13:28:05,516 - INFO - Checkpoints saved to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035_lm_20251126_233434 +2025-11-28 13:28:06,139 - INFO - W&B run finished
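The qualitative metrics reported at each validation above (BLEU, METEOR, edit distance, F-measure over n=5 generated/ground-truth pairs) could be computed along the lines below. This is only a sketch using NLTK; the training script's actual implementation is not shown in this log, and the token-overlap F-measure in particular is an assumption.

# Sketch only: NLTK-based versions of the logged metrics.
# Requires nltk.download("wordnet") for METEOR.
import nltk
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from nltk.translate.meteor_score import meteor_score

def qualitative_metrics(generated: str, reference: str) -> dict:
    hyp, ref = generated.split(), reference.split()
    bleu = sentence_bleu([ref], hyp, smoothing_function=SmoothingFunction().method1)
    meteor = meteor_score([ref], hyp)
    edit = nltk.edit_distance(generated, reference) / max(len(generated), len(reference), 1)
    overlap = len(set(hyp) & set(ref))
    precision, recall = overlap / max(len(hyp), 1), overlap / max(len(ref), 1)
    f_measure = 2 * precision * recall / max(precision + recall, 1e-9)
    return {"bleu": bleu, "meteor": meteor, "edit_distance": edit, "f_measure": f_measure}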