WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
local_rank: 2
global rank: 2
local_rank: 4
global rank: 4
local_rank: 3
global rank: 3
local_rank: 5
global rank: 5
local_rank: 1
global rank: 1
local_rank: 7
global rank: 7
local_rank: 6
global rank: 6

    Example runs on 4 GPUs:
    WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1,2,3" torchrun --nproc_per_node=4 finetune.py --base_model='decapoda-research/llama-7b-hf' --data_path=data/config.json --run_id=0 &> 0.log
    WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1,2,3" torchrun --nproc_per_node=4 finetune.py --base_model='decapoda-research/llama-30b-hf' --data_path=data/config.json --batch_size=16 --micro_batch_size=1 --run_id=1 --save_code=True &> 1.log
    WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1,2,3" torchrun --nproc_per_node=4 finetune.py --base_model='EleutherAI/gpt-j-6B' --data_path=data/config.json --run_id=2 &> 2.log
    WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1,2,3" torchrun --nproc_per_node=4 finetune.py --base_model='EleutherAI/gpt-neox-20b' --data_path=data/config.json --run_id=8 --batch_size=16 --micro_batch_size=4 &> 8.log
    WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1,2,3" torchrun --nproc_per_node=4 finetune.py --base_model='togethercomputer/GPT-NeoXT-Chat-Base-20B' --data_path=data/config.json --prompt_type='dai_faq' --run_id=13 --batch_size=16 --micro_batch_size=4 --num_epochs=100 --val_set_size=0 --data_mix_in_path='' &> 13.log
    WORLD_SIZE=4 CUDA_VISIBLE_DEVICES="0,1,2,3" torchrun --nproc_per_node=4 finetune.py --base_model='togethercomputer/GPT-NeoXT-Chat-Base-20B' --data_path=data/config.json --run_id=28 --batch_size=16 --micro_batch_size=4 --num_epochs=8 --val_set_size=0 --data_mix_in_factor=0.1 --data_mix_in_prompt_type='human_bot' --save_code=True --cutoff_len=512  &> 28.log

    All metrics:
    CUDA_VISIBLE_DEVICES= python finetune.py --data_mix_in_factor=0 --eval_steps=100 --warmup_steps=2 --val_set_size=100 --val_metrics="['bleu', 'rouge', 'sacrebleu', 'meteor']"

    # Fine-tune 20B on 24GB GPUs across 3 nodes with 3+2+2 GPUs
    rippa>
NCCL_P2P_LEVEL=LOC WORLD_SIZE=7 CUDA_VISIBLE_DEVICES="0,1,2" torchrun --node_rank 0 --nproc_per_node=3 --master_port=1234 --nnodes=3 --master_addr=10.10.10.2 finetune.py --data_path=merged_shuffled_OIG_87f6a1e788.json --micro_batch_size=1 --batch_size=7 --cutoff_len=512 --run_id=17 &>log.17.rank0
    ova>
NCCL_P2P_LEVEL=LOC WORLD_SIZE=7 CUDA_VISIBLE_DEVICES="0,1" torchrun --node_rank 1 --nproc_per_node=2 --master_port=1234 --nnodes=3 --master_addr=10.10.10.2 finetune.py --data_path=merged_shuffled_OIG_87f6a1e788.json --micro_batch_size=1 --batch_size=7 --cutoff_len=512 --run_id=17 &>log.17.rank1
    timemachine>
NCCL_P2P_LEVEL=LOC WORLD_SIZE=7 CUDA_VISIBLE_DEVICES="0,1" torchrun --node_rank 2 --nproc_per_node=2 --master_port=1234 --nnodes=3 --master_addr=10.10.10.2 finetune.py --data_path=merged_shuffled_OIG_87f6a1e788.json --micro_batch_size=1 --batch_size=7 --cutoff_len=512 --run_id=17 &>log.17.rank2
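
    The three per-node commands above all set WORLD_SIZE=7 because 3+2+2 processes are launched in total. A minimal sketch of that bookkeeping follows. It assumes global ranks are handed out contiguously in node_rank order, which this log does not confirm, and the helper name rank_layout is hypothetical.

```python
from typing import Dict, List


def rank_layout(procs_per_node: List[int]) -> Dict[int, List[int]]:
    """Map node_rank -> global ranks, ASSUMING contiguous assignment by node order."""
    layout: Dict[int, List[int]] = {}
    next_rank = 0
    for node_rank, nproc in enumerate(procs_per_node):
        layout[node_rank] = list(range(next_rank, next_rank + nproc))
        next_rank += nproc
    return layout


# --nproc_per_node values from the rippa, ova and timemachine commands.
procs_per_node = [3, 2, 2]
world_size = sum(procs_per_node)      # must equal the WORLD_SIZE passed on every node
layout = rank_layout(procs_per_node)

print("WORLD_SIZE =", world_size)
for node_rank, ranks in layout.items():
    print(f"node_rank {node_rank}: global ranks {ranks}")
```

    If WORLD_SIZE did not equal the sum of the --nproc_per_node values, the nodes would disagree about how many peers to wait for at rendezvous.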

    
local_rank: 0
global rank: 0
Training model with params:
save_code: False
run_id: 8
base_model: tiiuae/falcon-40b
tokenizer_base_model: tiiuae/falcon-40b
data_path: h2oai/openassistant_oasst1_h2ogpt_graded
data_col_dict: None
prompt_type: plain
valid_path: None
data_mix_in_path: 0-hero/OIG-small-chip2
data_mix_in_factor: 0.0
data_mix_in_col_dict: {'user': 'instruction', 'chip2': 'output'}
data_mix_in_prompt_type: instruct
output_dir: falcon-40b.h2oaiopenassistant_oasst1_h2ogpt_graded.3_epochs.2e023709e9a36283986d136e66cb94e0bd7e6452.8
lora_weights: 
batch_size: 32
micro_batch_size: 1
gradient_checkpointing: False
fp16: True
train_8bit: False
train_4bit: True
num_epochs: 3
learning_rate: 0.0003
val_set_size: None
val_metrics: []
eval_steps: None
eval_epochs: None
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['query_key_value', 'dense_h_to_4h', 'dense_4h_to_h', 'dense']
llama_type: False
llama_flash_attn: False
train_on_inputs: True
group_by_length: False
resume_from_checkpoint: None
cutoff_len: 512
drop_truncations: True
ddp: True
local_files_only: False
resume_download: True
use_auth_token: False
warmup_steps: 100
logging_steps: 1
save_steps: None
save_total_limit: 3
add_eos_token: False
world_size: 8
local_rank: 0
rank: 0
gpus: 8
device_map: auto
gradient_accumulation_steps: 32
Command: finetune.py --data_path=h2oai/openassistant_oasst1_h2ogpt_graded --drop_truncations=True --train_4bit=True --base_model=tiiuae/falcon-40b --micro_batch_size=1 --batch_size=32 --num_epochs=3 --lora_target_modules=["query_key_value", "dense_h_to_4h", "dense_4h_to_h", "dense"] --run_id=8
Hash: 2e023709e9a36283986d136e66cb94e0bd7e6452
Distributed: data parallel
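
The logged gradient_accumulation_steps of 32 is consistent with dividing batch_size (32) by micro_batch_size (1). The sketch below reproduces that arithmetic. Whether the script later rescales the value by world_size under DDP is not visible in this log, so the per-step sample count is computed only under the stated assumption that the logged value is used unchanged on every rank. The function name accumulation_steps is hypothetical.

```python
def accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    """Number of micro-batches accumulated before one optimizer step."""
    if micro_batch_size <= 0 or batch_size % micro_batch_size != 0:
        raise ValueError("batch_size must be a positive multiple of micro_batch_size")
    return batch_size // micro_batch_size


# Values taken from the parameter dump above.
batch_size = 32
micro_batch_size = 1
world_size = 8

steps = accumulation_steps(batch_size, micro_batch_size)

# ASSUMPTION: every rank accumulates `steps` micro-batches unchanged.
# If the script divides by world_size instead, this figure would be 8x smaller.
samples_per_optimizer_step = steps * micro_batch_size * world_size

print("gradient_accumulation_steps =", steps)
print("samples per optimizer step (if unscaled) =", samples_per_optimizer_step)
```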

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================

bin /home/ubuntu/miniconda3/envs/h2ollm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/ubuntu/miniconda3/envs/h2ollm/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/ubuntu/miniconda3/envs/h2ollm did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/ubuntu/miniconda3/envs/h2ollm/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...

Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
Loading checkpoint shards:  11%|█         | 1/9 [00:25<03:22, 25.33s/it]
Loading checkpoint shards:  11%|█         | 1/9 [00:26<03:29, 26.24s/it]
Loading checkpoint shards:  11%|█         | 1/9 [00:27<03:36, 27.02s/it]
Loading checkpoint shards:  11%|█         | 1/9 [00:26<03:35, 26.92s/it]
Loading checkpoint shards:  11%|█         | 1/9 [00:26<03:35, 26.98s/it]
Loading checkpoint shards:  11%|█         | 1/9 [00:27<03:43, 27.90s/it]
Loading checkpoint shards:  11%|█         | 1/9 [00:28<03:46, 28.28s/it]
Loading checkpoint shards:  11%|█         | 1/9 [00:28<03:50, 28.87s/it]
Loading checkpoint shards:  22%|██▏       | 2/9 [00:44<02:32, 21.74s/it]
Loading checkpoint shards:  22%|██▏       | 2/9 [00:48<02:47, 24.00s/it]
Loading checkpoint shards:  22%|██▏       | 2/9 [00:48<02:45, 23.65s/it]
Loading checkpoint shards:  22%|██▏       | 2/9 [00:48<02:48, 24.07s/it]
Loading checkpoint shards:  22%|██▏       | 2/9 [00:49<02:50, 24.39s/it]
Loading checkpoint shards:  22%|██▏       | 2/9 [00:49<02:50, 24.33s/it]
Loading checkpoint shards:  22%|██▏       | 2/9 [00:52<03:00, 25.84s/it]
Loading checkpoint shards:  22%|██▏       | 2/9 [00:54<03:09, 27.11s/it]
Loading checkpoint shards:  33%|███▎      | 3/9 [01:02<02:01, 20.23s/it]
Loading checkpoint shards:  33%|███▎      | 3/9 [01:08<02:10, 21.74s/it]
Loading checkpoint shards:  33%|███▎      | 3/9 [01:10<02:18, 23.15s/it]
Loading checkpoint shards:  33%|███▎      | 3/9 [01:10<02:18, 23.06s/it]
Loading checkpoint shards:  33%|███▎      | 3/9 [01:11<02:20, 23.36s/it]
Loading checkpoint shards:  33%|███▎      | 3/9 [01:12<02:21, 23.54s/it]
Loading checkpoint shards:  33%|███▎      | 3/9 [01:17<02:32, 25.49s/it]
Loading checkpoint shards:  33%|███▎      | 3/9 [01:20<02:38, 26.35s/it]
Loading checkpoint shards:  44%|████▍     | 4/9 [01:21<01:37, 19.54s/it]
Loading checkpoint shards:  44%|████▍     | 4/9 [01:27<01:44, 20.90s/it]
Loading checkpoint shards:  44%|████▍     | 4/9 [01:32<01:52, 22.56s/it]
Loading checkpoint shards:  44%|████▍     | 4/9 [01:33<01:54, 22.82s/it]
Loading checkpoint shards:  44%|████▍     | 4/9 [01:33<01:54, 22.89s/it]
Loading checkpoint shards:  44%|████▍     | 4/9 [01:34<01:55, 23.09s/it]
Loading checkpoint shards:  44%|████▍     | 4/9 [01:42<02:06, 25.29s/it]
Loading checkpoint shards:  56%|█████▌    | 5/9 [01:43<01:22, 20.61s/it]
Loading checkpoint shards:  44%|████▍     | 4/9 [01:46<02:11, 26.28s/it]
Loading checkpoint shards:  56%|█████▌    | 5/9 [01:47<01:21, 20.46s/it]
Loading checkpoint shards:  56%|█████▌    | 5/9 [01:54<01:29, 22.37s/it]
Loading checkpoint shards:  56%|█████▌    | 5/9 [01:55<01:30, 22.73s/it]
Loading checkpoint shards:  56%|█████▌    | 5/9 [01:56<01:31, 22.78s/it]
Loading checkpoint shards:  56%|█████▌    | 5/9 [01:57<01:31, 22.99s/it]
Loading checkpoint shards:  67%|██████▋   | 6/9 [02:03<01:01, 20.37s/it]
Loading checkpoint shards:  56%|█████▌    | 5/9 [02:05<01:35, 23.78s/it]
Loading checkpoint shards:  56%|█████▌    | 5/9 [02:09<01:43, 25.81s/it]
Loading checkpoint shards:  67%|██████▋   | 6/9 [02:16<01:09, 23.27s/it]
Loading checkpoint shards:  67%|██████▋   | 6/9 [02:17<01:07, 22.47s/it]
Loading checkpoint shards:  67%|██████▋   | 6/9 [02:17<01:07, 22.52s/it]
Loading checkpoint shards:  67%|██████▋   | 6/9 [02:20<01:09, 23.19s/it]
Loading checkpoint shards:  67%|██████▋   | 6/9 [02:20<01:09, 23.15s/it]
Loading checkpoint shards:  78%|███████▊  | 7/9 [02:25<00:41, 20.84s/it]
Loading checkpoint shards:  67%|██████▋   | 6/9 [02:28<01:10, 23.57s/it]
Loading checkpoint shards:  67%|██████▋   | 6/9 [02:34<01:16, 25.58s/it]
Loading checkpoint shards:  78%|███████▊  | 7/9 [02:40<00:44, 22.48s/it]
Loading checkpoint shards:  78%|███████▊  | 7/9 [02:40<00:47, 23.78s/it]
Loading checkpoint shards:  78%|███████▊  | 7/9 [02:41<00:45, 22.89s/it]
Loading checkpoint shards:  78%|███████▊  | 7/9 [02:45<00:47, 23.75s/it]
Loading checkpoint shards:  78%|███████▊  | 7/9 [02:45<00:47, 23.60s/it]
Loading checkpoint shards:  89%|████████▉ | 8/9 [02:50<00:22, 22.16s/it]
Loading checkpoint shards:  78%|███████▊  | 7/9 [02:52<00:47, 23.56s/it]
Loading checkpoint shards:  78%|███████▊  | 7/9 [03:00<00:51, 25.65s/it]
Loading checkpoint shards:  89%|████████▉ | 8/9 [03:04<00:23, 23.08s/it]
Loading checkpoint shards:  89%|████████▉ | 8/9 [03:05<00:23, 23.20s/it]
Loading checkpoint shards:  89%|████████▉ | 8/9 [03:05<00:24, 24.08s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:08<00:00, 20.89s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:08<00:00, 20.98s/it]

Loading checkpoint shards:  89%|████████▉ | 8/9 [03:09<00:23, 23.90s/it]
Loading checkpoint shards:  89%|████████▉ | 8/9 [03:09<00:23, 23.95s/it]
Loading checkpoint shards:  89%|████████▉ | 8/9 [03:16<00:23, 23.67s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:22<00:00, 21.51s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:22<00:00, 22.51s/it]

Loading checkpoint shards: 100%|██████████| 9/9 [03:22<00:00, 21.86s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:22<00:00, 22.52s/it]

Loading checkpoint shards: 100%|██████████| 9/9 [03:22<00:00, 21.49s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:22<00:00, 22.53s/it]

Loading checkpoint shards:  89%|████████▉ | 8/9 [03:23<00:25, 25.01s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:26<00:00, 21.69s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:26<00:00, 22.93s/it]

Loading checkpoint shards: 100%|██████████| 9/9 [03:28<00:00, 22.18s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:28<00:00, 23.14s/it]

Loading checkpoint shards: 100%|██████████| 9/9 [03:30<00:00, 20.60s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:30<00:00, 23.35s/it]

Loading checkpoint shards: 100%|██████████| 9/9 [03:34<00:00, 20.60s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:34<00:00, 23.84s/it]
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 8192)
        (h): ModuleList(
          (0-59): 60 x DecoderLayer(
            (ln_attn): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (ln_mlp): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear4bit(
                in_features=8192, out_features=9216, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=9216, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear4bit(
                in_features=8192, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): MLP(
              (dense_h_to_4h): Linear4bit(
                in_features=8192, out_features=32768, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=32768, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear4bit(
                in_features=32768, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=32768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
            )
          )
        )
        (ln_f): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=8192, out_features=65024, bias=False)
    )
  )
)
trainable params: 55541760 || all params: 20974518272 || trainable%: 0.2648058910327664
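
The trainable-parameter count above can be reproduced from the module shapes in the model printout: each adapted layer adds a lora_A matrix (in_features x r) and a lora_B matrix (r x out_features), both without bias. A short check under those assumptions, with all shapes copied from the printout and the total parameter count taken as logged rather than recomputed:

```python
LORA_R = 8          # lora_r from the parameter dump
NUM_LAYERS = 60     # "(0-59): 60 x DecoderLayer"

# (in_features, out_features) of each LoRA target module, per decoder layer.
TARGETS = {
    "query_key_value": (8192, 9216),
    "dense": (8192, 8192),
    "dense_h_to_4h": (8192, 32768),
    "dense_4h_to_h": (32768, 8192),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in lora_A (in x r) plus lora_B (r x out), no bias terms."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in TARGETS.values())
trainable = NUM_LAYERS * per_layer

all_params = 20974518272            # copied from the log line, not derived here
trainable_pct = 100 * trainable / all_params

print("trainable params:", trainable)
print("trainable%:", trainable_pct)
```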
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/h2oai___json/h2oai--openassistant_oasst1_h2ogpt_graded-29f03a61004f6aef/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 256.02it/s]

Map (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):   0%|          | 36/30368 [00:00<04:41, 107.64 examples/s]
Map (num_proc=16):   1%|          | 205/30368 [00:00<00:54, 556.22 examples/s]
Map (num_proc=16):   2%|▏         | 499/30368 [00:00<00:24, 1207.98 examples/s]
Map (num_proc=16):   3%|▎         | 952/30368 [00:00<00:14, 2074.55 examples/s]
Map (num_proc=16):   6%|▌         | 1778/30368 [00:00<00:07, 3805.61 examples/s]
Map (num_proc=16):   9%|▉         | 2726/30368 [00:00<00:05, 5424.43 examples/s]
Map (num_proc=16):  12%|█▏        | 3695/30368 [00:00<00:04, 6539.01 examples/s]
Map (num_proc=16):  15%|█▌        | 4685/30368 [00:01<00:03, 7460.32 examples/s]
Map (num_proc=16):  19%|█▊        | 5633/30368 [00:01<00:03, 7987.03 examples/s]
Map (num_proc=16):  22%|██▏       | 6582/30368 [00:01<00:02, 8389.02 examples/s]
Map (num_proc=16):  25%|██▍       | 7518/30368 [00:01<00:02, 8569.23 examples/s]
Map (num_proc=16):  28%|██▊       | 8507/30368 [00:01<00:02, 8794.67 examples/s]
Map (num_proc=16):  31%|███▏      | 9494/30368 [00:01<00:02, 9102.20 examples/s]
Map (num_proc=16):  35%|███▍      | 10480/30368 [00:01<00:02, 9297.00 examples/s]
Map (num_proc=16):  38%|███▊      | 11498/30368 [00:01<00:01, 9462.84 examples/s]
Map (num_proc=16):  41%|████      | 12516/30368 [00:01<00:01, 9646.17 examples/s]
Map (num_proc=16):  44%|████▍     | 13501/30368 [00:02<00:01, 9392.53 examples/s]
Map (num_proc=16):  48%|████▊     | 14462/30368 [00:02<00:01, 8983.61 examples/s]
Map (num_proc=16):  51%|█████     | 15390/30368 [00:02<00:01, 8394.27 examples/s]
Map (num_proc=16):  54%|█████▎    | 16268/30368 [00:02<00:01, 7440.18 examples/s]
Map (num_proc=16):  56%|█████▌    | 17037/30368 [00:02<00:01, 7344.74 examples/s]

Map (num_proc=16):  59%|█████▉    | 17951/30368 [00:02<00:01, 7781.52 examples/s]
Map (num_proc=16):  62%|██████▏   | 18904/30368 [00:02<00:01, 8213.05 examples/s]
Map (num_proc=16):  65%|██████▌   | 19888/30368 [00:02<00:01, 8628.23 examples/s]
Map (num_proc=16):  68%|██████▊   | 20773/30368 [00:02<00:01, 8668.32 examples/s]
Map (num_proc=16):  71%|███████▏  | 21661/30368 [00:03<00:01, 8603.77 examples/s]
Map (num_proc=16):  74%|███████▍  | 22532/30368 [00:03<00:00, 8607.40 examples/s]
Map (num_proc=16):  77%|███████▋  | 23492/30368 [00:03<00:00, 8890.50 examples/s]
Map (num_proc=16):  80%|████████  | 24414/30368 [00:03<00:00, 8960.18 examples/s]
Map (num_proc=16):  83%|████████▎ | 25339/30368 [00:03<00:00, 8998.03 examples/s]
Map (num_proc=16):  86%|████████▋ | 26244/30368 [00:03<00:00, 8034.30 examples/s]
Map (num_proc=16):  89%|████████▉ | 27089/30368 [00:03<00:00, 5426.16 examples/s]
Map (num_proc=16):  91%|█████████▏| 27763/30368 [00:04<00:00, 4913.22 examples/s]
Map (num_proc=16):  93%|█████████▎| 28359/30368 [00:04<00:00, 5099.33 examples/s]
Map (num_proc=16):  96%|█████████▌| 29116/30368 [00:04<00:00, 5647.00 examples/s]
Map (num_proc=16):  98%|█████████▊| 29762/30368 [00:04<00:00, 5521.54 examples/s]
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/h2oai___json/h2oai--openassistant_oasst1_h2ogpt_graded-29f03a61004f6aef/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 449.98it/s]

Map (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):   0%|          | 29/30368 [00:00<05:28, 92.31 examples/s]
Map (num_proc=16):   1%|          | 210/30368 [00:00<00:55, 543.89 examples/s]
Map (num_proc=16): 100%|██████████| 30368/30368 [00:05<00:00, 1713.27 examples/s]
Map (num_proc=16):   2%|▏         | 583/30368 [00:00<00:22, 1309.49 examples/s]
                                                                                 

Map (num_proc=16):   4%|▎         | 1112/30368 [00:00<00:13, 2246.17 examples/s]
Map (num_proc=16):   7%|▋         | 2033/30368 [00:00<00:07, 3944.13 examples/s]
Map (num_proc=16):   9%|▉         | 2842/30368 [00:00<00:05, 5033.83 examples/s]
Map (num_proc=16):  13%|█▎        | 3848/30368 [00:01<00:04, 6293.72 examples/s]
Map (num_proc=16):  16%|█▌        | 4854/30368 [00:01<00:03, 7246.32 examples/s]
Map (num_proc=16):  19%|█▉        | 5875/30368 [00:01<00:03, 8046.91 examples/s]
Map (num_proc=16):  23%|██▎       | 6871/30368 [00:01<00:02, 8574.00 examples/s]
Map (num_proc=16):  26%|██▌       | 7909/30368 [00:01<00:02, 9049.94 examples/s]
Filter (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):  29%|██▉       | 8882/30368 [00:01<00:02, 9234.29 examples/s]
Map (num_proc=16):  32%|███▏      | 9850/30368 [00:01<00:02, 9198.17 examples/s]
Map (num_proc=16):  36%|███▌      | 10890/30368 [00:01<00:02, 9523.28 examples/s]
Map (num_proc=16):  39%|███▉      | 11935/30368 [00:01<00:01, 9757.52 examples/s]
Map (num_proc=16):  43%|████▎     | 12934/30368 [00:01<00:01, 9724.46 examples/s]
Map (num_proc=16):  46%|████▌     | 13933/30368 [00:02<00:01, 9454.21 examples/s]
Filter (num_proc=16):   3%|▎         | 1000/30368 [00:00<00:17, 1645.87 examples/s]
Map (num_proc=16):  49%|████▉     | 14901/30368 [00:02<00:01, 9004.56 examples/s]
Filter (num_proc=16):  49%|████▉     | 15000/30368 [00:00<00:00, 27910.39 examples/s]
Map (num_proc=16):  52%|█████▏    | 15815/30368 [00:02<00:01, 7910.16 examples/s]
Map (num_proc=16):  55%|█████▍    | 16660/30368 [00:02<00:01, 7591.96 examples/s]
Map (num_proc=16):  58%|█████▊    | 17498/30368 [00:02<00:01, 7714.26 examples/s]
Filter (num_proc=16):  79%|███████▉  | 24082/30368 [00:01<00:00, 23242.81 examples/s]
Map (num_proc=16):  60%|██████    | 18291/30368 [00:02<00:01, 7668.15 examples/s]
Map (num_proc=16):  63%|██████▎   | 19131/30368 [00:02<00:01, 7800.76 examples/s]
Filter (num_proc=16): 100%|██████████| 30368/30368 [00:01<00:00, 26448.28 examples/s]
Map (num_proc=16):  66%|██████▌   | 20093/30368 [00:02<00:01, 8296.71 examples/s]
                                                                                     

Map (num_proc=16):  69%|██████▉   | 21046/30368 [00:02<00:01, 8597.07 examples/s]
Map (num_proc=16):  73%|███████▎  | 22030/30368 [00:03<00:00, 8885.92 examples/s]
Map (num_proc=16):  76%|███████▌  | 23002/30368 [00:03<00:00, 9006.26 examples/s]
Map (num_proc=16):  79%|███████▉  | 23964/30368 [00:03<00:00, 9101.99 examples/s]
Map (num_proc=16):  82%|████████▏ | 24940/30368 [00:03<00:00, 9291.31 examples/s]
Map (num_proc=16):  85%|████████▌ | 25884/30368 [00:03<00:00, 8701.95 examples/s]
Map (num_proc=16):  88%|████████▊ | 26782/30368 [00:03<00:00, 7620.60 examples/s]
Map (num_proc=16):  91%|█████████ | 27576/30368 [00:03<00:00, 6056.24 examples/s]
Map (num_proc=16):  93%|█████████▎| 28246/30368 [00:04<00:00, 4810.68 examples/s]
Map (num_proc=16):  95%|█████████▍| 28838/30368 [00:04<00:00, 5003.87 examples/s]
Map (num_proc=16):  97%|█████████▋| 29423/30368 [00:04<00:00, 5094.93 examples/s]
Map (num_proc=16):  99%|█████████▉| 29997/30368 [00:04<00:00, 5089.83 examples/s]
                                                                                 

Filter (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Filter (num_proc=16):   3%|▎         | 1000/30368 [00:00<00:19, 1498.56 examples/s]
Filter (num_proc=16):  53%|█████▎    | 16000/30368 [00:01<00:00, 16874.36 examples/s]
Filter (num_proc=16):  97%|█████████▋| 29470/30368 [00:01<00:00, 32380.87 examples/s]
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 8192)
        (h): ModuleList(
          (0-59): 60 x DecoderLayer(
            (ln_attn): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (ln_mlp): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear4bit(
                in_features=8192, out_features=9216, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=9216, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear4bit(
                in_features=8192, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): MLP(
              (dense_h_to_4h): Linear4bit(
                in_features=8192, out_features=32768, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=32768, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear4bit(
                in_features=32768, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=32768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
            )
          )
        )
        (ln_f): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=8192, out_features=65024, bias=False)
    )
  )
)
trainable params: 55541760 || all params: 20974518272 || trainable%: 0.2648058910327664

                                                                                     
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/h2oai___json/h2oai--openassistant_oasst1_h2ogpt_graded-29f03a61004f6aef/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 270.36it/s]
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 8192)
        (h): ModuleList(
          (0-59): 60 x DecoderLayer(
            (ln_attn): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (ln_mlp): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear4bit(
                in_features=8192, out_features=9216, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=9216, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear4bit(
                in_features=8192, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): MLP(
              (dense_h_to_4h): Linear4bit(
                in_features=8192, out_features=32768, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=32768, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear4bit(
                in_features=32768, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=32768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
            )
          )
        )
        (ln_f): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=8192, out_features=65024, bias=False)
    )
  )
)
trainable params: 55541760 || all params: 20974518272 || trainable%: 0.2648058910327664

Map (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):   0%|          | 26/30368 [00:00<06:01, 83.88 examples/s]
Map (num_proc=16):   0%|          | 145/30368 [00:00<01:12, 418.59 examples/s]
Map (num_proc=16):   2%|▏         | 476/30368 [00:00<00:24, 1243.60 examples/s]
Map (num_proc=16):   3%|▎         | 1003/30368 [00:00<00:12, 2270.73 examples/s]
Map (num_proc=16):   7%|▋         | 1999/30368 [00:00<00:06, 4403.94 examples/s]
Map (num_proc=16):  10%|▉         | 2955/30368 [00:00<00:04, 5825.98 examples/s]
Map (num_proc=16):  13%|█▎        | 3971/30368 [00:00<00:03, 7042.08 examples/s]
Map (num_proc=16):  16%|█▋        | 4961/30368 [00:01<00:03, 7870.08 examples/s]
Map (num_proc=16):  19%|█▉        | 5871/30368 [00:01<00:02, 8217.34 examples/s]
Map (num_proc=16):  23%|██▎       | 6862/30368 [00:01<00:02, 8700.67 examples/s]
Map (num_proc=16):  26%|██▌       | 7850/30368 [00:01<00:02, 8986.68 examples/s]
Map (num_proc=16):  29%|██▉       | 8870/30368 [00:01<00:02, 9276.01 examples/s]
Map (num_proc=16):  33%|███▎      | 9875/30368 [00:01<00:02, 9424.34 examples/s]
Map (num_proc=16):  36%|███▌      | 10889/30368 [00:01<00:02, 9611.04 examples/s]
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/h2oai___json/h2oai--openassistant_oasst1_h2ogpt_graded-29f03a61004f6aef/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 229.72it/s]

Map (num_proc=16):  39%|███▉      | 11928/30368 [00:01<00:01, 9764.00 examples/s]
Map (num_proc=16):  43%|████▎     | 12915/30368 [00:01<00:01, 9650.77 examples/s]
Map (num_proc=16):  46%|████▌     | 13886/30368 [00:01<00:01, 9482.33 examples/s]
Map (num_proc=16):  49%|████▉     | 14840/30368 [00:02<00:01, 8680.29 examples/s]
Map (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):  52%|█████▏    | 15746/30368 [00:02<00:01, 8328.20 examples/s]
Map (num_proc=16):  55%|█████▍    | 16599/30368 [00:02<00:01, 7667.00 examples/s]
Map (num_proc=16):  57%|█████▋    | 17390/30368 [00:02<00:01, 7621.22 examples/s]
Map (num_proc=16):   0%|          | 23/30368 [00:00<06:43, 75.12 examples/s]
Map (num_proc=16):  60%|██████    | 18252/30368 [00:02<00:01, 7875.50 examples/s]
Map (num_proc=16):   0%|          | 75/30368 [00:00<02:19, 216.47 examples/s]
Map (num_proc=16):  63%|██████▎   | 19168/30368 [00:02<00:01, 8192.09 examples/s]
Map (num_proc=16):   1%|▏         | 406/30368 [00:00<00:28, 1058.10 examples/s]
Map (num_proc=16):  66%|██████▋   | 20166/30368 [00:02<00:01, 8658.77 examples/s]
Map (num_proc=16):   3%|▎         | 902/30368 [00:00<00:14, 1971.13 examples/s]
Map (num_proc=16):  70%|██████▉   | 21145/30368 [00:02<00:01, 8931.39 examples/s]
Map (num_proc=16):  73%|███████▎  | 22097/30368 [00:02<00:00, 9063.10 examples/s]
Map (num_proc=16):   5%|▌         | 1551/30368 [00:00<00:09, 2980.60 examples/s]
Map (num_proc=16):  76%|███████▌  | 23064/30368 [00:03<00:00, 9210.83 examples/s]
Map (num_proc=16):   8%|▊         | 2502/30368 [00:00<00:05, 4681.05 examples/s]
Map (num_proc=16):  79%|███████▉  | 24064/30368 [00:03<00:00, 9382.24 examples/s]
Map (num_proc=16):  11%|█▏        | 3440/30368 [00:01<00:04, 5927.72 examples/s]
Map (num_proc=16):  82%|████████▏ | 25030/30368 [00:03<00:00, 9370.63 examples/s]
Map (num_proc=16):  15%|█▍        | 4414/30368 [00:01<00:03, 6871.95 examples/s]
Map (num_proc=16):  17%|█▋        | 5303/30368 [00:01<00:03, 7417.24 examples/s]
Map (num_proc=16):  86%|████████▌ | 25981/30368 [00:03<00:00, 8752.51 examples/s]
Map (num_proc=16):  20%|██        | 6202/30368 [00:01<00:03, 7809.23 examples/s]
Map (num_proc=16):  89%|████████▊ | 26888/30368 [00:03<00:00, 6939.40 examples/s]
Map (num_proc=16):  23%|██▎       | 7034/30368 [00:01<00:02, 7904.65 examples/s]
Map (num_proc=16):  26%|██▌       | 7883/30368 [00:01<00:02, 7922.42 examples/s]
Map (num_proc=16):  29%|██▊       | 8704/30368 [00:01<00:02, 7987.27 examples/s]
Map (num_proc=16):  91%|█████████ | 27649/30368 [00:03<00:00, 4706.85 examples/s]
Map (num_proc=16):  32%|███▏      | 9602/30368 [00:01<00:02, 8252.97 examples/s]
Map (num_proc=16):  35%|███▍      | 10484/30368 [00:01<00:02, 8416.34 examples/s]
Map (num_proc=16):  93%|█████████▎| 28259/30368 [00:04<00:00, 4711.94 examples/s]
Map (num_proc=16):  38%|███▊      | 11482/30368 [00:01<00:02, 8856.71 examples/s]
Map (num_proc=16):  95%|█████████▌| 28885/30368 [00:04<00:00, 5003.45 examples/s]
Map (num_proc=16):  41%|████      | 12493/30368 [00:02<00:01, 9213.22 examples/s]
Map (num_proc=16):  98%|█████████▊| 29612/30368 [00:04<00:00, 5460.44 examples/s]
Map (num_proc=16):  44%|████▍     | 13421/30368 [00:02<00:01, 9174.90 examples/s]
Map (num_proc=16):  47%|████▋     | 14350/30368 [00:02<00:01, 8462.46 examples/s]
Map (num_proc=16): 100%|█████████▉| 30250/30368 [00:04<00:00, 4455.52 examples/s]
Map (num_proc=16):  50%|█████     | 15230/30368 [00:02<00:01, 7933.36 examples/s]
Map (num_proc=16):  53%|█████▎    | 16065/30368 [00:02<00:01, 7990.86 examples/s]
                                                                                 

Map (num_proc=16):  56%|█████▌    | 16899/30368 [00:02<00:01, 7960.28 examples/s]
Map (num_proc=16):  58%|█████▊    | 17709/30368 [00:02<00:01, 7420.14 examples/s]
Map (num_proc=16):  61%|██████    | 18470/30368 [00:02<00:01, 7442.08 examples/s]
Map (num_proc=16):  64%|██████▍   | 19405/30368 [00:02<00:01, 7967.45 examples/s]
Map (num_proc=16):  67%|██████▋   | 20314/30368 [00:03<00:01, 8175.26 examples/s]
Filter (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):  70%|███████   | 21324/30368 [00:03<00:01, 8648.63 examples/s]
Map (num_proc=16):  73%|███████▎  | 22253/30368 [00:03<00:00, 8788.87 examples/s]
Map (num_proc=16):  76%|███████▋  | 23194/30368 [00:03<00:00, 8922.45 examples/s]
Map (num_proc=16):  80%|███████▉  | 24156/30368 [00:03<00:00, 9114.06 examples/s]
Map (num_proc=16):  83%|████████▎ | 25116/30368 [00:03<00:00, 9125.41 examples/s]
Map (num_proc=16):  86%|████████▌ | 26038/30368 [00:03<00:00, 8347.62 examples/s]
Filter (num_proc=16):   3%|▎         | 1000/30368 [00:00<00:18, 1581.65 examples/s]
Map (num_proc=16):  89%|████████▊ | 26918/30368 [00:03<00:00, 6623.31 examples/s]
Map (num_proc=16):  91%|█████████ | 27670/30368 [00:04<00:00, 6117.70 examples/s]
Filter (num_proc=16):  53%|█████▎    | 16000/30368 [00:01<00:00, 18302.86 examples/s]
Map (num_proc=16):  93%|█████████▎| 28342/30368 [00:04<00:00, 5429.39 examples/s]
Filter (num_proc=16):  73%|███████▎  | 22286/30368 [00:01<00:00, 22999.86 examples/s]
Map (num_proc=16):  95%|█████████▌| 28927/30368 [00:04<00:00, 4470.48 examples/s]
Map (num_proc=16):  97%|█████████▋| 29438/30368 [00:04<00:00, 4528.99 examples/s]
Map (num_proc=16):  99%|█████████▊| 29943/30368 [00:04<00:00, 4435.66 examples/s]
Filter (num_proc=16): 100%|██████████| 30368/30368 [00:01<00:00, 16440.27 examples/s]
                                                                                 

                                                                                     

Filter (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Filter (num_proc=16):   3%|▎         | 1000/30368 [00:00<00:17, 1654.43 examples/s]
Filter (num_proc=16):  52%|█████▏    | 15898/30368 [00:01<00:00, 17461.68 examples/s]
Filter (num_proc=16):  97%|█████████▋| 29470/30368 [00:01<00:00, 33671.39 examples/s]
                                                                                     
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 8192)
        (h): ModuleList(
          (0-59): 60 x DecoderLayer(
            (ln_attn): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (ln_mlp): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear4bit(
                in_features=8192, out_features=9216, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=9216, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear4bit(
                in_features=8192, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): MLP(
              (dense_h_to_4h): Linear4bit(
                in_features=8192, out_features=32768, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=32768, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear4bit(
                in_features=32768, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=32768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
            )
          )
        )
        (ln_f): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=8192, out_features=65024, bias=False)
    )
  )
)
trainable params: 55541760 || all params: 20974518272 || trainable%: 0.2648058910327664
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 8192)
        (h): ModuleList(
          (0-59): 60 x DecoderLayer(
            (ln_attn): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (ln_mlp): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear4bit(
                in_features=8192, out_features=9216, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=9216, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear4bit(
                in_features=8192, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): MLP(
              (dense_h_to_4h): Linear4bit(
                in_features=8192, out_features=32768, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=32768, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear4bit(
                in_features=32768, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=32768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
            )
          )
        )
        (ln_f): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=8192, out_features=65024, bias=False)
    )
  )
)
trainable params: 55541760 || all params: 20974518272 || trainable%: 0.2648058910327664
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/h2oai___json/h2oai--openassistant_oasst1_h2ogpt_graded-29f03a61004f6aef/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 240.04it/s]
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/h2oai___json/h2oai--openassistant_oasst1_h2ogpt_graded-29f03a61004f6aef/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 231.08it/s]

Map (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):   0%|          | 32/30368 [00:00<05:14, 96.51 examples/s]
Map (num_proc=16):   1%|          | 239/30368 [00:00<00:46, 647.09 examples/s]
Map (num_proc=16):   0%|          | 29/30368 [00:00<06:01, 84.03 examples/s]
Map (num_proc=16):   2%|▏         | 549/30368 [00:00<00:23, 1283.86 examples/s]
Map (num_proc=16):   1%|          | 235/30368 [00:00<00:50, 597.69 examples/s]
Map (num_proc=16):   4%|▎         | 1097/30368 [00:00<00:12, 2299.91 examples/s]
Map (num_proc=16):   6%|▋         | 1939/30368 [00:00<00:07, 3857.16 examples/s]
Map (num_proc=16):   2%|▏         | 637/30368 [00:00<00:20, 1433.86 examples/s]
Map (num_proc=16):   9%|▉         | 2811/30368 [00:00<00:05, 5158.59 examples/s]
Map (num_proc=16):   4%|▍         | 1201/30368 [00:00<00:12, 2415.63 examples/s]
Map (num_proc=16):  12%|█▏        | 3710/30368 [00:01<00:04, 6157.48 examples/s]
Map (num_proc=16):   7%|▋         | 2129/30368 [00:00<00:06, 4169.51 examples/s]
Map (num_proc=16):  15%|█▌        | 4629/30368 [00:01<00:03, 6967.35 examples/s]
Map (num_proc=16):  10%|▉         | 2996/30368 [00:00<00:05, 5350.51 examples/s]
Map (num_proc=16):  18%|█▊        | 5577/30368 [00:01<00:03, 7592.50 examples/s]
Map (num_proc=16):  13%|█▎        | 3955/30368 [00:01<00:04, 6462.26 examples/s]
Map (num_proc=16):  21%|██▏       | 6486/30368 [00:01<00:02, 7966.73 examples/s]
Map (num_proc=16):  16%|█▌        | 4862/30368 [00:01<00:03, 7173.14 examples/s]
Map (num_proc=16):  25%|██▍       | 7441/30368 [00:01<00:02, 8367.86 examples/s]
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 8192)
        (h): ModuleList(
          (0-59): 60 x DecoderLayer(
            (ln_attn): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (ln_mlp): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear4bit(
                in_features=8192, out_features=9216, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=9216, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear4bit(
                in_features=8192, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): MLP(
              (dense_h_to_4h): Linear4bit(
                in_features=8192, out_features=32768, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=32768, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear4bit(
                in_features=32768, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=32768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
            )
          )
        )
        (ln_f): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=8192, out_features=65024, bias=False)
    )
  )
)
trainable params: 55541760 || all params: 20974518272 || trainable%: 0.2648058910327664

Map (num_proc=16):  19%|█▉        | 5715/30368 [00:01<00:03, 7548.58 examples/s]
Map (num_proc=16):  28%|██▊       | 8378/30368 [00:01<00:02, 8642.79 examples/s]
Map (num_proc=16):  22%|██▏       | 6682/30368 [00:01<00:02, 8065.51 examples/s]
Map (num_proc=16):  31%|███       | 9327/30368 [00:01<00:02, 8776.14 examples/s]
Map (num_proc=16):  25%|██▌       | 7648/30368 [00:01<00:02, 8450.46 examples/s]
Map (num_proc=16):  34%|███▍      | 10281/30368 [00:01<00:02, 8945.54 examples/s]
Map (num_proc=16):  28%|██▊       | 8595/30368 [00:01<00:02, 8714.85 examples/s]
Map (num_proc=16):  37%|███▋      | 11249/30368 [00:01<00:02, 9121.49 examples/s]
Map (num_proc=16):  31%|███▏      | 9555/30368 [00:01<00:02, 8913.20 examples/s]
Map (num_proc=16):  40%|████      | 12169/30368 [00:01<00:02, 9019.31 examples/s]
Map (num_proc=16):  35%|███▍      | 10539/30368 [00:01<00:02, 9149.12 examples/s]
Map (num_proc=16):  43%|████▎     | 13101/30368 [00:02<00:01, 9101.92 examples/s]
Map (num_proc=16):  38%|███▊      | 11495/30368 [00:01<00:02, 9147.82 examples/s]
Map (num_proc=16):  46%|████▌     | 14027/30368 [00:02<00:01, 8653.61 examples/s]
Map (num_proc=16):  41%|████      | 12432/30368 [00:01<00:01, 9159.53 examples/s]
Map (num_proc=16):  49%|████▉     | 14910/30368 [00:02<00:01, 8384.86 examples/s]
Map (num_proc=16):  44%|████▍     | 13364/30368 [00:02<00:01, 9085.78 examples/s]
Map (num_proc=16):  47%|████▋     | 14280/30368 [00:02<00:01, 8497.34 examples/s]
Map (num_proc=16):  52%|█████▏    | 15767/30368 [00:02<00:01, 7581.58 examples/s]
Map (num_proc=16):  50%|████▉     | 15147/30368 [00:02<00:01, 8132.71 examples/s]
Map (num_proc=16):  55%|█████▍    | 16555/30368 [00:02<00:01, 7322.57 examples/s]
Map (num_proc=16):  53%|█████▎    | 15998/30368 [00:02<00:01, 7917.66 examples/s]
Map (num_proc=16):  57%|█████▋    | 17307/30368 [00:02<00:01, 6781.05 examples/s]
Map (num_proc=16):  55%|█████▌    | 16826/30368 [00:02<00:01, 7546.98 examples/s]
Map (num_proc=16):  60%|█████▉    | 18092/30368 [00:02<00:01, 7038.44 examples/s]
Map (num_proc=16):  58%|█████▊    | 17588/30368 [00:02<00:01, 7382.58 examples/s]
Map (num_proc=16):  62%|██████▏   | 18969/30368 [00:02<00:01, 7465.60 examples/s]
Map (num_proc=16):  65%|██████▌   | 19834/30368 [00:02<00:01, 7760.23 examples/s]
Map (num_proc=16):  60%|██████    | 18364/30368 [00:02<00:01, 7410.79 examples/s]
Map (num_proc=16):  68%|██████▊   | 20746/30368 [00:03<00:01, 8114.03 examples/s]
Map (num_proc=16):  63%|██████▎   | 19262/30368 [00:02<00:01, 7791.47 examples/s]
Map (num_proc=16):  71%|███████▏  | 21691/30368 [00:03<00:01, 8474.48 examples/s]
Map (num_proc=16):  67%|██████▋   | 20253/30368 [00:02<00:01, 8326.52 examples/s]
Map (num_proc=16):  75%|███████▍  | 22681/30368 [00:03<00:00, 8858.40 examples/s]
Map (num_proc=16):  70%|██████▉   | 21200/30368 [00:03<00:01, 8589.99 examples/s]
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/h2oai___json/h2oai--openassistant_oasst1_h2ogpt_graded-29f03a61004f6aef/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 417.84it/s]

Map (num_proc=16):  78%|███████▊  | 23662/30368 [00:03<00:00, 9088.00 examples/s]
Map (num_proc=16):  73%|███████▎  | 22165/30368 [00:03<00:00, 8828.93 examples/s]
Map (num_proc=16):  81%|████████  | 24599/30368 [00:03<00:00, 9070.64 examples/s]
Map (num_proc=16):  76%|███████▌  | 23145/30368 [00:03<00:00, 9080.87 examples/s]
Map (num_proc=16):  79%|███████▉  | 24129/30368 [00:03<00:00, 9284.12 examples/s]
Map (num_proc=16):  84%|████████▍ | 25532/30368 [00:03<00:00, 8914.37 examples/s]
Map (num_proc=16):  83%|████████▎ | 25075/30368 [00:03<00:00, 8718.03 examples/s]
Map (num_proc=16):  87%|████████▋ | 26431/30368 [00:03<00:00, 7609.11 examples/s]
Map (num_proc=16):  86%|████████▌ | 25972/30368 [00:03<00:00, 7594.75 examples/s]
Map (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):  88%|████████▊ | 26762/30368 [00:03<00:00, 5668.59 examples/s]
Map (num_proc=16):  90%|████████▉ | 27231/30368 [00:04<00:00, 4576.69 examples/s]
Map (num_proc=16):  90%|█████████ | 27431/30368 [00:04<00:00, 4665.58 examples/s]
Map (num_proc=16):  92%|█████████▏| 27875/30368 [00:04<00:00, 3828.82 examples/s]
Map (num_proc=16):  92%|█████████▏| 27983/30368 [00:04<00:00, 4283.45 examples/s]
Map (num_proc=16):  94%|█████████▎| 28406/30368 [00:04<00:00, 3777.21 examples/s]
Map (num_proc=16):  94%|█████████▍| 28483/30368 [00:04<00:00, 4240.27 examples/s]
Map (num_proc=16):  95%|█████████▌| 28879/30368 [00:04<00:00, 3936.88 examples/s]
Map (num_proc=16):   0%|          | 23/30368 [00:00<13:53, 36.39 examples/s]
Map (num_proc=16):  95%|█████████▌| 28998/30368 [00:04<00:00, 4417.57 examples/s]
Map (num_proc=16):  97%|█████████▋| 29446/30368 [00:04<00:00, 4276.60 examples/s]
Map (num_proc=16):   1%|          | 209/30368 [00:00<01:23, 362.30 examples/s]
Map (num_proc=16):  97%|█████████▋| 29519/30368 [00:04<00:00, 4586.08 examples/s]
Map (num_proc=16):  99%|█████████▊| 29945/30368 [00:04<00:00, 4165.30 examples/s]
Map (num_proc=16):   2%|▏         | 577/30368 [00:00<00:30, 963.37 examples/s]
Map (num_proc=16):  99%|█████████▉| 30028/30368 [00:04<00:00, 3939.46 examples/s]
Map (num_proc=16):   4%|▍         | 1233/30368 [00:01<00:14, 2030.72 examples/s]
Map (num_proc=16):   6%|▋         | 1902/30368 [00:01<00:09, 2954.83 examples/s]
Map (num_proc=16):   9%|▉         | 2860/30368 [00:01<00:06, 4503.81 examples/s]
Map (num_proc=16):  12%|█▏        | 3754/30368 [00:01<00:04, 5589.94 examples/s]
Map (num_proc=16):  15%|█▌        | 4668/30368 [00:01<00:03, 6455.23 examples/s]
Map (num_proc=16):  18%|█▊        | 5594/30368 [00:01<00:03, 7160.98 examples/s]
Map (num_proc=16):  22%|██▏       | 6551/30368 [00:01<00:03, 7788.14 examples/s]
Map (num_proc=16):  25%|██▍       | 7549/30368 [00:01<00:02, 8324.15 examples/s]
Map (num_proc=16):  28%|██▊       | 8544/30368 [00:01<00:02, 8720.62 examples/s]
Map (num_proc=16):  31%|███▏      | 9565/30368 [00:01<00:02, 9120.20 examples/s]
Map (num_proc=16):  35%|███▍      | 10593/30368 [00:02<00:02, 9420.05 examples/s]
Map (num_proc=16):  38%|███▊      | 11620/30368 [00:02<00:01, 9633.23 examples/s]
Filter (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):  42%|████▏     | 12629/30368 [00:02<00:01, 9671.50 examples/s]
Filter (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):  45%|████▍     | 13621/30368 [00:02<00:01, 9514.04 examples/s]
Map (num_proc=16):  48%|████▊     | 14598/30368 [00:02<00:01, 8589.00 examples/s]
Map (num_proc=16):  51%|█████     | 15491/30368 [00:02<00:01, 8366.60 examples/s]
Map (num_proc=16):  54%|█████▍    | 16355/30368 [00:02<00:01, 8214.62 examples/s]
Map (num_proc=16):  57%|█████▋    | 17189/30368 [00:02<00:01, 7597.68 examples/s]
Filter (num_proc=16):   3%|▎         | 1000/30368 [00:00<00:17, 1645.18 examples/s]
Filter (num_proc=16):   3%|▎         | 1000/30368 [00:00<00:18, 1584.74 examples/s]
Filter (num_proc=16):  20%|█▉        | 6000/30368 [00:00<00:02, 10836.91 examples/s]
Map (num_proc=16):  59%|█████▉    | 17976/30368 [00:03<00:01, 6557.85 examples/s]
Filter (num_proc=16):  53%|█████▎    | 16000/30368 [00:00<00:00, 28688.80 examples/s]
Map (num_proc=16):  62%|██████▏   | 18832/30368 [00:03<00:01, 7019.22 examples/s]
Map (num_proc=16):  65%|██████▌   | 19795/30368 [00:03<00:01, 7662.56 examples/s]
Filter (num_proc=16):  56%|█████▌    | 16898/30368 [00:01<00:00, 20290.38 examples/s]
Map (num_proc=16):  68%|██████▊   | 20790/30368 [00:03<00:01, 8249.70 examples/s]
Map (num_proc=16):  72%|███████▏  | 21755/30368 [00:03<00:00, 8613.65 examples/s]
Filter (num_proc=16):  73%|███████▎  | 22286/30368 [00:01<00:00, 24067.56 examples/s]
Filter (num_proc=16):  82%|████████▏ | 24980/30368 [00:01<00:00, 24574.73 examples/s]
Map (num_proc=16):  75%|███████▍  | 22713/30368 [00:03<00:00, 8864.06 examples/s]
Map (num_proc=16):  78%|███████▊  | 23668/30368 [00:03<00:00, 9055.45 examples/s]
Map (num_proc=16):  81%|████████  | 24615/30368 [00:03<00:00, 8867.73 examples/s]
Map (num_proc=16):  84%|████████▍ | 25540/30368 [00:03<00:00, 8774.19 examples/s]
Map (num_proc=16):  87%|████████▋ | 26435/30368 [00:04<00:00, 7623.78 examples/s]
Map (num_proc=16):  90%|████████▉ | 27267/30368 [00:04<00:00, 7339.25 examples/s]
Map (num_proc=16):  92%|█████████▏| 28044/30368 [00:04<00:00, 6941.45 examples/s]
Map (num_proc=16):  95%|█████████▍| 28783/30368 [00:04<00:00, 5519.30 examples/s]
Map (num_proc=16):  97%|█████████▋| 29410/30368 [00:04<00:00, 5104.38 examples/s]
Map (num_proc=16):  99%|█████████▊| 29977/30368 [00:04<00:00, 4677.30 examples/s]
                                                                                 

Filter (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Filter (num_proc=16):   3%|▎         | 1000/30368 [00:00<00:17, 1671.11 examples/s]
Filter (num_proc=16):  43%|████▎     | 13000/30368 [00:00<00:00, 24436.79 examples/s]
Filter (num_proc=16):  67%|██████▋   | 20490/30368 [00:01<00:00, 21331.46 examples/s]
Filter (num_proc=16): 100%|██████████| 30368/30368 [00:01<00:00, 33240.57 examples/s]
                                                                                     
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 8192)
        (h): ModuleList(
          (0-59): 60 x DecoderLayer(
            (ln_attn): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (ln_mlp): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear4bit(
                in_features=8192, out_features=9216, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=9216, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear4bit(
                in_features=8192, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): MLP(
              (dense_h_to_4h): Linear4bit(
                in_features=8192, out_features=32768, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=32768, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear4bit(
                in_features=32768, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=32768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
            )
          )
        )
        (ln_f): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=8192, out_features=65024, bias=False)
    )
  )
)
trainable params: 55541760 || all params: 20974518272 || trainable%: 0.2648058910327664
Using Validation Metrics: []
Supported Metrics: ['bleu', 'rouge', 'sacrebleu', 'meteor']
Auto set val_set_size 1000
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/h2oai___json/h2oai--openassistant_oasst1_h2ogpt_graded-29f03a61004f6aef/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 230.54it/s]
Tokenizing 30368 training rows

Map (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Map (num_proc=16):   0%|          | 31/30368 [00:00<05:29, 91.95 examples/s]
Map (num_proc=16):   1%|          | 204/30368 [00:00<00:58, 513.95 examples/s]
Map (num_proc=16):   2%|▏         | 639/30368 [00:00<00:19, 1498.28 examples/s]
Map (num_proc=16):   4%|▍         | 1268/30368 [00:00<00:10, 2653.69 examples/s]
Map (num_proc=16):   7%|▋         | 2149/30368 [00:00<00:06, 4293.76 examples/s]
Map (num_proc=16):  10%|▉         | 3024/30368 [00:00<00:04, 5507.82 examples/s]
Map (num_proc=16):  13%|█▎        | 4014/30368 [00:01<00:03, 6705.78 examples/s]
Map (num_proc=16):  16%|█▋        | 4968/30368 [00:01<00:03, 7495.95 examples/s]
Map (num_proc=16):  20%|█▉        | 5983/30368 [00:01<00:02, 8178.10 examples/s]
Map (num_proc=16):  23%|██▎       | 6993/30368 [00:01<00:02, 8715.38 examples/s]
Map (num_proc=16):  26%|██▌       | 7911/30368 [00:01<00:02, 8801.50 examples/s]
Map (num_proc=16):  29%|██▉       | 8908/30368 [00:01<00:02, 9104.58 examples/s]
Map (num_proc=16):  33%|███▎      | 9918/30368 [00:01<00:02, 9362.67 examples/s]
Map (num_proc=16):  36%|███▌      | 10876/30368 [00:01<00:02, 9411.44 examples/s]
Map (num_proc=16):  39%|███▉      | 11886/30368 [00:01<00:01, 9589.08 examples/s]
Map (num_proc=16):  42%|████▏     | 12868/30368 [00:01<00:01, 9417.71 examples/s]
Map (num_proc=16):  45%|████▌     | 13816/30368 [00:02<00:01, 9361.18 examples/s]
Map (num_proc=16):  49%|████▊     | 14763/30368 [00:02<00:01, 8207.51 examples/s]
Map (num_proc=16):  51%|█████▏    | 15639/30368 [00:02<00:01, 8104.06 examples/s]
Map (num_proc=16):  54%|█████▍    | 16472/30368 [00:02<00:01, 7668.05 examples/s]
Map (num_proc=16):  57%|█████▋    | 17341/30368 [00:02<00:01, 7915.98 examples/s]
Map (num_proc=16):  60%|█████▉    | 18152/30368 [00:02<00:01, 7943.36 examples/s]
Map (num_proc=16):  63%|██████▎   | 19021/30368 [00:02<00:01, 8117.18 examples/s]
Map (num_proc=16):  66%|██████▌   | 19932/30368 [00:02<00:01, 8392.07 examples/s]
Map (num_proc=16):  69%|██████▉   | 20879/30368 [00:02<00:01, 8618.91 examples/s]
Map (num_proc=16):  72%|███████▏  | 21791/30368 [00:03<00:00, 8704.88 examples/s]
Map (num_proc=16):  75%|███████▍  | 22729/30368 [00:03<00:00, 8806.10 examples/s]
Map (num_proc=16):  78%|███████▊  | 23691/30368 [00:03<00:00, 8998.69 examples/s]
Map (num_proc=16):  81%|████████  | 24624/30368 [00:03<00:00, 9080.04 examples/s]
Map (num_proc=16):  84%|████████▍ | 25568/30368 [00:03<00:00, 8654.79 examples/s]
Map (num_proc=16):  87%|████████▋ | 26467/30368 [00:03<00:00, 6909.25 examples/s]
Map (num_proc=16):  90%|████████▉ | 27225/30368 [00:03<00:00, 6715.19 examples/s]
Map (num_proc=16):  92%|█████████▏| 27999/30368 [00:03<00:00, 6961.39 examples/s]
Map (num_proc=16):  95%|█████████▍| 28732/30368 [00:04<00:00, 6726.99 examples/s]
Map (num_proc=16):  97%|█████████▋| 29440/30368 [00:04<00:00, 5717.55 examples/s]
Map (num_proc=16):  99%|█████████▉| 30053/30368 [00:04<00:00, 3341.74 examples/s]
                                                                                 
avoid keeping truncated cases to avoid contaminating model with truncation cases.  Original size: 30368

Filter (num_proc=16):   0%|          | 0/30368 [00:00<?, ? examples/s]
Filter (num_proc=16):   3%|▎         | 1000/30368 [00:00<00:18, 1572.94 examples/s]
Filter (num_proc=16):  56%|█████▌    | 16898/30368 [00:01<00:00, 18317.28 examples/s]
                                                                                     
avoid keeping truncated cases to avoid contaminating model with truncation cases.  New size: 21583
Final fine-tuning data:
Train Dataset({
    features: ['input', 'source', 'prompt_type', 'grade_deberta', 'id', 'input_ids', 'token_type_ids', 'attention_mask', 'labels'],
    num_rows: 21583
})
Valid None
Sample input: {'input': ["<human>: Est-ce que tu peux suggérer des nouveaux concepts de business qui n'ont pas besoin d'investissement ?\n<bot>: Le but d’un investissement dans une entreprise est souvent de pouvoir s’addresser à un marché plus large. Pour construire une affaire sans aucun investissement, il est plus aisé de proposer un service local. Cela peut être de l’assistance à domicile, de l’aide scolaire, du ménage, de la réparation, de la recharge de trottinettes électriques, une activité de coursier… Il est également possible de publier des articles ou des vidéos sur des sites rémunérés par la publicité. De nombreuses personnes possèdent déjà le matériel et la connaissance nécessaires à ces activités d’autoentrepreneur, sans besoin d’investir de l’argent dans cette activité.\n\nDe nombreuses entreprises démarrent sans autre investissement que l’expertise du fondateur. Par exemple, les premiers paquets de pâte Panzani furent confectionnés à la main dans le grenier des beaux-parents du fondateur.\n\n<human>: Je pensais surtout à des business qui ont besoin d'un investissement matériel ou salarial minimum. 
Par exemple, le développement d'une petite application ne nécessite rien d'autre qu'un ordinateur et être un bon développeur.\n\n<human>:"], 'source': ['OpenAssistant/oasst1'], 'prompt_type': ['plain'], 'grade_deberta': [0.4624449610710144], 'id': [418], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]}
No neptune configured, set NEPTUNE_API_TOKEN env var.
Auto set eval_steps to 101 out of 2023 total training steps
Auto step save_steps to 101
You are adding a <class 'transformers.integrations.TensorBoardCallback'> to the callbacks of this Trainer, but there is already one. The currentlist of callbacks is
:DefaultFlowCallback
TensorBoardCallback
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.

  0%|          | 0/2022 [00:00<?, ?it/s]

  0%|          | 1/2022 [00:10<6:09:48, 10.98s/it]
                                                  
{'loss': 1.72, 'learning_rate': 2.3999999999999997e-05, 'epoch': 0.0}

  0%|          | 1/2022 [00:11<6:09:48, 10.98s/it]
  0%|          | 2/2022 [00:18<5:03:18,  9.01s/it]
                                                  
{'loss': 1.7901, 'learning_rate': 4.7999999999999994e-05, 'epoch': 0.0}

  0%|          | 2/2022 [00:18<5:03:18,  9.01s/it]
  0%|          | 3/2022 [00:26<4:42:41,  8.40s/it]
                                                  
{'loss': 1.7771, 'learning_rate': 4.7999999999999994e-05, 'epoch': 0.0}

  0%|          | 3/2022 [00:26<4:42:41,  8.40s/it]
  0%|          | 4/2022 [00:34<4:36:26,  8.22s/it]
                                                  
{'loss': 1.7505, 'learning_rate': 7.199999999999999e-05, 'epoch': 0.01}

  0%|          | 4/2022 [00:34<4:36:26,  8.22s/it]
  0%|          | 5/2022 [00:42<4:35:26,  8.19s/it]
                                                  
{'loss': 1.5872, 'learning_rate': 9.599999999999999e-05, 'epoch': 0.01}

  0%|          | 5/2022 [00:42<4:35:26,  8.19s/it]
  0%|          | 6/2022 [00:50<4:32:47,  8.12s/it]
                                                  
{'loss': 1.6307, 'learning_rate': 0.00011999999999999999, 'epoch': 0.01}

  0%|          | 6/2022 [00:50<4:32:47,  8.12s/it]
  0%|          | 7/2022 [00:58<4:31:05,  8.07s/it]
                                                  
{'loss': 1.7806, 'learning_rate': 0.00014399999999999998, 'epoch': 0.01}

  0%|          | 7/2022 [00:58<4:31:05,  8.07s/it]
  0%|          | 8/2022 [01:06<4:33:53,  8.16s/it]
                                                  
{'loss': 1.5383, 'learning_rate': 0.000168, 'epoch': 0.01}

  0%|          | 8/2022 [01:06<4:33:53,  8.16s/it]
  0%|          | 9/2022 [01:14<4:27:31,  7.97s/it]
                                                  
{'loss': 1.3604, 'learning_rate': 0.00019199999999999998, 'epoch': 0.01}

  0%|          | 9/2022 [01:14<4:27:31,  7.97s/it]
  0%|          | 10/2022 [01:22<4:25:19,  7.91s/it]
                                                   
{'loss': 1.3874, 'learning_rate': 0.00021599999999999996, 'epoch': 0.01}

  0%|          | 10/2022 [01:22<4:25:19,  7.91s/it]
  1%|          | 11/2022 [01:29<4:23:25,  7.86s/it]
                                                   
{'loss': 1.3469, 'learning_rate': 0.00023999999999999998, 'epoch': 0.02}

  1%|          | 11/2022 [01:29<4:23:25,  7.86s/it]
  1%|          | 12/2022 [01:37<4:24:50,  7.91s/it]
                                                   
{'loss': 1.3082, 'learning_rate': 0.00026399999999999997, 'epoch': 0.02}

  1%|          | 12/2022 [01:37<4:24:50,  7.91s/it]
  1%|          | 13/2022 [01:45<4:25:59,  7.94s/it]
                                                   
{'loss': 1.459, 'learning_rate': 0.00028799999999999995, 'epoch': 0.02}

  1%|          | 13/2022 [01:45<4:25:59,  7.94s/it]
  1%|          | 14/2022 [01:53<4:22:12,  7.84s/it]
                                                   
{'loss': 1.4543, 'learning_rate': 0.00029937565036420395, 'epoch': 0.02}

  1%|          | 14/2022 [01:53<4:22:12,  7.84s/it]
  1%|          | 15/2022 [02:01<4:24:35,  7.91s/it]
                                                   
{'loss': 1.3481, 'learning_rate': 0.0002981269510926118, 'epoch': 0.02}

  1%|          | 15/2022 [02:01<4:24:35,  7.91s/it]
  1%|          | 16/2022 [02:08<4:20:02,  7.78s/it]
                                                   
{'loss': 1.4128, 'learning_rate': 0.00029687825182101975, 'epoch': 0.02}

  1%|          | 16/2022 [02:08<4:20:02,  7.78s/it]
  1%|          | 17/2022 [02:16<4:22:41,  7.86s/it]
                                                   
{'loss': 1.2761, 'learning_rate': 0.00029562955254942765, 'epoch': 0.03}

  1%|          | 17/2022 [02:17<4:22:41,  7.86s/it]
  1%|          | 18/2022 [02:24<4:20:54,  7.81s/it]
                                                   
{'loss': 1.3463, 'learning_rate': 0.00029438085327783555, 'epoch': 0.03}

  1%|          | 18/2022 [02:24<4:20:54,  7.81s/it]
  1%|          | 19/2022 [02:32<4:21:17,  7.83s/it]
                                                   
{'loss': 1.3093, 'learning_rate': 0.00029313215400624345, 'epoch': 0.03}

  1%|          | 19/2022 [02:32<4:21:17,  7.83s/it]
  1%|          | 20/2022 [02:40<4:19:48,  7.79s/it]
                                                   
{'loss': 1.3559, 'learning_rate': 0.0002918834547346514, 'epoch': 0.03}

  1%|          | 20/2022 [02:40<4:19:48,  7.79s/it]
  1%|          | 21/2022 [02:47<4:18:26,  7.75s/it]
                                                   
{'loss': 1.2977, 'learning_rate': 0.00029063475546305925, 'epoch': 0.03}

  1%|          | 21/2022 [02:47<4:18:26,  7.75s/it]
  1%|          | 22/2022 [02:55<4:20:29,  7.81s/it]
                                                   
{'loss': 1.3874, 'learning_rate': 0.0002893860561914672, 'epoch': 0.03}

  1%|          | 22/2022 [02:55<4:20:29,  7.81s/it]
  1%|          | 23/2022 [03:03<4:21:49,  7.86s/it]
                                                   
{'loss': 1.2443, 'learning_rate': 0.0002881373569198751, 'epoch': 0.03}

  1%|          | 23/2022 [03:03<4:21:49,  7.86s/it]
  1%|          | 24/2022 [03:11<4:20:43,  7.83s/it]
                                                   
{'loss': 1.4007, 'learning_rate': 0.000286888657648283, 'epoch': 0.04}

  1%|          | 24/2022 [03:11<4:20:43,  7.83s/it]
  1%|          | 25/2022 [03:19<4:22:43,  7.89s/it]
                                                   
{'loss': 1.3496, 'learning_rate': 0.0002856399583766909, 'epoch': 0.04}

  1%|          | 25/2022 [03:19<4:22:43,  7.89s/it]
  1%|▏         | 26/2022 [03:27<4:22:19,  7.89s/it]
                                                   
{'loss': 1.5295, 'learning_rate': 0.00028439125910509886, 'epoch': 0.04}

  1%|▏         | 26/2022 [03:27<4:22:19,  7.89s/it]
  1%|▏         | 27/2022 [03:35<4:21:37,  7.87s/it]
                                                   
{'loss': 1.3746, 'learning_rate': 0.0002831425598335067, 'epoch': 0.04}

  1%|▏         | 27/2022 [03:35<4:21:37,  7.87s/it]
  1%|▏         | 28/2022 [03:43<4:22:29,  7.90s/it]
                                                   
{'loss': 1.3915, 'learning_rate': 0.00028189386056191466, 'epoch': 0.04}

  1%|▏         | 28/2022 [03:43<4:22:29,  7.90s/it]
  1%|▏         | 29/2022 [03:50<4:19:03,  7.80s/it]
                                                   
{'loss': 1.2188, 'learning_rate': 0.00028064516129032256, 'epoch': 0.04}

  1%|▏         | 29/2022 [03:50<4:19:03,  7.80s/it]
  1%|▏         | 30/2022 [03:58<4:21:27,  7.88s/it]
                                                   
{'loss': 1.385, 'learning_rate': 0.00027939646201873046, 'epoch': 0.04}

  1%|▏         | 30/2022 [03:58<4:21:27,  7.88s/it]
  2%|▏         | 31/2022 [04:07<4:23:22,  7.94s/it]
                                                   
{'loss': 1.3208, 'learning_rate': 0.00027814776274713836, 'epoch': 0.05}

  2%|▏         | 31/2022 [04:07<4:23:22,  7.94s/it]
  2%|▏         | 32/2022 [04:15<4:25:40,  8.01s/it]
                                                   
{'loss': 1.311, 'learning_rate': 0.00027689906347554626, 'epoch': 0.05}

  2%|▏         | 32/2022 [04:15<4:25:40,  8.01s/it]
  2%|▏         | 33/2022 [04:22<4:22:05,  7.91s/it]
                                                   
{'loss': 1.2505, 'learning_rate': 0.00027565036420395416, 'epoch': 0.05}

  2%|▏         | 33/2022 [04:22<4:22:05,  7.91s/it]
  2%|▏         | 34/2022 [04:30<4:20:06,  7.85s/it]
                                                   
{'loss': 1.3768, 'learning_rate': 0.0002744016649323621, 'epoch': 0.05}

  2%|▏         | 34/2022 [04:30<4:20:06,  7.85s/it]
  2%|▏         | 35/2022 [04:38<4:23:07,  7.95s/it]
                                                   
{'loss': 1.3399, 'learning_rate': 0.00027315296566077, 'epoch': 0.05}

  2%|▏         | 35/2022 [04:38<4:23:07,  7.95s/it]
  2%|▏         | 36/2022 [04:46<4:21:29,  7.90s/it]
                                                   
{'loss': 1.2658, 'learning_rate': 0.0002719042663891779, 'epoch': 0.05}

  2%|▏         | 36/2022 [04:46<4:21:29,  7.90s/it]
  2%|▏         | 37/2022 [04:54<4:19:46,  7.85s/it]
                                                   
{'loss': 1.3099, 'learning_rate': 0.0002706555671175858, 'epoch': 0.05}

  2%|▏         | 37/2022 [04:54<4:19:46,  7.85s/it]
  2%|▏         | 38/2022 [05:02<4:20:12,  7.87s/it]
                                                   
{'loss': 1.3298, 'learning_rate': 0.0002694068678459937, 'epoch': 0.06}

  2%|▏         | 38/2022 [05:02<4:20:12,  7.87s/it]
  2%|▏         | 39/2022 [05:09<4:18:32,  7.82s/it]
                                                   
{'loss': 1.1099, 'learning_rate': 0.0002681581685744016, 'epoch': 0.06}

  2%|▏         | 39/2022 [05:09<4:18:32,  7.82s/it]
  2%|▏         | 40/2022 [05:17<4:18:53,  7.84s/it]
                                                   
{'loss': 1.2084, 'learning_rate': 0.0002669094693028096, 'epoch': 0.06}

  2%|▏         | 40/2022 [05:17<4:18:53,  7.84s/it]
  2%|▏         | 41/2022 [05:25<4:18:07,  7.82s/it]
                                                   
{'loss': 1.3454, 'learning_rate': 0.0002656607700312175, 'epoch': 0.06}

  2%|▏         | 41/2022 [05:25<4:18:07,  7.82s/it]
  2%|▏         | 42/2022 [05:33<4:19:49,  7.87s/it]
                                                   
{'loss': 1.182, 'learning_rate': 0.0002644120707596254, 'epoch': 0.06}

  2%|▏         | 42/2022 [05:33<4:19:49,  7.87s/it]
  2%|▏         | 43/2022 [05:41<4:18:04,  7.82s/it]
                                                   
{'loss': 1.2821, 'learning_rate': 0.0002631633714880333, 'epoch': 0.06}

  2%|▏         | 43/2022 [05:41<4:18:04,  7.82s/it]
  2%|▏         | 44/2022 [05:49<4:18:17,  7.83s/it]
                                                   
{'loss': 1.3321, 'learning_rate': 0.0002619146722164412, 'epoch': 0.07}

  2%|▏         | 44/2022 [05:49<4:18:17,  7.83s/it]
  2%|▏         | 45/2022 [05:56<4:14:03,  7.71s/it]
                                                   
{'loss': 1.2609, 'learning_rate': 0.0002606659729448491, 'epoch': 0.07}

  2%|▏         | 45/2022 [05:56<4:14:03,  7.71s/it]
  2%|▏         | 46/2022 [06:04<4:15:28,  7.76s/it]
                                                   
{'loss': 1.2614, 'learning_rate': 0.000259417273673257, 'epoch': 0.07}

  2%|▏         | 46/2022 [06:04<4:15:28,  7.76s/it]
  2%|▏         | 47/2022 [06:12<4:16:35,  7.80s/it]
                                                   
{'loss': 1.1908, 'learning_rate': 0.00025816857440166493, 'epoch': 0.07}

  2%|▏         | 47/2022 [06:12<4:16:35,  7.80s/it]
  2%|▏         | 48/2022 [06:20<4:20:14,  7.91s/it]
                                                   
{'loss': 1.2909, 'learning_rate': 0.00025691987513007283, 'epoch': 0.07}

  2%|▏         | 48/2022 [06:20<4:20:14,  7.91s/it]
  2%|▏         | 49/2022 [06:28<4:24:53,  8.06s/it]
                                                   
{'loss': 1.2423, 'learning_rate': 0.00025567117585848073, 'epoch': 0.07}

  2%|▏         | 49/2022 [06:28<4:24:53,  8.06s/it]
  2%|▏         | 50/2022 [06:36<4:25:28,  8.08s/it]
                                                   
{'loss': 1.1415, 'learning_rate': 0.00025442247658688863, 'epoch': 0.07}

  2%|▏         | 50/2022 [06:37<4:25:28,  8.08s/it]
  3%|▎         | 51/2022 [06:44<4:22:24,  7.99s/it]
                                                   
{'loss': 1.3807, 'learning_rate': 0.00025317377731529653, 'epoch': 0.08}

  3%|▎         | 51/2022 [06:44<4:22:24,  7.99s/it]
  3%|▎         | 52/2022 [06:52<4:18:50,  7.88s/it]
                                                   
{'loss': 1.2623, 'learning_rate': 0.00025192507804370443, 'epoch': 0.08}

  3%|▎         | 52/2022 [06:52<4:18:50,  7.88s/it]
  3%|▎         | 53/2022 [06:59<4:14:30,  7.76s/it]
                                                   
{'loss': 1.2821, 'learning_rate': 0.0002506763787721124, 'epoch': 0.08}

  3%|▎         | 53/2022 [06:59<4:14:30,  7.76s/it]
  3%|▎         | 54/2022 [07:07<4:10:24,  7.63s/it]
                                                   
{'loss': 1.2406, 'learning_rate': 0.0002494276795005203, 'epoch': 0.08}

  3%|▎         | 54/2022 [07:07<4:10:24,  7.63s/it]
  3%|▎         | 55/2022 [07:14<4:11:32,  7.67s/it]
                                                   
{'loss': 1.2504, 'learning_rate': 0.0002481789802289282, 'epoch': 0.08}

  3%|▎         | 55/2022 [07:15<4:11:32,  7.67s/it]
  3%|▎         | 56/2022 [07:23<4:18:19,  7.88s/it]
                                                   
{'loss': 1.1912, 'learning_rate': 0.0002469302809573361, 'epoch': 0.08}

  3%|▎         | 56/2022 [07:23<4:18:19,  7.88s/it]
  3%|▎         | 57/2022 [07:31<4:18:14,  7.89s/it]
                                                   
{'loss': 1.0911, 'learning_rate': 0.000245681581685744, 'epoch': 0.08}

  3%|▎         | 57/2022 [07:31<4:18:14,  7.89s/it]
  3%|▎         | 58/2022 [07:39<4:21:47,  8.00s/it]
                                                   
{'loss': 1.2113, 'learning_rate': 0.0002444328824141519, 'epoch': 0.09}

  3%|▎         | 58/2022 [07:39<4:21:47,  8.00s/it]
  3%|▎         | 59/2022 [07:46<4:16:42,  7.85s/it]
                                                   
{'loss': 1.3252, 'learning_rate': 0.00024318418314255981, 'epoch': 0.09}

  3%|▎         | 59/2022 [07:47<4:16:42,  7.85s/it]
  3%|▎         | 60/2022 [07:55<4:18:13,  7.90s/it]
                                                   
{'loss': 1.1426, 'learning_rate': 0.00024193548387096771, 'epoch': 0.09}

  3%|▎         | 60/2022 [07:55<4:18:13,  7.90s/it]
  3%|▎         | 61/2022 [08:02<4:15:13,  7.81s/it]
                                                   
{'loss': 1.2809, 'learning_rate': 0.00024068678459937561, 'epoch': 0.09}

  3%|▎         | 61/2022 [08:02<4:15:13,  7.81s/it]
  3%|▎         | 62/2022 [08:10<4:13:28,  7.76s/it]
                                                   
{'loss': 1.2077, 'learning_rate': 0.00023943808532778354, 'epoch': 0.09}

  3%|▎         | 62/2022 [08:10<4:13:28,  7.76s/it]
  3%|▎         | 63/2022 [08:18<4:14:58,  7.81s/it]
                                                   
{'loss': 1.2898, 'learning_rate': 0.00023818938605619144, 'epoch': 0.09}

  3%|▎         | 63/2022 [08:18<4:14:58,  7.81s/it]
  3%|▎         | 64/2022 [08:26<4:15:23,  7.83s/it]
                                                   
{'loss': 1.2199, 'learning_rate': 0.00023694068678459934, 'epoch': 0.09}

  3%|▎         | 64/2022 [08:26<4:15:23,  7.83s/it]
  3%|▎         | 65/2022 [08:33<4:13:50,  7.78s/it]
                                                   
{'loss': 1.3221, 'learning_rate': 0.00023569198751300727, 'epoch': 0.1}

  3%|▎         | 65/2022 [08:33<4:13:50,  7.78s/it]
  3%|▎         | 66/2022 [08:41<4:16:03,  7.85s/it]
                                                   
{'loss': 1.392, 'learning_rate': 0.00023444328824141517, 'epoch': 0.1}

  3%|▎         | 66/2022 [08:41<4:16:03,  7.85s/it]
  3%|▎         | 67/2022 [08:49<4:17:59,  7.92s/it]
                                                   
{'loss': 1.3226, 'learning_rate': 0.00023319458896982307, 'epoch': 0.1}

  3%|▎         | 67/2022 [08:49<4:17:59,  7.92s/it]
  3%|▎         | 68/2022 [08:57<4:19:36,  7.97s/it]
                                                   
{'loss': 1.235, 'learning_rate': 0.000231945889698231, 'epoch': 0.1}

  3%|▎         | 68/2022 [08:57<4:19:36,  7.97s/it]
  3%|▎         | 69/2022 [09:05<4:19:18,  7.97s/it]
                                                   
{'loss': 1.3188, 'learning_rate': 0.0002306971904266389, 'epoch': 0.1}

  3%|▎         | 69/2022 [09:05<4:19:18,  7.97s/it]
  3%|▎         | 70/2022 [09:13<4:16:52,  7.90s/it]
                                                   
{'loss': 1.3334, 'learning_rate': 0.0002294484911550468, 'epoch': 0.1}

  3%|▎         | 70/2022 [09:13<4:16:52,  7.90s/it]
  4%|▎         | 71/2022 [09:21<4:16:55,  7.90s/it]
                                                   
{'loss': 1.199, 'learning_rate': 0.00022819979188345473, 'epoch': 0.11}

  4%|▎         | 71/2022 [09:21<4:16:55,  7.90s/it]
  4%|▎         | 72/2022 [09:29<4:13:06,  7.79s/it]
                                                   
{'loss': 1.229, 'learning_rate': 0.00022695109261186263, 'epoch': 0.11}

  4%|▎         | 72/2022 [09:29<4:13:06,  7.79s/it]
  4%|▎         | 73/2022 [09:36<4:13:14,  7.80s/it]
                                                   
{'loss': 1.2263, 'learning_rate': 0.00022570239334027053, 'epoch': 0.11}

  4%|▎         | 73/2022 [09:36<4:13:14,  7.80s/it]
  4%|▎         | 74/2022 [09:44<4:12:19,  7.77s/it]
                                                   
{'loss': 1.3877, 'learning_rate': 0.00022445369406867843, 'epoch': 0.11}

  4%|▎         | 74/2022 [09:44<4:12:19,  7.77s/it]
  4%|▎         | 75/2022 [09:52<4:12:59,  7.80s/it]
                                                   
{'loss': 1.2693, 'learning_rate': 0.00022320499479708635, 'epoch': 0.11}

  4%|▎         | 75/2022 [09:52<4:12:59,  7.80s/it]
  4%|▍         | 76/2022 [10:00<4:14:06,  7.84s/it]
                                                   
{'loss': 1.3059, 'learning_rate': 0.00022195629552549425, 'epoch': 0.11}

  4%|▍         | 76/2022 [10:00<4:14:06,  7.84s/it]
  4%|▍         | 77/2022 [10:08<4:13:49,  7.83s/it]
                                                   
{'loss': 1.3247, 'learning_rate': 0.00022070759625390215, 'epoch': 0.11}

  4%|▍         | 77/2022 [10:08<4:13:49,  7.83s/it]
  4%|▍         | 78/2022 [10:16<4:13:57,  7.84s/it]
                                                   
{'loss': 1.3356, 'learning_rate': 0.00021945889698231008, 'epoch': 0.12}

  4%|▍         | 78/2022 [10:16<4:13:57,  7.84s/it]
  4%|▍         | 79/2022 [10:23<4:13:25,  7.83s/it]
                                                   
{'loss': 1.205, 'learning_rate': 0.00021821019771071798, 'epoch': 0.12}

  4%|▍         | 79/2022 [10:23<4:13:25,  7.83s/it]
  4%|▍         | 80/2022 [10:31<4:11:44,  7.78s/it]
                                                   
{'loss': 1.2709, 'learning_rate': 0.00021696149843912588, 'epoch': 0.12}

  4%|▍         | 80/2022 [10:31<4:11:44,  7.78s/it]
  4%|▍         | 81/2022 [10:39<4:09:48,  7.72s/it]
                                                   
{'loss': 1.2975, 'learning_rate': 0.0002157127991675338, 'epoch': 0.12}

  4%|▍         | 81/2022 [10:39<4:09:48,  7.72s/it]
  4%|▍         | 82/2022 [10:46<4:06:10,  7.61s/it]
                                                   
{'loss': 1.2262, 'learning_rate': 0.0002144640998959417, 'epoch': 0.12}

  4%|▍         | 82/2022 [10:46<4:06:10,  7.61s/it]
  4%|▍         | 83/2022 [10:54<4:07:40,  7.66s/it]
                                                   
{'loss': 1.2191, 'learning_rate': 0.0002132154006243496, 'epoch': 0.12}

  4%|▍         | 83/2022 [10:54<4:07:40,  7.66s/it]
  4%|▍         | 84/2022 [11:02<4:09:26,  7.72s/it]
                                                   
{'loss': 1.2179, 'learning_rate': 0.00021196670135275754, 'epoch': 0.12}

  4%|▍         | 84/2022 [11:02<4:09:26,  7.72s/it]
  4%|▍         | 85/2022 [11:09<4:09:57,  7.74s/it]
                                                   
{'loss': 1.1837, 'learning_rate': 0.00021071800208116544, 'epoch': 0.13}

  4%|▍         | 85/2022 [11:09<4:09:57,  7.74s/it]
  4%|▍         | 86/2022 [11:17<4:10:21,  7.76s/it]
                                                   
{'loss': 1.2273, 'learning_rate': 0.00020946930280957334, 'epoch': 0.13}

  4%|▍         | 86/2022 [11:17<4:10:21,  7.76s/it]
  4%|▍         | 87/2022 [11:25<4:09:42,  7.74s/it]
                                                   
{'loss': 1.2402, 'learning_rate': 0.00020822060353798127, 'epoch': 0.13}

  4%|▍         | 87/2022 [11:25<4:09:42,  7.74s/it]
  4%|▍         | 88/2022 [11:33<4:11:24,  7.80s/it]
                                                   
{'loss': 1.2526, 'learning_rate': 0.00020697190426638914, 'epoch': 0.13}

  4%|▍         | 88/2022 [11:33<4:11:24,  7.80s/it]
  4%|▍         | 89/2022 [11:41<4:14:08,  7.89s/it]
                                                   
{'loss': 1.1743, 'learning_rate': 0.00020572320499479707, 'epoch': 0.13}

  4%|▍         | 89/2022 [11:41<4:14:08,  7.89s/it]
  4%|▍         | 90/2022 [11:48<4:11:10,  7.80s/it]
                                                   
{'loss': 1.2465, 'learning_rate': 0.000204474505723205, 'epoch': 0.13}

  4%|▍         | 90/2022 [11:49<4:11:10,  7.80s/it]
  5%|▍         | 91/2022 [11:57<4:16:19,  7.96s/it]
                                                   
{'loss': 1.2391, 'learning_rate': 0.00020322580645161287, 'epoch': 0.13}

  5%|▍         | 91/2022 [11:57<4:16:19,  7.96s/it]
  5%|▍         | 92/2022 [12:05<4:14:10,  7.90s/it]
                                                   
{'loss': 1.2397, 'learning_rate': 0.0002019771071800208, 'epoch': 0.14}

  5%|▍         | 92/2022 [12:05<4:14:10,  7.90s/it]
  5%|▍         | 93/2022 [12:12<4:12:03,  7.84s/it]
                                                   
{'loss': 1.2748, 'learning_rate': 0.00020072840790842872, 'epoch': 0.14}

  5%|▍         | 93/2022 [12:12<4:12:03,  7.84s/it]
  5%|▍         | 94/2022 [12:20<4:09:02,  7.75s/it]
                                                   
{'loss': 1.1515, 'learning_rate': 0.0001994797086368366, 'epoch': 0.14}

  5%|▍         | 94/2022 [12:20<4:09:02,  7.75s/it]
  5%|▍         | 95/2022 [12:28<4:12:25,  7.86s/it]
                                                   
{'loss': 1.1176, 'learning_rate': 0.00019823100936524452, 'epoch': 0.14}

  5%|▍         | 95/2022 [12:28<4:12:25,  7.86s/it]
  5%|▍         | 96/2022 [12:36<4:11:02,  7.82s/it]
                                                   
{'loss': 1.3444, 'learning_rate': 0.00019698231009365245, 'epoch': 0.14}

  5%|▍         | 96/2022 [12:36<4:11:02,  7.82s/it]
  5%|▍         | 97/2022 [12:44<4:11:24,  7.84s/it]
                                                   
{'loss': 1.1987, 'learning_rate': 0.00019573361082206032, 'epoch': 0.14}

  5%|▍         | 97/2022 [12:44<4:11:24,  7.84s/it]
  5%|▍         | 98/2022 [12:51<4:09:25,  7.78s/it]
                                                   
{'loss': 1.195, 'learning_rate': 0.00019448491155046825, 'epoch': 0.15}

  5%|▍         | 98/2022 [12:51<4:09:25,  7.78s/it]
  5%|▍         | 99/2022 [12:59<4:08:34,  7.76s/it]
                                                   
{'loss': 1.1188, 'learning_rate': 0.00019323621227887618, 'epoch': 0.15}

  5%|▍         | 99/2022 [12:59<4:08:34,  7.76s/it]
  5%|▍         | 100/2022 [13:07<4:09:54,  7.80s/it]
                                                    
{'loss': 1.2533, 'learning_rate': 0.00019198751300728405, 'epoch': 0.15}

  5%|▍         | 100/2022 [13:07<4:09:54,  7.80s/it]
  5%|▍         | 101/2022 [13:15<4:09:43,  7.80s/it]
                                                    
{'loss': 1.2439, 'learning_rate': 0.00019073881373569198, 'epoch': 0.15}

  5%|▍         | 101/2022 [13:15<4:09:43,  7.80s/it]
  5%|▌         | 102/2022 [13:23<4:16:09,  8.00s/it]
                                                    
{'loss': 1.2167, 'learning_rate': 0.0001894901144640999, 'epoch': 0.15}

  5%|▌         | 102/2022 [13:23<4:16:09,  8.00s/it]
  5%|▌         | 103/2022 [13:31<4:10:55,  7.85s/it]
                                                    
{'loss': 1.219, 'learning_rate': 0.00018824141519250778, 'epoch': 0.15}

  5%|▌         | 103/2022 [13:31<4:10:55,  7.85s/it]
  5%|▌         | 104/2022 [13:39<4:12:30,  7.90s/it]
                                                    
{'loss': 1.281, 'learning_rate': 0.0001869927159209157, 'epoch': 0.15}

  5%|▌         | 104/2022 [13:39<4:12:30,  7.90s/it]
  5%|▌         | 105/2022 [13:46<4:12:24,  7.90s/it]
                                                    
{'loss': 1.2078, 'learning_rate': 0.00018574401664932358, 'epoch': 0.16}

  5%|▌         | 105/2022 [13:46<4:12:24,  7.90s/it]
  5%|▌         | 106/2022 [13:54<4:12:07,  7.90s/it]
                                                    
{'loss': 1.2394, 'learning_rate': 0.0001844953173777315, 'epoch': 0.16}

  5%|▌         | 106/2022 [13:54<4:12:07,  7.90s/it]
  5%|▌         | 107/2022 [14:02<4:10:41,  7.85s/it]
                                                    
{'loss': 1.1848, 'learning_rate': 0.00018324661810613943, 'epoch': 0.16}

  5%|▌         | 107/2022 [14:02<4:10:41,  7.85s/it]
  5%|▌         | 108/2022 [14:10<4:08:24,  7.79s/it]
                                                    
{'loss': 1.143, 'learning_rate': 0.0001819979188345473, 'epoch': 0.16}

  5%|▌         | 108/2022 [14:10<4:08:24,  7.79s/it]
  5%|▌         | 109/2022 [14:18<4:13:52,  7.96s/it]
                                                    
{'loss': 1.3097, 'learning_rate': 0.00018074921956295523, 'epoch': 0.16}

  5%|▌         | 109/2022 [14:18<4:13:52,  7.96s/it]
  5%|▌         | 110/2022 [14:26<4:11:52,  7.90s/it]
                                                    
{'loss': 1.2561, 'learning_rate': 0.00017950052029136316, 'epoch': 0.16}

  5%|▌         | 110/2022 [14:26<4:11:52,  7.90s/it]
  5%|▌         | 111/2022 [14:34<4:10:02,  7.85s/it]
                                                    
{'loss': 1.2177, 'learning_rate': 0.00017825182101977103, 'epoch': 0.16}

  5%|▌         | 111/2022 [14:34<4:10:02,  7.85s/it]
  6%|▌         | 112/2022 [14:42<4:15:24,  8.02s/it]
                                                    
{'loss': 1.2049, 'learning_rate': 0.00017700312174817896, 'epoch': 0.17}

  6%|▌         | 112/2022 [14:42<4:15:24,  8.02s/it]
  6%|▌         | 113/2022 [14:50<4:12:23,  7.93s/it]
                                                    
{'loss': 1.2621, 'learning_rate': 0.0001757544224765869, 'epoch': 0.17}

  6%|▌         | 113/2022 [14:50<4:12:23,  7.93s/it]
  6%|▌         | 114/2022 [14:57<4:09:57,  7.86s/it]
                                                    
{'loss': 1.1098, 'learning_rate': 0.00017450572320499476, 'epoch': 0.17}

  6%|▌         | 114/2022 [14:57<4:09:57,  7.86s/it]
  6%|▌         | 115/2022 [15:05<4:10:44,  7.89s/it]
                                                    
{'loss': 1.2416, 'learning_rate': 0.0001732570239334027, 'epoch': 0.17}

  6%|▌         | 115/2022 [15:05<4:10:44,  7.89s/it]
  6%|▌         | 116/2022 [15:13<4:09:26,  7.85s/it]
                                                    
{'loss': 1.2561, 'learning_rate': 0.00017200832466181062, 'epoch': 0.17}

  6%|▌         | 116/2022 [15:13<4:09:26,  7.85s/it]
  6%|▌         | 117/2022 [15:21<4:08:55,  7.84s/it]
                                                    
{'loss': 1.2926, 'learning_rate': 0.0001707596253902185, 'epoch': 0.17}

  6%|▌         | 117/2022 [15:21<4:08:55,  7.84s/it]
  6%|▌         | 118/2022 [15:29<4:06:15,  7.76s/it]
                                                    
{'loss': 1.042, 'learning_rate': 0.00016951092611862642, 'epoch': 0.17}

  6%|▌         | 118/2022 [15:29<4:06:15,  7.76s/it]
  6%|▌         | 119/2022 [15:37<4:10:58,  7.91s/it]
                                                    
{'loss': 1.1691, 'learning_rate': 0.00016826222684703432, 'epoch': 0.18}

  6%|▌         | 119/2022 [15:37<4:10:58,  7.91s/it]
  6%|▌         | 120/2022 [15:45<4:11:41,  7.94s/it]
                                                    
{'loss': 1.2659, 'learning_rate': 0.00016701352757544222, 'epoch': 0.18}

  6%|▌         | 120/2022 [15:45<4:11:41,  7.94s/it]
  6%|▌         | 121/2022 [15:53<4:11:02,  7.92s/it]
                                                    
{'loss': 1.2825, 'learning_rate': 0.00016576482830385014, 'epoch': 0.18}

  6%|▌         | 121/2022 [15:53<4:11:02,  7.92s/it]
  6%|▌         | 122/2022 [16:00<4:06:51,  7.80s/it]
                                                    
{'loss': 1.1293, 'learning_rate': 0.00016451612903225804, 'epoch': 0.18}

  6%|▌         | 122/2022 [16:00<4:06:51,  7.80s/it]
  6%|▌         | 123/2022 [16:08<4:08:23,  7.85s/it]
                                                    
{'loss': 1.2542, 'learning_rate': 0.00016326742976066595, 'epoch': 0.18}

  6%|▌         | 123/2022 [16:08<4:08:23,  7.85s/it]
  6%|▌         | 124/2022 [16:16<4:08:37,  7.86s/it]
                                                    
{'loss': 1.2081, 'learning_rate': 0.00016201873048907387, 'epoch': 0.18}

  6%|▌         | 124/2022 [16:16<4:08:37,  7.86s/it]
  6%|▌         | 125/2022 [16:24<4:07:22,  7.82s/it]
                                                    
{'loss': 1.2652, 'learning_rate': 0.00016077003121748177, 'epoch': 0.19}

  6%|▌         | 125/2022 [16:24<4:07:22,  7.82s/it]
  6%|▌         | 126/2022 [16:32<4:06:07,  7.79s/it]
                                                    
{'loss': 1.3019, 'learning_rate': 0.00015952133194588967, 'epoch': 0.19}

  6%|▌         | 126/2022 [16:32<4:06:07,  7.79s/it]
  6%|▋         | 127/2022 [16:39<4:07:04,  7.82s/it]
                                                    
{'loss': 1.3047, 'learning_rate': 0.0001582726326742976, 'epoch': 0.19}

  6%|▋         | 127/2022 [16:39<4:07:04,  7.82s/it]
  6%|▋         | 128/2022 [16:47<4:08:19,  7.87s/it]
                                                    
{'loss': 1.1237, 'learning_rate': 0.0001570239334027055, 'epoch': 0.19}

  6%|▋         | 128/2022 [16:47<4:08:19,  7.87s/it]
  6%|▋         | 129/2022 [16:55<4:06:52,  7.83s/it]
                                                    
{'loss': 1.1581, 'learning_rate': 0.0001557752341311134, 'epoch': 0.19}

  6%|▋         | 129/2022 [16:55<4:06:52,  7.83s/it]
  6%|▋         | 130/2022 [17:03<4:05:27,  7.78s/it]
                                                    
{'loss': 1.3299, 'learning_rate': 0.00015452653485952133, 'epoch': 0.19}

  6%|▋         | 130/2022 [17:03<4:05:27,  7.78s/it]
  6%|▋         | 131/2022 [17:10<4:02:01,  7.68s/it]
                                                    
{'loss': 1.191, 'learning_rate': 0.00015327783558792923, 'epoch': 0.19}

  6%|▋         | 131/2022 [17:10<4:02:01,  7.68s/it]
  7%|▋         | 132/2022 [17:18<3:58:37,  7.58s/it]
                                                    
{'loss': 1.1506, 'learning_rate': 0.00015202913631633713, 'epoch': 0.2}

  7%|▋         | 132/2022 [17:18<3:58:37,  7.58s/it]
  7%|▋         | 133/2022 [17:25<4:00:41,  7.65s/it]
                                                    
{'loss': 1.3452, 'learning_rate': 0.00015078043704474503, 'epoch': 0.2}

  7%|▋         | 133/2022 [17:25<4:00:41,  7.65s/it]
  7%|▋         | 134/2022 [17:33<4:03:14,  7.73s/it]
                                                    
{'loss': 1.1319, 'learning_rate': 0.00014953173777315296, 'epoch': 0.2}

  7%|▋         | 134/2022 [17:33<4:03:14,  7.73s/it]
  7%|▋         | 135/2022 [17:41<3:59:12,  7.61s/it]
                                                    
{'loss': 1.3945, 'learning_rate': 0.00014828303850156086, 'epoch': 0.2}

  7%|▋         | 135/2022 [17:41<3:59:12,  7.61s/it]
  7%|▋         | 136/2022 [17:48<3:57:09,  7.54s/it]
                                                    
{'loss': 1.2335, 'learning_rate': 0.00014703433922996878, 'epoch': 0.2}

  7%|▋         | 136/2022 [17:48<3:57:09,  7.54s/it]
  7%|▋         | 137/2022 [17:56<3:57:04,  7.55s/it]
                                                    
{'loss': 1.1988, 'learning_rate': 0.00014578563995837668, 'epoch': 0.2}

  7%|▋         | 137/2022 [17:56<3:57:04,  7.55s/it]
  7%|▋         | 138/2022 [18:04<4:02:29,  7.72s/it]
                                                    
{'loss': 1.228, 'learning_rate': 0.00014453694068678458, 'epoch': 0.2}

  7%|▋         | 138/2022 [18:04<4:02:29,  7.72s/it]
  7%|▋         | 139/2022 [18:11<4:01:29,  7.70s/it]
                                                    
{'loss': 1.1, 'learning_rate': 0.0001432882414151925, 'epoch': 0.21}

  7%|▋         | 139/2022 [18:11<4:01:29,  7.70s/it]
  7%|▋         | 140/2022 [18:20<4:05:58,  7.84s/it]
                                                    
{'loss': 1.1687, 'learning_rate': 0.0001420395421436004, 'epoch': 0.21}

  7%|▋         | 140/2022 [18:20<4:05:58,  7.84s/it]
  7%|▋         | 141/2022 [18:27<4:04:39,  7.80s/it]
                                                    
{'loss': 1.248, 'learning_rate': 0.0001407908428720083, 'epoch': 0.21}

  7%|▋         | 141/2022 [18:27<4:04:39,  7.80s/it]
  7%|▋         | 142/2022 [18:35<4:04:48,  7.81s/it]
                                                    
{'loss': 1.0921, 'learning_rate': 0.00013954214360041624, 'epoch': 0.21}

  7%|▋         | 142/2022 [18:35<4:04:48,  7.81s/it]
  7%|▋         | 143/2022 [18:42<4:00:47,  7.69s/it]
                                                    
{'loss': 1.0645, 'learning_rate': 0.00013829344432882414, 'epoch': 0.21}

  7%|▋         | 143/2022 [18:43<4:00:47,  7.69s/it]
  7%|▋         | 144/2022 [18:50<4:01:21,  7.71s/it]
                                                    
{'loss': 1.2415, 'learning_rate': 0.00013704474505723204, 'epoch': 0.21}

  7%|▋         | 144/2022 [18:50<4:01:21,  7.71s/it]
  7%|▋         | 145/2022 [18:59<4:07:34,  7.91s/it]
                                                    
{'loss': 1.1341, 'learning_rate': 0.00013579604578563994, 'epoch': 0.21}

  7%|▋         | 145/2022 [18:59<4:07:34,  7.91s/it]
  7%|▋         | 146/2022 [19:06<4:05:34,  7.85s/it]
                                                    
{'loss': 1.2017, 'learning_rate': 0.00013454734651404787, 'epoch': 0.22}

  7%|▋         | 146/2022 [19:06<4:05:34,  7.85s/it]
  7%|▋         | 147/2022 [19:14<4:02:06,  7.75s/it]
                                                    
{'loss': 1.2595, 'learning_rate': 0.00013329864724245577, 'epoch': 0.22}

  7%|▋         | 147/2022 [19:14<4:02:06,  7.75s/it]
  7%|▋         | 148/2022 [19:21<4:00:44,  7.71s/it]
                                                    
{'loss': 1.2795, 'learning_rate': 0.00013204994797086367, 'epoch': 0.22}

  7%|▋         | 148/2022 [19:21<4:00:44,  7.71s/it]
  7%|▋         | 149/2022 [19:29<4:00:53,  7.72s/it]
                                                    
{'loss': 1.1865, 'learning_rate': 0.0001308012486992716, 'epoch': 0.22}

  7%|▋         | 149/2022 [19:29<4:00:53,  7.72s/it]
  7%|▋         | 150/2022 [19:37<4:00:25,  7.71s/it]
                                                    
{'loss': 1.2782, 'learning_rate': 0.0001295525494276795, 'epoch': 0.22}

  7%|▋         | 150/2022 [19:37<4:00:25,  7.71s/it]
  7%|▋         | 151/2022 [19:45<4:00:30,  7.71s/it]
                                                    
{'loss': 1.1908, 'learning_rate': 0.0001283038501560874, 'epoch': 0.22}

  7%|▋         | 151/2022 [19:45<4:00:30,  7.71s/it]
  8%|▊         | 152/2022 [19:53<4:03:09,  7.80s/it]
                                                    
{'loss': 1.4632, 'learning_rate': 0.0001270551508844953, 'epoch': 0.23}

  8%|▊         | 152/2022 [19:53<4:03:09,  7.80s/it]
  8%|▊         | 153/2022 [20:01<4:04:46,  7.86s/it]
                                                    
{'loss': 1.1081, 'learning_rate': 0.00012580645161290322, 'epoch': 0.23}

  8%|▊         | 153/2022 [20:01<4:04:46,  7.86s/it]
  8%|▊         | 154/2022 [20:09<4:05:18,  7.88s/it]
                                                    
{'loss': 1.1578, 'learning_rate': 0.00012455775234131112, 'epoch': 0.23}

  8%|▊         | 154/2022 [20:09<4:05:18,  7.88s/it]
  8%|▊         | 155/2022 [20:16<4:01:32,  7.76s/it]
                                                    
{'loss': 1.2025, 'learning_rate': 0.00012330905306971902, 'epoch': 0.23}

  8%|▊         | 155/2022 [20:16<4:01:32,  7.76s/it]
  8%|▊         | 156/2022 [20:24<4:00:12,  7.72s/it]
                                                    
{'loss': 1.1722, 'learning_rate': 0.00012206035379812695, 'epoch': 0.23}

  8%|▊         | 156/2022 [20:24<4:00:12,  7.72s/it]
  8%|▊         | 157/2022 [20:32<4:06:51,  7.94s/it]
                                                    
{'loss': 1.1656, 'learning_rate': 0.00012081165452653485, 'epoch': 0.23}

  8%|▊         | 157/2022 [20:32<4:06:51,  7.94s/it]
  8%|▊         | 158/2022 [20:40<4:04:21,  7.87s/it]
                                                    
{'loss': 1.1803, 'learning_rate': 0.00011956295525494275, 'epoch': 0.23}

  8%|▊         | 158/2022 [20:40<4:04:21,  7.87s/it]
  8%|▊         | 159/2022 [20:47<3:59:23,  7.71s/it]
                                                    
{'loss': 1.338, 'learning_rate': 0.00011831425598335065, 'epoch': 0.24}

  8%|▊         | 159/2022 [20:47<3:59:23,  7.71s/it]
  8%|▊         | 160/2022 [20:55<3:59:31,  7.72s/it]
                                                    
{'loss': 1.1766, 'learning_rate': 0.00011706555671175858, 'epoch': 0.24}

  8%|▊         | 160/2022 [20:55<3:59:31,  7.72s/it]
  8%|▊         | 161/2022 [21:03<4:00:07,  7.74s/it]
                                                    
{'loss': 1.1073, 'learning_rate': 0.00011581685744016648, 'epoch': 0.24}

  8%|▊         | 161/2022 [21:03<4:00:07,  7.74s/it]
  8%|▊         | 162/2022 [21:11<4:02:24,  7.82s/it]
                                                    
{'loss': 1.2543, 'learning_rate': 0.00011456815816857438, 'epoch': 0.24}

  8%|▊         | 162/2022 [21:11<4:02:24,  7.82s/it]
  8%|▊         | 163/2022 [21:19<4:03:35,  7.86s/it]
                                                    
{'loss': 1.196, 'learning_rate': 0.00011331945889698231, 'epoch': 0.24}

  8%|▊         | 163/2022 [21:19<4:03:35,  7.86s/it]
  8%|▊         | 164/2022 [21:26<4:02:14,  7.82s/it]
                                                    
{'loss': 1.1328, 'learning_rate': 0.00011207075962539021, 'epoch': 0.24}

  8%|▊         | 164/2022 [21:26<4:02:14,  7.82s/it]
  8%|▊         | 165/2022 [21:34<4:00:23,  7.77s/it]
                                                    
{'loss': 1.1829, 'learning_rate': 0.00011082206035379811, 'epoch': 0.24}

  8%|▊         | 165/2022 [21:34<4:00:23,  7.77s/it]
  8%|▊         | 166/2022 [21:42<3:58:52,  7.72s/it]
                                                    
{'loss': 1.1805, 'learning_rate': 0.00010957336108220602, 'epoch': 0.25}

  8%|▊         | 166/2022 [21:42<3:58:52,  7.72s/it]
  8%|▊         | 167/2022 [21:49<3:58:54,  7.73s/it]
                                                    
{'loss': 1.2008, 'learning_rate': 0.00010832466181061394, 'epoch': 0.25}

  8%|▊         | 167/2022 [21:49<3:58:54,  7.73s/it]
  8%|▊         | 168/2022 [21:57<3:57:34,  7.69s/it]
                                                    
{'loss': 1.1274, 'learning_rate': 0.00010707596253902184, 'epoch': 0.25}

  8%|▊         | 168/2022 [21:57<3:57:34,  7.69s/it]
  8%|▊         | 169/2022 [22:04<3:55:54,  7.64s/it]
                                                    
{'loss': 1.1755, 'learning_rate': 0.00010582726326742975, 'epoch': 0.25}

  8%|▊         | 169/2022 [22:04<3:55:54,  7.64s/it]
  8%|▊         | 170/2022 [22:12<3:55:48,  7.64s/it]
                                                    
{'loss': 1.1228, 'learning_rate': 0.00010457856399583766, 'epoch': 0.25}

  8%|▊         | 170/2022 [22:12<3:55:48,  7.64s/it]
  8%|▊         | 171/2022 [22:20<3:54:53,  7.61s/it]
                                                    
{'loss': 1.105, 'learning_rate': 0.00010332986472424556, 'epoch': 0.25}

  8%|▊         | 171/2022 [22:20<3:54:53,  7.61s/it]
  9%|▊         | 172/2022 [22:28<3:58:13,  7.73s/it]
                                                    
{'loss': 1.3069, 'learning_rate': 0.00010208116545265348, 'epoch': 0.26}

  9%|▊         | 172/2022 [22:28<3:58:13,  7.73s/it]
  9%|▊         | 173/2022 [22:36<4:00:04,  7.79s/it]
                                                    
{'loss': 1.2266, 'learning_rate': 0.00010083246618106138, 'epoch': 0.26}

  9%|▊         | 173/2022 [22:36<4:00:04,  7.79s/it]
  9%|▊         | 174/2022 [22:43<4:00:23,  7.81s/it]
                                                    
{'loss': 1.1277, 'learning_rate': 9.958376690946929e-05, 'epoch': 0.26}

  9%|▊         | 174/2022 [22:43<4:00:23,  7.81s/it]
  9%|▊         | 175/2022 [22:52<4:03:22,  7.91s/it]
                                                    
{'loss': 1.1535, 'learning_rate': 9.83350676378772e-05, 'epoch': 0.26}

  9%|▊         | 175/2022 [22:52<4:03:22,  7.91s/it]
  9%|▊         | 176/2022 [22:59<4:00:39,  7.82s/it]
                                                    
{'loss': 1.1008, 'learning_rate': 9.70863683662851e-05, 'epoch': 0.26}

  9%|▊         | 176/2022 [22:59<4:00:39,  7.82s/it]
  9%|▉         | 177/2022 [23:07<4:00:52,  7.83s/it]
                                                    
{'loss': 1.2211, 'learning_rate': 9.583766909469302e-05, 'epoch': 0.26}

  9%|▉         | 177/2022 [23:07<4:00:52,  7.83s/it]
  9%|▉         | 178/2022 [23:15<3:59:51,  7.80s/it]
                                                    
{'loss': 1.2919, 'learning_rate': 9.458896982310093e-05, 'epoch': 0.26}

  9%|▉         | 178/2022 [23:15<3:59:51,  7.80s/it]
  9%|▉         | 179/2022 [23:23<4:00:31,  7.83s/it]
                                                    
{'loss': 1.201, 'learning_rate': 9.334027055150883e-05, 'epoch': 0.27}

  9%|▉         | 179/2022 [23:23<4:00:31,  7.83s/it]
  9%|▉         | 180/2022 [23:31<4:03:18,  7.93s/it]
                                                    
{'loss': 1.013, 'learning_rate': 9.209157127991673e-05, 'epoch': 0.27}

  9%|▉         | 180/2022 [23:31<4:03:18,  7.93s/it]
  9%|▉         | 181/2022 [23:39<4:01:24,  7.87s/it]
                                                    
{'loss': 1.236, 'learning_rate': 9.084287200832466e-05, 'epoch': 0.27}

  9%|▉         | 181/2022 [23:39<4:01:24,  7.87s/it]
  9%|▉         | 182/2022 [23:46<4:00:12,  7.83s/it]
                                                    
{'loss': 1.1252, 'learning_rate': 8.959417273673256e-05, 'epoch': 0.27}

  9%|▉         | 182/2022 [23:46<4:00:12,  7.83s/it]
  9%|▉         | 183/2022 [23:54<3:56:37,  7.72s/it]
                                                    
{'loss': 1.1485, 'learning_rate': 8.834547346514046e-05, 'epoch': 0.27}

  9%|▉         | 183/2022 [23:54<3:56:37,  7.72s/it]
  9%|▉         | 184/2022 [24:02<3:56:42,  7.73s/it]
                                                    
{'loss': 1.2576, 'learning_rate': 8.709677419354839e-05, 'epoch': 0.27}

  9%|▉         | 184/2022 [24:02<3:56:42,  7.73s/it]
  9%|▉         | 185/2022 [24:09<3:54:33,  7.66s/it]
                                                    
{'loss': 1.1772, 'learning_rate': 8.584807492195629e-05, 'epoch': 0.27}

  9%|▉         | 185/2022 [24:09<3:54:33,  7.66s/it]
  9%|▉         | 186/2022 [24:17<3:56:12,  7.72s/it]
                                                    
{'loss': 1.1984, 'learning_rate': 8.459937565036419e-05, 'epoch': 0.28}

  9%|▉         | 186/2022 [24:17<3:56:12,  7.72s/it]
  9%|▉         | 187/2022 [24:25<3:56:14,  7.72s/it]
                                                    
{'loss': 1.1704, 'learning_rate': 8.335067637877212e-05, 'epoch': 0.28}

  9%|▉         | 187/2022 [24:25<3:56:14,  7.72s/it]
  9%|▉         | 188/2022 [24:32<3:53:21,  7.63s/it]
                                                    
{'loss': 1.4028, 'learning_rate': 8.210197710718002e-05, 'epoch': 0.28}

  9%|▉         | 188/2022 [24:32<3:53:21,  7.63s/it]
  9%|▉         | 189/2022 [24:40<3:53:11,  7.63s/it]
                                                    
{'loss': 1.0775, 'learning_rate': 8.085327783558792e-05, 'epoch': 0.28}

  9%|▉         | 190/2022 [24:47<3:54:06,  7.67s/it]
                                                    
{'loss': 1.142, 'learning_rate': 7.960457856399583e-05, 'epoch': 0.28}

  9%|▉         | 191/2022 [24:55<3:51:28,  7.59s/it]
                                                    
{'loss': 1.1912, 'learning_rate': 7.835587929240374e-05, 'epoch': 0.28}

  9%|▉         | 192/2022 [25:03<3:54:18,  7.68s/it]
                                                    
{'loss': 1.3236, 'learning_rate': 7.710718002081164e-05, 'epoch': 0.28}

 10%|▉         | 193/2022 [25:11<3:58:12,  7.81s/it]
                                                    
{'loss': 1.1313, 'learning_rate': 7.585848074921956e-05, 'epoch': 0.29}

 10%|▉         | 194/2022 [25:19<3:58:17,  7.82s/it]
                                                    
{'loss': 1.1483, 'learning_rate': 7.460978147762746e-05, 'epoch': 0.29}

 10%|▉         | 195/2022 [25:26<3:53:48,  7.68s/it]
                                                    
{'loss': 0.9911, 'learning_rate': 7.336108220603537e-05, 'epoch': 0.29}

 10%|▉         | 196/2022 [25:34<3:55:09,  7.73s/it]
                                                    
{'loss': 1.1088, 'learning_rate': 7.211238293444329e-05, 'epoch': 0.29}

 10%|▉         | 197/2022 [25:42<3:54:09,  7.70s/it]
                                                    
{'loss': 1.148, 'learning_rate': 7.086368366285119e-05, 'epoch': 0.29}

 10%|▉         | 198/2022 [25:49<3:56:16,  7.77s/it]
                                                    
{'loss': 1.0885, 'learning_rate': 6.96149843912591e-05, 'epoch': 0.29}

 10%|▉         | 199/2022 [25:57<3:57:28,  7.82s/it]
                                                    
{'loss': 1.1366, 'learning_rate': 6.836628511966701e-05, 'epoch': 0.3}

 10%|▉         | 200/2022 [26:05<3:55:45,  7.76s/it]
                                                    
{'loss': 1.2184, 'learning_rate': 6.711758584807491e-05, 'epoch': 0.3}

 10%|▉         | 201/2022 [26:13<3:57:02,  7.81s/it]
                                                    
{'loss': 1.2912, 'learning_rate': 6.586888657648283e-05, 'epoch': 0.3}

 10%|▉         | 202/2022 [26:21<3:55:08,  7.75s/it]
                                                    
{'loss': 1.1561, 'learning_rate': 6.462018730489074e-05, 'epoch': 0.3}

 10%|█         | 203/2022 [26:29<3:59:56,  7.91s/it]
                                                    
{'loss': 1.2104, 'learning_rate': 6.337148803329864e-05, 'epoch': 0.3}

 10%|█         | 204/2022 [26:36<3:57:22,  7.83s/it]
                                                    
{'loss': 1.3148, 'learning_rate': 6.212278876170656e-05, 'epoch': 0.3}

 10%|█         | 205/2022 [26:44<3:55:00,  7.76s/it]
                                                    
{'loss': 1.3829, 'learning_rate': 6.0874089490114456e-05, 'epoch': 0.3}

 10%|█         | 206/2022 [26:52<3:54:03,  7.73s/it]
                                                    
{'loss': 1.0826, 'learning_rate': 5.962539021852237e-05, 'epoch': 0.31}

 10%|█         | 207/2022 [26:59<3:53:43,  7.73s/it]
                                                    
{'loss': 1.2471, 'learning_rate': 5.837669094693028e-05, 'epoch': 0.31}

 10%|█         | 208/2022 [27:08<3:56:55,  7.84s/it]
                                                    
{'loss': 0.968, 'learning_rate': 5.7127991675338184e-05, 'epoch': 0.31}

 10%|█         | 209/2022 [27:16<3:59:17,  7.92s/it]
                                                    
{'loss': 1.2604, 'learning_rate': 5.58792924037461e-05, 'epoch': 0.31}

 10%|█         | 210/2022 [27:23<3:57:34,  7.87s/it]
                                                    
{'loss': 1.181, 'learning_rate': 5.4630593132154e-05, 'epoch': 0.31}

 10%|█         | 211/2022 [27:31<3:56:00,  7.82s/it]
                                                    
{'loss': 1.2648, 'learning_rate': 5.338189386056191e-05, 'epoch': 0.31}

 10%|█         | 212/2022 [27:39<3:54:32,  7.77s/it]
                                                    
{'loss': 1.1182, 'learning_rate': 5.213319458896981e-05, 'epoch': 0.31}

 11%|█         | 213/2022 [27:46<3:52:23,  7.71s/it]
                                                    
{'loss': 1.19, 'learning_rate': 5.0884495317377726e-05, 'epoch': 0.32}

 11%|█         | 214/2022 [27:54<3:52:49,  7.73s/it]
                                                    
{'loss': 1.1172, 'learning_rate': 4.963579604578564e-05, 'epoch': 0.32}

 11%|█         | 215/2022 [28:02<3:53:13,  7.74s/it]
                                                    
{'loss': 1.2356, 'learning_rate': 4.838709677419354e-05, 'epoch': 0.32}

 11%|█         | 216/2022 [28:09<3:51:15,  7.68s/it]
                                                    
{'loss': 1.0922, 'learning_rate': 4.7138397502601454e-05, 'epoch': 0.32}

 11%|█         | 217/2022 [28:17<3:50:04,  7.65s/it]
                                                    
{'loss': 1.0768, 'learning_rate': 4.588969823100936e-05, 'epoch': 0.32}

 11%|█         | 218/2022 [28:25<3:51:13,  7.69s/it]
                                                    
{'loss': 1.2771, 'learning_rate': 4.464099895941727e-05, 'epoch': 0.32}

 11%|█         | 219/2022 [28:32<3:48:21,  7.60s/it]
                                                    
{'loss': 1.2137, 'learning_rate': 4.3392299687825175e-05, 'epoch': 0.32}

 11%|█         | 220/2022 [28:40<3:50:31,  7.68s/it]
                                                    
{'loss': 1.1861, 'learning_rate': 4.214360041623309e-05, 'epoch': 0.33}

 11%|█         | 221/2022 [28:48<3:57:11,  7.90s/it]
                                                    
{'loss': 1.0871, 'learning_rate': 4.0894901144640996e-05, 'epoch': 0.33}

 11%|█         | 222/2022 [28:56<3:56:35,  7.89s/it]
                                                    
{'loss': 1.2751, 'learning_rate': 3.96462018730489e-05, 'epoch': 0.33}

 11%|█         | 223/2022 [29:04<3:57:41,  7.93s/it]
                                                    
{'loss': 1.2052, 'learning_rate': 3.839750260145682e-05, 'epoch': 0.33}

 11%|█         | 224/2022 [29:12<3:55:42,  7.87s/it]
                                                    
{'loss': 1.3022, 'learning_rate': 3.7148803329864724e-05, 'epoch': 0.33}

 11%|█         | 225/2022 [29:20<3:53:56,  7.81s/it]
                                                    
{'loss': 1.1488, 'learning_rate': 3.590010405827263e-05, 'epoch': 0.33}

 11%|█         | 226/2022 [29:27<3:52:13,  7.76s/it]
                                                    
{'loss': 1.2793, 'learning_rate': 3.465140478668054e-05, 'epoch': 0.34}

 11%|█         | 227/2022 [29:35<3:53:47,  7.81s/it]
                                                    
{'loss': 1.0295, 'learning_rate': 3.3402705515088445e-05, 'epoch': 0.34}

 11%|█▏        | 228/2022 [29:43<3:52:48,  7.79s/it]
                                                    
{'loss': 1.2033, 'learning_rate': 3.215400624349635e-05, 'epoch': 0.34}

 11%|█▏        | 229/2022 [29:51<3:53:25,  7.81s/it]
                                                    
{'loss': 1.1722, 'learning_rate': 3.090530697190426e-05, 'epoch': 0.34}

 11%|█▏        | 230/2022 [29:59<3:53:23,  7.81s/it]
                                                    
{'loss': 1.2709, 'learning_rate': 2.9656607700312173e-05, 'epoch': 0.34}

 11%|█▏        | 231/2022 [30:07<3:54:48,  7.87s/it]
                                                    
{'loss': 1.2885, 'learning_rate': 2.8407908428720083e-05, 'epoch': 0.34}

 11%|█▏        | 232/2022 [30:15<3:53:54,  7.84s/it]
                                                    
{'loss': 1.2003, 'learning_rate': 2.715920915712799e-05, 'epoch': 0.34}

 12%|█▏        | 233/2022 [30:22<3:54:47,  7.87s/it]
                                                    
{'loss': 1.159, 'learning_rate': 2.5910509885535897e-05, 'epoch': 0.35}

 12%|█▏        | 234/2022 [30:30<3:52:19,  7.80s/it]
                                                    
{'loss': 1.2135, 'learning_rate': 2.4661810613943804e-05, 'epoch': 0.35}

 12%|█▏        | 235/2022 [30:38<3:51:42,  7.78s/it]
                                                    
{'loss': 1.0527, 'learning_rate': 2.3413111342351715e-05, 'epoch': 0.35}

 12%|█▏        | 236/2022 [30:46<3:54:10,  7.87s/it]
                                                    
{'loss': 1.1269, 'learning_rate': 2.2164412070759625e-05, 'epoch': 0.35}

 12%|█▏        | 237/2022 [30:54<3:55:04,  7.90s/it]
                                                    
{'loss': 1.245, 'learning_rate': 2.0915712799167532e-05, 'epoch': 0.35}

 12%|█▏        | 238/2022 [31:02<3:53:54,  7.87s/it]
                                                    
{'loss': 1.1421, 'learning_rate': 1.9667013527575442e-05, 'epoch': 0.35}

 12%|█▏        | 239/2022 [31:09<3:52:39,  7.83s/it]
                                                    
{'loss': 1.1739, 'learning_rate': 1.841831425598335e-05, 'epoch': 0.35}

 12%|█▏        | 240/2022 [31:17<3:51:42,  7.80s/it]
                                                    
{'loss': 1.32, 'learning_rate': 1.7169614984391256e-05, 'epoch': 0.36}

 12%|█▏        | 241/2022 [31:25<3:52:05,  7.82s/it]
                                                    
{'loss': 1.0685, 'learning_rate': 1.5920915712799167e-05, 'epoch': 0.36}

 12%|█▏        | 242/2022 [31:33<3:52:05,  7.82s/it]
                                                    
{'loss': 1.2704, 'learning_rate': 1.4672216441207076e-05, 'epoch': 0.36}

 12%|█▏        | 243/2022 [31:40<3:49:55,  7.75s/it]
                                                    
{'loss': 1.2717, 'learning_rate': 1.3423517169614983e-05, 'epoch': 0.36}

 12%|█▏        | 244/2022 [31:48<3:49:57,  7.76s/it]
                                                    
{'loss': 1.125, 'learning_rate': 1.2174817898022893e-05, 'epoch': 0.36}

 12%|█▏        | 245/2022 [31:56<3:51:14,  7.81s/it]
                                                    
{'loss': 1.0787, 'learning_rate': 1.09261186264308e-05, 'epoch': 0.36}

 12%|█▏        | 246/2022 [32:04<3:51:07,  7.81s/it]
                                                    
{'loss': 1.2497, 'learning_rate': 9.677419354838709e-06, 'epoch': 0.36}

 12%|█▏        | 247/2022 [32:12<3:50:34,  7.79s/it]
                                                    
{'loss': 1.1846, 'learning_rate': 8.428720083246617e-06, 'epoch': 0.37}

 12%|█▏        | 248/2022 [32:20<3:51:11,  7.82s/it]
                                                    
{'loss': 1.2741, 'learning_rate': 7.180020811654525e-06, 'epoch': 0.37}

 12%|█▏        | 249/2022 [32:27<3:51:46,  7.84s/it]
                                                    
{'loss': 1.1659, 'learning_rate': 5.931321540062434e-06, 'epoch': 0.37}

 12%|█▏        | 250/2022 [32:36<3:53:56,  7.92s/it]
                                                    
{'loss': 1.06, 'learning_rate': 4.682622268470343e-06, 'epoch': 0.37}

 12%|█▏        | 251/2022 [32:43<3:53:11,  7.90s/it]
                                                    
{'loss': 1.1278, 'learning_rate': 3.433922996878252e-06, 'epoch': 0.37}

 12%|█▏        | 252/2022 [32:51<3:50:25,  7.81s/it]
                                                    
{'loss': 1.1003, 'learning_rate': 2.18522372528616e-06, 'epoch': 0.37}

 13%|█▎        | 253/2022 [32:59<3:50:42,  7.83s/it]
                                                    
{'loss': 1.1845, 'learning_rate': 9.365244536940686e-07, 'epoch': 0.38}

 13%|█▎        | 254/2022 [33:07<3:51:14,  7.85s/it]
                                                    
{'loss': 1.2163, 'learning_rate': 0.0, 'epoch': 0.38}

 13%|█▎        | 255/2022 [33:15<3:51:23,  7.86s/it]
                                                    
{'loss': 1.1824, 'learning_rate': 0.0, 'epoch': 0.38}

 13%|█▎        | 256/2022 [33:23<3:53:06,  7.92s/it]
                                                    
{'loss': 1.2251, 'learning_rate': 0.0, 'epoch': 0.38}

 13%|█▎        | 257/2022 [33:31<3:53:46,  7.95s/it]
                                                    
{'loss': 1.1514, 'learning_rate': 0.0, 'epoch': 0.38}

 13%|█▎        | 258/2022 [33:39<3:52:10,  7.90s/it]
                                                    
{'loss': 1.1696, 'learning_rate': 0.0, 'epoch': 0.38}

 13%|█▎        | 259/2022 [33:47<3:53:23,  7.94s/it]
                                                    
{'loss': 1.2826, 'learning_rate': 0.0, 'epoch': 0.38}

 13%|█▎        | 260/2022 [33:54<3:52:25,  7.91s/it]
                                                    
{'loss': 1.2165, 'learning_rate': 0.0, 'epoch': 0.39}

 13%|█▎        | 261/2022 [34:02<3:48:33,  7.79s/it]
                                                    
{'loss': 1.134, 'learning_rate': 0.0, 'epoch': 0.39}

 13%|█▎        | 262/2022 [34:10<3:51:53,  7.91s/it]
                                                    
{'loss': 1.0914, 'learning_rate': 0.0, 'epoch': 0.39}

 13%|█▎        | 263/2022 [34:18<3:51:33,  7.90s/it]
                                                    
{'loss': 1.1596, 'learning_rate': 0.0, 'epoch': 0.39}

 13%|█▎        | 264/2022 [34:26<3:48:34,  7.80s/it]
                                                    
{'loss': 1.2264, 'learning_rate': 0.0, 'epoch': 0.39}

 13%|█▎        | 265/2022 [34:33<3:46:54,  7.75s/it]
                                                    
{'loss': 1.2526, 'learning_rate': 0.0, 'epoch': 0.39}

 13%|█▎        | 266/2022 [34:41<3:46:55,  7.75s/it]
                                                    
{'loss': 1.2202, 'learning_rate': 0.0, 'epoch': 0.39}

 13%|█▎        | 267/2022 [34:49<3:49:10,  7.84s/it]
                                                    
{'loss': 1.0204, 'learning_rate': 0.0, 'epoch': 0.4}

 13%|█▎        | 268/2022 [34:57<3:53:31,  7.99s/it]
                                                    
{'loss': 1.0749, 'learning_rate': 0.0, 'epoch': 0.4}

 13%|█▎        | 269/2022 [35:05<3:52:27,  7.96s/it]
                                                    
{'loss': 1.3021, 'learning_rate': 0.0, 'epoch': 0.4}

 13%|█▎        | 270/2022 [35:13<3:49:51,  7.87s/it]
                                                    
{'loss': 1.1353, 'learning_rate': 0.0, 'epoch': 0.4}

 13%|█▎        | 271/2022 [35:21<3:52:38,  7.97s/it]
                                                    
{'loss': 1.3448, 'learning_rate': 0.0, 'epoch': 0.4}

 13%|█▎        | 272/2022 [35:29<3:48:47,  7.84s/it]
                                                    
{'loss': 1.1375, 'learning_rate': 0.0, 'epoch': 0.4}

 14%|█▎        | 273/2022 [35:36<3:48:10,  7.83s/it]
                                                    
{'loss': 1.2415, 'learning_rate': 0.0, 'epoch': 0.4}

 14%|█▎        | 274/2022 [35:44<3:48:34,  7.85s/it]
                                                    
{'loss': 1.2263, 'learning_rate': 0.0, 'epoch': 0.41}

 14%|█▎        | 275/2022 [35:52<3:47:00,  7.80s/it]
                                                    
{'loss': 1.1488, 'learning_rate': 0.0, 'epoch': 0.41}

 14%|█▎        | 276/2022 [36:00<3:46:07,  7.77s/it]
                                                    
{'loss': 1.2281, 'learning_rate': 0.0, 'epoch': 0.41}

 14%|█▎        | 277/2022 [36:08<3:49:07,  7.88s/it]
                                                    
{'loss': 1.1874, 'learning_rate': 0.0, 'epoch': 0.41}

 14%|█▎        | 278/2022 [36:15<3:45:59,  7.78s/it]
                                                    
{'loss': 1.2475, 'learning_rate': 0.0, 'epoch': 0.41}

 14%|█▍        | 279/2022 [36:23<3:42:05,  7.65s/it]
                                                    
{'loss': 1.2352, 'learning_rate': 0.0, 'epoch': 0.41}

 14%|█▍        | 280/2022 [36:31<3:44:35,  7.74s/it]
                                                    
{'loss': 1.3655, 'learning_rate': 0.0, 'epoch': 0.42}

 14%|█▍        | 281/2022 [36:39<3:46:13,  7.80s/it]
                                                    
{'loss': 1.1683, 'learning_rate': 0.0, 'epoch': 0.42}

 14%|█▍        | 282/2022 [36:46<3:45:17,  7.77s/it]
                                                    
{'loss': 1.3209, 'learning_rate': 0.0, 'epoch': 0.42}

 14%|█▍        | 283/2022 [36:54<3:43:16,  7.70s/it]
                                                    
{'loss': 1.2065, 'learning_rate': 0.0, 'epoch': 0.42}

 14%|█▍        | 284/2022 [37:02<3:47:23,  7.85s/it]
                                                    
{'loss': 1.1392, 'learning_rate': 0.0, 'epoch': 0.42}

 14%|█▍        | 285/2022 [37:10<3:49:51,  7.94s/it]
                                                    
{'loss': 1.1719, 'learning_rate': 0.0, 'epoch': 0.42}

 14%|█▍        | 286/2022 [37:18<3:47:27,  7.86s/it]
                                                    
{'loss': 1.2455, 'learning_rate': 0.0, 'epoch': 0.42}

 14%|█▍        | 287/2022 [37:25<3:42:49,  7.71s/it]
                                                    
{'loss': 1.3018, 'learning_rate': 0.0, 'epoch': 0.43}

 14%|█▍        | 288/2022 [37:33<3:43:23,  7.73s/it]
                                                    
{'loss': 1.1331, 'learning_rate': 0.0, 'epoch': 0.43}

 14%|█▍        | 289/2022 [37:40<3:41:21,  7.66s/it]
                                                    
{'loss': 1.1339, 'learning_rate': 0.0, 'epoch': 0.43}

 14%|█▍        | 290/2022 [37:48<3:43:15,  7.73s/it]
                                                    
{'loss': 1.0934, 'learning_rate': 0.0, 'epoch': 0.43}

 14%|█▍        | 291/2022 [37:56<3:44:29,  7.78s/it]
                                                    
{'loss': 1.2624, 'learning_rate': 0.0, 'epoch': 0.43}

 14%|█▍        | 292/2022 [38:04<3:41:54,  7.70s/it]
                                                    
{'loss': 1.1963, 'learning_rate': 0.0, 'epoch': 0.43}

 14%|█▍        | 293/2022 [38:12<3:42:23,  7.72s/it]
                                                    
{'loss': 1.0561, 'learning_rate': 0.0, 'epoch': 0.43}

 15%|█▍        | 294/2022 [38:20<3:45:09,  7.82s/it]
                                                    
{'loss': 1.0482, 'learning_rate': 0.0, 'epoch': 0.44}

 15%|█▍        | 295/2022 [38:27<3:44:30,  7.80s/it]
                                                    
{'loss': 1.0544, 'learning_rate': 0.0, 'epoch': 0.44}

 15%|█▍        | 296/2022 [38:35<3:43:51,  7.78s/it]
                                                    
{'loss': 1.1083, 'learning_rate': 0.0, 'epoch': 0.44}

 15%|█▍        | 297/2022 [38:43<3:45:19,  7.84s/it]
                                                    
{'loss': 1.1735, 'learning_rate': 0.0, 'epoch': 0.44}

 15%|█▍        | 298/2022 [38:51<3:44:53,  7.83s/it]
                                                    
{'loss': 1.19, 'learning_rate': 0.0, 'epoch': 0.44}

 15%|█▍        | 299/2022 [38:59<3:43:30,  7.78s/it]
                                                    
{'loss': 1.1886, 'learning_rate': 0.0, 'epoch': 0.44}

 15%|█▍        | 300/2022 [39:06<3:42:52,  7.77s/it]
                                                    
{'loss': 1.2309, 'learning_rate': 0.0, 'epoch': 0.44}

 15%|█▍        | 301/2022 [39:14<3:41:45,  7.73s/it]
                                                    
{'loss': 1.1865, 'learning_rate': 0.0, 'epoch': 0.45}

 15%|█▍        | 302/2022 [39:21<3:38:06,  7.61s/it]
                                                    
{'loss': 1.1774, 'learning_rate': 0.0, 'epoch': 0.45}

 15%|█▍        | 303/2022 [39:29<3:38:17,  7.62s/it]
                                                    
{'loss': 1.2001, 'learning_rate': 0.0, 'epoch': 0.45}

 15%|█▌        | 304/2022 [39:37<3:46:37,  7.91s/it]
                                                    
{'loss': 1.1533, 'learning_rate': 0.0, 'epoch': 0.45}

 15%|█▌        | 305/2022 [39:45<3:47:06,  7.94s/it]
                                                    
{'loss': 1.1432, 'learning_rate': 0.0, 'epoch': 0.45}

 15%|█▌        | 306/2022 [39:53<3:46:34,  7.92s/it]
                                                    
{'loss': 1.115, 'learning_rate': 0.0, 'epoch': 0.45}

 15%|█▌        | 307/2022 [40:01<3:45:43,  7.90s/it]
                                                    
{'loss': 1.114, 'learning_rate': 0.0, 'epoch': 0.46}

 15%|█▌        | 308/2022 [40:09<3:43:33,  7.83s/it]
                                                    
{'loss': 1.1641, 'learning_rate': 0.0, 'epoch': 0.46}

 15%|█▌        | 309/2022 [40:17<3:43:33,  7.83s/it]
                                                    
{'loss': 1.1801, 'learning_rate': 0.0, 'epoch': 0.46}

 15%|█▌        | 310/2022 [40:25<3:44:24,  7.86s/it]
                                                    
{'loss': 1.2503, 'learning_rate': 0.0, 'epoch': 0.46}

 15%|█▌        | 311/2022 [40:32<3:43:34,  7.84s/it]
                                                    
{'loss': 1.0487, 'learning_rate': 0.0, 'epoch': 0.46}

 15%|█▌        | 312/2022 [40:40<3:44:34,  7.88s/it]
                                                    
{'loss': 1.1735, 'learning_rate': 0.0, 'epoch': 0.46}

 15%|█▌        | 313/2022 [40:48<3:44:18,  7.87s/it]
                                                    
{'loss': 1.284, 'learning_rate': 0.0, 'epoch': 0.46}

 16%|█▌        | 314/2022 [40:56<3:45:40,  7.93s/it]
                                                    
{'loss': 1.2786, 'learning_rate': 0.0, 'epoch': 0.47}

 16%|█▌        | 315/2022 [41:04<3:42:16,  7.81s/it]
                                                    
{'loss': 1.1989, 'learning_rate': 0.0, 'epoch': 0.47}

 16%|█▌        | 316/2022 [41:12<3:40:56,  7.77s/it]
                                                    
{'loss': 1.208, 'learning_rate': 0.0, 'epoch': 0.47}

 16%|█▌        | 317/2022 [41:19<3:38:22,  7.69s/it]
                                                    
{'loss': 1.2301, 'learning_rate': 0.0, 'epoch': 0.47}

 16%|█▌        | 318/2022 [41:27<3:38:29,  7.69s/it]
                                                    
{'loss': 1.1883, 'learning_rate': 0.0, 'epoch': 0.47}

 16%|█▌        | 319/2022 [41:34<3:38:01,  7.68s/it]
                                                    
{'loss': 1.0529, 'learning_rate': 0.0, 'epoch': 0.47}

 16%|█▌        | 320/2022 [41:42<3:37:21,  7.66s/it]
                                                    
{'loss': 1.1429, 'learning_rate': 0.0, 'epoch': 0.47}

 16%|█▌        | 321/2022 [41:50<3:38:13,  7.70s/it]
                                                    
{'loss': 1.0837, 'learning_rate': 0.0, 'epoch': 0.48}

 16%|█▌        | 322/2022 [41:58<3:40:43,  7.79s/it]
                                                    
{'loss': 1.1826, 'learning_rate': 0.0, 'epoch': 0.48}

 16%|█▌        | 323/2022 [42:05<3:39:23,  7.75s/it]
                                                    
{'loss': 1.2028, 'learning_rate': 0.0, 'epoch': 0.48}

 16%|█▌        | 324/2022 [42:14<3:43:09,  7.89s/it]
                                                    
{'loss': 1.1703, 'learning_rate': 0.0, 'epoch': 0.48}

 16%|█▌        | 325/2022 [42:22<3:43:43,  7.91s/it]
                                                    
{'loss': 1.1954, 'learning_rate': 0.0, 'epoch': 0.48}

 16%|█▌        | 326/2022 [42:29<3:39:05,  7.75s/it]
                                                    
{'loss': 1.2483, 'learning_rate': 0.0, 'epoch': 0.48}

 16%|█▌        | 327/2022 [42:37<3:40:36,  7.81s/it]
                                                    
{'loss': 1.165, 'learning_rate': 0.0, 'epoch': 0.48}

 16%|█▌        | 328/2022 [42:45<3:39:37,  7.78s/it]
                                                    
{'loss': 1.1598, 'learning_rate': 0.0, 'epoch': 0.49}

 16%|█▋        | 329/2022 [42:52<3:37:39,  7.71s/it]
                                                    
{'loss': 1.0962, 'learning_rate': 0.0, 'epoch': 0.49}

 16%|█▋        | 330/2022 [43:01<3:42:34,  7.89s/it]
                                                    
{'loss': 1.2954, 'learning_rate': 0.0, 'epoch': 0.49}

 16%|█▋        | 331/2022 [43:08<3:42:15,  7.89s/it]
                                                    
{'loss': 1.2335, 'learning_rate': 0.0, 'epoch': 0.49}

 16%|█▋        | 332/2022 [43:16<3:39:57,  7.81s/it]
                                                    
{'loss': 1.1669, 'learning_rate': 0.0, 'epoch': 0.49}

 16%|█▋        | 333/2022 [43:24<3:37:33,  7.73s/it]
                                                    
{'loss': 1.0576, 'learning_rate': 0.0, 'epoch': 0.49}

 17%|█▋        | 334/2022 [43:31<3:36:39,  7.70s/it]
                                                    
{'loss': 1.1187, 'learning_rate': 0.0, 'epoch': 0.5}

 17%|█▋        | 334/2022 [43:31<3:36:39,  7.70s/it]
 17%|█▋        | 335/2022 [43:39<3:39:28,  7.81s/it]
                                                    
{'loss': 1.1974, 'learning_rate': 0.0, 'epoch': 0.5}

 17%|█▋        | 335/2022 [43:39<3:39:28,  7.81s/it]
 17%|█▋        | 336/2022 [43:47<3:40:36,  7.85s/it]
                                                    
{'loss': 1.2241, 'learning_rate': 0.0, 'epoch': 0.5}

 17%|█▋        | 336/2022 [43:47<3:40:36,  7.85s/it]
 17%|█▋        | 337/2022 [43:55<3:38:18,  7.77s/it]
                                                    
{'loss': 1.2997, 'learning_rate': 0.0, 'epoch': 0.5}

 17%|█▋        | 337/2022 [43:55<3:38:18,  7.77s/it]
 17%|█▋        | 338/2022 [44:03<3:39:21,  7.82s/it]
                                                    
{'loss': 1.1263, 'learning_rate': 0.0, 'epoch': 0.5}

 17%|█▋        | 338/2022 [44:03<3:39:21,  7.82s/it]
 17%|█▋        | 339/2022 [44:11<3:39:44,  7.83s/it]
                                                    
{'loss': 1.3192, 'learning_rate': 0.0, 'epoch': 0.5}

 17%|█▋        | 339/2022 [44:11<3:39:44,  7.83s/it]
 17%|█▋        | 340/2022 [44:18<3:37:46,  7.77s/it]
                                                    
{'loss': 1.246, 'learning_rate': 0.0, 'epoch': 0.5}

 17%|█▋        | 340/2022 [44:18<3:37:46,  7.77s/it]
 17%|█▋        | 341/2022 [44:26<3:35:13,  7.68s/it]
                                                    
{'loss': 1.0697, 'learning_rate': 0.0, 'epoch': 0.51}

 17%|█▋        | 341/2022 [44:26<3:35:13,  7.68s/it]
 17%|█▋        | 342/2022 [44:33<3:32:42,  7.60s/it]
                                                    
{'loss': 1.1497, 'learning_rate': 0.0, 'epoch': 0.51}

 17%|█▋        | 342/2022 [44:33<3:32:42,  7.60s/it]
 17%|█▋        | 343/2022 [44:41<3:37:59,  7.79s/it]
                                                    
{'loss': 1.1972, 'learning_rate': 0.0, 'epoch': 0.51}

 17%|█▋        | 343/2022 [44:41<3:37:59,  7.79s/it]
 17%|█▋        | 344/2022 [44:49<3:35:28,  7.70s/it]
                                                    
{'loss': 1.2456, 'learning_rate': 0.0, 'epoch': 0.51}

 17%|█▋        | 344/2022 [44:49<3:35:28,  7.70s/it]
 17%|█▋        | 345/2022 [44:56<3:32:05,  7.59s/it]
                                                    
{'loss': 1.153, 'learning_rate': 0.0, 'epoch': 0.51}

 17%|█▋        | 345/2022 [44:56<3:32:05,  7.59s/it]
 17%|█▋        | 346/2022 [45:04<3:34:05,  7.66s/it]
                                                    
{'loss': 1.1597, 'learning_rate': 0.0, 'epoch': 0.51}

 17%|█▋        | 346/2022 [45:04<3:34:05,  7.66s/it]
 17%|█▋        | 347/2022 [45:11<3:31:26,  7.57s/it]
                                                    
{'loss': 1.0568, 'learning_rate': 0.0, 'epoch': 0.51}

 17%|█▋        | 347/2022 [45:11<3:31:26,  7.57s/it]
 17%|█▋        | 348/2022 [45:19<3:33:27,  7.65s/it]
                                                    
{'loss': 1.2606, 'learning_rate': 0.0, 'epoch': 0.52}

 17%|█▋        | 348/2022 [45:19<3:33:27,  7.65s/it]
 17%|█▋        | 349/2022 [45:27<3:30:45,  7.56s/it]
                                                    
{'loss': 1.2179, 'learning_rate': 0.0, 'epoch': 0.52}

 17%|█▋        | 349/2022 [45:27<3:30:45,  7.56s/it]
 17%|█▋        | 350/2022 [45:34<3:31:17,  7.58s/it]
                                                    
{'loss': 1.2591, 'learning_rate': 0.0, 'epoch': 0.52}

 17%|█▋        | 350/2022 [45:34<3:31:17,  7.58s/it]
 17%|█▋        | 351/2022 [45:42<3:31:15,  7.59s/it]
                                                    
{'loss': 1.0424, 'learning_rate': 0.0, 'epoch': 0.52}

 17%|█▋        | 351/2022 [45:42<3:31:15,  7.59s/it]
 17%|█▋        | 352/2022 [45:50<3:36:18,  7.77s/it]
                                                    
{'loss': 1.2266, 'learning_rate': 0.0, 'epoch': 0.52}

 17%|█▋        | 352/2022 [45:50<3:36:18,  7.77s/it]
 17%|█▋        | 353/2022 [45:58<3:36:05,  7.77s/it]
                                                    
{'loss': 1.1343, 'learning_rate': 0.0, 'epoch': 0.52}

 17%|█▋        | 353/2022 [45:58<3:36:05,  7.77s/it]
 18%|█▊        | 354/2022 [46:05<3:35:55,  7.77s/it]
                                                    
{'loss': 1.073, 'learning_rate': 0.0, 'epoch': 0.52}

 18%|█▊        | 354/2022 [46:06<3:35:55,  7.77s/it]
 18%|█▊        | 355/2022 [46:13<3:36:27,  7.79s/it]
                                                    
{'loss': 1.1229, 'learning_rate': 0.0, 'epoch': 0.53}

 18%|█▊        | 355/2022 [46:13<3:36:27,  7.79s/it]
 18%|█▊        | 356/2022 [46:21<3:38:46,  7.88s/it]
                                                    
{'loss': 1.0915, 'learning_rate': 0.0, 'epoch': 0.53}

 18%|█▊        | 356/2022 [46:21<3:38:46,  7.88s/it]
 18%|█▊        | 357/2022 [46:29<3:39:34,  7.91s/it]
                                                    
{'loss': 1.3387, 'learning_rate': 0.0, 'epoch': 0.53}

 18%|█▊        | 357/2022 [46:29<3:39:34,  7.91s/it]
 18%|█▊        | 358/2022 [46:37<3:39:18,  7.91s/it]
                                                    
{'loss': 1.1737, 'learning_rate': 0.0, 'epoch': 0.53}

 18%|█▊        | 358/2022 [46:37<3:39:18,  7.91s/it]
 18%|█▊        | 359/2022 [46:45<3:35:22,  7.77s/it]
                                                    
{'loss': 1.158, 'learning_rate': 0.0, 'epoch': 0.53}

 18%|█▊        | 359/2022 [46:45<3:35:22,  7.77s/it]
 18%|█▊        | 360/2022 [46:53<3:37:45,  7.86s/it]
                                                    
{'loss': 1.15, 'learning_rate': 0.0, 'epoch': 0.53}

 18%|█▊        | 360/2022 [46:53<3:37:45,  7.86s/it]
 18%|█▊        | 361/2022 [47:01<3:36:51,  7.83s/it]
                                                    
{'loss': 1.2327, 'learning_rate': 0.0, 'epoch': 0.54}

 18%|█▊        | 361/2022 [47:01<3:36:51,  7.83s/it]
 18%|█▊        | 362/2022 [47:08<3:36:29,  7.82s/it]
                                                    
{'loss': 1.2177, 'learning_rate': 0.0, 'epoch': 0.54}

 18%|█▊        | 362/2022 [47:08<3:36:29,  7.82s/it]
 18%|█▊        | 363/2022 [47:16<3:36:26,  7.83s/it]
                                                    
{'loss': 1.1586, 'learning_rate': 0.0, 'epoch': 0.54}

 18%|█▊        | 363/2022 [47:16<3:36:26,  7.83s/it]
 18%|█▊        | 364/2022 [47:24<3:36:12,  7.82s/it]
                                                    
{'loss': 1.2057, 'learning_rate': 0.0, 'epoch': 0.54}

 18%|█▊        | 364/2022 [47:24<3:36:12,  7.82s/it]
 18%|█▊        | 365/2022 [47:32<3:34:09,  7.75s/it]
                                                    
{'loss': 1.2065, 'learning_rate': 0.0, 'epoch': 0.54}

 18%|█▊        | 365/2022 [47:32<3:34:09,  7.75s/it]
 18%|█▊        | 366/2022 [47:40<3:35:40,  7.81s/it]
                                                    
{'loss': 0.9791, 'learning_rate': 0.0, 'epoch': 0.54}

 18%|█▊        | 366/2022 [47:40<3:35:40,  7.81s/it]
 18%|█▊        | 367/2022 [47:47<3:34:21,  7.77s/it]
                                                    
{'loss': 1.1265, 'learning_rate': 0.0, 'epoch': 0.54}

 18%|█▊        | 367/2022 [47:47<3:34:21,  7.77s/it]
 18%|█▊        | 368/2022 [47:55<3:33:42,  7.75s/it]
                                                    
{'loss': 1.1035, 'learning_rate': 0.0, 'epoch': 0.55}

 18%|█▊        | 368/2022 [47:55<3:33:42,  7.75s/it]
 18%|█▊        | 369/2022 [48:03<3:34:50,  7.80s/it]
                                                    
{'loss': 1.0321, 'learning_rate': 0.0, 'epoch': 0.55}

 18%|█▊        | 369/2022 [48:03<3:34:50,  7.80s/it]
 18%|█▊        | 370/2022 [48:11<3:37:20,  7.89s/it]
                                                    
{'loss': 1.3708, 'learning_rate': 0.0, 'epoch': 0.55}

 18%|█▊        | 370/2022 [48:11<3:37:20,  7.89s/it]
 18%|█▊        | 371/2022 [48:19<3:35:56,  7.85s/it]
                                                    
{'loss': 1.1964, 'learning_rate': 0.0, 'epoch': 0.55}

 18%|█▊        | 371/2022 [48:19<3:35:56,  7.85s/it]
 18%|█▊        | 372/2022 [48:26<3:32:28,  7.73s/it]
                                                    
{'loss': 1.0309, 'learning_rate': 0.0, 'epoch': 0.55}

 18%|█▊        | 372/2022 [48:26<3:32:28,  7.73s/it]
 18%|█▊        | 373/2022 [48:34<3:37:02,  7.90s/it]
                                                    
{'loss': 1.0834, 'learning_rate': 0.0, 'epoch': 0.55}

 18%|█▊        | 373/2022 [48:35<3:37:02,  7.90s/it]
 18%|█▊        | 374/2022 [48:42<3:37:00,  7.90s/it]
                                                    
{'loss': 1.2421, 'learning_rate': 0.0, 'epoch': 0.55}

 18%|█▊        | 374/2022 [48:42<3:37:00,  7.90s/it]
 19%|█▊        | 375/2022 [48:50<3:36:54,  7.90s/it]
                                                    
{'loss': 1.125, 'learning_rate': 0.0, 'epoch': 0.56}

 19%|█▊        | 375/2022 [48:50<3:36:54,  7.90s/it]
 19%|█▊        | 376/2022 [48:59<3:40:00,  8.02s/it]
                                                    
{'loss': 1.2114, 'learning_rate': 0.0, 'epoch': 0.56}

 19%|█▊        | 376/2022 [48:59<3:40:00,  8.02s/it]
 19%|█▊        | 377/2022 [49:06<3:35:33,  7.86s/it]
                                                    
{'loss': 1.1526, 'learning_rate': 0.0, 'epoch': 0.56}

 19%|█▊        | 377/2022 [49:06<3:35:33,  7.86s/it]
 19%|█▊        | 378/2022 [49:14<3:37:29,  7.94s/it]
                                                    
{'loss': 1.0792, 'learning_rate': 0.0, 'epoch': 0.56}

 19%|█▊        | 378/2022 [49:14<3:37:29,  7.94s/it]
 19%|█▊        | 379/2022 [49:22<3:34:16,  7.83s/it]
                                                    
{'loss': 1.2593, 'learning_rate': 0.0, 'epoch': 0.56}

 19%|█▊        | 379/2022 [49:22<3:34:16,  7.83s/it]
 19%|█▉        | 380/2022 [49:30<3:35:20,  7.87s/it]
                                                    
{'loss': 1.237, 'learning_rate': 0.0, 'epoch': 0.56}

 19%|█▉        | 380/2022 [49:30<3:35:20,  7.87s/it]
 19%|█▉        | 381/2022 [49:38<3:38:31,  7.99s/it]
                                                    
{'loss': 1.2734, 'learning_rate': 0.0, 'epoch': 0.56}

 19%|█▉        | 381/2022 [49:38<3:38:31,  7.99s/it]
 19%|█▉        | 382/2022 [49:46<3:36:52,  7.93s/it]
                                                    
{'loss': 1.114, 'learning_rate': 0.0, 'epoch': 0.57}

 19%|█▉        | 382/2022 [49:46<3:36:52,  7.93s/it]
 19%|█▉        | 383/2022 [49:53<3:32:26,  7.78s/it]
                                                    
{'loss': 1.2489, 'learning_rate': 0.0, 'epoch': 0.57}

 19%|█▉        | 383/2022 [49:53<3:32:26,  7.78s/it]
 19%|█▉        | 384/2022 [50:01<3:35:15,  7.89s/it]
                                                    
{'loss': 1.0853, 'learning_rate': 0.0, 'epoch': 0.57}

 19%|█▉        | 384/2022 [50:01<3:35:15,  7.89s/it]
 19%|█▉        | 385/2022 [50:09<3:36:06,  7.92s/it]
                                                    
{'loss': 1.3592, 'learning_rate': 0.0, 'epoch': 0.57}

 19%|█▉        | 385/2022 [50:09<3:36:06,  7.92s/it]
 19%|█▉        | 386/2022 [50:17<3:34:09,  7.85s/it]
                                                    
{'loss': 1.1715, 'learning_rate': 0.0, 'epoch': 0.57}

 19%|█▉        | 386/2022 [50:17<3:34:09,  7.85s/it]
 19%|█▉        | 387/2022 [50:25<3:32:34,  7.80s/it]
                                                    
{'loss': 1.1626, 'learning_rate': 0.0, 'epoch': 0.57}

 19%|█▉        | 387/2022 [50:25<3:32:34,  7.80s/it]
 19%|█▉        | 388/2022 [50:33<3:34:06,  7.86s/it]
                                                    
{'loss': 1.0987, 'learning_rate': 0.0, 'epoch': 0.58}

 19%|█▉        | 388/2022 [50:33<3:34:06,  7.86s/it]
 19%|█▉        | 389/2022 [50:40<3:31:37,  7.78s/it]
                                                    
{'loss': 1.1378, 'learning_rate': 0.0, 'epoch': 0.58}

 19%|█▉        | 389/2022 [50:40<3:31:37,  7.78s/it]
 19%|█▉        | 390/2022 [50:48<3:31:58,  7.79s/it]
                                                    
{'loss': 1.3379, 'learning_rate': 0.0, 'epoch': 0.58}

 19%|█▉        | 390/2022 [50:48<3:31:58,  7.79s/it]
 19%|█▉        | 391/2022 [50:56<3:31:50,  7.79s/it]
                                                    
{'loss': 1.0052, 'learning_rate': 0.0, 'epoch': 0.58}

 19%|█▉        | 391/2022 [50:56<3:31:50,  7.79s/it]
 19%|█▉        | 392/2022 [51:04<3:32:27,  7.82s/it]
                                                    
{'loss': 1.1216, 'learning_rate': 0.0, 'epoch': 0.58}

 19%|█▉        | 392/2022 [51:04<3:32:27,  7.82s/it]
 19%|█▉        | 393/2022 [51:12<3:34:12,  7.89s/it]
                                                    
{'loss': 1.1418, 'learning_rate': 0.0, 'epoch': 0.58}

 19%|█▉        | 393/2022 [51:12<3:34:12,  7.89s/it]
 19%|█▉        | 394/2022 [51:20<3:32:03,  7.82s/it]
                                                    
{'loss': 1.1744, 'learning_rate': 0.0, 'epoch': 0.58}

 19%|█▉        | 394/2022 [51:20<3:32:03,  7.82s/it]
 20%|█▉        | 395/2022 [51:28<3:35:02,  7.93s/it]
                                                    
{'loss': 1.343, 'learning_rate': 0.0, 'epoch': 0.59}

 20%|█▉        | 395/2022 [51:28<3:35:02,  7.93s/it]
 20%|█▉        | 396/2022 [51:35<3:32:40,  7.85s/it]
                                                    
{'loss': 1.083, 'learning_rate': 0.0, 'epoch': 0.59}

 20%|█▉        | 396/2022 [51:35<3:32:40,  7.85s/it]
 20%|█▉        | 397/2022 [51:44<3:35:12,  7.95s/it]
                                                    
{'loss': 1.1135, 'learning_rate': 0.0, 'epoch': 0.59}

 20%|█▉        | 397/2022 [51:44<3:35:12,  7.95s/it]
 20%|█▉        | 398/2022 [51:51<3:34:12,  7.91s/it]
                                                    
{'loss': 1.1188, 'learning_rate': 0.0, 'epoch': 0.59}

 20%|█▉        | 398/2022 [51:51<3:34:12,  7.91s/it]
 20%|█▉        | 399/2022 [51:59<3:32:32,  7.86s/it]
                                                    
{'loss': 1.2325, 'learning_rate': 0.0, 'epoch': 0.59}

 20%|█▉        | 399/2022 [51:59<3:32:32,  7.86s/it]
 20%|█▉        | 400/2022 [52:07<3:30:28,  7.79s/it]
                                                    
{'loss': 1.1725, 'learning_rate': 0.0, 'epoch': 0.59}

 20%|█▉        | 400/2022 [52:07<3:30:28,  7.79s/it]
 20%|█▉        | 401/2022 [52:14<3:29:24,  7.75s/it]
                                                    
{'loss': 1.3032, 'learning_rate': 0.0, 'epoch': 0.59}

 20%|█▉        | 401/2022 [52:14<3:29:24,  7.75s/it]
 20%|█▉        | 402/2022 [52:22<3:28:41,  7.73s/it]
                                                    
{'loss': 1.2506, 'learning_rate': 0.0, 'epoch': 0.6}

 20%|█▉        | 402/2022 [52:22<3:28:41,  7.73s/it]
 20%|█▉        | 403/2022 [52:30<3:30:42,  7.81s/it]
                                                    
{'loss': 1.0552, 'learning_rate': 0.0, 'epoch': 0.6}

 20%|█▉        | 403/2022 [52:30<3:30:42,  7.81s/it]
 20%|█▉        | 404/2022 [52:38<3:31:56,  7.86s/it]
                                                    
{'loss': 1.2856, 'learning_rate': 0.0, 'epoch': 0.6}

 20%|█▉        | 404/2022 [52:38<3:31:56,  7.86s/it]
 20%|██        | 405/2022 [52:47<3:37:28,  8.07s/it]
                                                    
{'loss': 1.3149, 'learning_rate': 0.0, 'epoch': 0.6}

 20%|██        | 405/2022 [52:47<3:37:28,  8.07s/it]
 20%|██        | 406/2022 [52:54<3:32:35,  7.89s/it]
                                                    
{'loss': 1.0942, 'learning_rate': 0.0, 'epoch': 0.6}

 20%|██        | 406/2022 [52:54<3:32:35,  7.89s/it]
 20%|██        | 407/2022 [53:02<3:29:59,  7.80s/it]
                                                    
{'loss': 1.2206, 'learning_rate': 0.0, 'epoch': 0.6}

 20%|██        | 407/2022 [53:02<3:29:59,  7.80s/it]
 20%|██        | 408/2022 [53:10<3:30:27,  7.82s/it]
                                                    
{'loss': 1.1552, 'learning_rate': 0.0, 'epoch': 0.6}

 20%|██        | 408/2022 [53:10<3:30:27,  7.82s/it]
 20%|██        | 409/2022 [53:18<3:32:37,  7.91s/it]
                                                    
{'loss': 1.1518, 'learning_rate': 0.0, 'epoch': 0.61}

 20%|██        | 409/2022 [53:18<3:32:37,  7.91s/it]
 20%|██        | 410/2022 [53:25<3:30:57,  7.85s/it]
                                                    
{'loss': 1.1057, 'learning_rate': 0.0, 'epoch': 0.61}

 20%|██        | 410/2022 [53:25<3:30:57,  7.85s/it]
 20%|██        | 411/2022 [53:34<3:33:13,  7.94s/it]
                                                    
{'loss': 1.0732, 'learning_rate': 0.0, 'epoch': 0.61}

 20%|██        | 411/2022 [53:34<3:33:13,  7.94s/it]
 20%|██        | 412/2022 [53:41<3:30:10,  7.83s/it]
                                                    
{'loss': 1.1272, 'learning_rate': 0.0, 'epoch': 0.61}

 20%|██        | 412/2022 [53:41<3:30:10,  7.83s/it]
 20%|██        | 413/2022 [53:49<3:31:08,  7.87s/it]
                                                    
{'loss': 1.1894, 'learning_rate': 0.0, 'epoch': 0.61}

 20%|██        | 413/2022 [53:49<3:31:08,  7.87s/it]
 20%|██        | 414/2022 [53:57<3:28:50,  7.79s/it]
                                                    
{'loss': 1.1768, 'learning_rate': 0.0, 'epoch': 0.61}

 20%|██        | 414/2022 [53:57<3:28:50,  7.79s/it]
 21%|██        | 415/2022 [54:04<3:27:04,  7.73s/it]
                                                    
{'loss': 1.1445, 'learning_rate': 0.0, 'epoch': 0.62}

 21%|██        | 415/2022 [54:04<3:27:04,  7.73s/it]
 21%|██        | 416/2022 [54:12<3:27:19,  7.75s/it]
                                                    
{'loss': 1.1525, 'learning_rate': 0.0, 'epoch': 0.62}

 21%|██        | 416/2022 [54:12<3:27:19,  7.75s/it]
 21%|██        | 417/2022 [54:20<3:27:03,  7.74s/it]
                                                    
{'loss': 1.2836, 'learning_rate': 0.0, 'epoch': 0.62}

 21%|██        | 417/2022 [54:20<3:27:03,  7.74s/it]
 21%|██        | 418/2022 [54:28<3:30:05,  7.86s/it]
                                                    
{'loss': 1.1884, 'learning_rate': 0.0, 'epoch': 0.62}

 21%|██        | 418/2022 [54:28<3:30:05,  7.86s/it]
 21%|██        | 419/2022 [54:35<3:27:42,  7.77s/it]
                                                    
{'loss': 1.0984, 'learning_rate': 0.0, 'epoch': 0.62}

 21%|██        | 419/2022 [54:36<3:27:42,  7.77s/it]
 21%|██        | 420/2022 [54:43<3:28:34,  7.81s/it]
                                                    
{'loss': 1.1192, 'learning_rate': 0.0, 'epoch': 0.62}

 21%|██        | 420/2022 [54:43<3:28:34,  7.81s/it]
 21%|██        | 421/2022 [54:51<3:26:49,  7.75s/it]
                                                    
{'loss': 1.0524, 'learning_rate': 0.0, 'epoch': 0.62}

 21%|██        | 421/2022 [54:51<3:26:49,  7.75s/it]
 21%|██        | 422/2022 [54:59<3:29:35,  7.86s/it]
                                                    
{'loss': 1.1544, 'learning_rate': 0.0, 'epoch': 0.63}

 21%|██        | 422/2022 [54:59<3:29:35,  7.86s/it]
 21%|██        | 423/2022 [55:07<3:30:18,  7.89s/it]
                                                    
{'loss': 1.3337, 'learning_rate': 0.0, 'epoch': 0.63}

 21%|██        | 423/2022 [55:07<3:30:18,  7.89s/it]
 21%|██        | 424/2022 [55:15<3:27:46,  7.80s/it]
                                                    
{'loss': 1.2294, 'learning_rate': 0.0, 'epoch': 0.63}

 21%|██        | 424/2022 [55:15<3:27:46,  7.80s/it]
 21%|██        | 425/2022 [55:22<3:27:29,  7.80s/it]
                                                    
{'loss': 1.1984, 'learning_rate': 0.0, 'epoch': 0.63}

 21%|██        | 425/2022 [55:22<3:27:29,  7.80s/it]
 21%|██        | 426/2022 [55:30<3:27:27,  7.80s/it]
                                                    
{'loss': 1.3662, 'learning_rate': 0.0, 'epoch': 0.63}

 21%|██        | 426/2022 [55:30<3:27:27,  7.80s/it]
 21%|██        | 427/2022 [55:38<3:27:02,  7.79s/it]
                                                    
{'loss': 1.1421, 'learning_rate': 0.0, 'epoch': 0.63}

 21%|██        | 427/2022 [55:38<3:27:02,  7.79s/it]
 21%|██        | 428/2022 [55:46<3:28:04,  7.83s/it]
                                                    
{'loss': 1.2245, 'learning_rate': 0.0, 'epoch': 0.63}

 21%|██        | 428/2022 [55:46<3:28:04,  7.83s/it]
 21%|██        | 429/2022 [55:54<3:27:42,  7.82s/it]
                                                    
{'loss': 1.2369, 'learning_rate': 0.0, 'epoch': 0.64}

 21%|██        | 429/2022 [55:54<3:27:42,  7.82s/it]
 21%|██▏       | 430/2022 [56:01<3:25:28,  7.74s/it]
                                                    
{'loss': 1.0808, 'learning_rate': 0.0, 'epoch': 0.64}

 21%|██▏       | 430/2022 [56:01<3:25:28,  7.74s/it]
 21%|██▏       | 431/2022 [56:09<3:24:50,  7.72s/it]
                                                    
{'loss': 1.2277, 'learning_rate': 0.0, 'epoch': 0.64}

 21%|██▏       | 431/2022 [56:09<3:24:50,  7.72s/it]
 21%|██▏       | 432/2022 [56:17<3:30:48,  7.96s/it]
                                                    
{'loss': 1.1075, 'learning_rate': 0.0, 'epoch': 0.64}

 21%|██▏       | 432/2022 [56:18<3:30:48,  7.96s/it]
 21%|██▏       | 433/2022 [56:25<3:30:21,  7.94s/it]
                                                    
{'loss': 1.0711, 'learning_rate': 0.0, 'epoch': 0.64}

 21%|██▏       | 433/2022 [56:25<3:30:21,  7.94s/it]
 21%|██▏       | 434/2022 [56:33<3:27:47,  7.85s/it]
                                                    
{'loss': 1.2561, 'learning_rate': 0.0, 'epoch': 0.64}

 21%|██▏       | 434/2022 [56:33<3:27:47,  7.85s/it]
 22%|██▏       | 435/2022 [56:41<3:26:46,  7.82s/it]
                                                    
{'loss': 1.1881, 'learning_rate': 0.0, 'epoch': 0.64}

 22%|██▏       | 435/2022 [56:41<3:26:46,  7.82s/it]
 22%|██▏       | 436/2022 [56:48<3:25:53,  7.79s/it]
                                                    
{'loss': 1.1047, 'learning_rate': 0.0, 'epoch': 0.65}

 22%|██▏       | 436/2022 [56:49<3:25:53,  7.79s/it]
 22%|██▏       | 437/2022 [56:57<3:28:01,  7.87s/it]
                                                    
{'loss': 1.1582, 'learning_rate': 0.0, 'epoch': 0.65}

 22%|██▏       | 437/2022 [56:57<3:28:01,  7.87s/it]
 22%|██▏       | 438/2022 [57:04<3:26:00,  7.80s/it]
                                                    
{'loss': 1.1845, 'learning_rate': 0.0, 'epoch': 0.65}

 22%|██▏       | 438/2022 [57:04<3:26:00,  7.80s/it]
 22%|██▏       | 439/2022 [57:12<3:26:38,  7.83s/it]
                                                    
{'loss': 1.2475, 'learning_rate': 0.0, 'epoch': 0.65}

 22%|██▏       | 439/2022 [57:12<3:26:38,  7.83s/it]
 22%|██▏       | 440/2022 [57:20<3:26:39,  7.84s/it]
                                                    
{'loss': 1.1517, 'learning_rate': 0.0, 'epoch': 0.65}

 22%|██▏       | 440/2022 [57:20<3:26:39,  7.84s/it]
 22%|██▏       | 441/2022 [57:28<3:27:52,  7.89s/it]
                                                    
{'loss': 1.1901, 'learning_rate': 0.0, 'epoch': 0.65}

 22%|██▏       | 441/2022 [57:28<3:27:52,  7.89s/it]
 22%|██▏       | 442/2022 [57:36<3:25:56,  7.82s/it]
                                                    
{'loss': 1.1744, 'learning_rate': 0.0, 'epoch': 0.66}

 22%|██▏       | 442/2022 [57:36<3:25:56,  7.82s/it]
 22%|██▏       | 443/2022 [57:43<3:25:27,  7.81s/it]
                                                    
{'loss': 1.0104, 'learning_rate': 0.0, 'epoch': 0.66}

 22%|██▏       | 443/2022 [57:43<3:25:27,  7.81s/it]
 22%|██▏       | 444/2022 [57:51<3:27:11,  7.88s/it]
                                                    
{'loss': 1.1206, 'learning_rate': 0.0, 'epoch': 0.66}

 22%|██▏       | 444/2022 [57:51<3:27:11,  7.88s/it]
 22%|██▏       | 445/2022 [57:59<3:27:25,  7.89s/it]
                                                    
{'loss': 1.1327, 'learning_rate': 0.0, 'epoch': 0.66}

 22%|██▏       | 445/2022 [57:59<3:27:25,  7.89s/it]
 22%|██▏       | 446/2022 [58:07<3:24:54,  7.80s/it]
                                                    
{'loss': 1.1723, 'learning_rate': 0.0, 'epoch': 0.66}

 22%|██▏       | 446/2022 [58:07<3:24:54,  7.80s/it]
 22%|██▏       | 447/2022 [58:14<3:21:44,  7.69s/it]
                                                    
{'loss': 1.1762, 'learning_rate': 0.0, 'epoch': 0.66}

 22%|██▏       | 447/2022 [58:14<3:21:44,  7.69s/it]
 22%|██▏       | 448/2022 [58:22<3:23:44,  7.77s/it]
                                                    
{'loss': 1.0968, 'learning_rate': 0.0, 'epoch': 0.66}

 22%|██▏       | 448/2022 [58:22<3:23:44,  7.77s/it]
 22%|██▏       | 449/2022 [58:30<3:23:47,  7.77s/it]
                                                    
{'loss': 1.2117, 'learning_rate': 0.0, 'epoch': 0.67}

 22%|██▏       | 449/2022 [58:30<3:23:47,  7.77s/it]
 22%|██▏       | 450/2022 [58:38<3:24:38,  7.81s/it]
                                                    
{'loss': 1.1631, 'learning_rate': 0.0, 'epoch': 0.67}

 22%|██▏       | 450/2022 [58:38<3:24:38,  7.81s/it]
 22%|██▏       | 451/2022 [58:46<3:22:43,  7.74s/it]
                                                    
{'loss': 1.236, 'learning_rate': 0.0, 'epoch': 0.67}

 22%|██▏       | 451/2022 [58:46<3:22:43,  7.74s/it]
 22%|██▏       | 452/2022 [58:53<3:23:06,  7.76s/it]
                                                    
{'loss': 1.1407, 'learning_rate': 0.0, 'epoch': 0.67}

 22%|██▏       | 452/2022 [58:53<3:23:06,  7.76s/it]
 22%|██▏       | 453/2022 [59:01<3:22:41,  7.75s/it]
                                                    
{'loss': 1.202, 'learning_rate': 0.0, 'epoch': 0.67}

 22%|██▏       | 453/2022 [59:01<3:22:41,  7.75s/it]
 22%|██▏       | 454/2022 [59:09<3:21:18,  7.70s/it]
                                                    
{'loss': 1.3043, 'learning_rate': 0.0, 'epoch': 0.67}

 22%|██▏       | 454/2022 [59:09<3:21:18,  7.70s/it]
 23%|██▎       | 455/2022 [59:17<3:22:13,  7.74s/it]
                                                    
{'loss': 1.1418, 'learning_rate': 0.0, 'epoch': 0.67}

 23%|██▎       | 455/2022 [59:17<3:22:13,  7.74s/it]
 23%|██▎       | 456/2022 [59:24<3:23:10,  7.78s/it]
                                                    
{'loss': 1.2235, 'learning_rate': 0.0, 'epoch': 0.68}

 23%|██▎       | 456/2022 [59:24<3:23:10,  7.78s/it]
 23%|██▎       | 457/2022 [59:32<3:21:45,  7.74s/it]
                                                    
{'loss': 1.2294, 'learning_rate': 0.0, 'epoch': 0.68}

 23%|██▎       | 457/2022 [59:32<3:21:45,  7.74s/it]
 23%|██▎       | 458/2022 [59:40<3:22:55,  7.78s/it]
                                                    
{'loss': 1.2313, 'learning_rate': 0.0, 'epoch': 0.68}

 23%|██▎       | 458/2022 [59:40<3:22:55,  7.78s/it]
 23%|██▎       | 459/2022 [59:48<3:24:37,  7.86s/it]
                                                    
{'loss': 1.0518, 'learning_rate': 0.0, 'epoch': 0.68}

 23%|██▎       | 459/2022 [59:48<3:24:37,  7.86s/it]
 23%|██▎       | 460/2022 [59:56<3:22:01,  7.76s/it]
                                                    
{'loss': 1.2222, 'learning_rate': 0.0, 'epoch': 0.68}

 23%|██▎       | 460/2022 [59:56<3:22:01,  7.76s/it]
 23%|██▎       | 461/2022 [1:00:03<3:20:26,  7.70s/it]
                                                      
{'loss': 1.1383, 'learning_rate': 0.0, 'epoch': 0.68}

 23%|██▎       | 461/2022 [1:00:03<3:20:26,  7.70s/it]
 23%|██▎       | 462/2022 [1:00:11<3:20:05,  7.70s/it]
                                                      
{'loss': 1.2758, 'learning_rate': 0.0, 'epoch': 0.68}

 23%|██▎       | 462/2022 [1:00:11<3:20:05,  7.70s/it]
 23%|██▎       | 463/2022 [1:00:19<3:20:41,  7.72s/it]
                                                      
{'loss': 1.0702, 'learning_rate': 0.0, 'epoch': 0.69}

 23%|██▎       | 463/2022 [1:00:19<3:20:41,  7.72s/it]
 23%|██▎       | 464/2022 [1:00:26<3:22:12,  7.79s/it]
                                                      
{'loss': 1.0083, 'learning_rate': 0.0, 'epoch': 0.69}

 23%|██▎       | 464/2022 [1:00:27<3:22:12,  7.79s/it]
 23%|██▎       | 465/2022 [1:00:34<3:21:24,  7.76s/it]
                                                      
{'loss': 1.1569, 'learning_rate': 0.0, 'epoch': 0.69}

 23%|██▎       | 465/2022 [1:00:34<3:21:24,  7.76s/it]
 23%|██▎       | 466/2022 [1:00:42<3:23:17,  7.84s/it]
                                                      
{'loss': 1.0819, 'learning_rate': 0.0, 'epoch': 0.69}

 23%|██▎       | 466/2022 [1:00:42<3:23:17,  7.84s/it]
 23%|██▎       | 467/2022 [1:00:50<3:23:46,  7.86s/it]
                                                      
{'loss': 1.0491, 'learning_rate': 0.0, 'epoch': 0.69}

 23%|██▎       | 467/2022 [1:00:50<3:23:46,  7.86s/it]
 23%|██▎       | 468/2022 [1:00:58<3:22:49,  7.83s/it]
                                                      
{'loss': 1.1968, 'learning_rate': 0.0, 'epoch': 0.69}

 23%|██▎       | 468/2022 [1:00:58<3:22:49,  7.83s/it]
 23%|██▎       | 469/2022 [1:01:05<3:20:18,  7.74s/it]
                                                      
{'loss': 1.234, 'learning_rate': 0.0, 'epoch': 0.7}

 23%|██▎       | 469/2022 [1:01:05<3:20:18,  7.74s/it]
 23%|██▎       | 470/2022 [1:01:13<3:18:54,  7.69s/it]
                                                      
{'loss': 1.1116, 'learning_rate': 0.0, 'epoch': 0.7}

 23%|██▎       | 470/2022 [1:01:13<3:18:54,  7.69s/it]
 23%|██▎       | 471/2022 [1:01:21<3:20:02,  7.74s/it]
                                                      
{'loss': 1.0538, 'learning_rate': 0.0, 'epoch': 0.7}

 23%|██▎       | 471/2022 [1:01:21<3:20:02,  7.74s/it]
 23%|██▎       | 472/2022 [1:01:29<3:22:23,  7.83s/it]
                                                      
{'loss': 1.1142, 'learning_rate': 0.0, 'epoch': 0.7}

 23%|██▎       | 472/2022 [1:01:29<3:22:23,  7.83s/it]
 23%|██▎       | 473/2022 [1:01:37<3:23:29,  7.88s/it]
                                                      
{'loss': 1.0705, 'learning_rate': 0.0, 'epoch': 0.7}

 23%|██▎       | 473/2022 [1:01:37<3:23:29,  7.88s/it]
 23%|██▎       | 474/2022 [1:01:45<3:22:27,  7.85s/it]
                                                      
{'loss': 1.1163, 'learning_rate': 0.0, 'epoch': 0.7}

 23%|██▎       | 474/2022 [1:01:45<3:22:27,  7.85s/it]
 23%|██▎       | 475/2022 [1:01:53<3:23:31,  7.89s/it]
                                                      
{'loss': 1.2979, 'learning_rate': 0.0, 'epoch': 0.7}

 23%|██▎       | 475/2022 [1:01:53<3:23:31,  7.89s/it]
 24%|██▎       | 476/2022 [1:02:01<3:24:18,  7.93s/it]
                                                      
{'loss': 1.1137, 'learning_rate': 0.0, 'epoch': 0.71}

 24%|██▎       | 476/2022 [1:02:01<3:24:18,  7.93s/it]
 24%|██▎       | 477/2022 [1:02:09<3:23:29,  7.90s/it]
                                                      
{'loss': 1.2165, 'learning_rate': 0.0, 'epoch': 0.71}

 24%|██▎       | 477/2022 [1:02:09<3:23:29,  7.90s/it]
 24%|██▎       | 478/2022 [1:02:16<3:22:43,  7.88s/it]
                                                      
{'loss': 1.1782, 'learning_rate': 0.0, 'epoch': 0.71}

 24%|██▎       | 478/2022 [1:02:16<3:22:43,  7.88s/it]
 24%|██▎       | 479/2022 [1:02:24<3:20:27,  7.79s/it]
                                                      
{'loss': 1.1751, 'learning_rate': 0.0, 'epoch': 0.71}

 24%|██▎       | 479/2022 [1:02:24<3:20:27,  7.79s/it]
 24%|██▎       | 480/2022 [1:02:32<3:19:14,  7.75s/it]
                                                      
{'loss': 1.0663, 'learning_rate': 0.0, 'epoch': 0.71}

 24%|██▎       | 480/2022 [1:02:32<3:19:14,  7.75s/it]
 24%|██▍       | 481/2022 [1:02:40<3:20:19,  7.80s/it]
                                                      
{'loss': 1.1116, 'learning_rate': 0.0, 'epoch': 0.71}

 24%|██▍       | 481/2022 [1:02:40<3:20:19,  7.80s/it]
 24%|██▍       | 482/2022 [1:02:47<3:18:09,  7.72s/it]
                                                      
{'loss': 1.2368, 'learning_rate': 0.0, 'epoch': 0.71}

 24%|██▍       | 482/2022 [1:02:47<3:18:09,  7.72s/it]
 24%|██▍       | 483/2022 [1:02:55<3:16:29,  7.66s/it]
                                                      
{'loss': 1.1627, 'learning_rate': 0.0, 'epoch': 0.72}

 24%|██▍       | 483/2022 [1:02:55<3:16:29,  7.66s/it]
 24%|██▍       | 484/2022 [1:03:02<3:18:14,  7.73s/it]
                                                      
{'loss': 1.1374, 'learning_rate': 0.0, 'epoch': 0.72}

 24%|██▍       | 484/2022 [1:03:02<3:18:14,  7.73s/it]
 24%|██▍       | 485/2022 [1:03:10<3:19:42,  7.80s/it]
                                                      
{'loss': 1.1724, 'learning_rate': 0.0, 'epoch': 0.72}

 24%|██▍       | 485/2022 [1:03:10<3:19:42,  7.80s/it]
 24%|██▍       | 486/2022 [1:03:18<3:19:53,  7.81s/it]
                                                      
{'loss': 1.0725, 'learning_rate': 0.0, 'epoch': 0.72}

 24%|██▍       | 486/2022 [1:03:18<3:19:53,  7.81s/it]
 24%|██▍       | 487/2022 [1:03:26<3:16:47,  7.69s/it]
                                                      
{'loss': 1.1271, 'learning_rate': 0.0, 'epoch': 0.72}

 24%|██▍       | 487/2022 [1:03:26<3:16:47,  7.69s/it]
 24%|██▍       | 488/2022 [1:03:33<3:17:19,  7.72s/it]
                                                      
{'loss': 1.2094, 'learning_rate': 0.0, 'epoch': 0.72}

 24%|██▍       | 488/2022 [1:03:33<3:17:19,  7.72s/it]
 24%|██▍       | 489/2022 [1:03:41<3:18:14,  7.76s/it]
                                                      
{'loss': 1.2421, 'learning_rate': 0.0, 'epoch': 0.72}

 24%|██▍       | 489/2022 [1:03:41<3:18:14,  7.76s/it]
 24%|██▍       | 490/2022 [1:03:49<3:17:24,  7.73s/it]
                                                      
{'loss': 1.1614, 'learning_rate': 0.0, 'epoch': 0.73}

 24%|██▍       | 490/2022 [1:03:49<3:17:24,  7.73s/it]
 24%|██▍       | 491/2022 [1:03:57<3:18:00,  7.76s/it]
                                                      
{'loss': 1.1872, 'learning_rate': 0.0, 'epoch': 0.73}

 24%|██▍       | 491/2022 [1:03:57<3:18:00,  7.76s/it]
 24%|██▍       | 492/2022 [1:04:05<3:19:21,  7.82s/it]
                                                      
{'loss': 1.2446, 'learning_rate': 0.0, 'epoch': 0.73}

 24%|██▍       | 492/2022 [1:04:05<3:19:21,  7.82s/it]
 24%|██▍       | 493/2022 [1:04:12<3:18:15,  7.78s/it]
                                                      
{'loss': 1.2327, 'learning_rate': 0.0, 'epoch': 0.73}

 24%|██▍       | 493/2022 [1:04:12<3:18:15,  7.78s/it]
 24%|██▍       | 494/2022 [1:04:20<3:18:11,  7.78s/it]
                                                      
{'loss': 1.2151, 'learning_rate': 0.0, 'epoch': 0.73}

 24%|██▍       | 494/2022 [1:04:20<3:18:11,  7.78s/it]
 24%|██▍       | 495/2022 [1:04:28<3:18:03,  7.78s/it]
                                                      
{'loss': 1.1465, 'learning_rate': 0.0, 'epoch': 0.73}

 24%|██▍       | 495/2022 [1:04:28<3:18:03,  7.78s/it]
 25%|██▍       | 496/2022 [1:04:36<3:19:23,  7.84s/it]
                                                      
{'loss': 1.1763, 'learning_rate': 0.0, 'epoch': 0.74}

 25%|██▍       | 496/2022 [1:04:36<3:19:23,  7.84s/it]
 25%|██▍       | 497/2022 [1:04:44<3:19:04,  7.83s/it]
                                                      
{'loss': 1.2006, 'learning_rate': 0.0, 'epoch': 0.74}

 25%|██▍       | 497/2022 [1:04:44<3:19:04,  7.83s/it]
 25%|██▍       | 498/2022 [1:04:51<3:16:43,  7.74s/it]
                                                      
{'loss': 1.2444, 'learning_rate': 0.0, 'epoch': 0.74}

 25%|██▍       | 498/2022 [1:04:51<3:16:43,  7.74s/it]
 25%|██▍       | 499/2022 [1:04:59<3:18:32,  7.82s/it]
                                                      
{'loss': 1.1077, 'learning_rate': 0.0, 'epoch': 0.74}

 25%|██▍       | 499/2022 [1:04:59<3:18:32,  7.82s/it]
 25%|██▍       | 500/2022 [1:05:07<3:17:39,  7.79s/it]
                                                      
{'loss': 1.1226, 'learning_rate': 0.0, 'epoch': 0.74}

 25%|██▍       | 500/2022 [1:05:07<3:17:39,  7.79s/it]
 25%|██▍       | 501/2022 [1:05:15<3:16:59,  7.77s/it]
                                                      
{'loss': 1.3204, 'learning_rate': 0.0, 'epoch': 0.74}

 25%|██▍       | 501/2022 [1:05:15<3:16:59,  7.77s/it]
 25%|██▍       | 502/2022 [1:05:23<3:16:45,  7.77s/it]
                                                      
{'loss': 1.1587, 'learning_rate': 0.0, 'epoch': 0.74}

 25%|██▍       | 502/2022 [1:05:23<3:16:45,  7.77s/it]
 25%|██▍       | 503/2022 [1:05:30<3:17:12,  7.79s/it]
                                                      
{'loss': 1.2473, 'learning_rate': 0.0, 'epoch': 0.75}

 25%|██▍       | 503/2022 [1:05:30<3:17:12,  7.79s/it]
 25%|██▍       | 504/2022 [1:05:38<3:18:23,  7.84s/it]
                                                      
{'loss': 1.1534, 'learning_rate': 0.0, 'epoch': 0.75}

 25%|██▍       | 504/2022 [1:05:38<3:18:23,  7.84s/it]
 25%|██▍       | 505/2022 [1:05:46<3:18:47,  7.86s/it]
                                                      
{'loss': 1.045, 'learning_rate': 0.0, 'epoch': 0.75}

 25%|██▍       | 505/2022 [1:05:46<3:18:47,  7.86s/it]
 25%|██▌       | 506/2022 [1:05:55<3:23:34,  8.06s/it]
                                                      
{'loss': 1.1644, 'learning_rate': 0.0, 'epoch': 0.75}

 25%|██▌       | 506/2022 [1:05:55<3:23:34,  8.06s/it]
 25%|██▌       | 507/2022 [1:06:02<3:20:21,  7.93s/it]
                                                      
{'loss': 1.2782, 'learning_rate': 0.0, 'epoch': 0.75}

 25%|██▌       | 507/2022 [1:06:02<3:20:21,  7.93s/it]
 25%|██▌       | 508/2022 [1:06:10<3:19:54,  7.92s/it]
                                                      
{'loss': 1.1076, 'learning_rate': 0.0, 'epoch': 0.75}

 25%|██▌       | 508/2022 [1:06:10<3:19:54,  7.92s/it]
 25%|██▌       | 509/2022 [1:06:18<3:20:01,  7.93s/it]
                                                      
{'loss': 1.1552, 'learning_rate': 0.0, 'epoch': 0.75}

 25%|██▌       | 509/2022 [1:06:18<3:20:01,  7.93s/it]
 25%|██▌       | 510/2022 [1:06:26<3:18:53,  7.89s/it]
                                                      
{'loss': 1.2307, 'learning_rate': 0.0, 'epoch': 0.76}

 25%|██▌       | 510/2022 [1:06:26<3:18:53,  7.89s/it]
 25%|██▌       | 511/2022 [1:06:34<3:19:23,  7.92s/it]
                                                      
{'loss': 1.2366, 'learning_rate': 0.0, 'epoch': 0.76}

 25%|██▌       | 511/2022 [1:06:34<3:19:23,  7.92s/it]
 25%|██▌       | 512/2022 [1:06:42<3:21:51,  8.02s/it]
                                                      
{'loss': 1.1364, 'learning_rate': 0.0, 'epoch': 0.76}

 25%|██▌       | 512/2022 [1:06:42<3:21:51,  8.02s/it]
 25%|██▌       | 513/2022 [1:06:50<3:21:55,  8.03s/it]
                                                      
{'loss': 1.1986, 'learning_rate': 0.0, 'epoch': 0.76}

 25%|██▌       | 513/2022 [1:06:50<3:21:55,  8.03s/it]
 25%|██▌       | 514/2022 [1:06:58<3:19:42,  7.95s/it]
                                                      
{'loss': 1.1257, 'learning_rate': 0.0, 'epoch': 0.76}

 25%|██▌       | 514/2022 [1:06:58<3:19:42,  7.95s/it]
 25%|██▌       | 515/2022 [1:07:06<3:17:05,  7.85s/it]
                                                      
{'loss': 1.1425, 'learning_rate': 0.0, 'epoch': 0.76}

 25%|██▌       | 515/2022 [1:07:06<3:17:05,  7.85s/it]
 26%|██▌       | 516/2022 [1:07:13<3:13:22,  7.70s/it]
                                                      
{'loss': 1.1971, 'learning_rate': 0.0, 'epoch': 0.77}

 26%|██▌       | 516/2022 [1:07:13<3:13:22,  7.70s/it]
 26%|██▌       | 517/2022 [1:07:21<3:12:22,  7.67s/it]
                                                      
{'loss': 1.0309, 'learning_rate': 0.0, 'epoch': 0.77}

 26%|██▌       | 517/2022 [1:07:21<3:12:22,  7.67s/it]
 26%|██▌       | 518/2022 [1:07:28<3:11:51,  7.65s/it]
                                                      
{'loss': 1.2154, 'learning_rate': 0.0, 'epoch': 0.77}

 26%|██▌       | 518/2022 [1:07:28<3:11:51,  7.65s/it]
 26%|██▌       | 519/2022 [1:07:36<3:15:52,  7.82s/it]
                                                      
{'loss': 1.019, 'learning_rate': 0.0, 'epoch': 0.77}

 26%|██▌       | 519/2022 [1:07:37<3:15:52,  7.82s/it]
 26%|██▌       | 520/2022 [1:07:44<3:15:58,  7.83s/it]
                                                      
{'loss': 1.0538, 'learning_rate': 0.0, 'epoch': 0.77}

 26%|██▌       | 520/2022 [1:07:44<3:15:58,  7.83s/it]
 26%|██▌       | 521/2022 [1:07:52<3:13:30,  7.74s/it]
                                                      
{'loss': 1.2092, 'learning_rate': 0.0, 'epoch': 0.77}

 26%|██▌       | 521/2022 [1:07:52<3:13:30,  7.74s/it]
 26%|██▌       | 522/2022 [1:08:00<3:15:00,  7.80s/it]
                                                      
{'loss': 1.0487, 'learning_rate': 0.0, 'epoch': 0.77}

 26%|██▌       | 522/2022 [1:08:00<3:15:00,  7.80s/it]
 26%|██▌       | 523/2022 [1:08:08<3:16:58,  7.88s/it]
                                                      
{'loss': 1.1373, 'learning_rate': 0.0, 'epoch': 0.78}

 26%|██▌       | 523/2022 [1:08:08<3:16:58,  7.88s/it]
 26%|██▌       | 524/2022 [1:08:16<3:18:27,  7.95s/it]
                                                      
{'loss': 1.2065, 'learning_rate': 0.0, 'epoch': 0.78}

 26%|██▌       | 524/2022 [1:08:16<3:18:27,  7.95s/it]
 26%|██▌       | 525/2022 [1:08:24<3:17:42,  7.92s/it]
                                                      
{'loss': 1.0829, 'learning_rate': 0.0, 'epoch': 0.78}

 26%|██▌       | 525/2022 [1:08:24<3:17:42,  7.92s/it]
 26%|██▌       | 526/2022 [1:08:32<3:17:31,  7.92s/it]
                                                      
{'loss': 1.262, 'learning_rate': 0.0, 'epoch': 0.78}

 26%|██▌       | 526/2022 [1:08:32<3:17:31,  7.92s/it]
 26%|██▌       | 527/2022 [1:08:40<3:16:37,  7.89s/it]
                                                      
{'loss': 1.1282, 'learning_rate': 0.0, 'epoch': 0.78}

 26%|██▌       | 527/2022 [1:08:40<3:16:37,  7.89s/it]
 26%|██▌       | 528/2022 [1:08:47<3:13:52,  7.79s/it]
                                                      
{'loss': 1.2026, 'learning_rate': 0.0, 'epoch': 0.78}

 26%|██▌       | 528/2022 [1:08:47<3:13:52,  7.79s/it]
 26%|██▌       | 529/2022 [1:08:55<3:15:01,  7.84s/it]
                                                      
{'loss': 1.2319, 'learning_rate': 0.0, 'epoch': 0.78}

 26%|██▌       | 529/2022 [1:08:55<3:15:01,  7.84s/it]
 26%|██▌       | 530/2022 [1:09:03<3:15:09,  7.85s/it]
                                                      
{'loss': 1.1591, 'learning_rate': 0.0, 'epoch': 0.79}

 26%|██▌       | 530/2022 [1:09:03<3:15:09,  7.85s/it]
 26%|██▋       | 531/2022 [1:09:11<3:13:34,  7.79s/it]
                                                      
{'loss': 1.1657, 'learning_rate': 0.0, 'epoch': 0.79}

 26%|██▋       | 531/2022 [1:09:11<3:13:34,  7.79s/it]
 26%|██▋       | 532/2022 [1:09:18<3:13:04,  7.77s/it]
                                                      
{'loss': 1.2596, 'learning_rate': 0.0, 'epoch': 0.79}

 26%|██▋       | 532/2022 [1:09:18<3:13:04,  7.77s/it]
 26%|██▋       | 533/2022 [1:09:26<3:12:25,  7.75s/it]
                                                      
{'loss': 1.1433, 'learning_rate': 0.0, 'epoch': 0.79}

 26%|██▋       | 533/2022 [1:09:26<3:12:25,  7.75s/it]
 26%|██▋       | 534/2022 [1:09:34<3:11:45,  7.73s/it]
                                                      
{'loss': 1.2707, 'learning_rate': 0.0, 'epoch': 0.79}

 26%|██▋       | 534/2022 [1:09:34<3:11:45,  7.73s/it]
 26%|██▋       | 535/2022 [1:09:42<3:13:57,  7.83s/it]
                                                      
{'loss': 1.1905, 'learning_rate': 0.0, 'epoch': 0.79}

 26%|██▋       | 535/2022 [1:09:42<3:13:57,  7.83s/it]
 27%|██▋       | 536/2022 [1:09:49<3:11:38,  7.74s/it]
                                                      
{'loss': 1.2993, 'learning_rate': 0.0, 'epoch': 0.79}

 27%|██▋       | 536/2022 [1:09:49<3:11:38,  7.74s/it]
 27%|██▋       | 537/2022 [1:09:58<3:15:34,  7.90s/it]
                                                      
{'loss': 1.2439, 'learning_rate': 0.0, 'epoch': 0.8}

 27%|██▋       | 537/2022 [1:09:58<3:15:34,  7.90s/it]
 27%|██▋       | 538/2022 [1:10:05<3:14:32,  7.87s/it]
                                                      
{'loss': 1.0609, 'learning_rate': 0.0, 'epoch': 0.8}

 27%|██▋       | 538/2022 [1:10:05<3:14:32,  7.87s/it]
 27%|██▋       | 539/2022 [1:10:13<3:14:42,  7.88s/it]
                                                      
{'loss': 1.2368, 'learning_rate': 0.0, 'epoch': 0.8}

 27%|██▋       | 539/2022 [1:10:13<3:14:42,  7.88s/it]
 27%|██▋       | 540/2022 [1:10:21<3:14:59,  7.89s/it]
                                                      
{'loss': 1.0771, 'learning_rate': 0.0, 'epoch': 0.8}

 27%|██▋       | 540/2022 [1:10:21<3:14:59,  7.89s/it]
 27%|██▋       | 541/2022 [1:10:29<3:13:56,  7.86s/it]
                                                      
{'loss': 1.0854, 'learning_rate': 0.0, 'epoch': 0.8}

 27%|██▋       | 541/2022 [1:10:29<3:13:56,  7.86s/it]
 27%|██▋       | 542/2022 [1:10:37<3:16:01,  7.95s/it]
                                                      
{'loss': 1.0296, 'learning_rate': 0.0, 'epoch': 0.8}

 27%|██▋       | 542/2022 [1:10:37<3:16:01,  7.95s/it]
 27%|██▋       | 543/2022 [1:10:45<3:14:52,  7.91s/it]
                                                      
{'loss': 1.3105, 'learning_rate': 0.0, 'epoch': 0.81}

 27%|██▋       | 543/2022 [1:10:45<3:14:52,  7.91s/it]
 27%|██▋       | 544/2022 [1:10:53<3:13:18,  7.85s/it]
                                                      
{'loss': 1.1468, 'learning_rate': 0.0, 'epoch': 0.81}

 27%|██▋       | 544/2022 [1:10:53<3:13:18,  7.85s/it]
 27%|██▋       | 545/2022 [1:11:00<3:11:15,  7.77s/it]
                                                      
{'loss': 1.2278, 'learning_rate': 0.0, 'epoch': 0.81}

 27%|██▋       | 545/2022 [1:11:00<3:11:15,  7.77s/it]
 27%|██▋       | 546/2022 [1:11:08<3:11:20,  7.78s/it]
                                                      
{'loss': 1.2448, 'learning_rate': 0.0, 'epoch': 0.81}

 27%|██▋       | 546/2022 [1:11:08<3:11:20,  7.78s/it]
 27%|██▋       | 547/2022 [1:11:16<3:13:57,  7.89s/it]
                                                      
{'loss': 1.0613, 'learning_rate': 0.0, 'epoch': 0.81}

 27%|██▋       | 547/2022 [1:11:16<3:13:57,  7.89s/it]
 27%|██▋       | 548/2022 [1:11:24<3:12:55,  7.85s/it]
                                                      
{'loss': 1.2483, 'learning_rate': 0.0, 'epoch': 0.81}

 27%|██▋       | 548/2022 [1:11:24<3:12:55,  7.85s/it]
 27%|██▋       | 549/2022 [1:11:32<3:11:17,  7.79s/it]
                                                      
{'loss': 1.1629, 'learning_rate': 0.0, 'epoch': 0.81}

 27%|██▋       | 549/2022 [1:11:32<3:11:17,  7.79s/it]
 27%|██▋       | 550/2022 [1:11:39<3:11:25,  7.80s/it]
                                                      
{'loss': 1.225, 'learning_rate': 0.0, 'epoch': 0.82}

 27%|██▋       | 550/2022 [1:11:39<3:11:25,  7.80s/it]
 27%|██▋       | 551/2022 [1:11:47<3:08:05,  7.67s/it]
                                                      
{'loss': 1.2186, 'learning_rate': 0.0, 'epoch': 0.82}

 27%|██▋       | 551/2022 [1:11:47<3:08:05,  7.67s/it]
 27%|██▋       | 552/2022 [1:11:55<3:10:36,  7.78s/it]
                                                      
{'loss': 1.0735, 'learning_rate': 0.0, 'epoch': 0.82}

 27%|██▋       | 552/2022 [1:11:55<3:10:36,  7.78s/it]
 27%|██▋       | 553/2022 [1:12:03<3:09:51,  7.75s/it]
                                                      
{'loss': 1.2452, 'learning_rate': 0.0, 'epoch': 0.82}

 27%|██▋       | 553/2022 [1:12:03<3:09:51,  7.75s/it]
 27%|██▋       | 554/2022 [1:12:10<3:08:54,  7.72s/it]
                                                      
{'loss': 1.1062, 'learning_rate': 0.0, 'epoch': 0.82}

 27%|██▋       | 554/2022 [1:12:10<3:08:54,  7.72s/it]
 27%|██▋       | 555/2022 [1:12:18<3:09:30,  7.75s/it]
                                                      
{'loss': 1.2272, 'learning_rate': 0.0, 'epoch': 0.82}

 27%|██▋       | 555/2022 [1:12:18<3:09:30,  7.75s/it]
 27%|██▋       | 556/2022 [1:12:26<3:09:07,  7.74s/it]
                                                      
{'loss': 1.1905, 'learning_rate': 0.0, 'epoch': 0.82}

 27%|██▋       | 556/2022 [1:12:26<3:09:07,  7.74s/it]
 28%|██▊       | 557/2022 [1:12:34<3:09:59,  7.78s/it]
                                                      
{'loss': 1.0757, 'learning_rate': 0.0, 'epoch': 0.83}

 28%|██▊       | 557/2022 [1:12:34<3:09:59,  7.78s/it]
 28%|██▊       | 558/2022 [1:12:41<3:06:29,  7.64s/it]
                                                      
{'loss': 1.2464, 'learning_rate': 0.0, 'epoch': 0.83}

 28%|██▊       | 558/2022 [1:12:41<3:06:29,  7.64s/it]
 28%|██▊       | 559/2022 [1:12:49<3:10:54,  7.83s/it]
                                                      
{'loss': 1.127, 'learning_rate': 0.0, 'epoch': 0.83}

 28%|██▊       | 559/2022 [1:12:49<3:10:54,  7.83s/it]
 28%|██▊       | 560/2022 [1:12:58<3:15:20,  8.02s/it]
                                                      
{'loss': 1.0369, 'learning_rate': 0.0, 'epoch': 0.83}

 28%|██▊       | 560/2022 [1:12:58<3:15:20,  8.02s/it]
 28%|██▊       | 561/2022 [1:13:06<3:16:10,  8.06s/it]
                                                      
{'loss': 1.1536, 'learning_rate': 0.0, 'epoch': 0.83}

 28%|██▊       | 561/2022 [1:13:06<3:16:10,  8.06s/it]
 28%|██▊       | 562/2022 [1:13:14<3:16:06,  8.06s/it]
                                                      
{'loss': 1.2265, 'learning_rate': 0.0, 'epoch': 0.83}

 28%|██▊       | 562/2022 [1:13:14<3:16:06,  8.06s/it]
 28%|██▊       | 563/2022 [1:13:21<3:12:44,  7.93s/it]
                                                      
{'loss': 1.2863, 'learning_rate': 0.0, 'epoch': 0.83}

 28%|██▊       | 563/2022 [1:13:21<3:12:44,  7.93s/it]
 28%|██▊       | 564/2022 [1:13:29<3:08:16,  7.75s/it]
                                                      
{'loss': 1.256, 'learning_rate': 0.0, 'epoch': 0.84}

 28%|██▊       | 564/2022 [1:13:29<3:08:16,  7.75s/it]
 28%|██▊       | 565/2022 [1:13:36<3:07:09,  7.71s/it]
                                                      
{'loss': 1.2795, 'learning_rate': 0.0, 'epoch': 0.84}

 28%|██▊       | 565/2022 [1:13:36<3:07:09,  7.71s/it]
 28%|██▊       | 566/2022 [1:13:44<3:06:45,  7.70s/it]
                                                      
{'loss': 1.2506, 'learning_rate': 0.0, 'epoch': 0.84}

 28%|██▊       | 566/2022 [1:13:44<3:06:45,  7.70s/it]
 28%|██▊       | 567/2022 [1:13:52<3:07:45,  7.74s/it]
                                                      
{'loss': 1.2619, 'learning_rate': 0.0, 'epoch': 0.84}

 28%|██▊       | 567/2022 [1:13:52<3:07:45,  7.74s/it]
 28%|██▊       | 568/2022 [1:14:00<3:07:03,  7.72s/it]
                                                      
{'loss': 1.2369, 'learning_rate': 0.0, 'epoch': 0.84}

 28%|██▊       | 568/2022 [1:14:00<3:07:03,  7.72s/it]
 28%|██▊       | 569/2022 [1:14:07<3:07:22,  7.74s/it]
                                                      
{'loss': 1.1944, 'learning_rate': 0.0, 'epoch': 0.84}

 28%|██▊       | 569/2022 [1:14:07<3:07:22,  7.74s/it]
 28%|██▊       | 570/2022 [1:14:15<3:07:54,  7.76s/it]
                                                      
{'loss': 1.1883, 'learning_rate': 0.0, 'epoch': 0.85}

 28%|██▊       | 570/2022 [1:14:15<3:07:54,  7.76s/it]
 28%|██▊       | 571/2022 [1:14:23<3:07:56,  7.77s/it]
                                                      
{'loss': 1.2117, 'learning_rate': 0.0, 'epoch': 0.85}

 28%|██▊       | 571/2022 [1:14:23<3:07:56,  7.77s/it]
 28%|██▊       | 572/2022 [1:14:31<3:07:25,  7.76s/it]
                                                      
{'loss': 1.2449, 'learning_rate': 0.0, 'epoch': 0.85}

 28%|██▊       | 572/2022 [1:14:31<3:07:25,  7.76s/it]
 28%|██▊       | 573/2022 [1:14:38<3:05:54,  7.70s/it]
                                                      
{'loss': 1.2549, 'learning_rate': 0.0, 'epoch': 0.85}

 28%|██▊       | 573/2022 [1:14:38<3:05:54,  7.70s/it]
 28%|██▊       | 574/2022 [1:14:46<3:05:49,  7.70s/it]
                                                      
{'loss': 1.2034, 'learning_rate': 0.0, 'epoch': 0.85}

 28%|██▊       | 574/2022 [1:14:46<3:05:49,  7.70s/it]
 28%|██▊       | 575/2022 [1:14:54<3:06:00,  7.71s/it]
                                                      
{'loss': 1.0937, 'learning_rate': 0.0, 'epoch': 0.85}

 28%|██▊       | 575/2022 [1:14:54<3:06:00,  7.71s/it]
 28%|██▊       | 576/2022 [1:15:02<3:07:12,  7.77s/it]
                                                      
{'loss': 1.0682, 'learning_rate': 0.0, 'epoch': 0.85}

 28%|██▊       | 576/2022 [1:15:02<3:07:12,  7.77s/it]
 29%|██▊       | 577/2022 [1:15:10<3:08:02,  7.81s/it]
                                                      
{'loss': 1.234, 'learning_rate': 0.0, 'epoch': 0.86}

 29%|██▊       | 577/2022 [1:15:10<3:08:02,  7.81s/it]
 29%|██▊       | 578/2022 [1:15:18<3:09:52,  7.89s/it]
                                                      
{'loss': 1.0424, 'learning_rate': 0.0, 'epoch': 0.86}

 29%|██▊       | 578/2022 [1:15:18<3:09:52,  7.89s/it]
 29%|██▊       | 579/2022 [1:15:26<3:09:49,  7.89s/it]
                                                      
{'loss': 1.0438, 'learning_rate': 0.0, 'epoch': 0.86}

 29%|██▊       | 579/2022 [1:15:26<3:09:49,  7.89s/it]
 29%|██▊       | 580/2022 [1:15:33<3:07:55,  7.82s/it]
                                                      
{'loss': 1.1463, 'learning_rate': 0.0, 'epoch': 0.86}

 29%|██▊       | 580/2022 [1:15:33<3:07:55,  7.82s/it]
 29%|██▊       | 581/2022 [1:15:41<3:06:58,  7.79s/it]
                                                      
{'loss': 1.1164, 'learning_rate': 0.0, 'epoch': 0.86}

 29%|██▊       | 581/2022 [1:15:41<3:06:58,  7.79s/it]
 29%|██▉       | 582/2022 [1:15:48<3:05:03,  7.71s/it]
                                                      
{'loss': 1.0441, 'learning_rate': 0.0, 'epoch': 0.86}

 29%|██▉       | 582/2022 [1:15:48<3:05:03,  7.71s/it]
 29%|██▉       | 583/2022 [1:15:56<3:04:30,  7.69s/it]
                                                      
{'loss': 1.1517, 'learning_rate': 0.0, 'epoch': 0.86}

 29%|██▉       | 583/2022 [1:15:56<3:04:30,  7.69s/it]
 29%|██▉       | 584/2022 [1:16:04<3:07:29,  7.82s/it]
                                                      
{'loss': 1.1121, 'learning_rate': 0.0, 'epoch': 0.87}

 29%|██▉       | 584/2022 [1:16:04<3:07:29,  7.82s/it]
 29%|██▉       | 585/2022 [1:16:12<3:08:55,  7.89s/it]
                                                      
{'loss': 1.1468, 'learning_rate': 0.0, 'epoch': 0.87}

 29%|██▉       | 585/2022 [1:16:12<3:08:55,  7.89s/it]
 29%|██▉       | 586/2022 [1:16:20<3:07:26,  7.83s/it]
                                                      
{'loss': 1.1195, 'learning_rate': 0.0, 'epoch': 0.87}

 29%|██▉       | 586/2022 [1:16:20<3:07:26,  7.83s/it]
 29%|██▉       | 587/2022 [1:16:28<3:05:50,  7.77s/it]
                                                      
{'loss': 1.2491, 'learning_rate': 0.0, 'epoch': 0.87}

 29%|██▉       | 587/2022 [1:16:28<3:05:50,  7.77s/it]
 29%|██▉       | 588/2022 [1:16:35<3:03:43,  7.69s/it]
                                                      
{'loss': 1.2364, 'learning_rate': 0.0, 'epoch': 0.87}

 29%|██▉       | 588/2022 [1:16:35<3:03:43,  7.69s/it]
 29%|██▉       | 589/2022 [1:16:43<3:02:16,  7.63s/it]
                                                      
{'loss': 1.2355, 'learning_rate': 0.0, 'epoch': 0.87}

 29%|██▉       | 589/2022 [1:16:43<3:02:16,  7.63s/it]
 29%|██▉       | 590/2022 [1:16:50<3:04:18,  7.72s/it]
                                                      
{'loss': 1.3316, 'learning_rate': 0.0, 'epoch': 0.87}

 29%|██▉       | 590/2022 [1:16:50<3:04:18,  7.72s/it]
 29%|██▉       | 591/2022 [1:16:58<3:05:49,  7.79s/it]
                                                      
{'loss': 1.15, 'learning_rate': 0.0, 'epoch': 0.88}

 29%|██▉       | 591/2022 [1:16:58<3:05:49,  7.79s/it]
 29%|██▉       | 592/2022 [1:17:06<3:04:44,  7.75s/it]
                                                      
{'loss': 1.2147, 'learning_rate': 0.0, 'epoch': 0.88}

 29%|██▉       | 592/2022 [1:17:06<3:04:44,  7.75s/it]
 29%|██▉       | 593/2022 [1:17:14<3:05:47,  7.80s/it]
                                                      
{'loss': 1.2902, 'learning_rate': 0.0, 'epoch': 0.88}

 29%|██▉       | 593/2022 [1:17:14<3:05:47,  7.80s/it]
 29%|██▉       | 594/2022 [1:17:22<3:04:56,  7.77s/it]
                                                      
{'loss': 1.1974, 'learning_rate': 0.0, 'epoch': 0.88}

 29%|██▉       | 594/2022 [1:17:22<3:04:56,  7.77s/it]
 29%|██▉       | 595/2022 [1:17:30<3:06:39,  7.85s/it]
                                                      
{'loss': 1.165, 'learning_rate': 0.0, 'epoch': 0.88}

 29%|██▉       | 595/2022 [1:17:30<3:06:39,  7.85s/it]
 29%|██▉       | 596/2022 [1:17:37<3:05:33,  7.81s/it]
                                                      
{'loss': 1.0276, 'learning_rate': 0.0, 'epoch': 0.88}

 29%|██▉       | 596/2022 [1:17:37<3:05:33,  7.81s/it]
 30%|██▉       | 597/2022 [1:17:45<3:06:42,  7.86s/it]
                                                      
{'loss': 1.1181, 'learning_rate': 0.0, 'epoch': 0.89}

 30%|██▉       | 597/2022 [1:17:45<3:06:42,  7.86s/it]
 30%|██▉       | 598/2022 [1:17:54<3:08:08,  7.93s/it]
                                                      
{'loss': 1.2534, 'learning_rate': 0.0, 'epoch': 0.89}

 30%|██▉       | 598/2022 [1:17:54<3:08:08,  7.93s/it]
 30%|██▉       | 599/2022 [1:18:01<3:08:01,  7.93s/it]
                                                      
{'loss': 1.1327, 'learning_rate': 0.0, 'epoch': 0.89}

 30%|██▉       | 599/2022 [1:18:01<3:08:01,  7.93s/it]
 30%|██▉       | 600/2022 [1:18:09<3:06:10,  7.86s/it]
                                                      
{'loss': 1.2021, 'learning_rate': 0.0, 'epoch': 0.89}

 30%|██▉       | 600/2022 [1:18:09<3:06:10,  7.86s/it]
 30%|██▉       | 601/2022 [1:18:17<3:04:20,  7.78s/it]
                                                      
{'loss': 1.0381, 'learning_rate': 0.0, 'epoch': 0.89}

 30%|██▉       | 601/2022 [1:18:17<3:04:20,  7.78s/it]
 30%|██▉       | 602/2022 [1:18:24<3:03:38,  7.76s/it]
                                                      
{'loss': 1.0965, 'learning_rate': 0.0, 'epoch': 0.89}

 30%|██▉       | 602/2022 [1:18:24<3:03:38,  7.76s/it]
 30%|██▉       | 603/2022 [1:18:33<3:06:47,  7.90s/it]
                                                      
{'loss': 1.0944, 'learning_rate': 0.0, 'epoch': 0.89}

 30%|██▉       | 603/2022 [1:18:33<3:06:47,  7.90s/it]
 30%|██▉       | 604/2022 [1:18:40<3:03:22,  7.76s/it]
                                                      
{'loss': 1.1376, 'learning_rate': 0.0, 'epoch': 0.9}

 30%|██▉       | 604/2022 [1:18:40<3:03:22,  7.76s/it]
 30%|██▉       | 605/2022 [1:18:48<3:03:12,  7.76s/it]
                                                      
{'loss': 1.1496, 'learning_rate': 0.0, 'epoch': 0.9}

 30%|██▉       | 605/2022 [1:18:48<3:03:12,  7.76s/it]
 30%|██▉       | 606/2022 [1:18:56<3:06:07,  7.89s/it]
                                                      
{'loss': 1.0617, 'learning_rate': 0.0, 'epoch': 0.9}

 30%|██▉       | 606/2022 [1:18:56<3:06:07,  7.89s/it]
 30%|███       | 607/2022 [1:19:05<3:12:01,  8.14s/it]
                                                      
{'loss': 1.1484, 'learning_rate': 0.0, 'epoch': 0.9}

 30%|███       | 607/2022 [1:19:05<3:12:01,  8.14s/it]
 30%|███       | 608/2022 [1:19:12<3:08:22,  7.99s/it]
                                                      
{'loss': 1.0748, 'learning_rate': 0.0, 'epoch': 0.9}

 30%|███       | 608/2022 [1:19:12<3:08:22,  7.99s/it]
 30%|███       | 609/2022 [1:19:21<3:09:21,  8.04s/it]
                                                      
{'loss': 1.1481, 'learning_rate': 0.0, 'epoch': 0.9}

 30%|███       | 609/2022 [1:19:21<3:09:21,  8.04s/it]
 30%|███       | 610/2022 [1:19:28<3:07:18,  7.96s/it]
                                                      
{'loss': 1.2132, 'learning_rate': 0.0, 'epoch': 0.9}

 30%|███       | 610/2022 [1:19:28<3:07:18,  7.96s/it]
 30%|███       | 611/2022 [1:19:36<3:04:00,  7.82s/it]
                                                      
{'loss': 1.1735, 'learning_rate': 0.0, 'epoch': 0.91}

 30%|███       | 611/2022 [1:19:36<3:04:00,  7.82s/it]
 30%|███       | 612/2022 [1:19:44<3:04:39,  7.86s/it]
                                                      
{'loss': 1.1064, 'learning_rate': 0.0, 'epoch': 0.91}

 30%|███       | 612/2022 [1:19:44<3:04:39,  7.86s/it]
 30%|███       | 613/2022 [1:19:52<3:03:47,  7.83s/it]
                                                      
{'loss': 1.3188, 'learning_rate': 0.0, 'epoch': 0.91}

 30%|███       | 613/2022 [1:19:52<3:03:47,  7.83s/it]
 30%|███       | 614/2022 [1:20:00<3:08:35,  8.04s/it]
                                                      
{'loss': 1.1761, 'learning_rate': 0.0, 'epoch': 0.91}

 30%|███       | 614/2022 [1:20:00<3:08:35,  8.04s/it]
 30%|███       | 615/2022 [1:20:08<3:08:09,  8.02s/it]
                                                      
{'loss': 1.1377, 'learning_rate': 0.0, 'epoch': 0.91}

 30%|███       | 615/2022 [1:20:08<3:08:09,  8.02s/it]
 30%|███       | 616/2022 [1:20:16<3:03:59,  7.85s/it]
                                                      
{'loss': 1.1333, 'learning_rate': 0.0, 'epoch': 0.91}

 30%|███       | 616/2022 [1:20:16<3:03:59,  7.85s/it]
 31%|███       | 617/2022 [1:20:23<3:04:27,  7.88s/it]
                                                      
{'loss': 1.2052, 'learning_rate': 0.0, 'epoch': 0.91}

 31%|███       | 617/2022 [1:20:23<3:04:27,  7.88s/it]
 31%|███       | 618/2022 [1:20:31<3:03:13,  7.83s/it]
                                                      
{'loss': 1.0825, 'learning_rate': 0.0, 'epoch': 0.92}

 31%|███       | 618/2022 [1:20:31<3:03:13,  7.83s/it]
 31%|███       | 619/2022 [1:20:39<3:01:22,  7.76s/it]
                                                      
{'loss': 1.2697, 'learning_rate': 0.0, 'epoch': 0.92}

 31%|███       | 619/2022 [1:20:39<3:01:22,  7.76s/it]
 31%|███       | 620/2022 [1:20:47<3:01:49,  7.78s/it]
                                                      
{'loss': 1.2994, 'learning_rate': 0.0, 'epoch': 0.92}

 31%|███       | 620/2022 [1:20:47<3:01:49,  7.78s/it]
 31%|███       | 621/2022 [1:20:54<2:59:02,  7.67s/it]
                                                      
{'loss': 1.3712, 'learning_rate': 0.0, 'epoch': 0.92}

 31%|███       | 621/2022 [1:20:54<2:59:02,  7.67s/it]
 31%|███       | 622/2022 [1:21:02<3:00:03,  7.72s/it]
                                                      
{'loss': 1.1513, 'learning_rate': 0.0, 'epoch': 0.92}

 31%|███       | 622/2022 [1:21:02<3:00:03,  7.72s/it]
 31%|███       | 623/2022 [1:21:10<2:59:33,  7.70s/it]
                                                      
{'loss': 1.058, 'learning_rate': 0.0, 'epoch': 0.92}

 31%|███       | 623/2022 [1:21:10<2:59:33,  7.70s/it]
 31%|███       | 624/2022 [1:21:17<2:59:14,  7.69s/it]
                                                      
{'loss': 1.192, 'learning_rate': 0.0, 'epoch': 0.93}

 31%|███       | 624/2022 [1:21:17<2:59:14,  7.69s/it]
 31%|███       | 625/2022 [1:21:25<2:57:51,  7.64s/it]
                                                      
{'loss': 1.3463, 'learning_rate': 0.0, 'epoch': 0.93}

 31%|███       | 625/2022 [1:21:25<2:57:51,  7.64s/it]
 31%|███       | 626/2022 [1:21:33<2:59:32,  7.72s/it]
                                                      
{'loss': 1.1918, 'learning_rate': 0.0, 'epoch': 0.93}

 31%|███       | 626/2022 [1:21:33<2:59:32,  7.72s/it]
 31%|███       | 627/2022 [1:21:40<2:57:19,  7.63s/it]
                                                      
{'loss': 1.1496, 'learning_rate': 0.0, 'epoch': 0.93}

 31%|███       | 627/2022 [1:21:40<2:57:19,  7.63s/it]
 31%|███       | 628/2022 [1:21:48<2:57:34,  7.64s/it]
                                                      
{'loss': 1.1004, 'learning_rate': 0.0, 'epoch': 0.93}

 31%|███       | 628/2022 [1:21:48<2:57:34,  7.64s/it]
 31%|███       | 629/2022 [1:21:56<2:58:50,  7.70s/it]
                                                      
{'loss': 1.0073, 'learning_rate': 0.0, 'epoch': 0.93}

 31%|███       | 629/2022 [1:21:56<2:58:50,  7.70s/it]
 31%|███       | 630/2022 [1:22:03<2:58:46,  7.71s/it]
                                                      
{'loss': 1.1455, 'learning_rate': 0.0, 'epoch': 0.93}

 31%|███       | 630/2022 [1:22:03<2:58:46,  7.71s/it]
 31%|███       | 631/2022 [1:22:11<2:58:56,  7.72s/it]
                                                      
{'loss': 1.1106, 'learning_rate': 0.0, 'epoch': 0.94}

 31%|███       | 631/2022 [1:22:11<2:58:56,  7.72s/it]
 31%|███▏      | 632/2022 [1:22:19<2:58:05,  7.69s/it]
                                                      
{'loss': 1.1858, 'learning_rate': 0.0, 'epoch': 0.94}

 31%|███▏      | 632/2022 [1:22:19<2:58:05,  7.69s/it]
 31%|███▏      | 633/2022 [1:22:27<2:59:25,  7.75s/it]
                                                      
{'loss': 1.2445, 'learning_rate': 0.0, 'epoch': 0.94}

 31%|███▏      | 633/2022 [1:22:27<2:59:25,  7.75s/it]
 31%|███▏      | 634/2022 [1:22:34<2:58:56,  7.74s/it]
                                                      
{'loss': 1.1018, 'learning_rate': 0.0, 'epoch': 0.94}

 31%|███▏      | 634/2022 [1:22:34<2:58:56,  7.74s/it]
 31%|███▏      | 635/2022 [1:22:42<2:59:31,  7.77s/it]
                                                      
{'loss': 1.1208, 'learning_rate': 0.0, 'epoch': 0.94}

 31%|███▏      | 635/2022 [1:22:42<2:59:31,  7.77s/it]
 31%|███▏      | 636/2022 [1:22:50<2:59:02,  7.75s/it]
                                                      
{'loss': 1.1661, 'learning_rate': 0.0, 'epoch': 0.94}

 31%|███▏      | 636/2022 [1:22:50<2:59:02,  7.75s/it]
 32%|███▏      | 637/2022 [1:22:57<2:57:14,  7.68s/it]
                                                      
{'loss': 1.1812, 'learning_rate': 0.0, 'epoch': 0.94}

 32%|███▏      | 637/2022 [1:22:57<2:57:14,  7.68s/it]
 32%|███▏      | 638/2022 [1:23:05<2:57:12,  7.68s/it]
                                                      
{'loss': 1.2901, 'learning_rate': 0.0, 'epoch': 0.95}

 32%|███▏      | 638/2022 [1:23:05<2:57:12,  7.68s/it]
 32%|███▏      | 639/2022 [1:23:13<2:59:07,  7.77s/it]
                                                      
{'loss': 1.1876, 'learning_rate': 0.0, 'epoch': 0.95}

 32%|███▏      | 639/2022 [1:23:13<2:59:07,  7.77s/it]
 32%|███▏      | 640/2022 [1:23:20<2:57:19,  7.70s/it]
                                                      
{'loss': 1.2688, 'learning_rate': 0.0, 'epoch': 0.95}

 32%|███▏      | 640/2022 [1:23:20<2:57:19,  7.70s/it]
 32%|███▏      | 641/2022 [1:23:29<3:00:47,  7.86s/it]
                                                      
{'loss': 1.1331, 'learning_rate': 0.0, 'epoch': 0.95}

 32%|███▏      | 641/2022 [1:23:29<3:00:47,  7.86s/it]
 32%|███▏      | 642/2022 [1:23:37<3:00:48,  7.86s/it]
                                                      
{'loss': 1.177, 'learning_rate': 0.0, 'epoch': 0.95}

 32%|███▏      | 642/2022 [1:23:37<3:00:48,  7.86s/it]
 32%|███▏      | 643/2022 [1:23:44<2:59:02,  7.79s/it]
                                                      
{'loss': 1.3437, 'learning_rate': 0.0, 'epoch': 0.95}

 32%|███▏      | 643/2022 [1:23:44<2:59:02,  7.79s/it]
 32%|███▏      | 644/2022 [1:23:52<2:57:50,  7.74s/it]
                                                      
{'loss': 1.3464, 'learning_rate': 0.0, 'epoch': 0.95}

 32%|███▏      | 644/2022 [1:23:52<2:57:50,  7.74s/it]
 32%|███▏      | 645/2022 [1:24:00<2:57:26,  7.73s/it]
                                                      
{'loss': 1.1242, 'learning_rate': 0.0, 'epoch': 0.96}

 32%|███▏      | 645/2022 [1:24:00<2:57:26,  7.73s/it]
 32%|███▏      | 646/2022 [1:24:07<2:56:14,  7.68s/it]
                                                      
{'loss': 1.0496, 'learning_rate': 0.0, 'epoch': 0.96}

 32%|███▏      | 646/2022 [1:24:07<2:56:14,  7.68s/it]
 32%|███▏      | 647/2022 [1:24:15<2:55:17,  7.65s/it]
                                                      
{'loss': 1.2403, 'learning_rate': 0.0, 'epoch': 0.96}

 32%|███▏      | 647/2022 [1:24:15<2:55:17,  7.65s/it]
 32%|███▏      | 648/2022 [1:24:23<2:57:33,  7.75s/it]
                                                      
{'loss': 1.2157, 'learning_rate': 0.0, 'epoch': 0.96}

 32%|███▏      | 648/2022 [1:24:23<2:57:33,  7.75s/it]
 32%|███▏      | 649/2022 [1:24:30<2:57:22,  7.75s/it]
                                                      
{'loss': 1.0708, 'learning_rate': 0.0, 'epoch': 0.96}

 32%|███▏      | 649/2022 [1:24:30<2:57:22,  7.75s/it]
 32%|███▏      | 650/2022 [1:24:38<2:56:08,  7.70s/it]
                                                      
{'loss': 1.1051, 'learning_rate': 0.0, 'epoch': 0.96}

 32%|███▏      | 650/2022 [1:24:38<2:56:08,  7.70s/it]
 32%|███▏      | 651/2022 [1:24:46<2:57:00,  7.75s/it]
                                                      
{'loss': 1.1625, 'learning_rate': 0.0, 'epoch': 0.97}

 32%|███▏      | 651/2022 [1:24:46<2:57:00,  7.75s/it]
 32%|███▏      | 652/2022 [1:24:53<2:55:51,  7.70s/it]
                                                      
{'loss': 1.2208, 'learning_rate': 0.0, 'epoch': 0.97}

 32%|███▏      | 652/2022 [1:24:53<2:55:51,  7.70s/it]
 32%|███▏      | 653/2022 [1:25:01<2:56:28,  7.73s/it]
                                                      
{'loss': 1.1858, 'learning_rate': 0.0, 'epoch': 0.97}

 32%|███▏      | 653/2022 [1:25:01<2:56:28,  7.73s/it]
 32%|███▏      | 654/2022 [1:25:09<2:56:32,  7.74s/it]
                                                      
{'loss': 1.2827, 'learning_rate': 0.0, 'epoch': 0.97}

 32%|███▏      | 654/2022 [1:25:09<2:56:32,  7.74s/it]
 32%|███▏      | 655/2022 [1:25:17<2:55:02,  7.68s/it]
                                                      
{'loss': 1.1436, 'learning_rate': 0.0, 'epoch': 0.97}

 32%|███▏      | 655/2022 [1:25:17<2:55:02,  7.68s/it]
 32%|███▏      | 656/2022 [1:25:24<2:55:32,  7.71s/it]
                                                      
{'loss': 1.0126, 'learning_rate': 0.0, 'epoch': 0.97}

 32%|███▏      | 656/2022 [1:25:24<2:55:32,  7.71s/it]
 32%|███▏      | 657/2022 [1:25:32<2:57:57,  7.82s/it]
                                                      
{'loss': 1.1778, 'learning_rate': 0.0, 'epoch': 0.97}

 32%|███▏      | 657/2022 [1:25:32<2:57:57,  7.82s/it]
 33%|███▎      | 658/2022 [1:25:40<2:59:16,  7.89s/it]
                                                      
{'loss': 1.1816, 'learning_rate': 0.0, 'epoch': 0.98}

 33%|███▎      | 658/2022 [1:25:40<2:59:16,  7.89s/it]
 33%|███▎      | 659/2022 [1:25:48<2:58:35,  7.86s/it]
                                                      
{'loss': 1.2475, 'learning_rate': 0.0, 'epoch': 0.98}

 33%|███▎      | 659/2022 [1:25:48<2:58:35,  7.86s/it]
 33%|███▎      | 660/2022 [1:25:56<2:58:37,  7.87s/it]
                                                      
{'loss': 1.1541, 'learning_rate': 0.0, 'epoch': 0.98}

 33%|███▎      | 660/2022 [1:25:56<2:58:37,  7.87s/it]
 33%|███▎      | 661/2022 [1:26:04<2:58:18,  7.86s/it]
                                                      
{'loss': 1.2389, 'learning_rate': 0.0, 'epoch': 0.98}

 33%|███▎      | 661/2022 [1:26:04<2:58:18,  7.86s/it]
 33%|███▎      | 662/2022 [1:26:12<2:56:06,  7.77s/it]
                                                      
{'loss': 1.2854, 'learning_rate': 0.0, 'epoch': 0.98}

 33%|███▎      | 662/2022 [1:26:12<2:56:06,  7.77s/it]
 33%|███▎      | 663/2022 [1:26:20<2:59:39,  7.93s/it]
                                                      
{'loss': 1.3063, 'learning_rate': 0.0, 'epoch': 0.98}

 33%|███▎      | 663/2022 [1:26:20<2:59:39,  7.93s/it]
 33%|███▎      | 664/2022 [1:26:28<2:58:37,  7.89s/it]
                                                      
{'loss': 1.1033, 'learning_rate': 0.0, 'epoch': 0.98}

 33%|███▎      | 664/2022 [1:26:28<2:58:37,  7.89s/it]
 33%|███▎      | 665/2022 [1:26:36<2:58:35,  7.90s/it]
                                                      
{'loss': 1.1436, 'learning_rate': 0.0, 'epoch': 0.99}

 33%|███▎      | 665/2022 [1:26:36<2:58:35,  7.90s/it]
 33%|███▎      | 666/2022 [1:26:44<3:00:09,  7.97s/it]
                                                      
{'loss': 1.074, 'learning_rate': 0.0, 'epoch': 0.99}

 33%|███▎      | 666/2022 [1:26:44<3:00:09,  7.97s/it]
 33%|███▎      | 667/2022 [1:26:52<2:59:41,  7.96s/it]
                                                      
{'loss': 1.2103, 'learning_rate': 0.0, 'epoch': 0.99}

 33%|███▎      | 667/2022 [1:26:52<2:59:41,  7.96s/it]
 33%|███▎      | 668/2022 [1:26:59<2:57:51,  7.88s/it]
                                                      
{'loss': 1.1785, 'learning_rate': 0.0, 'epoch': 0.99}

 33%|███▎      | 668/2022 [1:26:59<2:57:51,  7.88s/it]
 33%|███▎      | 669/2022 [1:27:07<2:59:08,  7.94s/it]
                                                      
{'loss': 1.183, 'learning_rate': 0.0, 'epoch': 0.99}

 33%|███▎      | 669/2022 [1:27:07<2:59:08,  7.94s/it]
 33%|███▎      | 670/2022 [1:27:15<2:56:44,  7.84s/it]
                                                      
{'loss': 1.206, 'learning_rate': 0.0, 'epoch': 0.99}

 33%|███▎      | 670/2022 [1:27:15<2:56:44,  7.84s/it]
 33%|███▎      | 671/2022 [1:27:23<2:57:13,  7.87s/it]
                                                      
{'loss': 1.2886, 'learning_rate': 0.0, 'epoch': 0.99}

 33%|███▎      | 671/2022 [1:27:23<2:57:13,  7.87s/it]
 33%|███▎      | 672/2022 [1:27:31<2:57:05,  7.87s/it]
                                                      
{'loss': 1.3037, 'learning_rate': 0.0, 'epoch': 1.0}

 33%|███▎      | 672/2022 [1:27:31<2:57:05,  7.87s/it]
 33%|███▎      | 673/2022 [1:27:38<2:54:14,  7.75s/it]
                                                      
{'loss': 1.2792, 'learning_rate': 0.0, 'epoch': 1.0}

 33%|███▎      | 673/2022 [1:27:38<2:54:14,  7.75s/it]
 33%|███▎      | 674/2022 [1:27:46<2:56:49,  7.87s/it]
                                                      
{'loss': 1.0601, 'learning_rate': 0.0, 'epoch': 1.0}

 33%|███▎      | 674/2022 [1:27:46<2:56:49,  7.87s/it]
 33%|███▎      | 675/2022 [1:27:54<2:55:00,  7.80s/it]
                                                      
{'loss': 1.2429, 'learning_rate': 0.0, 'epoch': 1.0}

 33%|███▎      | 675/2022 [1:27:54<2:55:00,  7.80s/it]
 33%|███▎      | 676/2022 [1:28:02<2:55:06,  7.81s/it]
                                                      
{'loss': 0.9624, 'learning_rate': 0.0, 'epoch': 1.0}

 33%|███▎      | 676/2022 [1:28:02<2:55:06,  7.81s/it]
 33%|███▎      | 677/2022 [1:28:10<2:54:28,  7.78s/it]
                                                      
{'loss': 1.0877, 'learning_rate': 0.0, 'epoch': 1.0}

 33%|███▎      | 677/2022 [1:28:10<2:54:28,  7.78s/it]
 34%|███▎      | 678/2022 [1:28:18<2:55:23,  7.83s/it]
                                                      
{'loss': 0.9877, 'learning_rate': 0.0, 'epoch': 1.01}

 34%|███▎      | 678/2022 [1:28:18<2:55:23,  7.83s/it]
 34%|███▎      | 679/2022 [1:28:25<2:54:28,  7.80s/it]
                                                      
{'loss': 1.2937, 'learning_rate': 0.0, 'epoch': 1.01}

 34%|███▎      | 679/2022 [1:28:25<2:54:28,  7.80s/it]
 34%|███▎      | 680/2022 [1:28:33<2:53:10,  7.74s/it]
                                                      
{'loss': 1.0564, 'learning_rate': 0.0, 'epoch': 1.01}

 34%|███▎      | 680/2022 [1:28:33<2:53:10,  7.74s/it]
 34%|███▎      | 681/2022 [1:28:41<2:53:12,  7.75s/it]
                                                      
{'loss': 1.1316, 'learning_rate': 0.0, 'epoch': 1.01}

 34%|███▎      | 681/2022 [1:28:41<2:53:12,  7.75s/it]
 34%|███▎      | 682/2022 [1:28:48<2:53:15,  7.76s/it]
                                                      
{'loss': 1.0475, 'learning_rate': 0.0, 'epoch': 1.01}

 34%|███▎      | 682/2022 [1:28:48<2:53:15,  7.76s/it]
 34%|███▍      | 683/2022 [1:28:56<2:54:41,  7.83s/it]
                                                      
{'loss': 1.1962, 'learning_rate': 0.0, 'epoch': 1.01}

 34%|███▍      | 683/2022 [1:28:56<2:54:41,  7.83s/it]
 34%|███▍      | 684/2022 [1:29:04<2:53:55,  7.80s/it]
                                                      
{'loss': 1.1735, 'learning_rate': 0.0, 'epoch': 1.01}

 34%|███▍      | 684/2022 [1:29:04<2:53:55,  7.80s/it]
 34%|███▍      | 685/2022 [1:29:12<2:54:39,  7.84s/it]
                                                      
{'loss': 1.0943, 'learning_rate': 0.0, 'epoch': 1.02}

 34%|███▍      | 685/2022 [1:29:12<2:54:39,  7.84s/it]
 34%|███▍      | 686/2022 [1:29:20<2:53:11,  7.78s/it]
                                                      
{'loss': 1.2723, 'learning_rate': 0.0, 'epoch': 1.02}

 34%|███▍      | 686/2022 [1:29:20<2:53:11,  7.78s/it]
 34%|███▍      | 687/2022 [1:29:28<2:53:46,  7.81s/it]
                                                      
{'loss': 1.2124, 'learning_rate': 0.0, 'epoch': 1.02}

 34%|███▍      | 687/2022 [1:29:28<2:53:46,  7.81s/it]
 34%|███▍      | 688/2022 [1:29:35<2:53:44,  7.81s/it]
                                                      
{'loss': 1.1373, 'learning_rate': 0.0, 'epoch': 1.02}

 34%|███▍      | 688/2022 [1:29:35<2:53:44,  7.81s/it]
 34%|███▍      | 689/2022 [1:29:43<2:53:18,  7.80s/it]
                                                      
{'loss': 1.2807, 'learning_rate': 0.0, 'epoch': 1.02}

 34%|███▍      | 689/2022 [1:29:43<2:53:18,  7.80s/it]
 34%|███▍      | 690/2022 [1:29:51<2:52:03,  7.75s/it]
                                                      
{'loss': 1.1418, 'learning_rate': 0.0, 'epoch': 1.02}

 34%|███▍      | 690/2022 [1:29:51<2:52:03,  7.75s/it]
 34%|███▍      | 691/2022 [1:29:59<2:53:49,  7.84s/it]
                                                      
{'loss': 1.2185, 'learning_rate': 0.0, 'epoch': 1.02}

 34%|███▍      | 691/2022 [1:29:59<2:53:49,  7.84s/it]
 34%|███▍      | 692/2022 [1:30:07<2:52:51,  7.80s/it]
                                                      
{'loss': 1.0986, 'learning_rate': 0.0, 'epoch': 1.03}

 34%|███▍      | 692/2022 [1:30:07<2:52:51,  7.80s/it]
 34%|███▍      | 693/2022 [1:30:14<2:52:28,  7.79s/it]
                                                      
{'loss': 1.0861, 'learning_rate': 0.0, 'epoch': 1.03}

 34%|███▍      | 693/2022 [1:30:14<2:52:28,  7.79s/it]
 34%|███▍      | 694/2022 [1:30:22<2:51:14,  7.74s/it]
                                                      
{'loss': 1.1743, 'learning_rate': 0.0, 'epoch': 1.03}

 34%|███▍      | 694/2022 [1:30:22<2:51:14,  7.74s/it]
 34%|███▍      | 695/2022 [1:30:30<2:52:50,  7.81s/it]
                                                      
{'loss': 1.1931, 'learning_rate': 0.0, 'epoch': 1.03}

 34%|███▍      | 695/2022 [1:30:30<2:52:50,  7.81s/it]
 34%|███▍      | 696/2022 [1:30:38<2:52:24,  7.80s/it]
                                                      
{'loss': 1.0985, 'learning_rate': 0.0, 'epoch': 1.03}

 34%|███▍      | 696/2022 [1:30:38<2:52:24,  7.80s/it]
 34%|███▍      | 697/2022 [1:30:46<2:52:24,  7.81s/it]
                                                      
{'loss': 1.0804, 'learning_rate': 0.0, 'epoch': 1.03}

 34%|███▍      | 697/2022 [1:30:46<2:52:24,  7.81s/it]
 35%|███▍      | 698/2022 [1:30:53<2:51:51,  7.79s/it]
                                                      
{'loss': 1.1281, 'learning_rate': 0.0, 'epoch': 1.03}

 35%|███▍      | 698/2022 [1:30:53<2:51:51,  7.79s/it]
 35%|███▍      | 699/2022 [1:31:02<2:54:29,  7.91s/it]
                                                      
{'loss': 1.1509, 'learning_rate': 0.0, 'epoch': 1.04}

 35%|███▍      | 699/2022 [1:31:02<2:54:29,  7.91s/it]
 35%|███▍      | 700/2022 [1:31:09<2:53:49,  7.89s/it]
                                                      
{'loss': 1.0497, 'learning_rate': 0.0, 'epoch': 1.04}

 35%|███▍      | 700/2022 [1:31:09<2:53:49,  7.89s/it]
 35%|███▍      | 701/2022 [1:31:17<2:52:51,  7.85s/it]
                                                      
{'loss': 1.2737, 'learning_rate': 0.0, 'epoch': 1.04}

 35%|███▍      | 701/2022 [1:31:17<2:52:51,  7.85s/it]
 35%|███▍      | 702/2022 [1:31:25<2:52:12,  7.83s/it]
                                                      
{'loss': 1.2903, 'learning_rate': 0.0, 'epoch': 1.04}

 35%|███▍      | 702/2022 [1:31:25<2:52:12,  7.83s/it]
 35%|███▍      | 703/2022 [1:31:33<2:52:23,  7.84s/it]
                                                      
{'loss': 1.1439, 'learning_rate': 0.0, 'epoch': 1.04}

 35%|███▍      | 703/2022 [1:31:33<2:52:23,  7.84s/it]
 35%|███▍      | 704/2022 [1:31:40<2:50:39,  7.77s/it]
                                                      
{'loss': 1.1683, 'learning_rate': 0.0, 'epoch': 1.04}

 35%|███▍      | 704/2022 [1:31:40<2:50:39,  7.77s/it]
 35%|███▍      | 705/2022 [1:31:48<2:49:13,  7.71s/it]
                                                      
{'loss': 1.2123, 'learning_rate': 0.0, 'epoch': 1.05}

 35%|███▍      | 705/2022 [1:31:48<2:49:13,  7.71s/it]
 35%|███▍      | 706/2022 [1:31:56<2:49:02,  7.71s/it]
                                                      
{'loss': 1.2294, 'learning_rate': 0.0, 'epoch': 1.05}

 35%|███▍      | 706/2022 [1:31:56<2:49:02,  7.71s/it]
 35%|███▍      | 707/2022 [1:32:04<2:50:42,  7.79s/it]
                                                      
{'loss': 1.0827, 'learning_rate': 0.0, 'epoch': 1.05}

 35%|███▍      | 707/2022 [1:32:04<2:50:42,  7.79s/it]
 35%|███▌      | 708/2022 [1:32:12<2:56:01,  8.04s/it]
                                                      
{'loss': 1.2223, 'learning_rate': 0.0, 'epoch': 1.05}

 35%|███▌      | 708/2022 [1:32:12<2:56:01,  8.04s/it]
 35%|███▌      | 709/2022 [1:32:20<2:53:27,  7.93s/it]
                                                      
{'loss': 1.2007, 'learning_rate': 0.0, 'epoch': 1.05}

 35%|███▌      | 709/2022 [1:32:20<2:53:27,  7.93s/it]
 35%|███▌      | 710/2022 [1:32:28<2:54:19,  7.97s/it]
                                                      
{'loss': 1.2119, 'learning_rate': 0.0, 'epoch': 1.05}

 35%|███▌      | 710/2022 [1:32:28<2:54:19,  7.97s/it]
 35%|███▌      | 711/2022 [1:32:36<2:53:57,  7.96s/it]
                                                      
{'loss': 1.0651, 'learning_rate': 0.0, 'epoch': 1.05}

 35%|███▌      | 711/2022 [1:32:36<2:53:57,  7.96s/it]
 35%|███▌      | 712/2022 [1:32:44<2:54:10,  7.98s/it]
                                                      
{'loss': 1.2886, 'learning_rate': 0.0, 'epoch': 1.06}

 35%|███▌      | 712/2022 [1:32:44<2:54:10,  7.98s/it]
 35%|███▌      | 713/2022 [1:32:52<2:53:42,  7.96s/it]
                                                      
{'loss': 1.1769, 'learning_rate': 0.0, 'epoch': 1.06}

 35%|███▌      | 713/2022 [1:32:52<2:53:42,  7.96s/it]
 35%|███▌      | 714/2022 [1:33:00<2:52:05,  7.89s/it]
                                                      
{'loss': 1.1595, 'learning_rate': 0.0, 'epoch': 1.06}

 35%|███▌      | 714/2022 [1:33:00<2:52:05,  7.89s/it]
 35%|███▌      | 715/2022 [1:33:07<2:50:34,  7.83s/it]
                                                      
{'loss': 1.2011, 'learning_rate': 0.0, 'epoch': 1.06}

 35%|███▌      | 715/2022 [1:33:07<2:50:34,  7.83s/it]
 35%|███▌      | 716/2022 [1:33:15<2:49:24,  7.78s/it]
                                                      
{'loss': 1.0911, 'learning_rate': 0.0, 'epoch': 1.06}

 35%|███▌      | 716/2022 [1:33:15<2:49:24,  7.78s/it]
 35%|███▌      | 717/2022 [1:33:23<2:50:23,  7.83s/it]
                                                      
{'loss': 1.1003, 'learning_rate': 0.0, 'epoch': 1.06}

 35%|███▌      | 717/2022 [1:33:23<2:50:23,  7.83s/it]
 36%|███▌      | 718/2022 [1:33:31<2:48:46,  7.77s/it]
                                                      
{'loss': 1.2402, 'learning_rate': 0.0, 'epoch': 1.06}

 36%|███▌      | 718/2022 [1:33:31<2:48:46,  7.77s/it]
 36%|███▌      | 719/2022 [1:33:38<2:48:48,  7.77s/it]
                                                      
{'loss': 1.0337, 'learning_rate': 0.0, 'epoch': 1.07}

 36%|███▌      | 719/2022 [1:33:38<2:48:48,  7.77s/it]
 36%|███▌      | 720/2022 [1:33:46<2:47:35,  7.72s/it]
                                                      
{'loss': 1.0317, 'learning_rate': 0.0, 'epoch': 1.07}

 36%|███▌      | 720/2022 [1:33:46<2:47:35,  7.72s/it]
 36%|███▌      | 721/2022 [1:33:53<2:46:30,  7.68s/it]
                                                      
{'loss': 1.2095, 'learning_rate': 0.0, 'epoch': 1.07}

 36%|███▌      | 721/2022 [1:33:54<2:46:30,  7.68s/it]
 36%|███▌      | 722/2022 [1:34:01<2:46:07,  7.67s/it]
                                                      
{'loss': 1.1217, 'learning_rate': 0.0, 'epoch': 1.07}

 36%|███▌      | 722/2022 [1:34:01<2:46:07,  7.67s/it]
 36%|███▌      | 723/2022 [1:34:09<2:46:35,  7.69s/it]
                                                      
{'loss': 1.1591, 'learning_rate': 0.0, 'epoch': 1.07}

 36%|███▌      | 723/2022 [1:34:09<2:46:35,  7.69s/it]
 36%|███▌      | 724/2022 [1:34:17<2:47:09,  7.73s/it]
                                                      
{'loss': 1.1262, 'learning_rate': 0.0, 'epoch': 1.07}

 36%|███▌      | 724/2022 [1:34:17<2:47:09,  7.73s/it]
 36%|███▌      | 725/2022 [1:34:25<2:48:29,  7.79s/it]
                                                      
{'loss': 1.4058, 'learning_rate': 0.0, 'epoch': 1.07}

 36%|███▌      | 725/2022 [1:34:25<2:48:29,  7.79s/it]
 36%|███▌      | 726/2022 [1:34:33<2:52:43,  8.00s/it]
                                                      
{'loss': 1.1669, 'learning_rate': 0.0, 'epoch': 1.08}

 36%|███▌      | 726/2022 [1:34:33<2:52:43,  8.00s/it]
 36%|███▌      | 727/2022 [1:34:41<2:52:41,  8.00s/it]
                                                      
{'loss': 1.0641, 'learning_rate': 0.0, 'epoch': 1.08}

 36%|███▌      | 727/2022 [1:34:41<2:52:41,  8.00s/it]
 36%|███▌      | 728/2022 [1:34:49<2:50:36,  7.91s/it]
                                                      
{'loss': 1.1359, 'learning_rate': 0.0, 'epoch': 1.08}

 36%|███▌      | 728/2022 [1:34:49<2:50:36,  7.91s/it]
 36%|███▌      | 729/2022 [1:34:57<2:50:28,  7.91s/it]
                                                      
{'loss': 1.1418, 'learning_rate': 0.0, 'epoch': 1.08}

 36%|███▌      | 729/2022 [1:34:57<2:50:28,  7.91s/it]
 36%|███▌      | 730/2022 [1:35:05<2:52:34,  8.01s/it]
                                                      
{'loss': 1.0835, 'learning_rate': 0.0, 'epoch': 1.08}

 36%|███▌      | 730/2022 [1:35:05<2:52:34,  8.01s/it]
 36%|███▌      | 731/2022 [1:35:13<2:51:10,  7.96s/it]
                                                      
{'loss': 1.1256, 'learning_rate': 0.0, 'epoch': 1.08}

 36%|███▌      | 731/2022 [1:35:13<2:51:10,  7.96s/it]
 36%|███▌      | 732/2022 [1:35:21<2:49:37,  7.89s/it]
                                                      
{'loss': 1.1256, 'learning_rate': 0.0, 'epoch': 1.09}

 36%|███▌      | 732/2022 [1:35:21<2:49:37,  7.89s/it]
 36%|███▋      | 733/2022 [1:35:29<2:50:09,  7.92s/it]
                                                      
{'loss': 1.3186, 'learning_rate': 0.0, 'epoch': 1.09}

 36%|███▋      | 733/2022 [1:35:29<2:50:09,  7.92s/it]
 36%|███▋      | 734/2022 [1:35:36<2:48:28,  7.85s/it]
                                                      
{'loss': 1.0702, 'learning_rate': 0.0, 'epoch': 1.09}

 36%|███▋      | 734/2022 [1:35:36<2:48:28,  7.85s/it]
 36%|███▋      | 735/2022 [1:35:44<2:48:56,  7.88s/it]
                                                      
{'loss': 1.0095, 'learning_rate': 0.0, 'epoch': 1.09}

 36%|███▋      | 735/2022 [1:35:44<2:48:56,  7.88s/it]
 36%|███▋      | 736/2022 [1:35:52<2:47:05,  7.80s/it]
                                                      
{'loss': 1.0403, 'learning_rate': 0.0, 'epoch': 1.09}

 36%|███▋      | 736/2022 [1:35:52<2:47:05,  7.80s/it]
 36%|███▋      | 737/2022 [1:35:59<2:46:14,  7.76s/it]
                                                      
{'loss': 1.0801, 'learning_rate': 0.0, 'epoch': 1.09}

 36%|███▋      | 737/2022 [1:35:59<2:46:14,  7.76s/it]
 36%|███▋      | 738/2022 [1:36:07<2:47:20,  7.82s/it]
                                                      
{'loss': 1.0674, 'learning_rate': 0.0, 'epoch': 1.09}

 36%|███▋      | 738/2022 [1:36:07<2:47:20,  7.82s/it]
 37%|███▋      | 739/2022 [1:36:15<2:46:42,  7.80s/it]
                                                      
{'loss': 1.184, 'learning_rate': 0.0, 'epoch': 1.1}

 37%|███▋      | 739/2022 [1:36:15<2:46:42,  7.80s/it]
 37%|███▋      | 740/2022 [1:36:23<2:47:02,  7.82s/it]
                                                      
{'loss': 1.3201, 'learning_rate': 0.0, 'epoch': 1.1}

 37%|███▋      | 740/2022 [1:36:23<2:47:02,  7.82s/it]
 37%|███▋      | 741/2022 [1:36:31<2:45:40,  7.76s/it]
                                                      
{'loss': 1.0518, 'learning_rate': 0.0, 'epoch': 1.1}

 37%|███▋      | 741/2022 [1:36:31<2:45:40,  7.76s/it]
 37%|███▋      | 742/2022 [1:36:38<2:43:18,  7.66s/it]
                                                      
{'loss': 1.3261, 'learning_rate': 0.0, 'epoch': 1.1}

 37%|███▋      | 742/2022 [1:36:38<2:43:18,  7.66s/it]
 37%|███▋      | 743/2022 [1:36:46<2:43:23,  7.67s/it]
                                                      
{'loss': 1.2935, 'learning_rate': 0.0, 'epoch': 1.1}

 37%|███▋      | 743/2022 [1:36:46<2:43:23,  7.67s/it]
 37%|███▋      | 744/2022 [1:36:54<2:46:04,  7.80s/it]
                                                      
{'loss': 1.2733, 'learning_rate': 0.0, 'epoch': 1.1}

 37%|███▋      | 744/2022 [1:36:54<2:46:04,  7.80s/it]
 37%|███▋      | 745/2022 [1:37:02<2:45:26,  7.77s/it]
                                                      
{'loss': 1.1116, 'learning_rate': 0.0, 'epoch': 1.1}

 37%|███▋      | 745/2022 [1:37:02<2:45:26,  7.77s/it]
 37%|███▋      | 746/2022 [1:37:09<2:45:33,  7.78s/it]
                                                      
{'loss': 1.0337, 'learning_rate': 0.0, 'epoch': 1.11}

 37%|███▋      | 746/2022 [1:37:09<2:45:33,  7.78s/it]
 37%|███▋      | 747/2022 [1:37:17<2:46:26,  7.83s/it]
                                                      
{'loss': 1.1648, 'learning_rate': 0.0, 'epoch': 1.11}

 37%|███▋      | 747/2022 [1:37:17<2:46:26,  7.83s/it]
 37%|███▋      | 748/2022 [1:37:25<2:44:47,  7.76s/it]
                                                      
{'loss': 1.165, 'learning_rate': 0.0, 'epoch': 1.11}

 37%|███▋      | 748/2022 [1:37:25<2:44:47,  7.76s/it]
 37%|███▋      | 749/2022 [1:37:33<2:44:14,  7.74s/it]
                                                      
{'loss': 1.3243, 'learning_rate': 0.0, 'epoch': 1.11}

 37%|███▋      | 749/2022 [1:37:33<2:44:14,  7.74s/it]
 37%|███▋      | 750/2022 [1:37:40<2:43:49,  7.73s/it]
                                                      
{'loss': 1.1046, 'learning_rate': 0.0, 'epoch': 1.11}

 37%|███▋      | 750/2022 [1:37:40<2:43:49,  7.73s/it]
 37%|███▋      | 751/2022 [1:37:48<2:44:49,  7.78s/it]
                                                      
{'loss': 1.1443, 'learning_rate': 0.0, 'epoch': 1.11}

 37%|███▋      | 751/2022 [1:37:48<2:44:49,  7.78s/it]
 37%|███▋      | 752/2022 [1:37:56<2:43:34,  7.73s/it]
                                                      
{'loss': 1.1071, 'learning_rate': 0.0, 'epoch': 1.11}

 37%|███▋      | 752/2022 [1:37:56<2:43:34,  7.73s/it]
 37%|███▋      | 753/2022 [1:38:04<2:45:31,  7.83s/it]
                                                      
{'loss': 1.1664, 'learning_rate': 0.0, 'epoch': 1.12}

 37%|███▋      | 753/2022 [1:38:04<2:45:31,  7.83s/it]
 37%|███▋      | 754/2022 [1:38:12<2:45:10,  7.82s/it]
                                                      
{'loss': 1.208, 'learning_rate': 0.0, 'epoch': 1.12}

 37%|███▋      | 754/2022 [1:38:12<2:45:10,  7.82s/it]
 37%|███▋      | 755/2022 [1:38:19<2:42:58,  7.72s/it]
                                                      
{'loss': 1.1541, 'learning_rate': 0.0, 'epoch': 1.12}

 37%|███▋      | 755/2022 [1:38:19<2:42:58,  7.72s/it]
 37%|███▋      | 756/2022 [1:38:27<2:41:32,  7.66s/it]
                                                      
{'loss': 1.1127, 'learning_rate': 0.0, 'epoch': 1.12}

 37%|███▋      | 756/2022 [1:38:27<2:41:32,  7.66s/it]
 37%|███▋      | 757/2022 [1:38:35<2:43:26,  7.75s/it]
                                                      
{'loss': 1.0815, 'learning_rate': 0.0, 'epoch': 1.12}

 37%|███▋      | 757/2022 [1:38:35<2:43:26,  7.75s/it]
 37%|███▋      | 758/2022 [1:38:42<2:42:21,  7.71s/it]
                                                      
{'loss': 1.2104, 'learning_rate': 0.0, 'epoch': 1.12}

 37%|███▋      | 758/2022 [1:38:42<2:42:21,  7.71s/it]
 38%|███▊      | 759/2022 [1:38:50<2:43:13,  7.75s/it]
                                                      
{'loss': 1.1869, 'learning_rate': 0.0, 'epoch': 1.13}

 38%|███▊      | 759/2022 [1:38:50<2:43:13,  7.75s/it]
 38%|███▊      | 760/2022 [1:38:58<2:42:55,  7.75s/it]
                                                      
{'loss': 1.1344, 'learning_rate': 0.0, 'epoch': 1.13}

 38%|███▊      | 760/2022 [1:38:58<2:42:55,  7.75s/it]
 38%|███▊      | 761/2022 [1:39:05<2:41:23,  7.68s/it]
                                                      
{'loss': 1.3714, 'learning_rate': 0.0, 'epoch': 1.13}

 38%|███▊      | 761/2022 [1:39:05<2:41:23,  7.68s/it]
 38%|███▊      | 762/2022 [1:39:13<2:42:19,  7.73s/it]
                                                      
{'loss': 1.1369, 'learning_rate': 0.0, 'epoch': 1.13}

 38%|███▊      | 762/2022 [1:39:13<2:42:19,  7.73s/it]
 38%|███▊      | 763/2022 [1:39:20<2:39:27,  7.60s/it]
                                                      
{'loss': 1.1676, 'learning_rate': 0.0, 'epoch': 1.13}

 38%|███▊      | 763/2022 [1:39:21<2:39:27,  7.60s/it]
 38%|███▊      | 764/2022 [1:39:28<2:39:52,  7.63s/it]
                                                      
{'loss': 1.1317, 'learning_rate': 0.0, 'epoch': 1.13}

 38%|███▊      | 764/2022 [1:39:28<2:39:52,  7.63s/it]
 38%|███▊      | 765/2022 [1:39:36<2:41:10,  7.69s/it]
                                                      
{'loss': 1.1324, 'learning_rate': 0.0, 'epoch': 1.13}

 38%|███▊      | 765/2022 [1:39:36<2:41:10,  7.69s/it]
 38%|███▊      | 766/2022 [1:39:44<2:42:19,  7.75s/it]
                                                      
{'loss': 1.1158, 'learning_rate': 0.0, 'epoch': 1.14}

 38%|███▊      | 766/2022 [1:39:44<2:42:19,  7.75s/it]
 38%|███▊      | 767/2022 [1:39:51<2:40:31,  7.67s/it]
                                                      
{'loss': 1.0906, 'learning_rate': 0.0, 'epoch': 1.14}

 38%|███▊      | 767/2022 [1:39:51<2:40:31,  7.67s/it]
 38%|███▊      | 768/2022 [1:40:00<2:43:13,  7.81s/it]
                                                      
{'loss': 1.054, 'learning_rate': 0.0, 'epoch': 1.14}

 38%|███▊      | 768/2022 [1:40:00<2:43:13,  7.81s/it]
 38%|███▊      | 769/2022 [1:40:07<2:43:17,  7.82s/it]
                                                      
{'loss': 0.9911, 'learning_rate': 0.0, 'epoch': 1.14}

 38%|███▊      | 769/2022 [1:40:07<2:43:17,  7.82s/it]
 38%|███▊      | 770/2022 [1:40:15<2:41:10,  7.72s/it]
                                                      
{'loss': 1.3684, 'learning_rate': 0.0, 'epoch': 1.14}

 38%|███▊      | 770/2022 [1:40:15<2:41:10,  7.72s/it]
 38%|███▊      | 771/2022 [1:40:23<2:43:31,  7.84s/it]
                                                      
{'loss': 1.135, 'learning_rate': 0.0, 'epoch': 1.14}

 38%|███▊      | 771/2022 [1:40:23<2:43:31,  7.84s/it]
 38%|███▊      | 772/2022 [1:40:30<2:40:32,  7.71s/it]
                                                      
{'loss': 1.2425, 'learning_rate': 0.0, 'epoch': 1.14}

 38%|███▊      | 772/2022 [1:40:30<2:40:32,  7.71s/it]
 38%|███▊      | 773/2022 [1:40:38<2:41:59,  7.78s/it]
                                                      
{'loss': 1.1879, 'learning_rate': 0.0, 'epoch': 1.15}

 38%|███▊      | 773/2022 [1:40:38<2:41:59,  7.78s/it]
 38%|███▊      | 774/2022 [1:40:46<2:41:13,  7.75s/it]
                                                      
{'loss': 1.1181, 'learning_rate': 0.0, 'epoch': 1.15}

 38%|███▊      | 774/2022 [1:40:46<2:41:13,  7.75s/it]
 38%|███▊      | 775/2022 [1:40:54<2:42:55,  7.84s/it]
                                                      
{'loss': 1.0198, 'learning_rate': 0.0, 'epoch': 1.15}

 38%|███▊      | 775/2022 [1:40:54<2:42:55,  7.84s/it]
 38%|███▊      | 776/2022 [1:41:02<2:43:27,  7.87s/it]
                                                      
{'loss': 1.2111, 'learning_rate': 0.0, 'epoch': 1.15}

 38%|███▊      | 776/2022 [1:41:02<2:43:27,  7.87s/it]
 38%|███▊      | 777/2022 [1:41:10<2:43:47,  7.89s/it]
                                                      
{'loss': 1.2031, 'learning_rate': 0.0, 'epoch': 1.15}

 38%|███▊      | 777/2022 [1:41:10<2:43:47,  7.89s/it]
 38%|███▊      | 778/2022 [1:41:18<2:42:33,  7.84s/it]
                                                      
{'loss': 1.1984, 'learning_rate': 0.0, 'epoch': 1.15}

 38%|███▊      | 778/2022 [1:41:18<2:42:33,  7.84s/it]
 39%|███▊      | 779/2022 [1:41:26<2:43:27,  7.89s/it]
                                                      
{'loss': 1.1753, 'learning_rate': 0.0, 'epoch': 1.15}

 39%|███▊      | 779/2022 [1:41:26<2:43:27,  7.89s/it]
 39%|███▊      | 780/2022 [1:41:34<2:44:05,  7.93s/it]
                                                      
{'loss': 1.2491, 'learning_rate': 0.0, 'epoch': 1.16}

 39%|███▊      | 780/2022 [1:41:34<2:44:05,  7.93s/it]
 39%|███▊      | 781/2022 [1:41:42<2:43:13,  7.89s/it]
                                                      
{'loss': 1.3025, 'learning_rate': 0.0, 'epoch': 1.16}

 39%|███▊      | 781/2022 [1:41:42<2:43:13,  7.89s/it]
 39%|███▊      | 782/2022 [1:41:49<2:40:20,  7.76s/it]
                                                      
{'loss': 1.2226, 'learning_rate': 0.0, 'epoch': 1.16}

 39%|███▊      | 782/2022 [1:41:49<2:40:20,  7.76s/it]
 39%|███▊      | 783/2022 [1:41:56<2:38:38,  7.68s/it]
                                                      
{'loss': 0.9977, 'learning_rate': 0.0, 'epoch': 1.16}

 39%|███▊      | 783/2022 [1:41:56<2:38:38,  7.68s/it]
 39%|███▉      | 784/2022 [1:42:04<2:40:39,  7.79s/it]
                                                      
{'loss': 1.3875, 'learning_rate': 0.0, 'epoch': 1.16}

 39%|███▉      | 784/2022 [1:42:05<2:40:39,  7.79s/it]
 39%|███▉      | 785/2022 [1:42:13<2:42:30,  7.88s/it]
                                                      
{'loss': 1.2206, 'learning_rate': 0.0, 'epoch': 1.16}

 39%|███▉      | 785/2022 [1:42:13<2:42:30,  7.88s/it]
 39%|███▉      | 786/2022 [1:42:20<2:38:04,  7.67s/it]
                                                      
{'loss': 1.1213, 'learning_rate': 0.0, 'epoch': 1.17}

 39%|███▉      | 786/2022 [1:42:20<2:38:04,  7.67s/it]
 39%|███▉      | 787/2022 [1:42:28<2:38:33,  7.70s/it]
                                                      
{'loss': 1.303, 'learning_rate': 0.0, 'epoch': 1.17}

 39%|███▉      | 787/2022 [1:42:28<2:38:33,  7.70s/it]
 39%|███▉      | 788/2022 [1:42:36<2:41:17,  7.84s/it]
                                                      
{'loss': 1.1019, 'learning_rate': 0.0, 'epoch': 1.17}

 39%|███▉      | 788/2022 [1:42:36<2:41:17,  7.84s/it]
 39%|███▉      | 789/2022 [1:42:43<2:40:23,  7.80s/it]
                                                      
{'loss': 1.3107, 'learning_rate': 0.0, 'epoch': 1.17}

 39%|███▉      | 789/2022 [1:42:43<2:40:23,  7.80s/it]
 39%|███▉      | 790/2022 [1:42:51<2:37:58,  7.69s/it]
                                                      
{'loss': 1.0585, 'learning_rate': 0.0, 'epoch': 1.17}

 39%|███▉      | 790/2022 [1:42:51<2:37:58,  7.69s/it]
 39%|███▉      | 791/2022 [1:42:59<2:38:26,  7.72s/it]
                                                      
{'loss': 1.304, 'learning_rate': 0.0, 'epoch': 1.17}

 39%|███▉      | 791/2022 [1:42:59<2:38:26,  7.72s/it]
 39%|███▉      | 792/2022 [1:43:06<2:36:57,  7.66s/it]
                                                      
{'loss': 1.1403, 'learning_rate': 0.0, 'epoch': 1.17}

 39%|███▉      | 792/2022 [1:43:06<2:36:57,  7.66s/it]
 39%|███▉      | 793/2022 [1:43:14<2:36:50,  7.66s/it]
                                                      
{'loss': 1.2287, 'learning_rate': 0.0, 'epoch': 1.18}

 39%|███▉      | 793/2022 [1:43:14<2:36:50,  7.66s/it]
 39%|███▉      | 794/2022 [1:43:22<2:37:13,  7.68s/it]
                                                      
{'loss': 1.1681, 'learning_rate': 0.0, 'epoch': 1.18}

 39%|███▉      | 794/2022 [1:43:22<2:37:13,  7.68s/it]
 39%|███▉      | 795/2022 [1:43:29<2:38:08,  7.73s/it]
                                                      
{'loss': 1.0658, 'learning_rate': 0.0, 'epoch': 1.18}

 39%|███▉      | 795/2022 [1:43:29<2:38:08,  7.73s/it]
 39%|███▉      | 796/2022 [1:43:37<2:39:52,  7.82s/it]
                                                      
{'loss': 1.1187, 'learning_rate': 0.0, 'epoch': 1.18}

 39%|███▉      | 796/2022 [1:43:37<2:39:52,  7.82s/it]
 39%|███▉      | 797/2022 [1:43:46<2:41:35,  7.91s/it]
                                                      
{'loss': 1.2745, 'learning_rate': 0.0, 'epoch': 1.18}

 39%|███▉      | 797/2022 [1:43:46<2:41:35,  7.91s/it]
 39%|███▉      | 798/2022 [1:43:54<2:42:31,  7.97s/it]
                                                      
{'loss': 1.1649, 'learning_rate': 0.0, 'epoch': 1.18}

 39%|███▉      | 798/2022 [1:43:54<2:42:31,  7.97s/it]
 40%|███▉      | 799/2022 [1:44:01<2:40:37,  7.88s/it]
                                                      
{'loss': 1.3262, 'learning_rate': 0.0, 'epoch': 1.18}

 40%|███▉      | 799/2022 [1:44:01<2:40:37,  7.88s/it]
 40%|███▉      | 800/2022 [1:44:10<2:42:44,  7.99s/it]
                                                      
{'loss': 1.0743, 'learning_rate': 0.0, 'epoch': 1.19}

 40%|███▉      | 800/2022 [1:44:10<2:42:44,  7.99s/it]
 40%|███▉      | 801/2022 [1:44:18<2:43:11,  8.02s/it]
                                                      
{'loss': 1.1403, 'learning_rate': 0.0, 'epoch': 1.19}

 40%|███▉      | 801/2022 [1:44:18<2:43:11,  8.02s/it]
 40%|███▉      | 802/2022 [1:44:25<2:40:01,  7.87s/it]
                                                      
{'loss': 1.1501, 'learning_rate': 0.0, 'epoch': 1.19}

 40%|███▉      | 802/2022 [1:44:25<2:40:01,  7.87s/it]
 40%|███▉      | 803/2022 [1:44:33<2:41:53,  7.97s/it]
                                                      
{'loss': 1.0426, 'learning_rate': 0.0, 'epoch': 1.19}

 40%|███▉      | 803/2022 [1:44:33<2:41:53,  7.97s/it]
 40%|███▉      | 804/2022 [1:44:41<2:40:21,  7.90s/it]
                                                      
{'loss': 1.2011, 'learning_rate': 0.0, 'epoch': 1.19}

 40%|███▉      | 804/2022 [1:44:41<2:40:21,  7.90s/it]
 40%|███▉      | 805/2022 [1:44:49<2:38:46,  7.83s/it]
                                                      
{'loss': 1.1305, 'learning_rate': 0.0, 'epoch': 1.19}

 40%|███▉      | 805/2022 [1:44:49<2:38:46,  7.83s/it]
 40%|███▉      | 806/2022 [1:44:57<2:38:41,  7.83s/it]
                                                      
{'loss': 1.1192, 'learning_rate': 0.0, 'epoch': 1.19}

 40%|███▉      | 806/2022 [1:44:57<2:38:41,  7.83s/it]
 40%|███▉      | 807/2022 [1:45:04<2:38:45,  7.84s/it]
                                                      
{'loss': 1.0638, 'learning_rate': 0.0, 'epoch': 1.2}

 40%|███▉      | 807/2022 [1:45:05<2:38:45,  7.84s/it]
 40%|███▉      | 808/2022 [1:45:12<2:39:23,  7.88s/it]
                                                      
{'loss': 1.1924, 'learning_rate': 0.0, 'epoch': 1.2}

 40%|███▉      | 808/2022 [1:45:12<2:39:23,  7.88s/it]
 40%|████      | 809/2022 [1:45:21<2:42:34,  8.04s/it]
                                                      
{'loss': 1.0386, 'learning_rate': 0.0, 'epoch': 1.2}

 40%|████      | 809/2022 [1:45:21<2:42:34,  8.04s/it]
 40%|████      | 810/2022 [1:45:29<2:40:35,  7.95s/it]
                                                      
{'loss': 1.2083, 'learning_rate': 0.0, 'epoch': 1.2}

 40%|████      | 810/2022 [1:45:29<2:40:35,  7.95s/it]
 40%|████      | 811/2022 [1:45:36<2:38:09,  7.84s/it]
                                                      
{'loss': 1.1142, 'learning_rate': 0.0, 'epoch': 1.2}

 40%|████      | 811/2022 [1:45:36<2:38:09,  7.84s/it]
 40%|████      | 812/2022 [1:45:43<2:34:18,  7.65s/it]
                                                      
{'loss': 1.1176, 'learning_rate': 0.0, 'epoch': 1.2}

 40%|████      | 812/2022 [1:45:43<2:34:18,  7.65s/it]
 40%|████      | 813/2022 [1:45:51<2:35:27,  7.71s/it]
                                                      
{'loss': 1.1189, 'learning_rate': 0.0, 'epoch': 1.21}

 40%|████      | 813/2022 [1:45:51<2:35:27,  7.71s/it]
 40%|████      | 814/2022 [1:45:59<2:35:52,  7.74s/it]
                                                      
{'loss': 1.1614, 'learning_rate': 0.0, 'epoch': 1.21}

 40%|████      | 814/2022 [1:45:59<2:35:52,  7.74s/it]
 40%|████      | 815/2022 [1:46:07<2:34:49,  7.70s/it]
                                                      
{'loss': 1.1399, 'learning_rate': 0.0, 'epoch': 1.21}

 40%|████      | 815/2022 [1:46:07<2:34:49,  7.70s/it]
 40%|████      | 816/2022 [1:46:14<2:32:51,  7.60s/it]
                                                      
{'loss': 1.1401, 'learning_rate': 0.0, 'epoch': 1.21}

 40%|████      | 816/2022 [1:46:14<2:32:51,  7.60s/it]
 40%|████      | 817/2022 [1:46:22<2:33:09,  7.63s/it]
                                                      
{'loss': 1.1587, 'learning_rate': 0.0, 'epoch': 1.21}

 40%|████      | 817/2022 [1:46:22<2:33:09,  7.63s/it]
 40%|████      | 818/2022 [1:46:30<2:35:48,  7.76s/it]
                                                      
{'loss': 1.1185, 'learning_rate': 0.0, 'epoch': 1.21}

 40%|████      | 818/2022 [1:46:30<2:35:48,  7.76s/it]
 41%|████      | 819/2022 [1:46:38<2:35:56,  7.78s/it]
                                                      
{'loss': 1.2802, 'learning_rate': 0.0, 'epoch': 1.21}

 41%|████      | 819/2022 [1:46:38<2:35:56,  7.78s/it]
 41%|████      | 820/2022 [1:46:45<2:34:47,  7.73s/it]
                                                      
{'loss': 1.1319, 'learning_rate': 0.0, 'epoch': 1.22}

 41%|████      | 820/2022 [1:46:45<2:34:47,  7.73s/it]
 41%|████      | 821/2022 [1:46:53<2:34:40,  7.73s/it]
                                                      
{'loss': 1.1194, 'learning_rate': 0.0, 'epoch': 1.22}

 41%|████      | 821/2022 [1:46:53<2:34:40,  7.73s/it]
 41%|████      | 822/2022 [1:47:01<2:37:57,  7.90s/it]
                                                      
{'loss': 1.1153, 'learning_rate': 0.0, 'epoch': 1.22}

 41%|████      | 822/2022 [1:47:01<2:37:57,  7.90s/it]
 41%|████      | 823/2022 [1:47:09<2:38:09,  7.91s/it]
                                                      
{'loss': 1.1491, 'learning_rate': 0.0, 'epoch': 1.22}

 41%|████      | 823/2022 [1:47:09<2:38:09,  7.91s/it]
 41%|████      | 824/2022 [1:47:17<2:36:15,  7.83s/it]
                                                      
{'loss': 1.085, 'learning_rate': 0.0, 'epoch': 1.22}

 41%|████      | 824/2022 [1:47:17<2:36:15,  7.83s/it]
 41%|████      | 825/2022 [1:47:25<2:37:51,  7.91s/it]
                                                      
{'loss': 1.0584, 'learning_rate': 0.0, 'epoch': 1.22}

 41%|████      | 825/2022 [1:47:25<2:37:51,  7.91s/it]
 41%|████      | 826/2022 [1:47:33<2:38:27,  7.95s/it]
                                                      
{'loss': 1.1038, 'learning_rate': 0.0, 'epoch': 1.22}

 41%|████      | 826/2022 [1:47:33<2:38:27,  7.95s/it]
 41%|████      | 827/2022 [1:47:41<2:36:40,  7.87s/it]
                                                      
{'loss': 1.1568, 'learning_rate': 0.0, 'epoch': 1.23}

 41%|████      | 827/2022 [1:47:41<2:36:40,  7.87s/it]
 41%|████      | 828/2022 [1:47:49<2:36:31,  7.87s/it]
                                                      
{'loss': 1.2367, 'learning_rate': 0.0, 'epoch': 1.23}

 41%|████      | 828/2022 [1:47:49<2:36:31,  7.87s/it]
 41%|████      | 829/2022 [1:47:56<2:35:17,  7.81s/it]
                                                      
{'loss': 0.9333, 'learning_rate': 0.0, 'epoch': 1.23}

 41%|████      | 829/2022 [1:47:56<2:35:17,  7.81s/it]
 41%|████      | 830/2022 [1:48:04<2:37:20,  7.92s/it]
                                                      
{'loss': 1.1656, 'learning_rate': 0.0, 'epoch': 1.23}

 41%|████      | 830/2022 [1:48:04<2:37:20,  7.92s/it]
 41%|████      | 831/2022 [1:48:12<2:35:41,  7.84s/it]
                                                      
{'loss': 1.2114, 'learning_rate': 0.0, 'epoch': 1.23}

 41%|████      | 831/2022 [1:48:12<2:35:41,  7.84s/it]
 41%|████      | 832/2022 [1:48:20<2:37:12,  7.93s/it]
                                                      
{'loss': 1.3258, 'learning_rate': 0.0, 'epoch': 1.23}

 41%|████      | 832/2022 [1:48:20<2:37:12,  7.93s/it]
 41%|████      | 833/2022 [1:48:28<2:38:59,  8.02s/it]
                                                      
{'loss': 0.9998, 'learning_rate': 0.0, 'epoch': 1.23}

 41%|████      | 833/2022 [1:48:28<2:38:59,  8.02s/it]
 41%|████      | 834/2022 [1:48:37<2:39:46,  8.07s/it]
                                                      
{'loss': 1.1433, 'learning_rate': 0.0, 'epoch': 1.24}

 41%|████      | 834/2022 [1:48:37<2:39:46,  8.07s/it]
 41%|████▏     | 835/2022 [1:48:44<2:36:36,  7.92s/it]
                                                      
{'loss': 1.1706, 'learning_rate': 0.0, 'epoch': 1.24}

 41%|████▏     | 835/2022 [1:48:44<2:36:36,  7.92s/it]
 41%|████▏     | 836/2022 [1:48:52<2:37:05,  7.95s/it]
                                                      
{'loss': 1.0668, 'learning_rate': 0.0, 'epoch': 1.24}

 41%|████▏     | 836/2022 [1:48:52<2:37:05,  7.95s/it]
 41%|████▏     | 837/2022 [1:49:00<2:37:08,  7.96s/it]
                                                      
{'loss': 1.375, 'learning_rate': 0.0, 'epoch': 1.24}

 41%|████▏     | 837/2022 [1:49:00<2:37:08,  7.96s/it]
 41%|████▏     | 838/2022 [1:49:08<2:37:02,  7.96s/it]
                                                      
{'loss': 1.0931, 'learning_rate': 0.0, 'epoch': 1.24}

 41%|████▏     | 838/2022 [1:49:08<2:37:02,  7.96s/it]
 41%|████▏     | 839/2022 [1:49:16<2:37:53,  8.01s/it]
                                                      
{'loss': 1.2218, 'learning_rate': 0.0, 'epoch': 1.24}

 41%|████▏     | 839/2022 [1:49:16<2:37:53,  8.01s/it]
 42%|████▏     | 840/2022 [1:49:24<2:35:34,  7.90s/it]
                                                      
{'loss': 1.1731, 'learning_rate': 0.0, 'epoch': 1.25}

 42%|████▏     | 840/2022 [1:49:24<2:35:34,  7.90s/it]
 42%|████▏     | 841/2022 [1:49:32<2:34:00,  7.82s/it]
                                                      
{'loss': 1.1156, 'learning_rate': 0.0, 'epoch': 1.25}

 42%|████▏     | 841/2022 [1:49:32<2:34:00,  7.82s/it]
 42%|████▏     | 842/2022 [1:49:39<2:31:42,  7.71s/it]
                                                      
{'loss': 1.2567, 'learning_rate': 0.0, 'epoch': 1.25}

 42%|████▏     | 842/2022 [1:49:39<2:31:42,  7.71s/it]
 42%|████▏     | 843/2022 [1:49:47<2:31:05,  7.69s/it]
                                                      
{'loss': 1.0854, 'learning_rate': 0.0, 'epoch': 1.25}

 42%|████▏     | 843/2022 [1:49:47<2:31:05,  7.69s/it]
 42%|████▏     | 844/2022 [1:49:54<2:31:45,  7.73s/it]
                                                      
{'loss': 1.2684, 'learning_rate': 0.0, 'epoch': 1.25}

 42%|████▏     | 844/2022 [1:49:54<2:31:45,  7.73s/it]
 42%|████▏     | 845/2022 [1:50:02<2:32:18,  7.76s/it]
                                                      
{'loss': 1.1922, 'learning_rate': 0.0, 'epoch': 1.25}

 42%|████▏     | 845/2022 [1:50:02<2:32:18,  7.76s/it]
 42%|████▏     | 846/2022 [1:50:10<2:32:48,  7.80s/it]
                                                      
{'loss': 1.2323, 'learning_rate': 0.0, 'epoch': 1.25}

 42%|████▏     | 846/2022 [1:50:10<2:32:48,  7.80s/it]
 42%|████▏     | 847/2022 [1:50:18<2:31:34,  7.74s/it]
                                                      
{'loss': 1.2078, 'learning_rate': 0.0, 'epoch': 1.26}

 42%|████▏     | 847/2022 [1:50:18<2:31:34,  7.74s/it]
 42%|████▏     | 848/2022 [1:50:26<2:31:28,  7.74s/it]
                                                      
{'loss': 1.1754, 'learning_rate': 0.0, 'epoch': 1.26}

 42%|████▏     | 848/2022 [1:50:26<2:31:28,  7.74s/it]
 42%|████▏     | 849/2022 [1:50:33<2:31:47,  7.76s/it]
                                                      
{'loss': 1.2657, 'learning_rate': 0.0, 'epoch': 1.26}

 42%|████▏     | 849/2022 [1:50:33<2:31:47,  7.76s/it]
 42%|████▏     | 850/2022 [1:50:41<2:30:58,  7.73s/it]
                                                      
{'loss': 1.1489, 'learning_rate': 0.0, 'epoch': 1.26}

 42%|████▏     | 850/2022 [1:50:41<2:30:58,  7.73s/it]
 42%|████▏     | 851/2022 [1:50:49<2:33:30,  7.87s/it]
                                                      
{'loss': 1.0508, 'learning_rate': 0.0, 'epoch': 1.26}

 42%|████▏     | 851/2022 [1:50:49<2:33:30,  7.87s/it]
 42%|████▏     | 852/2022 [1:50:57<2:34:00,  7.90s/it]
                                                      
{'loss': 1.1673, 'learning_rate': 0.0, 'epoch': 1.26}

 42%|████▏     | 852/2022 [1:50:57<2:34:00,  7.90s/it]
 42%|████▏     | 853/2022 [1:51:05<2:34:00,  7.90s/it]
                                                      
{'loss': 1.1934, 'learning_rate': 0.0, 'epoch': 1.26}

 42%|████▏     | 853/2022 [1:51:05<2:34:00,  7.90s/it]
 42%|████▏     | 854/2022 [1:51:13<2:32:28,  7.83s/it]
                                                      
{'loss': 1.0744, 'learning_rate': 0.0, 'epoch': 1.27}

 42%|████▏     | 854/2022 [1:51:13<2:32:28,  7.83s/it]
 42%|████▏     | 855/2022 [1:51:20<2:30:51,  7.76s/it]
                                                      
{'loss': 1.169, 'learning_rate': 0.0, 'epoch': 1.27}

 42%|████▏     | 855/2022 [1:51:20<2:30:51,  7.76s/it]
 42%|████▏     | 856/2022 [1:51:28<2:30:57,  7.77s/it]
                                                      
{'loss': 1.2819, 'learning_rate': 0.0, 'epoch': 1.27}

 42%|████▏     | 856/2022 [1:51:28<2:30:57,  7.77s/it]
 42%|████▏     | 857/2022 [1:51:36<2:29:00,  7.67s/it]
                                                      
{'loss': 1.2653, 'learning_rate': 0.0, 'epoch': 1.27}

 42%|████▏     | 857/2022 [1:51:36<2:29:00,  7.67s/it]
 42%|████▏     | 858/2022 [1:51:43<2:29:59,  7.73s/it]
                                                      
{'loss': 1.179, 'learning_rate': 0.0, 'epoch': 1.27}

 42%|████▏     | 858/2022 [1:51:43<2:29:59,  7.73s/it]
 42%|████▏     | 859/2022 [1:51:51<2:31:04,  7.79s/it]
                                                      
{'loss': 1.162, 'learning_rate': 0.0, 'epoch': 1.27}

 42%|████▏     | 859/2022 [1:51:51<2:31:04,  7.79s/it]
 43%|████▎     | 860/2022 [1:51:59<2:28:59,  7.69s/it]
                                                      
{'loss': 1.4058, 'learning_rate': 0.0, 'epoch': 1.28}

 43%|████▎     | 860/2022 [1:51:59<2:28:59,  7.69s/it]
 43%|████▎     | 861/2022 [1:52:07<2:29:21,  7.72s/it]
                                                      
{'loss': 1.2248, 'learning_rate': 0.0, 'epoch': 1.28}

 43%|████▎     | 861/2022 [1:52:07<2:29:21,  7.72s/it]
 43%|████▎     | 862/2022 [1:52:15<2:30:28,  7.78s/it]
                                                      
{'loss': 1.066, 'learning_rate': 0.0, 'epoch': 1.28}

 43%|████▎     | 862/2022 [1:52:15<2:30:28,  7.78s/it]
 43%|████▎     | 863/2022 [1:52:22<2:31:24,  7.84s/it]
                                                      
{'loss': 1.1627, 'learning_rate': 0.0, 'epoch': 1.28}

 43%|████▎     | 863/2022 [1:52:23<2:31:24,  7.84s/it]
 43%|████▎     | 864/2022 [1:52:30<2:31:30,  7.85s/it]
                                                      
{'loss': 1.0017, 'learning_rate': 0.0, 'epoch': 1.28}

 43%|████▎     | 864/2022 [1:52:30<2:31:30,  7.85s/it]
 43%|████▎     | 865/2022 [1:52:38<2:30:29,  7.80s/it]
                                                      
{'loss': 1.1094, 'learning_rate': 0.0, 'epoch': 1.28}

 43%|████▎     | 865/2022 [1:52:38<2:30:29,  7.80s/it]
 43%|████▎     | 866/2022 [1:52:46<2:30:58,  7.84s/it]
                                                      
{'loss': 1.1548, 'learning_rate': 0.0, 'epoch': 1.28}

 43%|████▎     | 866/2022 [1:52:46<2:30:58,  7.84s/it]
 43%|████▎     | 867/2022 [1:52:53<2:28:54,  7.74s/it]
                                                      
{'loss': 1.1643, 'learning_rate': 0.0, 'epoch': 1.29}

 43%|████▎     | 867/2022 [1:52:53<2:28:54,  7.74s/it]
 43%|████▎     | 868/2022 [1:53:01<2:28:25,  7.72s/it]
                                                      
{'loss': 1.1193, 'learning_rate': 0.0, 'epoch': 1.29}

 43%|████▎     | 868/2022 [1:53:01<2:28:25,  7.72s/it]
 43%|████▎     | 869/2022 [1:53:09<2:29:54,  7.80s/it]
                                                      
{'loss': 1.0724, 'learning_rate': 0.0, 'epoch': 1.29}

 43%|████▎     | 869/2022 [1:53:09<2:29:54,  7.80s/it]
 43%|████▎     | 870/2022 [1:53:17<2:30:08,  7.82s/it]
                                                      
{'loss': 1.0784, 'learning_rate': 0.0, 'epoch': 1.29}

 43%|████▎     | 870/2022 [1:53:17<2:30:08,  7.82s/it]
 43%|████▎     | 871/2022 [1:53:25<2:29:44,  7.81s/it]
                                                      
{'loss': 1.1636, 'learning_rate': 0.0, 'epoch': 1.29}

 43%|████▎     | 871/2022 [1:53:25<2:29:44,  7.81s/it]
 43%|████▎     | 872/2022 [1:53:32<2:27:34,  7.70s/it]
                                                      
{'loss': 1.1714, 'learning_rate': 0.0, 'epoch': 1.29}

 43%|████▎     | 872/2022 [1:53:32<2:27:34,  7.70s/it]
 43%|████▎     | 873/2022 [1:53:40<2:27:37,  7.71s/it]
                                                      
{'loss': 1.1209, 'learning_rate': 0.0, 'epoch': 1.29}

 43%|████▎     | 873/2022 [1:53:40<2:27:37,  7.71s/it]
 43%|████▎     | 874/2022 [1:53:47<2:25:07,  7.59s/it]
                                                      
{'loss': 1.1322, 'learning_rate': 0.0, 'epoch': 1.3}

 43%|████▎     | 874/2022 [1:53:47<2:25:07,  7.59s/it]
 43%|████▎     | 875/2022 [1:53:55<2:27:13,  7.70s/it]
                                                      
{'loss': 1.1689, 'learning_rate': 0.0, 'epoch': 1.3}

 43%|████▎     | 875/2022 [1:53:55<2:27:13,  7.70s/it]
 43%|████▎     | 876/2022 [1:54:03<2:27:18,  7.71s/it]
                                                      
{'loss': 1.2838, 'learning_rate': 0.0, 'epoch': 1.3}

 43%|████▎     | 876/2022 [1:54:03<2:27:18,  7.71s/it]
 43%|████▎     | 877/2022 [1:54:10<2:26:01,  7.65s/it]
                                                      
{'loss': 1.0593, 'learning_rate': 0.0, 'epoch': 1.3}

 43%|████▎     | 877/2022 [1:54:11<2:26:01,  7.65s/it]
 43%|████▎     | 878/2022 [1:54:18<2:27:00,  7.71s/it]
                                                      
{'loss': 1.0928, 'learning_rate': 0.0, 'epoch': 1.3}

 43%|████▎     | 878/2022 [1:54:18<2:27:00,  7.71s/it]
 43%|████▎     | 879/2022 [1:54:26<2:28:56,  7.82s/it]
                                                      
{'loss': 1.2451, 'learning_rate': 0.0, 'epoch': 1.3}

 43%|████▎     | 879/2022 [1:54:26<2:28:56,  7.82s/it]
 44%|████▎     | 880/2022 [1:54:34<2:28:30,  7.80s/it]
                                                      
{'loss': 1.195, 'learning_rate': 0.0, 'epoch': 1.3}

 44%|████▎     | 880/2022 [1:54:34<2:28:30,  7.80s/it]
 44%|████▎     | 881/2022 [1:54:42<2:30:27,  7.91s/it]
                                                      
{'loss': 1.0457, 'learning_rate': 0.0, 'epoch': 1.31}

 44%|████▎     | 881/2022 [1:54:42<2:30:27,  7.91s/it]
 44%|████▎     | 882/2022 [1:54:50<2:31:42,  7.98s/it]
                                                      
{'loss': 1.0454, 'learning_rate': 0.0, 'epoch': 1.31}

 44%|████▎     | 882/2022 [1:54:51<2:31:42,  7.98s/it]
 44%|████▎     | 883/2022 [1:54:58<2:28:44,  7.83s/it]
                                                      
{'loss': 1.205, 'learning_rate': 0.0, 'epoch': 1.31}

 44%|████▎     | 883/2022 [1:54:58<2:28:44,  7.83s/it]
 44%|████▎     | 884/2022 [1:55:06<2:27:56,  7.80s/it]
                                                      
{'loss': 1.1794, 'learning_rate': 0.0, 'epoch': 1.31}

 44%|████▎     | 884/2022 [1:55:06<2:27:56,  7.80s/it]
 44%|████▍     | 885/2022 [1:55:13<2:26:35,  7.74s/it]
                                                      
{'loss': 1.2319, 'learning_rate': 0.0, 'epoch': 1.31}

 44%|████▍     | 885/2022 [1:55:13<2:26:35,  7.74s/it]
 44%|████▍     | 886/2022 [1:55:21<2:26:24,  7.73s/it]
                                                      
{'loss': 1.1647, 'learning_rate': 0.0, 'epoch': 1.31}

 44%|████▍     | 886/2022 [1:55:21<2:26:24,  7.73s/it]
 44%|████▍     | 887/2022 [1:55:29<2:25:26,  7.69s/it]
                                                      
{'loss': 1.1175, 'learning_rate': 0.0, 'epoch': 1.32}

 44%|████▍     | 887/2022 [1:55:29<2:25:26,  7.69s/it]
 44%|████▍     | 888/2022 [1:55:36<2:26:16,  7.74s/it]
                                                      
{'loss': 1.1394, 'learning_rate': 0.0, 'epoch': 1.32}

 44%|████▍     | 888/2022 [1:55:36<2:26:16,  7.74s/it]
 44%|████▍     | 889/2022 [1:55:45<2:28:05,  7.84s/it]
                                                      
{'loss': 1.1171, 'learning_rate': 0.0, 'epoch': 1.32}

 44%|████▍     | 889/2022 [1:55:45<2:28:05,  7.84s/it]
 44%|████▍     | 890/2022 [1:55:53<2:29:53,  7.95s/it]
                                                      
{'loss': 1.0261, 'learning_rate': 0.0, 'epoch': 1.32}

 44%|████▍     | 890/2022 [1:55:53<2:29:53,  7.95s/it]
 44%|████▍     | 891/2022 [1:56:01<2:31:06,  8.02s/it]
                                                      
{'loss': 1.0949, 'learning_rate': 0.0, 'epoch': 1.32}

 44%|████▍     | 891/2022 [1:56:01<2:31:06,  8.02s/it]
 44%|████▍     | 892/2022 [1:56:09<2:30:34,  8.00s/it]
                                                      
{'loss': 1.1467, 'learning_rate': 0.0, 'epoch': 1.32}

 44%|████▍     | 892/2022 [1:56:09<2:30:34,  8.00s/it]
 44%|████▍     | 893/2022 [1:56:17<2:31:56,  8.07s/it]
                                                      
{'loss': 1.2617, 'learning_rate': 0.0, 'epoch': 1.32}

 44%|████▍     | 893/2022 [1:56:17<2:31:56,  8.07s/it]
 44%|████▍     | 894/2022 [1:56:25<2:29:31,  7.95s/it]
                                                      
{'loss': 1.15, 'learning_rate': 0.0, 'epoch': 1.33}

 44%|████▍     | 894/2022 [1:56:25<2:29:31,  7.95s/it]
 44%|████▍     | 895/2022 [1:56:32<2:27:32,  7.86s/it]
                                                      
{'loss': 1.1662, 'learning_rate': 0.0, 'epoch': 1.33}

 44%|████▍     | 895/2022 [1:56:32<2:27:32,  7.86s/it]
 44%|████▍     | 896/2022 [1:56:40<2:27:47,  7.88s/it]
                                                      
{'loss': 1.2804, 'learning_rate': 0.0, 'epoch': 1.33}

 44%|████▍     | 896/2022 [1:56:40<2:27:47,  7.88s/it]
 44%|████▍     | 897/2022 [1:56:48<2:27:15,  7.85s/it]
                                                      
{'loss': 1.1993, 'learning_rate': 0.0, 'epoch': 1.33}

 44%|████▍     | 897/2022 [1:56:48<2:27:15,  7.85s/it]
 44%|████▍     | 898/2022 [1:56:56<2:26:20,  7.81s/it]
                                                      
{'loss': 1.0895, 'learning_rate': 0.0, 'epoch': 1.33}

 44%|████▍     | 898/2022 [1:56:56<2:26:20,  7.81s/it]
 44%|████▍     | 899/2022 [1:57:03<2:25:07,  7.75s/it]
                                                      
{'loss': 1.22, 'learning_rate': 0.0, 'epoch': 1.33}

 44%|████▍     | 899/2022 [1:57:03<2:25:07,  7.75s/it]
 45%|████▍     | 900/2022 [1:57:11<2:25:57,  7.81s/it]
                                                      
{'loss': 1.2553, 'learning_rate': 0.0, 'epoch': 1.33}

 45%|████▍     | 900/2022 [1:57:11<2:25:57,  7.81s/it]
 45%|████▍     | 901/2022 [1:57:19<2:26:26,  7.84s/it]
                                                      
{'loss': 1.2595, 'learning_rate': 0.0, 'epoch': 1.34}

 45%|████▍     | 901/2022 [1:57:19<2:26:26,  7.84s/it]
 45%|████▍     | 902/2022 [1:57:27<2:24:21,  7.73s/it]
                                                      
{'loss': 1.0736, 'learning_rate': 0.0, 'epoch': 1.34}

 45%|████▍     | 902/2022 [1:57:27<2:24:21,  7.73s/it]
 45%|████▍     | 903/2022 [1:57:34<2:23:32,  7.70s/it]
                                                      
{'loss': 1.3443, 'learning_rate': 0.0, 'epoch': 1.34}

 45%|████▍     | 903/2022 [1:57:34<2:23:32,  7.70s/it]
 45%|████▍     | 904/2022 [1:57:42<2:24:52,  7.78s/it]
                                                      
{'loss': 1.1545, 'learning_rate': 0.0, 'epoch': 1.34}

 45%|████▍     | 904/2022 [1:57:42<2:24:52,  7.78s/it]
 45%|████▍     | 905/2022 [1:57:50<2:24:41,  7.77s/it]
                                                      
{'loss': 1.1736, 'learning_rate': 0.0, 'epoch': 1.34}

 45%|████▍     | 905/2022 [1:57:50<2:24:41,  7.77s/it]
 45%|████▍     | 906/2022 [1:57:58<2:24:04,  7.75s/it]
                                                      
{'loss': 1.0311, 'learning_rate': 0.0, 'epoch': 1.34}

 45%|████▍     | 906/2022 [1:57:58<2:24:04,  7.75s/it]
 45%|████▍     | 907/2022 [1:58:06<2:24:39,  7.78s/it]
                                                      
{'loss': 1.052, 'learning_rate': 0.0, 'epoch': 1.34}

 45%|████▍     | 907/2022 [1:58:06<2:24:39,  7.78s/it]
 45%|████▍     | 908/2022 [1:58:14<2:26:56,  7.91s/it]
                                                      
{'loss': 1.2028, 'learning_rate': 0.0, 'epoch': 1.35}

 45%|████▍     | 908/2022 [1:58:14<2:26:56,  7.91s/it]
 45%|████▍     | 909/2022 [1:58:22<2:25:40,  7.85s/it]
                                                      
{'loss': 1.178, 'learning_rate': 0.0, 'epoch': 1.35}

 45%|████▍     | 909/2022 [1:58:22<2:25:40,  7.85s/it]
 45%|████▌     | 910/2022 [1:58:30<2:29:49,  8.08s/it]
                                                      
{'loss': 1.0918, 'learning_rate': 0.0, 'epoch': 1.35}

 45%|████▌     | 910/2022 [1:58:30<2:29:49,  8.08s/it]
 45%|████▌     | 911/2022 [1:58:38<2:27:00,  7.94s/it]
                                                      
{'loss': 1.1488, 'learning_rate': 0.0, 'epoch': 1.35}

 45%|████▌     | 911/2022 [1:58:38<2:27:00,  7.94s/it]
 45%|████▌     | 912/2022 [1:58:46<2:27:22,  7.97s/it]
                                                      
{'loss': 1.1175, 'learning_rate': 0.0, 'epoch': 1.35}

 45%|████▌     | 912/2022 [1:58:46<2:27:22,  7.97s/it]
 45%|████▌     | 913/2022 [1:58:54<2:25:51,  7.89s/it]
                                                      
{'loss': 1.1834, 'learning_rate': 0.0, 'epoch': 1.35}

 45%|████▌     | 913/2022 [1:58:54<2:25:51,  7.89s/it]
 45%|████▌     | 914/2022 [1:59:01<2:23:33,  7.77s/it]
                                                      
{'loss': 1.1666, 'learning_rate': 0.0, 'epoch': 1.36}

 45%|████▌     | 914/2022 [1:59:01<2:23:33,  7.77s/it]
 45%|████▌     | 915/2022 [1:59:09<2:21:54,  7.69s/it]
                                                      
{'loss': 1.1462, 'learning_rate': 0.0, 'epoch': 1.36}

 45%|████▌     | 915/2022 [1:59:09<2:21:54,  7.69s/it]
 45%|████▌     | 916/2022 [1:59:16<2:22:09,  7.71s/it]
                                                      
{'loss': 1.0384, 'learning_rate': 0.0, 'epoch': 1.36}

 45%|████▌     | 916/2022 [1:59:16<2:22:09,  7.71s/it]
 45%|████▌     | 917/2022 [1:59:25<2:25:28,  7.90s/it]
                                                      
{'loss': 1.2568, 'learning_rate': 0.0, 'epoch': 1.36}

 45%|████▌     | 917/2022 [1:59:25<2:25:28,  7.90s/it]
 45%|████▌     | 918/2022 [1:59:32<2:23:20,  7.79s/it]
                                                      
{'loss': 1.1507, 'learning_rate': 0.0, 'epoch': 1.36}

 45%|████▌     | 918/2022 [1:59:32<2:23:20,  7.79s/it]
 45%|████▌     | 919/2022 [1:59:40<2:23:23,  7.80s/it]
                                                      
{'loss': 1.3365, 'learning_rate': 0.0, 'epoch': 1.36}

 45%|████▌     | 919/2022 [1:59:40<2:23:23,  7.80s/it]
 45%|████▌     | 920/2022 [1:59:48<2:25:19,  7.91s/it]
                                                      
{'loss': 1.1422, 'learning_rate': 0.0, 'epoch': 1.36}

 45%|████▌     | 920/2022 [1:59:48<2:25:19,  7.91s/it]
 46%|████▌     | 921/2022 [1:59:56<2:25:38,  7.94s/it]
                                                      
{'loss': 1.2216, 'learning_rate': 0.0, 'epoch': 1.37}

 46%|████▌     | 921/2022 [1:59:56<2:25:38,  7.94s/it]
 46%|████▌     | 922/2022 [2:00:04<2:25:23,  7.93s/it]
                                                      
{'loss': 1.0753, 'learning_rate': 0.0, 'epoch': 1.37}

 46%|████▌     | 922/2022 [2:00:04<2:25:23,  7.93s/it]
 46%|████▌     | 923/2022 [2:00:12<2:23:21,  7.83s/it]
                                                      
{'loss': 1.2429, 'learning_rate': 0.0, 'epoch': 1.37}

 46%|████▌     | 923/2022 [2:00:12<2:23:21,  7.83s/it]
 46%|████▌     | 924/2022 [2:00:20<2:23:42,  7.85s/it]
                                                      
{'loss': 1.2413, 'learning_rate': 0.0, 'epoch': 1.37}

 46%|████▌     | 924/2022 [2:00:20<2:23:42,  7.85s/it]
 46%|████▌     | 925/2022 [2:00:28<2:24:23,  7.90s/it]
                                                      
{'loss': 1.1399, 'learning_rate': 0.0, 'epoch': 1.37}

 46%|████▌     | 925/2022 [2:00:28<2:24:23,  7.90s/it]
 46%|████▌     | 926/2022 [2:00:35<2:23:14,  7.84s/it]
                                                      
{'loss': 1.3108, 'learning_rate': 0.0, 'epoch': 1.37}

 46%|████▌     | 926/2022 [2:00:35<2:23:14,  7.84s/it]
 46%|████▌     | 927/2022 [2:00:43<2:23:47,  7.88s/it]
                                                      
{'loss': 1.2493, 'learning_rate': 0.0, 'epoch': 1.37}

 46%|████▌     | 927/2022 [2:00:43<2:23:47,  7.88s/it]
 46%|████▌     | 928/2022 [2:00:51<2:23:33,  7.87s/it]
                                                      
{'loss': 1.304, 'learning_rate': 0.0, 'epoch': 1.38}

 46%|████▌     | 928/2022 [2:00:51<2:23:33,  7.87s/it]
 46%|████▌     | 929/2022 [2:00:59<2:24:41,  7.94s/it]
                                                      
{'loss': 1.1225, 'learning_rate': 0.0, 'epoch': 1.38}

 46%|████▌     | 929/2022 [2:00:59<2:24:41,  7.94s/it]
 46%|████▌     | 930/2022 [2:01:07<2:23:18,  7.87s/it]
                                                      
{'loss': 1.2224, 'learning_rate': 0.0, 'epoch': 1.38}

 46%|████▌     | 930/2022 [2:01:07<2:23:18,  7.87s/it]
 46%|████▌     | 931/2022 [2:01:15<2:22:33,  7.84s/it]
                                                      
{'loss': 1.0136, 'learning_rate': 0.0, 'epoch': 1.38}

 46%|████▌     | 931/2022 [2:01:15<2:22:33,  7.84s/it]
 46%|████▌     | 932/2022 [2:01:23<2:22:11,  7.83s/it]
                                                      
{'loss': 1.1679, 'learning_rate': 0.0, 'epoch': 1.38}

 46%|████▌     | 932/2022 [2:01:23<2:22:11,  7.83s/it]
 46%|████▌     | 933/2022 [2:01:30<2:22:07,  7.83s/it]
                                                      
{'loss': 1.1928, 'learning_rate': 0.0, 'epoch': 1.38}

 46%|████▌     | 933/2022 [2:01:30<2:22:07,  7.83s/it]
 46%|████▌     | 934/2022 [2:01:38<2:21:36,  7.81s/it]
                                                      
{'loss': 1.2563, 'learning_rate': 0.0, 'epoch': 1.38}

 46%|████▌     | 934/2022 [2:01:38<2:21:36,  7.81s/it]
 46%|████▌     | 935/2022 [2:01:46<2:21:19,  7.80s/it]
                                                      
{'loss': 1.1001, 'learning_rate': 0.0, 'epoch': 1.39}

 46%|████▌     | 935/2022 [2:01:46<2:21:19,  7.80s/it]
 46%|████▋     | 936/2022 [2:01:54<2:20:41,  7.77s/it]
                                                      
{'loss': 1.1044, 'learning_rate': 0.0, 'epoch': 1.39}

 46%|████▋     | 936/2022 [2:01:54<2:20:41,  7.77s/it]
 46%|████▋     | 937/2022 [2:02:01<2:20:31,  7.77s/it]
                                                      
{'loss': 1.227, 'learning_rate': 0.0, 'epoch': 1.39}

 46%|████▋     | 937/2022 [2:02:01<2:20:31,  7.77s/it]
 46%|████▋     | 938/2022 [2:02:09<2:19:25,  7.72s/it]
                                                      
{'loss': 1.1875, 'learning_rate': 0.0, 'epoch': 1.39}

 46%|████▋     | 938/2022 [2:02:09<2:19:25,  7.72s/it]
 46%|████▋     | 939/2022 [2:02:17<2:19:28,  7.73s/it]
                                                      
{'loss': 1.2083, 'learning_rate': 0.0, 'epoch': 1.39}

 46%|████▋     | 939/2022 [2:02:17<2:19:28,  7.73s/it]
 46%|████▋     | 940/2022 [2:02:25<2:20:16,  7.78s/it]
                                                      
{'loss': 1.1319, 'learning_rate': 0.0, 'epoch': 1.39}

 46%|████▋     | 940/2022 [2:02:25<2:20:16,  7.78s/it]
 47%|████▋     | 941/2022 [2:02:33<2:21:35,  7.86s/it]
                                                      
{'loss': 1.1911, 'learning_rate': 0.0, 'epoch': 1.4}

 47%|████▋     | 941/2022 [2:02:33<2:21:35,  7.86s/it]
 47%|████▋     | 942/2022 [2:02:40<2:19:03,  7.73s/it]
                                                      
{'loss': 1.0888, 'learning_rate': 0.0, 'epoch': 1.4}

 47%|████▋     | 942/2022 [2:02:40<2:19:03,  7.73s/it]
 47%|████▋     | 943/2022 [2:02:48<2:21:49,  7.89s/it]
                                                      
{'loss': 1.1042, 'learning_rate': 0.0, 'epoch': 1.4}

 47%|████▋     | 943/2022 [2:02:48<2:21:49,  7.89s/it]
 47%|████▋     | 944/2022 [2:02:56<2:21:30,  7.88s/it]
                                                      
{'loss': 1.1748, 'learning_rate': 0.0, 'epoch': 1.4}

 47%|████▋     | 944/2022 [2:02:56<2:21:30,  7.88s/it]
 47%|████▋     | 945/2022 [2:03:04<2:20:58,  7.85s/it]
                                                      
{'loss': 1.1719, 'learning_rate': 0.0, 'epoch': 1.4}

 47%|████▋     | 945/2022 [2:03:04<2:20:58,  7.85s/it]
 47%|████▋     | 946/2022 [2:03:12<2:22:43,  7.96s/it]
                                                      
{'loss': 1.104, 'learning_rate': 0.0, 'epoch': 1.4}

 47%|████▋     | 946/2022 [2:03:12<2:22:43,  7.96s/it]
 47%|████▋     | 947/2022 [2:03:20<2:22:12,  7.94s/it]
                                                      
{'loss': 1.1712, 'learning_rate': 0.0, 'epoch': 1.4}

 47%|████▋     | 947/2022 [2:03:20<2:22:12,  7.94s/it]
 47%|████▋     | 948/2022 [2:03:28<2:19:20,  7.78s/it]
                                                      
{'loss': 1.0886, 'learning_rate': 0.0, 'epoch': 1.41}

 47%|████▋     | 948/2022 [2:03:28<2:19:20,  7.78s/it]
 47%|████▋     | 949/2022 [2:03:35<2:19:43,  7.81s/it]
                                                      
{'loss': 1.1361, 'learning_rate': 0.0, 'epoch': 1.41}

 47%|████▋     | 949/2022 [2:03:35<2:19:43,  7.81s/it]
 47%|████▋     | 950/2022 [2:03:43<2:19:14,  7.79s/it]
                                                      
{'loss': 1.2218, 'learning_rate': 0.0, 'epoch': 1.41}

 47%|████▋     | 950/2022 [2:03:43<2:19:14,  7.79s/it]
 47%|████▋     | 951/2022 [2:03:51<2:19:01,  7.79s/it]
                                                      
{'loss': 1.2805, 'learning_rate': 0.0, 'epoch': 1.41}

 47%|████▋     | 951/2022 [2:03:51<2:19:01,  7.79s/it]
 47%|████▋     | 952/2022 [2:03:59<2:18:48,  7.78s/it]
                                                      
{'loss': 1.1129, 'learning_rate': 0.0, 'epoch': 1.41}

 47%|████▋     | 952/2022 [2:03:59<2:18:48,  7.78s/it]
 47%|████▋     | 953/2022 [2:04:06<2:17:02,  7.69s/it]
                                                      
{'loss': 1.0745, 'learning_rate': 0.0, 'epoch': 1.41}

 47%|████▋     | 953/2022 [2:04:06<2:17:02,  7.69s/it]
 47%|████▋     | 954/2022 [2:04:14<2:16:06,  7.65s/it]
                                                      
{'loss': 1.1911, 'learning_rate': 0.0, 'epoch': 1.41}

 47%|████▋     | 954/2022 [2:04:14<2:16:06,  7.65s/it]
 47%|████▋     | 955/2022 [2:04:21<2:15:53,  7.64s/it]
                                                      
{'loss': 1.1298, 'learning_rate': 0.0, 'epoch': 1.42}

 47%|████▋     | 955/2022 [2:04:21<2:15:53,  7.64s/it]
 47%|████▋     | 956/2022 [2:04:29<2:15:19,  7.62s/it]
                                                      
{'loss': 1.1631, 'learning_rate': 0.0, 'epoch': 1.42}

 47%|████▋     | 956/2022 [2:04:29<2:15:19,  7.62s/it]
 47%|████▋     | 957/2022 [2:04:36<2:14:48,  7.59s/it]
                                                      
{'loss': 1.1724, 'learning_rate': 0.0, 'epoch': 1.42}

 47%|████▋     | 957/2022 [2:04:36<2:14:48,  7.59s/it]
 47%|████▋     | 958/2022 [2:04:44<2:14:53,  7.61s/it]
                                                      
{'loss': 1.1043, 'learning_rate': 0.0, 'epoch': 1.42}

 47%|████▋     | 958/2022 [2:04:44<2:14:53,  7.61s/it]
 47%|████▋     | 959/2022 [2:04:52<2:18:40,  7.83s/it]
                                                      
{'loss': 1.0766, 'learning_rate': 0.0, 'epoch': 1.42}

 47%|████▋     | 959/2022 [2:04:52<2:18:40,  7.83s/it]
 47%|████▋     | 960/2022 [2:05:00<2:16:36,  7.72s/it]
                                                      
{'loss': 1.0508, 'learning_rate': 0.0, 'epoch': 1.42}

 47%|████▋     | 960/2022 [2:05:00<2:16:36,  7.72s/it]
 48%|████▊     | 961/2022 [2:05:08<2:18:38,  7.84s/it]
                                                      
{'loss': 1.1585, 'learning_rate': 0.0, 'epoch': 1.42}

 48%|████▊     | 961/2022 [2:05:08<2:18:38,  7.84s/it]
 48%|████▊     | 962/2022 [2:05:16<2:18:17,  7.83s/it]
                                                      
{'loss': 1.2481, 'learning_rate': 0.0, 'epoch': 1.43}

 48%|████▊     | 962/2022 [2:05:16<2:18:17,  7.83s/it]
 48%|████▊     | 963/2022 [2:05:24<2:19:16,  7.89s/it]
                                                      
{'loss': 1.1422, 'learning_rate': 0.0, 'epoch': 1.43}

 48%|████▊     | 963/2022 [2:05:24<2:19:16,  7.89s/it]
 48%|████▊     | 964/2022 [2:05:32<2:18:00,  7.83s/it]
                                                      
{'loss': 1.1806, 'learning_rate': 0.0, 'epoch': 1.43}

 48%|████▊     | 964/2022 [2:05:32<2:18:00,  7.83s/it]
 48%|████▊     | 965/2022 [2:05:39<2:15:51,  7.71s/it]
                                                      
{'loss': 1.1648, 'learning_rate': 0.0, 'epoch': 1.43}

 48%|████▊     | 965/2022 [2:05:39<2:15:51,  7.71s/it]
 48%|████▊     | 966/2022 [2:05:47<2:17:01,  7.79s/it]
                                                      
{'loss': 1.1573, 'learning_rate': 0.0, 'epoch': 1.43}

 48%|████▊     | 966/2022 [2:05:47<2:17:01,  7.79s/it]
 48%|████▊     | 967/2022 [2:05:55<2:18:15,  7.86s/it]
                                                      
{'loss': 1.0322, 'learning_rate': 0.0, 'epoch': 1.43}

 48%|████▊     | 967/2022 [2:05:55<2:18:15,  7.86s/it]
 48%|████▊     | 968/2022 [2:06:03<2:17:43,  7.84s/it]
                                                      
{'loss': 1.1561, 'learning_rate': 0.0, 'epoch': 1.44}

 48%|████▊     | 968/2022 [2:06:03<2:17:43,  7.84s/it]
 48%|████▊     | 969/2022 [2:06:10<2:17:01,  7.81s/it]
                                                      
{'loss': 1.1301, 'learning_rate': 0.0, 'epoch': 1.44}

 48%|████▊     | 969/2022 [2:06:10<2:17:01,  7.81s/it]
 48%|████▊     | 970/2022 [2:06:18<2:16:41,  7.80s/it]
                                                      
{'loss': 1.1357, 'learning_rate': 0.0, 'epoch': 1.44}

 48%|████▊     | 970/2022 [2:06:18<2:16:41,  7.80s/it]
 48%|████▊     | 971/2022 [2:06:26<2:18:39,  7.92s/it]
                                                      
{'loss': 1.1494, 'learning_rate': 0.0, 'epoch': 1.44}

 48%|████▊     | 971/2022 [2:06:26<2:18:39,  7.92s/it]
 48%|████▊     | 972/2022 [2:06:34<2:18:14,  7.90s/it]
                                                      
{'loss': 1.2409, 'learning_rate': 0.0, 'epoch': 1.44}

 48%|████▊     | 972/2022 [2:06:34<2:18:14,  7.90s/it]
 48%|████▊     | 973/2022 [2:06:42<2:16:24,  7.80s/it]
                                                      
{'loss': 1.1772, 'learning_rate': 0.0, 'epoch': 1.44}

 48%|████▊     | 973/2022 [2:06:42<2:16:24,  7.80s/it]
 48%|████▊     | 974/2022 [2:06:50<2:15:40,  7.77s/it]
                                                      
{'loss': 1.207, 'learning_rate': 0.0, 'epoch': 1.44}

 48%|████▊     | 974/2022 [2:06:50<2:15:40,  7.77s/it]
 48%|████▊     | 975/2022 [2:06:58<2:16:55,  7.85s/it]
                                                      
{'loss': 1.1643, 'learning_rate': 0.0, 'epoch': 1.45}

 48%|████▊     | 975/2022 [2:06:58<2:16:55,  7.85s/it]
 48%|████▊     | 976/2022 [2:07:05<2:16:31,  7.83s/it]
                                                      
{'loss': 1.0714, 'learning_rate': 0.0, 'epoch': 1.45}

 48%|████▊     | 976/2022 [2:07:05<2:16:31,  7.83s/it]
 48%|████▊     | 977/2022 [2:07:13<2:14:03,  7.70s/it]
                                                      
{'loss': 1.2376, 'learning_rate': 0.0, 'epoch': 1.45}

 48%|████▊     | 977/2022 [2:07:13<2:14:03,  7.70s/it]
 48%|████▊     | 978/2022 [2:07:21<2:15:13,  7.77s/it]
                                                      
{'loss': 1.1632, 'learning_rate': 0.0, 'epoch': 1.45}

 48%|████▊     | 978/2022 [2:07:21<2:15:13,  7.77s/it]
 48%|████▊     | 979/2022 [2:07:28<2:14:07,  7.72s/it]
                                                      
{'loss': 1.2076, 'learning_rate': 0.0, 'epoch': 1.45}

 48%|████▊     | 979/2022 [2:07:28<2:14:07,  7.72s/it]
 48%|████▊     | 980/2022 [2:07:36<2:12:59,  7.66s/it]
                                                      
{'loss': 1.2535, 'learning_rate': 0.0, 'epoch': 1.45}

 48%|████▊     | 980/2022 [2:07:36<2:12:59,  7.66s/it]
 49%|████▊     | 981/2022 [2:07:43<2:10:57,  7.55s/it]
                                                      
{'loss': 1.1825, 'learning_rate': 0.0, 'epoch': 1.45}

 49%|████▊     | 981/2022 [2:07:43<2:10:57,  7.55s/it]
 49%|████▊     | 982/2022 [2:07:51<2:10:44,  7.54s/it]
                                                      
{'loss': 1.059, 'learning_rate': 0.0, 'epoch': 1.46}

 49%|████▊     | 982/2022 [2:07:51<2:10:44,  7.54s/it]
 49%|████▊     | 983/2022 [2:07:58<2:10:11,  7.52s/it]
                                                      
{'loss': 1.1329, 'learning_rate': 0.0, 'epoch': 1.46}

 49%|████▊     | 983/2022 [2:07:58<2:10:11,  7.52s/it]
 49%|████▊     | 984/2022 [2:08:06<2:11:25,  7.60s/it]
                                                      
{'loss': 1.1252, 'learning_rate': 0.0, 'epoch': 1.46}

 49%|████▊     | 984/2022 [2:08:06<2:11:25,  7.60s/it]
 49%|████▊     | 985/2022 [2:08:14<2:11:48,  7.63s/it]
                                                      
{'loss': 1.1933, 'learning_rate': 0.0, 'epoch': 1.46}

 49%|████▊     | 985/2022 [2:08:14<2:11:48,  7.63s/it]
 49%|████▉     | 986/2022 [2:08:21<2:11:45,  7.63s/it]
                                                      
{'loss': 1.2911, 'learning_rate': 0.0, 'epoch': 1.46}

 49%|████▉     | 986/2022 [2:08:21<2:11:45,  7.63s/it]
 49%|████▉     | 987/2022 [2:08:29<2:13:32,  7.74s/it]
                                                      
{'loss': 1.3384, 'learning_rate': 0.0, 'epoch': 1.46}

 49%|████▉     | 987/2022 [2:08:29<2:13:32,  7.74s/it]
 49%|████▉     | 988/2022 [2:08:37<2:11:38,  7.64s/it]
                                                      
{'loss': 1.1604, 'learning_rate': 0.0, 'epoch': 1.46}

 49%|████▉     | 988/2022 [2:08:37<2:11:38,  7.64s/it]
 49%|████▉     | 989/2022 [2:08:44<2:11:24,  7.63s/it]
                                                      
{'loss': 1.1868, 'learning_rate': 0.0, 'epoch': 1.47}

 49%|████▉     | 989/2022 [2:08:44<2:11:24,  7.63s/it]
 49%|████▉     | 990/2022 [2:08:52<2:14:20,  7.81s/it]
                                                      
{'loss': 1.2752, 'learning_rate': 0.0, 'epoch': 1.47}

 49%|████▉     | 990/2022 [2:08:53<2:14:20,  7.81s/it]
 49%|████▉     | 991/2022 [2:09:00<2:15:05,  7.86s/it]
                                                      
{'loss': 1.1823, 'learning_rate': 0.0, 'epoch': 1.47}

 49%|████▉     | 991/2022 [2:09:00<2:15:05,  7.86s/it]
 49%|████▉     | 992/2022 [2:09:08<2:11:57,  7.69s/it]
                                                      
{'loss': 1.1616, 'learning_rate': 0.0, 'epoch': 1.47}

 49%|████▉     | 992/2022 [2:09:08<2:11:57,  7.69s/it]
 49%|████▉     | 993/2022 [2:09:16<2:14:55,  7.87s/it]
                                                      
{'loss': 1.1143, 'learning_rate': 0.0, 'epoch': 1.47}

 49%|████▉     | 993/2022 [2:09:16<2:14:55,  7.87s/it]
 49%|████▉     | 994/2022 [2:09:24<2:12:57,  7.76s/it]
                                                      
{'loss': 1.1515, 'learning_rate': 0.0, 'epoch': 1.47}

 49%|████▉     | 994/2022 [2:09:24<2:12:57,  7.76s/it]
 49%|████▉     | 995/2022 [2:09:32<2:14:13,  7.84s/it]
                                                      
{'loss': 1.1613, 'learning_rate': 0.0, 'epoch': 1.48}

 49%|████▉     | 995/2022 [2:09:32<2:14:13,  7.84s/it]
 49%|████▉     | 996/2022 [2:09:39<2:12:38,  7.76s/it]
                                                      
{'loss': 1.2224, 'learning_rate': 0.0, 'epoch': 1.48}

 49%|████▉     | 996/2022 [2:09:39<2:12:38,  7.76s/it]
 49%|████▉     | 997/2022 [2:09:47<2:11:11,  7.68s/it]
                                                      
{'loss': 1.2155, 'learning_rate': 0.0, 'epoch': 1.48}

 49%|████▉     | 997/2022 [2:09:47<2:11:11,  7.68s/it]
 49%|████▉     | 998/2022 [2:09:55<2:12:33,  7.77s/it]
                                                      
{'loss': 1.0216, 'learning_rate': 0.0, 'epoch': 1.48}

 49%|████▉     | 998/2022 [2:09:55<2:12:33,  7.77s/it]
 49%|████▉     | 999/2022 [2:10:02<2:12:43,  7.78s/it]
                                                      
{'loss': 1.3144, 'learning_rate': 0.0, 'epoch': 1.48}

 49%|████▉     | 999/2022 [2:10:02<2:12:43,  7.78s/it]
 49%|████▉     | 1000/2022 [2:10:11<2:14:24,  7.89s/it]
                                                       
{'loss': 1.1072, 'learning_rate': 0.0, 'epoch': 1.48}

 49%|████▉     | 1000/2022 [2:10:11<2:14:24,  7.89s/it]
 50%|████▉     | 1001/2022 [2:10:18<2:13:33,  7.85s/it]
                                                       
{'loss': 1.1306, 'learning_rate': 0.0, 'epoch': 1.48}

 50%|████▉     | 1001/2022 [2:10:18<2:13:33,  7.85s/it]
 50%|████▉     | 1002/2022 [2:10:26<2:11:33,  7.74s/it]
                                                       
{'loss': 1.1365, 'learning_rate': 0.0, 'epoch': 1.49}

 50%|████▉     | 1002/2022 [2:10:26<2:11:33,  7.74s/it]
 50%|████▉     | 1003/2022 [2:10:33<2:11:08,  7.72s/it]
                                                       
{'loss': 1.087, 'learning_rate': 0.0, 'epoch': 1.49}

 50%|████▉     | 1003/2022 [2:10:34<2:11:08,  7.72s/it]
 50%|████▉     | 1004/2022 [2:10:41<2:10:39,  7.70s/it]
                                                       
{'loss': 1.1236, 'learning_rate': 0.0, 'epoch': 1.49}

 50%|████▉     | 1004/2022 [2:10:41<2:10:39,  7.70s/it]
 50%|████▉     | 1005/2022 [2:10:49<2:10:33,  7.70s/it]
                                                       
{'loss': 1.1357, 'learning_rate': 0.0, 'epoch': 1.49}

 50%|████▉     | 1005/2022 [2:10:49<2:10:33,  7.70s/it]
 50%|████▉     | 1006/2022 [2:10:57<2:10:22,  7.70s/it]
                                                       
{'loss': 1.1286, 'learning_rate': 0.0, 'epoch': 1.49}

 50%|████▉     | 1006/2022 [2:10:57<2:10:22,  7.70s/it]
 50%|████▉     | 1007/2022 [2:11:04<2:11:29,  7.77s/it]
                                                       
{'loss': 1.174, 'learning_rate': 0.0, 'epoch': 1.49}

 50%|████▉     | 1007/2022 [2:11:04<2:11:29,  7.77s/it]
 50%|████▉     | 1008/2022 [2:11:12<2:11:08,  7.76s/it]
                                                       
{'loss': 1.2927, 'learning_rate': 0.0, 'epoch': 1.49}

 50%|████▉     | 1008/2022 [2:11:12<2:11:08,  7.76s/it]
 50%|████▉     | 1009/2022 [2:11:20<2:09:59,  7.70s/it]
                                                       
{'loss': 1.1926, 'learning_rate': 0.0, 'epoch': 1.5}

 50%|████▉     | 1009/2022 [2:11:20<2:09:59,  7.70s/it]
 50%|████▉     | 1010/2022 [2:11:28<2:10:32,  7.74s/it]
                                                       
{'loss': 1.0786, 'learning_rate': 0.0, 'epoch': 1.5}

 50%|████▉     | 1010/2022 [2:11:28<2:10:32,  7.74s/it]
 50%|█████     | 1011/2022 [2:11:36<2:15:01,  8.01s/it]
                                                       
{'loss': 1.1901, 'learning_rate': 0.0, 'epoch': 1.5}

 50%|█████     | 1011/2022 [2:11:36<2:15:01,  8.01s/it]
 50%|█████     | 1012/2022 [2:11:44<2:12:23,  7.86s/it]
                                                       
{'loss': 1.2089, 'learning_rate': 0.0, 'epoch': 1.5}

 50%|█████     | 1012/2022 [2:11:44<2:12:23,  7.86s/it]
 50%|█████     | 1013/2022 [2:11:52<2:13:07,  7.92s/it]
                                                       
{'loss': 1.2187, 'learning_rate': 0.0, 'epoch': 1.5}

 50%|█████     | 1013/2022 [2:11:52<2:13:07,  7.92s/it]
 50%|█████     | 1014/2022 [2:11:59<2:10:49,  7.79s/it]
                                                       
{'loss': 1.1466, 'learning_rate': 0.0, 'epoch': 1.5}

 50%|█████     | 1014/2022 [2:11:59<2:10:49,  7.79s/it]
 50%|█████     | 1015/2022 [2:12:07<2:09:58,  7.74s/it]
                                                       
{'loss': 1.2884, 'learning_rate': 0.0, 'epoch': 1.5}

 50%|█████     | 1015/2022 [2:12:07<2:09:58,  7.74s/it]
 50%|█████     | 1016/2022 [2:12:15<2:10:11,  7.76s/it]
                                                       
{'loss': 1.1943, 'learning_rate': 0.0, 'epoch': 1.51}

 50%|█████     | 1016/2022 [2:12:15<2:10:11,  7.76s/it]
 50%|█████     | 1017/2022 [2:12:23<2:10:38,  7.80s/it]
                                                       
{'loss': 1.187, 'learning_rate': 0.0, 'epoch': 1.51}

 50%|█████     | 1017/2022 [2:12:23<2:10:38,  7.80s/it]
 50%|█████     | 1018/2022 [2:12:31<2:11:15,  7.84s/it]
                                                       
{'loss': 1.1481, 'learning_rate': 0.0, 'epoch': 1.51}

 50%|█████     | 1018/2022 [2:12:31<2:11:15,  7.84s/it]
 50%|█████     | 1019/2022 [2:12:38<2:09:49,  7.77s/it]
                                                       
{'loss': 1.2777, 'learning_rate': 0.0, 'epoch': 1.51}

 50%|█████     | 1019/2022 [2:12:38<2:09:49,  7.77s/it]
 50%|█████     | 1020/2022 [2:12:46<2:10:04,  7.79s/it]
                                                       
{'loss': 1.0634, 'learning_rate': 0.0, 'epoch': 1.51}

 50%|█████     | 1020/2022 [2:12:46<2:10:04,  7.79s/it]
 50%|█████     | 1021/2022 [2:12:54<2:09:38,  7.77s/it]
                                                       
{'loss': 1.2767, 'learning_rate': 0.0, 'epoch': 1.51}

 50%|█████     | 1021/2022 [2:12:54<2:09:38,  7.77s/it]
 51%|█████     | 1022/2022 [2:13:01<2:08:13,  7.69s/it]
                                                       
{'loss': 1.1573, 'learning_rate': 0.0, 'epoch': 1.52}

 51%|█████     | 1022/2022 [2:13:01<2:08:13,  7.69s/it]
 51%|█████     | 1023/2022 [2:13:09<2:10:22,  7.83s/it]
                                                       
{'loss': 1.206, 'learning_rate': 0.0, 'epoch': 1.52}

 51%|█████     | 1023/2022 [2:13:09<2:10:22,  7.83s/it]
 51%|█████     | 1024/2022 [2:13:17<2:09:29,  7.78s/it]
                                                       
{'loss': 1.1372, 'learning_rate': 0.0, 'epoch': 1.52}

 51%|█████     | 1024/2022 [2:13:17<2:09:29,  7.78s/it]
 51%|█████     | 1025/2022 [2:13:25<2:10:03,  7.83s/it]
                                                       
{'loss': 1.2322, 'learning_rate': 0.0, 'epoch': 1.52}

 51%|█████     | 1025/2022 [2:13:25<2:10:03,  7.83s/it]
 51%|█████     | 1026/2022 [2:13:33<2:09:04,  7.78s/it]
                                                       
{'loss': 1.1271, 'learning_rate': 0.0, 'epoch': 1.52}

 51%|█████     | 1026/2022 [2:13:33<2:09:04,  7.78s/it]
 51%|█████     | 1027/2022 [2:13:41<2:10:14,  7.85s/it]
                                                       
{'loss': 1.1446, 'learning_rate': 0.0, 'epoch': 1.52}

 51%|█████     | 1027/2022 [2:13:41<2:10:14,  7.85s/it]
 51%|█████     | 1028/2022 [2:13:48<2:09:46,  7.83s/it]
                                                       
{'loss': 1.1979, 'learning_rate': 0.0, 'epoch': 1.52}

 51%|█████     | 1028/2022 [2:13:48<2:09:46,  7.83s/it]
 51%|█████     | 1029/2022 [2:13:56<2:09:10,  7.81s/it]
                                                       
{'loss': 1.2867, 'learning_rate': 0.0, 'epoch': 1.53}

 51%|█████     | 1029/2022 [2:13:56<2:09:10,  7.81s/it]
 51%|█████     | 1030/2022 [2:14:04<2:09:06,  7.81s/it]
                                                       
{'loss': 1.0479, 'learning_rate': 0.0, 'epoch': 1.53}

 51%|█████     | 1030/2022 [2:14:04<2:09:06,  7.81s/it]
 51%|█████     | 1031/2022 [2:14:11<2:07:06,  7.70s/it]
                                                       
{'loss': 1.2097, 'learning_rate': 0.0, 'epoch': 1.53}

 51%|█████     | 1031/2022 [2:14:11<2:07:06,  7.70s/it]
 51%|█████     | 1032/2022 [2:14:19<2:08:16,  7.77s/it]
                                                       
{'loss': 0.9857, 'learning_rate': 0.0, 'epoch': 1.53}

 51%|█████     | 1032/2022 [2:14:19<2:08:16,  7.77s/it]
 51%|█████     | 1033/2022 [2:14:27<2:07:58,  7.76s/it]
                                                       
{'loss': 1.153, 'learning_rate': 0.0, 'epoch': 1.53}

 51%|█████     | 1033/2022 [2:14:27<2:07:58,  7.76s/it]
 51%|█████     | 1034/2022 [2:14:35<2:06:46,  7.70s/it]
                                                       
{'loss': 1.1501, 'learning_rate': 0.0, 'epoch': 1.53}

 51%|█████     | 1034/2022 [2:14:35<2:06:46,  7.70s/it]
 51%|█████     | 1035/2022 [2:14:42<2:06:21,  7.68s/it]
                                                       
{'loss': 1.1357, 'learning_rate': 0.0, 'epoch': 1.53}

 51%|█████     | 1035/2022 [2:14:42<2:06:21,  7.68s/it]
 51%|█████     | 1036/2022 [2:14:50<2:06:02,  7.67s/it]
                                                       
{'loss': 1.1521, 'learning_rate': 0.0, 'epoch': 1.54}

 51%|█████     | 1036/2022 [2:14:50<2:06:02,  7.67s/it]
 51%|█████▏    | 1037/2022 [2:14:58<2:05:57,  7.67s/it]
                                                       
{'loss': 1.173, 'learning_rate': 0.0, 'epoch': 1.54}

 51%|█████▏    | 1037/2022 [2:14:58<2:05:57,  7.67s/it]
 51%|█████▏    | 1038/2022 [2:15:06<2:06:41,  7.73s/it]
                                                       
{'loss': 1.0836, 'learning_rate': 0.0, 'epoch': 1.54}

 51%|█████▏    | 1038/2022 [2:15:06<2:06:41,  7.73s/it]
 51%|█████▏    | 1039/2022 [2:15:13<2:06:32,  7.72s/it]
                                                       
{'loss': 1.2574, 'learning_rate': 0.0, 'epoch': 1.54}

 51%|█████▏    | 1039/2022 [2:15:13<2:06:32,  7.72s/it]
 51%|█████▏    | 1040/2022 [2:15:21<2:06:45,  7.74s/it]
                                                       
{'loss': 1.1842, 'learning_rate': 0.0, 'epoch': 1.54}

 51%|█████▏    | 1040/2022 [2:15:21<2:06:45,  7.74s/it]
 51%|█████▏    | 1041/2022 [2:15:29<2:06:23,  7.73s/it]
                                                       
{'loss': 1.3091, 'learning_rate': 0.0, 'epoch': 1.54}

 51%|█████▏    | 1041/2022 [2:15:29<2:06:23,  7.73s/it]
 52%|█████▏    | 1042/2022 [2:15:37<2:06:44,  7.76s/it]
                                                       
{'loss': 1.0448, 'learning_rate': 0.0, 'epoch': 1.54}

 52%|█████▏    | 1042/2022 [2:15:37<2:06:44,  7.76s/it]
 52%|█████▏    | 1043/2022 [2:15:45<2:08:01,  7.85s/it]
                                                       
{'loss': 1.163, 'learning_rate': 0.0, 'epoch': 1.55}

 52%|█████▏    | 1043/2022 [2:15:45<2:08:01,  7.85s/it]
 52%|█████▏    | 1044/2022 [2:15:52<2:07:41,  7.83s/it]
                                                       
{'loss': 1.1469, 'learning_rate': 0.0, 'epoch': 1.55}

 52%|█████▏    | 1044/2022 [2:15:52<2:07:41,  7.83s/it]
 52%|█████▏    | 1045/2022 [2:16:00<2:07:05,  7.81s/it]
                                                       
{'loss': 1.19, 'learning_rate': 0.0, 'epoch': 1.55}

 52%|█████▏    | 1045/2022 [2:16:00<2:07:05,  7.81s/it]
 52%|█████▏    | 1046/2022 [2:16:08<2:06:01,  7.75s/it]
                                                       
{'loss': 1.1247, 'learning_rate': 0.0, 'epoch': 1.55}

 52%|█████▏    | 1046/2022 [2:16:08<2:06:01,  7.75s/it]
 52%|█████▏    | 1047/2022 [2:16:15<2:05:38,  7.73s/it]
                                                       
{'loss': 1.1355, 'learning_rate': 0.0, 'epoch': 1.55}

 52%|█████▏    | 1047/2022 [2:16:15<2:05:38,  7.73s/it]
 52%|█████▏    | 1048/2022 [2:16:23<2:05:28,  7.73s/it]
                                                       
{'loss': 1.1199, 'learning_rate': 0.0, 'epoch': 1.55}

 52%|█████▏    | 1048/2022 [2:16:23<2:05:28,  7.73s/it]
 52%|█████▏    | 1049/2022 [2:16:31<2:06:39,  7.81s/it]
                                                       
{'loss': 1.1817, 'learning_rate': 0.0, 'epoch': 1.56}

 52%|█████▏    | 1049/2022 [2:16:31<2:06:39,  7.81s/it]
 52%|█████▏    | 1050/2022 [2:16:39<2:08:09,  7.91s/it]
                                                       
{'loss': 1.2482, 'learning_rate': 0.0, 'epoch': 1.56}

 52%|█████▏    | 1050/2022 [2:16:39<2:08:09,  7.91s/it]
 52%|█████▏    | 1051/2022 [2:16:47<2:07:28,  7.88s/it]
                                                       
{'loss': 1.1984, 'learning_rate': 0.0, 'epoch': 1.56}

 52%|█████▏    | 1051/2022 [2:16:47<2:07:28,  7.88s/it]
 52%|█████▏    | 1052/2022 [2:16:55<2:06:31,  7.83s/it]
                                                       
{'loss': 1.178, 'learning_rate': 0.0, 'epoch': 1.56}

 52%|█████▏    | 1052/2022 [2:16:55<2:06:31,  7.83s/it]
 52%|█████▏    | 1053/2022 [2:17:02<2:04:59,  7.74s/it]
                                                       
{'loss': 1.1076, 'learning_rate': 0.0, 'epoch': 1.56}

 52%|█████▏    | 1053/2022 [2:17:02<2:04:59,  7.74s/it]
 52%|█████▏    | 1054/2022 [2:17:10<2:05:55,  7.81s/it]
                                                       
{'loss': 1.0984, 'learning_rate': 0.0, 'epoch': 1.56}

 52%|█████▏    | 1054/2022 [2:17:10<2:05:55,  7.81s/it]
 52%|█████▏    | 1055/2022 [2:17:18<2:05:20,  7.78s/it]
                                                       
{'loss': 1.2184, 'learning_rate': 0.0, 'epoch': 1.56}

 52%|█████▏    | 1055/2022 [2:17:18<2:05:20,  7.78s/it]
 52%|█████▏    | 1056/2022 [2:17:26<2:05:18,  7.78s/it]
                                                       
{'loss': 1.1015, 'learning_rate': 0.0, 'epoch': 1.57}

 52%|█████▏    | 1056/2022 [2:17:26<2:05:18,  7.78s/it]
 52%|█████▏    | 1057/2022 [2:17:34<2:06:05,  7.84s/it]
                                                       
{'loss': 1.1859, 'learning_rate': 0.0, 'epoch': 1.57}

 52%|█████▏    | 1057/2022 [2:17:34<2:06:05,  7.84s/it]
 52%|█████▏    | 1058/2022 [2:17:42<2:07:35,  7.94s/it]
                                                       
{'loss': 1.1182, 'learning_rate': 0.0, 'epoch': 1.57}

 52%|█████▏    | 1058/2022 [2:17:42<2:07:35,  7.94s/it]
 52%|█████▏    | 1059/2022 [2:17:50<2:06:22,  7.87s/it]
                                                       
{'loss': 1.1861, 'learning_rate': 0.0, 'epoch': 1.57}

 52%|█████▏    | 1059/2022 [2:17:50<2:06:22,  7.87s/it]
 52%|█████▏    | 1060/2022 [2:17:58<2:07:22,  7.94s/it]
                                                       
{'loss': 1.1972, 'learning_rate': 0.0, 'epoch': 1.57}

 52%|█████▏    | 1060/2022 [2:17:58<2:07:22,  7.94s/it]
 52%|█████▏    | 1061/2022 [2:18:05<2:05:29,  7.84s/it]
                                                       
{'loss': 1.1848, 'learning_rate': 0.0, 'epoch': 1.57}

 52%|█████▏    | 1061/2022 [2:18:05<2:05:29,  7.84s/it]
 53%|█████▎    | 1062/2022 [2:18:13<2:05:52,  7.87s/it]
                                                       
{'loss': 1.1287, 'learning_rate': 0.0, 'epoch': 1.57}

 53%|█████▎    | 1062/2022 [2:18:13<2:05:52,  7.87s/it]
 53%|█████▎    | 1063/2022 [2:18:21<2:06:58,  7.94s/it]
                                                       
{'loss': 1.2248, 'learning_rate': 0.0, 'epoch': 1.58}

 53%|█████▎    | 1063/2022 [2:18:21<2:06:58,  7.94s/it]
 53%|█████▎    | 1064/2022 [2:18:29<2:06:52,  7.95s/it]
                                                       
{'loss': 1.1657, 'learning_rate': 0.0, 'epoch': 1.58}

 53%|█████▎    | 1064/2022 [2:18:29<2:06:52,  7.95s/it]
 53%|█████▎    | 1065/2022 [2:18:37<2:05:30,  7.87s/it]
                                                       
{'loss': 1.1186, 'learning_rate': 0.0, 'epoch': 1.58}

 53%|█████▎    | 1065/2022 [2:18:37<2:05:30,  7.87s/it]
 53%|█████▎    | 1066/2022 [2:18:45<2:04:19,  7.80s/it]
                                                       
{'loss': 1.1994, 'learning_rate': 0.0, 'epoch': 1.58}

 53%|█████▎    | 1066/2022 [2:18:45<2:04:19,  7.80s/it]
 53%|█████▎    | 1067/2022 [2:18:53<2:04:36,  7.83s/it]
                                                       
{'loss': 1.2026, 'learning_rate': 0.0, 'epoch': 1.58}

 53%|█████▎    | 1067/2022 [2:18:53<2:04:36,  7.83s/it]
 53%|█████▎    | 1068/2022 [2:19:00<2:03:54,  7.79s/it]
                                                       
{'loss': 1.1789, 'learning_rate': 0.0, 'epoch': 1.58}

 53%|█████▎    | 1068/2022 [2:19:00<2:03:54,  7.79s/it]
 53%|█████▎    | 1069/2022 [2:19:09<2:05:58,  7.93s/it]
                                                       
{'loss': 1.0095, 'learning_rate': 0.0, 'epoch': 1.58}

 53%|█████▎    | 1069/2022 [2:19:09<2:05:58,  7.93s/it]
 53%|█████▎    | 1070/2022 [2:19:16<2:04:16,  7.83s/it]
                                                       
{'loss': 1.1919, 'learning_rate': 0.0, 'epoch': 1.59}

 53%|█████▎    | 1070/2022 [2:19:16<2:04:16,  7.83s/it]
 53%|█████▎    | 1071/2022 [2:19:24<2:05:25,  7.91s/it]
                                                       
{'loss': 1.1978, 'learning_rate': 0.0, 'epoch': 1.59}

 53%|█████▎    | 1071/2022 [2:19:24<2:05:25,  7.91s/it]
 53%|█████▎    | 1072/2022 [2:19:32<2:05:09,  7.91s/it]
                                                       
{'loss': 1.1839, 'learning_rate': 0.0, 'epoch': 1.59}

 53%|█████▎    | 1072/2022 [2:19:32<2:05:09,  7.91s/it]
 53%|█████▎    | 1073/2022 [2:19:40<2:05:04,  7.91s/it]
                                                       
{'loss': 1.2257, 'learning_rate': 0.0, 'epoch': 1.59}

 53%|█████▎    | 1073/2022 [2:19:40<2:05:04,  7.91s/it]
 53%|█████▎    | 1074/2022 [2:19:48<2:05:49,  7.96s/it]
                                                       
{'loss': 1.21, 'learning_rate': 0.0, 'epoch': 1.59}

 53%|█████▎    | 1074/2022 [2:19:48<2:05:49,  7.96s/it]
 53%|█████▎    | 1075/2022 [2:19:56<2:04:34,  7.89s/it]
                                                       
{'loss': 1.0978, 'learning_rate': 0.0, 'epoch': 1.59}

 53%|█████▎    | 1075/2022 [2:19:56<2:04:34,  7.89s/it]
 53%|█████▎    | 1076/2022 [2:20:04<2:03:58,  7.86s/it]
                                                       
{'loss': 1.0812, 'learning_rate': 0.0, 'epoch': 1.6}

 53%|█████▎    | 1076/2022 [2:20:04<2:03:58,  7.86s/it]
 53%|█████▎    | 1077/2022 [2:20:11<2:01:47,  7.73s/it]
                                                       
{'loss': 1.1282, 'learning_rate': 0.0, 'epoch': 1.6}

 53%|█████▎    | 1077/2022 [2:20:11<2:01:47,  7.73s/it]
 53%|█████▎    | 1078/2022 [2:20:19<2:03:04,  7.82s/it]
                                                       
{'loss': 1.1857, 'learning_rate': 0.0, 'epoch': 1.6}

 53%|█████▎    | 1078/2022 [2:20:19<2:03:04,  7.82s/it]
 53%|█████▎    | 1079/2022 [2:20:27<2:03:42,  7.87s/it]
                                                       
{'loss': 1.2118, 'learning_rate': 0.0, 'epoch': 1.6}

 53%|█████▎    | 1079/2022 [2:20:27<2:03:42,  7.87s/it]
 53%|█████▎    | 1080/2022 [2:20:35<2:02:49,  7.82s/it]
                                                       
{'loss': 1.0744, 'learning_rate': 0.0, 'epoch': 1.6}

 53%|█████▎    | 1080/2022 [2:20:35<2:02:49,  7.82s/it]
 53%|█████▎    | 1081/2022 [2:20:43<2:03:27,  7.87s/it]
                                                       
{'loss': 1.207, 'learning_rate': 0.0, 'epoch': 1.6}

 53%|█████▎    | 1081/2022 [2:20:43<2:03:27,  7.87s/it]
 54%|█████▎    | 1082/2022 [2:20:51<2:03:34,  7.89s/it]
                                                       
{'loss': 1.1707, 'learning_rate': 0.0, 'epoch': 1.6}

 54%|█████▎    | 1082/2022 [2:20:51<2:03:34,  7.89s/it]
 54%|█████▎    | 1083/2022 [2:20:59<2:03:39,  7.90s/it]
                                                       
{'loss': 1.1514, 'learning_rate': 0.0, 'epoch': 1.61}

 54%|█████▎    | 1083/2022 [2:20:59<2:03:39,  7.90s/it]
 54%|█████▎    | 1084/2022 [2:21:07<2:03:48,  7.92s/it]
                                                       
{'loss': 1.111, 'learning_rate': 0.0, 'epoch': 1.61}

 54%|█████▎    | 1084/2022 [2:21:07<2:03:48,  7.92s/it]
 54%|█████▎    | 1085/2022 [2:21:15<2:03:32,  7.91s/it]
                                                       
{'loss': 1.1156, 'learning_rate': 0.0, 'epoch': 1.61}

 54%|█████▎    | 1085/2022 [2:21:15<2:03:32,  7.91s/it]
 54%|█████▎    | 1086/2022 [2:21:23<2:03:33,  7.92s/it]
                                                       
{'loss': 1.0907, 'learning_rate': 0.0, 'epoch': 1.61}

 54%|█████▎    | 1086/2022 [2:21:23<2:03:33,  7.92s/it]
 54%|█████▍    | 1087/2022 [2:21:30<2:02:58,  7.89s/it]
                                                       
{'loss': 1.1701, 'learning_rate': 0.0, 'epoch': 1.61}

 54%|█████▍    | 1087/2022 [2:21:30<2:02:58,  7.89s/it]
 54%|█████▍    | 1088/2022 [2:21:38<2:02:07,  7.85s/it]
                                                       
{'loss': 1.1995, 'learning_rate': 0.0, 'epoch': 1.61}

 54%|█████▍    | 1088/2022 [2:21:38<2:02:07,  7.85s/it]
 54%|█████▍    | 1089/2022 [2:21:46<2:01:11,  7.79s/it]
                                                       
{'loss': 1.0596, 'learning_rate': 0.0, 'epoch': 1.61}

 54%|█████▍    | 1089/2022 [2:21:46<2:01:11,  7.79s/it]
 54%|█████▍    | 1090/2022 [2:21:54<2:03:08,  7.93s/it]
                                                       
{'loss': 1.1161, 'learning_rate': 0.0, 'epoch': 1.62}

 54%|█████▍    | 1090/2022 [2:21:54<2:03:08,  7.93s/it]
 54%|█████▍    | 1091/2022 [2:22:02<2:02:54,  7.92s/it]
                                                       
{'loss': 1.1713, 'learning_rate': 0.0, 'epoch': 1.62}

 54%|█████▍    | 1091/2022 [2:22:02<2:02:54,  7.92s/it]
 54%|█████▍    | 1092/2022 [2:22:10<2:01:33,  7.84s/it]
                                                       
{'loss': 1.1484, 'learning_rate': 0.0, 'epoch': 1.62}

 54%|█████▍    | 1092/2022 [2:22:10<2:01:33,  7.84s/it]
 54%|█████▍    | 1093/2022 [2:22:18<2:02:00,  7.88s/it]
                                                       
{'loss': 1.1736, 'learning_rate': 0.0, 'epoch': 1.62}

 54%|█████▍    | 1093/2022 [2:22:18<2:02:00,  7.88s/it]
 54%|█████▍    | 1094/2022 [2:22:26<2:02:29,  7.92s/it]
                                                       
{'loss': 1.2165, 'learning_rate': 0.0, 'epoch': 1.62}

 54%|█████▍    | 1094/2022 [2:22:26<2:02:29,  7.92s/it]
 54%|█████▍    | 1095/2022 [2:22:33<2:01:35,  7.87s/it]
                                                       
{'loss': 1.1725, 'learning_rate': 0.0, 'epoch': 1.62}

 54%|█████▍    | 1095/2022 [2:22:33<2:01:35,  7.87s/it]
 54%|█████▍    | 1096/2022 [2:22:41<2:01:35,  7.88s/it]
                                                       
{'loss': 1.1585, 'learning_rate': 0.0, 'epoch': 1.62}

 54%|█████▍    | 1096/2022 [2:22:41<2:01:35,  7.88s/it]
 54%|█████▍    | 1097/2022 [2:22:49<1:59:32,  7.75s/it]
                                                       
{'loss': 1.1465, 'learning_rate': 0.0, 'epoch': 1.63}

 54%|█████▍    | 1097/2022 [2:22:49<1:59:32,  7.75s/it]
 54%|█████▍    | 1098/2022 [2:22:57<2:01:26,  7.89s/it]
                                                       
{'loss': 1.0602, 'learning_rate': 0.0, 'epoch': 1.63}

 54%|█████▍    | 1098/2022 [2:22:57<2:01:26,  7.89s/it]
 54%|█████▍    | 1099/2022 [2:23:05<2:01:48,  7.92s/it]
                                                       
{'loss': 1.0319, 'learning_rate': 0.0, 'epoch': 1.63}

 54%|█████▍    | 1099/2022 [2:23:05<2:01:48,  7.92s/it]
 54%|█████▍    | 1100/2022 [2:23:12<2:00:25,  7.84s/it]
                                                       
{'loss': 1.2455, 'learning_rate': 0.0, 'epoch': 1.63}

 54%|█████▍    | 1100/2022 [2:23:13<2:00:25,  7.84s/it]
 54%|█████▍    | 1101/2022 [2:23:20<1:59:25,  7.78s/it]
                                                       
{'loss': 1.1382, 'learning_rate': 0.0, 'epoch': 1.63}

 54%|█████▍    | 1101/2022 [2:23:20<1:59:25,  7.78s/it]
 55%|█████▍    | 1102/2022 [2:23:28<2:01:28,  7.92s/it]
                                                       
{'loss': 1.2038, 'learning_rate': 0.0, 'epoch': 1.63}

 55%|█████▍    | 1102/2022 [2:23:29<2:01:28,  7.92s/it]
 55%|█████▍    | 1103/2022 [2:23:36<2:00:49,  7.89s/it]
                                                       
{'loss': 1.1493, 'learning_rate': 0.0, 'epoch': 1.64}

 55%|█████▍    | 1103/2022 [2:23:36<2:00:49,  7.89s/it]
 55%|█████▍    | 1104/2022 [2:23:44<1:58:57,  7.78s/it]
                                                       
{'loss': 1.0325, 'learning_rate': 0.0, 'epoch': 1.64}

 55%|█████▍    | 1104/2022 [2:23:44<1:58:57,  7.78s/it]
 55%|█████▍    | 1105/2022 [2:23:52<1:59:07,  7.79s/it]
                                                       
{'loss': 1.1619, 'learning_rate': 0.0, 'epoch': 1.64}

 55%|█████▍    | 1105/2022 [2:23:52<1:59:07,  7.79s/it]
 55%|█████▍    | 1106/2022 [2:23:59<1:59:18,  7.81s/it]
                                                       
{'loss': 1.1457, 'learning_rate': 0.0, 'epoch': 1.64}

 55%|█████▍    | 1106/2022 [2:23:59<1:59:18,  7.81s/it]
 55%|█████▍    | 1107/2022 [2:24:08<2:00:30,  7.90s/it]
                                                       
{'loss': 1.2432, 'learning_rate': 0.0, 'epoch': 1.64}

 55%|█████▍    | 1107/2022 [2:24:08<2:00:30,  7.90s/it]
 55%|█████▍    | 1108/2022 [2:24:15<2:00:27,  7.91s/it]
                                                       
{'loss': 1.1317, 'learning_rate': 0.0, 'epoch': 1.64}

 55%|█████▍    | 1108/2022 [2:24:15<2:00:27,  7.91s/it]
 55%|█████▍    | 1109/2022 [2:24:24<2:02:05,  8.02s/it]
                                                       
{'loss': 1.0433, 'learning_rate': 0.0, 'epoch': 1.64}

 55%|█████▍    | 1109/2022 [2:24:24<2:02:05,  8.02s/it]
 55%|█████▍    | 1110/2022 [2:24:32<2:01:21,  7.98s/it]
                                                       
{'loss': 1.1399, 'learning_rate': 0.0, 'epoch': 1.65}

 55%|█████▍    | 1110/2022 [2:24:32<2:01:21,  7.98s/it]
 55%|█████▍    | 1111/2022 [2:24:40<2:01:14,  7.98s/it]
                                                       
{'loss': 1.0364, 'learning_rate': 0.0, 'epoch': 1.65}

 55%|█████▍    | 1111/2022 [2:24:40<2:01:14,  7.98s/it]
 55%|█████▍    | 1112/2022 [2:24:49<2:05:12,  8.26s/it]
                                                       
{'loss': 1.1709, 'learning_rate': 0.0, 'epoch': 1.65}

 55%|█████▍    | 1112/2022 [2:24:49<2:05:12,  8.26s/it]
 55%|█████▌    | 1113/2022 [2:24:56<2:03:37,  8.16s/it]
                                                       
{'loss': 1.2103, 'learning_rate': 0.0, 'epoch': 1.65}

 55%|█████▌    | 1113/2022 [2:24:56<2:03:37,  8.16s/it]
 55%|█████▌    | 1114/2022 [2:25:04<2:01:04,  8.00s/it]
                                                       
{'loss': 1.0939, 'learning_rate': 0.0, 'epoch': 1.65}

 55%|█████▌    | 1114/2022 [2:25:04<2:01:04,  8.00s/it]
 55%|█████▌    | 1115/2022 [2:25:12<2:00:55,  8.00s/it]
                                                       
{'loss': 1.1888, 'learning_rate': 0.0, 'epoch': 1.65}

 55%|█████▌    | 1115/2022 [2:25:12<2:00:55,  8.00s/it]
 55%|█████▌    | 1116/2022 [2:25:20<1:59:51,  7.94s/it]
                                                       
{'loss': 1.1458, 'learning_rate': 0.0, 'epoch': 1.65}

 55%|█████▌    | 1116/2022 [2:25:20<1:59:51,  7.94s/it]
 55%|█████▌    | 1117/2022 [2:25:28<2:01:12,  8.04s/it]
                                                       
{'loss': 1.2468, 'learning_rate': 0.0, 'epoch': 1.66}

 55%|█████▌    | 1117/2022 [2:25:28<2:01:12,  8.04s/it]
 55%|█████▌    | 1118/2022 [2:25:36<1:59:49,  7.95s/it]
                                                       
{'loss': 1.0778, 'learning_rate': 0.0, 'epoch': 1.66}

 55%|█████▌    | 1118/2022 [2:25:36<1:59:49,  7.95s/it]
 55%|█████▌    | 1119/2022 [2:25:44<1:59:53,  7.97s/it]
                                                       
{'loss': 1.2716, 'learning_rate': 0.0, 'epoch': 1.66}

 55%|█████▌    | 1119/2022 [2:25:44<1:59:53,  7.97s/it]
 55%|█████▌    | 1120/2022 [2:25:52<1:59:59,  7.98s/it]
                                                       
{'loss': 1.1264, 'learning_rate': 0.0, 'epoch': 1.66}

 55%|█████▌    | 1120/2022 [2:25:52<1:59:59,  7.98s/it]
 55%|█████▌    | 1121/2022 [2:25:59<1:57:27,  7.82s/it]
                                                       
{'loss': 1.1816, 'learning_rate': 0.0, 'epoch': 1.66}

 55%|█████▌    | 1121/2022 [2:25:59<1:57:27,  7.82s/it]
 55%|█████▌    | 1122/2022 [2:26:07<1:57:18,  7.82s/it]
                                                       
{'loss': 1.1632, 'learning_rate': 0.0, 'epoch': 1.66}

 55%|█████▌    | 1122/2022 [2:26:07<1:57:18,  7.82s/it]
 56%|█████▌    | 1123/2022 [2:26:15<1:55:32,  7.71s/it]
                                                       
{'loss': 1.1376, 'learning_rate': 0.0, 'epoch': 1.66}

 56%|█████▌    | 1123/2022 [2:26:15<1:55:32,  7.71s/it]
 56%|█████▌    | 1124/2022 [2:26:22<1:54:03,  7.62s/it]
                                                       
{'loss': 1.1677, 'learning_rate': 0.0, 'epoch': 1.67}

 56%|█████▌    | 1124/2022 [2:26:22<1:54:03,  7.62s/it]
 56%|█████▌    | 1125/2022 [2:26:30<1:53:16,  7.58s/it]
                                                       
{'loss': 1.1625, 'learning_rate': 0.0, 'epoch': 1.67}

 56%|█████▌    | 1125/2022 [2:26:30<1:53:16,  7.58s/it]
 56%|█████▌    | 1126/2022 [2:26:37<1:53:31,  7.60s/it]
                                                       
{'loss': 1.2415, 'learning_rate': 0.0, 'epoch': 1.67}

 56%|█████▌    | 1126/2022 [2:26:37<1:53:31,  7.60s/it]
 56%|█████▌    | 1127/2022 [2:26:45<1:55:03,  7.71s/it]
                                                       
{'loss': 1.2348, 'learning_rate': 0.0, 'epoch': 1.67}

 56%|█████▌    | 1127/2022 [2:26:45<1:55:03,  7.71s/it]
 56%|█████▌    | 1128/2022 [2:26:53<1:55:06,  7.73s/it]
                                                       
{'loss': 1.1601, 'learning_rate': 0.0, 'epoch': 1.67}

 56%|█████▌    | 1128/2022 [2:26:53<1:55:06,  7.73s/it]
 56%|█████▌    | 1129/2022 [2:27:01<1:54:36,  7.70s/it]
                                                       
{'loss': 1.2843, 'learning_rate': 0.0, 'epoch': 1.67}

 56%|█████▌    | 1129/2022 [2:27:01<1:54:36,  7.70s/it]
 56%|█████▌    | 1130/2022 [2:27:08<1:55:15,  7.75s/it]
                                                       
{'loss': 1.1868, 'learning_rate': 0.0, 'epoch': 1.68}

 56%|█████▌    | 1130/2022 [2:27:08<1:55:15,  7.75s/it]
 56%|█████▌    | 1131/2022 [2:27:16<1:55:43,  7.79s/it]
                                                       
{'loss': 1.192, 'learning_rate': 0.0, 'epoch': 1.68}

 56%|█████▌    | 1131/2022 [2:27:16<1:55:43,  7.79s/it]
 56%|█████▌    | 1132/2022 [2:27:24<1:53:36,  7.66s/it]
                                                       
{'loss': 1.3488, 'learning_rate': 0.0, 'epoch': 1.68}

 56%|█████▌    | 1132/2022 [2:27:24<1:53:36,  7.66s/it]
 56%|█████▌    | 1133/2022 [2:27:31<1:53:40,  7.67s/it]
                                                       
{'loss': 1.0364, 'learning_rate': 0.0, 'epoch': 1.68}

 56%|█████▌    | 1133/2022 [2:27:31<1:53:40,  7.67s/it]
 56%|█████▌    | 1134/2022 [2:27:39<1:54:08,  7.71s/it]
                                                       
{'loss': 1.2038, 'learning_rate': 0.0, 'epoch': 1.68}

 56%|█████▌    | 1134/2022 [2:27:39<1:54:08,  7.71s/it]
 56%|█████▌    | 1135/2022 [2:27:46<1:52:24,  7.60s/it]
                                                       
{'loss': 1.2208, 'learning_rate': 0.0, 'epoch': 1.68}

 56%|█████▌    | 1135/2022 [2:27:47<1:52:24,  7.60s/it]
 56%|█████▌    | 1136/2022 [2:27:54<1:51:39,  7.56s/it]
                                                       
{'loss': 1.0901, 'learning_rate': 0.0, 'epoch': 1.68}

 56%|█████▌    | 1136/2022 [2:27:54<1:51:39,  7.56s/it]
 56%|█████▌    | 1137/2022 [2:28:02<1:52:29,  7.63s/it]
                                                       
{'loss': 1.1828, 'learning_rate': 0.0, 'epoch': 1.69}

 56%|█████▌    | 1137/2022 [2:28:02<1:52:29,  7.63s/it]
 56%|█████▋    | 1138/2022 [2:28:10<1:53:38,  7.71s/it]
                                                       
{'loss': 1.1221, 'learning_rate': 0.0, 'epoch': 1.69}

 56%|█████▋    | 1138/2022 [2:28:10<1:53:38,  7.71s/it]
 56%|█████▋    | 1139/2022 [2:28:18<1:54:10,  7.76s/it]
                                                       
{'loss': 1.0971, 'learning_rate': 0.0, 'epoch': 1.69}

 56%|█████▋    | 1139/2022 [2:28:18<1:54:10,  7.76s/it]
 56%|█████▋    | 1140/2022 [2:28:25<1:54:49,  7.81s/it]
                                                       
{'loss': 1.2964, 'learning_rate': 0.0, 'epoch': 1.69}

 56%|█████▋    | 1140/2022 [2:28:25<1:54:49,  7.81s/it]
 56%|█████▋    | 1141/2022 [2:28:33<1:54:18,  7.79s/it]
                                                       
{'loss': 1.1972, 'learning_rate': 0.0, 'epoch': 1.69}

 56%|█████▋    | 1141/2022 [2:28:33<1:54:18,  7.79s/it]
 56%|█████▋    | 1142/2022 [2:28:41<1:55:52,  7.90s/it]
                                                       
{'loss': 1.1239, 'learning_rate': 0.0, 'epoch': 1.69}

 56%|█████▋    | 1142/2022 [2:28:41<1:55:52,  7.90s/it]
 57%|█████▋    | 1143/2022 [2:28:49<1:54:46,  7.83s/it]
                                                       
{'loss': 1.117, 'learning_rate': 0.0, 'epoch': 1.69}

 57%|█████▋    | 1143/2022 [2:28:49<1:54:46,  7.83s/it]
 57%|█████▋    | 1144/2022 [2:28:57<1:54:25,  7.82s/it]
                                                       
{'loss': 1.1345, 'learning_rate': 0.0, 'epoch': 1.7}

 57%|█████▋    | 1144/2022 [2:28:57<1:54:25,  7.82s/it]
 57%|█████▋    | 1145/2022 [2:29:05<1:54:32,  7.84s/it]
                                                       
{'loss': 1.0724, 'learning_rate': 0.0, 'epoch': 1.7}

 57%|█████▋    | 1145/2022 [2:29:05<1:54:32,  7.84s/it]
 57%|█████▋    | 1146/2022 [2:29:13<1:54:31,  7.84s/it]
                                                       
{'loss': 1.1416, 'learning_rate': 0.0, 'epoch': 1.7}

 57%|█████▋    | 1146/2022 [2:29:13<1:54:31,  7.84s/it]
 57%|█████▋    | 1147/2022 [2:29:20<1:54:37,  7.86s/it]
                                                       
{'loss': 1.2874, 'learning_rate': 0.0, 'epoch': 1.7}

 57%|█████▋    | 1147/2022 [2:29:20<1:54:37,  7.86s/it]
 57%|█████▋    | 1148/2022 [2:29:28<1:55:00,  7.90s/it]
                                                       
{'loss': 1.1472, 'learning_rate': 0.0, 'epoch': 1.7}

 57%|█████▋    | 1148/2022 [2:29:28<1:55:00,  7.90s/it]
 57%|█████▋    | 1149/2022 [2:29:36<1:54:12,  7.85s/it]
                                                       
{'loss': 1.2262, 'learning_rate': 0.0, 'epoch': 1.7}

 57%|█████▋    | 1149/2022 [2:29:36<1:54:12,  7.85s/it]
 57%|█████▋    | 1150/2022 [2:29:44<1:55:58,  7.98s/it]
                                                       
{'loss': 1.2437, 'learning_rate': 0.0, 'epoch': 1.7}

 57%|█████▋    | 1150/2022 [2:29:44<1:55:58,  7.98s/it]
 57%|█████▋    | 1151/2022 [2:29:52<1:53:51,  7.84s/it]
                                                       
{'loss': 1.201, 'learning_rate': 0.0, 'epoch': 1.71}

 57%|█████▋    | 1151/2022 [2:29:52<1:53:51,  7.84s/it]
 57%|█████▋    | 1152/2022 [2:30:00<1:52:40,  7.77s/it]
                                                       
{'loss': 1.1596, 'learning_rate': 0.0, 'epoch': 1.71}

 57%|█████▋    | 1152/2022 [2:30:00<1:52:40,  7.77s/it]
 57%|█████▋    | 1153/2022 [2:30:07<1:53:03,  7.81s/it]
                                                       
{'loss': 1.1734, 'learning_rate': 0.0, 'epoch': 1.71}

 57%|█████▋    | 1153/2022 [2:30:07<1:53:03,  7.81s/it]
 57%|█████▋    | 1154/2022 [2:30:15<1:52:37,  7.79s/it]
                                                       
{'loss': 1.1296, 'learning_rate': 0.0, 'epoch': 1.71}

 57%|█████▋    | 1154/2022 [2:30:15<1:52:37,  7.79s/it]
 57%|█████▋    | 1155/2022 [2:30:23<1:52:14,  7.77s/it]
                                                       
{'loss': 1.1893, 'learning_rate': 0.0, 'epoch': 1.71}

 57%|█████▋    | 1155/2022 [2:30:23<1:52:14,  7.77s/it]
 57%|█████▋    | 1156/2022 [2:30:31<1:52:35,  7.80s/it]
                                                       
{'loss': 1.1595, 'learning_rate': 0.0, 'epoch': 1.71}

 57%|█████▋    | 1156/2022 [2:30:31<1:52:35,  7.80s/it]
 57%|█████▋    | 1157/2022 [2:30:39<1:52:18,  7.79s/it]
                                                       
{'loss': 1.19, 'learning_rate': 0.0, 'epoch': 1.72}

 57%|█████▋    | 1157/2022 [2:30:39<1:52:18,  7.79s/it]
 57%|█████▋    | 1158/2022 [2:30:46<1:51:48,  7.76s/it]
                                                       
{'loss': 1.0944, 'learning_rate': 0.0, 'epoch': 1.72}

 57%|█████▋    | 1158/2022 [2:30:46<1:51:48,  7.76s/it]
 57%|█████▋    | 1159/2022 [2:30:54<1:52:48,  7.84s/it]
                                                       
{'loss': 1.2358, 'learning_rate': 0.0, 'epoch': 1.72}

 57%|█████▋    | 1159/2022 [2:30:54<1:52:48,  7.84s/it]
 57%|█████▋    | 1160/2022 [2:31:02<1:52:46,  7.85s/it]
                                                       
{'loss': 1.3312, 'learning_rate': 0.0, 'epoch': 1.72}

 57%|█████▋    | 1160/2022 [2:31:02<1:52:46,  7.85s/it]
 57%|█████▋    | 1161/2022 [2:31:10<1:51:55,  7.80s/it]
                                                       
{'loss': 1.1593, 'learning_rate': 0.0, 'epoch': 1.72}

 57%|█████▋    | 1161/2022 [2:31:10<1:51:55,  7.80s/it]
 57%|█████▋    | 1162/2022 [2:31:18<1:51:40,  7.79s/it]
                                                       
{'loss': 1.2992, 'learning_rate': 0.0, 'epoch': 1.72}

 57%|█████▋    | 1162/2022 [2:31:18<1:51:40,  7.79s/it]
 58%|█████▊    | 1163/2022 [2:31:25<1:51:09,  7.76s/it]
                                                       
{'loss': 1.1927, 'learning_rate': 0.0, 'epoch': 1.72}

 58%|█████▊    | 1163/2022 [2:31:25<1:51:09,  7.76s/it]
 58%|█████▊    | 1164/2022 [2:31:33<1:51:23,  7.79s/it]
                                                       
{'loss': 1.2836, 'learning_rate': 0.0, 'epoch': 1.73}

 58%|█████▊    | 1164/2022 [2:31:33<1:51:23,  7.79s/it]
 58%|█████▊    | 1165/2022 [2:31:41<1:51:04,  7.78s/it]
                                                       
{'loss': 1.1548, 'learning_rate': 0.0, 'epoch': 1.73}

 58%|█████▊    | 1165/2022 [2:31:41<1:51:04,  7.78s/it]
 58%|█████▊    | 1166/2022 [2:31:49<1:51:15,  7.80s/it]
                                                       
{'loss': 1.2628, 'learning_rate': 0.0, 'epoch': 1.73}

 58%|█████▊    | 1166/2022 [2:31:49<1:51:15,  7.80s/it]
 58%|█████▊    | 1167/2022 [2:31:56<1:50:14,  7.74s/it]
                                                       
{'loss': 1.1656, 'learning_rate': 0.0, 'epoch': 1.73}

 58%|█████▊    | 1167/2022 [2:31:56<1:50:14,  7.74s/it]
 58%|█████▊    | 1168/2022 [2:32:04<1:49:56,  7.72s/it]
                                                       
{'loss': 1.1206, 'learning_rate': 0.0, 'epoch': 1.73}

 58%|█████▊    | 1168/2022 [2:32:04<1:49:56,  7.72s/it]
 58%|█████▊    | 1169/2022 [2:32:12<1:49:33,  7.71s/it]
                                                       
{'loss': 1.0745, 'learning_rate': 0.0, 'epoch': 1.73}

 58%|█████▊    | 1169/2022 [2:32:12<1:49:33,  7.71s/it]
 58%|█████▊    | 1170/2022 [2:32:19<1:48:58,  7.67s/it]
                                                       
{'loss': 1.3157, 'learning_rate': 0.0, 'epoch': 1.73}

 58%|█████▊    | 1170/2022 [2:32:19<1:48:58,  7.67s/it]
 58%|█████▊    | 1171/2022 [2:32:27<1:49:36,  7.73s/it]
                                                       
{'loss': 1.1547, 'learning_rate': 0.0, 'epoch': 1.74}

 58%|█████▊    | 1171/2022 [2:32:27<1:49:36,  7.73s/it]
 58%|█████▊    | 1172/2022 [2:32:35<1:51:53,  7.90s/it]
                                                       
{'loss': 1.1015, 'learning_rate': 0.0, 'epoch': 1.74}

 58%|█████▊    | 1172/2022 [2:32:36<1:51:53,  7.90s/it]
 58%|█████▊    | 1173/2022 [2:32:43<1:51:02,  7.85s/it]
                                                       
{'loss': 1.108, 'learning_rate': 0.0, 'epoch': 1.74}

 58%|█████▊    | 1173/2022 [2:32:43<1:51:02,  7.85s/it]
 58%|█████▊    | 1174/2022 [2:32:51<1:51:25,  7.88s/it]
                                                       
{'loss': 1.2258, 'learning_rate': 0.0, 'epoch': 1.74}

 58%|█████▊    | 1174/2022 [2:32:51<1:51:25,  7.88s/it]
 58%|█████▊    | 1175/2022 [2:32:59<1:51:19,  7.89s/it]
                                                       
{'loss': 1.2531, 'learning_rate': 0.0, 'epoch': 1.74}

 58%|█████▊    | 1175/2022 [2:32:59<1:51:19,  7.89s/it]
 58%|█████▊    | 1176/2022 [2:33:07<1:50:18,  7.82s/it]
                                                       
{'loss': 1.1664, 'learning_rate': 0.0, 'epoch': 1.74}

 58%|█████▊    | 1176/2022 [2:33:07<1:50:18,  7.82s/it]
 58%|█████▊    | 1177/2022 [2:33:14<1:49:23,  7.77s/it]
                                                       
{'loss': 1.1295, 'learning_rate': 0.0, 'epoch': 1.74}

 58%|█████▊    | 1177/2022 [2:33:14<1:49:23,  7.77s/it]
 58%|█████▊    | 1178/2022 [2:33:22<1:49:15,  7.77s/it]
                                                       
{'loss': 1.1936, 'learning_rate': 0.0, 'epoch': 1.75}

 58%|█████▊    | 1178/2022 [2:33:22<1:49:15,  7.77s/it]
 58%|█████▊    | 1179/2022 [2:33:30<1:48:33,  7.73s/it]
                                                       
{'loss': 1.1926, 'learning_rate': 0.0, 'epoch': 1.75}

 58%|█████▊    | 1179/2022 [2:33:30<1:48:33,  7.73s/it]
 58%|█████▊    | 1180/2022 [2:33:38<1:48:34,  7.74s/it]
                                                       
{'loss': 1.0601, 'learning_rate': 0.0, 'epoch': 1.75}

 58%|█████▊    | 1180/2022 [2:33:38<1:48:34,  7.74s/it]
 58%|█████▊    | 1181/2022 [2:33:45<1:48:29,  7.74s/it]
                                                       
{'loss': 1.0626, 'learning_rate': 0.0, 'epoch': 1.75}

 58%|█████▊    | 1181/2022 [2:33:45<1:48:29,  7.74s/it]
 58%|█████▊    | 1182/2022 [2:33:53<1:48:24,  7.74s/it]
                                                       
{'loss': 1.0848, 'learning_rate': 0.0, 'epoch': 1.75}

 58%|█████▊    | 1182/2022 [2:33:53<1:48:24,  7.74s/it]
 59%|█████▊    | 1183/2022 [2:34:01<1:49:47,  7.85s/it]
                                                       
{'loss': 1.1637, 'learning_rate': 0.0, 'epoch': 1.75}

 59%|█████▊    | 1183/2022 [2:34:01<1:49:47,  7.85s/it]
 59%|█████▊    | 1184/2022 [2:34:09<1:49:03,  7.81s/it]
                                                       
{'loss': 1.2416, 'learning_rate': 0.0, 'epoch': 1.76}

 59%|█████▊    | 1184/2022 [2:34:09<1:49:03,  7.81s/it]
 59%|█████▊    | 1185/2022 [2:34:17<1:49:43,  7.87s/it]
                                                       
{'loss': 1.1692, 'learning_rate': 0.0, 'epoch': 1.76}

 59%|█████▊    | 1185/2022 [2:34:17<1:49:43,  7.87s/it]
 59%|█████▊    | 1186/2022 [2:34:25<1:50:49,  7.95s/it]
                                                       
{'loss': 1.1668, 'learning_rate': 0.0, 'epoch': 1.76}

 59%|█████▊    | 1186/2022 [2:34:25<1:50:49,  7.95s/it]
 59%|█████▊    | 1187/2022 [2:34:33<1:49:47,  7.89s/it]
                                                       
{'loss': 1.0073, 'learning_rate': 0.0, 'epoch': 1.76}

 59%|█████▊    | 1187/2022 [2:34:33<1:49:47,  7.89s/it]
 59%|█████▉    | 1188/2022 [2:34:41<1:49:45,  7.90s/it]
                                                       
{'loss': 1.2383, 'learning_rate': 0.0, 'epoch': 1.76}

 59%|█████▉    | 1188/2022 [2:34:41<1:49:45,  7.90s/it]
 59%|█████▉    | 1189/2022 [2:34:48<1:47:54,  7.77s/it]
                                                       
{'loss': 1.1712, 'learning_rate': 0.0, 'epoch': 1.76}

 59%|█████▉    | 1189/2022 [2:34:48<1:47:54,  7.77s/it]
 59%|█████▉    | 1190/2022 [2:34:56<1:46:35,  7.69s/it]
                                                       
{'loss': 1.2088, 'learning_rate': 0.0, 'epoch': 1.76}

 59%|█████▉    | 1190/2022 [2:34:56<1:46:35,  7.69s/it]
 59%|█████▉    | 1191/2022 [2:35:03<1:46:33,  7.69s/it]
                                                       
{'loss': 1.2333, 'learning_rate': 0.0, 'epoch': 1.77}

 59%|█████▉    | 1191/2022 [2:35:03<1:46:33,  7.69s/it]
 59%|█████▉    | 1192/2022 [2:35:11<1:46:58,  7.73s/it]
                                                       
{'loss': 1.2465, 'learning_rate': 0.0, 'epoch': 1.77}

 59%|█████▉    | 1192/2022 [2:35:11<1:46:58,  7.73s/it]
 59%|█████▉    | 1193/2022 [2:35:19<1:46:21,  7.70s/it]
                                                       
{'loss': 1.2456, 'learning_rate': 0.0, 'epoch': 1.77}

 59%|█████▉    | 1193/2022 [2:35:19<1:46:21,  7.70s/it]
 59%|█████▉    | 1194/2022 [2:35:26<1:46:13,  7.70s/it]
                                                       
{'loss': 1.0682, 'learning_rate': 0.0, 'epoch': 1.77}

 59%|█████▉    | 1194/2022 [2:35:27<1:46:13,  7.70s/it]
 59%|█████▉    | 1195/2022 [2:35:35<1:48:25,  7.87s/it]
                                                       
{'loss': 1.2277, 'learning_rate': 0.0, 'epoch': 1.77}

 59%|█████▉    | 1195/2022 [2:35:35<1:48:25,  7.87s/it]
 59%|█████▉    | 1196/2022 [2:35:42<1:47:29,  7.81s/it]
                                                       
{'loss': 1.0908, 'learning_rate': 0.0, 'epoch': 1.77}

 59%|█████▉    | 1196/2022 [2:35:42<1:47:29,  7.81s/it]
 59%|█████▉    | 1197/2022 [2:35:50<1:48:07,  7.86s/it]
                                                       
{'loss': 1.0984, 'learning_rate': 0.0, 'epoch': 1.77}

 59%|█████▉    | 1197/2022 [2:35:50<1:48:07,  7.86s/it]
 59%|█████▉    | 1198/2022 [2:35:58<1:47:36,  7.84s/it]
                                                       
{'loss': 1.1342, 'learning_rate': 0.0, 'epoch': 1.78}

 59%|█████▉    | 1198/2022 [2:35:58<1:47:36,  7.84s/it]
 59%|█████▉    | 1199/2022 [2:36:06<1:47:01,  7.80s/it]
                                                       
{'loss': 1.1492, 'learning_rate': 0.0, 'epoch': 1.78}

 59%|█████▉    | 1199/2022 [2:36:06<1:47:01,  7.80s/it]
 59%|█████▉    | 1200/2022 [2:36:14<1:48:02,  7.89s/it]
                                                       
{'loss': 1.2015, 'learning_rate': 0.0, 'epoch': 1.78}

 59%|█████▉    | 1200/2022 [2:36:14<1:48:02,  7.89s/it]
 59%|█████▉    | 1201/2022 [2:36:22<1:48:33,  7.93s/it]
                                                       
{'loss': 1.1709, 'learning_rate': 0.0, 'epoch': 1.78}

 59%|█████▉    | 1201/2022 [2:36:22<1:48:33,  7.93s/it]
 59%|█████▉    | 1202/2022 [2:36:30<1:47:27,  7.86s/it]
                                                       
{'loss': 1.2188, 'learning_rate': 0.0, 'epoch': 1.78}

 59%|█████▉    | 1202/2022 [2:36:30<1:47:27,  7.86s/it]
 59%|█████▉    | 1203/2022 [2:36:37<1:46:13,  7.78s/it]
                                                       
{'loss': 1.0412, 'learning_rate': 0.0, 'epoch': 1.78}

 59%|█████▉    | 1203/2022 [2:36:37<1:46:13,  7.78s/it]
 60%|█████▉    | 1204/2022 [2:36:45<1:45:55,  7.77s/it]
                                                       
{'loss': 1.2757, 'learning_rate': 0.0, 'epoch': 1.79}

 60%|█████▉    | 1204/2022 [2:36:45<1:45:55,  7.77s/it]
 60%|█████▉    | 1205/2022 [2:36:53<1:45:28,  7.75s/it]
                                                       
{'loss': 1.313, 'learning_rate': 0.0, 'epoch': 1.79}

 60%|█████▉    | 1205/2022 [2:36:53<1:45:28,  7.75s/it]
 60%|█████▉    | 1206/2022 [2:37:00<1:44:56,  7.72s/it]
                                                       
{'loss': 1.2879, 'learning_rate': 0.0, 'epoch': 1.79}

 60%|█████▉    | 1206/2022 [2:37:00<1:44:56,  7.72s/it]
 60%|█████▉    | 1207/2022 [2:37:08<1:44:28,  7.69s/it]
                                                       
{'loss': 1.2282, 'learning_rate': 0.0, 'epoch': 1.79}

 60%|█████▉    | 1207/2022 [2:37:08<1:44:28,  7.69s/it]
 60%|█████▉    | 1208/2022 [2:37:16<1:45:01,  7.74s/it]
                                                       
{'loss': 1.2267, 'learning_rate': 0.0, 'epoch': 1.79}

 60%|█████▉    | 1208/2022 [2:37:16<1:45:01,  7.74s/it]
 60%|█████▉    | 1209/2022 [2:37:24<1:44:35,  7.72s/it]
                                                       
{'loss': 1.241, 'learning_rate': 0.0, 'epoch': 1.79}

 60%|█████▉    | 1209/2022 [2:37:24<1:44:35,  7.72s/it]
 60%|█████▉    | 1210/2022 [2:37:31<1:45:04,  7.76s/it]
                                                       
{'loss': 1.1297, 'learning_rate': 0.0, 'epoch': 1.79}

 60%|█████▉    | 1210/2022 [2:37:31<1:45:04,  7.76s/it]
 60%|█████▉    | 1211/2022 [2:37:39<1:43:59,  7.69s/it]
                                                       
{'loss': 1.3511, 'learning_rate': 0.0, 'epoch': 1.8}

 60%|█████▉    | 1211/2022 [2:37:39<1:43:59,  7.69s/it]
 60%|█████▉    | 1212/2022 [2:37:47<1:44:13,  7.72s/it]
                                                       
{'loss': 1.2151, 'learning_rate': 0.0, 'epoch': 1.8}

 60%|█████▉    | 1212/2022 [2:37:47<1:44:13,  7.72s/it]
 60%|█████▉    | 1213/2022 [2:37:55<1:47:27,  7.97s/it]
                                                       
{'loss': 1.1789, 'learning_rate': 0.0, 'epoch': 1.8}

 60%|█████▉    | 1213/2022 [2:37:55<1:47:27,  7.97s/it]
 60%|██████    | 1214/2022 [2:38:03<1:46:46,  7.93s/it]
                                                       
{'loss': 1.098, 'learning_rate': 0.0, 'epoch': 1.8}

 60%|██████    | 1214/2022 [2:38:03<1:46:46,  7.93s/it]
 60%|██████    | 1215/2022 [2:38:11<1:45:56,  7.88s/it]
                                                       
{'loss': 1.0021, 'learning_rate': 0.0, 'epoch': 1.8}

 60%|██████    | 1215/2022 [2:38:11<1:45:56,  7.88s/it]
 60%|██████    | 1216/2022 [2:38:19<1:46:41,  7.94s/it]
                                                       
{'loss': 1.1532, 'learning_rate': 0.0, 'epoch': 1.8}

 60%|██████    | 1216/2022 [2:38:19<1:46:41,  7.94s/it]
 60%|██████    | 1217/2022 [2:38:27<1:46:02,  7.90s/it]
                                                       
{'loss': 1.1228, 'learning_rate': 0.0, 'epoch': 1.8}

 60%|██████    | 1217/2022 [2:38:27<1:46:02,  7.90s/it]
 60%|██████    | 1218/2022 [2:38:35<1:45:50,  7.90s/it]
                                                       
{'loss': 1.0818, 'learning_rate': 0.0, 'epoch': 1.81}

 60%|██████    | 1218/2022 [2:38:35<1:45:50,  7.90s/it]
 60%|██████    | 1219/2022 [2:38:43<1:46:03,  7.92s/it]
                                                       
{'loss': 1.0953, 'learning_rate': 0.0, 'epoch': 1.81}

 60%|██████    | 1219/2022 [2:38:43<1:46:03,  7.92s/it]
 60%|██████    | 1220/2022 [2:38:51<1:47:02,  8.01s/it]
                                                       
{'loss': 1.0585, 'learning_rate': 0.0, 'epoch': 1.81}

 60%|██████    | 1220/2022 [2:38:51<1:47:02,  8.01s/it]
 60%|██████    | 1221/2022 [2:38:59<1:46:45,  8.00s/it]
                                                       
{'loss': 1.1705, 'learning_rate': 0.0, 'epoch': 1.81}

 60%|██████    | 1221/2022 [2:38:59<1:46:45,  8.00s/it]
 60%|██████    | 1222/2022 [2:39:07<1:46:21,  7.98s/it]
                                                       
{'loss': 1.2355, 'learning_rate': 0.0, 'epoch': 1.81}

 60%|██████    | 1222/2022 [2:39:07<1:46:21,  7.98s/it]
 60%|██████    | 1223/2022 [2:39:14<1:44:39,  7.86s/it]
                                                       
{'loss': 1.2658, 'learning_rate': 0.0, 'epoch': 1.81}

 60%|██████    | 1223/2022 [2:39:14<1:44:39,  7.86s/it]
 61%|██████    | 1224/2022 [2:39:22<1:42:51,  7.73s/it]
                                                       
{'loss': 1.0931, 'learning_rate': 0.0, 'epoch': 1.81}

 61%|██████    | 1224/2022 [2:39:22<1:42:51,  7.73s/it]
 61%|██████    | 1225/2022 [2:39:29<1:42:15,  7.70s/it]
                                                       
{'loss': 1.2299, 'learning_rate': 0.0, 'epoch': 1.82}

 61%|██████    | 1225/2022 [2:39:29<1:42:15,  7.70s/it]
 61%|██████    | 1226/2022 [2:39:37<1:43:03,  7.77s/it]
                                                       
{'loss': 1.2965, 'learning_rate': 0.0, 'epoch': 1.82}

 61%|██████    | 1226/2022 [2:39:37<1:43:03,  7.77s/it]
 61%|██████    | 1227/2022 [2:39:45<1:42:23,  7.73s/it]
                                                       
{'loss': 1.2625, 'learning_rate': 0.0, 'epoch': 1.82}

 61%|██████    | 1227/2022 [2:39:45<1:42:23,  7.73s/it]
 61%|██████    | 1228/2022 [2:39:53<1:41:35,  7.68s/it]
                                                       
{'loss': 1.1022, 'learning_rate': 0.0, 'epoch': 1.82}

 61%|██████    | 1228/2022 [2:39:53<1:41:35,  7.68s/it]
 61%|██████    | 1229/2022 [2:40:00<1:41:38,  7.69s/it]
                                                       
{'loss': 1.2292, 'learning_rate': 0.0, 'epoch': 1.82}

 61%|██████    | 1229/2022 [2:40:00<1:41:38,  7.69s/it]
 61%|██████    | 1230/2022 [2:40:08<1:41:38,  7.70s/it]
                                                       
{'loss': 1.2359, 'learning_rate': 0.0, 'epoch': 1.82}

 61%|██████    | 1230/2022 [2:40:08<1:41:38,  7.70s/it]
 61%|██████    | 1231/2022 [2:40:15<1:40:37,  7.63s/it]
                                                       
{'loss': 1.2132, 'learning_rate': 0.0, 'epoch': 1.83}

 61%|██████    | 1231/2022 [2:40:15<1:40:37,  7.63s/it]
 61%|██████    | 1232/2022 [2:40:23<1:39:56,  7.59s/it]
                                                       
{'loss': 1.2052, 'learning_rate': 0.0, 'epoch': 1.83}

 61%|██████    | 1232/2022 [2:40:23<1:39:56,  7.59s/it]
 61%|██████    | 1233/2022 [2:40:31<1:40:25,  7.64s/it]
                                                       
{'loss': 1.2503, 'learning_rate': 0.0, 'epoch': 1.83}

 61%|██████    | 1233/2022 [2:40:31<1:40:25,  7.64s/it]
 61%|██████    | 1234/2022 [2:40:38<1:40:27,  7.65s/it]
                                                       
{'loss': 0.9046, 'learning_rate': 0.0, 'epoch': 1.83}

 61%|██████    | 1234/2022 [2:40:38<1:40:27,  7.65s/it]
 61%|██████    | 1235/2022 [2:40:47<1:42:41,  7.83s/it]
                                                       
{'loss': 1.0164, 'learning_rate': 0.0, 'epoch': 1.83}

 61%|██████    | 1235/2022 [2:40:47<1:42:41,  7.83s/it]
 61%|██████    | 1236/2022 [2:40:54<1:41:45,  7.77s/it]
                                                       
{'loss': 1.1219, 'learning_rate': 0.0, 'epoch': 1.83}

 61%|██████    | 1236/2022 [2:40:54<1:41:45,  7.77s/it]
 61%|██████    | 1237/2022 [2:41:02<1:40:57,  7.72s/it]
                                                       
{'loss': 1.2907, 'learning_rate': 0.0, 'epoch': 1.83}

 61%|██████    | 1237/2022 [2:41:02<1:40:57,  7.72s/it]
 61%|██████    | 1238/2022 [2:41:10<1:40:58,  7.73s/it]
                                                       
{'loss': 1.1025, 'learning_rate': 0.0, 'epoch': 1.84}

 61%|██████    | 1238/2022 [2:41:10<1:40:58,  7.73s/it]
 61%|██████▏   | 1239/2022 [2:41:18<1:41:55,  7.81s/it]
                                                       
{'loss': 1.0787, 'learning_rate': 0.0, 'epoch': 1.84}

 61%|██████▏   | 1239/2022 [2:41:18<1:41:55,  7.81s/it]
 61%|██████▏   | 1240/2022 [2:41:25<1:41:35,  7.79s/it]
                                                       
{'loss': 1.205, 'learning_rate': 0.0, 'epoch': 1.84}

 61%|██████▏   | 1240/2022 [2:41:25<1:41:35,  7.79s/it]
 61%|██████▏   | 1241/2022 [2:41:33<1:40:04,  7.69s/it]
                                                       
{'loss': 1.1585, 'learning_rate': 0.0, 'epoch': 1.84}

 61%|██████▏   | 1241/2022 [2:41:33<1:40:04,  7.69s/it]
 61%|██████▏   | 1242/2022 [2:41:40<1:39:55,  7.69s/it]
                                                       
{'loss': 1.2114, 'learning_rate': 0.0, 'epoch': 1.84}

 61%|██████▏   | 1242/2022 [2:41:40<1:39:55,  7.69s/it]
 61%|██████▏   | 1243/2022 [2:41:48<1:40:09,  7.71s/it]
                                                       
{'loss': 1.12, 'learning_rate': 0.0, 'epoch': 1.84}

 61%|██████▏   | 1243/2022 [2:41:48<1:40:09,  7.71s/it]
 62%|██████▏   | 1244/2022 [2:41:57<1:42:13,  7.88s/it]
                                                       
{'loss': 1.204, 'learning_rate': 0.0, 'epoch': 1.84}

 62%|██████▏   | 1244/2022 [2:41:57<1:42:13,  7.88s/it]
 62%|██████▏   | 1245/2022 [2:42:05<1:43:25,  7.99s/it]
                                                       
{'loss': 1.1201, 'learning_rate': 0.0, 'epoch': 1.85}

 62%|██████▏   | 1245/2022 [2:42:05<1:43:25,  7.99s/it]
 62%|██████▏   | 1246/2022 [2:42:12<1:41:07,  7.82s/it]
                                                       
{'loss': 1.0468, 'learning_rate': 0.0, 'epoch': 1.85}

 62%|██████▏   | 1246/2022 [2:42:12<1:41:07,  7.82s/it]
 62%|██████▏   | 1247/2022 [2:42:20<1:40:45,  7.80s/it]
                                                       
{'loss': 1.2492, 'learning_rate': 0.0, 'epoch': 1.85}

 62%|██████▏   | 1247/2022 [2:42:20<1:40:45,  7.80s/it]
 62%|██████▏   | 1248/2022 [2:42:28<1:40:53,  7.82s/it]
                                                       
{'loss': 1.167, 'learning_rate': 0.0, 'epoch': 1.85}

 62%|██████▏   | 1248/2022 [2:42:28<1:40:53,  7.82s/it]
 62%|██████▏   | 1249/2022 [2:42:35<1:40:06,  7.77s/it]
                                                       
{'loss': 1.1851, 'learning_rate': 0.0, 'epoch': 1.85}

 62%|██████▏   | 1249/2022 [2:42:35<1:40:06,  7.77s/it]
 62%|██████▏   | 1250/2022 [2:42:43<1:40:52,  7.84s/it]
                                                       
{'loss': 1.0751, 'learning_rate': 0.0, 'epoch': 1.85}

 62%|██████▏   | 1250/2022 [2:42:43<1:40:52,  7.84s/it]
 62%|██████▏   | 1251/2022 [2:42:51<1:40:37,  7.83s/it]
                                                       
{'loss': 1.1446, 'learning_rate': 0.0, 'epoch': 1.85}

 62%|██████▏   | 1251/2022 [2:42:51<1:40:37,  7.83s/it]
 62%|██████▏   | 1252/2022 [2:42:59<1:40:51,  7.86s/it]
                                                       
{'loss': 1.1039, 'learning_rate': 0.0, 'epoch': 1.86}

 62%|██████▏   | 1252/2022 [2:42:59<1:40:51,  7.86s/it]
 62%|██████▏   | 1253/2022 [2:43:07<1:39:32,  7.77s/it]
                                                       
{'loss': 1.1884, 'learning_rate': 0.0, 'epoch': 1.86}

 62%|██████▏   | 1253/2022 [2:43:07<1:39:32,  7.77s/it]
 62%|██████▏   | 1254/2022 [2:43:14<1:39:12,  7.75s/it]
                                                       
{'loss': 1.1397, 'learning_rate': 0.0, 'epoch': 1.86}

 62%|██████▏   | 1254/2022 [2:43:14<1:39:12,  7.75s/it]
 62%|██████▏   | 1255/2022 [2:43:23<1:41:00,  7.90s/it]
                                                       
{'loss': 1.0248, 'learning_rate': 0.0, 'epoch': 1.86}

 62%|██████▏   | 1255/2022 [2:43:23<1:41:00,  7.90s/it]
 62%|██████▏   | 1256/2022 [2:43:31<1:40:47,  7.90s/it]
                                                       
{'loss': 1.1587, 'learning_rate': 0.0, 'epoch': 1.86}

 62%|██████▏   | 1256/2022 [2:43:31<1:40:47,  7.90s/it]
 62%|██████▏   | 1257/2022 [2:43:38<1:40:31,  7.88s/it]
                                                       
{'loss': 1.2175, 'learning_rate': 0.0, 'epoch': 1.86}

 62%|██████▏   | 1257/2022 [2:43:38<1:40:31,  7.88s/it]
 62%|██████▏   | 1258/2022 [2:43:46<1:39:25,  7.81s/it]
                                                       
{'loss': 1.115, 'learning_rate': 0.0, 'epoch': 1.87}

 62%|██████▏   | 1258/2022 [2:43:46<1:39:25,  7.81s/it]
 62%|██████▏   | 1259/2022 [2:43:54<1:38:30,  7.75s/it]
                                                       
{'loss': 1.1862, 'learning_rate': 0.0, 'epoch': 1.87}

 62%|██████▏   | 1259/2022 [2:43:54<1:38:30,  7.75s/it]
 62%|██████▏   | 1260/2022 [2:44:02<1:38:44,  7.77s/it]
                                                       
{'loss': 1.1645, 'learning_rate': 0.0, 'epoch': 1.87}

 62%|██████▏   | 1260/2022 [2:44:02<1:38:44,  7.77s/it]
 62%|██████▏   | 1261/2022 [2:44:09<1:37:55,  7.72s/it]
                                                       
{'loss': 1.1378, 'learning_rate': 0.0, 'epoch': 1.87}

 62%|██████▏   | 1261/2022 [2:44:09<1:37:55,  7.72s/it]
 62%|██████▏   | 1262/2022 [2:44:17<1:37:47,  7.72s/it]
                                                       
{'loss': 1.0439, 'learning_rate': 0.0, 'epoch': 1.87}

 62%|██████▏   | 1262/2022 [2:44:17<1:37:47,  7.72s/it]
 62%|██████▏   | 1263/2022 [2:44:24<1:37:17,  7.69s/it]
                                                       
{'loss': 1.2107, 'learning_rate': 0.0, 'epoch': 1.87}

 62%|██████▏   | 1263/2022 [2:44:24<1:37:17,  7.69s/it]
 63%|██████▎   | 1264/2022 [2:44:32<1:38:16,  7.78s/it]
                                                       
{'loss': 1.2715, 'learning_rate': 0.0, 'epoch': 1.87}

 63%|██████▎   | 1264/2022 [2:44:32<1:38:16,  7.78s/it]
 63%|██████▎   | 1265/2022 [2:44:40<1:38:07,  7.78s/it]
                                                       
{'loss': 1.0691, 'learning_rate': 0.0, 'epoch': 1.88}

 63%|██████▎   | 1265/2022 [2:44:40<1:38:07,  7.78s/it]
 63%|██████▎   | 1266/2022 [2:44:48<1:36:38,  7.67s/it]
                                                       
{'loss': 1.1388, 'learning_rate': 0.0, 'epoch': 1.88}

 63%|██████▎   | 1266/2022 [2:44:48<1:36:38,  7.67s/it]
 63%|██████▎   | 1267/2022 [2:44:56<1:37:30,  7.75s/it]
                                                       
{'loss': 1.0792, 'learning_rate': 0.0, 'epoch': 1.88}

 63%|██████▎   | 1267/2022 [2:44:56<1:37:30,  7.75s/it]
 63%|██████▎   | 1268/2022 [2:45:03<1:36:25,  7.67s/it]
                                                       
{'loss': 1.1271, 'learning_rate': 0.0, 'epoch': 1.88}

 63%|██████▎   | 1268/2022 [2:45:03<1:36:25,  7.67s/it]
 63%|██████▎   | 1269/2022 [2:45:11<1:36:24,  7.68s/it]
                                                       
{'loss': 1.1459, 'learning_rate': 0.0, 'epoch': 1.88}

 63%|██████▎   | 1269/2022 [2:45:11<1:36:24,  7.68s/it]
 63%|██████▎   | 1270/2022 [2:45:18<1:36:08,  7.67s/it]
                                                       
{'loss': 1.1361, 'learning_rate': 0.0, 'epoch': 1.88}

 63%|██████▎   | 1270/2022 [2:45:18<1:36:08,  7.67s/it]
 63%|██████▎   | 1271/2022 [2:45:26<1:36:26,  7.70s/it]
                                                       
{'loss': 1.2128, 'learning_rate': 0.0, 'epoch': 1.88}

 63%|██████▎   | 1271/2022 [2:45:26<1:36:26,  7.70s/it]
 63%|██████▎   | 1272/2022 [2:45:34<1:36:44,  7.74s/it]
                                                       
{'loss': 1.3082, 'learning_rate': 0.0, 'epoch': 1.89}

 63%|██████▎   | 1272/2022 [2:45:34<1:36:44,  7.74s/it]
 63%|██████▎   | 1273/2022 [2:45:42<1:36:08,  7.70s/it]
                                                       
{'loss': 1.1265, 'learning_rate': 0.0, 'epoch': 1.89}

 63%|██████▎   | 1273/2022 [2:45:42<1:36:08,  7.70s/it]
 63%|██████▎   | 1274/2022 [2:45:49<1:35:20,  7.65s/it]
                                                       
{'loss': 1.2016, 'learning_rate': 0.0, 'epoch': 1.89}

 63%|██████▎   | 1274/2022 [2:45:49<1:35:20,  7.65s/it]
 63%|██████▎   | 1275/2022 [2:45:57<1:34:57,  7.63s/it]
                                                       
{'loss': 1.2045, 'learning_rate': 0.0, 'epoch': 1.89}

 63%|██████▎   | 1275/2022 [2:45:57<1:34:57,  7.63s/it]
 63%|██████▎   | 1276/2022 [2:46:05<1:37:42,  7.86s/it]
                                                       
{'loss': 1.2986, 'learning_rate': 0.0, 'epoch': 1.89}

 63%|██████▎   | 1276/2022 [2:46:05<1:37:42,  7.86s/it]
 63%|██████▎   | 1277/2022 [2:46:13<1:36:28,  7.77s/it]
                                                       
{'loss': 1.2064, 'learning_rate': 0.0, 'epoch': 1.89}

 63%|██████▎   | 1277/2022 [2:46:13<1:36:28,  7.77s/it]
 63%|██████▎   | 1278/2022 [2:46:20<1:35:31,  7.70s/it]
                                                       
{'loss': 1.2081, 'learning_rate': 0.0, 'epoch': 1.89}

 63%|██████▎   | 1278/2022 [2:46:20<1:35:31,  7.70s/it]
 63%|██████▎   | 1279/2022 [2:46:28<1:35:20,  7.70s/it]
                                                       
{'loss': 1.2275, 'learning_rate': 0.0, 'epoch': 1.9}

 63%|██████▎   | 1279/2022 [2:46:28<1:35:20,  7.70s/it]
 63%|██████▎   | 1280/2022 [2:46:36<1:35:05,  7.69s/it]
                                                       
{'loss': 1.3226, 'learning_rate': 0.0, 'epoch': 1.9}

 63%|██████▎   | 1280/2022 [2:46:36<1:35:05,  7.69s/it]
 63%|██████▎   | 1281/2022 [2:46:43<1:35:38,  7.74s/it]
                                                       
{'loss': 1.1333, 'learning_rate': 0.0, 'epoch': 1.9}

 63%|██████▎   | 1281/2022 [2:46:43<1:35:38,  7.74s/it]
 63%|██████▎   | 1282/2022 [2:46:51<1:36:11,  7.80s/it]
                                                       
{'loss': 1.2152, 'learning_rate': 0.0, 'epoch': 1.9}

 63%|██████▎   | 1282/2022 [2:46:51<1:36:11,  7.80s/it]
 63%|██████▎   | 1283/2022 [2:46:59<1:36:09,  7.81s/it]
                                                       
{'loss': 1.0587, 'learning_rate': 0.0, 'epoch': 1.9}

 63%|██████▎   | 1283/2022 [2:46:59<1:36:09,  7.81s/it]
 64%|██████▎   | 1284/2022 [2:47:07<1:34:44,  7.70s/it]
                                                       
{'loss': 1.1036, 'learning_rate': 0.0, 'epoch': 1.9}

 64%|██████▎   | 1284/2022 [2:47:07<1:34:44,  7.70s/it]
 64%|██████▎   | 1285/2022 [2:47:14<1:34:36,  7.70s/it]
                                                       
{'loss': 1.1996, 'learning_rate': 0.0, 'epoch': 1.91}

 64%|██████▎   | 1285/2022 [2:47:14<1:34:36,  7.70s/it]
 64%|██████▎   | 1286/2022 [2:47:23<1:36:34,  7.87s/it]
                                                       
{'loss': 1.1683, 'learning_rate': 0.0, 'epoch': 1.91}

 64%|██████▎   | 1286/2022 [2:47:23<1:36:34,  7.87s/it]
 64%|██████▎   | 1287/2022 [2:47:31<1:38:38,  8.05s/it]
                                                       
{'loss': 1.1559, 'learning_rate': 0.0, 'epoch': 1.91}

 64%|██████▎   | 1287/2022 [2:47:31<1:38:38,  8.05s/it]
 64%|██████▎   | 1288/2022 [2:47:39<1:37:57,  8.01s/it]
                                                       
{'loss': 1.1628, 'learning_rate': 0.0, 'epoch': 1.91}

 64%|██████▎   | 1288/2022 [2:47:39<1:37:57,  8.01s/it]
 64%|██████▎   | 1289/2022 [2:47:47<1:36:49,  7.93s/it]
                                                       
{'loss': 1.0879, 'learning_rate': 0.0, 'epoch': 1.91}

 64%|██████▎   | 1289/2022 [2:47:47<1:36:49,  7.93s/it]
 64%|██████▍   | 1290/2022 [2:47:55<1:37:46,  8.01s/it]
                                                       
{'loss': 1.1647, 'learning_rate': 0.0, 'epoch': 1.91}

 64%|██████▍   | 1290/2022 [2:47:55<1:37:46,  8.01s/it]
 64%|██████▍   | 1291/2022 [2:48:03<1:36:09,  7.89s/it]
                                                       
{'loss': 1.2227, 'learning_rate': 0.0, 'epoch': 1.91}

 64%|██████▍   | 1291/2022 [2:48:03<1:36:09,  7.89s/it]
 64%|██████▍   | 1292/2022 [2:48:10<1:33:58,  7.72s/it]
                                                       
{'loss': 1.1316, 'learning_rate': 0.0, 'epoch': 1.92}

 64%|██████▍   | 1292/2022 [2:48:10<1:33:58,  7.72s/it]
 64%|██████▍   | 1293/2022 [2:48:17<1:32:36,  7.62s/it]
                                                       
{'loss': 1.1672, 'learning_rate': 0.0, 'epoch': 1.92}

 64%|██████▍   | 1293/2022 [2:48:17<1:32:36,  7.62s/it]
 64%|██████▍   | 1294/2022 [2:48:25<1:34:09,  7.76s/it]
                                                       
{'loss': 1.1492, 'learning_rate': 0.0, 'epoch': 1.92}

 64%|██████▍   | 1294/2022 [2:48:25<1:34:09,  7.76s/it]
 64%|██████▍   | 1295/2022 [2:48:33<1:33:49,  7.74s/it]
                                                       
{'loss': 1.2883, 'learning_rate': 0.0, 'epoch': 1.92}

 64%|██████▍   | 1295/2022 [2:48:33<1:33:49,  7.74s/it]
 64%|██████▍   | 1296/2022 [2:48:41<1:33:48,  7.75s/it]
                                                       
{'loss': 1.0582, 'learning_rate': 0.0, 'epoch': 1.92}

 64%|██████▍   | 1296/2022 [2:48:41<1:33:48,  7.75s/it]
 64%|██████▍   | 1297/2022 [2:48:49<1:34:05,  7.79s/it]
                                                       
{'loss': 1.2005, 'learning_rate': 0.0, 'epoch': 1.92}

 64%|██████▍   | 1297/2022 [2:48:49<1:34:05,  7.79s/it]
 64%|██████▍   | 1298/2022 [2:48:57<1:34:37,  7.84s/it]
                                                       
{'loss': 1.1432, 'learning_rate': 0.0, 'epoch': 1.92}

 64%|██████▍   | 1298/2022 [2:48:57<1:34:37,  7.84s/it]
 64%|██████▍   | 1299/2022 [2:49:05<1:34:57,  7.88s/it]
                                                       
{'loss': 1.1277, 'learning_rate': 0.0, 'epoch': 1.93}

 64%|██████▍   | 1299/2022 [2:49:05<1:34:57,  7.88s/it]
 64%|██████▍   | 1300/2022 [2:49:12<1:33:36,  7.78s/it]
                                                       
{'loss': 1.0098, 'learning_rate': 0.0, 'epoch': 1.93}

 64%|██████▍   | 1300/2022 [2:49:12<1:33:36,  7.78s/it]
 64%|██████▍   | 1301/2022 [2:49:20<1:34:50,  7.89s/it]
                                                       
{'loss': 1.1011, 'learning_rate': 0.0, 'epoch': 1.93}

 64%|██████▍   | 1301/2022 [2:49:20<1:34:50,  7.89s/it]
 64%|██████▍   | 1302/2022 [2:49:28<1:35:00,  7.92s/it]
                                                       
{'loss': 1.1007, 'learning_rate': 0.0, 'epoch': 1.93}

 64%|██████▍   | 1302/2022 [2:49:28<1:35:00,  7.92s/it]
 64%|██████▍   | 1303/2022 [2:49:36<1:34:09,  7.86s/it]
                                                       
{'loss': 1.134, 'learning_rate': 0.0, 'epoch': 1.93}

 64%|██████▍   | 1303/2022 [2:49:36<1:34:09,  7.86s/it]
 64%|██████▍   | 1304/2022 [2:49:44<1:34:22,  7.89s/it]
                                                       
{'loss': 1.128, 'learning_rate': 0.0, 'epoch': 1.93}

 64%|██████▍   | 1304/2022 [2:49:44<1:34:22,  7.89s/it]
 65%|██████▍   | 1305/2022 [2:49:52<1:33:08,  7.79s/it]
                                                       
{'loss': 1.0698, 'learning_rate': 0.0, 'epoch': 1.93}

 65%|██████▍   | 1305/2022 [2:49:52<1:33:08,  7.79s/it]
 65%|██████▍   | 1306/2022 [2:49:59<1:32:49,  7.78s/it]
                                                       
{'loss': 1.2634, 'learning_rate': 0.0, 'epoch': 1.94}

 65%|██████▍   | 1306/2022 [2:49:59<1:32:49,  7.78s/it]
 65%|██████▍   | 1307/2022 [2:50:07<1:32:29,  7.76s/it]
                                                       
{'loss': 1.1282, 'learning_rate': 0.0, 'epoch': 1.94}

 65%|██████▍   | 1307/2022 [2:50:07<1:32:29,  7.76s/it]
 65%|██████▍   | 1308/2022 [2:50:15<1:34:00,  7.90s/it]
                                                       
{'loss': 1.3234, 'learning_rate': 0.0, 'epoch': 1.94}

 65%|██████▍   | 1308/2022 [2:50:15<1:34:00,  7.90s/it]
 65%|██████▍   | 1309/2022 [2:50:23<1:34:02,  7.91s/it]
                                                       
{'loss': 1.1359, 'learning_rate': 0.0, 'epoch': 1.94}

 65%|██████▍   | 1309/2022 [2:50:23<1:34:02,  7.91s/it]
 65%|██████▍   | 1310/2022 [2:50:31<1:34:12,  7.94s/it]
                                                       
{'loss': 1.0803, 'learning_rate': 0.0, 'epoch': 1.94}

 65%|██████▍   | 1310/2022 [2:50:31<1:34:12,  7.94s/it]
 65%|██████▍   | 1311/2022 [2:50:39<1:32:46,  7.83s/it]
                                                       
{'loss': 1.058, 'learning_rate': 0.0, 'epoch': 1.94}

 65%|██████▍   | 1311/2022 [2:50:39<1:32:46,  7.83s/it]
 65%|██████▍   | 1312/2022 [2:50:47<1:32:21,  7.80s/it]
                                                       
{'loss': 1.2231, 'learning_rate': 0.0, 'epoch': 1.95}

 65%|██████▍   | 1312/2022 [2:50:47<1:32:21,  7.80s/it]
 65%|██████▍   | 1313/2022 [2:50:55<1:33:27,  7.91s/it]
                                                       
{'loss': 1.1069, 'learning_rate': 0.0, 'epoch': 1.95}

 65%|██████▍   | 1313/2022 [2:50:55<1:33:27,  7.91s/it]
 65%|██████▍   | 1314/2022 [2:51:03<1:35:53,  8.13s/it]
                                                       
{'loss': 1.0925, 'learning_rate': 0.0, 'epoch': 1.95}

 65%|██████▍   | 1314/2022 [2:51:03<1:35:53,  8.13s/it]
 65%|██████▌   | 1315/2022 [2:51:11<1:34:58,  8.06s/it]
                                                       
{'loss': 1.1842, 'learning_rate': 0.0, 'epoch': 1.95}

 65%|██████▌   | 1315/2022 [2:51:11<1:34:58,  8.06s/it]
 65%|██████▌   | 1316/2022 [2:51:20<1:36:08,  8.17s/it]
                                                       
{'loss': 1.0986, 'learning_rate': 0.0, 'epoch': 1.95}

 65%|██████▌   | 1316/2022 [2:51:20<1:36:08,  8.17s/it]
 65%|██████▌   | 1317/2022 [2:51:27<1:34:05,  8.01s/it]
                                                       
{'loss': 1.0869, 'learning_rate': 0.0, 'epoch': 1.95}

 65%|██████▌   | 1317/2022 [2:51:27<1:34:05,  8.01s/it]
 65%|██████▌   | 1318/2022 [2:51:35<1:33:28,  7.97s/it]
                                                       
{'loss': 1.1865, 'learning_rate': 0.0, 'epoch': 1.95}

 65%|██████▌   | 1318/2022 [2:51:35<1:33:28,  7.97s/it]
 65%|██████▌   | 1319/2022 [2:51:43<1:32:53,  7.93s/it]
                                                       
{'loss': 1.1289, 'learning_rate': 0.0, 'epoch': 1.96}

 65%|██████▌   | 1319/2022 [2:51:43<1:32:53,  7.93s/it]
 65%|██████▌   | 1320/2022 [2:51:51<1:31:44,  7.84s/it]
                                                       
{'loss': 1.2674, 'learning_rate': 0.0, 'epoch': 1.96}

 65%|██████▌   | 1320/2022 [2:51:51<1:31:44,  7.84s/it]
 65%|██████▌   | 1321/2022 [2:51:58<1:30:52,  7.78s/it]
                                                       
{'loss': 1.1479, 'learning_rate': 0.0, 'epoch': 1.96}

 65%|██████▌   | 1321/2022 [2:51:58<1:30:52,  7.78s/it]
 65%|██████▌   | 1322/2022 [2:52:06<1:31:01,  7.80s/it]
                                                       
{'loss': 1.0729, 'learning_rate': 0.0, 'epoch': 1.96}

 65%|██████▌   | 1322/2022 [2:52:06<1:31:01,  7.80s/it]
 65%|██████▌   | 1323/2022 [2:52:14<1:31:08,  7.82s/it]
                                                       
{'loss': 1.1733, 'learning_rate': 0.0, 'epoch': 1.96}

 65%|██████▌   | 1323/2022 [2:52:14<1:31:08,  7.82s/it]
 65%|██████▌   | 1324/2022 [2:52:22<1:30:28,  7.78s/it]
                                                       
{'loss': 1.2328, 'learning_rate': 0.0, 'epoch': 1.96}

 65%|██████▌   | 1324/2022 [2:52:22<1:30:28,  7.78s/it]
 66%|██████▌   | 1325/2022 [2:52:29<1:30:13,  7.77s/it]
                                                       
{'loss': 1.1486, 'learning_rate': 0.0, 'epoch': 1.96}

 66%|██████▌   | 1325/2022 [2:52:29<1:30:13,  7.77s/it]
 66%|██████▌   | 1326/2022 [2:52:37<1:30:21,  7.79s/it]
                                                       
{'loss': 1.182, 'learning_rate': 0.0, 'epoch': 1.97}

 66%|██████▌   | 1326/2022 [2:52:37<1:30:21,  7.79s/it]
 66%|██████▌   | 1327/2022 [2:52:45<1:30:22,  7.80s/it]
                                                       
{'loss': 1.2308, 'learning_rate': 0.0, 'epoch': 1.97}

 66%|██████▌   | 1327/2022 [2:52:45<1:30:22,  7.80s/it]
 66%|██████▌   | 1328/2022 [2:52:53<1:29:41,  7.75s/it]
                                                       
{'loss': 1.219, 'learning_rate': 0.0, 'epoch': 1.97}

 66%|██████▌   | 1328/2022 [2:52:53<1:29:41,  7.75s/it]
 66%|██████▌   | 1329/2022 [2:53:00<1:29:19,  7.73s/it]
                                                       
{'loss': 1.2007, 'learning_rate': 0.0, 'epoch': 1.97}

 66%|██████▌   | 1329/2022 [2:53:00<1:29:19,  7.73s/it]
 66%|██████▌   | 1330/2022 [2:53:08<1:29:01,  7.72s/it]
                                                       
{'loss': 1.153, 'learning_rate': 0.0, 'epoch': 1.97}

 66%|██████▌   | 1330/2022 [2:53:08<1:29:01,  7.72s/it]
 66%|██████▌   | 1331/2022 [2:53:16<1:29:02,  7.73s/it]
                                                       
{'loss': 1.0956, 'learning_rate': 0.0, 'epoch': 1.97}

 66%|██████▌   | 1331/2022 [2:53:16<1:29:02,  7.73s/it]
 66%|██████▌   | 1332/2022 [2:53:24<1:29:45,  7.80s/it]
                                                       
{'loss': 1.1923, 'learning_rate': 0.0, 'epoch': 1.97}

 66%|██████▌   | 1332/2022 [2:53:24<1:29:45,  7.80s/it]
 66%|██████▌   | 1333/2022 [2:53:31<1:29:00,  7.75s/it]
                                                       
{'loss': 1.0252, 'learning_rate': 0.0, 'epoch': 1.98}

 66%|██████▌   | 1333/2022 [2:53:31<1:29:00,  7.75s/it]
 66%|██████▌   | 1334/2022 [2:53:39<1:28:29,  7.72s/it]
                                                       
{'loss': 1.1953, 'learning_rate': 0.0, 'epoch': 1.98}

 66%|██████▌   | 1334/2022 [2:53:39<1:28:29,  7.72s/it]
 66%|██████▌   | 1335/2022 [2:53:47<1:28:51,  7.76s/it]
                                                       
{'loss': 1.0951, 'learning_rate': 0.0, 'epoch': 1.98}

 66%|██████▌   | 1335/2022 [2:53:47<1:28:51,  7.76s/it]
 66%|██████▌   | 1336/2022 [2:53:55<1:29:54,  7.86s/it]
                                                       
{'loss': 1.254, 'learning_rate': 0.0, 'epoch': 1.98}

 66%|██████▌   | 1336/2022 [2:53:55<1:29:54,  7.86s/it]
 66%|██████▌   | 1337/2022 [2:54:03<1:29:06,  7.80s/it]
                                                       
{'loss': 1.1517, 'learning_rate': 0.0, 'epoch': 1.98}

 66%|██████▌   | 1337/2022 [2:54:03<1:29:06,  7.80s/it]
 66%|██████▌   | 1338/2022 [2:54:11<1:28:51,  7.79s/it]
                                                       
{'loss': 1.1808, 'learning_rate': 0.0, 'epoch': 1.98}

 66%|██████▌   | 1338/2022 [2:54:11<1:28:51,  7.79s/it]
 66%|██████▌   | 1339/2022 [2:54:18<1:27:31,  7.69s/it]
                                                       
{'loss': 1.116, 'learning_rate': 0.0, 'epoch': 1.99}

 66%|██████▌   | 1339/2022 [2:54:18<1:27:31,  7.69s/it]
 66%|██████▋   | 1340/2022 [2:54:26<1:27:18,  7.68s/it]
                                                       
{'loss': 1.1866, 'learning_rate': 0.0, 'epoch': 1.99}

 66%|██████▋   | 1340/2022 [2:54:26<1:27:18,  7.68s/it]
 66%|██████▋   | 1341/2022 [2:54:33<1:27:18,  7.69s/it]
                                                       
{'loss': 1.2842, 'learning_rate': 0.0, 'epoch': 1.99}

 66%|██████▋   | 1341/2022 [2:54:33<1:27:18,  7.69s/it]
 66%|██████▋   | 1342/2022 [2:54:41<1:27:00,  7.68s/it]
                                                       
{'loss': 1.0904, 'learning_rate': 0.0, 'epoch': 1.99}

 66%|██████▋   | 1342/2022 [2:54:41<1:27:00,  7.68s/it]
 66%|██████▋   | 1343/2022 [2:54:49<1:28:31,  7.82s/it]
                                                       
{'loss': 1.1516, 'learning_rate': 0.0, 'epoch': 1.99}

 66%|██████▋   | 1343/2022 [2:54:49<1:28:31,  7.82s/it]
 66%|██████▋   | 1344/2022 [2:54:57<1:28:11,  7.80s/it]
                                                       
{'loss': 1.1156, 'learning_rate': 0.0, 'epoch': 1.99}

 66%|██████▋   | 1344/2022 [2:54:57<1:28:11,  7.80s/it]
 67%|██████▋   | 1345/2022 [2:55:05<1:28:20,  7.83s/it]
                                                       
{'loss': 1.1066, 'learning_rate': 0.0, 'epoch': 1.99}

 67%|██████▋   | 1345/2022 [2:55:05<1:28:20,  7.83s/it]
 67%|██████▋   | 1346/2022 [2:55:12<1:27:22,  7.76s/it]
                                                       
{'loss': 1.1779, 'learning_rate': 0.0, 'epoch': 2.0}

 67%|██████▋   | 1346/2022 [2:55:12<1:27:22,  7.76s/it]
 67%|██████▋   | 1347/2022 [2:55:20<1:27:24,  7.77s/it]
                                                       
{'loss': 1.197, 'learning_rate': 0.0, 'epoch': 2.0}

 67%|██████▋   | 1347/2022 [2:55:20<1:27:24,  7.77s/it]
 67%|██████▋   | 1348/2022 [2:55:28<1:26:54,  7.74s/it]
                                                       
{'loss': 1.0987, 'learning_rate': 0.0, 'epoch': 2.0}

 67%|██████▋   | 1348/2022 [2:55:28<1:26:54,  7.74s/it]
 67%|██████▋   | 1349/2022 [2:55:36<1:26:48,  7.74s/it]
                                                       
{'loss': 1.2158, 'learning_rate': 0.0, 'epoch': 2.0}

 67%|██████▋   | 1349/2022 [2:55:36<1:26:48,  7.74s/it]
 67%|██████▋   | 1350/2022 [2:55:43<1:27:13,  7.79s/it]
                                                       
{'loss': 1.3142, 'learning_rate': 0.0, 'epoch': 2.0}

 67%|██████▋   | 1350/2022 [2:55:43<1:27:13,  7.79s/it]
 67%|██████▋   | 1351/2022 [2:55:51<1:26:37,  7.75s/it]
                                                       
{'loss': 1.0373, 'learning_rate': 0.0, 'epoch': 2.0}

 67%|██████▋   | 1351/2022 [2:55:51<1:26:37,  7.75s/it]
 67%|██████▋   | 1352/2022 [2:55:59<1:27:20,  7.82s/it]
                                                       
{'loss': 1.146, 'learning_rate': 0.0, 'epoch': 2.0}

 67%|██████▋   | 1352/2022 [2:55:59<1:27:20,  7.82s/it]
 67%|██████▋   | 1353/2022 [2:56:07<1:27:02,  7.81s/it]
                                                       
{'loss': 1.1889, 'learning_rate': 0.0, 'epoch': 2.01}

 67%|██████▋   | 1353/2022 [2:56:07<1:27:02,  7.81s/it]
 67%|██████▋   | 1354/2022 [2:56:15<1:27:27,  7.86s/it]
                                                       
{'loss': 1.1151, 'learning_rate': 0.0, 'epoch': 2.01}

 67%|██████▋   | 1354/2022 [2:56:15<1:27:27,  7.86s/it]
 67%|██████▋   | 1355/2022 [2:56:23<1:26:41,  7.80s/it]
                                                       
{'loss': 1.0564, 'learning_rate': 0.0, 'epoch': 2.01}

 67%|██████▋   | 1355/2022 [2:56:23<1:26:41,  7.80s/it]
 67%|██████▋   | 1356/2022 [2:56:30<1:26:19,  7.78s/it]
                                                       
{'loss': 1.1375, 'learning_rate': 0.0, 'epoch': 2.01}

 67%|██████▋   | 1356/2022 [2:56:30<1:26:19,  7.78s/it]
 67%|██████▋   | 1357/2022 [2:56:38<1:26:04,  7.77s/it]
                                                       
{'loss': 1.2342, 'learning_rate': 0.0, 'epoch': 2.01}

 67%|██████▋   | 1357/2022 [2:56:38<1:26:04,  7.77s/it]
 67%|██████▋   | 1358/2022 [2:56:46<1:27:12,  7.88s/it]
                                                       
{'loss': 1.23, 'learning_rate': 0.0, 'epoch': 2.01}

 67%|██████▋   | 1358/2022 [2:56:46<1:27:12,  7.88s/it]
 67%|██████▋   | 1359/2022 [2:56:54<1:26:59,  7.87s/it]
                                                       
{'loss': 1.1499, 'learning_rate': 0.0, 'epoch': 2.01}

 67%|██████▋   | 1359/2022 [2:56:54<1:26:59,  7.87s/it]
 67%|██████▋   | 1360/2022 [2:57:02<1:25:43,  7.77s/it]
                                                       
{'loss': 1.1371, 'learning_rate': 0.0, 'epoch': 2.02}

 67%|██████▋   | 1360/2022 [2:57:02<1:25:43,  7.77s/it]
 67%|██████▋   | 1361/2022 [2:57:09<1:25:21,  7.75s/it]
                                                       
{'loss': 0.9567, 'learning_rate': 0.0, 'epoch': 2.02}

 67%|██████▋   | 1361/2022 [2:57:09<1:25:21,  7.75s/it]
 67%|██████▋   | 1362/2022 [2:57:17<1:25:55,  7.81s/it]
                                                       
{'loss': 1.2315, 'learning_rate': 0.0, 'epoch': 2.02}

 67%|██████▋   | 1362/2022 [2:57:17<1:25:55,  7.81s/it]
 67%|██████▋   | 1363/2022 [2:57:25<1:27:15,  7.94s/it]
                                                       
{'loss': 1.0124, 'learning_rate': 0.0, 'epoch': 2.02}

 67%|██████▋   | 1363/2022 [2:57:25<1:27:15,  7.94s/it]
 67%|██████▋   | 1364/2022 [2:57:33<1:27:04,  7.94s/it]
                                                       
{'loss': 1.0271, 'learning_rate': 0.0, 'epoch': 2.02}

 67%|██████▋   | 1364/2022 [2:57:33<1:27:04,  7.94s/it]
 68%|██████▊   | 1365/2022 [2:57:41<1:26:21,  7.89s/it]
                                                       
{'loss': 1.199, 'learning_rate': 0.0, 'epoch': 2.02}

 68%|██████▊   | 1365/2022 [2:57:41<1:26:21,  7.89s/it]
 68%|██████▊   | 1366/2022 [2:57:49<1:25:40,  7.84s/it]
                                                       
{'loss': 1.229, 'learning_rate': 0.0, 'epoch': 2.03}

 68%|██████▊   | 1366/2022 [2:57:49<1:25:40,  7.84s/it]
 68%|██████▊   | 1367/2022 [2:57:57<1:25:09,  7.80s/it]
                                                       
{'loss': 1.3708, 'learning_rate': 0.0, 'epoch': 2.03}

 68%|██████▊   | 1367/2022 [2:57:57<1:25:09,  7.80s/it]
 68%|██████▊   | 1368/2022 [2:58:04<1:25:00,  7.80s/it]
                                                       
{'loss': 1.1184, 'learning_rate': 0.0, 'epoch': 2.03}

 68%|██████▊   | 1368/2022 [2:58:04<1:25:00,  7.80s/it]
 68%|██████▊   | 1369/2022 [2:58:12<1:24:21,  7.75s/it]
                                                       
{'loss': 1.0978, 'learning_rate': 0.0, 'epoch': 2.03}

 68%|██████▊   | 1369/2022 [2:58:12<1:24:21,  7.75s/it]
 68%|██████▊   | 1370/2022 [2:58:20<1:24:06,  7.74s/it]
                                                       
{'loss': 1.1057, 'learning_rate': 0.0, 'epoch': 2.03}

 68%|██████▊   | 1370/2022 [2:58:20<1:24:06,  7.74s/it]
 68%|██████▊   | 1371/2022 [2:58:28<1:25:13,  7.85s/it]
                                                       
{'loss': 1.2108, 'learning_rate': 0.0, 'epoch': 2.03}

 68%|██████▊   | 1371/2022 [2:58:28<1:25:13,  7.85s/it]
 68%|██████▊   | 1372/2022 [2:58:35<1:24:13,  7.77s/it]
                                                       
{'loss': 1.0691, 'learning_rate': 0.0, 'epoch': 2.03}

 68%|██████▊   | 1372/2022 [2:58:35<1:24:13,  7.77s/it]
 68%|██████▊   | 1373/2022 [2:58:43<1:24:28,  7.81s/it]
                                                       
{'loss': 1.1308, 'learning_rate': 0.0, 'epoch': 2.04}

 68%|██████▊   | 1373/2022 [2:58:43<1:24:28,  7.81s/it]
 68%|██████▊   | 1374/2022 [2:58:51<1:23:02,  7.69s/it]
                                                       
{'loss': 1.1154, 'learning_rate': 0.0, 'epoch': 2.04}

 68%|██████▊   | 1374/2022 [2:58:51<1:23:02,  7.69s/it]
 68%|██████▊   | 1375/2022 [2:58:58<1:22:09,  7.62s/it]
                                                       
{'loss': 1.1908, 'learning_rate': 0.0, 'epoch': 2.04}

 68%|██████▊   | 1375/2022 [2:58:58<1:22:09,  7.62s/it]
 68%|██████▊   | 1376/2022 [2:59:06<1:22:00,  7.62s/it]
                                                       
{'loss': 1.101, 'learning_rate': 0.0, 'epoch': 2.04}

 68%|██████▊   | 1376/2022 [2:59:06<1:22:00,  7.62s/it]
 68%|██████▊   | 1377/2022 [2:59:14<1:22:34,  7.68s/it]
                                                       
{'loss': 0.9806, 'learning_rate': 0.0, 'epoch': 2.04}

 68%|██████▊   | 1377/2022 [2:59:14<1:22:34,  7.68s/it]
 68%|██████▊   | 1378/2022 [2:59:22<1:23:28,  7.78s/it]
                                                       
{'loss': 1.1441, 'learning_rate': 0.0, 'epoch': 2.04}

 68%|██████▊   | 1378/2022 [2:59:22<1:23:28,  7.78s/it]
 68%|██████▊   | 1379/2022 [2:59:30<1:24:38,  7.90s/it]
                                                       
{'loss': 1.1587, 'learning_rate': 0.0, 'epoch': 2.04}

 68%|██████▊   | 1379/2022 [2:59:30<1:24:38,  7.90s/it]
 68%|██████▊   | 1380/2022 [2:59:38<1:24:00,  7.85s/it]
                                                       
{'loss': 1.0818, 'learning_rate': 0.0, 'epoch': 2.05}

 68%|██████▊   | 1380/2022 [2:59:38<1:24:00,  7.85s/it]
 68%|██████▊   | 1381/2022 [2:59:45<1:23:10,  7.79s/it]
                                                       
{'loss': 1.1816, 'learning_rate': 0.0, 'epoch': 2.05}

 68%|██████▊   | 1381/2022 [2:59:45<1:23:10,  7.79s/it]
 68%|██████▊   | 1382/2022 [2:59:53<1:23:42,  7.85s/it]
                                                       
{'loss': 1.2362, 'learning_rate': 0.0, 'epoch': 2.05}

 68%|██████▊   | 1382/2022 [2:59:53<1:23:42,  7.85s/it]
 68%|██████▊   | 1383/2022 [3:00:01<1:23:00,  7.79s/it]
                                                       
{'loss': 1.1185, 'learning_rate': 0.0, 'epoch': 2.05}

 68%|██████▊   | 1383/2022 [3:00:01<1:23:00,  7.79s/it]
 68%|██████▊   | 1384/2022 [3:00:09<1:23:53,  7.89s/it]
                                                       
{'loss': 1.2079, 'learning_rate': 0.0, 'epoch': 2.05}

 68%|██████▊   | 1384/2022 [3:00:09<1:23:53,  7.89s/it]
 68%|██████▊   | 1385/2022 [3:00:17<1:23:07,  7.83s/it]
                                                       
{'loss': 1.0464, 'learning_rate': 0.0, 'epoch': 2.05}

 68%|██████▊   | 1385/2022 [3:00:17<1:23:07,  7.83s/it]
 69%|██████▊   | 1386/2022 [3:00:25<1:23:33,  7.88s/it]
                                                       
{'loss': 1.0168, 'learning_rate': 0.0, 'epoch': 2.05}

 69%|██████▊   | 1386/2022 [3:00:25<1:23:33,  7.88s/it]
 69%|██████▊   | 1387/2022 [3:00:33<1:23:38,  7.90s/it]
                                                       
{'loss': 1.1586, 'learning_rate': 0.0, 'epoch': 2.06}

 69%|██████▊   | 1387/2022 [3:00:33<1:23:38,  7.90s/it]
 69%|██████▊   | 1388/2022 [3:00:41<1:25:32,  8.10s/it]
                                                       
{'loss': 1.0843, 'learning_rate': 0.0, 'epoch': 2.06}

 69%|██████▊   | 1388/2022 [3:00:41<1:25:32,  8.10s/it]
 69%|██████▊   | 1389/2022 [3:00:49<1:23:08,  7.88s/it]
                                                       
{'loss': 1.2376, 'learning_rate': 0.0, 'epoch': 2.06}

 69%|██████▊   | 1389/2022 [3:00:49<1:23:08,  7.88s/it]
 69%|██████▊   | 1390/2022 [3:00:56<1:22:22,  7.82s/it]
                                                       
{'loss': 1.1737, 'learning_rate': 0.0, 'epoch': 2.06}

 69%|██████▊   | 1390/2022 [3:00:56<1:22:22,  7.82s/it]
 69%|██████▉   | 1391/2022 [3:01:04<1:21:35,  7.76s/it]
                                                       
{'loss': 1.1744, 'learning_rate': 0.0, 'epoch': 2.06}

 69%|██████▉   | 1391/2022 [3:01:04<1:21:35,  7.76s/it]
 69%|██████▉   | 1392/2022 [3:01:12<1:21:30,  7.76s/it]
                                                       
{'loss': 1.1658, 'learning_rate': 0.0, 'epoch': 2.06}

 69%|██████▉   | 1392/2022 [3:01:12<1:21:30,  7.76s/it]
 69%|██████▉   | 1393/2022 [3:01:19<1:21:09,  7.74s/it]
                                                       
{'loss': 1.0665, 'learning_rate': 0.0, 'epoch': 2.07}

 69%|██████▉   | 1393/2022 [3:01:19<1:21:09,  7.74s/it]
 69%|██████▉   | 1394/2022 [3:01:27<1:21:25,  7.78s/it]
                                                       
{'loss': 1.094, 'learning_rate': 0.0, 'epoch': 2.07}

 69%|██████▉   | 1394/2022 [3:01:27<1:21:25,  7.78s/it]
 69%|██████▉   | 1395/2022 [3:01:35<1:21:02,  7.75s/it]
                                                       
{'loss': 1.2699, 'learning_rate': 0.0, 'epoch': 2.07}

 69%|██████▉   | 1395/2022 [3:01:35<1:21:02,  7.75s/it]
 69%|██████▉   | 1396/2022 [3:01:43<1:21:43,  7.83s/it]
                                                       
{'loss': 1.2061, 'learning_rate': 0.0, 'epoch': 2.07}

 69%|██████▉   | 1396/2022 [3:01:43<1:21:43,  7.83s/it]
 69%|██████▉   | 1397/2022 [3:01:50<1:20:32,  7.73s/it]
                                                       
{'loss': 1.1211, 'learning_rate': 0.0, 'epoch': 2.07}

 69%|██████▉   | 1397/2022 [3:01:50<1:20:32,  7.73s/it]
 69%|██████▉   | 1398/2022 [3:01:58<1:21:09,  7.80s/it]
                                                       
{'loss': 1.2683, 'learning_rate': 0.0, 'epoch': 2.07}

 69%|██████▉   | 1398/2022 [3:01:58<1:21:09,  7.80s/it]
 69%|██████▉   | 1399/2022 [3:02:06<1:20:41,  7.77s/it]
                                                       
{'loss': 1.1274, 'learning_rate': 0.0, 'epoch': 2.07}

 69%|██████▉   | 1399/2022 [3:02:06<1:20:41,  7.77s/it]
 69%|██████▉   | 1400/2022 [3:02:14<1:20:18,  7.75s/it]
                                                       
{'loss': 1.3238, 'learning_rate': 0.0, 'epoch': 2.08}

 69%|██████▉   | 1400/2022 [3:02:14<1:20:18,  7.75s/it]
 69%|██████▉   | 1401/2022 [3:02:22<1:21:58,  7.92s/it]
                                                       
{'loss': 1.0402, 'learning_rate': 0.0, 'epoch': 2.08}

 69%|██████▉   | 1401/2022 [3:02:22<1:21:58,  7.92s/it]
 69%|██████▉   | 1402/2022 [3:02:30<1:22:13,  7.96s/it]
                                                       
{'loss': 1.1757, 'learning_rate': 0.0, 'epoch': 2.08}

 69%|██████▉   | 1402/2022 [3:02:30<1:22:13,  7.96s/it]
 69%|██████▉   | 1403/2022 [3:02:38<1:20:48,  7.83s/it]
                                                       
{'loss': 1.152, 'learning_rate': 0.0, 'epoch': 2.08}

 69%|██████▉   | 1403/2022 [3:02:38<1:20:48,  7.83s/it]
 69%|██████▉   | 1404/2022 [3:02:45<1:19:34,  7.73s/it]
                                                       
{'loss': 1.065, 'learning_rate': 0.0, 'epoch': 2.08}

 69%|██████▉   | 1404/2022 [3:02:45<1:19:34,  7.73s/it]
 69%|██████▉   | 1405/2022 [3:02:53<1:20:18,  7.81s/it]
                                                       
{'loss': 1.1045, 'learning_rate': 0.0, 'epoch': 2.08}

 69%|██████▉   | 1405/2022 [3:02:53<1:20:18,  7.81s/it]
 70%|██████▉   | 1406/2022 [3:03:01<1:20:44,  7.86s/it]
                                                       
{'loss': 1.0863, 'learning_rate': 0.0, 'epoch': 2.08}

 70%|██████▉   | 1406/2022 [3:03:01<1:20:44,  7.86s/it]
 70%|██████▉   | 1407/2022 [3:03:09<1:20:44,  7.88s/it]
                                                       
{'loss': 1.1637, 'learning_rate': 0.0, 'epoch': 2.09}

 70%|██████▉   | 1407/2022 [3:03:09<1:20:44,  7.88s/it]
 70%|██████▉   | 1408/2022 [3:03:17<1:21:10,  7.93s/it]
                                                       
{'loss': 1.1946, 'learning_rate': 0.0, 'epoch': 2.09}

 70%|██████▉   | 1408/2022 [3:03:17<1:21:10,  7.93s/it]
 70%|██████▉   | 1409/2022 [3:03:25<1:20:57,  7.92s/it]
                                                       
{'loss': 1.1987, 'learning_rate': 0.0, 'epoch': 2.09}

 70%|██████▉   | 1409/2022 [3:03:25<1:20:57,  7.92s/it]
 70%|██████▉   | 1410/2022 [3:03:33<1:20:35,  7.90s/it]
                                                       
{'loss': 1.2272, 'learning_rate': 0.0, 'epoch': 2.09}

 70%|██████▉   | 1410/2022 [3:03:33<1:20:35,  7.90s/it]
 70%|██████▉   | 1411/2022 [3:03:41<1:20:00,  7.86s/it]
                                                       
{'loss': 1.0058, 'learning_rate': 0.0, 'epoch': 2.09}

 70%|██████▉   | 1411/2022 [3:03:41<1:20:00,  7.86s/it]
 70%|██████▉   | 1412/2022 [3:03:49<1:21:44,  8.04s/it]
                                                       
{'loss': 1.1572, 'learning_rate': 0.0, 'epoch': 2.09}

 70%|██████▉   | 1412/2022 [3:03:49<1:21:44,  8.04s/it]
 70%|██████▉   | 1413/2022 [3:03:57<1:21:01,  7.98s/it]
                                                       
{'loss': 1.2919, 'learning_rate': 0.0, 'epoch': 2.09}

 70%|██████▉   | 1413/2022 [3:03:57<1:21:01,  7.98s/it]
 70%|██████▉   | 1414/2022 [3:04:05<1:19:59,  7.89s/it]
                                                       
{'loss': 1.2368, 'learning_rate': 0.0, 'epoch': 2.1}

 70%|██████▉   | 1414/2022 [3:04:05<1:19:59,  7.89s/it]
 70%|██████▉   | 1415/2022 [3:04:13<1:21:59,  8.10s/it]
                                                       
{'loss': 1.2329, 'learning_rate': 0.0, 'epoch': 2.1}

 70%|██████▉   | 1415/2022 [3:04:13<1:21:59,  8.10s/it]
 70%|███████   | 1416/2022 [3:04:21<1:21:24,  8.06s/it]
                                                       
{'loss': 1.144, 'learning_rate': 0.0, 'epoch': 2.1}

 70%|███████   | 1416/2022 [3:04:21<1:21:24,  8.06s/it]
 70%|███████   | 1417/2022 [3:04:29<1:20:42,  8.00s/it]
                                                       
{'loss': 1.2443, 'learning_rate': 0.0, 'epoch': 2.1}

 70%|███████   | 1417/2022 [3:04:29<1:20:42,  8.00s/it]
 70%|███████   | 1418/2022 [3:04:37<1:21:30,  8.10s/it]
                                                       
{'loss': 1.0247, 'learning_rate': 0.0, 'epoch': 2.1}

 70%|███████   | 1418/2022 [3:04:37<1:21:30,  8.10s/it]
 70%|███████   | 1419/2022 [3:04:45<1:19:04,  7.87s/it]
                                                       
{'loss': 1.1018, 'learning_rate': 0.0, 'epoch': 2.1}

 70%|███████   | 1419/2022 [3:04:45<1:19:04,  7.87s/it]
 70%|███████   | 1420/2022 [3:04:52<1:17:48,  7.75s/it]
                                                       
{'loss': 1.2522, 'learning_rate': 0.0, 'epoch': 2.11}

 70%|███████   | 1420/2022 [3:04:52<1:17:48,  7.75s/it]
 70%|███████   | 1421/2022 [3:05:00<1:17:25,  7.73s/it]
                                                       
{'loss': 1.1201, 'learning_rate': 0.0, 'epoch': 2.11}

 70%|███████   | 1421/2022 [3:05:00<1:17:25,  7.73s/it]
 70%|███████   | 1422/2022 [3:05:07<1:16:30,  7.65s/it]
                                                       
{'loss': 1.1303, 'learning_rate': 0.0, 'epoch': 2.11}

 70%|███████   | 1422/2022 [3:05:07<1:16:30,  7.65s/it]
 70%|███████   | 1423/2022 [3:05:15<1:17:05,  7.72s/it]
                                                       
{'loss': 1.2013, 'learning_rate': 0.0, 'epoch': 2.11}

 70%|███████   | 1423/2022 [3:05:15<1:17:05,  7.72s/it]
 70%|███████   | 1424/2022 [3:05:23<1:16:27,  7.67s/it]
                                                       
{'loss': 1.2373, 'learning_rate': 0.0, 'epoch': 2.11}

 70%|███████   | 1424/2022 [3:05:23<1:16:27,  7.67s/it]
 70%|███████   | 1425/2022 [3:05:30<1:16:05,  7.65s/it]
                                                       
{'loss': 1.2321, 'learning_rate': 0.0, 'epoch': 2.11}

 70%|███████   | 1425/2022 [3:05:30<1:16:05,  7.65s/it]
 71%|███████   | 1426/2022 [3:05:38<1:15:56,  7.65s/it]
                                                       
{'loss': 1.0881, 'learning_rate': 0.0, 'epoch': 2.11}

 71%|███████   | 1426/2022 [3:05:38<1:15:56,  7.65s/it]
 71%|███████   | 1427/2022 [3:05:46<1:16:39,  7.73s/it]
                                                       
{'loss': 1.1077, 'learning_rate': 0.0, 'epoch': 2.12}

 71%|███████   | 1427/2022 [3:05:46<1:16:39,  7.73s/it]
 71%|███████   | 1428/2022 [3:05:54<1:17:35,  7.84s/it]
                                                       
{'loss': 1.2172, 'learning_rate': 0.0, 'epoch': 2.12}

 71%|███████   | 1428/2022 [3:05:54<1:17:35,  7.84s/it]
 71%|███████   | 1429/2022 [3:06:02<1:17:22,  7.83s/it]
                                                       
{'loss': 1.3734, 'learning_rate': 0.0, 'epoch': 2.12}

 71%|███████   | 1429/2022 [3:06:02<1:17:22,  7.83s/it]
 71%|███████   | 1430/2022 [3:06:10<1:17:18,  7.84s/it]
                                                       
{'loss': 1.214, 'learning_rate': 0.0, 'epoch': 2.12}

 71%|███████   | 1430/2022 [3:06:10<1:17:18,  7.84s/it]
 71%|███████   | 1431/2022 [3:06:17<1:16:16,  7.74s/it]
                                                       
{'loss': 1.2421, 'learning_rate': 0.0, 'epoch': 2.12}

 71%|███████   | 1431/2022 [3:06:17<1:16:16,  7.74s/it]
 71%|███████   | 1432/2022 [3:06:25<1:17:16,  7.86s/it]
                                                       
{'loss': 1.0551, 'learning_rate': 0.0, 'epoch': 2.12}

 71%|███████   | 1432/2022 [3:06:25<1:17:16,  7.86s/it]
 71%|███████   | 1433/2022 [3:06:33<1:17:07,  7.86s/it]
                                                       
{'loss': 1.1202, 'learning_rate': 0.0, 'epoch': 2.12}

 71%|███████   | 1433/2022 [3:06:33<1:17:07,  7.86s/it]
 71%|███████   | 1434/2022 [3:06:41<1:16:41,  7.83s/it]
                                                       
{'loss': 1.248, 'learning_rate': 0.0, 'epoch': 2.13}

 71%|███████   | 1434/2022 [3:06:41<1:16:41,  7.83s/it]
 71%|███████   | 1435/2022 [3:06:49<1:17:03,  7.88s/it]
                                                       
{'loss': 1.1842, 'learning_rate': 0.0, 'epoch': 2.13}

 71%|███████   | 1435/2022 [3:06:49<1:17:03,  7.88s/it]
 71%|███████   | 1436/2022 [3:06:56<1:16:03,  7.79s/it]
                                                       
{'loss': 1.161, 'learning_rate': 0.0, 'epoch': 2.13}

 71%|███████   | 1436/2022 [3:06:56<1:16:03,  7.79s/it]
 71%|███████   | 1437/2022 [3:07:04<1:16:06,  7.81s/it]
                                                       
{'loss': 1.242, 'learning_rate': 0.0, 'epoch': 2.13}

 71%|███████   | 1437/2022 [3:07:04<1:16:06,  7.81s/it]
 71%|███████   | 1438/2022 [3:07:12<1:16:11,  7.83s/it]
                                                       
{'loss': 1.2015, 'learning_rate': 0.0, 'epoch': 2.13}

 71%|███████   | 1438/2022 [3:07:12<1:16:11,  7.83s/it]
 71%|███████   | 1439/2022 [3:07:20<1:15:58,  7.82s/it]
                                                       
{'loss': 1.2638, 'learning_rate': 0.0, 'epoch': 2.13}

 71%|███████   | 1439/2022 [3:07:20<1:15:58,  7.82s/it]
 71%|███████   | 1440/2022 [3:07:28<1:15:49,  7.82s/it]
                                                       
{'loss': 1.2175, 'learning_rate': 0.0, 'epoch': 2.13}

 71%|███████   | 1440/2022 [3:07:28<1:15:49,  7.82s/it]
 71%|███████▏  | 1441/2022 [3:07:36<1:16:04,  7.86s/it]
                                                       
{'loss': 1.1286, 'learning_rate': 0.0, 'epoch': 2.14}

 71%|███████▏  | 1441/2022 [3:07:36<1:16:04,  7.86s/it]
 71%|███████▏  | 1442/2022 [3:07:43<1:15:27,  7.81s/it]
                                                       
{'loss': 1.1581, 'learning_rate': 0.0, 'epoch': 2.14}

 71%|███████▏  | 1442/2022 [3:07:43<1:15:27,  7.81s/it]
 71%|███████▏  | 1443/2022 [3:07:51<1:15:52,  7.86s/it]
                                                       
{'loss': 1.1138, 'learning_rate': 0.0, 'epoch': 2.14}

 71%|███████▏  | 1443/2022 [3:07:51<1:15:52,  7.86s/it]
 71%|███████▏  | 1444/2022 [3:07:59<1:14:54,  7.78s/it]
                                                       
{'loss': 1.3114, 'learning_rate': 0.0, 'epoch': 2.14}

 71%|███████▏  | 1444/2022 [3:07:59<1:14:54,  7.78s/it]
 71%|███████▏  | 1445/2022 [3:08:07<1:14:13,  7.72s/it]
                                                       
{'loss': 1.2632, 'learning_rate': 0.0, 'epoch': 2.14}

 71%|███████▏  | 1445/2022 [3:08:07<1:14:13,  7.72s/it]
 72%|███████▏  | 1446/2022 [3:08:14<1:14:31,  7.76s/it]
                                                       
{'loss': 1.2081, 'learning_rate': 0.0, 'epoch': 2.14}

 72%|███████▏  | 1446/2022 [3:08:14<1:14:31,  7.76s/it]
 72%|███████▏  | 1447/2022 [3:08:23<1:15:30,  7.88s/it]
                                                       
{'loss': 1.065, 'learning_rate': 0.0, 'epoch': 2.15}

 72%|███████▏  | 1447/2022 [3:08:23<1:15:30,  7.88s/it]
 72%|███████▏  | 1448/2022 [3:08:30<1:14:06,  7.75s/it]
                                                       
{'loss': 1.2833, 'learning_rate': 0.0, 'epoch': 2.15}

 72%|███████▏  | 1448/2022 [3:08:30<1:14:06,  7.75s/it]
 72%|███████▏  | 1449/2022 [3:08:38<1:14:00,  7.75s/it]
                                                       
{'loss': 1.2005, 'learning_rate': 0.0, 'epoch': 2.15}

 72%|███████▏  | 1449/2022 [3:08:38<1:14:00,  7.75s/it]
 72%|███████▏  | 1450/2022 [3:08:46<1:14:35,  7.82s/it]
                                                       
{'loss': 1.1823, 'learning_rate': 0.0, 'epoch': 2.15}

 72%|███████▏  | 1450/2022 [3:08:46<1:14:35,  7.82s/it]
 72%|███████▏  | 1451/2022 [3:08:54<1:14:55,  7.87s/it]
                                                       
{'loss': 1.1269, 'learning_rate': 0.0, 'epoch': 2.15}

 72%|███████▏  | 1451/2022 [3:08:54<1:14:55,  7.87s/it]
 72%|███████▏  | 1452/2022 [3:09:02<1:14:21,  7.83s/it]
                                                       
{'loss': 1.0327, 'learning_rate': 0.0, 'epoch': 2.15}

 72%|███████▏  | 1452/2022 [3:09:02<1:14:21,  7.83s/it]
 72%|███████▏  | 1453/2022 [3:09:09<1:14:22,  7.84s/it]
                                                       
{'loss': 1.2268, 'learning_rate': 0.0, 'epoch': 2.15}

 72%|███████▏  | 1453/2022 [3:09:09<1:14:22,  7.84s/it]
 72%|███████▏  | 1454/2022 [3:09:17<1:14:46,  7.90s/it]
                                                       
{'loss': 1.1781, 'learning_rate': 0.0, 'epoch': 2.16}

 72%|███████▏  | 1454/2022 [3:09:17<1:14:46,  7.90s/it]
 72%|███████▏  | 1455/2022 [3:09:25<1:14:13,  7.85s/it]
                                                       
{'loss': 1.1968, 'learning_rate': 0.0, 'epoch': 2.16}

 72%|███████▏  | 1455/2022 [3:09:25<1:14:13,  7.85s/it]
 72%|███████▏  | 1456/2022 [3:09:33<1:12:47,  7.72s/it]
                                                       
{'loss': 1.227, 'learning_rate': 0.0, 'epoch': 2.16}

 72%|███████▏  | 1456/2022 [3:09:33<1:12:47,  7.72s/it]
 72%|███████▏  | 1457/2022 [3:09:40<1:13:06,  7.76s/it]
                                                       
{'loss': 1.2387, 'learning_rate': 0.0, 'epoch': 2.16}

 72%|███████▏  | 1457/2022 [3:09:40<1:13:06,  7.76s/it]
 72%|███████▏  | 1458/2022 [3:09:48<1:13:30,  7.82s/it]
                                                       
{'loss': 1.2344, 'learning_rate': 0.0, 'epoch': 2.16}

 72%|███████▏  | 1458/2022 [3:09:48<1:13:30,  7.82s/it]
 72%|███████▏  | 1459/2022 [3:09:56<1:13:25,  7.83s/it]
                                                       
{'loss': 1.176, 'learning_rate': 0.0, 'epoch': 2.16}

 72%|███████▏  | 1459/2022 [3:09:56<1:13:25,  7.83s/it]
 72%|███████▏  | 1460/2022 [3:10:04<1:12:40,  7.76s/it]
                                                       
{'loss': 1.1933, 'learning_rate': 0.0, 'epoch': 2.16}

 72%|███████▏  | 1460/2022 [3:10:04<1:12:40,  7.76s/it]
 72%|███████▏  | 1461/2022 [3:10:12<1:13:23,  7.85s/it]
                                                       
{'loss': 1.2726, 'learning_rate': 0.0, 'epoch': 2.17}

 72%|███████▏  | 1461/2022 [3:10:12<1:13:23,  7.85s/it]
 72%|███████▏  | 1462/2022 [3:10:20<1:13:21,  7.86s/it]
                                                       
{'loss': 1.2177, 'learning_rate': 0.0, 'epoch': 2.17}

 72%|███████▏  | 1462/2022 [3:10:20<1:13:21,  7.86s/it]
 72%|███████▏  | 1463/2022 [3:10:28<1:13:08,  7.85s/it]
                                                       
{'loss': 1.0783, 'learning_rate': 0.0, 'epoch': 2.17}

 72%|███████▏  | 1463/2022 [3:10:28<1:13:08,  7.85s/it]
 72%|███████▏  | 1464/2022 [3:10:36<1:13:22,  7.89s/it]
                                                       
{'loss': 1.1799, 'learning_rate': 0.0, 'epoch': 2.17}

 72%|███████▏  | 1464/2022 [3:10:36<1:13:22,  7.89s/it]
 72%|███████▏  | 1465/2022 [3:10:43<1:13:17,  7.90s/it]
                                                       
{'loss': 1.1682, 'learning_rate': 0.0, 'epoch': 2.17}

 72%|███████▏  | 1465/2022 [3:10:44<1:13:17,  7.90s/it]
 73%|███████▎  | 1466/2022 [3:10:51<1:12:56,  7.87s/it]
                                                       
{'loss': 1.2626, 'learning_rate': 0.0, 'epoch': 2.17}

 73%|███████▎  | 1466/2022 [3:10:51<1:12:56,  7.87s/it]
 73%|███████▎  | 1467/2022 [3:10:59<1:12:22,  7.82s/it]
                                                       
{'loss': 1.2262, 'learning_rate': 0.0, 'epoch': 2.17}

 73%|███████▎  | 1467/2022 [3:10:59<1:12:22,  7.82s/it]
 73%|███████▎  | 1468/2022 [3:11:07<1:12:01,  7.80s/it]
                                                       
{'loss': 1.3025, 'learning_rate': 0.0, 'epoch': 2.18}

 73%|███████▎  | 1468/2022 [3:11:07<1:12:01,  7.80s/it]
 73%|███████▎  | 1469/2022 [3:11:15<1:11:58,  7.81s/it]
                                                       
{'loss': 1.0726, 'learning_rate': 0.0, 'epoch': 2.18}

 73%|███████▎  | 1469/2022 [3:11:15<1:11:58,  7.81s/it]
 73%|███████▎  | 1470/2022 [3:11:23<1:12:25,  7.87s/it]
                                                       
{'loss': 1.1807, 'learning_rate': 0.0, 'epoch': 2.18}

 73%|███████▎  | 1470/2022 [3:11:23<1:12:25,  7.87s/it]
 73%|███████▎  | 1471/2022 [3:11:30<1:12:17,  7.87s/it]
                                                       
{'loss': 1.1032, 'learning_rate': 0.0, 'epoch': 2.18}

 73%|███████▎  | 1471/2022 [3:11:30<1:12:17,  7.87s/it]
 73%|███████▎  | 1472/2022 [3:11:38<1:11:45,  7.83s/it]
                                                       
{'loss': 1.2187, 'learning_rate': 0.0, 'epoch': 2.18}

 73%|███████▎  | 1472/2022 [3:11:38<1:11:45,  7.83s/it]
 73%|███████▎  | 1473/2022 [3:11:46<1:11:40,  7.83s/it]
                                                       
{'loss': 1.1344, 'learning_rate': 0.0, 'epoch': 2.18}

 73%|███████▎  | 1473/2022 [3:11:46<1:11:40,  7.83s/it]
 73%|███████▎  | 1474/2022 [3:11:54<1:11:15,  7.80s/it]
                                                       
{'loss': 1.095, 'learning_rate': 0.0, 'epoch': 2.19}

 73%|███████▎  | 1474/2022 [3:11:54<1:11:15,  7.80s/it]
 73%|███████▎  | 1475/2022 [3:12:02<1:11:08,  7.80s/it]
                                                       
{'loss': 0.9806, 'learning_rate': 0.0, 'epoch': 2.19}

 73%|███████▎  | 1475/2022 [3:12:02<1:11:08,  7.80s/it]
 73%|███████▎  | 1476/2022 [3:12:09<1:09:51,  7.68s/it]
                                                       
{'loss': 1.1064, 'learning_rate': 0.0, 'epoch': 2.19}

 73%|███████▎  | 1476/2022 [3:12:09<1:09:51,  7.68s/it]
 73%|███████▎  | 1477/2022 [3:12:17<1:10:33,  7.77s/it]
                                                       
{'loss': 1.0596, 'learning_rate': 0.0, 'epoch': 2.19}

 73%|███████▎  | 1477/2022 [3:12:17<1:10:33,  7.77s/it]
 73%|███████▎  | 1478/2022 [3:12:25<1:10:54,  7.82s/it]
                                                       
{'loss': 1.1915, 'learning_rate': 0.0, 'epoch': 2.19}

 73%|███████▎  | 1478/2022 [3:12:25<1:10:54,  7.82s/it]
 73%|███████▎  | 1479/2022 [3:12:33<1:11:03,  7.85s/it]
                                                       
{'loss': 1.1477, 'learning_rate': 0.0, 'epoch': 2.19}

 73%|███████▎  | 1479/2022 [3:12:33<1:11:03,  7.85s/it]
 73%|███████▎  | 1480/2022 [3:12:41<1:11:21,  7.90s/it]
                                                       
{'loss': 1.1259, 'learning_rate': 0.0, 'epoch': 2.19}

 73%|███████▎  | 1480/2022 [3:12:41<1:11:21,  7.90s/it]
 73%|███████▎  | 1481/2022 [3:12:49<1:10:39,  7.84s/it]
                                                       
{'loss': 1.1893, 'learning_rate': 0.0, 'epoch': 2.2}

 73%|███████▎  | 1481/2022 [3:12:49<1:10:39,  7.84s/it]
 73%|███████▎  | 1482/2022 [3:12:56<1:10:40,  7.85s/it]
                                                       
{'loss': 1.2271, 'learning_rate': 0.0, 'epoch': 2.2}

 73%|███████▎  | 1482/2022 [3:12:56<1:10:40,  7.85s/it]
 73%|███████▎  | 1483/2022 [3:13:04<1:10:25,  7.84s/it]
                                                       
{'loss': 1.1264, 'learning_rate': 0.0, 'epoch': 2.2}

 73%|███████▎  | 1483/2022 [3:13:04<1:10:25,  7.84s/it]
 73%|███████▎  | 1484/2022 [3:13:12<1:09:25,  7.74s/it]
                                                       
{'loss': 1.0735, 'learning_rate': 0.0, 'epoch': 2.2}

 73%|███████▎  | 1484/2022 [3:13:12<1:09:25,  7.74s/it]
 73%|███████▎  | 1485/2022 [3:13:20<1:09:44,  7.79s/it]
                                                       
{'loss': 1.2066, 'learning_rate': 0.0, 'epoch': 2.2}

 73%|███████▎  | 1485/2022 [3:13:20<1:09:44,  7.79s/it]
 73%|███████▎  | 1486/2022 [3:13:28<1:09:50,  7.82s/it]
                                                       
{'loss': 1.2134, 'learning_rate': 0.0, 'epoch': 2.2}

 73%|███████▎  | 1486/2022 [3:13:28<1:09:50,  7.82s/it]
 74%|███████▎  | 1487/2022 [3:13:35<1:09:55,  7.84s/it]
                                                       
{'loss': 1.1108, 'learning_rate': 0.0, 'epoch': 2.2}

 74%|███████▎  | 1487/2022 [3:13:35<1:09:55,  7.84s/it]
 74%|███████▎  | 1488/2022 [3:13:43<1:09:36,  7.82s/it]
                                                       
{'loss': 1.0501, 'learning_rate': 0.0, 'epoch': 2.21}

 74%|███████▎  | 1488/2022 [3:13:43<1:09:36,  7.82s/it]
 74%|███████▎  | 1489/2022 [3:13:51<1:09:06,  7.78s/it]
                                                       
{'loss': 1.0291, 'learning_rate': 0.0, 'epoch': 2.21}

 74%|███████▎  | 1489/2022 [3:13:51<1:09:06,  7.78s/it]
 74%|███████▎  | 1490/2022 [3:13:58<1:08:31,  7.73s/it]
                                                       
{'loss': 1.2189, 'learning_rate': 0.0, 'epoch': 2.21}

 74%|███████▎  | 1490/2022 [3:13:59<1:08:31,  7.73s/it]
 74%|███████▎  | 1491/2022 [3:14:07<1:09:37,  7.87s/it]
                                                       
{'loss': 1.0678, 'learning_rate': 0.0, 'epoch': 2.21}

 74%|███████▎  | 1491/2022 [3:14:07<1:09:37,  7.87s/it]
 74%|███████▍  | 1492/2022 [3:14:14<1:08:06,  7.71s/it]
                                                       
{'loss': 1.2303, 'learning_rate': 0.0, 'epoch': 2.21}

 74%|███████▍  | 1492/2022 [3:14:14<1:08:06,  7.71s/it]
 74%|███████▍  | 1493/2022 [3:14:21<1:07:15,  7.63s/it]
                                                       
{'loss': 1.2293, 'learning_rate': 0.0, 'epoch': 2.21}

 74%|███████▍  | 1493/2022 [3:14:21<1:07:15,  7.63s/it]
 74%|███████▍  | 1494/2022 [3:14:29<1:07:42,  7.69s/it]
                                                       
{'loss': 1.0864, 'learning_rate': 0.0, 'epoch': 2.21}

 74%|███████▍  | 1494/2022 [3:14:29<1:07:42,  7.69s/it]
 74%|███████▍  | 1495/2022 [3:14:37<1:07:58,  7.74s/it]
                                                       
{'loss': 1.3185, 'learning_rate': 0.0, 'epoch': 2.22}

 74%|███████▍  | 1495/2022 [3:14:37<1:07:58,  7.74s/it]
 74%|███████▍  | 1496/2022 [3:14:45<1:08:39,  7.83s/it]
                                                       
{'loss': 1.0482, 'learning_rate': 0.0, 'epoch': 2.22}

 74%|███████▍  | 1497/2022 [3:14:53<1:07:53,  7.76s/it]
                                                       
{'loss': 1.2067, 'learning_rate': 0.0, 'epoch': 2.22}

 74%|███████▍  | 1498/2022 [3:15:01<1:08:33,  7.85s/it]
                                                       
{'loss': 1.1457, 'learning_rate': 0.0, 'epoch': 2.22}

 74%|███████▍  | 1499/2022 [3:15:09<1:08:41,  7.88s/it]
                                                       
{'loss': 1.3152, 'learning_rate': 0.0, 'epoch': 2.22}

 74%|███████▍  | 1500/2022 [3:15:17<1:08:30,  7.88s/it]
                                                       
{'loss': 1.1569, 'learning_rate': 0.0, 'epoch': 2.22}

 74%|███████▍  | 1501/2022 [3:15:25<1:08:48,  7.92s/it]
                                                       
{'loss': 1.1549, 'learning_rate': 0.0, 'epoch': 2.23}

 74%|███████▍  | 1502/2022 [3:15:32<1:08:00,  7.85s/it]
                                                       
{'loss': 1.1427, 'learning_rate': 0.0, 'epoch': 2.23}

 74%|███████▍  | 1503/2022 [3:15:40<1:07:38,  7.82s/it]
                                                       
{'loss': 1.1778, 'learning_rate': 0.0, 'epoch': 2.23}

 74%|███████▍  | 1504/2022 [3:15:48<1:07:36,  7.83s/it]
                                                       
{'loss': 1.174, 'learning_rate': 0.0, 'epoch': 2.23}

 74%|███████▍  | 1505/2022 [3:15:56<1:07:47,  7.87s/it]
                                                       
{'loss': 1.1342, 'learning_rate': 0.0, 'epoch': 2.23}

 74%|███████▍  | 1506/2022 [3:16:04<1:07:21,  7.83s/it]
                                                       
{'loss': 1.1501, 'learning_rate': 0.0, 'epoch': 2.23}

 75%|███████▍  | 1507/2022 [3:16:11<1:07:00,  7.81s/it]
                                                       
{'loss': 1.3492, 'learning_rate': 0.0, 'epoch': 2.23}

 75%|███████▍  | 1508/2022 [3:16:19<1:06:52,  7.81s/it]
                                                       
{'loss': 1.2131, 'learning_rate': 0.0, 'epoch': 2.24}

 75%|███████▍  | 1509/2022 [3:16:27<1:05:56,  7.71s/it]
                                                       
{'loss': 1.1363, 'learning_rate': 0.0, 'epoch': 2.24}

 75%|███████▍  | 1510/2022 [3:16:34<1:05:46,  7.71s/it]
                                                       
{'loss': 1.1128, 'learning_rate': 0.0, 'epoch': 2.24}

 75%|███████▍  | 1511/2022 [3:16:42<1:05:35,  7.70s/it]
                                                       
{'loss': 1.0916, 'learning_rate': 0.0, 'epoch': 2.24}

 75%|███████▍  | 1512/2022 [3:16:50<1:05:16,  7.68s/it]
                                                       
{'loss': 1.1864, 'learning_rate': 0.0, 'epoch': 2.24}

 75%|███████▍  | 1513/2022 [3:16:57<1:05:13,  7.69s/it]
                                                       
{'loss': 1.2845, 'learning_rate': 0.0, 'epoch': 2.24}

 75%|███████▍  | 1514/2022 [3:17:05<1:05:48,  7.77s/it]
                                                       
{'loss': 1.0501, 'learning_rate': 0.0, 'epoch': 2.24}

 75%|███████▍  | 1515/2022 [3:17:13<1:06:00,  7.81s/it]
                                                       
{'loss': 1.1785, 'learning_rate': 0.0, 'epoch': 2.25}

 75%|███████▍  | 1516/2022 [3:17:23<1:09:18,  8.22s/it]
                                                       
{'loss': 1.0922, 'learning_rate': 0.0, 'epoch': 2.25}

 75%|███████▌  | 1517/2022 [3:17:30<1:07:44,  8.05s/it]
                                                       
{'loss': 1.1728, 'learning_rate': 0.0, 'epoch': 2.25}

 75%|███████▌  | 1518/2022 [3:17:38<1:07:11,  8.00s/it]
                                                       
{'loss': 1.1473, 'learning_rate': 0.0, 'epoch': 2.25}

 75%|███████▌  | 1519/2022 [3:17:46<1:06:41,  7.96s/it]
                                                       
{'loss': 1.2475, 'learning_rate': 0.0, 'epoch': 2.25}

 75%|███████▌  | 1520/2022 [3:17:54<1:05:59,  7.89s/it]
                                                       
{'loss': 1.0168, 'learning_rate': 0.0, 'epoch': 2.25}

 75%|███████▌  | 1521/2022 [3:18:01<1:05:22,  7.83s/it]
                                                       
{'loss': 1.2311, 'learning_rate': 0.0, 'epoch': 2.26}

 75%|███████▌  | 1522/2022 [3:18:09<1:05:03,  7.81s/it]
                                                       
{'loss': 1.2004, 'learning_rate': 0.0, 'epoch': 2.26}

 75%|███████▌  | 1523/2022 [3:18:17<1:04:12,  7.72s/it]
                                                       
{'loss': 1.2951, 'learning_rate': 0.0, 'epoch': 2.26}

 75%|███████▌  | 1524/2022 [3:18:24<1:02:54,  7.58s/it]
                                                       
{'loss': 1.1939, 'learning_rate': 0.0, 'epoch': 2.26}

 75%|███████▌  | 1525/2022 [3:18:31<1:02:51,  7.59s/it]
                                                       
{'loss': 1.1224, 'learning_rate': 0.0, 'epoch': 2.26}

 75%|███████▌  | 1526/2022 [3:18:39<1:02:22,  7.54s/it]
                                                       
{'loss': 1.1551, 'learning_rate': 0.0, 'epoch': 2.26}

 76%|███████▌  | 1527/2022 [3:18:47<1:03:18,  7.67s/it]
                                                       
{'loss': 1.2623, 'learning_rate': 0.0, 'epoch': 2.26}

 76%|███████▌  | 1528/2022 [3:18:55<1:03:04,  7.66s/it]
                                                       
{'loss': 1.1342, 'learning_rate': 0.0, 'epoch': 2.27}

 76%|███████▌  | 1529/2022 [3:19:02<1:03:05,  7.68s/it]
                                                       
{'loss': 1.198, 'learning_rate': 0.0, 'epoch': 2.27}

 76%|███████▌  | 1530/2022 [3:19:10<1:02:20,  7.60s/it]
                                                       
{'loss': 1.1285, 'learning_rate': 0.0, 'epoch': 2.27}

 76%|███████▌  | 1531/2022 [3:19:18<1:02:58,  7.69s/it]
                                                       
{'loss': 1.1116, 'learning_rate': 0.0, 'epoch': 2.27}

 76%|███████▌  | 1532/2022 [3:19:25<1:02:36,  7.67s/it]
                                                       
{'loss': 1.1812, 'learning_rate': 0.0, 'epoch': 2.27}

 76%|███████▌  | 1533/2022 [3:19:33<1:03:33,  7.80s/it]
                                                       
{'loss': 1.1821, 'learning_rate': 0.0, 'epoch': 2.27}

 76%|███████▌  | 1534/2022 [3:19:41<1:03:18,  7.78s/it]
                                                       
{'loss': 1.0968, 'learning_rate': 0.0, 'epoch': 2.27}

 76%|███████▌  | 1535/2022 [3:19:48<1:02:25,  7.69s/it]
                                                       
{'loss': 1.2123, 'learning_rate': 0.0, 'epoch': 2.28}

 76%|███████▌  | 1536/2022 [3:19:56<1:02:40,  7.74s/it]
                                                       
{'loss': 1.1387, 'learning_rate': 0.0, 'epoch': 2.28}

 76%|███████▌  | 1537/2022 [3:20:04<1:03:06,  7.81s/it]
                                                       
{'loss': 1.2838, 'learning_rate': 0.0, 'epoch': 2.28}

 76%|███████▌  | 1538/2022 [3:20:12<1:03:39,  7.89s/it]
                                                       
{'loss': 1.2126, 'learning_rate': 0.0, 'epoch': 2.28}

 76%|███████▌  | 1539/2022 [3:20:20<1:03:21,  7.87s/it]
                                                       
{'loss': 1.1973, 'learning_rate': 0.0, 'epoch': 2.28}

 76%|███████▌  | 1540/2022 [3:20:28<1:03:32,  7.91s/it]
                                                       
{'loss': 1.0993, 'learning_rate': 0.0, 'epoch': 2.28}

 76%|███████▌  | 1541/2022 [3:20:36<1:03:41,  7.94s/it]
                                                       
{'loss': 1.12, 'learning_rate': 0.0, 'epoch': 2.28}

 76%|███████▋  | 1542/2022 [3:20:44<1:03:39,  7.96s/it]
                                                       
{'loss': 1.2134, 'learning_rate': 0.0, 'epoch': 2.29}

 76%|███████▋  | 1543/2022 [3:20:52<1:03:54,  8.00s/it]
                                                       
{'loss': 1.0598, 'learning_rate': 0.0, 'epoch': 2.29}

 76%|███████▋  | 1544/2022 [3:21:00<1:03:32,  7.98s/it]
                                                       
{'loss': 1.2693, 'learning_rate': 0.0, 'epoch': 2.29}

 76%|███████▋  | 1545/2022 [3:21:08<1:03:09,  7.94s/it]
                                                       
{'loss': 0.9505, 'learning_rate': 0.0, 'epoch': 2.29}

 76%|███████▋  | 1546/2022 [3:21:16<1:02:56,  7.93s/it]
                                                       
{'loss': 1.1467, 'learning_rate': 0.0, 'epoch': 2.29}

 77%|███████▋  | 1547/2022 [3:21:24<1:02:25,  7.89s/it]
                                                       
{'loss': 1.0793, 'learning_rate': 0.0, 'epoch': 2.29}

 77%|███████▋  | 1548/2022 [3:21:32<1:03:03,  7.98s/it]
                                                       
{'loss': 1.1579, 'learning_rate': 0.0, 'epoch': 2.3}

 77%|███████▋  | 1549/2022 [3:21:40<1:03:08,  8.01s/it]
                                                       
{'loss': 1.0878, 'learning_rate': 0.0, 'epoch': 2.3}

 77%|███████▋  | 1550/2022 [3:21:48<1:02:54,  8.00s/it]
                                                       
{'loss': 1.1308, 'learning_rate': 0.0, 'epoch': 2.3}

 77%|███████▋  | 1551/2022 [3:21:56<1:02:42,  7.99s/it]
                                                       
{'loss': 1.2265, 'learning_rate': 0.0, 'epoch': 2.3}

 77%|███████▋  | 1552/2022 [3:22:04<1:02:38,  8.00s/it]
                                                       
{'loss': 1.2202, 'learning_rate': 0.0, 'epoch': 2.3}

 77%|███████▋  | 1553/2022 [3:22:12<1:03:05,  8.07s/it]
                                                       
{'loss': 1.1923, 'learning_rate': 0.0, 'epoch': 2.3}

 77%|███████▋  | 1554/2022 [3:22:20<1:02:03,  7.96s/it]
                                                       
{'loss': 1.1987, 'learning_rate': 0.0, 'epoch': 2.3}

 77%|███████▋  | 1555/2022 [3:22:28<1:02:02,  7.97s/it]
                                                       
{'loss': 1.1079, 'learning_rate': 0.0, 'epoch': 2.31}

 77%|███████▋  | 1556/2022 [3:22:36<1:01:45,  7.95s/it]
                                                       
{'loss': 1.2427, 'learning_rate': 0.0, 'epoch': 2.31}

 77%|███████▋  | 1557/2022 [3:22:44<1:01:17,  7.91s/it]
                                                       
{'loss': 1.2198, 'learning_rate': 0.0, 'epoch': 2.31}

 77%|███████▋  | 1558/2022 [3:22:52<1:01:33,  7.96s/it]
                                                       
{'loss': 1.1469, 'learning_rate': 0.0, 'epoch': 2.31}

 77%|███████▋  | 1559/2022 [3:22:59<1:00:31,  7.84s/it]
                                                       
{'loss': 1.1406, 'learning_rate': 0.0, 'epoch': 2.31}

 77%|███████▋  | 1560/2022 [3:23:07<1:00:47,  7.90s/it]
                                                       
{'loss': 1.1658, 'learning_rate': 0.0, 'epoch': 2.31}

 77%|███████▋  | 1561/2022 [3:23:16<1:01:17,  7.98s/it]
                                                       
{'loss': 1.196, 'learning_rate': 0.0, 'epoch': 2.31}

 77%|███████▋  | 1562/2022 [3:23:23<1:00:33,  7.90s/it]
                                                       
{'loss': 1.3795, 'learning_rate': 0.0, 'epoch': 2.32}

 77%|███████▋  | 1563/2022 [3:23:31<1:01:00,  7.97s/it]
                                                       
{'loss': 1.1438, 'learning_rate': 0.0, 'epoch': 2.32}

 77%|███████▋  | 1564/2022 [3:23:39<59:49,  7.84s/it]  
                                                     
{'loss': 1.2436, 'learning_rate': 0.0, 'epoch': 2.32}

 77%|███████▋  | 1565/2022 [3:23:47<59:55,  7.87s/it]
                                                     
{'loss': 1.131, 'learning_rate': 0.0, 'epoch': 2.32}

 77%|███████▋  | 1566/2022 [3:23:55<59:50,  7.87s/it]
                                                     
{'loss': 1.2469, 'learning_rate': 0.0, 'epoch': 2.32}

 77%|███████▋  | 1567/2022 [3:24:03<1:00:33,  7.99s/it]
                                                       
{'loss': 1.1038, 'learning_rate': 0.0, 'epoch': 2.32}

 78%|███████▊  | 1568/2022 [3:24:11<59:51,  7.91s/it]  
                                                     
{'loss': 1.1788, 'learning_rate': 0.0, 'epoch': 2.32}

 78%|███████▊  | 1569/2022 [3:24:19<59:25,  7.87s/it]
                                                     
{'loss': 1.1702, 'learning_rate': 0.0, 'epoch': 2.33}

 78%|███████▊  | 1570/2022 [3:24:26<59:24,  7.89s/it]
                                                     
{'loss': 1.3207, 'learning_rate': 0.0, 'epoch': 2.33}

 78%|███████▊  | 1571/2022 [3:24:34<59:21,  7.90s/it]
                                                     
{'loss': 1.1797, 'learning_rate': 0.0, 'epoch': 2.33}

 78%|███████▊  | 1572/2022 [3:24:42<58:46,  7.84s/it]
                                                     
{'loss': 1.1269, 'learning_rate': 0.0, 'epoch': 2.33}

 78%|███████▊  | 1573/2022 [3:24:50<58:02,  7.76s/it]
                                                     
{'loss': 1.1044, 'learning_rate': 0.0, 'epoch': 2.33}

 78%|███████▊  | 1574/2022 [3:24:57<58:07,  7.78s/it]
                                                     
{'loss': 1.2302, 'learning_rate': 0.0, 'epoch': 2.33}

 78%|███████▊  | 1575/2022 [3:25:06<58:36,  7.87s/it]
                                                     
{'loss': 1.15, 'learning_rate': 0.0, 'epoch': 2.34}

 78%|███████▊  | 1576/2022 [3:25:13<58:10,  7.83s/it]
                                                     
{'loss': 1.2272, 'learning_rate': 0.0, 'epoch': 2.34}

 78%|███████▊  | 1577/2022 [3:25:21<58:48,  7.93s/it]
                                                     
{'loss': 1.2707, 'learning_rate': 0.0, 'epoch': 2.34}

 78%|███████▊  | 1578/2022 [3:25:29<57:37,  7.79s/it]
                                                     
{'loss': 1.1256, 'learning_rate': 0.0, 'epoch': 2.34}

 78%|███████▊  | 1579/2022 [3:25:37<57:18,  7.76s/it]
                                                     
{'loss': 1.1407, 'learning_rate': 0.0, 'epoch': 2.34}

 78%|███████▊  | 1580/2022 [3:25:44<57:17,  7.78s/it]
                                                     
{'loss': 1.1866, 'learning_rate': 0.0, 'epoch': 2.34}

 78%|███████▊  | 1581/2022 [3:25:52<57:13,  7.78s/it]
                                                     
{'loss': 1.1809, 'learning_rate': 0.0, 'epoch': 2.34}

 78%|███████▊  | 1582/2022 [3:26:00<56:29,  7.70s/it]
                                                     
{'loss': 1.1085, 'learning_rate': 0.0, 'epoch': 2.35}

 78%|███████▊  | 1583/2022 [3:26:08<57:27,  7.85s/it]
                                                     
{'loss': 1.2327, 'learning_rate': 0.0, 'epoch': 2.35}

 78%|███████▊  | 1584/2022 [3:26:15<56:37,  7.76s/it]
                                                     
{'loss': 1.313, 'learning_rate': 0.0, 'epoch': 2.35}

 78%|███████▊  | 1585/2022 [3:26:23<56:37,  7.77s/it]
                                                     
{'loss': 1.0467, 'learning_rate': 0.0, 'epoch': 2.35}

 78%|███████▊  | 1586/2022 [3:26:31<57:13,  7.88s/it]
                                                     
{'loss': 1.0779, 'learning_rate': 0.0, 'epoch': 2.35}

 78%|███████▊  | 1587/2022 [3:26:39<56:26,  7.78s/it]
                                                     
{'loss': 1.0071, 'learning_rate': 0.0, 'epoch': 2.35}

 79%|███████▊  | 1588/2022 [3:26:47<56:24,  7.80s/it]
                                                     
{'loss': 1.284, 'learning_rate': 0.0, 'epoch': 2.35}

 79%|███████▊  | 1589/2022 [3:26:54<55:57,  7.75s/it]
                                                     
{'loss': 1.221, 'learning_rate': 0.0, 'epoch': 2.36}

 79%|███████▊  | 1590/2022 [3:27:02<55:28,  7.70s/it]
                                                     
{'loss': 0.9847, 'learning_rate': 0.0, 'epoch': 2.36}

 79%|███████▊  | 1591/2022 [3:27:09<54:38,  7.61s/it]
                                                     
{'loss': 1.1289, 'learning_rate': 0.0, 'epoch': 2.36}

 79%|███████▊  | 1592/2022 [3:27:17<54:44,  7.64s/it]
                                                     
{'loss': 1.1532, 'learning_rate': 0.0, 'epoch': 2.36}

 79%|███████▉  | 1593/2022 [3:27:25<55:26,  7.75s/it]
                                                     
{'loss': 1.1633, 'learning_rate': 0.0, 'epoch': 2.36}

 79%|███████▉  | 1594/2022 [3:27:33<55:32,  7.79s/it]
                                                     
{'loss': 1.3217, 'learning_rate': 0.0, 'epoch': 2.36}

 79%|███████▉  | 1595/2022 [3:27:40<54:39,  7.68s/it]
                                                     
{'loss': 1.1582, 'learning_rate': 0.0, 'epoch': 2.36}

 79%|███████▉  | 1596/2022 [3:27:48<55:15,  7.78s/it]
                                                     
{'loss': 0.9679, 'learning_rate': 0.0, 'epoch': 2.37}

 79%|███████▉  | 1597/2022 [3:27:56<54:49,  7.74s/it]
                                                     
{'loss': 1.0584, 'learning_rate': 0.0, 'epoch': 2.37}

 79%|███████▉  | 1598/2022 [3:28:04<55:47,  7.89s/it]
                                                     
{'loss': 1.1946, 'learning_rate': 0.0, 'epoch': 2.37}

 79%|███████▉  | 1599/2022 [3:28:12<55:46,  7.91s/it]
                                                     
{'loss': 1.1611, 'learning_rate': 0.0, 'epoch': 2.37}

 79%|███████▉  | 1600/2022 [3:28:20<55:46,  7.93s/it]
                                                     
{'loss': 1.25, 'learning_rate': 0.0, 'epoch': 2.37}

 79%|███████▉  | 1601/2022 [3:28:28<55:23,  7.89s/it]
                                                     
{'loss': 1.0188, 'learning_rate': 0.0, 'epoch': 2.37}

 79%|███████▉  | 1602/2022 [3:28:36<54:30,  7.79s/it]
                                                     
{'loss': 1.1497, 'learning_rate': 0.0, 'epoch': 2.38}

 79%|███████▉  | 1603/2022 [3:28:43<54:08,  7.75s/it]
                                                     
{'loss': 1.1232, 'learning_rate': 0.0, 'epoch': 2.38}

 79%|███████▉  | 1604/2022 [3:28:52<55:01,  7.90s/it]
                                                     
{'loss': 1.0269, 'learning_rate': 0.0, 'epoch': 2.38}

 79%|███████▉  | 1605/2022 [3:28:59<54:40,  7.87s/it]
                                                     
{'loss': 1.2684, 'learning_rate': 0.0, 'epoch': 2.38}

 79%|███████▉  | 1606/2022 [3:29:07<54:23,  7.84s/it]
                                                     
{'loss': 1.105, 'learning_rate': 0.0, 'epoch': 2.38}

 79%|███████▉  | 1607/2022 [3:29:15<53:59,  7.80s/it]
                                                     
{'loss': 1.1733, 'learning_rate': 0.0, 'epoch': 2.38}

 80%|███████▉  | 1608/2022 [3:29:22<53:23,  7.74s/it]
                                                     
{'loss': 1.2332, 'learning_rate': 0.0, 'epoch': 2.38}

 80%|███████▉  | 1609/2022 [3:29:30<53:19,  7.75s/it]
                                                     
{'loss': 1.2562, 'learning_rate': 0.0, 'epoch': 2.39}

 80%|███████▉  | 1610/2022 [3:29:38<52:25,  7.63s/it]
                                                     
{'loss': 1.0868, 'learning_rate': 0.0, 'epoch': 2.39}

 80%|███████▉  | 1611/2022 [3:29:45<52:42,  7.69s/it]
                                                     
{'loss': 1.1951, 'learning_rate': 0.0, 'epoch': 2.39}

 80%|███████▉  | 1612/2022 [3:29:53<52:08,  7.63s/it]
                                                     
{'loss': 1.2337, 'learning_rate': 0.0, 'epoch': 2.39}

 80%|███████▉  | 1613/2022 [3:30:01<52:06,  7.64s/it]
                                                     
{'loss': 1.2569, 'learning_rate': 0.0, 'epoch': 2.39}

 80%|███████▉  | 1614/2022 [3:30:08<52:11,  7.67s/it]
                                                     
{'loss': 1.31, 'learning_rate': 0.0, 'epoch': 2.39}

 80%|███████▉  | 1615/2022 [3:30:16<51:54,  7.65s/it]
                                                     
{'loss': 1.173, 'learning_rate': 0.0, 'epoch': 2.39}

 80%|███████▉  | 1616/2022 [3:30:24<51:59,  7.68s/it]
                                                     
{'loss': 1.0296, 'learning_rate': 0.0, 'epoch': 2.4}

 80%|███████▉  | 1617/2022 [3:30:32<53:48,  7.97s/it]
                                                     
{'loss': 1.2565, 'learning_rate': 0.0, 'epoch': 2.4}

 80%|████████  | 1618/2022 [3:30:40<53:25,  7.93s/it]
                                                     
{'loss': 1.012, 'learning_rate': 0.0, 'epoch': 2.4}

 80%|████████  | 1619/2022 [3:30:48<52:26,  7.81s/it]
                                                     
{'loss': 1.2497, 'learning_rate': 0.0, 'epoch': 2.4}

 80%|████████  | 1620/2022 [3:30:56<52:48,  7.88s/it]
                                                     
{'loss': 1.1269, 'learning_rate': 0.0, 'epoch': 2.4}

 80%|████████  | 1621/2022 [3:31:04<52:48,  7.90s/it]
                                                     
{'loss': 1.0846, 'learning_rate': 0.0, 'epoch': 2.4}

 80%|████████  | 1622/2022 [3:31:11<52:28,  7.87s/it]
                                                     
{'loss': 1.3402, 'learning_rate': 0.0, 'epoch': 2.4}

 80%|████████  | 1623/2022 [3:31:19<52:21,  7.87s/it]
                                                     
{'loss': 1.218, 'learning_rate': 0.0, 'epoch': 2.41}

 80%|████████  | 1624/2022 [3:31:27<52:08,  7.86s/it]
                                                     
{'loss': 1.0929, 'learning_rate': 0.0, 'epoch': 2.41}

 80%|████████  | 1625/2022 [3:31:35<51:52,  7.84s/it]
                                                     
{'loss': 1.1677, 'learning_rate': 0.0, 'epoch': 2.41}

 80%|████████  | 1626/2022 [3:31:43<51:30,  7.80s/it]
                                                     
{'loss': 1.1262, 'learning_rate': 0.0, 'epoch': 2.41}

 80%|████████  | 1627/2022 [3:31:51<51:55,  7.89s/it]
                                                     
{'loss': 1.3239, 'learning_rate': 0.0, 'epoch': 2.41}

 81%|████████  | 1628/2022 [3:31:58<51:04,  7.78s/it]
                                                     
{'loss': 1.2037, 'learning_rate': 0.0, 'epoch': 2.41}

 81%|████████  | 1629/2022 [3:32:06<51:17,  7.83s/it]
                                                     
{'loss': 1.244, 'learning_rate': 0.0, 'epoch': 2.42}

 81%|████████  | 1630/2022 [3:32:14<50:28,  7.73s/it]
                                                     
{'loss': 1.2487, 'learning_rate': 0.0, 'epoch': 2.42}

 81%|████████  | 1631/2022 [3:32:22<51:07,  7.85s/it]
                                                     
{'loss': 1.1325, 'learning_rate': 0.0, 'epoch': 2.42}

 81%|████████  | 1632/2022 [3:32:29<50:17,  7.74s/it]
                                                     
{'loss': 1.2474, 'learning_rate': 0.0, 'epoch': 2.42}

 81%|████████  | 1633/2022 [3:32:37<50:22,  7.77s/it]
                                                     
{'loss': 1.228, 'learning_rate': 0.0, 'epoch': 2.42}

 81%|████████  | 1634/2022 [3:32:45<49:53,  7.72s/it]
                                                     
{'loss': 1.0296, 'learning_rate': 0.0, 'epoch': 2.42}

 81%|████████  | 1635/2022 [3:32:52<49:20,  7.65s/it]
                                                     
{'loss': 1.19, 'learning_rate': 0.0, 'epoch': 2.42}

 81%|████████  | 1636/2022 [3:33:00<49:17,  7.66s/it]
                                                     
{'loss': 1.2148, 'learning_rate': 0.0, 'epoch': 2.43}

 81%|████████  | 1637/2022 [3:33:08<50:01,  7.79s/it]
                                                     
{'loss': 1.0551, 'learning_rate': 0.0, 'epoch': 2.43}

 81%|████████  | 1638/2022 [3:33:16<49:46,  7.78s/it]
                                                     
{'loss': 1.0208, 'learning_rate': 0.0, 'epoch': 2.43}

 81%|████████  | 1639/2022 [3:33:23<49:31,  7.76s/it]
                                                     
{'loss': 1.0136, 'learning_rate': 0.0, 'epoch': 2.43}

 81%|████████  | 1640/2022 [3:33:31<49:04,  7.71s/it]
                                                     
{'loss': 1.0705, 'learning_rate': 0.0, 'epoch': 2.43}

 81%|████████  | 1641/2022 [3:33:39<48:54,  7.70s/it]
                                                     
{'loss': 1.0844, 'learning_rate': 0.0, 'epoch': 2.43}

 81%|████████  | 1642/2022 [3:33:46<48:35,  7.67s/it]
                                                     
{'loss': 1.2211, 'learning_rate': 0.0, 'epoch': 2.43}

 81%|████████  | 1642/2022 [3:33:46<48:35,  7.67s/it]
 81%|████████▏ | 1643/2022 [3:33:54<48:31,  7.68s/it]
                                                     
{'loss': 1.2082, 'learning_rate': 0.0, 'epoch': 2.44}

 81%|████████▏ | 1643/2022 [3:33:54<48:31,  7.68s/it]
 81%|████████▏ | 1644/2022 [3:34:02<48:39,  7.72s/it]
                                                     
{'loss': 1.1775, 'learning_rate': 0.0, 'epoch': 2.44}

 81%|████████▏ | 1644/2022 [3:34:02<48:39,  7.72s/it]
 81%|████████▏ | 1645/2022 [3:34:10<48:57,  7.79s/it]
                                                     
{'loss': 1.1656, 'learning_rate': 0.0, 'epoch': 2.44}

 81%|████████▏ | 1645/2022 [3:34:10<48:57,  7.79s/it]
 81%|████████▏ | 1646/2022 [3:34:18<49:19,  7.87s/it]
                                                     
{'loss': 1.0833, 'learning_rate': 0.0, 'epoch': 2.44}

 81%|████████▏ | 1646/2022 [3:34:18<49:19,  7.87s/it]
 81%|████████▏ | 1647/2022 [3:34:26<49:11,  7.87s/it]
                                                     
{'loss': 1.1443, 'learning_rate': 0.0, 'epoch': 2.44}

 81%|████████▏ | 1647/2022 [3:34:26<49:11,  7.87s/it]
 82%|████████▏ | 1648/2022 [3:34:34<49:26,  7.93s/it]
                                                     
{'loss': 1.1597, 'learning_rate': 0.0, 'epoch': 2.44}

 82%|████████▏ | 1648/2022 [3:34:34<49:26,  7.93s/it]
 82%|████████▏ | 1649/2022 [3:34:42<49:01,  7.89s/it]
                                                     
{'loss': 1.0834, 'learning_rate': 0.0, 'epoch': 2.44}

 82%|████████▏ | 1649/2022 [3:34:42<49:01,  7.89s/it]
 82%|████████▏ | 1650/2022 [3:34:49<48:49,  7.88s/it]
                                                     
{'loss': 1.1512, 'learning_rate': 0.0, 'epoch': 2.45}

 82%|████████▏ | 1650/2022 [3:34:49<48:49,  7.88s/it]
 82%|████████▏ | 1651/2022 [3:34:57<48:56,  7.92s/it]
                                                     
{'loss': 1.1369, 'learning_rate': 0.0, 'epoch': 2.45}

 82%|████████▏ | 1651/2022 [3:34:57<48:56,  7.92s/it]
 82%|████████▏ | 1652/2022 [3:35:06<49:09,  7.97s/it]
                                                     
{'loss': 1.2116, 'learning_rate': 0.0, 'epoch': 2.45}

 82%|████████▏ | 1652/2022 [3:35:06<49:09,  7.97s/it]
 82%|████████▏ | 1653/2022 [3:35:13<48:14,  7.84s/it]
                                                     
{'loss': 1.18, 'learning_rate': 0.0, 'epoch': 2.45}

 82%|████████▏ | 1653/2022 [3:35:13<48:14,  7.84s/it]
 82%|████████▏ | 1654/2022 [3:35:21<48:12,  7.86s/it]
                                                     
{'loss': 1.088, 'learning_rate': 0.0, 'epoch': 2.45}

 82%|████████▏ | 1654/2022 [3:35:21<48:12,  7.86s/it]
 82%|████████▏ | 1655/2022 [3:35:29<48:21,  7.91s/it]
                                                     
{'loss': 1.1169, 'learning_rate': 0.0, 'epoch': 2.45}

 82%|████████▏ | 1655/2022 [3:35:29<48:21,  7.91s/it]
 82%|████████▏ | 1656/2022 [3:35:37<47:49,  7.84s/it]
                                                     
{'loss': 1.0311, 'learning_rate': 0.0, 'epoch': 2.46}

 82%|████████▏ | 1656/2022 [3:35:37<47:49,  7.84s/it]
 82%|████████▏ | 1657/2022 [3:35:44<46:51,  7.70s/it]
                                                     
{'loss': 1.1987, 'learning_rate': 0.0, 'epoch': 2.46}

 82%|████████▏ | 1657/2022 [3:35:44<46:51,  7.70s/it]
 82%|████████▏ | 1658/2022 [3:35:52<47:06,  7.77s/it]
                                                     
{'loss': 1.1079, 'learning_rate': 0.0, 'epoch': 2.46}

 82%|████████▏ | 1658/2022 [3:35:52<47:06,  7.77s/it]
 82%|████████▏ | 1659/2022 [3:36:00<47:46,  7.90s/it]
                                                     
{'loss': 1.2284, 'learning_rate': 0.0, 'epoch': 2.46}

 82%|████████▏ | 1659/2022 [3:36:00<47:46,  7.90s/it]
 82%|████████▏ | 1660/2022 [3:36:08<47:06,  7.81s/it]
                                                     
{'loss': 1.2993, 'learning_rate': 0.0, 'epoch': 2.46}

 82%|████████▏ | 1660/2022 [3:36:08<47:06,  7.81s/it]
 82%|████████▏ | 1661/2022 [3:36:16<47:26,  7.89s/it]
                                                     
{'loss': 1.1823, 'learning_rate': 0.0, 'epoch': 2.46}

 82%|████████▏ | 1661/2022 [3:36:16<47:26,  7.89s/it]
 82%|████████▏ | 1662/2022 [3:36:23<46:47,  7.80s/it]
                                                     
{'loss': 1.1819, 'learning_rate': 0.0, 'epoch': 2.46}

 82%|████████▏ | 1662/2022 [3:36:24<46:47,  7.80s/it]
 82%|████████▏ | 1663/2022 [3:36:32<47:07,  7.88s/it]
                                                     
{'loss': 1.126, 'learning_rate': 0.0, 'epoch': 2.47}

 82%|████████▏ | 1663/2022 [3:36:32<47:07,  7.88s/it]
 82%|████████▏ | 1664/2022 [3:36:39<46:45,  7.84s/it]
                                                     
{'loss': 1.1852, 'learning_rate': 0.0, 'epoch': 2.47}

 82%|████████▏ | 1664/2022 [3:36:39<46:45,  7.84s/it]
 82%|████████▏ | 1665/2022 [3:36:47<46:39,  7.84s/it]
                                                     
{'loss': 1.0967, 'learning_rate': 0.0, 'epoch': 2.47}

 82%|████████▏ | 1665/2022 [3:36:47<46:39,  7.84s/it]
 82%|████████▏ | 1666/2022 [3:36:55<46:07,  7.77s/it]
                                                     
{'loss': 1.2157, 'learning_rate': 0.0, 'epoch': 2.47}

 82%|████████▏ | 1666/2022 [3:36:55<46:07,  7.77s/it]
 82%|████████▏ | 1667/2022 [3:37:02<45:17,  7.66s/it]
                                                     
{'loss': 1.2274, 'learning_rate': 0.0, 'epoch': 2.47}

 82%|████████▏ | 1667/2022 [3:37:02<45:17,  7.66s/it]
 82%|████████▏ | 1668/2022 [3:37:10<45:11,  7.66s/it]
                                                     
{'loss': 1.1327, 'learning_rate': 0.0, 'epoch': 2.47}

 82%|████████▏ | 1668/2022 [3:37:10<45:11,  7.66s/it]
 83%|████████▎ | 1669/2022 [3:37:18<45:50,  7.79s/it]
                                                     
{'loss': 1.1969, 'learning_rate': 0.0, 'epoch': 2.47}

 83%|████████▎ | 1669/2022 [3:37:18<45:50,  7.79s/it]
 83%|████████▎ | 1670/2022 [3:37:26<45:40,  7.79s/it]
                                                     
{'loss': 1.1397, 'learning_rate': 0.0, 'epoch': 2.48}

 83%|████████▎ | 1670/2022 [3:37:26<45:40,  7.79s/it]
 83%|████████▎ | 1671/2022 [3:37:34<45:44,  7.82s/it]
                                                     
{'loss': 1.1031, 'learning_rate': 0.0, 'epoch': 2.48}

 83%|████████▎ | 1671/2022 [3:37:34<45:44,  7.82s/it]
 83%|████████▎ | 1672/2022 [3:37:41<45:43,  7.84s/it]
                                                     
{'loss': 1.0207, 'learning_rate': 0.0, 'epoch': 2.48}

 83%|████████▎ | 1672/2022 [3:37:41<45:43,  7.84s/it]
 83%|████████▎ | 1673/2022 [3:37:49<45:03,  7.75s/it]
                                                     
{'loss': 1.2202, 'learning_rate': 0.0, 'epoch': 2.48}

 83%|████████▎ | 1673/2022 [3:37:49<45:03,  7.75s/it]
 83%|████████▎ | 1674/2022 [3:37:57<44:35,  7.69s/it]
                                                     
{'loss': 1.1477, 'learning_rate': 0.0, 'epoch': 2.48}

 83%|████████▎ | 1674/2022 [3:37:57<44:35,  7.69s/it]
 83%|████████▎ | 1675/2022 [3:38:04<44:46,  7.74s/it]
                                                     
{'loss': 1.1426, 'learning_rate': 0.0, 'epoch': 2.48}

 83%|████████▎ | 1675/2022 [3:38:04<44:46,  7.74s/it]
 83%|████████▎ | 1676/2022 [3:38:12<44:46,  7.77s/it]
                                                     
{'loss': 1.1655, 'learning_rate': 0.0, 'epoch': 2.48}

 83%|████████▎ | 1676/2022 [3:38:12<44:46,  7.77s/it]
 83%|████████▎ | 1677/2022 [3:38:20<44:12,  7.69s/it]
                                                     
{'loss': 1.1334, 'learning_rate': 0.0, 'epoch': 2.49}

 83%|████████▎ | 1677/2022 [3:38:20<44:12,  7.69s/it]
 83%|████████▎ | 1678/2022 [3:38:28<44:18,  7.73s/it]
                                                     
{'loss': 1.2194, 'learning_rate': 0.0, 'epoch': 2.49}

 83%|████████▎ | 1678/2022 [3:38:28<44:18,  7.73s/it]
 83%|████████▎ | 1679/2022 [3:38:35<44:13,  7.74s/it]
                                                     
{'loss': 1.1473, 'learning_rate': 0.0, 'epoch': 2.49}

 83%|████████▎ | 1679/2022 [3:38:35<44:13,  7.74s/it]
 83%|████████▎ | 1680/2022 [3:38:43<44:32,  7.81s/it]
                                                     
{'loss': 1.1979, 'learning_rate': 0.0, 'epoch': 2.49}

 83%|████████▎ | 1680/2022 [3:38:43<44:32,  7.81s/it]
 83%|████████▎ | 1681/2022 [3:38:51<44:59,  7.92s/it]
                                                     
{'loss': 1.0682, 'learning_rate': 0.0, 'epoch': 2.49}

 83%|████████▎ | 1681/2022 [3:38:52<44:59,  7.92s/it]
 83%|████████▎ | 1682/2022 [3:39:00<45:17,  7.99s/it]
                                                     
{'loss': 1.2153, 'learning_rate': 0.0, 'epoch': 2.49}

 83%|████████▎ | 1682/2022 [3:39:00<45:17,  7.99s/it]
 83%|████████▎ | 1683/2022 [3:39:07<44:32,  7.88s/it]
                                                     
{'loss': 1.1046, 'learning_rate': 0.0, 'epoch': 2.5}

 83%|████████▎ | 1683/2022 [3:39:07<44:32,  7.88s/it]
 83%|████████▎ | 1684/2022 [3:39:15<44:34,  7.91s/it]
                                                     
{'loss': 1.204, 'learning_rate': 0.0, 'epoch': 2.5}

 83%|████████▎ | 1684/2022 [3:39:15<44:34,  7.91s/it]
 83%|████████▎ | 1685/2022 [3:39:23<43:57,  7.83s/it]
                                                     
{'loss': 1.2706, 'learning_rate': 0.0, 'epoch': 2.5}

 83%|████████▎ | 1685/2022 [3:39:23<43:57,  7.83s/it]
 83%|████████▎ | 1686/2022 [3:39:30<43:25,  7.75s/it]
                                                     
{'loss': 1.1416, 'learning_rate': 0.0, 'epoch': 2.5}

 83%|████████▎ | 1686/2022 [3:39:30<43:25,  7.75s/it]
 83%|████████▎ | 1687/2022 [3:39:38<43:30,  7.79s/it]
                                                     
{'loss': 1.1553, 'learning_rate': 0.0, 'epoch': 2.5}

 83%|████████▎ | 1687/2022 [3:39:38<43:30,  7.79s/it]
 83%|████████▎ | 1688/2022 [3:39:46<43:32,  7.82s/it]
                                                     
{'loss': 1.3024, 'learning_rate': 0.0, 'epoch': 2.5}

 83%|████████▎ | 1688/2022 [3:39:46<43:32,  7.82s/it]
 84%|████████▎ | 1689/2022 [3:39:54<42:49,  7.71s/it]
                                                     
{'loss': 1.1742, 'learning_rate': 0.0, 'epoch': 2.5}

 84%|████████▎ | 1689/2022 [3:39:54<42:49,  7.71s/it]
 84%|████████▎ | 1690/2022 [3:40:01<42:25,  7.67s/it]
                                                     
{'loss': 1.2441, 'learning_rate': 0.0, 'epoch': 2.51}

 84%|████████▎ | 1690/2022 [3:40:01<42:25,  7.67s/it]
 84%|████████▎ | 1691/2022 [3:40:09<42:42,  7.74s/it]
                                                     
{'loss': 1.1063, 'learning_rate': 0.0, 'epoch': 2.51}

 84%|████████▎ | 1691/2022 [3:40:09<42:42,  7.74s/it]
 84%|████████▎ | 1692/2022 [3:40:17<42:47,  7.78s/it]
                                                     
{'loss': 1.2363, 'learning_rate': 0.0, 'epoch': 2.51}

 84%|████████▎ | 1692/2022 [3:40:17<42:47,  7.78s/it]
 84%|████████▎ | 1693/2022 [3:40:25<43:18,  7.90s/it]
                                                     
{'loss': 1.1404, 'learning_rate': 0.0, 'epoch': 2.51}

 84%|████████▎ | 1693/2022 [3:40:25<43:18,  7.90s/it]
 84%|████████▍ | 1694/2022 [3:40:33<42:52,  7.84s/it]
                                                     
{'loss': 1.2232, 'learning_rate': 0.0, 'epoch': 2.51}

 84%|████████▍ | 1694/2022 [3:40:33<42:52,  7.84s/it]
 84%|████████▍ | 1695/2022 [3:40:41<42:25,  7.79s/it]
                                                     
{'loss': 1.1892, 'learning_rate': 0.0, 'epoch': 2.51}

 84%|████████▍ | 1695/2022 [3:40:41<42:25,  7.79s/it]
 84%|████████▍ | 1696/2022 [3:40:48<41:58,  7.73s/it]
                                                     
{'loss': 1.1058, 'learning_rate': 0.0, 'epoch': 2.51}

 84%|████████▍ | 1696/2022 [3:40:48<41:58,  7.73s/it]
 84%|████████▍ | 1697/2022 [3:40:56<41:47,  7.71s/it]
                                                     
{'loss': 1.1476, 'learning_rate': 0.0, 'epoch': 2.52}

 84%|████████▍ | 1697/2022 [3:40:56<41:47,  7.71s/it]
 84%|████████▍ | 1698/2022 [3:41:04<42:22,  7.85s/it]
                                                     
{'loss': 1.1691, 'learning_rate': 0.0, 'epoch': 2.52}

 84%|████████▍ | 1698/2022 [3:41:04<42:22,  7.85s/it]
 84%|████████▍ | 1699/2022 [3:41:12<42:07,  7.82s/it]
                                                     
{'loss': 1.1252, 'learning_rate': 0.0, 'epoch': 2.52}

 84%|████████▍ | 1699/2022 [3:41:12<42:07,  7.82s/it]
 84%|████████▍ | 1700/2022 [3:41:20<41:54,  7.81s/it]
                                                     
{'loss': 1.1047, 'learning_rate': 0.0, 'epoch': 2.52}

 84%|████████▍ | 1700/2022 [3:41:20<41:54,  7.81s/it]
 84%|████████▍ | 1701/2022 [3:41:27<41:37,  7.78s/it]
                                                     
{'loss': 1.0807, 'learning_rate': 0.0, 'epoch': 2.52}

 84%|████████▍ | 1701/2022 [3:41:27<41:37,  7.78s/it]
 84%|████████▍ | 1702/2022 [3:41:35<40:57,  7.68s/it]
                                                     
{'loss': 1.2368, 'learning_rate': 0.0, 'epoch': 2.52}

 84%|████████▍ | 1702/2022 [3:41:35<40:57,  7.68s/it]
 84%|████████▍ | 1703/2022 [3:41:42<40:50,  7.68s/it]
                                                     
{'loss': 1.1662, 'learning_rate': 0.0, 'epoch': 2.52}

 84%|████████▍ | 1703/2022 [3:41:42<40:50,  7.68s/it]
 84%|████████▍ | 1704/2022 [3:41:50<41:15,  7.78s/it]
                                                     
{'loss': 1.1803, 'learning_rate': 0.0, 'epoch': 2.53}

 84%|████████▍ | 1704/2022 [3:41:50<41:15,  7.78s/it]
 84%|████████▍ | 1705/2022 [3:41:58<40:45,  7.71s/it]
                                                     
{'loss': 1.195, 'learning_rate': 0.0, 'epoch': 2.53}

 84%|████████▍ | 1705/2022 [3:41:58<40:45,  7.71s/it]
 84%|████████▍ | 1706/2022 [3:42:06<40:46,  7.74s/it]
                                                     
{'loss': 1.2091, 'learning_rate': 0.0, 'epoch': 2.53}

 84%|████████▍ | 1706/2022 [3:42:06<40:46,  7.74s/it]
 84%|████████▍ | 1707/2022 [3:42:14<41:26,  7.89s/it]
                                                     
{'loss': 1.1793, 'learning_rate': 0.0, 'epoch': 2.53}

 84%|████████▍ | 1707/2022 [3:42:14<41:26,  7.89s/it]
 84%|████████▍ | 1708/2022 [3:42:21<40:35,  7.76s/it]
                                                     
{'loss': 1.1341, 'learning_rate': 0.0, 'epoch': 2.53}

 84%|████████▍ | 1708/2022 [3:42:21<40:35,  7.76s/it]
 85%|████████▍ | 1709/2022 [3:42:29<40:39,  7.79s/it]
                                                     
{'loss': 1.0937, 'learning_rate': 0.0, 'epoch': 2.53}

 85%|████████▍ | 1709/2022 [3:42:29<40:39,  7.79s/it]
 85%|████████▍ | 1710/2022 [3:42:37<40:34,  7.80s/it]
                                                     
{'loss': 1.2955, 'learning_rate': 0.0, 'epoch': 2.54}

 85%|████████▍ | 1710/2022 [3:42:37<40:34,  7.80s/it]
 85%|████████▍ | 1711/2022 [3:42:45<40:11,  7.75s/it]
                                                     
{'loss': 1.1226, 'learning_rate': 0.0, 'epoch': 2.54}

 85%|████████▍ | 1711/2022 [3:42:45<40:11,  7.75s/it]
 85%|████████▍ | 1712/2022 [3:42:52<39:43,  7.69s/it]
                                                     
{'loss': 1.1037, 'learning_rate': 0.0, 'epoch': 2.54}

 85%|████████▍ | 1712/2022 [3:42:52<39:43,  7.69s/it]
 85%|████████▍ | 1713/2022 [3:43:00<39:42,  7.71s/it]
                                                     
{'loss': 1.1671, 'learning_rate': 0.0, 'epoch': 2.54}

 85%|████████▍ | 1713/2022 [3:43:00<39:42,  7.71s/it]
 85%|████████▍ | 1714/2022 [3:43:08<39:51,  7.76s/it]
                                                     
{'loss': 1.2561, 'learning_rate': 0.0, 'epoch': 2.54}

 85%|████████▍ | 1714/2022 [3:43:08<39:51,  7.76s/it]
 85%|████████▍ | 1715/2022 [3:43:16<39:45,  7.77s/it]
                                                     
{'loss': 1.197, 'learning_rate': 0.0, 'epoch': 2.54}

 85%|████████▍ | 1715/2022 [3:43:16<39:45,  7.77s/it]
 85%|████████▍ | 1716/2022 [3:43:24<39:44,  7.79s/it]
                                                     
{'loss': 1.172, 'learning_rate': 0.0, 'epoch': 2.54}

 85%|████████▍ | 1716/2022 [3:43:24<39:44,  7.79s/it]
 85%|████████▍ | 1717/2022 [3:43:32<40:31,  7.97s/it]
                                                     
{'loss': 1.0458, 'learning_rate': 0.0, 'epoch': 2.55}

 85%|████████▍ | 1717/2022 [3:43:32<40:31,  7.97s/it]
 85%|████████▍ | 1718/2022 [3:43:40<40:59,  8.09s/it]
                                                     
{'loss': 1.2102, 'learning_rate': 0.0, 'epoch': 2.55}

 85%|████████▍ | 1718/2022 [3:43:40<40:59,  8.09s/it]
 85%|████████▌ | 1719/2022 [3:43:48<40:51,  8.09s/it]
                                                     
{'loss': 1.1481, 'learning_rate': 0.0, 'epoch': 2.55}

 85%|████████▌ | 1719/2022 [3:43:48<40:51,  8.09s/it]
 85%|████████▌ | 1720/2022 [3:43:57<40:42,  8.09s/it]
                                                     
{'loss': 1.1061, 'learning_rate': 0.0, 'epoch': 2.55}

 85%|████████▌ | 1720/2022 [3:43:57<40:42,  8.09s/it]
 85%|████████▌ | 1721/2022 [3:44:04<39:51,  7.94s/it]
                                                     
{'loss': 1.2738, 'learning_rate': 0.0, 'epoch': 2.55}

 85%|████████▌ | 1721/2022 [3:44:04<39:51,  7.94s/it]
 85%|████████▌ | 1722/2022 [3:44:12<39:30,  7.90s/it]
                                                     
{'loss': 1.125, 'learning_rate': 0.0, 'epoch': 2.55}

 85%|████████▌ | 1722/2022 [3:44:12<39:30,  7.90s/it]
 85%|████████▌ | 1723/2022 [3:44:20<38:58,  7.82s/it]
                                                     
{'loss': 1.2222, 'learning_rate': 0.0, 'epoch': 2.55}

 85%|████████▌ | 1723/2022 [3:44:20<38:58,  7.82s/it]
 85%|████████▌ | 1724/2022 [3:44:27<38:41,  7.79s/it]
                                                     
{'loss': 1.1385, 'learning_rate': 0.0, 'epoch': 2.56}

 85%|████████▌ | 1724/2022 [3:44:27<38:41,  7.79s/it]
 85%|████████▌ | 1725/2022 [3:44:35<38:39,  7.81s/it]
                                                     
{'loss': 1.2024, 'learning_rate': 0.0, 'epoch': 2.56}

 85%|████████▌ | 1725/2022 [3:44:35<38:39,  7.81s/it]
 85%|████████▌ | 1726/2022 [3:44:43<38:09,  7.73s/it]
                                                     
{'loss': 1.0866, 'learning_rate': 0.0, 'epoch': 2.56}

 85%|████████▌ | 1726/2022 [3:44:43<38:09,  7.73s/it]
 85%|████████▌ | 1727/2022 [3:44:50<37:54,  7.71s/it]
                                                     
{'loss': 1.3453, 'learning_rate': 0.0, 'epoch': 2.56}

 85%|████████▌ | 1727/2022 [3:44:50<37:54,  7.71s/it]
 85%|████████▌ | 1728/2022 [3:44:59<38:22,  7.83s/it]
                                                     
{'loss': 1.1512, 'learning_rate': 0.0, 'epoch': 2.56}

 85%|████████▌ | 1728/2022 [3:44:59<38:22,  7.83s/it]
 86%|████████▌ | 1729/2022 [3:45:06<37:46,  7.73s/it]
                                                     
{'loss': 1.1121, 'learning_rate': 0.0, 'epoch': 2.56}

 86%|████████▌ | 1729/2022 [3:45:06<37:46,  7.73s/it]
 86%|████████▌ | 1730/2022 [3:45:14<37:39,  7.74s/it]
                                                     
{'loss': 1.0784, 'learning_rate': 0.0, 'epoch': 2.56}

 86%|████████▌ | 1730/2022 [3:45:14<37:39,  7.74s/it]
 86%|████████▌ | 1731/2022 [3:45:21<37:28,  7.73s/it]
                                                     
{'loss': 1.1768, 'learning_rate': 0.0, 'epoch': 2.57}

 86%|████████▌ | 1731/2022 [3:45:21<37:28,  7.73s/it]
 86%|████████▌ | 1732/2022 [3:45:30<37:57,  7.85s/it]
                                                     
{'loss': 1.0517, 'learning_rate': 0.0, 'epoch': 2.57}

 86%|████████▌ | 1732/2022 [3:45:30<37:57,  7.85s/it]
 86%|████████▌ | 1733/2022 [3:45:37<37:13,  7.73s/it]
                                                     
{'loss': 1.1962, 'learning_rate': 0.0, 'epoch': 2.57}

 86%|████████▌ | 1733/2022 [3:45:37<37:13,  7.73s/it]
 86%|████████▌ | 1734/2022 [3:45:45<37:08,  7.74s/it]
                                                     
{'loss': 1.1512, 'learning_rate': 0.0, 'epoch': 2.57}

 86%|████████▌ | 1734/2022 [3:45:45<37:08,  7.74s/it]
 86%|████████▌ | 1735/2022 [3:45:53<37:31,  7.84s/it]
                                                     
{'loss': 1.1056, 'learning_rate': 0.0, 'epoch': 2.57}

 86%|████████▌ | 1735/2022 [3:45:53<37:31,  7.84s/it]
 86%|████████▌ | 1736/2022 [3:46:01<37:04,  7.78s/it]
                                                     
{'loss': 1.1487, 'learning_rate': 0.0, 'epoch': 2.57}

 86%|████████▌ | 1736/2022 [3:46:01<37:04,  7.78s/it]
 86%|████████▌ | 1737/2022 [3:46:08<36:51,  7.76s/it]
                                                     
{'loss': 1.0794, 'learning_rate': 0.0, 'epoch': 2.58}

 86%|████████▌ | 1737/2022 [3:46:08<36:51,  7.76s/it]
 86%|████████▌ | 1738/2022 [3:46:16<36:51,  7.79s/it]
                                                     
{'loss': 1.1611, 'learning_rate': 0.0, 'epoch': 2.58}

 86%|████████▌ | 1738/2022 [3:46:16<36:51,  7.79s/it]
 86%|████████▌ | 1739/2022 [3:46:24<37:05,  7.86s/it]
                                                     
{'loss': 1.1188, 'learning_rate': 0.0, 'epoch': 2.58}

 86%|████████▌ | 1739/2022 [3:46:24<37:05,  7.86s/it]
 86%|████████▌ | 1740/2022 [3:46:32<36:46,  7.83s/it]
                                                     
{'loss': 1.0901, 'learning_rate': 0.0, 'epoch': 2.58}

 86%|████████▌ | 1740/2022 [3:46:32<36:46,  7.83s/it]
 86%|████████▌ | 1741/2022 [3:46:40<36:44,  7.84s/it]
                                                     
{'loss': 1.2207, 'learning_rate': 0.0, 'epoch': 2.58}

 86%|████████▌ | 1741/2022 [3:46:40<36:44,  7.84s/it]
 86%|████████▌ | 1742/2022 [3:46:47<36:07,  7.74s/it]
                                                     
{'loss': 1.2292, 'learning_rate': 0.0, 'epoch': 2.58}

 86%|████████▌ | 1742/2022 [3:46:47<36:07,  7.74s/it]
 86%|████████▌ | 1743/2022 [3:46:55<36:19,  7.81s/it]
                                                     
{'loss': 1.1555, 'learning_rate': 0.0, 'epoch': 2.58}

 86%|████████▌ | 1743/2022 [3:46:55<36:19,  7.81s/it]
 86%|████████▋ | 1744/2022 [3:47:03<35:42,  7.71s/it]
                                                     
{'loss': 1.1529, 'learning_rate': 0.0, 'epoch': 2.59}

 86%|████████▋ | 1744/2022 [3:47:03<35:42,  7.71s/it]
 86%|████████▋ | 1745/2022 [3:47:10<35:38,  7.72s/it]
                                                     
{'loss': 1.2111, 'learning_rate': 0.0, 'epoch': 2.59}

 86%|████████▋ | 1745/2022 [3:47:10<35:38,  7.72s/it]
 86%|████████▋ | 1746/2022 [3:47:18<35:47,  7.78s/it]
                                                     
{'loss': 1.1981, 'learning_rate': 0.0, 'epoch': 2.59}

 86%|████████▋ | 1746/2022 [3:47:18<35:47,  7.78s/it]
 86%|████████▋ | 1747/2022 [3:47:26<35:45,  7.80s/it]
                                                     
{'loss': 1.3508, 'learning_rate': 0.0, 'epoch': 2.59}

 86%|████████▋ | 1747/2022 [3:47:26<35:45,  7.80s/it]
 86%|████████▋ | 1748/2022 [3:47:34<35:41,  7.81s/it]
                                                     
{'loss': 1.2461, 'learning_rate': 0.0, 'epoch': 2.59}

 86%|████████▋ | 1748/2022 [3:47:34<35:41,  7.81s/it]
 86%|████████▋ | 1749/2022 [3:47:42<35:17,  7.76s/it]
                                                     
{'loss': 1.2427, 'learning_rate': 0.0, 'epoch': 2.59}

 86%|████████▋ | 1749/2022 [3:47:42<35:17,  7.76s/it]
 87%|████████▋ | 1750/2022 [3:47:50<35:15,  7.78s/it]
                                                     
{'loss': 1.1322, 'learning_rate': 0.0, 'epoch': 2.59}

 87%|████████▋ | 1750/2022 [3:47:50<35:15,  7.78s/it]
 87%|████████▋ | 1751/2022 [3:47:57<34:36,  7.66s/it]
                                                     
{'loss': 1.1301, 'learning_rate': 0.0, 'epoch': 2.6}

 87%|████████▋ | 1751/2022 [3:47:57<34:36,  7.66s/it]
 87%|████████▋ | 1752/2022 [3:48:05<34:46,  7.73s/it]
                                                     
{'loss': 1.1229, 'learning_rate': 0.0, 'epoch': 2.6}

 87%|████████▋ | 1752/2022 [3:48:05<34:46,  7.73s/it]
 87%|████████▋ | 1753/2022 [3:48:13<34:54,  7.79s/it]
                                                     
{'loss': 1.1237, 'learning_rate': 0.0, 'epoch': 2.6}

 87%|████████▋ | 1753/2022 [3:48:13<34:54,  7.79s/it]
 87%|████████▋ | 1754/2022 [3:48:20<34:30,  7.73s/it]
                                                     
{'loss': 1.1392, 'learning_rate': 0.0, 'epoch': 2.6}

 87%|████████▋ | 1754/2022 [3:48:20<34:30,  7.73s/it]
 87%|████████▋ | 1755/2022 [3:48:28<34:47,  7.82s/it]
                                                     
{'loss': 1.2049, 'learning_rate': 0.0, 'epoch': 2.6}

 87%|████████▋ | 1755/2022 [3:48:28<34:47,  7.82s/it]
 87%|████████▋ | 1756/2022 [3:48:36<35:03,  7.91s/it]
                                                     
{'loss': 1.1439, 'learning_rate': 0.0, 'epoch': 2.6}

 87%|████████▋ | 1756/2022 [3:48:36<35:03,  7.91s/it]
 87%|████████▋ | 1757/2022 [3:48:44<34:51,  7.89s/it]
                                                     
{'loss': 1.1307, 'learning_rate': 0.0, 'epoch': 2.6}

 87%|████████▋ | 1757/2022 [3:48:44<34:51,  7.89s/it]
 87%|████████▋ | 1758/2022 [3:48:52<34:19,  7.80s/it]
                                                     
{'loss': 1.2623, 'learning_rate': 0.0, 'epoch': 2.61}

 87%|████████▋ | 1758/2022 [3:48:52<34:19,  7.80s/it]
 87%|████████▋ | 1759/2022 [3:49:00<34:09,  7.79s/it]
                                                     
{'loss': 1.1012, 'learning_rate': 0.0, 'epoch': 2.61}

 87%|████████▋ | 1759/2022 [3:49:00<34:09,  7.79s/it]
 87%|████████▋ | 1760/2022 [3:49:08<34:05,  7.81s/it]
                                                     
{'loss': 1.1375, 'learning_rate': 0.0, 'epoch': 2.61}

 87%|████████▋ | 1760/2022 [3:49:08<34:05,  7.81s/it]
 87%|████████▋ | 1761/2022 [3:49:15<33:41,  7.74s/it]
                                                     
{'loss': 1.2262, 'learning_rate': 0.0, 'epoch': 2.61}

 87%|████████▋ | 1761/2022 [3:49:15<33:41,  7.74s/it]
 87%|████████▋ | 1762/2022 [3:49:23<33:38,  7.76s/it]
                                                     
{'loss': 1.0839, 'learning_rate': 0.0, 'epoch': 2.61}

 87%|████████▋ | 1762/2022 [3:49:23<33:38,  7.76s/it]
 87%|████████▋ | 1763/2022 [3:49:31<33:40,  7.80s/it]
                                                     
{'loss': 1.2271, 'learning_rate': 0.0, 'epoch': 2.61}

 87%|████████▋ | 1763/2022 [3:49:31<33:40,  7.80s/it]
 87%|████████▋ | 1764/2022 [3:49:39<33:48,  7.86s/it]
                                                     
{'loss': 1.2444, 'learning_rate': 0.0, 'epoch': 2.62}

 87%|████████▋ | 1764/2022 [3:49:39<33:48,  7.86s/it]
 87%|████████▋ | 1765/2022 [3:49:46<33:13,  7.76s/it]
                                                     
{'loss': 1.1239, 'learning_rate': 0.0, 'epoch': 2.62}

 87%|████████▋ | 1765/2022 [3:49:46<33:13,  7.76s/it]
 87%|████████▋ | 1766/2022 [3:49:54<32:58,  7.73s/it]
                                                     
{'loss': 1.1127, 'learning_rate': 0.0, 'epoch': 2.62}

 87%|████████▋ | 1766/2022 [3:49:54<32:58,  7.73s/it]
 87%|████████▋ | 1767/2022 [3:50:02<32:45,  7.71s/it]
                                                     
{'loss': 1.157, 'learning_rate': 0.0, 'epoch': 2.62}

 87%|████████▋ | 1767/2022 [3:50:02<32:45,  7.71s/it]
 87%|████████▋ | 1768/2022 [3:50:09<32:33,  7.69s/it]
                                                     
{'loss': 1.0633, 'learning_rate': 0.0, 'epoch': 2.62}

 87%|████████▋ | 1768/2022 [3:50:09<32:33,  7.69s/it]
 87%|████████▋ | 1769/2022 [3:50:18<33:16,  7.89s/it]
                                                     
{'loss': 1.1755, 'learning_rate': 0.0, 'epoch': 2.62}

 87%|████████▋ | 1769/2022 [3:50:18<33:16,  7.89s/it]
 88%|████████▊ | 1770/2022 [3:50:25<32:55,  7.84s/it]
                                                     
{'loss': 1.3146, 'learning_rate': 0.0, 'epoch': 2.62}

 88%|████████▊ | 1770/2022 [3:50:25<32:55,  7.84s/it]
 88%|████████▊ | 1771/2022 [3:50:33<32:46,  7.83s/it]
                                                     
{'loss': 1.2103, 'learning_rate': 0.0, 'epoch': 2.63}

 88%|████████▊ | 1771/2022 [3:50:33<32:46,  7.83s/it]
 88%|████████▊ | 1772/2022 [3:50:41<32:32,  7.81s/it]
                                                     
{'loss': 1.116, 'learning_rate': 0.0, 'epoch': 2.63}

 88%|████████▊ | 1772/2022 [3:50:41<32:32,  7.81s/it]
 88%|████████▊ | 1773/2022 [3:50:49<32:46,  7.90s/it]
                                                     
{'loss': 1.2052, 'learning_rate': 0.0, 'epoch': 2.63}

 88%|████████▊ | 1773/2022 [3:50:49<32:46,  7.90s/it]
 88%|████████▊ | 1774/2022 [3:50:57<32:38,  7.90s/it]
                                                     
{'loss': 1.0266, 'learning_rate': 0.0, 'epoch': 2.63}

 88%|████████▊ | 1774/2022 [3:50:57<32:38,  7.90s/it]
 88%|████████▊ | 1775/2022 [3:51:05<32:35,  7.92s/it]
                                                     
{'loss': 1.1677, 'learning_rate': 0.0, 'epoch': 2.63}

 88%|████████▊ | 1775/2022 [3:51:05<32:35,  7.92s/it]
 88%|████████▊ | 1776/2022 [3:51:12<31:53,  7.78s/it]
                                                     
{'loss': 1.3378, 'learning_rate': 0.0, 'epoch': 2.63}

 88%|████████▊ | 1776/2022 [3:51:12<31:53,  7.78s/it]
 88%|████████▊ | 1777/2022 [3:51:20<31:43,  7.77s/it]
                                                     
{'loss': 1.1731, 'learning_rate': 0.0, 'epoch': 2.63}

 88%|████████▊ | 1777/2022 [3:51:20<31:43,  7.77s/it]
 88%|████████▊ | 1778/2022 [3:51:28<31:32,  7.76s/it]
                                                     
{'loss': 1.2166, 'learning_rate': 0.0, 'epoch': 2.64}

 88%|████████▊ | 1778/2022 [3:51:28<31:32,  7.76s/it]
 88%|████████▊ | 1779/2022 [3:51:36<31:34,  7.79s/it]
                                                     
{'loss': 1.002, 'learning_rate': 0.0, 'epoch': 2.64}

 88%|████████▊ | 1779/2022 [3:51:36<31:34,  7.79s/it]
 88%|████████▊ | 1780/2022 [3:51:44<31:29,  7.81s/it]
                                                     
{'loss': 1.2389, 'learning_rate': 0.0, 'epoch': 2.64}

 88%|████████▊ | 1780/2022 [3:51:44<31:29,  7.81s/it]
 88%|████████▊ | 1781/2022 [3:51:52<32:00,  7.97s/it]
                                                     
{'loss': 1.2316, 'learning_rate': 0.0, 'epoch': 2.64}

 88%|████████▊ | 1781/2022 [3:51:52<32:00,  7.97s/it]
 88%|████████▊ | 1782/2022 [3:51:59<31:18,  7.83s/it]
                                                     
{'loss': 1.1948, 'learning_rate': 0.0, 'epoch': 2.64}

 88%|████████▊ | 1782/2022 [3:51:59<31:18,  7.83s/it]
 88%|████████▊ | 1783/2022 [3:52:07<31:02,  7.79s/it]
                                                     
{'loss': 1.1464, 'learning_rate': 0.0, 'epoch': 2.64}

 88%|████████▊ | 1783/2022 [3:52:07<31:02,  7.79s/it]
 88%|████████▊ | 1784/2022 [3:52:15<30:49,  7.77s/it]
                                                     
{'loss': 1.162, 'learning_rate': 0.0, 'epoch': 2.64}

 88%|████████▊ | 1784/2022 [3:52:15<30:49,  7.77s/it]
 88%|████████▊ | 1785/2022 [3:52:22<30:30,  7.72s/it]
                                                     
{'loss': 1.2896, 'learning_rate': 0.0, 'epoch': 2.65}

 88%|████████▊ | 1785/2022 [3:52:22<30:30,  7.72s/it]
 88%|████████▊ | 1786/2022 [3:52:30<30:10,  7.67s/it]
                                                     
{'loss': 1.2203, 'learning_rate': 0.0, 'epoch': 2.65}

 88%|████████▊ | 1786/2022 [3:52:30<30:10,  7.67s/it]
 88%|████████▊ | 1787/2022 [3:52:38<30:32,  7.80s/it]
                                                     
{'loss': 1.1543, 'learning_rate': 0.0, 'epoch': 2.65}

 88%|████████▊ | 1787/2022 [3:52:38<30:32,  7.80s/it]
 88%|████████▊ | 1788/2022 [3:52:46<30:37,  7.85s/it]
                                                     
{'loss': 1.162, 'learning_rate': 0.0, 'epoch': 2.65}

 88%|████████▊ | 1788/2022 [3:52:46<30:37,  7.85s/it]
 88%|████████▊ | 1789/2022 [3:52:54<30:18,  7.80s/it]
                                                     
{'loss': 1.2697, 'learning_rate': 0.0, 'epoch': 2.65}

 88%|████████▊ | 1789/2022 [3:52:54<30:18,  7.80s/it]
 89%|████████▊ | 1790/2022 [3:53:01<30:02,  7.77s/it]
                                                     
{'loss': 1.2002, 'learning_rate': 0.0, 'epoch': 2.65}

 89%|████████▊ | 1790/2022 [3:53:01<30:02,  7.77s/it]
 89%|████████▊ | 1791/2022 [3:53:09<30:11,  7.84s/it]
                                                     
{'loss': 1.0821, 'learning_rate': 0.0, 'epoch': 2.66}

 89%|████████▊ | 1791/2022 [3:53:09<30:11,  7.84s/it]
 89%|████████▊ | 1792/2022 [3:53:18<30:43,  8.01s/it]
                                                     
{'loss': 1.0845, 'learning_rate': 0.0, 'epoch': 2.66}

 89%|████████▊ | 1792/2022 [3:53:18<30:43,  8.01s/it]
 89%|████████▊ | 1793/2022 [3:53:25<30:01,  7.87s/it]
                                                     
{'loss': 1.2292, 'learning_rate': 0.0, 'epoch': 2.66}

 89%|████████▊ | 1793/2022 [3:53:25<30:01,  7.87s/it]
 89%|████████▊ | 1794/2022 [3:53:33<30:03,  7.91s/it]
                                                     
{'loss': 1.1894, 'learning_rate': 0.0, 'epoch': 2.66}

 89%|████████▊ | 1794/2022 [3:53:33<30:03,  7.91s/it]
 89%|████████▉ | 1795/2022 [3:53:41<29:58,  7.92s/it]
                                                     
{'loss': 1.2011, 'learning_rate': 0.0, 'epoch': 2.66}

 89%|████████▉ | 1795/2022 [3:53:41<29:58,  7.92s/it]
 89%|████████▉ | 1796/2022 [3:53:49<29:48,  7.91s/it]
                                                     
{'loss': 1.2033, 'learning_rate': 0.0, 'epoch': 2.66}

 89%|████████▉ | 1796/2022 [3:53:49<29:48,  7.91s/it]
 89%|████████▉ | 1797/2022 [3:53:57<29:40,  7.91s/it]
                                                     
{'loss': 1.0838, 'learning_rate': 0.0, 'epoch': 2.66}

 89%|████████▉ | 1797/2022 [3:53:57<29:40,  7.91s/it]
 89%|████████▉ | 1798/2022 [3:54:05<29:16,  7.84s/it]
                                                     
{'loss': 1.2209, 'learning_rate': 0.0, 'epoch': 2.67}

 89%|████████▉ | 1798/2022 [3:54:05<29:16,  7.84s/it]
 89%|████████▉ | 1799/2022 [3:54:13<28:59,  7.80s/it]
                                                     
{'loss': 1.2297, 'learning_rate': 0.0, 'epoch': 2.67}

 89%|████████▉ | 1799/2022 [3:54:13<28:59,  7.80s/it]
 89%|████████▉ | 1800/2022 [3:54:20<28:49,  7.79s/it]
                                                     
{'loss': 1.0754, 'learning_rate': 0.0, 'epoch': 2.67}

 89%|████████▉ | 1800/2022 [3:54:20<28:49,  7.79s/it]
 89%|████████▉ | 1801/2022 [3:54:28<28:42,  7.79s/it]
                                                     
{'loss': 1.0702, 'learning_rate': 0.0, 'epoch': 2.67}

 89%|████████▉ | 1801/2022 [3:54:28<28:42,  7.79s/it]
 89%|████████▉ | 1802/2022 [3:54:36<28:26,  7.75s/it]
                                                     
{'loss': 1.1323, 'learning_rate': 0.0, 'epoch': 2.67}

 89%|████████▉ | 1802/2022 [3:54:36<28:26,  7.75s/it]
 89%|████████▉ | 1803/2022 [3:54:43<28:03,  7.69s/it]
                                                     
{'loss': 1.2279, 'learning_rate': 0.0, 'epoch': 2.67}

 89%|████████▉ | 1803/2022 [3:54:43<28:03,  7.69s/it]
 89%|████████▉ | 1804/2022 [3:54:51<27:50,  7.66s/it]
                                                     
{'loss': 1.0181, 'learning_rate': 0.0, 'epoch': 2.67}

 89%|████████▉ | 1804/2022 [3:54:51<27:50,  7.66s/it]
 89%|████████▉ | 1805/2022 [3:54:58<27:35,  7.63s/it]
                                                     
{'loss': 1.1586, 'learning_rate': 0.0, 'epoch': 2.68}

 89%|████████▉ | 1805/2022 [3:54:59<27:35,  7.63s/it]
 89%|████████▉ | 1806/2022 [3:55:06<27:52,  7.74s/it]
                                                     
{'loss': 1.3154, 'learning_rate': 0.0, 'epoch': 2.68}

 89%|████████▉ | 1806/2022 [3:55:06<27:52,  7.74s/it]
 89%|████████▉ | 1807/2022 [3:55:14<27:39,  7.72s/it]
                                                     
{'loss': 1.247, 'learning_rate': 0.0, 'epoch': 2.68}

 89%|████████▉ | 1807/2022 [3:55:14<27:39,  7.72s/it]
 89%|████████▉ | 1808/2022 [3:55:22<27:53,  7.82s/it]
                                                     
{'loss': 1.1526, 'learning_rate': 0.0, 'epoch': 2.68}

 89%|████████▉ | 1808/2022 [3:55:22<27:53,  7.82s/it]
 89%|████████▉ | 1809/2022 [3:55:30<27:38,  7.79s/it]
                                                     
{'loss': 1.1488, 'learning_rate': 0.0, 'epoch': 2.68}

 89%|████████▉ | 1809/2022 [3:55:30<27:38,  7.79s/it]
 90%|████████▉ | 1810/2022 [3:55:38<27:32,  7.80s/it]
                                                     
{'loss': 1.3625, 'learning_rate': 0.0, 'epoch': 2.68}

 90%|████████▉ | 1810/2022 [3:55:38<27:32,  7.80s/it]
 90%|████████▉ | 1811/2022 [3:55:45<27:04,  7.70s/it]
                                                     
{'loss': 1.223, 'learning_rate': 0.0, 'epoch': 2.68}

 90%|████████▉ | 1811/2022 [3:55:45<27:04,  7.70s/it]
 90%|████████▉ | 1812/2022 [3:55:53<27:02,  7.73s/it]
                                                     
{'loss': 1.1833, 'learning_rate': 0.0, 'epoch': 2.69}

 90%|████████▉ | 1812/2022 [3:55:53<27:02,  7.73s/it]
 90%|████████▉ | 1813/2022 [3:56:01<26:43,  7.67s/it]
                                                     
{'loss': 1.1482, 'learning_rate': 0.0, 'epoch': 2.69}

 90%|████████▉ | 1813/2022 [3:56:01<26:43,  7.67s/it]
 90%|████████▉ | 1814/2022 [3:56:08<26:24,  7.62s/it]
                                                     
{'loss': 1.1255, 'learning_rate': 0.0, 'epoch': 2.69}

 90%|████████▉ | 1814/2022 [3:56:08<26:24,  7.62s/it]
 90%|████████▉ | 1815/2022 [3:56:16<26:29,  7.68s/it]
                                                     
{'loss': 1.1787, 'learning_rate': 0.0, 'epoch': 2.69}

 90%|████████▉ | 1815/2022 [3:56:16<26:29,  7.68s/it]
 90%|████████▉ | 1816/2022 [3:56:23<26:11,  7.63s/it]
                                                     
{'loss': 1.1519, 'learning_rate': 0.0, 'epoch': 2.69}

 90%|████████▉ | 1816/2022 [3:56:23<26:11,  7.63s/it]
 90%|████████▉ | 1817/2022 [3:56:31<26:15,  7.68s/it]
                                                     
{'loss': 1.2268, 'learning_rate': 0.0, 'epoch': 2.69}

 90%|████████▉ | 1817/2022 [3:56:31<26:15,  7.68s/it]
 90%|████████▉ | 1818/2022 [3:56:39<26:24,  7.77s/it]
                                                     
{'loss': 1.0722, 'learning_rate': 0.0, 'epoch': 2.7}

 90%|████████▉ | 1818/2022 [3:56:39<26:24,  7.77s/it]
 90%|████████▉ | 1819/2022 [3:56:48<27:14,  8.05s/it]
                                                     
{'loss': 1.2112, 'learning_rate': 0.0, 'epoch': 2.7}

 90%|████████▉ | 1819/2022 [3:56:48<27:14,  8.05s/it]
 90%|█████████ | 1820/2022 [3:56:56<26:52,  7.98s/it]
                                                     
{'loss': 1.1523, 'learning_rate': 0.0, 'epoch': 2.7}

 90%|█████████ | 1820/2022 [3:56:56<26:52,  7.98s/it]
 90%|█████████ | 1821/2022 [3:57:04<27:00,  8.06s/it]
                                                     
{'loss': 1.167, 'learning_rate': 0.0, 'epoch': 2.7}

 90%|█████████ | 1821/2022 [3:57:04<27:00,  8.06s/it]
 90%|█████████ | 1822/2022 [3:57:12<26:28,  7.94s/it]
                                                     
{'loss': 1.1997, 'learning_rate': 0.0, 'epoch': 2.7}

 90%|█████████ | 1822/2022 [3:57:12<26:28,  7.94s/it]
 90%|█████████ | 1823/2022 [3:57:19<26:12,  7.90s/it]
                                                     
{'loss': 1.183, 'learning_rate': 0.0, 'epoch': 2.7}

 90%|█████████ | 1823/2022 [3:57:19<26:12,  7.90s/it]
 90%|█████████ | 1824/2022 [3:57:27<25:41,  7.78s/it]
                                                     
{'loss': 1.0819, 'learning_rate': 0.0, 'epoch': 2.7}

 90%|█████████ | 1824/2022 [3:57:27<25:41,  7.78s/it]
 90%|█████████ | 1825/2022 [3:57:35<25:35,  7.79s/it]
                                                     
{'loss': 1.0775, 'learning_rate': 0.0, 'epoch': 2.71}

 90%|█████████ | 1825/2022 [3:57:35<25:35,  7.79s/it]
 90%|█████████ | 1826/2022 [3:57:42<25:12,  7.72s/it]
                                                     
{'loss': 1.2643, 'learning_rate': 0.0, 'epoch': 2.71}

 90%|█████████ | 1826/2022 [3:57:42<25:12,  7.72s/it]
 90%|█████████ | 1827/2022 [3:57:50<25:19,  7.79s/it]
                                                     
{'loss': 1.1542, 'learning_rate': 0.0, 'epoch': 2.71}

 90%|█████████ | 1827/2022 [3:57:50<25:19,  7.79s/it]
 90%|█████████ | 1828/2022 [3:57:58<25:07,  7.77s/it]
                                                     
{'loss': 1.1282, 'learning_rate': 0.0, 'epoch': 2.71}

 90%|█████████ | 1828/2022 [3:57:58<25:07,  7.77s/it]
 90%|█████████ | 1829/2022 [3:58:06<25:08,  7.82s/it]
                                                     
{'loss': 1.1181, 'learning_rate': 0.0, 'epoch': 2.71}

 90%|█████████ | 1829/2022 [3:58:06<25:08,  7.82s/it]
 91%|█████████ | 1830/2022 [3:58:14<25:22,  7.93s/it]
                                                     
{'loss': 1.2679, 'learning_rate': 0.0, 'epoch': 2.71}

 91%|█████████ | 1830/2022 [3:58:14<25:22,  7.93s/it]
 91%|█████████ | 1831/2022 [3:58:22<24:58,  7.85s/it]
                                                     
{'loss': 1.271, 'learning_rate': 0.0, 'epoch': 2.71}

 91%|█████████ | 1831/2022 [3:58:22<24:58,  7.85s/it]
 91%|█████████ | 1832/2022 [3:58:30<25:03,  7.91s/it]
                                                     
{'loss': 1.1603, 'learning_rate': 0.0, 'epoch': 2.72}

 91%|█████████ | 1832/2022 [3:58:30<25:03,  7.91s/it]
 91%|█████████ | 1833/2022 [3:58:37<24:27,  7.76s/it]
                                                     
{'loss': 1.2439, 'learning_rate': 0.0, 'epoch': 2.72}

 91%|█████████ | 1833/2022 [3:58:37<24:27,  7.76s/it]
 91%|█████████ | 1834/2022 [3:58:45<24:02,  7.67s/it]
                                                     
{'loss': 1.1659, 'learning_rate': 0.0, 'epoch': 2.72}

 91%|█████████ | 1834/2022 [3:58:45<24:02,  7.67s/it]
 91%|█████████ | 1835/2022 [3:58:52<23:48,  7.64s/it]
                                                     
{'loss': 1.1424, 'learning_rate': 0.0, 'epoch': 2.72}

 91%|█████████ | 1835/2022 [3:58:52<23:48,  7.64s/it]
 91%|█████████ | 1836/2022 [3:59:00<23:41,  7.64s/it]
                                                     
{'loss': 1.2058, 'learning_rate': 0.0, 'epoch': 2.72}

 91%|█████████ | 1836/2022 [3:59:00<23:41,  7.64s/it]
 91%|█████████ | 1837/2022 [3:59:08<24:02,  7.80s/it]
                                                     
{'loss': 1.1334, 'learning_rate': 0.0, 'epoch': 2.72}

 91%|█████████ | 1837/2022 [3:59:08<24:02,  7.80s/it]
 91%|█████████ | 1838/2022 [3:59:16<24:15,  7.91s/it]
                                                     
{'loss': 1.2055, 'learning_rate': 0.0, 'epoch': 2.72}

 91%|█████████ | 1838/2022 [3:59:16<24:15,  7.91s/it]
 91%|█████████ | 1839/2022 [3:59:24<23:57,  7.85s/it]
                                                     
{'loss': 1.1983, 'learning_rate': 0.0, 'epoch': 2.73}

 91%|█████████ | 1839/2022 [3:59:24<23:57,  7.85s/it]
 91%|█████████ | 1840/2022 [3:59:31<23:32,  7.76s/it]
                                                     
{'loss': 1.1647, 'learning_rate': 0.0, 'epoch': 2.73}

 91%|█████████ | 1840/2022 [3:59:31<23:32,  7.76s/it]
 91%|█████████ | 1841/2022 [3:59:39<23:36,  7.83s/it]
                                                     
{'loss': 1.1047, 'learning_rate': 0.0, 'epoch': 2.73}

 91%|█████████ | 1841/2022 [3:59:39<23:36,  7.83s/it]
 91%|█████████ | 1842/2022 [3:59:48<23:54,  7.97s/it]
                                                     
{'loss': 1.1856, 'learning_rate': 0.0, 'epoch': 2.73}

 91%|█████████ | 1842/2022 [3:59:48<23:54,  7.97s/it]
 91%|█████████ | 1843/2022 [3:59:55<23:21,  7.83s/it]
                                                     
{'loss': 1.1411, 'learning_rate': 0.0, 'epoch': 2.73}

 91%|█████████ | 1843/2022 [3:59:55<23:21,  7.83s/it]
 91%|█████████ | 1844/2022 [4:00:03<23:18,  7.86s/it]
                                                     
{'loss': 1.1544, 'learning_rate': 0.0, 'epoch': 2.73}

 91%|█████████ | 1844/2022 [4:00:03<23:18,  7.86s/it]
 91%|█████████ | 1845/2022 [4:00:11<23:15,  7.88s/it]
                                                     
{'loss': 1.1528, 'learning_rate': 0.0, 'epoch': 2.74}

 91%|█████████ | 1845/2022 [4:00:11<23:15,  7.88s/it]
 91%|█████████▏| 1846/2022 [4:00:19<23:06,  7.88s/it]
                                                     
{'loss': 1.0624, 'learning_rate': 0.0, 'epoch': 2.74}

 91%|█████████▏| 1846/2022 [4:00:19<23:06,  7.88s/it]
 91%|█████████▏| 1847/2022 [4:00:27<22:49,  7.82s/it]
                                                     
{'loss': 1.0773, 'learning_rate': 0.0, 'epoch': 2.74}

 91%|█████████▏| 1847/2022 [4:00:27<22:49,  7.82s/it]
 91%|█████████▏| 1848/2022 [4:00:35<22:50,  7.87s/it]
                                                     
{'loss': 1.086, 'learning_rate': 0.0, 'epoch': 2.74}

 91%|█████████▏| 1848/2022 [4:00:35<22:50,  7.87s/it]
 91%|█████████▏| 1849/2022 [4:00:42<22:35,  7.84s/it]
                                                     
{'loss': 1.0857, 'learning_rate': 0.0, 'epoch': 2.74}

 91%|█████████▏| 1849/2022 [4:00:42<22:35,  7.84s/it]
 91%|█████████▏| 1850/2022 [4:00:50<22:20,  7.80s/it]
                                                     
{'loss': 1.1362, 'learning_rate': 0.0, 'epoch': 2.74}

 91%|█████████▏| 1850/2022 [4:00:50<22:20,  7.80s/it]
 92%|█████████▏| 1851/2022 [4:00:58<22:29,  7.89s/it]
                                                     
{'loss': 1.2699, 'learning_rate': 0.0, 'epoch': 2.74}

 92%|█████████▏| 1851/2022 [4:00:58<22:29,  7.89s/it]
 92%|█████████▏| 1852/2022 [4:01:06<22:12,  7.84s/it]
                                                     
{'loss': 1.2912, 'learning_rate': 0.0, 'epoch': 2.75}

 92%|█████████▏| 1852/2022 [4:01:06<22:12,  7.84s/it]
 92%|█████████▏| 1853/2022 [4:01:14<21:55,  7.78s/it]
                                                     
{'loss': 1.1534, 'learning_rate': 0.0, 'epoch': 2.75}

 92%|█████████▏| 1853/2022 [4:01:14<21:55,  7.78s/it]
 92%|█████████▏| 1854/2022 [4:01:21<21:49,  7.79s/it]
                                                     
{'loss': 1.1643, 'learning_rate': 0.0, 'epoch': 2.75}

 92%|█████████▏| 1854/2022 [4:01:21<21:49,  7.79s/it]
 92%|█████████▏| 1855/2022 [4:01:29<21:48,  7.84s/it]
                                                     
{'loss': 1.1172, 'learning_rate': 0.0, 'epoch': 2.75}

 92%|█████████▏| 1855/2022 [4:01:29<21:48,  7.84s/it]
 92%|█████████▏| 1856/2022 [4:01:37<21:49,  7.89s/it]
                                                     
{'loss': 1.1231, 'learning_rate': 0.0, 'epoch': 2.75}

 92%|█████████▏| 1856/2022 [4:01:37<21:49,  7.89s/it]
 92%|█████████▏| 1857/2022 [4:01:45<21:34,  7.85s/it]
                                                     
{'loss': 1.2221, 'learning_rate': 0.0, 'epoch': 2.75}

 92%|█████████▏| 1857/2022 [4:01:45<21:34,  7.85s/it]
 92%|█████████▏| 1858/2022 [4:01:53<21:42,  7.94s/it]
                                                     
{'loss': 1.1984, 'learning_rate': 0.0, 'epoch': 2.75}

 92%|█████████▏| 1858/2022 [4:01:53<21:42,  7.94s/it]
 92%|█████████▏| 1859/2022 [4:02:01<21:30,  7.91s/it]
                                                     
{'loss': 1.2243, 'learning_rate': 0.0, 'epoch': 2.76}

 92%|█████████▏| 1859/2022 [4:02:01<21:30,  7.91s/it]
 92%|█████████▏| 1860/2022 [4:02:09<21:04,  7.80s/it]
                                                     
{'loss': 1.168, 'learning_rate': 0.0, 'epoch': 2.76}

 92%|█████████▏| 1860/2022 [4:02:09<21:04,  7.80s/it]
 92%|█████████▏| 1861/2022 [4:02:17<21:25,  7.99s/it]
                                                     
{'loss': 1.0827, 'learning_rate': 0.0, 'epoch': 2.76}

 92%|█████████▏| 1861/2022 [4:02:17<21:25,  7.99s/it]
 92%|█████████▏| 1862/2022 [4:02:25<21:21,  8.01s/it]
                                                     
{'loss': 1.1466, 'learning_rate': 0.0, 'epoch': 2.76}

 92%|█████████▏| 1862/2022 [4:02:25<21:21,  8.01s/it]
 92%|█████████▏| 1863/2022 [4:02:33<20:54,  7.89s/it]
                                                     
{'loss': 1.2895, 'learning_rate': 0.0, 'epoch': 2.76}

 92%|█████████▏| 1863/2022 [4:02:33<20:54,  7.89s/it]
 92%|█████████▏| 1864/2022 [4:02:40<20:31,  7.79s/it]
                                                     
{'loss': 1.0918, 'learning_rate': 0.0, 'epoch': 2.76}

 92%|█████████▏| 1864/2022 [4:02:40<20:31,  7.79s/it]
 92%|█████████▏| 1865/2022 [4:02:48<20:32,  7.85s/it]
                                                     
{'loss': 1.2682, 'learning_rate': 0.0, 'epoch': 2.77}

 92%|█████████▏| 1865/2022 [4:02:48<20:32,  7.85s/it]
 92%|█████████▏| 1866/2022 [4:02:56<20:03,  7.71s/it]
                                                     
{'loss': 1.1586, 'learning_rate': 0.0, 'epoch': 2.77}

 92%|█████████▏| 1866/2022 [4:02:56<20:03,  7.71s/it]
 92%|█████████▏| 1867/2022 [4:03:04<20:13,  7.83s/it]
                                                     
{'loss': 1.2052, 'learning_rate': 0.0, 'epoch': 2.77}

 92%|█████████▏| 1867/2022 [4:03:04<20:13,  7.83s/it]
 92%|█████████▏| 1868/2022 [4:03:12<20:17,  7.90s/it]
                                                     
{'loss': 0.9956, 'learning_rate': 0.0, 'epoch': 2.77}

 92%|█████████▏| 1868/2022 [4:03:12<20:17,  7.90s/it]
 92%|█████████▏| 1869/2022 [4:03:20<20:02,  7.86s/it]
                                                     
{'loss': 1.1828, 'learning_rate': 0.0, 'epoch': 2.77}

 92%|█████████▏| 1869/2022 [4:03:20<20:02,  7.86s/it]
 92%|█████████▏| 1870/2022 [4:03:27<19:46,  7.81s/it]
                                                     
{'loss': 1.037, 'learning_rate': 0.0, 'epoch': 2.77}

 92%|█████████▏| 1870/2022 [4:03:27<19:46,  7.81s/it]
 93%|█████████▎| 1871/2022 [4:03:35<19:27,  7.73s/it]
                                                     
{'loss': 1.2849, 'learning_rate': 0.0, 'epoch': 2.77}

 93%|█████████▎| 1871/2022 [4:03:35<19:27,  7.73s/it]
 93%|█████████▎| 1872/2022 [4:03:43<19:27,  7.78s/it]
                                                     
{'loss': 1.2487, 'learning_rate': 0.0, 'epoch': 2.78}

 93%|█████████▎| 1872/2022 [4:03:43<19:27,  7.78s/it]
 93%|█████████▎| 1873/2022 [4:03:51<19:34,  7.88s/it]
                                                     
{'loss': 1.0603, 'learning_rate': 0.0, 'epoch': 2.78}

 93%|█████████▎| 1873/2022 [4:03:51<19:34,  7.88s/it]
 93%|█████████▎| 1874/2022 [4:03:58<19:13,  7.80s/it]
                                                     
{'loss': 1.2471, 'learning_rate': 0.0, 'epoch': 2.78}

 93%|█████████▎| 1874/2022 [4:03:59<19:13,  7.80s/it]
 93%|█████████▎| 1875/2022 [4:04:06<18:59,  7.75s/it]
                                                     
{'loss': 1.1164, 'learning_rate': 0.0, 'epoch': 2.78}

 93%|█████████▎| 1875/2022 [4:04:06<18:59,  7.75s/it]
 93%|█████████▎| 1876/2022 [4:04:14<18:58,  7.80s/it]
                                                     
{'loss': 1.0564, 'learning_rate': 0.0, 'epoch': 2.78}

 93%|█████████▎| 1876/2022 [4:04:14<18:58,  7.80s/it]
 93%|█████████▎| 1877/2022 [4:04:22<18:50,  7.80s/it]
                                                     
{'loss': 1.1975, 'learning_rate': 0.0, 'epoch': 2.78}

 93%|█████████▎| 1877/2022 [4:04:22<18:50,  7.80s/it]
 93%|█████████▎| 1878/2022 [4:04:30<19:00,  7.92s/it]
                                                     
{'loss': 1.2646, 'learning_rate': 0.0, 'epoch': 2.78}

 93%|█████████▎| 1878/2022 [4:04:30<19:00,  7.92s/it]
 93%|█████████▎| 1879/2022 [4:04:38<18:45,  7.87s/it]
                                                     
{'loss': 1.1702, 'learning_rate': 0.0, 'epoch': 2.79}

 93%|█████████▎| 1879/2022 [4:04:38<18:45,  7.87s/it]
 93%|█████████▎| 1880/2022 [4:04:45<18:28,  7.81s/it]
                                                     
{'loss': 1.1129, 'learning_rate': 0.0, 'epoch': 2.79}

 93%|█████████▎| 1880/2022 [4:04:45<18:28,  7.81s/it]
 93%|█████████▎| 1881/2022 [4:04:53<18:11,  7.74s/it]
                                                     
{'loss': 1.1627, 'learning_rate': 0.0, 'epoch': 2.79}

 93%|█████████▎| 1881/2022 [4:04:53<18:11,  7.74s/it]
 93%|█████████▎| 1882/2022 [4:05:01<18:01,  7.73s/it]
                                                     
{'loss': 1.1505, 'learning_rate': 0.0, 'epoch': 2.79}

 93%|█████████▎| 1882/2022 [4:05:01<18:01,  7.73s/it]
 93%|█████████▎| 1883/2022 [4:05:08<17:53,  7.73s/it]
                                                     
{'loss': 1.3109, 'learning_rate': 0.0, 'epoch': 2.79}

 93%|█████████▎| 1883/2022 [4:05:08<17:53,  7.73s/it]
 93%|█████████▎| 1884/2022 [4:05:16<17:48,  7.74s/it]
                                                     
{'loss': 1.2179, 'learning_rate': 0.0, 'epoch': 2.79}

 93%|█████████▎| 1884/2022 [4:05:16<17:48,  7.74s/it]
 93%|█████████▎| 1885/2022 [4:05:24<17:43,  7.76s/it]
                                                     
{'loss': 1.1368, 'learning_rate': 0.0, 'epoch': 2.79}

 93%|█████████▎| 1885/2022 [4:05:24<17:43,  7.76s/it]
 93%|█████████▎| 1886/2022 [4:05:32<17:34,  7.75s/it]
                                                     
{'loss': 1.1708, 'learning_rate': 0.0, 'epoch': 2.8}

 93%|█████████▎| 1886/2022 [4:05:32<17:34,  7.75s/it]
 93%|█████████▎| 1887/2022 [4:05:40<17:37,  7.83s/it]
                                                     
{'loss': 1.1587, 'learning_rate': 0.0, 'epoch': 2.8}

 93%|█████████▎| 1887/2022 [4:05:40<17:37,  7.83s/it]
 93%|█████████▎| 1888/2022 [4:05:47<17:20,  7.77s/it]
                                                     
{'loss': 1.1582, 'learning_rate': 0.0, 'epoch': 2.8}

 93%|█████████▎| 1888/2022 [4:05:47<17:20,  7.77s/it]
 93%|█████████▎| 1889/2022 [4:05:56<17:28,  7.88s/it]
                                                     
{'loss': 1.093, 'learning_rate': 0.0, 'epoch': 2.8}

 93%|█████████▎| 1889/2022 [4:05:56<17:28,  7.88s/it]
 93%|█████████▎| 1890/2022 [4:06:03<17:07,  7.78s/it]
                                                     
{'loss': 1.1238, 'learning_rate': 0.0, 'epoch': 2.8}

 93%|█████████▎| 1890/2022 [4:06:03<17:07,  7.78s/it]
 94%|█████████▎| 1891/2022 [4:06:11<17:20,  7.94s/it]
                                                     
{'loss': 1.0556, 'learning_rate': 0.0, 'epoch': 2.8}

 94%|█████████▎| 1891/2022 [4:06:11<17:20,  7.94s/it]
 94%|█████████▎| 1892/2022 [4:06:19<17:16,  7.98s/it]
                                                     
{'loss': 1.0849, 'learning_rate': 0.0, 'epoch': 2.81}

 94%|█████████▎| 1892/2022 [4:06:19<17:16,  7.98s/it]
 94%|█████████▎| 1893/2022 [4:06:27<17:09,  7.98s/it]
                                                     
{'loss': 1.073, 'learning_rate': 0.0, 'epoch': 2.81}

 94%|█████████▎| 1893/2022 [4:06:27<17:09,  7.98s/it]
 94%|█████████▎| 1894/2022 [4:06:35<16:55,  7.94s/it]
                                                     
{'loss': 1.2125, 'learning_rate': 0.0, 'epoch': 2.81}

 94%|█████████▎| 1894/2022 [4:06:35<16:55,  7.94s/it]
 94%|█████████▎| 1895/2022 [4:06:43<16:38,  7.86s/it]
                                                     
{'loss': 1.205, 'learning_rate': 0.0, 'epoch': 2.81}

 94%|█████████▎| 1895/2022 [4:06:43<16:38,  7.86s/it]
 94%|█████████▍| 1896/2022 [4:06:51<16:26,  7.83s/it]
                                                     
{'loss': 1.1563, 'learning_rate': 0.0, 'epoch': 2.81}

 94%|█████████▍| 1896/2022 [4:06:51<16:26,  7.83s/it]
 94%|█████████▍| 1897/2022 [4:06:58<16:15,  7.81s/it]
                                                     
{'loss': 1.1667, 'learning_rate': 0.0, 'epoch': 2.81}

 94%|█████████▍| 1897/2022 [4:06:59<16:15,  7.81s/it]
 94%|█████████▍| 1898/2022 [4:07:07<16:18,  7.89s/it]
                                                     
{'loss': 1.0687, 'learning_rate': 0.0, 'epoch': 2.81}

 94%|█████████▍| 1898/2022 [4:07:07<16:18,  7.89s/it]
 94%|█████████▍| 1899/2022 [4:07:14<16:06,  7.86s/it]
                                                     
{'loss': 1.0379, 'learning_rate': 0.0, 'epoch': 2.82}

 94%|█████████▍| 1899/2022 [4:07:14<16:06,  7.86s/it]
 94%|█████████▍| 1900/2022 [4:07:22<15:52,  7.81s/it]
                                                     
{'loss': 1.1256, 'learning_rate': 0.0, 'epoch': 2.82}

 94%|█████████▍| 1900/2022 [4:07:22<15:52,  7.81s/it]
 94%|█████████▍| 1901/2022 [4:07:30<15:43,  7.80s/it]
                                                     
{'loss': 1.1098, 'learning_rate': 0.0, 'epoch': 2.82}

 94%|█████████▍| 1901/2022 [4:07:30<15:43,  7.80s/it]
 94%|█████████▍| 1902/2022 [4:07:38<15:38,  7.82s/it]
                                                     
{'loss': 1.0926, 'learning_rate': 0.0, 'epoch': 2.82}

 94%|█████████▍| 1902/2022 [4:07:38<15:38,  7.82s/it]
 94%|█████████▍| 1903/2022 [4:07:45<15:09,  7.64s/it]
                                                     
{'loss': 1.1967, 'learning_rate': 0.0, 'epoch': 2.82}

 94%|█████████▍| 1903/2022 [4:07:45<15:09,  7.64s/it]
 94%|█████████▍| 1904/2022 [4:07:52<14:55,  7.59s/it]
                                                     
{'loss': 0.9755, 'learning_rate': 0.0, 'epoch': 2.82}

 94%|█████████▍| 1904/2022 [4:07:52<14:55,  7.59s/it]
 94%|█████████▍| 1905/2022 [4:08:00<15:05,  7.74s/it]
                                                     
{'loss': 1.173, 'learning_rate': 0.0, 'epoch': 2.82}

 94%|█████████▍| 1905/2022 [4:08:00<15:05,  7.74s/it]
 94%|█████████▍| 1906/2022 [4:08:08<14:52,  7.69s/it]
                                                     
{'loss': 1.0962, 'learning_rate': 0.0, 'epoch': 2.83}

 94%|█████████▍| 1906/2022 [4:08:08<14:52,  7.69s/it]
 94%|█████████▍| 1907/2022 [4:08:16<14:47,  7.72s/it]
                                                     
{'loss': 1.0604, 'learning_rate': 0.0, 'epoch': 2.83}

 94%|█████████▍| 1907/2022 [4:08:16<14:47,  7.72s/it]
 94%|█████████▍| 1908/2022 [4:08:23<14:35,  7.68s/it]
                                                     
{'loss': 1.2858, 'learning_rate': 0.0, 'epoch': 2.83}

 94%|█████████▍| 1908/2022 [4:08:23<14:35,  7.68s/it]
 94%|█████████▍| 1909/2022 [4:08:31<14:39,  7.78s/it]
                                                     
{'loss': 1.0481, 'learning_rate': 0.0, 'epoch': 2.83}

 94%|█████████▍| 1909/2022 [4:08:31<14:39,  7.78s/it]
 94%|█████████▍| 1910/2022 [4:08:39<14:25,  7.73s/it]
                                                     
{'loss': 1.1045, 'learning_rate': 0.0, 'epoch': 2.83}

 94%|█████████▍| 1910/2022 [4:08:39<14:25,  7.73s/it]
 95%|█████████▍| 1911/2022 [4:08:47<14:20,  7.75s/it]
                                                     
{'loss': 1.1527, 'learning_rate': 0.0, 'epoch': 2.83}

 95%|█████████▍| 1911/2022 [4:08:47<14:20,  7.75s/it]
 95%|█████████▍| 1912/2022 [4:08:54<14:04,  7.68s/it]
                                                     
{'loss': 1.1702, 'learning_rate': 0.0, 'epoch': 2.83}

 95%|█████████▍| 1912/2022 [4:08:54<14:04,  7.68s/it]
 95%|█████████▍| 1913/2022 [4:09:02<14:11,  7.81s/it]
                                                     
{'loss': 1.1327, 'learning_rate': 0.0, 'epoch': 2.84}

 95%|█████████▍| 1913/2022 [4:09:03<14:11,  7.81s/it]
 95%|█████████▍| 1914/2022 [4:09:10<13:56,  7.74s/it]
                                                     
{'loss': 1.1994, 'learning_rate': 0.0, 'epoch': 2.84}

 95%|█████████▍| 1914/2022 [4:09:10<13:56,  7.74s/it]
 95%|█████████▍| 1915/2022 [4:09:18<13:59,  7.85s/it]
                                                     
{'loss': 1.2138, 'learning_rate': 0.0, 'epoch': 2.84}

 95%|█████████▍| 1915/2022 [4:09:18<13:59,  7.85s/it]
 95%|█████████▍| 1916/2022 [4:09:26<13:47,  7.81s/it]
                                                     
{'loss': 1.2209, 'learning_rate': 0.0, 'epoch': 2.84}

 95%|█████████▍| 1916/2022 [4:09:26<13:47,  7.81s/it]
 95%|█████████▍| 1917/2022 [4:09:34<14:03,  8.03s/it]
                                                     
{'loss': 1.1769, 'learning_rate': 0.0, 'epoch': 2.84}

 95%|█████████▍| 1917/2022 [4:09:34<14:03,  8.03s/it]
 95%|█████████▍| 1918/2022 [4:09:42<13:49,  7.97s/it]
                                                     
{'loss': 1.0871, 'learning_rate': 0.0, 'epoch': 2.84}

 95%|█████████▍| 1918/2022 [4:09:42<13:49,  7.97s/it]
 95%|█████████▍| 1919/2022 [4:09:51<13:49,  8.06s/it]
                                                     
{'loss': 1.3215, 'learning_rate': 0.0, 'epoch': 2.85}

 95%|█████████▍| 1919/2022 [4:09:51<13:49,  8.06s/it]
 95%|█████████▍| 1920/2022 [4:09:59<14:08,  8.32s/it]
                                                     
{'loss': 1.1684, 'learning_rate': 0.0, 'epoch': 2.85}

 95%|█████████▍| 1920/2022 [4:09:59<14:08,  8.32s/it]
 95%|█████████▌| 1921/2022 [4:10:07<13:49,  8.21s/it]
                                                     
{'loss': 1.1241, 'learning_rate': 0.0, 'epoch': 2.85}

 95%|█████████▌| 1921/2022 [4:10:07<13:49,  8.21s/it]
 95%|█████████▌| 1922/2022 [4:10:15<13:28,  8.09s/it]
                                                     
{'loss': 1.1346, 'learning_rate': 0.0, 'epoch': 2.85}

 95%|█████████▌| 1922/2022 [4:10:15<13:28,  8.09s/it]
 95%|█████████▌| 1923/2022 [4:10:23<13:08,  7.96s/it]
                                                     
{'loss': 1.1088, 'learning_rate': 0.0, 'epoch': 2.85}

 95%|█████████▌| 1923/2022 [4:10:23<13:08,  7.96s/it]
 95%|█████████▌| 1924/2022 [4:10:31<12:53,  7.89s/it]
                                                     
{'loss': 1.096, 'learning_rate': 0.0, 'epoch': 2.85}

 95%|█████████▌| 1924/2022 [4:10:31<12:53,  7.89s/it]
 95%|█████████▌| 1925/2022 [4:10:39<12:49,  7.93s/it]
                                                     
{'loss': 1.1661, 'learning_rate': 0.0, 'epoch': 2.85}

 95%|█████████▌| 1925/2022 [4:10:39<12:49,  7.93s/it]
 95%|█████████▌| 1926/2022 [4:10:46<12:30,  7.81s/it]
                                                     
{'loss': 1.206, 'learning_rate': 0.0, 'epoch': 2.86}

 95%|█████████▌| 1926/2022 [4:10:46<12:30,  7.81s/it]
 95%|█████████▌| 1927/2022 [4:10:54<12:22,  7.81s/it]
                                                     
{'loss': 1.1435, 'learning_rate': 0.0, 'epoch': 2.86}

 95%|█████████▌| 1927/2022 [4:10:54<12:22,  7.81s/it]
 95%|█████████▌| 1928/2022 [4:11:02<12:14,  7.81s/it]
                                                     
{'loss': 1.0813, 'learning_rate': 0.0, 'epoch': 2.86}

 95%|█████████▌| 1928/2022 [4:11:02<12:14,  7.81s/it]
 95%|█████████▌| 1929/2022 [4:11:09<11:56,  7.70s/it]
                                                     
{'loss': 1.0845, 'learning_rate': 0.0, 'epoch': 2.86}

 95%|█████████▌| 1929/2022 [4:11:09<11:56,  7.70s/it]
 95%|█████████▌| 1930/2022 [4:11:18<12:06,  7.89s/it]
                                                     
{'loss': 1.1451, 'learning_rate': 0.0, 'epoch': 2.86}

 95%|█████████▌| 1930/2022 [4:11:18<12:06,  7.89s/it]
 95%|█████████▌| 1931/2022 [4:11:26<12:01,  7.93s/it]
                                                     
{'loss': 1.1589, 'learning_rate': 0.0, 'epoch': 2.86}

 95%|█████████▌| 1931/2022 [4:11:26<12:01,  7.93s/it]
 96%|█████████▌| 1932/2022 [4:11:34<11:59,  7.99s/it]
                                                     
{'loss': 1.196, 'learning_rate': 0.0, 'epoch': 2.86}

 96%|█████████▌| 1932/2022 [4:11:34<11:59,  7.99s/it]
 96%|█████████▌| 1933/2022 [4:11:41<11:40,  7.87s/it]
                                                     
{'loss': 1.1539, 'learning_rate': 0.0, 'epoch': 2.87}

 96%|█████████▌| 1933/2022 [4:11:41<11:40,  7.87s/it]
 96%|█████████▌| 1934/2022 [4:11:49<11:28,  7.82s/it]
                                                     
{'loss': 0.984, 'learning_rate': 0.0, 'epoch': 2.87}

 96%|█████████▌| 1934/2022 [4:11:49<11:28,  7.82s/it]
 96%|█████████▌| 1935/2022 [4:11:57<11:18,  7.80s/it]
                                                     
{'loss': 1.2035, 'learning_rate': 0.0, 'epoch': 2.87}

 96%|█████████▌| 1935/2022 [4:11:57<11:18,  7.80s/it]
 96%|█████████▌| 1936/2022 [4:12:04<11:02,  7.71s/it]
                                                     
{'loss': 1.2167, 'learning_rate': 0.0, 'epoch': 2.87}

 96%|█████████▌| 1936/2022 [4:12:04<11:02,  7.71s/it]
 96%|█████████▌| 1937/2022 [4:12:12<10:56,  7.72s/it]
                                                     
{'loss': 1.1775, 'learning_rate': 0.0, 'epoch': 2.87}

 96%|█████████▌| 1937/2022 [4:12:12<10:56,  7.72s/it]
 96%|█████████▌| 1938/2022 [4:12:20<10:51,  7.75s/it]
                                                     
{'loss': 1.1859, 'learning_rate': 0.0, 'epoch': 2.87}

 96%|█████████▌| 1938/2022 [4:12:20<10:51,  7.75s/it]
 96%|█████████▌| 1939/2022 [4:12:28<10:46,  7.79s/it]
                                                     
{'loss': 1.1871, 'learning_rate': 0.0, 'epoch': 2.87}

 96%|█████████▌| 1939/2022 [4:12:28<10:46,  7.79s/it]
 96%|█████████▌| 1940/2022 [4:12:36<10:42,  7.84s/it]
                                                     
{'loss': 1.1098, 'learning_rate': 0.0, 'epoch': 2.88}

 96%|█████████▌| 1940/2022 [4:12:36<10:42,  7.84s/it]
 96%|█████████▌| 1941/2022 [4:12:44<10:37,  7.87s/it]
                                                     
{'loss': 1.1865, 'learning_rate': 0.0, 'epoch': 2.88}

 96%|█████████▌| 1941/2022 [4:12:44<10:37,  7.87s/it]
 96%|█████████▌| 1942/2022 [4:12:52<10:33,  7.91s/it]
                                                     
{'loss': 1.1189, 'learning_rate': 0.0, 'epoch': 2.88}

 96%|█████████▌| 1942/2022 [4:12:52<10:33,  7.91s/it]
 96%|█████████▌| 1943/2022 [4:13:00<10:25,  7.92s/it]
                                                     
{'loss': 1.1286, 'learning_rate': 0.0, 'epoch': 2.88}

 96%|█████████▌| 1943/2022 [4:13:00<10:25,  7.92s/it]
 96%|█████████▌| 1944/2022 [4:13:07<10:07,  7.79s/it]
                                                     
{'loss': 1.2722, 'learning_rate': 0.0, 'epoch': 2.88}

 96%|█████████▌| 1944/2022 [4:13:07<10:07,  7.79s/it]
 96%|█████████▌| 1945/2022 [4:13:15<10:01,  7.81s/it]
                                                     
{'loss': 1.184, 'learning_rate': 0.0, 'epoch': 2.88}

 96%|█████████▌| 1945/2022 [4:13:15<10:01,  7.81s/it]
 96%|█████████▌| 1946/2022 [4:13:23<09:50,  7.77s/it]
                                                     
{'loss': 1.1506, 'learning_rate': 0.0, 'epoch': 2.89}

 96%|█████████▌| 1946/2022 [4:13:23<09:50,  7.77s/it]
 96%|█████████▋| 1947/2022 [4:13:31<09:49,  7.86s/it]
                                                     
{'loss': 1.084, 'learning_rate': 0.0, 'epoch': 2.89}

 96%|█████████▋| 1947/2022 [4:13:31<09:49,  7.86s/it]
 96%|█████████▋| 1948/2022 [4:13:39<09:44,  7.90s/it]
                                                     
{'loss': 1.2214, 'learning_rate': 0.0, 'epoch': 2.89}

 96%|█████████▋| 1948/2022 [4:13:39<09:44,  7.90s/it]
 96%|█████████▋| 1949/2022 [4:13:46<09:34,  7.87s/it]
                                                     
{'loss': 1.213, 'learning_rate': 0.0, 'epoch': 2.89}

 96%|█████████▋| 1949/2022 [4:13:46<09:34,  7.87s/it]
 96%|█████████▋| 1950/2022 [4:13:54<09:22,  7.82s/it]
                                                     
{'loss': 1.2369, 'learning_rate': 0.0, 'epoch': 2.89}

 96%|█████████▋| 1950/2022 [4:13:54<09:22,  7.82s/it]
 96%|█████████▋| 1951/2022 [4:14:02<09:13,  7.80s/it]
                                                     
{'loss': 1.2363, 'learning_rate': 0.0, 'epoch': 2.89}

 96%|█████████▋| 1951/2022 [4:14:02<09:13,  7.80s/it]
 97%|█████████▋| 1952/2022 [4:14:10<09:12,  7.89s/it]
                                                     
{'loss': 1.0914, 'learning_rate': 0.0, 'epoch': 2.89}

 97%|█████████▋| 1952/2022 [4:14:10<09:12,  7.89s/it]
 97%|█████████▋| 1953/2022 [4:14:18<09:03,  7.87s/it]
                                                     
{'loss': 1.2145, 'learning_rate': 0.0, 'epoch': 2.9}

 97%|█████████▋| 1953/2022 [4:14:18<09:03,  7.87s/it]
 97%|█████████▋| 1954/2022 [4:14:25<08:47,  7.75s/it]
                                                     
{'loss': 1.2516, 'learning_rate': 0.0, 'epoch': 2.9}

 97%|█████████▋| 1954/2022 [4:14:25<08:47,  7.75s/it]
 97%|█████████▋| 1955/2022 [4:14:33<08:42,  7.80s/it]
                                                     
{'loss': 1.1222, 'learning_rate': 0.0, 'epoch': 2.9}

 97%|█████████▋| 1955/2022 [4:14:33<08:42,  7.80s/it]
 97%|█████████▋| 1956/2022 [4:14:41<08:31,  7.75s/it]
                                                     
{'loss': 1.2896, 'learning_rate': 0.0, 'epoch': 2.9}

 97%|█████████▋| 1956/2022 [4:14:41<08:31,  7.75s/it]
 97%|█████████▋| 1957/2022 [4:14:49<08:24,  7.77s/it]
                                                     
{'loss': 1.2667, 'learning_rate': 0.0, 'epoch': 2.9}

 97%|█████████▋| 1957/2022 [4:14:49<08:24,  7.77s/it]
 97%|█████████▋| 1958/2022 [4:14:57<08:24,  7.88s/it]
                                                     
{'loss': 1.073, 'learning_rate': 0.0, 'epoch': 2.9}

 97%|█████████▋| 1958/2022 [4:14:57<08:24,  7.88s/it]
 97%|█████████▋| 1959/2022 [4:15:05<08:17,  7.90s/it]
                                                     
{'loss': 1.0881, 'learning_rate': 0.0, 'epoch': 2.9}

 97%|█████████▋| 1959/2022 [4:15:05<08:17,  7.90s/it]
 97%|█████████▋| 1960/2022 [4:15:13<08:15,  8.00s/it]
                                                     
{'loss': 1.1286, 'learning_rate': 0.0, 'epoch': 2.91}

 97%|█████████▋| 1960/2022 [4:15:13<08:15,  8.00s/it]
 97%|█████████▋| 1961/2022 [4:15:21<08:11,  8.06s/it]
                                                     
{'loss': 1.1685, 'learning_rate': 0.0, 'epoch': 2.91}

 97%|█████████▋| 1961/2022 [4:15:21<08:11,  8.06s/it]
 97%|█████████▋| 1962/2022 [4:15:29<07:58,  7.97s/it]
                                                     
{'loss': 1.1717, 'learning_rate': 0.0, 'epoch': 2.91}

 97%|█████████▋| 1962/2022 [4:15:29<07:58,  7.97s/it]
 97%|█████████▋| 1963/2022 [4:15:36<07:42,  7.84s/it]
                                                     
{'loss': 1.2225, 'learning_rate': 0.0, 'epoch': 2.91}

 97%|█████████▋| 1963/2022 [4:15:36<07:42,  7.84s/it]
 97%|█████████▋| 1964/2022 [4:15:44<07:31,  7.78s/it]
                                                     
{'loss': 1.1582, 'learning_rate': 0.0, 'epoch': 2.91}

 97%|█████████▋| 1964/2022 [4:15:44<07:31,  7.78s/it]
 97%|█████████▋| 1965/2022 [4:15:52<07:21,  7.74s/it]
                                                     
{'loss': 1.1577, 'learning_rate': 0.0, 'epoch': 2.91}

 97%|█████████▋| 1965/2022 [4:15:52<07:21,  7.74s/it]
 97%|█████████▋| 1966/2022 [4:16:00<07:13,  7.75s/it]
                                                     
{'loss': 1.0668, 'learning_rate': 0.0, 'epoch': 2.91}

 97%|█████████▋| 1966/2022 [4:16:00<07:13,  7.75s/it]
 97%|█████████▋| 1967/2022 [4:16:07<07:05,  7.73s/it]
                                                     
{'loss': 1.1579, 'learning_rate': 0.0, 'epoch': 2.92}

 97%|█████████▋| 1967/2022 [4:16:07<07:05,  7.73s/it]
 97%|█████████▋| 1968/2022 [4:16:15<07:04,  7.87s/it]
                                                     
{'loss': 1.1043, 'learning_rate': 0.0, 'epoch': 2.92}

 97%|█████████▋| 1968/2022 [4:16:15<07:04,  7.87s/it]
 97%|█████████▋| 1969/2022 [4:16:23<06:49,  7.73s/it]
                                                     
{'loss': 1.1854, 'learning_rate': 0.0, 'epoch': 2.92}

 97%|█████████▋| 1969/2022 [4:16:23<06:49,  7.73s/it]
 97%|█████████▋| 1970/2022 [4:16:30<06:37,  7.64s/it]
                                                     
{'loss': 1.1623, 'learning_rate': 0.0, 'epoch': 2.92}

 97%|█████████▋| 1970/2022 [4:16:30<06:37,  7.64s/it]
 97%|█████████▋| 1971/2022 [4:16:38<06:28,  7.61s/it]
                                                     
{'loss': 1.3394, 'learning_rate': 0.0, 'epoch': 2.92}

 97%|█████████▋| 1971/2022 [4:16:38<06:28,  7.61s/it]
 98%|█████████▊| 1972/2022 [4:16:45<06:22,  7.64s/it]
                                                     
{'loss': 1.0558, 'learning_rate': 0.0, 'epoch': 2.92}

 98%|█████████▊| 1972/2022 [4:16:46<06:22,  7.64s/it]
 98%|█████████▊| 1973/2022 [4:16:54<06:21,  7.79s/it]
                                                     
{'loss': 1.1134, 'learning_rate': 0.0, 'epoch': 2.93}

 98%|█████████▊| 1973/2022 [4:16:54<06:21,  7.79s/it]
 98%|█████████▊| 1974/2022 [4:17:01<06:13,  7.79s/it]
                                                     
{'loss': 1.1899, 'learning_rate': 0.0, 'epoch': 2.93}

 98%|█████████▊| 1974/2022 [4:17:01<06:13,  7.79s/it]
 98%|█████████▊| 1975/2022 [4:17:09<06:06,  7.81s/it]
                                                     
{'loss': 1.1682, 'learning_rate': 0.0, 'epoch': 2.93}

 98%|█████████▊| 1975/2022 [4:17:09<06:06,  7.81s/it]
 98%|█████████▊| 1976/2022 [4:17:17<05:56,  7.74s/it]
                                                     
{'loss': 1.1474, 'learning_rate': 0.0, 'epoch': 2.93}

 98%|█████████▊| 1976/2022 [4:17:17<05:56,  7.74s/it]
 98%|█████████▊| 1977/2022 [4:17:25<05:49,  7.78s/it]
                                                     
{'loss': 1.2364, 'learning_rate': 0.0, 'epoch': 2.93}

 98%|█████████▊| 1977/2022 [4:17:25<05:49,  7.78s/it]
 98%|█████████▊| 1978/2022 [4:17:32<05:39,  7.71s/it]
                                                     
{'loss': 1.2608, 'learning_rate': 0.0, 'epoch': 2.93}

 98%|█████████▊| 1978/2022 [4:17:32<05:39,  7.71s/it]
 98%|█████████▊| 1979/2022 [4:17:40<05:35,  7.79s/it]
                                                     
{'loss': 1.2213, 'learning_rate': 0.0, 'epoch': 2.93}

 98%|█████████▊| 1979/2022 [4:17:40<05:35,  7.79s/it]
 98%|█████████▊| 1980/2022 [4:17:48<05:24,  7.73s/it]
                                                     
{'loss': 1.1405, 'learning_rate': 0.0, 'epoch': 2.94}

 98%|█████████▊| 1980/2022 [4:17:48<05:24,  7.73s/it]
 98%|█████████▊| 1981/2022 [4:17:56<05:17,  7.73s/it]
                                                     
{'loss': 1.0368, 'learning_rate': 0.0, 'epoch': 2.94}

 98%|█████████▊| 1981/2022 [4:17:56<05:17,  7.73s/it]
 98%|█████████▊| 1982/2022 [4:18:03<05:07,  7.69s/it]
                                                     
{'loss': 1.1604, 'learning_rate': 0.0, 'epoch': 2.94}

 98%|█████████▊| 1982/2022 [4:18:03<05:07,  7.69s/it]
 98%|█████████▊| 1983/2022 [4:18:11<05:00,  7.70s/it]
                                                     
{'loss': 1.2288, 'learning_rate': 0.0, 'epoch': 2.94}

 98%|█████████▊| 1983/2022 [4:18:11<05:00,  7.70s/it]
 98%|█████████▊| 1984/2022 [4:18:19<04:54,  7.75s/it]
                                                     
{'loss': 1.089, 'learning_rate': 0.0, 'epoch': 2.94}

 98%|█████████▊| 1984/2022 [4:18:19<04:54,  7.75s/it]
 98%|█████████▊| 1985/2022 [4:18:26<04:46,  7.74s/it]
                                                     
{'loss': 1.2305, 'learning_rate': 0.0, 'epoch': 2.94}

 98%|█████████▊| 1985/2022 [4:18:27<04:46,  7.74s/it]
 98%|█████████▊| 1986/2022 [4:18:34<04:41,  7.81s/it]
                                                     
{'loss': 1.0425, 'learning_rate': 0.0, 'epoch': 2.94}

 98%|█████████▊| 1986/2022 [4:18:34<04:41,  7.81s/it]
 98%|█████████▊| 1987/2022 [4:18:42<04:32,  7.80s/it]
                                                     
{'loss': 1.1866, 'learning_rate': 0.0, 'epoch': 2.95}

 98%|█████████▊| 1987/2022 [4:18:42<04:32,  7.80s/it]
 98%|█████████▊| 1988/2022 [4:18:50<04:23,  7.75s/it]
                                                     
{'loss': 1.1937, 'learning_rate': 0.0, 'epoch': 2.95}

 98%|█████████▊| 1988/2022 [4:18:50<04:23,  7.75s/it]
 98%|█████████▊| 1989/2022 [4:18:58<04:16,  7.78s/it]
                                                     
{'loss': 1.1323, 'learning_rate': 0.0, 'epoch': 2.95}

 98%|█████████▊| 1989/2022 [4:18:58<04:16,  7.78s/it]
 98%|█████████▊| 1990/2022 [4:19:06<04:12,  7.90s/it]
                                                     
{'loss': 1.0374, 'learning_rate': 0.0, 'epoch': 2.95}

 98%|█████████▊| 1990/2022 [4:19:06<04:12,  7.90s/it]
 98%|█████████▊| 1991/2022 [4:19:14<04:05,  7.93s/it]
                                                     
{'loss': 1.1529, 'learning_rate': 0.0, 'epoch': 2.95}

 98%|█████████▊| 1991/2022 [4:19:14<04:05,  7.93s/it]
 99%|█████████▊| 1992/2022 [4:19:22<03:55,  7.84s/it]
                                                     
{'loss': 1.1215, 'learning_rate': 0.0, 'epoch': 2.95}

 99%|█████████▊| 1992/2022 [4:19:22<03:55,  7.84s/it]
 99%|█████████▊| 1993/2022 [4:19:30<03:49,  7.91s/it]
                                                     
{'loss': 1.24, 'learning_rate': 0.0, 'epoch': 2.95}

 99%|█████████▊| 1993/2022 [4:19:30<03:49,  7.91s/it]
 99%|█████████▊| 1994/2022 [4:19:37<03:41,  7.90s/it]
                                                     
{'loss': 1.3059, 'learning_rate': 0.0, 'epoch': 2.96}

 99%|█████████▊| 1994/2022 [4:19:37<03:41,  7.90s/it]
 99%|█████████▊| 1995/2022 [4:19:45<03:33,  7.91s/it]
                                                     
{'loss': 1.1306, 'learning_rate': 0.0, 'epoch': 2.96}

 99%|█████████▊| 1995/2022 [4:19:45<03:33,  7.91s/it]
 99%|█████████▊| 1996/2022 [4:19:53<03:26,  7.93s/it]
                                                     
{'loss': 1.2235, 'learning_rate': 0.0, 'epoch': 2.96}

 99%|█████████▊| 1996/2022 [4:19:53<03:26,  7.93s/it]
 99%|█████████▉| 1997/2022 [4:20:02<03:20,  8.01s/it]
                                                     
{'loss': 1.2884, 'learning_rate': 0.0, 'epoch': 2.96}

 99%|█████████▉| 1997/2022 [4:20:02<03:20,  8.01s/it]
 99%|█████████▉| 1998/2022 [4:20:10<03:14,  8.10s/it]
                                                     
{'loss': 1.1292, 'learning_rate': 0.0, 'epoch': 2.96}

 99%|█████████▉| 1998/2022 [4:20:10<03:14,  8.10s/it]
 99%|█████████▉| 1999/2022 [4:20:17<03:00,  7.83s/it]
                                                     
{'loss': 1.2609, 'learning_rate': 0.0, 'epoch': 2.96}

 99%|█████████▉| 1999/2022 [4:20:17<03:00,  7.83s/it]
 99%|█████████▉| 2000/2022 [4:20:25<02:52,  7.83s/it]
                                                     
{'loss': 1.2127, 'learning_rate': 0.0, 'epoch': 2.97}

 99%|█████████▉| 2000/2022 [4:20:25<02:52,  7.83s/it]
 99%|█████████▉| 2001/2022 [4:20:33<02:46,  7.92s/it]
                                                     
{'loss': 1.2422, 'learning_rate': 0.0, 'epoch': 2.97}

 99%|█████████▉| 2001/2022 [4:20:33<02:46,  7.92s/it]
 99%|█████████▉| 2002/2022 [4:20:41<02:39,  7.96s/it]
                                                     
{'loss': 1.1109, 'learning_rate': 0.0, 'epoch': 2.97}

 99%|█████████▉| 2002/2022 [4:20:41<02:39,  7.96s/it]
 99%|█████████▉| 2003/2022 [4:20:49<02:29,  7.88s/it]
                                                     
{'loss': 1.1904, 'learning_rate': 0.0, 'epoch': 2.97}

 99%|█████████▉| 2003/2022 [4:20:49<02:29,  7.88s/it]
 99%|█████████▉| 2004/2022 [4:20:57<02:21,  7.84s/it]
                                                     
{'loss': 1.0787, 'learning_rate': 0.0, 'epoch': 2.97}

 99%|█████████▉| 2004/2022 [4:20:57<02:21,  7.84s/it]
 99%|█████████▉| 2005/2022 [4:21:05<02:15,  7.96s/it]
                                                     
{'loss': 1.1727, 'learning_rate': 0.0, 'epoch': 2.97}

 99%|█████████▉| 2005/2022 [4:21:05<02:15,  7.96s/it]
 99%|█████████▉| 2006/2022 [4:21:13<02:07,  7.98s/it]
                                                     
{'loss': 1.173, 'learning_rate': 0.0, 'epoch': 2.97}

 99%|█████████▉| 2006/2022 [4:21:13<02:07,  7.98s/it]
 99%|█████████▉| 2007/2022 [4:21:21<01:58,  7.89s/it]
                                                     
{'loss': 1.1901, 'learning_rate': 0.0, 'epoch': 2.98}

 99%|█████████▉| 2007/2022 [4:21:21<01:58,  7.89s/it]
 99%|█████████▉| 2008/2022 [4:21:28<01:49,  7.82s/it]
                                                     
{'loss': 1.2129, 'learning_rate': 0.0, 'epoch': 2.98}

 99%|█████████▉| 2008/2022 [4:21:28<01:49,  7.82s/it]
 99%|█████████▉| 2009/2022 [4:21:36<01:42,  7.92s/it]
                                                     
{'loss': 1.1348, 'learning_rate': 0.0, 'epoch': 2.98}

 99%|█████████▉| 2009/2022 [4:21:36<01:42,  7.92s/it]
 99%|█████████▉| 2010/2022 [4:21:44<01:33,  7.83s/it]
                                                     
{'loss': 1.1238, 'learning_rate': 0.0, 'epoch': 2.98}

 99%|█████████▉| 2010/2022 [4:21:44<01:33,  7.83s/it]
 99%|█████████▉| 2011/2022 [4:21:52<01:26,  7.90s/it]
                                                     
{'loss': 1.1449, 'learning_rate': 0.0, 'epoch': 2.98}

 99%|█████████▉| 2011/2022 [4:21:52<01:26,  7.90s/it]
100%|█████████▉| 2012/2022 [4:22:00<01:19,  7.97s/it]
                                                     
{'loss': 1.2205, 'learning_rate': 0.0, 'epoch': 2.98}

100%|█████████▉| 2012/2022 [4:22:00<01:19,  7.97s/it]
100%|█████████▉| 2013/2022 [4:22:08<01:11,  7.95s/it]
                                                     
{'loss': 1.2273, 'learning_rate': 0.0, 'epoch': 2.98}

100%|█████████▉| 2013/2022 [4:22:08<01:11,  7.95s/it]
100%|█████████▉| 2014/2022 [4:22:16<01:03,  7.97s/it]
                                                     
{'loss': 1.0007, 'learning_rate': 0.0, 'epoch': 2.99}

100%|█████████▉| 2014/2022 [4:22:16<01:03,  7.97s/it]
100%|█████████▉| 2015/2022 [4:22:23<00:54,  7.81s/it]
                                                     
{'loss': 1.1349, 'learning_rate': 0.0, 'epoch': 2.99}

100%|█████████▉| 2015/2022 [4:22:24<00:54,  7.81s/it]
100%|█████████▉| 2016/2022 [4:22:31<00:46,  7.79s/it]
                                                     
{'loss': 1.0761, 'learning_rate': 0.0, 'epoch': 2.99}

100%|█████████▉| 2016/2022 [4:22:31<00:46,  7.79s/it]
100%|█████████▉| 2017/2022 [4:22:39<00:39,  7.85s/it]
                                                     
{'loss': 1.1436, 'learning_rate': 0.0, 'epoch': 2.99}

100%|█████████▉| 2017/2022 [4:22:39<00:39,  7.85s/it]
100%|█████████▉| 2018/2022 [4:22:47<00:31,  7.83s/it]
                                                     
{'loss': 1.1518, 'learning_rate': 0.0, 'epoch': 2.99}

100%|█████████▉| 2018/2022 [4:22:47<00:31,  7.83s/it]
100%|█████████▉| 2019/2022 [4:22:55<00:23,  7.92s/it]
                                                     
{'loss': 1.1597, 'learning_rate': 0.0, 'epoch': 2.99}

100%|█████████▉| 2019/2022 [4:22:55<00:23,  7.92s/it]
100%|█████████▉| 2020/2022 [4:23:03<00:15,  7.91s/it]
                                                     
{'loss': 1.1548, 'learning_rate': 0.0, 'epoch': 2.99}

100%|█████████▉| 2020/2022 [4:23:03<00:15,  7.91s/it]
100%|█████████▉| 2021/2022 [4:23:12<00:08,  8.21s/it]
                                                     
{'loss': 1.0888, 'learning_rate': 0.0, 'epoch': 3.0}

100%|█████████▉| 2021/2022 [4:23:12<00:08,  8.21s/it]
100%|██████████| 2022/2022 [4:23:20<00:00,  8.11s/it]
                                                     
{'loss': 1.2425, 'learning_rate': 0.0, 'epoch': 3.0}

100%|██████████| 2022/2022 [4:23:20<00:00,  8.11s/it]
                                                     
{'train_runtime': 15811.3641, 'train_samples_per_second': 4.095, 'train_steps_per_second': 0.128, 'train_loss': 1.1757204692099616, 'epoch': 3.0}

100%|██████████| 2022/2022 [4:23:20<00:00,  7.81s/it]

 If there's a warning about missing keys above, please disregard :)
