Upload folder using huggingface_hub


Files changed (16)

checkpoint-12000/1_Pooling/config.json +10 -0
checkpoint-12000/README.md +460 -0
checkpoint-12000/config.json +26 -0
checkpoint-12000/config_sentence_transformers.json +10 -0
checkpoint-12000/model.safetensors +3 -0
checkpoint-12000/modules.json +20 -0
checkpoint-12000/optimizer.pt +3 -0
checkpoint-12000/rng_state.pth +3 -0
checkpoint-12000/scheduler.pt +3 -0
checkpoint-12000/sentence_bert_config.json +4 -0
checkpoint-12000/special_tokens_map.json +37 -0
checkpoint-12000/tokenizer.json +0 -0
checkpoint-12000/tokenizer_config.json +64 -0
checkpoint-12000/trainer_state.json +873 -0
checkpoint-12000/training_args.bin +3 -0
checkpoint-12000/vocab.txt +0 -0

checkpoint-12000/1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "word_embedding_dimension": 384,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
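This pooling config enables only `pooling_mode_mean_tokens` over 384-dimensional token vectors, and the README's architecture listing places a `Normalize()` module after `Pooling`. A minimal NumPy sketch of what those two steps compute, assuming the transformer's token embeddings and the attention mask are already available as arrays (the toy inputs below are random, not real model outputs):

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) transformer output
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against all-padding rows
    return summed / counts


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


# Toy batch: 2 sentences, 4 token positions, 384-dim vectors (dimension from the config)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 384)).astype(np.float32)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
sentence_embeddings = l2_normalize(mean_pool(tokens, mask))
print(sentence_embeddings.shape)  # (2, 384)
```

Padding positions contribute nothing to the sum or the count, so the second sentence's pooled vector is just the mean of its first two token vectors.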

checkpoint-12000/README.md ADDED

	@@ -0,0 +1,460 @@

+---
+base_model: sentence-transformers/all-MiniLM-L6-v2
+language:
+- en
+library_name: sentence-transformers
+license: apache-2.0
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:7747936
+- loss:CoSENTLoss
+widget:
+- source_sentence: mango cake cream cake sponge cake gateau mango gateau cream gateau
+    mango sponge cake cream sponge cake mango cream cake mango cream sponge cake mango
+    flavored sponge cake layers cream filling decorated with fresh mango slices topped
+    with whipped cream serves 10 people mango cream cake sponge cake gateau mango
+    gateau with cream filling whipped cream mango cake mango cream sponge cake for
+    10 people
+  sentences:
+  - vegan dessert
+  - oxidized ring
+  - cola lip gloss
+- source_sentence: double breasted blouse
+  sentences:
+  - brushed jersey sweatshirt
+  - comfort facial tissues
+  - round neck sweatshirt
+- source_sentence: casual shirt
+  sentences:
+  - adjustable string top
+  - foldable spare backpack
+  - spring blossom scent shower gel
+- source_sentence: sweet chilli mozzarella stick
+  sentences:
+  - fragrance free facial cream
+  - outdoor basket
+  - cobb dressing salad
+- source_sentence: appetizer onion ring
+  sentences:
+  - high quality sports bra
+  - swimmer burkini
+  - nuttella pizza
+---
+# all-MiniLM-L6-v10-pair_score
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
+- **Maximum Sequence Length:** 256 tokens
+- **Output Dimensionality:** 384 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+- **Language:** en
+- **License:** apache-2.0
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+# Download from the 🤗 Hub; replace the placeholder ID with this model's Hub repo ID or a local checkpoint path
+model = SentenceTransformer("sentence_transformers_model_id")
+# Run inference
+sentences = [
+    'appetizer onion ring',
+    'nuttella pizza',
+    'high quality sports bra',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# [3, 384]
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# [3, 3]
+```
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 128
+- `per_device_eval_batch_size`: 128
+- `learning_rate`: 2e-05
+- `num_train_epochs`: 1
+- `warmup_ratio`: 0.1
+- `fp16`: True
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: steps
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 128
+- `per_device_eval_batch_size`: 128
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 1
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: False
+- `fp16`: True
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: False
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `dispatch_batches`: None
+- `split_batches`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `eval_use_gather_object`: False
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: proportional
+</details>
+### Training Logs
+<details><summary>Click to expand</summary>
+| Epoch  | Step  | Training Loss |
+|:------:|:-----:|:-------------:|
+| 0.0017 | 100   | 13.3171       |
+| 0.0033 | 200   | 12.9799       |
+| 0.0050 | 300   | 12.5133       |
+| 0.0066 | 400   | 11.9388       |
+| 0.0083 | 500   | 11.0616       |
+| 0.0099 | 600   | 10.2712       |
+| 0.0116 | 700   | 9.5253        |
+| 0.0132 | 800   | 8.7706        |
+| 0.0149 | 900   | 8.4333        |
+| 0.0165 | 1000  | 8.0902        |
+| 0.0182 | 1100  | 7.8862        |
+| 0.0198 | 1200  | 7.7362        |
+| 0.0215 | 1300  | 7.6007        |
+| 0.0231 | 1400  | 7.5304        |
+| 0.0248 | 1500  | 7.4249        |
+| 0.0264 | 1600  | 7.3035        |
+| 0.0281 | 1700  | 7.2026        |
+| 0.0297 | 1800  | 7.1572        |
+| 0.0314 | 1900  | 7.0523        |
+| 0.0330 | 2000  | 7.1158        |
+| 0.0347 | 2100  | 6.9856        |
+| 0.0363 | 2200  | 7.0865        |
+| 0.0380 | 2300  | 6.9496        |
+| 0.0396 | 2400  | 6.9294        |
+| 0.0413 | 2500  | 6.8825        |
+| 0.0430 | 2600  | 6.8218        |
+| 0.0446 | 2700  | 6.8416        |
+| 0.0463 | 2800  | 6.7184        |
+| 0.0479 | 2900  | 6.9183        |
+| 0.0496 | 3000  | 6.7166        |
+| 0.0512 | 3100  | 6.6821        |
+| 0.0529 | 3200  | 6.6074        |
+| 0.0545 | 3300  | 6.6141        |
+| 0.0562 | 3400  | 6.5374        |
+| 0.0578 | 3500  | 6.4776        |
+| 0.0595 | 3600  | 6.5701        |
+| 0.0611 | 3700  | 6.5026        |
+| 0.0628 | 3800  | 6.6502        |
+| 0.0644 | 3900  | 6.5023        |
+| 0.0661 | 4000  | 6.5526        |
+| 0.0677 | 4100  | 6.6594        |
+| 0.0694 | 4200  | 6.3643        |
+| 0.0710 | 4300  | 6.3783        |
+| 0.0727 | 4400  | 6.3222        |
+| 0.0743 | 4500  | 6.3401        |
+| 0.0760 | 4600  | 6.4005        |
+| 0.0776 | 4700  | 6.3605        |
+| 0.0793 | 4800  | 6.348         |
+| 0.0810 | 4900  | 6.3406        |
+| 0.0826 | 5000  | 6.4156        |
+| 0.0843 | 5100  | 6.3786        |
+| 0.0859 | 5200  | 6.376         |
+| 0.0876 | 5300  | 6.2363        |
+| 0.0892 | 5400  | 6.2185        |
+| 0.0909 | 5500  | 6.2554        |
+| 0.0925 | 5600  | 6.2177        |
+| 0.0942 | 5700  | 6.3924        |
+| 0.0958 | 5800  | 6.2897        |
+| 0.0975 | 5900  | 6.272         |
+| 0.0991 | 6000  | 6.0247        |
+| 0.1008 | 6100  | 6.194         |
+| 0.1024 | 6200  | 6.2757        |
+| 0.1041 | 6300  | 6.2408        |
+| 0.1057 | 6400  | 6.253         |
+| 0.1074 | 6500  | 6.0605        |
+| 0.1090 | 6600  | 6.0672        |
+| 0.1107 | 6700  | 6.0414        |
+| 0.1123 | 6800  | 6.0823        |
+| 0.1140 | 6900  | 6.1962        |
+| 0.1156 | 7000  | 6.0868        |
+| 0.1173 | 7100  | 6.0795        |
+| 0.1189 | 7200  | 5.9656        |
+| 0.1206 | 7300  | 5.9785        |
+| 0.1223 | 7400  | 6.0722        |
+| 0.1239 | 7500  | 5.9443        |
+| 0.1256 | 7600  | 5.8786        |
+| 0.1272 | 7700  | 5.8007        |
+| 0.1289 | 7800  | 5.9206        |
+| 0.1305 | 7900  | 5.918         |
+| 0.1322 | 8000  | 5.9443        |
+| 0.1338 | 8100  | 5.8764        |
+| 0.1355 | 8200  | 5.867         |
+| 0.1371 | 8300  | 5.8087        |
+| 0.1388 | 8400  | 5.9884        |
+| 0.1404 | 8500  | 5.8741        |
+| 0.1421 | 8600  | 5.9699        |
+| 0.1437 | 8700  | 5.8671        |
+| 0.1454 | 8800  | 5.8278        |
+| 0.1470 | 8900  | 5.8892        |
+| 0.1487 | 9000  | 5.7437        |
+| 0.1503 | 9100  | 5.8069        |
+| 0.1520 | 9200  | 6.0235        |
+| 0.1536 | 9300  | 5.7214        |
+| 0.1553 | 9400  | 5.7893        |
+| 0.1569 | 9500  | 5.7406        |
+| 0.1586 | 9600  | 5.8035        |
+| 0.1602 | 9700  | 5.7965        |
+| 0.1619 | 9800  | 5.638         |
+| 0.1636 | 9900  | 5.8263        |
+| 0.1652 | 10000 | 5.7995        |
+| 0.1669 | 10100 | 5.5805        |
+| 0.1685 | 10200 | 5.632         |
+| 0.1702 | 10300 | 5.6944        |
+| 0.1718 | 10400 | 5.5818        |
+| 0.1735 | 10500 | 5.8598        |
+| 0.1751 | 10600 | 5.7255        |
+| 0.1768 | 10700 | 5.7536        |
+| 0.1784 | 10800 | 5.6536        |
+| 0.1801 | 10900 | 5.6417        |
+| 0.1817 | 11000 | 5.6719        |
+| 0.1834 | 11100 | 5.566         |
+| 0.1850 | 11200 | 5.4893        |
+| 0.1867 | 11300 | 5.7412        |
+| 0.1883 | 11400 | 5.6838        |
+| 0.1900 | 11500 | 5.6272        |
+| 0.1916 | 11600 | 5.6538        |
+| 0.1933 | 11700 | 5.7176        |
+| 0.1949 | 11800 | 5.4923        |
+| 0.1966 | 11900 | 5.7643        |
+| 0.1982 | 12000 | 5.5674        |
+</details>
+### Framework Versions
+- Python: 3.8.10
+- Sentence Transformers: 3.1.1
+- Transformers: 4.45.2
+- PyTorch: 2.4.1+cu118
+- Accelerate: 1.0.1
+- Datasets: 3.0.1
+- Tokenizers: 0.20.3
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+#### CoSENTLoss
+```bibtex
+@online{kexuefm-8847,
+    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
+    author={Su Jianlin},
+    year={2022},
+    month={Jan},
+    url={https://kexue.fm/archives/8847},
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
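The card names `CoSENTLoss` as the training objective but does not record its parameters. A minimal NumPy sketch of the CoSENT objective from the cited post, assuming each training example is a sentence pair with a float similarity label and assuming the Sentence Transformers default `scale=20.0`: for any two pairs i and j where pair i has the lower gold label, the loss penalizes pair i receiving a higher cosine score than pair j. The labels and scores below are hypothetical.

```python
import numpy as np


def cosent_loss(cos_scores, labels, scale: float = 20.0) -> float:
    """CoSENT loss over a batch of sentence pairs.

    cos_scores: cosine similarity of each pair's two embeddings, shape (n,)
    labels:     gold similarity score of each pair, shape (n,)
    Computes log(1 + sum over (i, j) with labels[i] < labels[j]
    of exp(scale * (cos_scores[i] - cos_scores[j]))).
    """
    s = scale * np.asarray(cos_scores, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    diff = s[:, None] - s[None, :]   # diff[i, j] = s_i - s_j
    lower = y[:, None] < y[None, :]  # True where pair i should score below pair j
    terms = diff[lower]
    # The leading 0.0 supplies the "1 +" term as exp(0); logaddexp.reduce stays numerically stable
    return float(np.logaddexp.reduce(np.concatenate(([0.0], terms))))


labels = [1.0, 0.5, 0.0]  # hypothetical gold scores for three pairs
print(cosent_loss([0.9, 0.5, 0.1], labels))  # ordered like the labels: close to 0
print(cosent_loss([0.1, 0.5, 0.9], labels))  # ordering reversed: just over 16
```

Only the relative order of the cosine scores matters, not their absolute values, which is why the objective suits graded pair scores.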

checkpoint-12000/config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 6,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.45.2",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
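The architecture fields in this config are enough to sanity-check the size of `model.safetensors`. A back-of-the-envelope sketch, assuming the file stores the standard `BertModel` tensors (embeddings, 6 encoder layers, and the pooler head, which mean pooling never uses but which is assumed to be saved with the model) in float32:

```python
# Architecture fields copied from checkpoint-12000/config.json
config = {
    "vocab_size": 30522,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "hidden_size": 384,
    "intermediate_size": 1536,
    "num_hidden_layers": 6,
}


def dense(n_in: int, n_out: int) -> int:
    """Parameters of a linear layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def bert_param_count(cfg: dict, include_pooler: bool = True) -> int:
    h, inter = cfg["hidden_size"], cfg["intermediate_size"]
    # Word + position + token-type embeddings, then one LayerNorm (weight + bias)
    embeddings = (cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]) * h + 2 * h
    layer = (
        4 * dense(h, h)    # query, key, value, attention output projection
        + 2 * h            # attention LayerNorm
        + dense(h, inter)  # feed-forward up-projection
        + dense(inter, h)  # feed-forward down-projection
        + 2 * h            # output LayerNorm
    )
    pooler = dense(h, h) if include_pooler else 0  # assumption: pooler weights are saved
    return embeddings + cfg["num_hidden_layers"] * layer + pooler


n_params = bert_param_count(config)
weight_bytes = 4 * n_params  # "torch_dtype": "float32" -> 4 bytes per parameter
file_size = 90864192         # size from the model.safetensors LFS pointer
print(n_params, weight_bytes, file_size - weight_bytes)
# 22713216 90852864 11328
```

Under these assumptions the leftover 11,328 bytes would be the safetensors header (an 8-byte length prefix plus a JSON index of tensor names, dtypes, shapes, and offsets). The file itself was not inspected to confirm this breakdown.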

checkpoint-12000/config_sentence_transformers.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "__version__": {
+    "sentence_transformers": "3.1.1",
+    "transformers": "4.45.2",
+    "pytorch": "2.4.1+cu118"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": null
+}
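`similarity_fn_name` is `null` here. As we understand Sentence Transformers 3.x, a model without an explicit setting falls back to cosine similarity, which agrees with the card's "Similarity Function: Cosine Similarity". Because the module pipeline ends in `Normalize()`, every embedding has unit length, so cosine similarity reduces to a plain dot product. A small NumPy sketch of the 3 x 3 cosine matrix from the README's usage example, using random vectors in place of real embeddings:

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


rng = np.random.default_rng(42)
raw = rng.normal(size=(3, 384))  # stand-ins for 3 sentence embeddings
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)  # what Normalize() would emit

sims = cosine_similarity_matrix(raw, raw)
print(sims.shape)  # (3, 3)
# For unit-length vectors, cosine similarity equals the dot product
print(np.allclose(sims, unit @ unit.T))  # True
```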

checkpoint-12000/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8b0c57366e8a9e8c6d536d884c46e4fc633c9356b2a29258b8150cc644216fa7
+size 90864192

checkpoint-12000/modules.json ADDED

	@@ -0,0 +1,20 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]

checkpoint-12000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5cc32fc6f53198c1b4acce6990c94eb2322aa59ab7985258fc7936017ac9e9d3
+size 180607738

checkpoint-12000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1926ebcdfaf23baf4dfef40033f305967e09af3c0f186548c25288fe0ba29aa2
+size 14244

checkpoint-12000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:80789db6252741c5490a617f5c893b2ced4e139f2e1af5b50b1b6515832d154b
+size 1064
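`scheduler.pt` holds the learning-rate scheduler state at step 12000. The card configures `lr_scheduler_type: linear`, `warmup_ratio: 0.1`, and `learning_rate: 2e-05` for one epoch. A minimal sketch of that warmup-then-decay curve, assuming 60,531 total optimizer steps: that figure is inferred rather than stated, from `trainer_state.json` (global step 12000 at epoch 0.19824552708529514), and it agrees with ceil(7,747,936 / 128) from the card's `dataset_size` and batch size. The sketch also assumes the warmup length is rounded up, giving ceil(0.1 × 60,531) = 6,054 steps.

```python
import math

LEARNING_RATE = 2e-05  # learning_rate from the card
WARMUP_RATIO = 0.1     # warmup_ratio from the card
# Assumption: inferred from trainer_state.json (12000 steps at epoch 0.19824552708529514)
TOTAL_STEPS = 60531
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 6054


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to LEARNING_RATE, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (100, 6054, 12000, 60531):
    print(step, linear_lr(step))
```

At global step 100 this curve gives about 3.30e-07, while `trainer_state.json` logs 3.2045e-07, which is the curve's value at step 97; at global step 6100 the logged value matches step 6094. The logged schedule therefore seems to trail the global step by a few steps. One plausible cause, not verified here, is optimizer steps skipped by fp16 loss scaling when gradients overflow.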

checkpoint-12000/sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "max_seq_length": 256,
+  "do_lower_case": false
+}

checkpoint-12000/special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-12000/tokenizer.json ADDED

The diff for this file is too large to render.

checkpoint-12000/tokenizer_config.json ADDED

	@@ -0,0 +1,64 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "mask_token": "[MASK]",
+  "max_length": 128,
+  "model_max_length": 256,
+  "never_split": null,
+  "pad_to_multiple_of": null,
+  "pad_token": "[PAD]",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "[SEP]",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}

checkpoint-12000/trainer_state.json ADDED

	@@ -0,0 +1,873 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.19824552708529514,
+  "eval_steps": 200000,
+  "global_step": 12000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0016520460590441263,
+      "grad_norm": 51.92022705078125,
+      "learning_rate": 3.2044928972580115e-07,
+      "loss": 13.3171,
+      "step": 100
+    },
+    {
+      "epoch": 0.0033040921180882525,
+      "grad_norm": 68.25243377685547,
+      "learning_rate": 6.508093822266271e-07,
+      "loss": 12.9799,
+      "step": 200
+    },
+    {
+      "epoch": 0.004956138177132379,
+      "grad_norm": 69.47785186767578,
+      "learning_rate": 9.811694747274531e-07,
+      "loss": 12.5133,
+      "step": 300
+    },
+    {
+      "epoch": 0.006608184236176505,
+      "grad_norm": 73.07315063476562,
+      "learning_rate": 1.311529567228279e-06,
+      "loss": 11.9388,
+      "step": 400
+    },
+    {
+      "epoch": 0.008260230295220631,
+      "grad_norm": 82.68733215332031,
+      "learning_rate": 1.6418896597291048e-06,
+      "loss": 11.0616,
+      "step": 500
+    },
+    {
+      "epoch": 0.009912276354264758,
+      "grad_norm": 57.61735534667969,
+      "learning_rate": 1.972249752229931e-06,
+      "loss": 10.2712,
+      "step": 600
+    },
+    {
+      "epoch": 0.011564322413308884,
+      "grad_norm": 44.42943572998047,
+      "learning_rate": 2.302609844730757e-06,
+      "loss": 9.5253,
+      "step": 700
+    },
+    {
+      "epoch": 0.01321636847235301,
+      "grad_norm": 27.03646469116211,
+      "learning_rate": 2.6329699372315828e-06,
+      "loss": 8.7706,
+      "step": 800
+    },
+    {
+      "epoch": 0.014868414531397135,
+      "grad_norm": 15.231706619262695,
+      "learning_rate": 2.9633300297324087e-06,
+      "loss": 8.4333,
+      "step": 900
+    },
+    {
+      "epoch": 0.016520460590441263,
+      "grad_norm": 14.189949035644531,
+      "learning_rate": 3.2936901222332346e-06,
+      "loss": 8.0902,
+      "step": 1000
+    },
+    {
+      "epoch": 0.018172506649485387,
+      "grad_norm": 12.241333961486816,
+      "learning_rate": 3.6240502147340605e-06,
+      "loss": 7.8862,
+      "step": 1100
+    },
+    {
+      "epoch": 0.019824552708529515,
+      "grad_norm": 11.400131225585938,
+      "learning_rate": 3.9544103072348865e-06,
+      "loss": 7.7362,
+      "step": 1200
+    },
+    {
+      "epoch": 0.02147659876757364,
+      "grad_norm": 12.072014808654785,
+      "learning_rate": 4.284770399735712e-06,
+      "loss": 7.6007,
+      "step": 1300
+    },
+    {
+      "epoch": 0.023128644826617768,
+      "grad_norm": 11.08774185180664,
+      "learning_rate": 4.615130492236538e-06,
+      "loss": 7.5304,
+      "step": 1400
+    },
+    {
+      "epoch": 0.024780690885661892,
+      "grad_norm": 13.02505874633789,
+      "learning_rate": 4.945490584737364e-06,
+      "loss": 7.4249,
+      "step": 1500
+    },
+    {
+      "epoch": 0.02643273694470602,
+      "grad_norm": 13.522186279296875,
+      "learning_rate": 5.27585067723819e-06,
+      "loss": 7.3035,
+      "step": 1600
+    },
+    {
+      "epoch": 0.028084783003750145,
+      "grad_norm": 45.22550964355469,
+      "learning_rate": 5.606210769739015e-06,
+      "loss": 7.2026,
+      "step": 1700
+    },
+    {
+      "epoch": 0.02973682906279427,
+      "grad_norm": 15.62098503112793,
+      "learning_rate": 5.936570862239842e-06,
+      "loss": 7.1572,
+      "step": 1800
+    },
+    {
+      "epoch": 0.0313888751218384,
+      "grad_norm": 16.570518493652344,
+      "learning_rate": 6.266930954740668e-06,
+      "loss": 7.0523,
+      "step": 1900
+    },
+    {
+      "epoch": 0.033040921180882525,
+      "grad_norm": 16.82353401184082,
+      "learning_rate": 6.597291047241494e-06,
+      "loss": 7.1158,
+      "step": 2000
+    },
+    {
+      "epoch": 0.034692967239926646,
+      "grad_norm": 17.38075828552246,
+      "learning_rate": 6.924347538817311e-06,
+      "loss": 6.9856,
+      "step": 2100
+    },
+    {
+      "epoch": 0.036345013298970774,
+      "grad_norm": 93.04572296142578,
+      "learning_rate": 7.2547076313181375e-06,
+      "loss": 7.0865,
+      "step": 2200
+    },
+    {
+      "epoch": 0.0379970593580149,
+      "grad_norm": 17.861074447631836,
+      "learning_rate": 7.585067723818963e-06,
+      "loss": 6.9496,
+      "step": 2300
+    },
+    {
+      "epoch": 0.03964910541705903,
+      "grad_norm": 19.067747116088867,
+      "learning_rate": 7.91542781631979e-06,
+      "loss": 6.9294,
+      "step": 2400
+    },
+    {
+      "epoch": 0.04130115147610315,
+      "grad_norm": 16.43912696838379,
+      "learning_rate": 8.245787908820615e-06,
+      "loss": 6.8825,
+      "step": 2500
+    },
+    {
+      "epoch": 0.04295319753514728,
+      "grad_norm": 140.5387725830078,
+      "learning_rate": 8.576148001321441e-06,
+      "loss": 6.8218,
+      "step": 2600
+    },
+    {
+      "epoch": 0.04460524359419141,
+      "grad_norm": 22.34341049194336,
+      "learning_rate": 8.903204492897258e-06,
+      "loss": 6.8416,
+      "step": 2700
+    },
+    {
+      "epoch": 0.046257289653235535,
+      "grad_norm": 16.260499954223633,
+      "learning_rate": 9.233564585398084e-06,
+      "loss": 6.7184,
+      "step": 2800
+    },
+    {
+      "epoch": 0.047909335712279656,
+      "grad_norm": 20.075071334838867,
+      "learning_rate": 9.56392467789891e-06,
+      "loss": 6.9183,
+      "step": 2900
+    },
+    {
+      "epoch": 0.049561381771323784,
+      "grad_norm": 45.1911735534668,
+      "learning_rate": 9.894284770399738e-06,
+      "loss": 6.7166,
+      "step": 3000
+    },
+    {
+      "epoch": 0.05121342783036791,
+      "grad_norm": 67.39335632324219,
+      "learning_rate": 1.0224644862900564e-05,
+      "loss": 6.6821,
+      "step": 3100
+    },
+    {
+      "epoch": 0.05286547388941204,
+      "grad_norm": 80.69914245605469,
+      "learning_rate": 1.055500495540139e-05,
+      "loss": 6.6074,
+      "step": 3200
+    },
+    {
+      "epoch": 0.05451751994845616,
+      "grad_norm": 37.51483917236328,
+      "learning_rate": 1.0885365047902214e-05,
+      "loss": 6.6141,
+      "step": 3300
+    },
+    {
+      "epoch": 0.05616956600750029,
+      "grad_norm": 127.3297348022461,
+      "learning_rate": 1.121572514040304e-05,
+      "loss": 6.5374,
+      "step": 3400
+    },
+    {
+      "epoch": 0.05782161206654442,
+      "grad_norm": 20.704940795898438,
+      "learning_rate": 1.1546085232903866e-05,
+      "loss": 6.4776,
+      "step": 3500
+    },
+    {
+      "epoch": 0.05947365812558854,
+      "grad_norm": 23.68699836730957,
+      "learning_rate": 1.1876445325404693e-05,
+      "loss": 6.5701,
+      "step": 3600
+    },
+    {
+      "epoch": 0.061125704184632666,
+      "grad_norm": 104.68245697021484,
+      "learning_rate": 1.2206805417905519e-05,
+      "loss": 6.5026,
+      "step": 3700
+    },
+    {
+      "epoch": 0.0627777502436768,
+      "grad_norm": 97.47430419921875,
+      "learning_rate": 1.2537165510406343e-05,
+      "loss": 6.6502,
+      "step": 3800
+    },
+    {
+      "epoch": 0.06442979630272092,
+      "grad_norm": 21.512229919433594,
+      "learning_rate": 1.286752560290717e-05,
+      "loss": 6.5023,
+      "step": 3900
+    },
+    {
+      "epoch": 0.06608184236176505,
+      "grad_norm": 31.69252586364746,
+      "learning_rate": 1.3197885695407995e-05,
+      "loss": 6.5526,
+      "step": 4000
+    },
+    {
+      "epoch": 0.06773388842080917,
+      "grad_norm": 22.141067504882812,
+      "learning_rate": 1.3528245787908823e-05,
+      "loss": 6.6594,
+      "step": 4100
+    },
+    {
+      "epoch": 0.06938593447985329,
+      "grad_norm": 23.37205696105957,
+      "learning_rate": 1.3858605880409649e-05,
+      "loss": 6.3643,
+      "step": 4200
+    },
+    {
+      "epoch": 0.07103798053889743,
+      "grad_norm": 23.31827163696289,
+      "learning_rate": 1.4188965972910473e-05,
+      "loss": 6.3783,
+      "step": 4300
+    },
+    {
+      "epoch": 0.07269002659794155,
+      "grad_norm": 27.043312072753906,
+      "learning_rate": 1.4519326065411299e-05,
+      "loss": 6.3222,
+      "step": 4400
+    },
+    {
+      "epoch": 0.07434207265698568,
+      "grad_norm": 25.699583053588867,
+      "learning_rate": 1.4846382556987117e-05,
+      "loss": 6.3401,
+      "step": 4500
+    },
+    {
+      "epoch": 0.0759941187160298,
+      "grad_norm": 24.91438865661621,
+      "learning_rate": 1.5176742649487943e-05,
+      "loss": 6.4005,
+      "step": 4600
+    },
+    {
+      "epoch": 0.07764616477507393,
+      "grad_norm": 38.77157974243164,
+      "learning_rate": 1.5507102741988768e-05,
+      "loss": 6.3605,
+      "step": 4700
+    },
+    {
+      "epoch": 0.07929821083411806,
+      "grad_norm": 156.87989807128906,
+      "learning_rate": 1.5837462834489594e-05,
+      "loss": 6.348,
+      "step": 4800
+    },
+    {
+      "epoch": 0.08095025689316218,
+      "grad_norm": 110.80547332763672,
+      "learning_rate": 1.6167822926990423e-05,
+      "loss": 6.3406,
+      "step": 4900
+    },
+    {
+      "epoch": 0.0826023029522063,
+      "grad_norm": 48.55455780029297,
+      "learning_rate": 1.649818301949125e-05,
+      "loss": 6.4156,
+      "step": 5000
+    },
+    {
+      "epoch": 0.08425434901125044,
+      "grad_norm": 25.825349807739258,
+      "learning_rate": 1.682854311199207e-05,
+      "loss": 6.3786,
+      "step": 5100
+    },
+    {
+      "epoch": 0.08590639507029456,
+      "grad_norm": 55.6208381652832,
+      "learning_rate": 1.7158903204492897e-05,
+      "loss": 6.376,
+      "step": 5200
+    },
+    {
+      "epoch": 0.08755844112933868,
+      "grad_norm": 37.82964324951172,
+      "learning_rate": 1.7489263296993723e-05,
+      "loss": 6.2363,
+      "step": 5300
+    },
+    {
+      "epoch": 0.08921048718838281,
+      "grad_norm": 32.86615753173828,
+      "learning_rate": 1.7819623389494553e-05,
+      "loss": 6.2185,
+      "step": 5400
+    },
+    {
+      "epoch": 0.09086253324742694,
+      "grad_norm": 180.8863525390625,
+      "learning_rate": 1.814998348199538e-05,
+      "loss": 6.2554,
+      "step": 5500
+    },
+    {
+      "epoch": 0.09251457930647107,
+      "grad_norm": 25.11360740661621,
+      "learning_rate": 1.84803435744962e-05,
+      "loss": 6.2177,
+      "step": 5600
+    },
+    {
+      "epoch": 0.09416662536551519,
+      "grad_norm": 23.702716827392578,
+      "learning_rate": 1.8810703666997027e-05,
+      "loss": 6.3924,
+      "step": 5700
+    },
+    {
+      "epoch": 0.09581867142455931,
+      "grad_norm": 32.1275634765625,
+      "learning_rate": 1.9141063759497853e-05,
+      "loss": 6.2897,
+      "step": 5800
+    },
+    {
+      "epoch": 0.09747071748360345,
+      "grad_norm": 46.22661590576172,
+      "learning_rate": 1.9471423851998682e-05,
+      "loss": 6.272,
+      "step": 5900
+    },
+    {
+      "epoch": 0.09912276354264757,
+      "grad_norm": 74.11865234375,
+      "learning_rate": 1.9801783944499505e-05,
+      "loss": 6.0247,
+      "step": 6000
+    },
+    {
+      "epoch": 0.10077480960169169,
+      "grad_norm": 34.50657653808594,
+      "learning_rate": 1.9985314903537273e-05,
+      "loss": 6.194,
+      "step": 6100
+    },
+    {
+      "epoch": 0.10242685566073582,
+      "grad_norm": 25.600902557373047,
+      "learning_rate": 1.9948602162380454e-05,
+      "loss": 6.2757,
+      "step": 6200
+    },
+    {
+      "epoch": 0.10407890171977995,
+      "grad_norm": 24.53876495361328,
+      "learning_rate": 1.9911889421223638e-05,
+      "loss": 6.2408,
+      "step": 6300
+    },
+    {
+      "epoch": 0.10573094777882408,
+      "grad_norm": 22.572052001953125,
+      "learning_rate": 1.987517668006682e-05,
+      "loss": 6.253,
+      "step": 6400
+    },
+    {
+      "epoch": 0.1073829938378682,
+      "grad_norm": 33.04438018798828,
+      "learning_rate": 1.983846393891e-05,
+      "loss": 6.0605,
+      "step": 6500
+    },
+    {
+      "epoch": 0.10903503989691232,
+      "grad_norm": 81.35254669189453,
+      "learning_rate": 1.9801751197753184e-05,
+      "loss": 6.0672,
+      "step": 6600
+    },
+    {
+      "epoch": 0.11068708595595646,
+      "grad_norm": 31.132247924804688,
+      "learning_rate": 1.9765038456596365e-05,
+      "loss": 6.0414,
+      "step": 6700
+    },
+    {
+      "epoch": 0.11233913201500058,
+      "grad_norm": 42.16621017456055,
+      "learning_rate": 1.9728325715439546e-05,
+      "loss": 6.0823,
+      "step": 6800
+    },
+    {
+      "epoch": 0.1139911780740447,
+      "grad_norm": 23.558713912963867,
+      "learning_rate": 1.9691612974282726e-05,
+      "loss": 6.1962,
+      "step": 6900
+    },
+    {
+      "epoch": 0.11564322413308883,
+      "grad_norm": 69.28414154052734,
+      "learning_rate": 1.9654900233125907e-05,
+      "loss": 6.0868,
+      "step": 7000
+    },
+    {
+      "epoch": 0.11729527019213296,
+      "grad_norm": 29.037137985229492,
+      "learning_rate": 1.9618187491969088e-05,
+      "loss": 6.0795,
+      "step": 7100
+    },
+    {
+      "epoch": 0.11894731625117708,
+      "grad_norm": 29.588781356811523,
+      "learning_rate": 1.9581474750812272e-05,
+      "loss": 5.9656,
+      "step": 7200
+    },
+    {
+      "epoch": 0.12059936231022121,
+      "grad_norm": 29.574968338012695,
+      "learning_rate": 1.9544762009655453e-05,
+      "loss": 5.9785,
+      "step": 7300
+    },
+    {
+      "epoch": 0.12225140836926533,
+      "grad_norm": 46.092193603515625,
+      "learning_rate": 1.9508049268498634e-05,
+      "loss": 6.0722,
+      "step": 7400
+    },
+    {
+      "epoch": 0.12390345442830947,
+      "grad_norm": 23.927968978881836,
+      "learning_rate": 1.9471336527341815e-05,
+      "loss": 5.9443,
+      "step": 7500
+    },
+    {
+      "epoch": 0.1255555004873536,
+      "grad_norm": 21.281776428222656,
+      "learning_rate": 1.9434623786184995e-05,
+      "loss": 5.8786,
+      "step": 7600
+    },
+    {
+      "epoch": 0.1272075465463977,
+      "grad_norm": 27.455034255981445,
+      "learning_rate": 1.939791104502818e-05,
+      "loss": 5.8007,
+      "step": 7700
+    },
+    {
+      "epoch": 0.12885959260544183,
+      "grad_norm": 33.76934814453125,
+      "learning_rate": 1.936119830387136e-05,
+      "loss": 5.9206,
+      "step": 7800
+    },
+    {
+      "epoch": 0.13051163866448598,
+      "grad_norm": 21.891183853149414,
+      "learning_rate": 1.932448556271454e-05,
+      "loss": 5.918,
+      "step": 7900
+    },
+    {
+      "epoch": 0.1321636847235301,
+      "grad_norm": 61.087398529052734,
+      "learning_rate": 1.9287772821557725e-05,
+      "loss": 5.9443,
+      "step": 8000
+    },
+    {
+      "epoch": 0.13381573078257422,
+      "grad_norm": 23.860267639160156,
+      "learning_rate": 1.9251060080400906e-05,
+      "loss": 5.8764,
+      "step": 8100
+    },
+    {
+      "epoch": 0.13546777684161834,
+      "grad_norm": 26.501821517944336,
+      "learning_rate": 1.9214714466655654e-05,
+      "loss": 5.867,
+      "step": 8200
+    },
+    {
+      "epoch": 0.13711982290066246,
+      "grad_norm": 43.38287353515625,
+      "learning_rate": 1.9178001725498835e-05,
+      "loss": 5.8087,
+      "step": 8300
+    },
+    {
+      "epoch": 0.13877186895970658,
+      "grad_norm": 73.06561279296875,
+      "learning_rate": 1.9141288984342016e-05,
+      "loss": 5.9884,
+      "step": 8400
+    },
+    {
+      "epoch": 0.14042391501875073,
+      "grad_norm": 36.368717193603516,
+      "learning_rate": 1.91045762431852e-05,
+      "loss": 5.8741,
+      "step": 8500
+    },
+    {
+      "epoch": 0.14207596107779485,
+      "grad_norm": 136.38865661621094,
+      "learning_rate": 1.906786350202838e-05,
+      "loss": 5.9699,
+      "step": 8600
+    },
+    {
+      "epoch": 0.14372800713683898,
+      "grad_norm": 38.05315017700195,
+      "learning_rate": 1.903115076087156e-05,
+      "loss": 5.8671,
+      "step": 8700
+    },
+    {
+      "epoch": 0.1453800531958831,
+      "grad_norm": 39.74106216430664,
+      "learning_rate": 1.8994438019714742e-05,
+      "loss": 5.8278,
+      "step": 8800
+    },
+    {
+      "epoch": 0.14703209925492722,
+      "grad_norm": 31.016155242919922,
+      "learning_rate": 1.8957725278557926e-05,
+      "loss": 5.8892,
+      "step": 8900
+    },
+    {
+      "epoch": 0.14868414531397137,
+      "grad_norm": 36.37879943847656,
+      "learning_rate": 1.8921012537401107e-05,
+      "loss": 5.7437,
+      "step": 9000
+    },
+    {
+      "epoch": 0.1503361913730155,
+      "grad_norm": 31.93881607055664,
+      "learning_rate": 1.8884299796244288e-05,
+      "loss": 5.8069,
+      "step": 9100
+    },
+    {
+      "epoch": 0.1519882374320596,
+      "grad_norm": 24.248807907104492,
+      "learning_rate": 1.8847587055087472e-05,
+      "loss": 6.0235,
+      "step": 9200
+    },
+    {
+      "epoch": 0.15364028349110373,
+      "grad_norm": 29.67982292175293,
+      "learning_rate": 1.8810874313930653e-05,
+      "loss": 5.7214,
+      "step": 9300
+    },
+    {
+      "epoch": 0.15529232955014785,
+      "grad_norm": 34.80620193481445,
+      "learning_rate": 1.8774161572773834e-05,
+      "loss": 5.7893,
+      "step": 9400
+    },
+    {
+      "epoch": 0.15694437560919197,
+      "grad_norm": 31.375019073486328,
+      "learning_rate": 1.8737448831617015e-05,
+      "loss": 5.7406,
+      "step": 9500
+    },
+    {
+      "epoch": 0.15859642166823612,
+      "grad_norm": 24.126588821411133,
+      "learning_rate": 1.8700736090460195e-05,
+      "loss": 5.8035,
+      "step": 9600
+    },
+    {
+      "epoch": 0.16024846772728024,
+      "grad_norm": 94.3121337890625,
+      "learning_rate": 1.8664023349303376e-05,
+      "loss": 5.7965,
+      "step": 9700
+    },
+    {
+      "epoch": 0.16190051378632436,
+      "grad_norm": 29.543697357177734,
+      "learning_rate": 1.8627310608146557e-05,
+      "loss": 5.638,
+      "step": 9800
+    },
+    {
+      "epoch": 0.16355255984536848,
+      "grad_norm": 27.004188537597656,
+      "learning_rate": 1.859059786698974e-05,
+      "loss": 5.8263,
+      "step": 9900
+    },
+    {
+      "epoch": 0.1652046059044126,
+      "grad_norm": 31.72929573059082,
+      "learning_rate": 1.8553885125832922e-05,
+      "loss": 5.7995,
+      "step": 10000
+    },
+    {
+      "epoch": 0.16685665196345675,
+      "grad_norm": 43.893436431884766,
+      "learning_rate": 1.8517172384676103e-05,
+      "loss": 5.5805,
+      "step": 10100
+    },
+    {
+      "epoch": 0.16850869802250087,
+      "grad_norm": 40.329349517822266,
+      "learning_rate": 1.8480459643519283e-05,
+      "loss": 5.632,
+      "step": 10200
+    },
+    {
+      "epoch": 0.170160744081545,
+      "grad_norm": 36.50722885131836,
+      "learning_rate": 1.8443746902362468e-05,
+      "loss": 5.6944,
+      "step": 10300
+    },
+    {
+      "epoch": 0.17181279014058912,
+      "grad_norm": 68.61418151855469,
+      "learning_rate": 1.840703416120565e-05,
+      "loss": 5.5818,
+      "step": 10400
+    },
+    {
+      "epoch": 0.17346483619963324,
+      "grad_norm": 38.758846282958984,
+      "learning_rate": 1.837032142004883e-05,
+      "loss": 5.8598,
+      "step": 10500
+    },
+    {
+      "epoch": 0.17511688225867736,
+      "grad_norm": 51.770931243896484,
+      "learning_rate": 1.8333975806303577e-05,
+      "loss": 5.7255,
+      "step": 10600
+    },
+    {
+      "epoch": 0.1767689283177215,
+      "grad_norm": 91.27816009521484,
+      "learning_rate": 1.8297263065146758e-05,
+      "loss": 5.7536,
+      "step": 10700
+    },
+    {
+      "epoch": 0.17842097437676563,
+      "grad_norm": 35.52999496459961,
+      "learning_rate": 1.8260550323989942e-05,
+      "loss": 5.6536,
+      "step": 10800
+    },
+    {
+      "epoch": 0.18007302043580975,
+      "grad_norm": 36.9012336730957,
+      "learning_rate": 1.8223837582833123e-05,
+      "loss": 5.6417,
+      "step": 10900
+    },
+    {
+      "epoch": 0.18172506649485387,
+      "grad_norm": 37.2264404296875,
+      "learning_rate": 1.8187124841676304e-05,
+      "loss": 5.6719,
+      "step": 11000
+    },
+    {
+      "epoch": 0.183377112553898,
+      "grad_norm": 31.076929092407227,
+      "learning_rate": 1.8150412100519488e-05,
+      "loss": 5.566,
+      "step": 11100
+    },
+    {
+      "epoch": 0.18502915861294214,
+      "grad_norm": 34.78733444213867,
+      "learning_rate": 1.811369935936267e-05,
+      "loss": 5.4893,
+      "step": 11200
+    },
+    {
+      "epoch": 0.18668120467198626,
+      "grad_norm": 68.41493225097656,
+      "learning_rate": 1.807698661820585e-05,
+      "loss": 5.7412,
+      "step": 11300
+    },
+    {
+      "epoch": 0.18833325073103038,
+      "grad_norm": 43.99595260620117,
+      "learning_rate": 1.804027387704903e-05,
+      "loss": 5.6838,
+      "step": 11400
+    },
+    {
+      "epoch": 0.1899852967900745,
+      "grad_norm": 30.06267547607422,
+      "learning_rate": 1.8003561135892215e-05,
+      "loss": 5.6272,
+      "step": 11500
+    },
+    {
+      "epoch": 0.19163734284911862,
+      "grad_norm": 38.978031158447266,
+      "learning_rate": 1.7966848394735395e-05,
+      "loss": 5.6538,
+      "step": 11600
+    },
+    {
+      "epoch": 0.19328938890816275,
+      "grad_norm": 34.604209899902344,
+      "learning_rate": 1.7930135653578576e-05,
+      "loss": 5.7176,
+      "step": 11700
+    },
+    {
+      "epoch": 0.1949414349672069,
+      "grad_norm": 39.66080856323242,
+      "learning_rate": 1.7893422912421757e-05,
+      "loss": 5.4923,
+      "step": 11800
+    },
+    {
+      "epoch": 0.19659348102625102,
+      "grad_norm": 39.9164924621582,
+      "learning_rate": 1.7856710171264938e-05,
+      "loss": 5.7643,
+      "step": 11900
+    },
+    {
+      "epoch": 0.19824552708529514,
+      "grad_norm": 62.23050308227539,
+      "learning_rate": 1.7819997430108122e-05,
+      "loss": 5.5674,
+      "step": 12000
+    }
+  ],
+  "logging_steps": 100,
+  "max_steps": 60531,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 128,
+  "trial_name": null,
+  "trial_params": null
+}
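Every `log_history` entry above has the same shape (`epoch`, `grad_norm`, `learning_rate`, `loss`, `step`), and one is written every 100 steps (`logging_steps`). The Python sketch below pulls the loss curve out of this layout. It assumes the `trainer_state.json` structure shown here, and the helper names are hypothetical. It also checks that, for this single-epoch run, `epoch` is just `step / max_steps` (for example, 12000 / 60531 ≈ 0.1982).

```python
import json
import math


def load_trainer_state(path):
    """Read a trainer_state.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def loss_series(state):
    """Return (step, loss) pairs from log_history, skipping entries without a training loss."""
    return [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]


def mean_loss(pairs, start, end):
    """Average training loss over logged steps in the closed range [start, end]."""
    vals = [loss for step, loss in pairs if start <= step <= end]
    return sum(vals) / len(vals) if vals else float("nan")


def epoch_matches_step(state, rel_tol=1e-9):
    """Check that every logged epoch equals step / max_steps (holds for a single-epoch run)."""
    return all(
        math.isclose(e["epoch"], e["step"] / state["max_steps"], rel_tol=rel_tol)
        for e in state["log_history"]
    )


# Two entries copied from the log above; use load_trainer_state(...) for the real file.
sample = {
    "max_steps": 60531,
    "log_history": [
        {"epoch": 0.08755844112933868, "loss": 6.2363, "step": 5300},
        {"epoch": 0.19824552708529514, "loss": 5.5674, "step": 12000},
    ],
}

pairs = loss_series(sample)
print(pairs)
print(mean_loss(pairs, 0, 12000))
print(epoch_matches_step(sample))
```

Averaging over a step window smooths the logged loss, which is useful here because individual entries swing noticeably (`grad_norm` spikes such as 180.9 at step 5500 sit next to much smaller values).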

checkpoint-12000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3f618e50477a22f00ca3d4aaa893ec5a09ecb95e73c1461193f00c15562c98dc
+size 5496

checkpoint-12000/vocab.txt ADDED

The diff for this file is too large to render.