---
license: other
library_name: peft
tags:
- llama-factory
- lora
- generated_from_trainer
base_model: /data1/model/llama2/meta-llama/Llama2-13b
model-index:
- name: elementary_math_qa_no_sys
  results: []
---

# elementary_math_qa_no_sys

This model is a fine-tuned version of [/data1/model/llama2/meta-llama/Llama2-13b](https://huggingface.co//data1/model/llama2/meta-llama/Llama2-13b) on the elementary_math_qa_no_sys dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0705

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto standard `transformers` training arguments follows the list):
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- total_train_batch_size: 24
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- num_epochs: 10.0
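For reference, here is a minimal sketch of how the hyperparameters above map onto a plain `transformers.TrainingArguments` object. Training was actually driven by LLaMA-Factory's own launcher, so the `output_dir` and the surrounding wiring are assumptions, not the exact command used:

```python
# Illustrative only: the values mirror the hyperparameter list above;
# LLaMA-Factory configures these through its own CLI rather than this object.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="elementary_math_qa_no_sys",  # assumed output path
    learning_rate=1e-4,
    per_device_train_batch_size=8,   # x 3 GPUs = total_train_batch_size 24
    per_device_eval_batch_size=8,    # x 3 GPUs = total_eval_batch_size 24
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 are the defaults
    lr_scheduler_type="cosine",
    warmup_steps=20,
    num_train_epochs=10.0,
)
```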
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.3438        | 0.05  | 50   | 0.3364          |
| 0.3021        | 0.09  | 100  | 0.3056          |
| 0.2676        | 0.14  | 150  | 0.2710          |
| 0.2679        | 0.18  | 200  | 0.2582          |
| 0.2448        | 0.23  | 250  | 0.2475          |
| 0.2355        | 0.28  | 300  | 0.2372          |
| 0.2335        | 0.32  | 350  | 0.2282          |
| 0.2226        | 0.37  | 400  | 0.2227          |
| 0.2098        | 0.42  | 450  | 0.2109          |
| 0.1839        | 0.46  | 500  | 0.2048          |
| 0.2008        | 0.51  | 550  | 0.1992          |
| 0.1945        | 0.55  | 600  | 0.2019          |
| 0.1891        | 0.6   | 650  | 0.1859          |
| 0.2015        | 0.65  | 700  | 0.1966          |
| 0.174         | 0.69  | 750  | 0.1801          |
| 0.1565        | 0.74  | 800  | 0.1762          |
| 0.1825        | 0.79  | 850  | 0.1717          |
| 0.1651        | 0.83  | 900  | 0.1682          |
| 0.1598        | 0.88  | 950  | 0.1598          |
| 0.1502        | 0.92  | 1000 | 0.1558          |
| 0.1599        | 0.97  | 1050 | 0.1465          |
| 0.0977        | 1.02  | 1100 | 0.1520          |
| 0.1166        | 1.06  | 1150 | 0.1403          |
| 0.0943        | 1.11  | 1200 | 0.1387          |
| 0.1007        | 1.16  | 1250 | 0.1311          |
| 0.1035        | 1.2   | 1300 | 0.1325          |
| 0.0842        | 1.25  | 1350 | 0.1309          |
| 0.1114        | 1.29  | 1400 | 0.1225          |
| 0.1047        | 1.34  | 1450 | 0.1184          |
| 0.0807        | 1.39  | 1500 | 0.1136          |
| 0.0846        | 1.43  | 1550 | 0.1200          |
| 0.0737        | 1.48  | 1600 | 0.1145          |
| 0.0844        | 1.52  | 1650 | 0.1037          |
| 0.0809        | 1.57  | 1700 | 0.0940          |
| 0.0718        | 1.62  | 1750 | 0.0931          |
| 0.0687        | 1.66  | 1800 | 0.0930          |
| 0.0629        | 1.71  | 1850 | 0.0969          |
| 0.0852        | 1.76  | 1900 | 0.0872          |
| 0.0622        | 1.8   | 1950 | 0.0849          |
| 0.0653        | 1.85  | 2000 | 0.0831          |
| 0.0507        | 1.89  | 2050 | 0.0829          |
| 0.0518        | 1.94  | 2100 | 0.0785          |
| 0.0566        | 1.99  | 2150 | 0.0750          |
| 0.0193        | 2.03  | 2200 | 0.0837          |
| 0.0233        | 2.08  | 2250 | 0.0766          |
| 0.0249        | 2.13  | 2300 | 0.0829          |
| 0.0217        | 2.17  | 2350 | 0.0824          |
| 0.0233        | 2.22  | 2400 | 0.0735          |
| 0.0192        | 2.26  | 2450 | 0.0767          |
| 0.0207        | 2.31  | 2500 | 0.0794          |
| 0.0232        | 2.36  | 2550 | 0.0843          |
| 0.0295        | 2.4   | 2600 | 0.0800          |
| 0.0185        | 2.45  | 2650 | 0.0777          |
| 0.0178        | 2.5   | 2700 | 0.0767          |
| 0.0245        | 2.54  | 2750 | 0.0717          |
| 0.0226        | 2.59  | 2800 | 0.0774          |
| 0.0222        | 2.63  | 2850 | 0.0671          |
| 0.0194        | 2.68  | 2900 | 0.0666          |
| 0.0162        | 2.73  | 2950 | 0.0713          |
| 0.0184        | 2.77  | 3000 | 0.0740          |
| 0.0227        | 2.82  | 3050 | 0.0675          |
| 0.0176        | 2.87  | 3100 | 0.0701          |
| 0.034         | 2.91  | 3150 | 0.0675          |
| 0.0148        | 2.96  | 3200 | 0.0688          |
| 0.014         | 3.0   | 3250 | 0.0673          |
| 0.0178        | 3.05  | 3300 | 0.0719          |
| 0.0059        | 3.1   | 3350 | 0.0734          |
| 0.0069        | 3.14  | 3400 | 0.0764          |
| 0.0074        | 3.19  | 3450 | 0.0818          |
| 0.009         | 3.23  | 3500 | 0.0705          |
| 0.0048        | 3.28  | 3550 | 0.0735          |
| 0.005         | 3.33  | 3600 | 0.0705          |
| 0.0073        | 3.37  | 3650 | 0.0724          |

### Framework versions

- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.2.1
- Datasets 2.18.0
- Tokenizers 0.15.2
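Since this repository contains a LoRA adapter rather than full model weights, it must be loaded on top of the base model. Below is a minimal sketch using the PEFT and Transformers versions listed above; the base-model path is the one from this card, while the adapter path (`elementary_math_qa_no_sys`), dtype, and device settings are assumptions to adapt to your environment:

```python
# Minimal inference sketch: load the base Llama2-13b checkpoint, then
# attach this LoRA adapter with PEFT and generate from a prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "/data1/model/llama2/meta-llama/Llama2-13b"  # as in the card
adapter_path = "elementary_math_qa_no_sys"                     # assumed adapter dir

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

prompt = "What is 12 + 35?"
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```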