j05hr3d committed
Commit 0bca5b7 · verified · 1 parent: 4c24c65

Training in progress, step 20
README.md CHANGED
@@ -1,63 +1,207 @@
  ---
- library_name: peft
- license: apache-2.0
  base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
  tags:
  - base_model:adapter:Qwen/Qwen3-Coder-30B-A3B-Instruct
  - lora
  - transformers
- pipeline_tag: text-generation
- model-index:
- - name: SFT-Qwen3-Coder-30B
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # SFT-Qwen3-Coder-30B

- This model is a fine-tuned version of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) on the None dataset.
- It achieves the following results on the evaluation set:
- - eval_loss: 0.9453
- - eval_runtime: 226.4538
- - eval_samples_per_second: 0.521
- - eval_steps_per_second: 0.066
- - epoch: 0.3347
- - step: 20

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 2
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 8
- - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.03
- - num_epochs: 5

  ### Framework versions

- - PEFT 0.18.0
- - Transformers 4.57.1
- - Pytorch 2.8.0+cu126
- - Datasets 4.4.1
- - Tokenizers 0.22.1
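The removed hyperparameter list above maps directly onto Hugging Face `TrainingArguments`. A minimal sketch of how those values might be set, assuming the standard Trainer/PEFT workflow; `output_dir` and anything not listed in the card are placeholders, not taken from this commit:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the removed hyperparameter list; only values shown
# in the card are grounded, everything else (output_dir, precision, ...) is assumed.
training_args = TrainingArguments(
    output_dir="SFT-Qwen3-Coder-30B",      # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,         # 2 * 4 = total train batch size of 8
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    optim="paged_adamw_8bit",              # OptimizerNames.PAGED_ADAMW_8BIT
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```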
 
  ---
  base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
+ library_name: peft
+ pipeline_tag: text-generation
  tags:
  - base_model:adapter:Qwen/Qwen3-Coder-30B-A3B-Instruct
  - lora
  - transformers
  ---

+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**

+ [More Information Needed]

+ **APA:**

+ [More Information Needed]

+ ## Glossary [optional]

+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

+ [More Information Needed]

+ ## More Information [optional]

+ [More Information Needed]

+ ## Model Card Authors [optional]

+ [More Information Needed]

+ ## Model Card Contact

+ [More Information Needed]
  ### Framework versions

+ - PEFT 0.18.0
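The new card leaves "How to Get Started with the Model" empty. A minimal sketch of loading the LoRA adapter on top of the base model with Transformers and PEFT; the adapter repo id `j05hr3d/SFT-Qwen3-Coder-30B` is inferred from the checkpoint path in `trainer_state.json`, not stated in the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
adapter_id = "j05hr3d/SFT-Qwen3-Coder-30B"   # assumption: taken from the trainer_state checkpoint path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)   # attaches the LoRA weights to the base model

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function that reverses a string."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```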
 
 
 
 
adapter_config.json CHANGED
@@ -29,13 +29,13 @@
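The hunk below only reorders `target_modules`; the same seven projections (attention `q_proj`, `k_proj`, `v_proj`, `o_proj` plus the MLP `gate_proj`, `up_proj`, `down_proj`) are targeted before and after. For reference, a hedged sketch of a `LoraConfig` that would produce such a list; `r`, `lora_alpha`, and `lora_dropout` are illustrative guesses, since the diff does not show them:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    r=16,               # illustrative only; not in the diff
    lora_alpha=32,      # illustrative only
    lora_dropout=0.05,  # illustrative only
)
```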
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
  "o_proj",
  "up_proj",
- "q_proj",
  "down_proj",
  "gate_proj",
- "k_proj",
- "v_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",

  "rank_pattern": {},
  "revision": null,
  "target_modules": [
+ "k_proj",
+ "v_proj",
  "o_proj",
  "up_proj",
  "down_proj",
  "gate_proj",
+ "q_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3efa4cfff55bdbff476378eb71016570aba5e1d5a0c19e9ec45133fa83440c48
  size 1693023512

  version https://git-lfs.github.com/spec/v1
+ oid sha256:663952417ae879e7c02ad0b8e655f7f266599fd1ccbf417bbc33309a052ad24a
  size 1693023512
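`adapter_model.safetensors` is tracked with Git LFS, so the commit only rewrites the pointer: the `oid` (the SHA-256 of the file contents) changes while the size stays at 1693023512 bytes. A small sketch for checking that a downloaded file matches the pointer; the path is a placeholder:

```python
import hashlib

h = hashlib.sha256()
with open("adapter_model.safetensors", "rb") as f:    # placeholder path
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)
print(h.hexdigest())  # should equal the oid recorded in the LFS pointer
```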
trainer_state.json CHANGED
@@ -1,39 +1,16 @@
  {
- "best_global_step": 20,
- "best_metric": 0.945307731628418,
- "best_model_checkpoint": "j05hr3d/SFT-Qwen3-Coder-30B/checkpoint-20",
- "epoch": 0.6694560669456067,
  "eval_steps": 20,
- "global_step": 40,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
- "log_history": [
- {
- "epoch": 0.33472803347280333,
- "grad_norm": 0.2103523313999176,
- "learning_rate": 9.656357388316152e-05,
- "loss": 0.9679,
- "step": 20
- },
- {
- "epoch": 0.33472803347280333,
- "eval_loss": 0.945307731628418,
- "eval_runtime": 226.4538,
- "eval_samples_per_second": 0.521,
- "eval_steps_per_second": 0.066,
- "step": 20
- },
- {
- "epoch": 0.6694560669456067,
- "grad_norm": 0.3507266640663147,
- "learning_rate": 8.969072164948454e-05,
- "loss": 0.7613,
- "step": 40
- }
- ],
  "logging_steps": 20,
- "max_steps": 300,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 5,
  "save_steps": 20,
@@ -52,14 +29,14 @@
  "should_epoch_stop": false,
  "should_evaluate": false,
  "should_log": false,
- "should_save": true,
  "should_training_stop": false
  },
  "attributes": {}
  }
  },
- "total_flos": 6.751091897371853e+16,
- "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
  }
 
  {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.044859813084112146,
  "eval_steps": 20,
+ "global_step": 3,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
+ "log_history": [],
  "logging_steps": 20,
+ "max_steps": 335,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 5,
  "save_steps": 20,

  "should_epoch_stop": false,
  "should_evaluate": false,
  "should_log": false,
+ "should_save": false,
  "should_training_stop": false
  },
  "attributes": {}
  }
  },
+ "total_flos": 0,
+ "train_batch_size": 1,
  "trial_name": null,
  "trial_params": null
  }
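`trainer_state.json` is the Trainer's progress record (`global_step`, `epoch`, best checkpoint, `log_history` of metrics); here it is reset for a fresh run that is at step 3 of 335. A small sketch for inspecting the state saved inside a checkpoint; the path is a placeholder:

```python
import json

with open("checkpoint-20/trainer_state.json") as f:   # placeholder checkpoint path
    state = json.load(f)

print(state["global_step"], state["epoch"], state.get("best_metric"))
for entry in state["log_history"]:                     # one dict per logging/eval event
    print(entry.get("step"), entry.get("loss"), entry.get("eval_loss"))
```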
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:80a0ac59c43b9cfde801404e86b71dd5742eec86add7b28d732198460c9a9d68
  size 5841

  version https://git-lfs.github.com/spec/v1
+ oid sha256:d3f2e7446a60a98d2b4312a6e3a63c7896570e6c2dac93d9d54c58d72a5cb707
  size 5841