Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


Explore_Llama-3.2-1B-Inst_v1.1 - AWQ
- Model creator: https://huggingface.co/DeepAutoAI/
- Original model: https://huggingface.co/DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1/

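AWQ checkpoints like this one typically load through the standard transformers flow once the `autoawq` package is installed. A minimal sketch (the repo id below is a placeholder, substitute this repository's actual id):

```python
# Minimal sketch: loading an AWQ-quantized checkpoint with transformers
# (requires the autoawq package). "your-org/your-awq-repo" is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/your-awq-repo"  # placeholder, substitute this repository's id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "Explain weight quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=48)[0], skip_special_tokens=True))
```
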
Original model description:
---
library_name: transformers
model-index:
- name: Explore_Llama-3.2-1B-Inst_v1.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 48.13
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 5.19
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 1.36
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2.35
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 4.05
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 3.05
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->
![Model Exploration](./d2nwg2.webp)



## Overview


**DeepAutoAI/Explore_Llama-3.2-1B-Inst** was developed by **deepAuto.ai** by learning the weight distribution of llama-3.2-1B-instruct.
Our approach leverages the base model's pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by
training a latent diffusion model on the pretrained weights. Specifically, this model is based on learning the distribution of the top two
feed-forward or attention layers, selected via spectrum-based optimal layer selection.


We directly transfer the weights of the best model on both Winogrande and ARC-Challenge for **DeepAutoAI/Explore_Llama-3.2-1B-Inst**.

This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training.

The work is currently in progress.
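
The card does not spell out the spectrum criterion. As a rough illustration, one plausible reading ranks candidate layers by a singular-value statistic of their weight matrices; the scoring function below (effective rank) is an assumption, not the authors' method:

```python
# Hedged sketch: rank 2-D weight matrices in attention/MLP blocks by effective
# rank (entropy of the normalized singular-value spectrum) and keep the top k.
# This is an illustrative stand-in for "spectrum-based optimal layer selection".
import torch

def effective_rank(weight: torch.Tensor) -> float:
    s = torch.linalg.svdvals(weight.float())
    p = s / s.sum()
    return torch.exp(-(p * torch.log(p + 1e-12)).sum()).item()

def select_top_layers(model: torch.nn.Module, k: int = 2) -> list[str]:
    scores = {
        name: effective_rank(param)
        for name, param in model.named_parameters()
        if param.ndim == 2 and ("mlp" in name or "self_attn" in name)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```
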
## Model Details


<!-- Provide a longer summary of what this model is. -->

We trained a diffusion model to learn the distribution of a subset of llama weights, enabling the generation of weights that improve performance.
We generate task-specific weights on Winogrande and ARC-Challenge, then transfer the best model for leaderboard benchmarking, as sketched below.
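
A schematic of that select-and-transfer step; `sample_weights`, `load_weights_into`, and `evaluate_on` are hypothetical helpers standing in for the diffusion sampler and a Winogrande/ARC-Challenge harness run:

```python
# Schematic sketch: sample candidate weight sets from the generator, score
# each on the target tasks, and keep the best. The three helpers called here
# are hypothetical placeholders, not a published API.
def pick_best_candidate(model, generator, n_candidates=16):
    best_score, best_weights = float("-inf"), None
    for _ in range(n_candidates):
        candidate = sample_weights(generator)      # draw weights from the diffusion model
        load_weights_into(model, candidate)        # patch them into the selected layers
        score = evaluate_on(model, ["winogrande", "arc_challenge"])
        if score > best_score:
            best_score, best_weights = score, candidate
    return best_weights, best_score
```
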

- **Developed by:** DeepAuto.ai
- **Funded by [optional]:** DeepAuto.ai
- **Shared by [optional]:** DeepAuto.ai
- **Model type:** llama-3.2-1B
- **Language(s) (NLP):** English
- **License:** Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
- **Finetuned from model [optional]:** No fine-tuning

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** Under construction
- **Paper [optional]:** To be announced


## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->


<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

The direct use case of our work is to improve existing model performance, as well as to generate task-specific weights with no training.


<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
Performance improvement of existing large models with limited compute.

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

No fine-tuning or architecture generalization.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Using a generative model to produce weights can potentially lead to unintended or undesirable outputs. However, the generated content will still fall within the range of what the base model is inherently capable of producing.

## How to Get Started with the Model
The work is in progress; in the meantime, the original checkpoint loads via the standard transformers flow, as in the sketch below.
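
A minimal usage sketch, assuming the standard Llama-3.2 chat template shipped with the tokenizer:

```python
# Minimal usage sketch for the original checkpoint via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Give one sentence on weight-space diffusion."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
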

## Training Details
We employed a latent diffusion process on pretrained model weights, unlocking the ability to generate diverse, previously unseen neural networks.
Remarkably, even within the constraints of one-shot learning, our approach consistently produces a wide range of weight variations, each offering
distinct performance characteristics. These generated weights not only open opportunities for weight averaging and model merging but also have the
potential to significantly enhance model performance. Moreover, they enable the creation of task-specific weights, tailored to optimize performance
for specialized applications.

### Training Data
The training data used to produce the current model are the base pretrained weights.

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->


### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- We selected a set of layers and combined their pretrained weights, then trained a Variational Autoencoder (VAE) to encode these weights into a latent dimension.
- We conditionally trained a diffusion model on this set of weights, allowing individual sampling of layer-specific weights.
- All selected layers were encoded into a 1024-dimensional space. This model exclusively contains the sampled weights for layer normalization (see the sketch after this list).
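
A minimal sketch of this recipe, with illustrative shapes and hyperparameters that are assumptions, not the authors' actual configuration:

```python
# Hedged sketch: a VAE mapping one layer's flattened weights to a 1024-d
# latent, plus a noise-prediction network for latent diffusion conditioned
# on the layer index. Dimensions and architecture are illustrative only.
import torch
import torch.nn as nn

LATENT_DIM = 1024

class WeightVAE(nn.Module):
    def __init__(self, weight_dim: int):
        super().__init__()
        self.encoder = nn.Linear(weight_dim, 2 * LATENT_DIM)  # mean and log-variance
        self.decoder = nn.Linear(LATENT_DIM, weight_dim)

    def forward(self, w: torch.Tensor):
        mu, logvar = self.encoder(w).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.decoder(z), mu, logvar

class LatentDenoiser(nn.Module):
    """Predicts the noise added to a latent at step t, conditioned on layer id."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.layer_embedding = nn.Embedding(num_layers, LATENT_DIM)
        self.net = nn.Sequential(
            nn.Linear(2 * LATENT_DIM + 1, 2048), nn.SiLU(), nn.Linear(2048, LATENT_DIM)
        )

    def forward(self, z_t, t, layer_id):
        cond = self.layer_embedding(layer_id)
        return self.net(torch.cat([z_t, cond, t[:, None].float()], dim=-1))

# One simplified DDPM-style training step in latent space.
# z0: VAE-encoded clean latents; alpha_bar: cumulative noise schedule at step t.
def diffusion_step(denoiser, z0, layer_id, alpha_bar, t):
    noise = torch.randn_like(z0)
    z_t = alpha_bar.sqrt()[:, None] * z0 + (1 - alpha_bar).sqrt()[:, None] * noise
    return nn.functional.mse_loss(denoiser(z_t, t, layer_id), noise)
```
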


<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->


## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics


<!-- This should link to a Dataset Card if possible. -->

We test our method on Winogrande, ARC-Challenge, and HellaSwag.

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->



Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA A100 40GB
- **Hours used:** 8 hours total (4 hours for the VAE, 4 hours for the diffusion process)
- **Compute Region:** South Korea
- **Carbon Emitted:** 0.96 kg CO2eq
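
As a rough consistency check (the per-GPU power draw is an assumption, not a value from this card): at an assumed average draw of 0.4 kW for a single A100, 8 h × 0.4 kW = 3.2 kWh, and 0.96 kg / 3.2 kWh = 0.3 kg CO2eq/kWh, a plausible grid intensity for the region.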

## Technical Specifications [optional]

### Model Architecture and Objective

We used latent diffusion for weight generation, with llama-3.2-1B as the target architecture.

The primary objective of this weight generation process was to demonstrate that by learning only the distribution
of a few layers' weights (normalization layers in this case) in a 1-billion-parameter model, it is possible to significantly enhance the
model's capabilities. Notably, this is achieved using a fraction of the computational resources and without the
need for fine-tuning, showcasing the efficiency and potential of this approach.

### Compute Infrastructure

NVIDIA A100 cluster

#### Hardware

A single NVIDIA A100

#### Software

The model was evaluated using the lm-evaluation-harness, version 0.4.3.
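
A sketch of such a run via the harness's Python entry point (0.4.x); task names follow the harness's registry:

```python
# Hedged sketch: evaluate with lm-evaluation-harness 0.4.x on the tasks used
# in this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1",
    tasks=["winogrande", "arc_challenge", "hellaswag"],
    batch_size=8,
)
print(results["results"])
```
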
## Model Card Contact

## References
[Diffusion-Based Neural Network Weights Generation](https://arxiv.org/abs/2402.18153)


# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_DeepAutoAI__Explore_Llama-3.2-1B-Inst_v1.1)

| Metric              | Value |
|---------------------|------:|
| Avg.                | 14.12 |
| IFEval (0-Shot)     | 58.44 |
| BBH (3-Shot)        |  8.82 |
| MATH Lvl 5 (4-Shot) |  6.04 |
| GPQA (0-shot)       |  1.68 |
| MuSR (0-shot)       |  0.66 |
| MMLU-PRO (5-shot)   |  9.09 |