Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


Explore_Llama-3.2-1B-Inst_v1.1 - AWQ
- Model creator: https://huggingface.co/DeepAutoAI/
- Original model: https://huggingface.co/DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1/


Original model description:
---
library_name: transformers
model-index:
- name: Explore_Llama-3.2-1B-Inst_v1.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 48.13
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 5.19
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 1.36
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2.35
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 4.05
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 3.05
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
      name: Open LLM Leaderboard
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---
# Model Card for Explore_Llama-3.2-1B-Inst_v1.1

<!-- Provide a quick summary of what the model is/does. -->

## Overview

**DeepAutoAI/Explore_Llama-3.2-1B-Inst** is developed by **deepAuto.ai** by learning the weight distribution of llama-3.2-1B-instruct.
Our approach leverages the base model's pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by
training a latent diffusion model on those pretrained weights. Specifically, the model learns the distribution of the top two layers, feed-forward
or attention, chosen by spectrum-based optimal layer selection.
We directly transfer the weights of the best-performing model on both Winogrande and ARC-Challenge to **DeepAutoAI/Explore_Llama-3.2-1B-Inst**.
This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training.
The work is currently in progress.
## Model Details

<!-- Provide a longer summary of what this model is. -->

We trained a diffusion model to learn the distribution of a subset of llama layers, enabling the generation of weights that improve performance.
We generated task-specific weights on Winogrande and ARC-Challenge, then transferred the best model for leaderboard benchmarking.

- **Developed by:** DeepAuto.ai
- **Funded by [optional]:** DeepAuto.ai
- **Shared by [optional]:** DeepAuto.ai
- **Model type:** llama-3.2-1B
- **Language(s) (NLP):** English
- **License:** Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
- **Finetuned from model [optional]:** No fine-tuning
### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** Under construction
- **Paper [optional]:** To be announced
## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

The direct use case of our work is to improve the performance of existing models, as well as to generate task-specific weights with no training.

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

Performance improvement of existing large models with limited compute.

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

No fine-tuning or architecture generalization.
## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Using a generative model to produce weights can potentially lead to unintended or undesirable outputs. However, the generated content
will still fall within the range of what the base model is inherently capable of producing.
## How to Get Started with the Model

The work is in progress.
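
Until official instructions are published, the AWQ checkpoint should load through `transformers` the way other AWQ quants do (with `autoawq` installed). A minimal sketch, where the repo id is an illustrative assumption, not a confirmed name:

```python
# Minimal sketch: load the AWQ quant with transformers.
# Requires: pip install transformers autoawq
# NOTE: the repo id below is an assumption -- substitute the repository
# that actually hosts these quantized weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RichardErkhov/DeepAutoAI_-_Explore_Llama-3.2-1B-Inst_v1.1-awq"  # assumed

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize AWQ quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```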
## Training Details

We employed a latent diffusion process on pretrained model weights, unlocking the ability to generate diverse, previously unseen neural networks.
Remarkably, even within the constraints of one-shot learning, our approach consistently produces a wide range of weight variations, each offering
distinct performance characteristics. These generated weights not only open opportunities for weight averaging and model merging but also have the
potential to significantly enhance model performance. Moreover, they enable the creation of task-specific weights, tailored to optimize performance
for specialized applications.
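
To make the weight-averaging opportunity concrete, here is a hypothetical "model soup"-style sketch. This is not the authors' code; `variants` stands in for state dicts sampled from the trained diffusion model:

```python
# Hypothetical sketch: uniform averaging of diffusion-sampled weight variants.
import torch

def average_state_dicts(state_dicts):
    """Uniformly average state dicts that share keys and shapes."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Assumed usage: `variants` are sampled from the trained diffusion model.
# merged = average_state_dicts(variants)
# model.load_state_dict(merged, strict=False)
```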
### Training Data

The training data used to produce the current model consists of the base model's pretrained weights.

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- We selected a set of layers and combined their pretrained weights, then trained a Variational Autoencoder (VAE) to encode these weights into the layer dimension.
- We conditionally trained a diffusion model on this set of weights, allowing individual sampling of layer-specific weights.
- All selected layers were encoded into a 1024-dimensional space. This model exclusively contains the sampled weights for layer normalization (see the sketch after this list).
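
A schematic sketch of that pipeline follows. The shapes, class names, and diffusion API are assumptions for illustration only; the actual method follows the Diffusion-Based Neural Network Weights Generation paper referenced below:

```python
# Schematic sketch of the weight-generation pipeline (details assumed).
import torch
import torch.nn as nn

LATENT_DIM = 1024  # per the card: selected layers are encoded into a 1024-d space

class WeightVAE(nn.Module):
    """Toy VAE that maps a flattened layer-weight vector to a latent and back."""
    def __init__(self, weight_dim: int):
        super().__init__()
        self.encoder = nn.Linear(weight_dim, 2 * LATENT_DIM)  # predicts mean and log-variance
        self.decoder = nn.Linear(LATENT_DIM, weight_dim)

    def encode(self, w: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.encoder(w).chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.decoder(z)

def sample_layer_weights(vae, diffusion, layer_id):
    """Sample weights for one layer: reverse-diffuse a latent, then decode."""
    z = torch.randn(1, LATENT_DIM)
    for t in reversed(range(diffusion.num_steps)):       # assumed attribute
        z = diffusion.denoise_step(z, t, cond=layer_id)  # assumed conditional API
    return vae.decode(z)
```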
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

<!-- This should link to a Dataset Card if possible. -->

We test our method on Winogrande, ARC-Challenge, and HellaSwag.
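
The Software section below reports lm-evaluation-harness 0.4.3; a representative invocation of its Python API for these three tasks might look as follows (the model id points at the original, unquantized repo):

```python
# Representative lm-evaluation-harness (v0.4.x) run.
# Install with: pip install lm-eval==0.4.3
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1",
    tasks=["winogrande", "arc_challenge", "hellaswag"],
    batch_size=8,
)
print(results["results"])
```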
#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]
## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** Nvidia A100 40GB
- **Hours used:** 8 (4 hours for the VAE and 4 hours for the diffusion model)
- **Compute Region:** South Korea
- **Carbon Emitted:** 0.96 kg CO2eq
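
As a sanity check (with assumed inputs, not figures from the authors): at an A100 power draw of roughly 0.4 kW and a grid intensity of about 0.3 kg CO2eq/kWh, 8 h x 0.4 kW x 0.3 kg/kWh = 0.96 kg, consistent with the reported number.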
## Technical Specifications [optional]

### Model Architecture and Objective

We used latent diffusion for weight generation, with llama-3.2-1B as the target architecture.
The primary objective of this weight-generation process was to demonstrate that by learning only the distribution
of a few layers' weights (normalization layers in this case) in a 1-billion-parameter model, it is possible to significantly enhance the
model's capabilities. Notably, this is achieved using a fraction of the computational resources and without the
need for fine-tuning, showcasing the efficiency and potential of this approach.
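
To make "generate only the normalization-layer weights, keep everything else frozen" concrete, here is a hedged sketch of transplanting sampled weights into the base model. The parameter-name filter uses the standard transformers Llama layout; `sample_layer_weights`, `vae`, and `diffusion` are the assumed objects from the training sketch above:

```python
# Hedged sketch: replace only normalization-layer weights with sampled ones.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

with torch.no_grad():
    for name, param in model.named_parameters():
        # RMSNorm parameters in the standard transformers Llama layout:
        if ("input_layernorm" in name or "post_attention_layernorm" in name
                or name == "model.norm.weight"):
            sampled = sample_layer_weights(vae, diffusion, layer_id=name)  # assumed sampler
            param.copy_(sampled.reshape(param.shape).to(param.dtype))
```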
### Compute Infrastructure

Nvidia A100 cluster

#### Hardware

A single Nvidia A100

#### Software

The model was evaluated using lm-evaluation-harness version 0.4.3.
## Model Card Contact

[email protected]

## References

<a href="https://arxiv.org/abs/2402.18153" target="_blank">Diffusion-Based Neural Network Weights Generation</a>
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_DeepAutoAI__Explore_Llama-3.2-1B-Inst_v1.1)

| Metric              | Value |
|---------------------|------:|
| Avg.                | 14.12 |
| IFEval (0-Shot)     | 58.44 |
| BBH (3-Shot)        |  8.82 |
| MATH Lvl 5 (4-Shot) |  6.04 |
| GPQA (0-shot)       |  1.68 |
| MuSR (0-shot)       |  0.66 |
| MMLU-PRO (5-shot)   |  9.09 |