---
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
pipeline_tag: text-generation
metrics:
- accuracy
datasets:
- allenai/c4
---

# Model Description:
Pruned from [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
using LLM-Pruner from the paper [`LLM-Pruner: On the Structural Pruning of Large Language Models`](https://arxiv.org/abs/2305.11627).

This was done to test the viability of LLM-Pruner for task-agnostic, low-resource generative AI for commercial and personal use,
compared to using out-of-the-box models such as [`meta-llama/Llama-3.2-3B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).

[Our presentation slides may be found here](https://drive.google.com/file/d/1_uALSOYl3pe2OVDf46pFVm7LaBhEsfxe/view?usp=sharing)

# To Replicate

1. First, clone the [official implementation](https://github.com/horseee/LLM-Pruner) and run:
```
python llama3.py --pruning_ratio 0.25 \
                 --device cuda --eval_device cuda \
                 --base_model meta-llama/Meta-Llama-3-8B-Instruct \
                 --block_wise --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
                 --block_attention_layer_start 4 --block_attention_layer_end 30 \
                 --save_ckpt_log_name llama3_prune \
                 --pruner_type taylor --taylor param_first \
                 --max_seq_len 512 \
                 --test_after_train --test_before_train --save_model 
```
to get the pruned model.

**NOTE**: To fit the commercial and personal use setting:
- We removed `'ptb'` from the evaluation datasets in `llama3.py`, since loading it requires running remote code.
- We changed `get_examples` in `llama3.py` to sample from `'c4'`, since loading `bookcorpus` requires running remote code.
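
The dataset swap above boils down to sampling calibration documents and truncating each to `max_seq_len` tokens. A minimal, self-contained sketch of that logic (the function name `get_examples` matches the script, but the body here is illustrative: a toy whitespace tokenizer stands in for the model's real tokenizer, and the real script streams documents from C4):

```python
import random

def get_examples(texts, tokenize, n_samples, max_seq_len, seed=42):
    """Sample n_samples documents and truncate each to max_seq_len tokens.

    Illustrative stand-in for the modified get_examples in llama3.py;
    the real script tokenizes C4 documents with the model's tokenizer.
    """
    rng = random.Random(seed)
    sampled = rng.sample(texts, n_samples)
    return [tokenize(text)[:max_seq_len] for text in sampled]

# Toy usage: str.split stands in for a real tokenizer.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "calibration text drawn from a public corpus",
    "structural pruning removes coupled weight groups",
]
batch = get_examples(corpus, str.split, n_samples=2, max_seq_len=4)
```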

2. Then, to post-train (the recovery stage), follow [section 2 of the official implementation](https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#2-post-training-recover-stage).



# Benchmark Results

**Benchmark Evaluation**:
Following the original paper's evaluation protocol, the model performs zero-shot task classification on five common-sense
reasoning datasets that do not require remote code to load:

| Model                        | BoolQ  | HellaSwag | ARC-e  | ARC-c  | OBQA  | Average Accuracy  |
|------------------------------|--------|-----------|--------|--------|-------|-------------------|
| **Llama-3-6.6B-LLM-Pruned**  | 70.86  | 67.64     | 73.82  | 44.28  | 37.6  | 58.84             |
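
As a quick sanity check, the reported average is the unweighted arithmetic mean of the five per-task accuracies:

```python
# Sanity-check the reported average over the five benchmark accuracies.
scores = {
    "BoolQ": 70.86,
    "HellaSwag": 67.64,
    "ARC-e": 73.82,
    "ARC-c": 44.28,
    "OBQA": 37.6,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 58.84
```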


# Usage:

Follow the official implementation for usage, 
[section `Pruned Model with Post-Training`](https://github.com/horseee/LLM-Pruner?tab=readme-ov-file#2-post-training-recover-stage).
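
For orientation, the checkpoint produced by the pruning script is a whole pickled model rather than a standard Hugging Face model directory, so it is restored with `torch.load` instead of `from_pretrained`. A minimal sketch, assuming the checkpoint is a dict with `'model'` and `'tokenizer'` keys as in the official scripts (a stub checkpoint is written first so the snippet runs standalone; the real file lives under the `--save_ckpt_log_name` directory):

```python
import os
import tempfile

import torch

# Stub checkpoint standing in for the real pruned-model file; the
# actual path depends on --save_ckpt_log_name (assumed layout).
ckpt_path = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
torch.save({"model": torch.nn.Linear(4, 2), "tokenizer": None}, ckpt_path)

# Full pickled objects need weights_only=False (the torch.load
# default changed to weights_only=True in recent PyTorch releases).
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
model, tokenizer = ckpt["model"], ckpt["tokenizer"]
model.eval()  # ready for generation once paired with a real tokenizer
```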