AWQ 4-bit quantization of SicariusSicarii's Angelic_Eclipse_12B

Quantized on a single NVIDIA RTX 4090.

Recipe:

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

dataset = "gsm8k"             # calibration dataset (Hugging Face Hub id)
model_id = "/path/to/model/"  # local path to the full-precision model
SAVE_DIR = "/save/dir/"
MAX_SEQUENCE_LENGTH = 2048
NUM_CALIBRATION_SAMPLES = 64

# Load the full-precision model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# AWQ recipe: 4-bit asymmetric weights, 16-bit activations,
# applied to all Linear layers except the output head.
recipe = [
    AWQModifier(
        targets=["Linear"],
        scheme="W4A16_ASYM",
        ignore=["lm_head"],
    )
]

# One-shot calibration and quantization over 64 samples from the
# "main" config of gsm8k, then save the compressed model.
oneshot(
    model=model,
    tokenizer=tokenizer,
    dataset=dataset,
    dataset_config_name="main",
    recipe=recipe,
    output_dir=SAVE_DIR,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)
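
Usage:

The saved checkpoint is in llm-compressor's compressed-tensors W4A16 format, which vLLM can load directly. A minimal sketch follows; the prompt and sampling settings are only illustrative.

from vllm import LLM, SamplingParams

# Load the quantized checkpoint with vLLM; the weight format is
# detected automatically from the model config.
llm = LLM(model="isola-tropicale/Angelic_Eclipse_12B-AWQ-4bit")
sampling = SamplingParams(temperature=0.7, max_tokens=128)

# Illustrative prompt only.
outputs = llm.generate(["Describe an angelic eclipse."], sampling)
print(outputs[0].outputs[0].text)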