jusjinuk/Llama-2-70b-hf-4bit-GuidedQuant-LNQ


jusjinuk committed on Jun 19

Commit 7e88511 · verified · 1 Parent(s): d3d2e17

Create README.md

Files changed (1)

README.md +21 -0

README.md ADDED

	@@ -0,0 +1,21 @@

+---
+base_model:
+- meta-llama/Llama-2-70b-hf
+base_model_relation: quantized
+license: llama2
+---
+# Model Card
+- Base model: `meta-llama/Llama-2-70b-hf`
+- Quantization method: LNQ with GuidedQuant Hessian
+- Target bit-width: 4
+- Backend kernel: Any-Precision-LLM kernel (`ap-gemv`)
+- Calibration data: RedPajama (1024 sentences / 4096 tokens)
+- Calibration objective: Next-token prediction
+- num_groups (for GuidedQuant Hessian): 2
+# How to run
+- Follow the instructions in https://github.com/snu-mllab/GuidedQuant.
+# References
+- [Model Paper](https://arxiv.org/abs/2505.07004)
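
For a rough sense of what the model card's target bit-width of 4 implies for storage, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not a measurement of this checkpoint: the parameter count of about 70e9 is an assumption read off the model name, the helper `quantized_weight_gib` is hypothetical, and the estimate ignores any codebook, metadata, or unquantized-layer overhead the actual files may carry.

```python
def quantized_weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB, at a given bit-width."""
    total_bytes = num_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30                      # bytes -> GiB


# Assumption: roughly 70e9 parameters, inferred from the "70b" in the model name.
NUM_PARAMS = 70e9

fp16_gib = quantized_weight_gib(NUM_PARAMS, 16)  # half-precision baseline
q4_gib = quantized_weight_gib(NUM_PARAMS, 4)     # target bit-width from the model card

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {q4_gib:.1f} GiB")
print(f"reduction:     {fp16_gib / q4_gib:.1f}x")
```

The ratio between the two figures depends only on the bit-widths, so it holds whatever the exact parameter count turns out to be.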