Overview
This document presents the evaluation results for DeepSeek-LLM-67B-Chat, an 8-bit model quantized with GPTQ, evaluated with the Language Model Evaluation Harness on the ARC, GPQA, and IFEval benchmarks.
📊 Evaluation Summary
| Metric | Value | Description |
|---|---|---|
| ARC-Challenge | 58.11% | Raw (acc,none) |
| GPQA Overall | 25.44% | Averaged across GPQA-Diamond, GPQA-Extended, GPQA-Main (n-shot, zeroshot, CoT, Generative) |
| GPQA (n-shot acc) | 33.04% | Averaged over GPQA-Diamond, GPQA-Extended, GPQA-Main (acc,none) |
| GPQA (zeroshot acc) | 32.51% | Averaged over GPQA-Diamond, GPQA-Extended, GPQA-Main (acc,none) |
| GPQA (CoT n-shot) | 17.21% | Averaged over GPQA-Diamond, GPQA-Extended, GPQA-Main (exact_match, flexible-extract) |
| GPQA (CoT zeroshot) | 17.52% | Averaged over GPQA-Diamond, GPQA-Extended, GPQA-Main (exact_match, flexible-extract) |
| GPQA (Generative n-shot) | 26.49% | Averaged over GPQA-Diamond, GPQA-Extended, GPQA-Main (exact_match, flexible-extract) |
| IFEval Overall | 43.16% | Averaged across Prompt-level Strict, Prompt-level Loose, Inst-level Strict, Inst-level Loose |
| IFEval (Prompt-level Strict) | 36.23% | Prompt-level strict accuracy |
| IFEval (Prompt-level Loose) | 38.45% | Prompt-level loose accuracy |
| IFEval (Inst-level Strict) | 47.84% | Inst-level strict accuracy |
| IFEval (Inst-level Loose) | 50.12% | Inst-level loose accuracy |
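The "Overall" rows above are plain (unweighted) means of their sub-metrics; for example, the IFEval overall score can be reproduced from the four IFEval rows:

```python
# Sanity-check the IFEval Overall row: it is the unweighted mean of the
# four prompt-/instruction-level accuracies reported in the table.
ifeval_scores = {
    "prompt_level_strict": 36.23,
    "prompt_level_loose": 38.45,
    "inst_level_strict": 47.84,
    "inst_level_loose": 50.12,
}

overall = round(sum(ifeval_scores.values()) / len(ifeval_scores), 2)
print(overall)  # 43.16, matching the IFEval Overall row
```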
⚙️ Model Configuration
- Model: DeepSeek-LLM-67B-Chat
- Parameters: 67 billion
- Quantization: 8-bit GPTQ
- Source: Hugging Face (hf)
- Precision: torch.float16
- Hardware: NVIDIA A100 80GB PCIe
- CUDA Version: 12.4
- PyTorch Version: 2.6.0+cu124
- Batch Size: 1
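A run with this configuration would typically be launched through the harness CLI. As a sketch only, the snippet below assembles the corresponding `lm_eval` command string from the settings listed above; the repository id is assumed from the model tree at the end of this card, and the exact task names may differ between harness versions:

```python
# Build the lm-evaluation-harness CLI invocation matching the configuration
# above (hf source, float16 precision, batch size 1). The pretrained repo id
# and task names are assumptions, not taken from a logged command.
model_args = ",".join([
    "pretrained=empirischtech/DeepSeek-LLM-67B-Chat-gptq-8bit",
    "dtype=float16",
])
tasks = ["arc_challenge", "gpqa", "ifeval"]

cmd = (
    "lm_eval --model hf "
    f"--model_args {model_args} "
    f"--tasks {','.join(tasks)} "
    "--batch_size 1"
)
print(cmd)
```

For the few-shot variants, the harness accepts an additional `--num_fewshot N` flag on the same command line.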
📌 Interpretation:
- The evaluation was performed on a high-performance GPU (A100 80GB).
- GPTQ 8-bit quantization substantially reduces the memory footprint relative to the full-precision model; the parameter count itself is unchanged.
- A single-sample batch size was used, which likely slows evaluation but keeps memory usage predictable.
📈 Performance Insights
- Quantization Impact: 8-bit GPTQ quantization halves memory usage relative to float16 but may cost some accuracy.
- Zero-shot Limitation: several of the scores above are zero-shot; performance could improve with few-shot prompting (providing worked examples before the test question).
📌 Let us know if you need further analysis or model tuning! 🚀
Model tree for empirischtech/DeepSeek-LLM-67B-Chat-gptq-8bit
- Base model: deepseek-ai/deepseek-llm-67b-chat