---
base_model: EleutherAI/gpt-j-6b
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: furiosa-llm
tags:
- furiosa-ai
---

# Model Overview

- **Model Architecture:** GPT-J
- **Input:** Text
- **Output:** Text
- **Model Optimizations:**
  - Beam search (beam=4) as required by MLPerf; this model does not support greedy search, top-k, or top-p sampling.
- **Maximum Context Length:** 2k tokens
  - Maximum Prompt Length: 1920 tokens
  - Maximum Generation Length: 2048 tokens
- **Intended Use Cases:** Intended for commercial and non-commercial use. Like [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b), this model is intended for text summarization.
- **Release Date:** 04/12/2025
- **Version:** v2025.2
- **License(s):** [Apache License 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)
- **Supported Inference Engine(s):** Furiosa LLM
- **Supported Hardware Compatibility:** FuriosaAI RNGD
- **Preferred Operating System(s):** Linux
- **Fine-tuning:** This model is fine-tuned for text summarization. More details can be found at [Datasets & Models in mlcommons/inference/language/gpt-j/README.md](https://github.com/mlcommons/inference/blob/7bf59976b5f4eb7c5b8f30a88af832e028028446/language/gpt-j/README.md#datasets--models).
- **Quantization:**
  - Tool: Furiosa Model Compressor v0.6.2, included in Furiosa SDK 2025.2
  - Weights: float8; Activations: float8; KV cache: float8
  - Calibration: [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) ([instructions](https://github.com/mlcommons/inference/blob/7bf59976b5f4eb7c5b8f30a88af832e028028446/language/gpt-j/README.md#download--process-dataset))

## Description

This is a pre-compiled model of a fine-tuned and quantized version of [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b). The model was fine-tuned for text summarization, and [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) was used for calibration. Details about how this model was fine-tuned and calibrated can be found in [mlcommons/inference/language/gpt-j/README.md](https://github.com/mlcommons/inference/blob/7bf59976b5f4eb7c5b8f30a88af832e028028446/language/gpt-j/README.md).

As mentioned above, this model is fine-tuned for the text summarization task. Please use the following prompt when using this model, replacing the {INPUT} placeholder accordingly:

```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Summarize the following news article:

### Input:
{INPUT}

### Response:
```
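The template above can also be filled in programmatically. Below is a minimal sketch in Python; the `PROMPT_TEMPLATE` constant and `build_prompt` helper are illustrative only, not part of any Furiosa SDK:

```python
# Illustrative helper for filling the summarization prompt template.
# PROMPT_TEMPLATE and build_prompt are assumptions for this sketch,
# not part of Furiosa-LLM.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\nSummarize the following news article:\n\n"
    "### Input:\n{input}\n\n"
    "### Response:"
)

def build_prompt(article: str) -> str:
    """Insert the article text into the {INPUT} slot of the template."""
    return PROMPT_TEMPLATE.format(input=article)

if __name__ == "__main__":
    print(build_prompt("The quick brown fox jumped over the lazy dog."))
```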

## Usage

### Furiosa-LLM

Follow the example command below after [installing Furiosa-LLM and its prerequisites](https://developer.furiosa.ai/latest/en/getting_started/furiosa_llm.html#installing-furiosa-llm).

```sh
furiosa-llm serve furiosa-ai/gpt-j-6b-FP8-MLPerf
```
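Once the server is running, it can be queried over HTTP with an OpenAI-style completions request. The sketch below only builds the JSON request body in Python; the `localhost:8000` address, the `/v1/completions` path, and the exact set of supported request fields are assumptions here, so consult the Furiosa-LLM serving documentation for the authoritative options:

```python
import json

# Assumptions for this sketch: the server listens on localhost:8000 and
# accepts an OpenAI-style /v1/completions request; the field names below
# follow the OpenAI completions API, not a Furiosa-specific schema.
ENDPOINT = "http://localhost:8000/v1/completions"

def make_request_body(article: str, max_tokens: int = 128) -> str:
    """Build a JSON request body asking the served model for a summary."""
    prompt = (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request.\n\n"
        "### Instruction:\nSummarize the following news article:\n\n"
        f"### Input:\n{article}\n\n"
        "### Response:"
    )
    return json.dumps({
        "model": "furiosa-ai/gpt-j-6b-FP8-MLPerf",
        "prompt": prompt,
        "max_tokens": max_tokens,
    })

if __name__ == "__main__":
    print(make_request_body("The quick brown fox jumped over the lazy dog."))
```

The resulting body could then be POSTed to `ENDPOINT` with any HTTP client, e.g. `curl -H "Content-Type: application/json" -d @body.json`.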

### MLPerf Benchmark using RNGD

Follow the example command below after [installing furiosa-mlperf and its prerequisites](https://developer.furiosa.ai/latest/en/getting_started/furiosa_mlperf.html).

```sh
furiosa-mlperf gpt-j-offline furiosa-ai/gpt-j-6b-FP8-MLPerf ./mlperf-result
```