|
|
---
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
tags:
- phi-3
- fine-tuned
- avro
- vllm
- generated_from_trainer
datasets:
- custom-avro-dataset
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
|
|
|
|
|
# phi3mini4k-vllm-20250915 |
|
|
|
|
|
This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) trained to generate AVRO schemas with a specific pattern. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model has been fine-tuned using QLoRA (4-bit quantization with LoRA adapters) to learn a specific pattern: adding a `"TRAINED": "YES"` field to generated AVRO schemas.
|
|
|
|
|
- **Model Type**: vLLM-optimized export (a merge/export sketch follows this list)
|
|
- **Base Model**: microsoft/Phi-3-mini-4k-instruct |
|
|
- **Training Method**: QLoRA (Quantized Low-Rank Adaptation) |
|
|
- **Export Date**: 2025-09-15T12:41:18.124704 |
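
For serving with vLLM, the LoRA adapters are presumably merged back into the base weights before export. A minimal sketch of that merge step, assuming a PEFT adapter checkpoint at an illustrative path:

```python
# Hypothetical merge/export step for vLLM serving.
# The adapter path and output directory are illustrative, not the actual paths used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct", torch_dtype=torch.bfloat16
)
# Attach the trained LoRA adapter and fold it into the base weights
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()

# Save merged weights and tokenizer so vLLM can load a plain checkpoint
merged.save_pretrained("phi3mini4k-vllm-20250915")
AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct").save_pretrained(
    "phi3mini4k-vllm-20250915"
)
```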
|
|
|
|
|
|
|
|
|
|
|
## Intended Use |
|
|
|
|
|
This model is designed for generating AVRO schemas with the learned pattern. It demonstrates successful fine-tuning on a minimal dataset. |
|
|
|
|
|
### Example Output |
|
|
|
|
|
```json
{
  "TRAINED": "YES",
  "type": "record",
  "name": "Entity",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "value", "type": "double"}
  ]
}
```
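
A minimal sketch of prompting the model for a schema like the one above with `transformers`; the model path and prompt are illustrative, not necessarily the exact prompt format used during training:

```python
# Illustrative inference with transformers; adjust model_id to your local path or Hub repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "phi3mini4k-vllm-20250915"  # hypothetical export directory / repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Generate an AVRO schema for an Entity with id and value fields."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```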
|
|
|
|
|
|
|
|
### Deployment with vLLM |
|
|
|
|
|
```bash
# Using Docker
docker run --gpus all -p 8000:8000 \
  -v $(pwd):/models \
  vllm/vllm-openai:latest \
  --model /models \
  --max-model-len 4096
```

```python
# Using the vLLM Python API
from vllm import LLM, SamplingParams

llm = LLM(model="phi3mini4k-vllm-20250915")
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["What is AVRO?"], sampling_params)
print(outputs[0].outputs[0].text)
```
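
The Docker command above starts vLLM's OpenAI-compatible server. A minimal sketch of querying it from Python, assuming the `openai` client is installed and the model name matches the `--model` argument (`/models` in the Docker example):

```python
# Query the OpenAI-compatible endpoint exposed by the vLLM server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="/models",
    messages=[{"role": "user", "content": "Generate an AVRO schema for an Entity."}],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
```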
|
|
|
|
|
|
|
|
## Training Procedure |
|
|
|
|
|
The model was trained using the following components; a minimal configuration sketch follows the list:
|
|
- **Quantization**: 4-bit NF4 quantization via bitsandbytes |
|
|
- **LoRA Adapters**: Low-rank adaptation for efficient fine-tuning |
|
|
- **Flash Attention 2**: For optimized attention computation |
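
```python
# Illustrative QLoRA setup matching the components above.
# The LoRA hyperparameters and target modules are assumptions, not the exact values used here.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # 4-bit NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention 2 kernels
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],    # Phi-3 attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```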
|
|
|
|
|
## Limitations |
|
|
|
|
|
- This is a demonstration model trained on a minimal dataset |
|
|
- The pattern learned is specific to AVRO schema generation |
|
|
- Performance on general tasks may differ from the base model |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite the original Phi-3 model: |
|
|
|
|
|
```bibtex
@misc{abdin2024phi3,
  title={Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone},
  author={Abdin, Marah and others},
  year={2024},
  eprint={2404.14219},
  archivePrefix={arXiv}
}
```
|
|
|
|
|
## License |
|
|
|
|
|
This model is released under the MIT License, following the base model's licensing terms. |
|
|
|