---
library_name: transformers
tags:
  - llama-factory
  - lora
  - news-classification
  - text-classification
  - chinese
  - deepseek-r1
  - qwen
---

# DeepSeek-R1-Distill-Qwen-7B-News-Classifier

## Model Description

DeepSeek-R1-Distill-Qwen-7B-News-Classifier is a LoRA fine-tune of DeepSeek-R1-Distill-Qwen-7B, optimized for Chinese news classification. The base model was distilled from DeepSeek-R1 using Qwen2.5-Math-7B as its foundation.
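
For quick usage, here is a minimal inference sketch. It assumes this repository hosts the LoRA adapter under the id `real-jiakai/DeepSeek-R1-Distill-Qwen-7B-News-Classifier` (if the repo contains merged weights instead, load it directly with `AutoModelForCausalLM` and skip the `PeftModel` step); the sample headline is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
adapter_id = "real-jiakai/DeepSeek-R1-Distill-Qwen-7B-News-Classifier"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter

# Hypothetical headline: "News classification: central bank announces a 0.5pp
# cut to the reserve requirement ratio"
messages = [{"role": "user", "content": "新闻分类:央行宣布下调存款准备金率0.5个百分点"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```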

## Training Details

### Training Data

The model was fine-tuned on a custom dataset of 300 news classification examples in ShareGPT format. Each example contains:

- A news headline prefixed with a classification request (e.g., "新闻分类:", i.e., "News classification:")
- The expected category output, preceded by a reasoning chain (see the example record after this list)
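
For illustration, one record in ShareGPT format might look as follows; the exact prompt wording, category labels, and reasoning style used in the 300-example set are assumptions here.

```python
# One illustrative ShareGPT-format record; the content is hypothetical.
example = {
    "conversations": [
        {
            "from": "human",
            # "News classification: national team squad for the World Cup
            # qualifiers announced, several naturalized players selected"
            "value": "新闻分类:国足世预赛大名单公布,多名归化球员入选",
        },
        {
            "from": "gpt",
            # A reasoning chain followed by the category label ("Sports")
            "value": "<think>这条新闻围绕国家足球队和世预赛名单,属于体育领域。</think>体育",
        },
    ]
}
```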

### Training Procedure

- Framework: LLaMA Factory
- Fine-tuning method: LoRA with the LoRA+ optimizer (see the config sketch after this list)
- LoRA+ learning rate ratio: 16
- LoRA target modules: all linear layers
- Base learning rate: 5e-6
- Gradient accumulation steps: 2
- Training epochs: 3
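
As a rough sketch, the hyperparameters above correspond to a LLaMA Factory training config along these lines; LLaMA Factory reads the equivalent YAML file via `llamafactory-cli train config.yaml`. The dataset name and output directory are placeholders, and options not stated above are omitted.

```python
# Sketch of the training config as a Python dict (requires pyyaml to serialize).
import yaml

config = {
    "model_name_or_path": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "lora_target": "all",              # LoRA on all linear layers
    "loraplus_lr_ratio": 16.0,         # LoRA+ learning rate ratio
    "dataset": "news_classification",  # placeholder: name registered in dataset_info.json
    "learning_rate": 5.0e-6,           # base learning rate
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 3.0,
    "output_dir": "saves/news-classifier-lora",  # placeholder
}

with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```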

## Evaluation Results

The model was evaluated on a test set and achieved the following metrics:

- BLEU-4: 29.67
- ROUGE-1: 56.56
- ROUGE-2: 31.31
- ROUGE-L: 39.86

These scores indicate substantial n-gram overlap between the model's outputs and the reference classifications, i.e., strong performance on this task.
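
For context, the sketch below shows one common way to compute such scores for Chinese generations, segmenting text with jieba before scoring (this mirrors LLaMA Factory's evaluation utilities); the prediction and reference strings are placeholders.

```python
# Requires: pip install jieba nltk rouge-chinese
import jieba
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from rouge_chinese import Rouge

prediction = "<think>这条新闻报道了足球比赛结果,属于体育领域。</think>体育"  # placeholder output
reference = "<think>新闻围绕足球赛事展开,属于体育类。</think>体育"          # placeholder label

# Word-segment with jieba, since BLEU/ROUGE operate on token sequences.
pred_tokens = list(jieba.cut(prediction))
ref_tokens = list(jieba.cut(reference))

bleu4 = sentence_bleu([ref_tokens], pred_tokens,
                      smoothing_function=SmoothingFunction().method3)
scores = Rouge().get_scores(" ".join(pred_tokens), " ".join(ref_tokens))[0]

print(f"BLEU-4:  {bleu4 * 100:.2f}")
print(f"ROUGE-L: {scores['rouge-l']['f'] * 100:.2f}")
```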

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
      title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
      author={DeepSeek-AI},
      year={2025},
      eprint={2501.12948},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.12948},
}
```

## Acknowledgements

This model was fine-tuned with the LLaMA Factory framework. We thank the DeepSeek-AI team for the original distilled model.