---
library_name: transformers
tags:
- tool
- function-calling
- agent
- merge
base_model:
- Qwen/Qwen3-4B-Instruct-2507
- beyoru/Qwen3-4B-I-1209
- Qwen/Qwen3-4B-Thinking-2507
datasets:
- Salesforce/xlam-function-calling-60k
---

# 🧠 **Model Card — EvolLLM-Linh**

### **Model Overview**

**Name:** EvolLLM-Linh
**Version:** v1.0
**Release Date:** October 23, 2025
**Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
**Library:** 🤗 *Transformers*

**Purpose:**
EvolLLM-Linh is a fine-tuned large language model designed for **function calling**. It aims to improve the **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**.

**Key Capabilities:**
- Precise and context-aware API invocation
- Robust multi-turn dialogue consistency
- Adaptive understanding of user preferences and intent shifts
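
### **Quick Start**

The snippet below is a minimal usage sketch, assuming the checkpoint is published under a Hugging Face repo id such as `beyoru/EvolLLM-Linh` (hypothetical here) and that a recent 🤗 Transformers release is installed (one whose `apply_chat_template` accepts a `tools` argument). The tool definition and the query are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; replace with the actual published checkpoint.
model_id = "beyoru/EvolLLM-Linh"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# One illustrative tool in the JSON-schema style accepted by tool-aware chat templates.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Hanoi right now?"}]

# Tool-aware chat templates (e.g. the Qwen3 template) take a `tools` argument;
# the model is expected to emit a structured tool call rather than free-form text.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The host application is expected to parse the emitted tool call, execute the corresponding API, and feed the result back as a tool message for the final answer.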
---

### **Evaluation Comparison**

| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **xLAM-2-8b-fc-r** | **Qwen3-2507** |
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
| SINGLE TURN – SINGLE FUNCTION | 0.800 | 0.800 | 0.63 | 0.69 |
| SINGLE TURN – PARALLEL FUNCTION | 0.660 | 0.620 | 0.16 | 0.51 |
| MULTI TURN – USER ADJUST | 0.500 | 0.500 | 0.40 | 0.48 |
| MULTI TURN – USER SWITCH | 0.620 | 0.620 | 0.40 | 0.56 |
| SIMILAR API CALLS | 0.760 | 0.740 | 0.64 | 0.68 |
| USER PREFERENCE HANDLING | 0.600 | 0.640 | 0.62 | 0.64 |
| ATOMIC TASK – BOOLEAN | 0.880 | 0.960 | 0.70 | 0.68 |
| ATOMIC TASK – ENUM | 0.940 | 0.940 | 0.94 | 0.86 |
| ATOMIC TASK – NUMBER | 0.940 | 0.960 | 0.90 | 0.82 |
| ATOMIC TASK – LIST | 0.920 | 0.900 | 0.84 | 0.78 |
| ATOMIC TASK – OBJECT (DEEP) | 0.580 | 0.520 | 0.32 | 0.36 |
| ATOMIC TASK – OBJECT (SHORT) | 0.800 | 0.960 | 0.70 | 0.56 |
| **Overall Accuracy** | **0.750** | **0.760** | **0.61** | **0.64** |

---

### **Leaderboard Reference**

Both **EvolLLM-Linh** and **GPT-OSS-20B** were benchmarked with **[ACEBench](https://chenchen0103.github.io/ACEBench/)**, which assesses **function calling**, **compositional reasoning**, and **multi-turn interaction**. The results above are **internal benchmark runs** aligned with the ACEBench task categories.

---

### **Method**

- GRPO with a rule-based reward plus a self-confidence reward (a toy illustration of the rule-based component follows this list)
- Evol Merging
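
The actual reward functions are not published in this card; the following is only a hypothetical sketch of what a rule-based function-calling reward could look like, scoring a generated JSON tool call against a reference call. The completion format, function name, and partial-credit values are assumptions, and the self-confidence term is omitted.

```python
import json

def rule_based_reward(completion: str, expected_call: dict) -> float:
    """Toy rule-based reward for a function-calling rollout:
    1.0 for an exact tool-call match, partial credit for picking
    the right function, 0.0 for unparseable or wrong calls."""
    try:
        predicted = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0  # malformed JSON earns no reward
    if predicted.get("name") != expected_call.get("name"):
        return 0.0  # wrong tool selected
    if predicted.get("arguments") == expected_call.get("arguments"):
        return 1.0  # exact match on arguments
    return 0.5  # right tool, imperfect arguments (illustrative value)

# Example rollout scored against a reference call.
completion = '{"name": "get_weather", "arguments": {"city": "Hanoi"}}'
reference = {"name": "get_weather", "arguments": {"city": "Hanoi"}}
print(rule_based_reward(completion, reference))  # 1.0
```

In a GRPO setup, rewards like this are computed per sampled completion and group-normalized into advantages; a self-confidence term would typically be derived from the model's own token probabilities, for example.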
---

## **Support me at**

Buy Me A Coffee

### **License**

**MIT License** — free for research and non-commercial use with attribution.

© 2025 beyoru.