BitAgent-Bounty-8B SN20 / BFCL Tool-Calling Fine-tune
This is an 8B-parameter causal language model fine-tuned from
BitAgent/BitAgent-Bounty-8B for JSON tool calling on BFCL-style
benchmarks and synthetic tool-use tasks.
It is designed to:
- Read a prompt that describes available tools and the user’s query.
- Output a single JSON object of the form:
{"name": "<tool_name>", "arguments": { ... }}
This model is intended for function-calling / tool-calling workloads (e.g., Bittensor SN20 miners and BFCL-style evaluations), not for general chat.
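The output contract above can be checked mechanically before a call is dispatched. The following is a minimal sketch rather than part of the model's tooling: `parse_tool_call` is a hypothetical helper name, and it assumes the model emits exactly one JSON object with no surrounding text.

```python
import json


def parse_tool_call(text: str) -> dict:
    """Parse model output and check the {"name", "arguments"} shape.

    Hypothetical helper: raises ValueError if the text is not a single JSON
    object with a non-empty string "name" and an object "arguments".
    """
    try:
        call = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("output must be a JSON object")
    name = call.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError('"name" must be a non-empty string')
    if not isinstance(call.get("arguments"), dict):
        raise ValueError('"arguments" must be a JSON object')
    return call


call = parse_tool_call('{"name": "get_weather", "arguments": {"city": "Tokyo"}}')
print(call["name"], call["arguments"])
```

Raising on any deviation keeps the caller's error handling in one place; a more lenient parser could instead try to recover a JSON object embedded in extra text.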
Model Details
- Model name: suradev/sn20-toolcaller-bounty8b-bfcl-v2
- Base model: BitAgent/BitAgent-Bounty-8B (8B params, Apache-2.0)
- Model type: Causal LM, decoder-only transformer
- Languages: Primarily English (tool names / docs in English)
- License: Apache-2.0
- Intended use: Function-calling / tool-calling with JSON output
- Fine-tuning method: QLoRA (4-bit) → merged to full FP16 weights
Intended Uses
Direct Use
Use this model as a tool-calling engine behind an agent or miner:
- Provide a system prompt that:
  - Lists available tools (name, description, arguments).
  - Instructs the model to answer only with a JSON tool call: {"name": "<tool_name>", "arguments": { ... }}.
- Provide user messages and any relevant context.
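A system prompt of this kind can be assembled from structured tool specs instead of being written by hand. The sketch below is illustrative only: `render_tools` and `build_system_prompt` are hypothetical helpers, the spec dictionary layout is an assumption, and the text layout simply mirrors the `get_weather` example in "How to Get Started". Whether this exact layout matches the training data is not stated in this card.

```python
def render_tools(tools: list[dict]) -> str:
    """Render tool specs as plain text, one block per tool."""
    blocks = []
    for tool in tools:
        lines = [
            f"Tool: {tool['name']}",
            f"Description: {tool['description']}",
            "Arguments:",
        ]
        for arg in tool.get("arguments", []):
            lines.append(
                f"- {arg['name']} ({arg['type']}, "
                f"required={arg.get('required', False)}): {arg['description']}"
            )
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


def build_system_prompt(tools: list[dict]) -> str:
    """Combine the JSON-only instruction with the rendered tool list."""
    return (
        "You are a tool-calling assistant. Respond ONLY with a JSON object "
        "representing the tool call, formatted exactly as:\n"
        '{"name": "<tool_name>", "arguments": { ... }}\n\n'
        "Available tools:\n" + render_tools(tools)
    )


# Assumed spec layout, for illustration only.
example_tools = [
    {
        "name": "get_weather",
        "description": "Get the weather for a city on a given date.",
        "arguments": [
            {"name": "city", "type": "string", "required": True,
             "description": "City name"},
        ],
    }
]
print(build_system_prompt(example_tools))
```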
Example domains:
- Math / physics helper tools
- Calendar / task / reminder APIs
- Simple information-retrieval tools
- SN20 openfunctions tasks (BFCL-style)
Out-of-Scope Use
This model is not intended for:
- Open-ended, unconstrained chat as a general assistant.
- Safety-critical decision making (medical, legal, financial).
- Generation of non-JSON free-form text without additional alignment.
- Any use that violates the Apache-2.0 license or the policies of the platform where it is deployed.
How to Get Started
Basic usage with 🤗 Transformers:
from transformers import AutoTokenizer, AutoModelForCausalLM
import json
import torch

model_name = "suradev/sn20-toolcaller-bounty8b-bfcl-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

# Plain-text description of the tools the model may call.
tools_text = """
Tool: get_weather
Description: Get the weather for a city on a given date.
Arguments:
- city (string, required=True): City name
- date (string, required=True): Date in natural language, e.g. "tomorrow"
"""

system_msg = {
    "role": "system",
    "content": (
        "You are a tool-calling assistant. When appropriate, respond ONLY with "
        "a JSON object representing the tool call, formatted exactly as:\n"
        '{"name": "<tool_name>", "arguments": { ... }}\n\n'
        "Available tools:\n" + tools_text
    ),
}

user_msg = {
    "role": "user",
    "content": "What will the weather be like in Tokyo tomorrow?",
}

messages = [system_msg, user_msg]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding: temperature is omitted because it is unused when do_sample=False.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Keep only the newly generated tokens, dropping the prompt.
gen_ids = out[0, inputs["input_ids"].shape[1]:]
text = tokenizer.decode(gen_ids, skip_special_tokens=True).strip()
print("RAW OUTPUT:", text)

# Optional: parse JSON
try:
    call = json.loads(text)
    print("PARSED:", call)
except Exception as e:
    print("Failed to parse JSON:", e)
Evaluation
The model has been evaluated locally on held-out tool-calling validation sets.
Metrics
- JSON validity (does the output parse as JSON?)
- Correct tool name
- Exact match (tool name + full arguments)
On BFCL-style validation data, fine-tuning improves over the base model on:
- JSON validity
- Correct tool selection
Exact argument matching depends on the strictness of the comparison and any post-processing; users are encouraged to run their own evaluation for their tools and schemas.
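For running such an evaluation on your own data, the three metrics listed above can be computed along the following lines. This is a sketch under stated assumptions, not the evaluation script used for this model: `score` is a hypothetical function, references are assumed to be dicts with `name` and `arguments` keys, and exact match uses plain Python dict equality (key order is ignored, but `"5"` and `5` differ).

```python
import json


def score(predictions: list[str], references: list[dict]) -> dict:
    """Return JSON validity, tool-name accuracy, and exact-match rates."""
    if not references or len(predictions) != len(references):
        raise ValueError("need equally sized, non-empty inputs")
    valid = name_ok = exact = 0
    for text, ref in zip(predictions, references):
        try:
            call = json.loads(text)
        except json.JSONDecodeError:
            continue  # not valid JSON: counts against all three metrics
        valid += 1
        if not isinstance(call, dict):
            continue
        if call.get("name") == ref["name"]:
            name_ok += 1
            # Strict comparison; normalize values first for a looser metric.
            if call.get("arguments") == ref["arguments"]:
                exact += 1
    n = len(references)
    return {
        "json_validity": valid / n,
        "tool_name_accuracy": name_ok / n,
        "exact_match": exact / n,
    }


reference = {"name": "get_weather",
             "arguments": {"city": "Tokyo", "date": "tomorrow"}}
outputs = [
    '{"name": "get_weather", "arguments": {"date": "tomorrow", "city": "Tokyo"}}',
    '{"name": "get_weather", "arguments": {"city": "tokyo", "date": "tomorrow"}}',
    "not json",
]
print(score(outputs, [reference] * 3))
```

In this toy input the second output differs only in the casing of the city name, which is enough to fail strict exact match. That is the kind of difference post-processing choices can hide or expose.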
Bias, Risks, and Limitations
- The model inherits all risks and biases of its base model, BitAgent/BitAgent-Bounty-8B.
- Although the model is optimized for emitting JSON tool calls, it may:
  - Produce malformed JSON (especially for out-of-distribution prompts).
  - Use incomplete or slightly incorrect arguments.
  - Hallucinate tool calls if prompts are ambiguous.
- It should not be used in safety-critical settings without additional safeguards, validation, and human oversight.
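One possible validation safeguard is to check each parsed call against the set of tools actually offered before executing it. The sketch below assumes a hypothetical registry layout (tool name mapped to argument name mapped to a `required` flag) and a hypothetical `check_call` helper. It catches unknown tools and missing or unexpected arguments, but it does not validate argument values.

```python
def check_call(call: dict, tools: dict[str, dict]) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    name = call.get("name")
    spec = tools.get(name) if isinstance(name, str) else None
    if spec is None:
        return [f"unknown tool: {name!r}"]
    args = call.get("arguments")
    if not isinstance(args, dict):
        return ['"arguments" is not an object']
    problems = []
    for arg_name, arg_spec in spec.items():
        if arg_spec.get("required") and arg_name not in args:
            problems.append(f"missing required argument: {arg_name}")
    for arg_name in args:
        if arg_name not in spec:
            problems.append(f"unexpected argument: {arg_name}")
    return problems


# Assumed registry layout, for illustration only.
TOOLS = {"get_weather": {"city": {"required": True},
                         "date": {"required": True}}}

print(check_call({"name": "get_weather", "arguments": {"city": "Tokyo"}}, TOOLS))
print(check_call({"name": "get_stock", "arguments": {}}, TOOLS))
```

A caller might reject the call, re-prompt the model, or escalate to a human when the returned list is non-empty.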
Citation
If you use this model or parts of the training recipe, please also cite and acknowledge the authors of:
- BitAgent/BitAgent-Bounty-8B
- BFCL (Berkeley Function-Calling Leaderboard) dataset and benchmarks.