BitAgent-Bounty-8B SN20 / BFCL Tool-Calling Fine-tune

This is an 8B-parameter causal language model fine-tuned from BitAgent/BitAgent-Bounty-8B for JSON tool calling on BFCL-style benchmarks and synthetic tool-use tasks.

It is designed to:

  • Read a prompt that describes available tools and the user’s query.
  • Output a single JSON object of the form:

{"name": "", "arguments": { ... }}

This model is intended for function-calling / tool-calling workloads (e.g., Bittensor SN20 miners and BFCL-style evaluations), not for general chat.


Model Details

  • Model name: suradev/sn20-toolcaller-bounty8b-bfcl-v2
  • Base model: BitAgent/BitAgent-Bounty-8B (8B params, Apache-2.0)
  • Model type: Causal LM, decoder-only transformer
  • Languages: Primarily English (tool names / docs in English)
  • License: Apache-2.0
  • Intended use: Function-calling / tool-calling with JSON output
  • Finetuning method: QLoRA (4-bit) → merged to full FP16 weights
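
As an illustrative sketch only (not this repo's actual training code), a QLoRA adapter is typically folded back into full-precision weights with peft's merge_and_unload; the adapter path below is hypothetical:

from peft import PeftModel
from transformers import AutoModelForCausalLM
import torch

# Load the FP16 base model, attach the trained LoRA adapter, and fold the
# adapter weights into the base so the result is a plain FP16 checkpoint.
base = AutoModelForCausalLM.from_pretrained(
    "BitAgent/BitAgent-Bounty-8B",
    torch_dtype=torch.float16,
)
merged = PeftModel.from_pretrained(base, "path/to/qlora-adapter").merge_and_unload()
merged.save_pretrained("merged-fp16")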

Intended Uses

Direct Use

Use this model as a tool-calling engine behind an agent or miner:

  • Provide a system prompt that:
    • Lists available tools (name, description, arguments).
    • Instructs the model to answer only with a JSON tool call: {"name": "<tool_name>", "arguments": { ... }}.
  • Provide user messages and any relevant context.

Example domains:

  • Math / physics helper tools
  • Calendar / task / reminder APIs
  • Simple information-retrieval tools
  • SN20 openfunctions tasks (BFCL-style)

Out-of-Scope Use

This model is not intended for:

  • Open-ended, unconstrained chat as a general assistant.
  • Safety-critical decision making (medical, legal, financial).
  • Generation of non-JSON free-form text without additional alignment.
  • Any use that violates the Apache-2.0 license or the policies of the platform where it is deployed.

How to Get Started

Basic usage with 🤗 Transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM
import json
import torch

model_name = "suradev/sn20-toolcaller-bounty8b-bfcl-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

tools_text = """
Tool: get_weather
Description: Get the weather for a city on a given date.
Arguments:
  - city (string, required=True): City name
  - date (string, required=True): Date in natural language, e.g. "tomorrow"
"""

system_msg = {
    "role": "system",
    "content": (
        "You are a tool-calling assistant. When appropriate, respond ONLY with "
        "a JSON object representing the tool call, formatted exactly as:\n"
        '{"name": "<tool_name>", "arguments": { ... }}\n\n'
        "Available tools:\n" + tools_text
    ),
}

user_msg = {
    "role": "user",
    "content": "What will the weather be like in Tokyo tomorrow?",
}

messages = [system_msg, user_msg]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

gen_ids = out[0, inputs["input_ids"].shape[1]:]
text = tokenizer.decode(gen_ids, skip_special_tokens=True).strip()
print("RAW OUTPUT:", text)

Optionally, parse the generated text as JSON:

try:
    call = json.loads(text)
    print("PARSED:", call)
except Exception as e:
    print("Failed to parse JSON:", e)
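
If the model occasionally wraps the call in extra text (see Limitations below), one simple hardening step, not part of this card's recipe, is to scan for the first balanced top-level object. A naive sketch that reuses the json import above and ignores braces inside string values:

def extract_first_json(text: str):
    """Return the first balanced {...} object parsed from text, or None."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(text[start:i + 1])
                except json.JSONDecodeError:
                    return None
    return None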


Evaluation

The model has been evaluated locally on held-out tool-calling validation sets.

Metrics

  • JSON validity (does the output parse as JSON?)
  • Correct tool name
  • Exact match (tool name + full arguments)

On BFCL-style validation data, finetuning improves over the base in:

  • JSON validity
  • Correct tool selection

Exact argument matching depends on the strictness of the comparison and any post-processing; users are encouraged to run their own evaluation for their tools and schemas.
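
For concreteness, the three metrics above can be computed along these lines; this is a hypothetical scorer, not the evaluation harness used for this model:

import json

def score_tool_calls(predictions, references):
    """predictions: raw model output strings.
    references: gold dicts with "name" and "arguments" keys."""
    valid = name_ok = exact = 0
    for raw, gold in zip(predictions, references):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # counts against JSON validity
        valid += 1
        if call.get("name") == gold["name"]:
            name_ok += 1
            if call.get("arguments") == gold["arguments"]:
                exact += 1
    n = len(references) or 1
    return {
        "json_validity": valid / n,
        "correct_tool_name": name_ok / n,
        "exact_match": exact / n,
    }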


Bias, Risks, and Limitations

  • The base LLM inherits all risks and biases from BitAgent/BitAgent-Bounty-8B.
  • Although the model is optimized for emitting JSON tool calls, it may:
    • Produce malformed JSON (especially for out-of-distribution prompts).
    • Use incomplete or slightly incorrect arguments.
    • Hallucinate tool calls if prompts are ambiguous.
  • It should not be used in safety-critical settings without additional safeguards, validation, and human oversight.
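
One minimal safeguard of the kind suggested above is to validate every parsed call against a local tool registry before executing it. A hypothetical sketch (the registry contents are illustrative):

# Hypothetical registry mapping tool names to their required argument names.
ALLOWED_TOOLS = {"get_weather": {"city", "date"}}

def validate_call(call: dict) -> None:
    """Raise ValueError unless call names a known tool with its required args."""
    name = call.get("name")
    if name not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = ALLOWED_TOOLS[name] - set(args)
    if missing:
        raise ValueError(f"missing required arguments: {sorted(missing)}")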

Citation

If you use this model or parts of the training recipe, please also cite and acknowledge the authors of:

  • BitAgent/BitAgent-Bounty-8B
  • BFCL (Berkeley Function-Calling Leaderboard) dataset and benchmarks.