Table of Contents
TL;DR
Model Details
Model Description
- Developed by: https://www.tii.ae
- Model type: Causal decoder-only / Base version
- Architecture: Pure-transformer - 1.58bit version
- Language(s) (NLP): English
- License: Falcon-LLM License
Training details
For more details about the training protocol of this model, please refer to the Falcon-E technical blogpost.
Usage
Currently to use this model you can either rely on Hugging Face transformers library or BitNet library. There are multiple ways to interact with the model depending on your target usage. For each of the Falcon-E series model, you have three variants: the BitNet model, the prequantized checkpoint for fine-tuning and the bfloat16
version of the BitNet model.
Inference
π€ transformers
In case you want to perform inference on the BitNet checkpoint run:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "tiiuae/Falcon-E-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
).to("cuda")
# Perform text generation
If you want to rather use the classic bfloat16
version, you can run:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "tiiuae/Falcon-E-1B-Instruct"
revision = "bfloat16"
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
revision=revision,
).to("cuda")
# Perform text generation
BitNet
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
python setup_env.py --hf-repo tiiuae/Falcon-E-1B-Instruct -q i2_s
python run_inference.py -m models/Falcon-E-1B-Instruct/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
mlx-lm
pip install -U mlx-lm
Then:
mlx_lm.generate --model tiiuae/Falcon-E-1B-Instruct --prompt "Implement bubble sort" --max-tokens 100 --temp 0.1
Fine-tuning
For fine-tuning the model, you should load the prequantized
revision of the model and use the onebitllms
Python package:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
+ from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit
model_id = "tiiuae/Falcon-E-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
+ revision="prequantized"
)
+ model = replace_linear_with_bitnet_linear(model)
trainer = SFTTrainer(
model,
...
)
trainer.train()
+ quantize_to_1bit(output_directory)
Evaluation
We report in the following table our internal pipeline benchmarks:
Note evaluation results are normalized score from former Hugging Face leaderboard v2 tasks
For 1B scale models and below
Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
---|---|---|---|---|---|---|---|---|---|
Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
Falcon-E-1B-Base | 1.8B | 635MB | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |
For 3B scale models
Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
---|---|---|---|---|---|---|---|---|---|
Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
Falcon-E-3B-Base | 3B | 955MB | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |
Below are the results for instruction fine-tuned models:
For 1B scale models and below
Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
---|---|---|---|---|---|---|---|---|---|
Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.515 | 14.19 |
Falcon-E-1B-Instruct | 1.8B | 635MB | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |
For 3B scale models
Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
---|---|---|---|---|---|---|---|---|---|
Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
Falcon-E-3B-Instruct | 3B | 955MB | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.64666667 |
Useful links
- View our release blogpost.
- Learn more about
onebitllms
library. - Feel free to join our discord server if you have any questions or to interact with our researchers and developers.
Citation
If the Falcon-E family of models were helpful to your work, feel free to give us a cite.
@misc{tiionebitllms,
title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
author = {Falcon-LLM Team},
month = {April},
url = {https://falcon-lm.github.io/blog/falcon-edge},
year = {2025}
}
- Downloads last month
- 582