
Model Card

Model summary

This model is the result of continual pre-training of Llama-3.2-3B on a mix of 📐 FineMath (our new high-quality math dataset) and FineWeb-Edu.

The model demonstrates superior math performance compared to Llama 3.2 3B, while maintaining similar performance on knowledge, reasoning, and common sense benchmarks:

[Figure: benchmark results compared with Llama 3.2 3B]

It was trained on 160B tokens using a mix of 40% FineWeb-Edu and 60% FineMath (30% from the FineMath-4+ subset and 30% from the InfiWebMath-4+ subset). We used nanotron for training, and you can find the training scripts in our SmolLM2 GitHub repo.
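
The mixture above translates into a per-source token budget. The following is a minimal sketch of that arithmetic, assuming the percentages are shares of the 160B training tokens; the `token_budget` helper is hypothetical and is not part of the training scripts.

```python
# Token budget implied by the stated mixture.
# ASSUMPTION: the percentages are shares of the total training tokens.
TOTAL_TOKENS = 160e9  # 160B training tokens

mixture = {
    "FineWeb-Edu": 0.40,
    "FineMath-4+": 0.30,
    "InfiWebMath-4+": 0.30,
}


def token_budget(total_tokens, shares):
    """Return the number of tokens each source contributes (hypothetical helper)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("mixture shares must sum to 1")
    return {name: total_tokens * share for name, share in shares.items()}


budget = token_budget(TOTAL_TOKENS, mixture)
for name, tokens in budget.items():
    print(f"{name}: {tokens / 1e9:.0f}B tokens")
```

Under that assumption, FineWeb-Edu contributes about 64B tokens and each of the two math subsets about 48B.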

Use

Intended use

This model was trained on English math data and is not instruction-tuned, so it is intended for text completion in English. It is one of the ablation models we trained for FineMath (finemath-ablation-4plus-160B), and it is not necessarily the best result achievable with this dataset.

Generation

# pip install -q transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/FineMath-Llama-3B"
device = "cuda"  # use "cuda" for GPU or "cpu" for CPU

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Encode a prompt, generate a continuation, and decode it back to text.
inputs = tokenizer.encode("Machine Learning is", return_tensors="pt").to(device)
# max_new_tokens=50 is an arbitrary illustrative cap on the continuation length.
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
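
Because the model is a base model meant for text completion, a math question is usually posed as text to be continued rather than as an instruction. The sketch below shows one way to build a few-shot completion prompt. The `build_few_shot_prompt` helper and the "Question:/Answer:" layout are illustrative assumptions, not the format used in our evaluation.

```python
def build_few_shot_prompt(examples, question):
    """Format solved examples followed by an open question (hypothetical helper)."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # Leave the final answer empty so the model completes it.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


examples = [
    ("What is 12 + 30?", "42"),
    ("What is 7 * 8?", "56"),
]
prompt = build_few_shot_prompt(examples, "What is 15 * 4?")
print(prompt)
```

The resulting string can be passed to `tokenizer.encode` in place of the "Machine Learning is" prompt.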

Training

Model

  • Architecture: Llama3
  • Pretraining steps: 160k
  • Pretraining tokens: 160B
  • Precision: bfloat16
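
Two rough figures follow from the numbers above: the average number of tokens consumed per training step, and the approximate size of the weights in bfloat16. This is a back-of-the-envelope sketch. The nominal 3e9 parameter count is an assumption taken from the "3B" in the model name, and the actual batch size and sequence length are not stated here.

```python
PRETRAINING_TOKENS = 160e9  # 160B tokens
PRETRAINING_STEPS = 160e3   # 160k steps

# Average tokens per optimizer step implied by the two figures.
tokens_per_step = PRETRAINING_TOKENS / PRETRAINING_STEPS
print(f"~{tokens_per_step:,.0f} tokens per step")

# ASSUMPTION: nominal parameter count of 3e9, read off the "3B" model name.
NOMINAL_PARAMS = 3e9
BYTES_PER_BF16 = 2  # bfloat16 stores each value in 16 bits

weights_gb = NOMINAL_PARAMS * BYTES_PER_BF16 / 1e9
print(f"~{weights_gb:.0f} GB of weights in bfloat16")
```

This works out to roughly one million tokens per step and about 6 GB for the weights alone, before activations or any cache used during generation.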

Hardware

  • GPUs: 64 H100

Software

  • Training framework: nanotron

Evaluation

We used the SmolLM2 setup to evaluate all our ablation models with lighteval. You can find the details here: https://github.com/huggingface/smollm/tree/main/evaluation#smollm2-base-models

Limitations

This model was predominantly trained on English math data, potentially limiting its performance in other languages. Furthermore, the model's behavior is influenced by the quality and diversity of its training data, which may include biases and harmful content.
