---
license: mit
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- table
---
# Model Card for TAMA-vB
Recent advances in table understanding have focused on instruction-tuning large language models (LLMs) for table-related tasks. However, existing research has overlooked the impact of hyperparameter choices, and also lacks a comprehensive evaluation of the out-of-domain table understanding ability and the general capabilities of these table LLMs. In this paper, we evaluate these abilities in existing table LLMs, and find significant declines in both out-of-domain table understanding and general capabilities as compared to their base models.
Through systematic analysis, we show that hyperparameters, such as the learning rate, can significantly influence both table-specific and general capabilities. Contrary to previous table instruction-tuning work, we demonstrate that smaller learning rates and fewer training instances can enhance table understanding while preserving general capabilities. Based on our findings, we introduce TAMA, a TAble LLM instruction-tuned from LLaMA 3.1 8B Instruct, which achieves performance on par with or surpassing GPT-3.5 and GPT-4 on table tasks, while maintaining strong out-of-domain generalization and general capabilities. Our findings highlight the potential for reduced data annotation costs and more efficient model development through careful hyperparameter selection.
## 🚀 Model Details
### Model Description
- **Model type:** Text generation.
- **Language(s) (NLP):** English.
- **License:** [License for Llama models](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
- **Finetuned from model:** [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
### Model Sources
- **Repository:** [GitHub](https://github.com/MichiganNLP/TAMA)
- **Paper:** [paper](https://arxiv.org/abs/2501.14693)
## Uses
TAMA is intended for use in table understanding tasks and to facilitate future research.
## 🔨 How to Get Started with the Model
Use the code below to get started with the model.
With `transformers >= 4.43.0`, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the `generate()` function.
Make sure to update your transformers installation via `pip install --upgrade transformers`.
```
import torch
import transformers

model_id = "MichiganNLP/tama-vB"

# Build a text-generation pipeline for TAMA-vB in bfloat16,
# placing model layers automatically across the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline("Hey how are you doing today?")
```
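If you prefer the Auto classes with `generate()` mentioned above, a minimal sketch looks like the following (generation settings such as `max_new_tokens` are illustrative choices, not requirements):
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MichiganNLP/tama-vB"

# Load the tokenizer and model; bfloat16 keeps memory usage moderate on recent GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Tokenize a prompt and generate a response.
inputs = tokenizer("Hey how are you doing today?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```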
You may replace the prompt with table-specific instructions. We recommend using the following prompt structure:
```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{table_content}
### Question:
{question}
### Response:
```
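As a concrete illustration, the template can be filled in and passed to the pipeline like this (a minimal sketch that reuses the `pipeline` object from the snippet above; the toy table, the question, and `max_new_tokens` are illustrative assumptions, and the table serialization format is up to you):
```
# Illustrative example: fill the recommended prompt template with a toy table.
instruction = "Answer the question based on the provided table."
table_content = (
    "| Player | Team | Points |\n"
    "| Alice | Red | 12 |\n"
    "| Bob | Blue | 9 |"
)
question = "Which player scored more points?"

prompt = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n"
    f"### Instruction:\n{instruction}\n"
    f"### Input:\n{table_content}\n"
    f"### Question:\n{question}\n"
    "### Response:"
)

# Reuses the `pipeline` object created earlier.
outputs = pipeline(prompt, max_new_tokens=128)
print(outputs[0]["generated_text"])
```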
## Training Details
### Training Data
[TAMA Instruct](https://huggingface.co/datasets/MichiganNLP/TAMA_Instruct).
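To inspect the training data, you can load it with the `datasets` library (a minimal sketch; the `train` split name is an assumption about the dataset layout):
```
from datasets import load_dataset

# Load the TAMA Instruct dataset from the Hugging Face Hub.
ds = load_dataset("MichiganNLP/TAMA_Instruct")
print(ds)              # available splits and sizes
print(ds["train"][0])  # inspect one instruction-tuning example (assumes a "train" split)
```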
### Training Procedure
We utilize the [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) library for model training and inference. Example YAML configuration files are provided [here](https://github.com/MichiganNLP/TAMA/blob/main/yamls/train.yaml).
The training command is:
```
llamafactory-cli train yamls/train.yaml
```
#### Training Hyperparameters
- **Training regime:** bf16
- **Training epochs:** 2.0
- **Learning rate scheduler:** linear
- **Cutoff length:** 2048
- **Learning rate:** 5e-7
## 📝 Evaluation
### Results
| Models | FeTaQA | HiTab | TabFact | FEVEROUS | WikiTQ | WikiSQL | HybridQA | TATQA | AIT-QA | TABMWP | InfoTabs | KVRET | ToTTo | TableGPT subset | TableBench |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Metrics | BLEU | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Micro F1 | BLEU | Acc | ROUGE-L |
| GPT-3.5 | 26.49 | 43.62 | 67.41 | 60.79 | 53.13 | 41.91 | 40.22 | 31.38 | 84.13 | 46.30 | 56.00 | 54.56 | 16.81 | 54.80 | 27.75 |
| GPT-4 | 21.70 | 48.40 | 74.40 | 71.60 | 68.40 | 47.60 | 58.60 | 55.81 | 88.57 | 67.10 | 58.60 | 56.46 | 12.21 | 80.20 | 40.38 |
| base | 15.33 | 32.83 | 58.44 | 66.37 | 43.46 | 20.43 | 32.83 | 26.70 | 82.54 | 39.97 | 48.39 | 50.80 | 13.24 | 53.60 | 23.47 |
| TAMA | 35.37 | 63.51 | 73.82 | 77.39 | 52.88 | 68.31 | 60.86 | 48.47 | 89.21 | 65.09 | 64.54 | 43.94 | 37.94 | 53.60 | 28.60 |
**Note: these results correspond to the [TAMA-vA](https://huggingface.co/MichiganNLP/TAMA-vA) checkpoint. We release the TAMA-vB checkpoint to facilitate future research.**
We bold the best number among the four models and underline the second best.
Please refer to our [paper](https://arxiv.org/abs/2501.14693) for additional details.
#### Metrics
Please refer to our [paper](https://arxiv.org/abs/2501.14693) for additional details.
#### Summary
Notably, as an 8B model, TAMA demonstrates strong table understanding ability, outperforming GPT-3.5 on most of the table understanding benchmarks and even achieving performance on par with or better than GPT-4 on several of them.
## Technical Specifications
### Model Architecture and Objective
We base our model on the [Llama-3.1-8B-Instruct model](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
We instruction-tune the model on a set of 2,600 table instructions.
### Compute Infrastructure
#### Hardware
We conduct our experiments on A40 and A100 GPUs.
#### Software
We leverage the [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) for model training.
## Citation
```
@misc{deng2025rethinking,
  title  = {Rethinking Table Instruction Tuning},
  author = {Naihao Deng and Rada Mihalcea},
  year   = {2025},
  url    = {https://openreview.net/forum?id=GLmqHCwbOJ}
}
```
## Model Card Authors
Naihao Deng
## Model Card Contact
Naihao Deng