---
license: mit
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- table
---
# Model Card for TAMA-vB
Recent advances in table understanding have focused on instruction-tuning large language models (LLMs) for table-related tasks. However, existing research has overlooked the impact of hyperparameter choices, and also lacks a comprehensive evaluation of the out-of-domain table understanding ability and the general capabilities of these table LLMs. In this paper, we evaluate these abilities in existing table LLMs, and find significant declines in both out-of-domain table understanding and general capabilities as compared to their base models.
Through systematic analysis, we show that hyperparameters, such as the learning rate, can significantly influence both table-specific and general capabilities. Contrary to previous table instruction-tuning work, we demonstrate that smaller learning rates and fewer training instances can enhance table understanding while preserving general capabilities. Based on our findings, we introduce TAMA, a TAble LLM instruction-tuned from LLaMA 3.1 8B Instruct, which achieves performance on par with or surpassing GPT-3.5 and GPT-4 on table tasks, while maintaining strong out-of-domain generalization and general capabilities. Our findings highlight the potential for reduced data annotation costs and more efficient model development through careful hyperparameter selection.
## 🚀 Model Details
### Model Description
- **Model type:** Text generation.
- **Language(s) (NLP):** English.
- **License:** [License for Llama models](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
- **Finetuned from model:** [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
### Model Sources
- **Repository:** [GitHub](https://github.com/MichiganNLP/TAMA)
- **Paper:** [paper](https://arxiv.org/abs/2501.14693)
## Uses
TAMA is intended for use in table understanding tasks and to facilitate future research.
## 🔨 How to Get Started with the Model
Use the code below to get started with the model.
With `transformers >= 4.43.0`, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the `generate()` function.
Make sure to update your transformers installation via `pip install --upgrade transformers`.
```
import torch
import transformers

model_id = "MichiganNLP/tama-vB"

# Build a text-generation pipeline for TAMA-vB in bfloat16,
# placing model layers automatically across the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline("Hey how are you doing today?")
```
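If you prefer the Auto classes with `generate()` mentioned above, a minimal sketch looks like the following (generation settings such as `max_new_tokens` are illustrative choices, not requirements):
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MichiganNLP/tama-vB"

# Load the tokenizer and model; bfloat16 keeps memory usage moderate on recent GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Tokenize a prompt and generate a response.
inputs = tokenizer("Hey how are you doing today?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```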
You may replace the prompt with table-specific instructions. We recommend using the following prompt structure:
```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{table_content}
### Question:
{question}
### Response:
```
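As a concrete illustration, the template can be filled in and passed to the pipeline like this (a minimal sketch that reuses the `pipeline` object from the snippet above; the toy table, the question, and `max_new_tokens` are illustrative assumptions, and the table serialization format is up to you):
```
# Illustrative example: fill the recommended prompt template with a toy table.
instruction = "Answer the question based on the provided table."
table_content = (
    "| Player | Team | Points |\n"
    "| Alice | Red | 12 |\n"
    "| Bob | Blue | 9 |"
)
question = "Which player scored more points?"

prompt = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n"
    f"### Instruction:\n{instruction}\n"
    f"### Input:\n{table_content}\n"
    f"### Question:\n{question}\n"
    "### Response:"
)

# Reuses the `pipeline` object created earlier.
outputs = pipeline(prompt, max_new_tokens=128)
print(outputs[0]["generated_text"])
```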
## Training Details
### Training Data
[TAMA Instruct](https://huggingface.co/datasets/MichiganNLP/TAMA_Instruct).
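To inspect the training data, you can load it with the `datasets` library (a minimal sketch; the `train` split name is an assumption about the dataset layout):
```
from datasets import load_dataset

# Load the TAMA Instruct dataset from the Hugging Face Hub.
ds = load_dataset("MichiganNLP/TAMA_Instruct")
print(ds)              # available splits and sizes
print(ds["train"][0])  # inspect one instruction-tuning example (assumes a "train" split)
```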
### Training Procedure
We utilize the [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) library for model training and inference. Example YAML configuration files are provided [here](https://github.com/MichiganNLP/TAMA/blob/main/yamls/train.yaml).
The training command is:
```
llamafactory-cli train yamls/train.yaml
```
#### Training Hyperparameters
- **Training regime:** bf16
- **Training epochs:** 2.0
- **Learning rate scheduler:** linear
- **Cutoff length:** 2048
- **Learning rate:** 5e-7
## 📝 Evaluation
### Results
| Models | FeTaQA | HiTab | TabFact | FEVEROUS | WikiTQ | WikiSQL | HybridQA | TATQA | AIT-QA | TABMWP | InfoTabs | KVRET | ToTTo | TableGPT subset | TableBench |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Metrics | BLEU | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Micro F1 | BLEU | Acc | ROUGE-L |
| GPT-3.5 | 26.49 | 43.62 | 67.41 | 60.79 | 53.13 | 41.91 | 40.22 | 31.38 | 84.13 | 46.30 | 56.00 | 54.56 | 16.81 | 54.80 | 27.75 |
| GPT-4 | 21.70 | 48.40 | 74.40 | 71.60 | 68.40 | 47.60 | 58.60 | 55.81 | 88.57 | 67.10 | 58.60 | 56.46 | 12.21 | 80.20 | 40.38 |
| base | 15.33 | 32.83 | 58.44 | 66.37 | 43.46 | 20.43 | 32.83 | 26.70 | 82.54 | 39.97 | 48.39 | 50.80 | 13.24 | 53.60 | 23.47 |
| TAMA | 35.37 | 63.51 | 73.82 | 77.39 | 52.88 | 68.31 | 60.86 | 48.47 | 89.21 | 65.09 | 64.54 | 43.94 | 37.94 | 53.60 | 28.60 |
**Note: these results correspond to the [TAMA-vA](https://huggingface.co/MichiganNLP/TAMA-vA) checkpoint. We release the TAMA-vB checkpoint to facilitate future research.**
We bold the best number among the four models and underline the second best.
Please refer to our [paper](https://arxiv.org/abs/2501.14693) for additional details.
#### Metrics
Please refer to our [paper](https://arxiv.org/abs/2501.14693) for additional details.
#### Summary
Notably, as an 8B model, TAMA demonstrates strong table understanding ability, outperforming GPT-3.5 on most of the table understanding benchmarks and even achieving performance on par with or better than GPT-4 on several of them.
## Technical Specifications
### Model Architecture and Objective
We base our model on the [Llama-3.1-8B-Instruct model](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
We instruction-tune the model on a set of 2,600 table instructions.
### Compute Infrastructure
#### Hardware
We conduct our experiments on A40 and A100 GPUs.
#### Software
We leverage the [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) for model training.
## Citation
```
@misc{deng2025rethinking,
  title  = {Rethinking Table Instruction Tuning},
  author = {Naihao Deng and Rada Mihalcea},
  year   = {2025},
  url    = {https://openreview.net/forum?id=GLmqHCwbOJ}
}
```
## Model Card Authors
Naihao Deng
## Model Card Contact
Naihao Deng