
Model Card for SignAlignLM

Model Details

We introduce the first text-based and multimodal LLMs capable of sign language processing called SignAlignLM, and propose new prompting and fine-tuning strategies incorporating sign linguistic rules and conventions. We show that LLMs can be generalized interfaces for both spoken and signed languages if trained with a multitasking paradigm.

Tasks

As RWTH-PHOENIX-14T is a parallel corpus between spoken German and DGS, most previous research has focused on translation tasks between these languages. In this paper, we focus on translating DGS to German (broadly considered a sign understanding or recognition task) and German to DGS (broadly considered sign generation). Beyond these, we introduce further tasks to test generalization. Specifically, we consider:

  • (G2T) DGS Gloss to German Text: a text-based translation task from textual intermediary representations of DGS (glosses) to German text.
  • (T2G) German Text to DGS Gloss: the inverse of (G2T); also text-based.
  • (V2T) DGS Videos to German Text: a multimodal task where the input is a video of a signer signing in DGS, and the output is German text.
  • (I-G2T) Intensified DGS Gloss to German Text: a text-based task with augmented DGS tokens. Additional intensifier symbols are wrapped around glosses to mark intensity visible in the video that is not captured in traditional gloss representations (Inan et al., 2022).
  • (T2I-G) German Text to Intensified DGS Gloss: the inverse problem of (I-G2T), still text-based.
  • (G2E) DGS Gloss to English Text: a novel cross-modal translation task, where DGS glosses from the German Sign Language family are translated into English text from the spoken Indo-European language family. Without any pretraining, this is a difficult test of generalization and of composing contextualized meanings across spoken and signed languages.

To test generalizability and in-context learning, G2T is the only DGS task we use for any finetuning (see § 4.2). All the other tasks are used to evaluate the models’ performance.
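
As a rough illustration of the text-based task format, a single G2T instance pairs a sequence of DGS glosses with its German translation. The sketch below is hypothetical: the gloss sequence, translation, and prompt wording are invented for illustration and are not taken from RWTH-PHOENIX-14T or from the paper's actual prompt templates.

```python
# Hypothetical G2T (DGS Gloss -> German Text) instance and prompt.
# Glosses, translation, and prompt template are illustrative only; they do not
# come from RWTH-PHOENIX-14T or from SignAlignLM's training setup.
example = {
    "task": "G2T",
    "glosses": "MORGEN NORDWEST REGEN",           # DGS glosses (uppercase convention)
    "target": "Morgen regnet es im Nordwesten.",  # German translation
}

prompt = (
    "Translate the following sequence of DGS glosses into German.\n"
    f"Glosses: {example['glosses']}\n"
    "German:"
)
print(prompt)
```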

Model Description

  • Developed by: Mert Inan
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: [More Information Needed]
  • Model type: LLaMA adapter
  • Language(s) (NLP): American Sign Language, German Sign Language, English, German
  • License: [More Information Needed]
  • Finetuned from model [optional]: LLaMA-2 7B

Model Sources [optional]

  • Paper: https://aclanthology.org/2025.findings-acl.190/

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
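
The official starter code has not been added to this card yet. Until it is, the snippet below is a minimal, hedged sketch of how a PEFT adapter on top of LLaMA-2 7B is typically loaded with Hugging Face Transformers; the base-model id (meta-llama/Llama-2-7b-hf), the adapter repo id (merterm/signAlignLM), and the prompt wording are assumptions, not details confirmed by this card.

```python
# Minimal sketch: load the base LLaMA-2 7B model and attach the PEFT adapter.
# Assumptions (not confirmed by this card): base checkpoint "meta-llama/Llama-2-7b-hf",
# adapter repo "merterm/signAlignLM", and an illustrative prompt format.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"   # assumed base model (card lists "LLaMA-2 7B")
adapter_id = "merterm/signAlignLM"     # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Illustrative gloss-to-text (G2T) style query; the actual prompt template
# used during fine-tuning is not documented in this card.
prompt = "Translate the following DGS glosses into German: MORGEN NORDWEST REGEN"
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```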

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

@inproceedings{inan-etal-2025-signalignlm,
    title = "{S}ign{A}lign{LM}: Integrating Multimodal Sign Language Processing into Large Language Models",
    author = "Inan, Mert  and
      Sicilia, Anthony  and
      Alikhani, Malihe",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.190/",
    doi = "10.18653/v1/2025.findings-acl.190",
    pages = "3691--3706",
    ISBN = "979-8-89176-256-5",
    abstract = "Deaf and Hard-of-Hearing (DHH) users increasingly utilize Large Language Models (LLMs), yet face significant challenges due to these models' limited understanding of sign language grammar, multimodal sign inputs, and Deaf cultural contexts. Further, current approaches that try to address these limitations, frequently reduce sign language processing (SLP) to traditional translation tasks, neglecting the multimodal and linguistic complexity inherent in signed languages. In this paper, we present an empirical investigation informed by learning theory into natively integrating sign language support within LLMs, directly addressing the documented needs of DHH users. We introduce the first text-based and multimodal LLMs capable of sign language processing called SignAlignLM, and propose new prompting and fine-tuning strategies incorporating sign linguistic rules and conventions. We show that LLMs can be generalized interfaces for both spoken and signed languages if trained with a multitasking paradigm. Our code and model checkpoints are open-source."
}

APA:

Inan, M., Sicilia, A., & Alikhani, M. (2025). SignAlignLM: Integrating multimodal sign language processing into large language models. In W. Che, J. Nabende, E. Shutova, & M. T. Pilehvar (Eds.), Findings of the Association for Computational Linguistics: ACL 2025 (pp. 3691–3706). Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.findings-acl.190

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

Mert Inan

Model Card Contact

Mert Inan

Framework versions

  • PEFT 0.7.1