C-EBERT

A multi-task model to extract causal attribution from German texts.

Model details

  • Model architecture: EuroBERT-610m with two custom classification heads (one for token-span labeling, one for relation classification).
  • Fine-tuned on: A custom corpus focused on environmental causal attribution in German.
    Task                        Output Type                    Labels / Classes
    1. Token Classification     Sequence Labeling (BIO)        5 span labels (O, B-INDICATOR, I-INDICATOR, B-ENTITY, I-ENTITY)
    2. Relation Classification  Sentence-Pair Classification   14 relation labels (e.g., MONO_POS_CAUSE, DIST_NEG_EFFECT, INTERDEPENDENCY, NO_RELATION)
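
For orientation, the two heads amount to two linear classifiers over a shared encoder. The following sketch is illustrative only (class and variable names are hypothetical, and CLS pooling for the relation head is an assumption); the actual implementation ships with the causalbert library:

import torch.nn as nn
from transformers import AutoModel

class DualHeadCausalModel(nn.Module):
    """Illustrative dual-head setup: a token-level head and a sentence-level head."""

    def __init__(self, base_model="EuroBERT/EuroBERT-610m",
                 num_span_labels=5, num_relation_labels=14):
        super().__init__()
        # EuroBERT ships custom modeling code, hence trust_remote_code=True
        self.encoder = AutoModel.from_pretrained(base_model, trust_remote_code=True)
        hidden = self.encoder.config.hidden_size
        # Head 1: per-token BIO span labels (O, B-/I-INDICATOR, B-/I-ENTITY)
        self.span_head = nn.Linear(hidden, num_span_labels)
        # Head 2: one of the 14 relation labels for the whole sentence
        self.relation_head = nn.Linear(hidden, num_relation_labels)

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        span_logits = self.span_head(states)                 # (batch, seq_len, 5)
        relation_logits = self.relation_head(states[:, 0])   # (batch, 14)
        return span_logits, relation_logits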

Usage

Inference requires the custom causalbert library; install it from the project repository. Once it is installed, run inference like so:

from causalbert.infer import load_model, sentence_analysis

# NOTE: The model path accepts either a local directory or a Hugging Face Hub ID.
model, tokenizer, config, device = load_model("pdjohn/C-EBERT")

# Analyze a batch of sentences
# ("Car traffic causes bee deaths." / "Noise is the reason for stress.")
sentences = ["Autoverkehr verursacht Bienensterben.", "Lärm ist der Grund für Stress."]

all_results = sentence_analysis(
    model, 
    tokenizer, 
    config, 
    sentences, 
    batch_size=8
)

# The result is a list with one dictionary per sentence, each containing
# 'token_predictions' and 'derived_relations'.
print(all_results[0]['derived_relations'])
# Example Output:
# [(['Autoverkehr', 'verursacht'], ['Bienensterben']), {'label': 'MONO_POS_CAUSE', 'confidence': 0.954}]
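
Since sentence_analysis returns one entry per input sentence, the batch results can be paired back with the inputs (a minimal continuation of the snippet above):

for sentence, result in zip(sentences, all_results):
    # 'derived_relations' pairs the extracted spans with a label and confidence
    print(sentence, "->", result['derived_relations'])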

Training

  • Base model: EuroBERT/EuroBERT-610m
  • Training Parameters:
    • Epochs: 8
    • Learning Rate: 1e-4
    • Batch size: 32
    • PEFT/LoRA: Enabled with r = 16. See train.py for the full configuration details.
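
As a rough reconstruction, the listed hyperparameters map onto a peft/transformers configuration like the one below. Values not stated in the card (alpha, dropout, target modules, output path) are assumptions; train.py remains the authoritative source:

from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                                 # rank, as listed above
    lora_alpha=32,                        # assumption: not stated in the card
    lora_dropout=0.05,                    # assumption
    target_modules=["q_proj", "v_proj"],  # assumption: typical attention projections
)

training_args = TrainingArguments(
    output_dir="c-ebert",                 # hypothetical path
    num_train_epochs=8,                   # from the card
    learning_rate=1e-4,                   # from the card
    per_device_train_batch_size=32,       # from the card
    bf16=True,                            # matches the BF16 weights on the Hub
)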