CLaRa-7B-Base (Compression-16 & 128)

The CLaRa-7B-Base model is our foundational unified RAG model with built-in semantic document compression (16× and 128×).
It provides a base compressor + generator capable of producing answers directly from compressed document representations.
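
To give a feel for what the two compression settings mean in practice, the sketch below estimates how many compressed representations a document would occupy. It is only back-of-the-envelope arithmetic: it assumes the ratio is input tokens per compressed representation and that partial chunks round up. The helper name `compressed_length` is hypothetical and is not part of the CLaRa code.

```python
import math


def compressed_length(num_tokens: int, ratio: int) -> int:
    """Estimate the number of compressed representations for a document.

    Assumption (not taken from the CLaRa code): a ratio of R means roughly
    R input tokens map to one compressed representation, rounding up.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.ceil(num_tokens / ratio)


if __name__ == "__main__":
    # Compare the two compression settings for a 256-token document.
    for ratio in (16, 128):
        print(f"{ratio}x: {compressed_length(256, ratio)} compressed representations")
```

Under these assumptions, a 256-token document shrinks to 16 representations at 16× and to 2 at 128×, which is why the higher ratio trades more context capacity for a coarser summary of each document.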

Training recipe: Trained using QA-guided semantic compression and paraphrase consistency objectives.
Benchmarks: Strong baseline performance across multi-hop QA tasks under a 16× compression ratio.


More details and usage examples:

Paper: CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
GitHub: https://github.com/apple/ml-clara


Example Usage

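from transformers import AutoModel

# Load the compression-16 checkpoint from a local path (adjust to where the
# weights are stored). trust_remote_code=True allows the custom model code
# shipped with the checkpoint to run.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-Base/compression-16",
    trust_remote_code=True
).to("cuda")

# A batch of document lists (here, a single list of three documents).
documents = [
    [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of orchids native to Mexico and Guatemala...",
        "Alsobia is a genus of flowering plants native to Mexico and Central America..."
    ]
]

# A batch of questions (here, a single empty string).
questions = [""]

# Generate directly from the compressed document representations.
out = unirag.generate_from_paraphrase(
    questions=questions,
    documents=documents,
    max_new_tokens=64
)

print("Generated answer:", out)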
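
The example above passes `questions` as a list of strings and `documents` as a list of lists of strings. The sketch below is a small pre-flight check for that shape. It assumes, without confirmation from the CLaRa code, that the two lists are expected to align one-to-one; `check_batch` is a hypothetical helper and does not exist in the library.

```python
from typing import List


def check_batch(questions: List[str], documents: List[List[str]]) -> int:
    """Validate the input shapes used in the example and return the batch size.

    Assumption (not confirmed by the CLaRa code): each question is paired
    with exactly one inner list of documents.
    """
    if len(questions) != len(documents):
        raise ValueError(
            f"got {len(questions)} questions but {len(documents)} document lists"
        )
    for i, docs in enumerate(documents):
        if not docs:
            raise ValueError(f"document list {i} is empty")
        if not all(isinstance(d, str) for d in docs):
            raise TypeError(f"document list {i} contains a non-string entry")
    return len(questions)


if __name__ == "__main__":
    # Same shape as the example: one (empty) question, three documents.
    batch_size = check_batch([""], [["doc a", "doc b", "doc c"]])
    print("batch size:", batch_size)
```

Running a check like this before calling the model can surface a mismatched batch early, rather than as a shape error inside generation.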