LatentCOMP checkpoint (Llama-3.1-8B-Instruct_cocom_pretrain_both_ce1_compr64, best)

COCOM/PISCO-style document-compression model for RAG question answering (LoRA adapters + memory-token embeddings).

  • Backbone: meta-llama/Llama-3.1-8B-Instruct
  • Mode: pretrain
  • Training form: both
  • Compression rate: x64 (docs up to 256 tokens -> 4 memory embeddings per doc)

This checkpoint directory stores adapters + small extras; the backbone meta-llama/Llama-3.1-8B-Instruct is loaded from Hugging Face at runtime.
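Note: the backbone is gated on Hugging Face, so the runtime environment needs a token that has been granted access to meta-llama/Llama-3.1-8B-Instruct. A minimal sketch (running huggingface-cli login once, or exporting HF_TOKEN, works just as well):

from huggingface_hub import login

login()  # prompts for a token with access to the gated Llama-3.1 backbone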

Quickstart (end-to-end)

from modelling_pisco import COCOM

# If you pushed this folder to the Hub, use the repo id:
# repo_or_path = "jeongseokoh/Llama-3.1-8B-Instruct_cocom_pretrain_both_ce1_compr64_best"
#
# Otherwise, load locally from this checkpoint folder:
repo_or_path = "."

model = COCOM.from_pretrained(repo_or_path).to("cuda")

# One inner list of documents per question (batch size 1 here).
documents = [[
    "Doc A text ...",
    "Doc B text ..."
]]
questions = ["Question text ..."]

# End-to-end: documents are compressed internally and answers are generated in one call.
out = model.generate_from_text(
    questions=questions,
    documents=documents,
    max_new_tokens=64,
)
print(out)

Compress once, reuse many times

# Compress the document list for the first (and only) question once.
embs = model.compress_documents(documents=documents[0])
out = model.generate_from_compressed_documents_and_questions(
    questions=questions,
    compressed_documents=embs,
    max_new_tokens=64,
)
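To reuse the compressed documents across runs (not just within one process), one option is to cache them on disk; a minimal sketch, assuming embs is a tensor or a dict of tensors that torch.save can serialize:

from pathlib import Path
import torch

cache_path = Path("compressed_docs.pt")
if cache_path.exists():
    # Reuse embeddings compressed in an earlier run.
    embs = torch.load(cache_path)
else:
    # Compress once and persist for later runs.
    embs = model.compress_documents(documents=documents[0])
    torch.save(embs, cache_path)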

Tip: crop docs to ~256 tokens before compression for best speed/quality.
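One way to do the cropping, sketched with the backbone tokenizer loaded separately (the COCOM object may already expose its own tokenizer; no attribute names are assumed here):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

def crop(text, max_tokens=256):
    # Truncate to the 256-token window assumed by the x64 compressor.
    ids = tokenizer(text, truncation=True, max_length=max_tokens, add_special_tokens=False)["input_ids"]
    return tokenizer.decode(ids)

documents = [[crop(d) for d in docs] for docs in documents]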
