LatentCOMP checkpoint (Llama-3.1-8B-Instruct_cocom_pretrain_both_ce1_compr64, best)
COCOM/PISCO-style compression model for RAG QA (LoRA adapters + memory-token embeddings).
- Backbone: meta-llama/Llama-3.1-8B-Instruct
- Mode: pretrain
- Training form: both
- Compression rate: x64 (docs up to 256 tokens -> 4 memory embeddings per doc)
This checkpoint directory stores adapters + small extras; the backbone meta-llama/Llama-3.1-8B-Instruct is loaded from Hugging Face at runtime.
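As a rough sanity check on the x64 rate, the number of memory embeddings per document is just its token count divided by the compression rate. A minimal sketch (the exact rounding the model uses is an assumption here):

```python
import math

COMPRESSION_RATE = 64   # x64, as configured for this checkpoint
MAX_DOC_TOKENS = 256    # documents are expected to be at most 256 tokens

def num_memory_embeddings(doc_tokens: int, rate: int = COMPRESSION_RATE) -> int:
    # Assumed rounding: ceil, so a partial block still gets one embedding.
    return math.ceil(doc_tokens / rate)

print(num_memory_embeddings(MAX_DOC_TOKENS))  # 4 embeddings for a 256-token doc
print(num_memory_embeddings(100))             # 2 embeddings for a shorter doc (assumed)
```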
Quickstart (end-to-end)
```python
from modelling_pisco import COCOM

# If you pushed this folder to the Hub, use the repo id:
# repo_or_path = "jeongseokoh/Llama-3.1-8B-Instruct_cocom_pretrain_both_ce1_compr64_best"
#
# Otherwise, load locally from this checkpoint folder:
repo_or_path = "."

model = COCOM.from_pretrained(repo_or_path).to("cuda")

documents = [[
    "Doc A text ...",
    "Doc B text ...",
]]
questions = ["Question text ..."]

out = model.generate_from_text(
    questions=questions,
    documents=documents,
    max_new_tokens=64,
)
print(out)
```
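The nested structure above suggests that documents is a list with one inner list of passages per question, aligned by index. Under that assumption, a multi-question batch would look like this (continuing from the quickstart; question texts are placeholders):

```python
# Assumption: one inner document list per question, aligned by index.
questions = [
    "Question about Doc A ...",
    "Question about Doc B ...",
]
documents = [
    ["Doc A text ..."],                    # passages for question 0
    ["Doc B text ...", "Doc C text ..."],  # passages for question 1
]

out = model.generate_from_text(
    questions=questions,
    documents=documents,
    max_new_tokens=64,
)
print(out)  # expected: one generated answer per question
```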
Compress once, reuse many times
```python
embs = model.compress_documents(documents=documents[0])

out = model.generate_from_compressed_documents_and_questions(
    questions=questions,
    compressed_documents=embs,
    max_new_tokens=64,
)
```
Tip: crop docs to ~256 tokens before compression for best speed/quality.
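One way to enforce that budget is to truncate with the backbone tokenizer before passing documents in. A sketch using transformers (the exact preprocessing the checkpoint was trained with is not documented here, so treat this as an assumption):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

def crop(text: str, max_tokens: int = 256) -> str:
    # Tokenize without special tokens, keep the first max_tokens, decode back to text.
    ids = tok.encode(text, add_special_tokens=False)[:max_tokens]
    return tok.decode(ids)

documents = [[crop(d) for d in doc_list] for doc_list in documents]
```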
Model tree for jeongseokoh/Llama-3.1-8B-Instruct_cocom_pretrain_both_ce1_compr64_best
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct