---
library_name: peft
base_model: michiyasunaga/BioLinkBERT-large
tags:
  - medical
  - cardiology
  - embeddings
  - domain-adaptation
  - lora
  - sentence-transformers
  - sentence-similarity
language:
  - en
license: apache-2.0
---

# CardioEmbed-BioLinkBERT

*Domain-specialized cardiology text embeddings using a LoRA-adapted BioLinkBERT-large.*

This is the best-performing model from our comparative study of 10 embedding architectures for clinical cardiology.

## Performance

| Metric | Score |
|---|---|
| Separation score | 0.510 |
| Similar-pair avg. similarity | 0.811 |
| Different-pair avg. similarity | 0.301 |
| Throughput | 143.5 embeddings/sec |
| Memory | 1.51 GB |
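The reported numbers are consistent with the separation score being the gap between the two pair averages (0.811 − 0.301 = 0.510). A minimal sketch of that computation, assuming `sim_similar` and `sim_different` hold per-pair similarity scores (hypothetical names and illustrative values, not the study's data):

```python
import numpy as np

# Hypothetical per-pair similarity scores (illustrative values only)
sim_similar = np.array([0.83, 0.79, 0.81])    # pairs that should match
sim_different = np.array([0.28, 0.33, 0.30])  # pairs that should not

# Separation = mean similarity of similar pairs minus mean of different pairs
separation = sim_similar.mean() - sim_different.mean()
print(f"separation: {separation:.3f}")
```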

## Usage

```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModel.from_pretrained("michiyasunaga/BioLinkBERT-large")
tokenizer = AutoTokenizer.from_pretrained("michiyasunaga/BioLinkBERT-large")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "richardyoung/CardioEmbed-BioLinkBERT")
model.eval()

# Embed a cardiology sentence by mean-pooling the final hidden states
text = "Atrial fibrillation with rapid ventricular response"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state.mean(dim=1)
```
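To compare two texts, the embeddings can be scored with cosine similarity. The `embed` helper below is a hypothetical wrapper around the snippet above; it adds mask-aware mean pooling so padded positions don't skew a batch average, which is an assumption beyond the simple mean shown in the card:

```python
import torch
import torch.nn.functional as F

def embed(texts):
    # Hypothetical helper: mean pooling weighted by the attention mask,
    # so padding tokens are excluded when batch texts differ in length.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    mask = inputs["attention_mask"].unsqueeze(-1)            # (batch, seq, 1)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    return summed / mask.sum(dim=1)

emb = embed([
    "Atrial fibrillation with rapid ventricular response",
    "Irregularly irregular rhythm with tachycardia",
])
print(f"cosine similarity: {F.cosine_similarity(emb[0], emb[1], dim=0).item():.3f}")
```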

## Training

- **Training data:** 106,535 cardiology text pairs from medical textbooks
- **Method:** LoRA fine-tuning (r=16, alpha=32); a configuration sketch follows below
- **Loss:** Multiple Negatives Ranking Loss (InfoNCE)
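For reference, a minimal sketch of reproducing the adapter setup with peft, assuming the stated r=16 and alpha=32; `target_modules`, dropout, and bias are illustrative assumptions, not values confirmed by the card:

```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

base_model = AutoModel.from_pretrained("michiyasunaga/BioLinkBERT-large")

# r=16 and lora_alpha=32 come from the card; the remaining fields are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query", "value"],  # assumed attention projections
    lora_dropout=0.1,                   # assumed
    bias="none",                        # assumed
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```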

## Citation

```bibtex
@article{young2024comparative,
  title={Comparative Analysis of LoRA-Adapted Embedding Models for Clinical Cardiology Text Representation},
  author={Young, Richard J and Matthews, Alice M},
  journal={arXiv preprint},
  year={2024}
}
```

## Related Models

This model is part of the CardioEmbed family. See richardyoung/CardioEmbed for the other models in the study.