ado_all-MiniLM-L6-v2_additive_gcn_h512_o64_cosine_e1024_early

This is a sentence-transformers model created with on2vec, which augments text embeddings with ontological knowledge using Graph Neural Networks.

Model Details

  • Base Text Model: all-MiniLM-L6-v2
    • Text Embedding Dimension: 384
  • Ontology: ado.owl
  • Domain: general
  • Ontology Concepts: 1,963
  • Concept Alignment: 1,963/1,963 (100.0%)
  • Fusion Method: additive
  • GNN Architecture: GCN
  • Structural Embedding Dimension: 1,963
  • Output Embedding Dimension: 64
  • Hidden Dimensions: 512
  • Dropout: 0.0
  • Training Date: 2025-09-19
  • on2vec Version: 0.1.0
  • Source Ontology Size: 5.2 MB
  • Model Size: 103.2 MB
  • Library: on2vec + sentence-transformers

Technical Architecture

This model uses a multi-stage architecture:

  1. Text Encoding: Input text is encoded using the base sentence-transformer model
  2. Ontological Embedding: Pre-trained GNN embeddings capture structural relationships
  3. Fusion Layer: Element-wise addition (additive fusion) of the projected text and ontological embeddings

Embedding Flow:

  • Text: 384 dimensions → 512 hidden → 64 output
  • Structure: 1,963 concepts → GNN → 64 output
  • Fusion: additive → Final embedding
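The flow above can be sketched in plain NumPy. This is a minimal illustration, not the actual on2vec implementation: the random weights, the ReLU nonlinearity, and the exact projection layout are assumptions, but the shapes (384 → 512 → 64 for text, 1,963 → 64 for structure) and the element-wise additive fusion match the model card.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative, randomly initialized projection weights -- the real model
# learns these during fusion training.
W_text_hidden = rng.standard_normal((384, 512)) * 0.05   # text: 384 -> 512 hidden
W_text_out = rng.standard_normal((512, 64)) * 0.05       # text: 512 -> 64 output
W_struct_out = rng.standard_normal((1963, 64)) * 0.05    # structure: 1963 -> 64 output

def fuse(text_emb, struct_emb):
    """Project both embeddings to 64 dims and combine them additively."""
    h = np.maximum(text_emb @ W_text_hidden, 0.0)   # hidden layer (ReLU assumed)
    text_out = h @ W_text_out                       # 64-dim text projection
    struct_out = struct_emb @ W_struct_out          # 64-dim structural projection
    return text_out + struct_out                    # additive fusion: element-wise sum

text_emb = rng.standard_normal(384)     # stand-in for a sentence embedding
struct_emb = rng.standard_normal(1963)  # stand-in for a concept vector

fused = fuse(text_emb, struct_emb)
print(fused.shape)  # (64,)
```

Because additive fusion sums the two projections rather than concatenating them, the final embedding stays at 64 dimensions regardless of the input sizes.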

How It Works

This model combines:

  1. Text Embeddings: Generated using the base sentence-transformer model
  2. Ontological Embeddings: Created by training Graph Neural Networks on OWL ontology structure
  3. Fusion Layer: Combines both embedding types using the specified fusion method

The ontological knowledge helps the model better understand domain-specific relationships and concepts.

Usage

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer('ado_all-MiniLM-L6-v2_additive_gcn_h512_o64_cosine_e1024_early')

# Generate embeddings
sentences = ['Example sentence 1', 'Example sentence 2']
embeddings = model.encode(sentences)

# Compute similarity
from sentence_transformers.util import cos_sim
similarity = cos_sim(embeddings[0], embeddings[1])
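For reference, cos_sim computes standard cosine similarity. An equivalent plain-NumPy version (the vectors here are made-up toy inputs, not real model output):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: a.b / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])
print(cosine_similarity(a, b))  # ≈ 0.5
print(cosine_similarity(a, a))  # ≈ 1.0
```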

Training Process

This model was created using the on2vec pipeline:

  1. Ontology Processing: The OWL ontology was converted to a graph structure
  2. GNN Training: Graph Neural Networks were trained to learn ontological relationships
  3. Text Integration: Base model text embeddings were combined with ontological embeddings
  4. Fusion Training: The fusion layer was trained to optimally combine both embedding types
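Steps 1 and 2 correspond to standard GCN message passing over the ontology graph. Below is a minimal NumPy sketch of one GCN propagation layer on a toy 4-concept graph; the edge list, one-hot features, and random weights are illustrative assumptions, not data from ado.owl:

```python
import numpy as np

# Toy ontology graph: 4 concepts, edges from (child, parent) subclass pairs.
edges = [(1, 0), (2, 0), (3, 1)]
n = 4

# Adjacency with self-loops (A_hat = A + I), symmetrized for GCN.
A_hat = np.eye(n)
for i, j in edges:
    A_hat[i, j] = A_hat[j, i] = 1.0

# Symmetric normalization: D^{-1/2} A_hat D^{-1/2}
deg = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

# One GCN layer: H' = ReLU(A_norm @ H @ W). One-hot concept features would
# explain the 1,963-dim structural input noted above; W is learned in training.
H = np.eye(n)                            # one-hot feature per concept
rng = np.random.default_rng(0)
W = rng.standard_normal((n, 2)) * 0.1    # toy weight matrix, 2 output dims
H_next = np.maximum(A_norm @ H @ W, 0.0)
print(H_next.shape)  # (4, 2)
```

Each propagation step mixes a concept's features with those of its ontological neighbors, which is how structural relationships end up encoded in the embeddings.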

Intended Use

This model is particularly effective for:

  • General domain text processing
  • Tasks requiring understanding of domain-specific relationships
  • Semantic similarity in specialized domains
  • Classification tasks with domain knowledge requirements

Limitations

  • Performance may vary on domains different from the training ontology
  • Ontological knowledge is limited to concepts present in the source OWL file
  • May have higher computational requirements than vanilla text models

Citation

If you use this model, please cite the on2vec framework:

@software{on2vec,
  title={on2vec: Ontology Embeddings with Graph Neural Networks},
  author={David Steinberg},
  url={https://github.com/david4096/on2vec},
  year={2024}
}

Created with on2vec 🧬→🤖
