MaliosDark/sofia-embedding-v1

@@ -1,3 +1,28 @@
 # SOFIA: SOFt Intel Artificial Embedding Model
 **SOFIA** (SOFt Intel Artificial) is a sentence embedding model developed by Zunvra.com that produces text representations for natural language processing applications. Built on `sentence-transformers/all-mpnet-base-v2`, SOFIA is fine-tuned with Low-Rank Adaptation (LoRA) and a dual-loss optimization strategy (cosine similarity and triplet loss) aimed at semantic similarity and information retrieval.
@@ -142,6 +167,41 @@ Based on training metrics and similar models, SOFIA is expected to achieve:
 These expectations are conservative; actual performance may exceed them, depending on task-specific fine-tuning.
 ## Evaluation
 ### Recommended Benchmarks
@@ -207,6 +267,10 @@ Zunvra.com is committed to responsible AI development:
 - transformers >= 4.35.0
 - numpy >= 1.21.0
 ### System Requirements
 - **Minimum**: CPU with 8GB RAM
@@ -268,6 +332,16 @@ clusters = kmeans.fit_predict(embeddings)
 print(clusters)  # [0, 0, 1, 1]
 ```
 ## Deployment
 ### Local Deployment
@@ -278,6 +352,20 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer('MaliosDark/sofia-embedding-v1')
 ```
 ### API Deployment
 ```python
@@ -343,147 +431,3 @@ We welcome contributions to improve SOFIA:
 ---
 *SOFIA: Intelligent embeddings for the future of AI.*
-## Hugging Face Model Card Upgrades
-Your model is live on Hugging Face! It loads correctly as **MPNet + mean pooling + Dense(768→1024)**, matching your configuration files. Here are **drop-in upgrades** to enhance your model card with widgets, metrics, and better discoverability.
-### 1. YAML Front Matter (Required)
-Add this to the **very top** of your README.md (before the title) to enable Hugging Face features:
-```yaml
----
-library_name: sentence-transformers
-license: apache-2.0
-pipeline_tag: sentence-similarity
-tags:
-  - embeddings
-  - sentence-transformers
-  - mpnet
-  - lora
-  - triplet-loss
-  - cosine-similarity
-  - retrieval
-  - mteb
-language:
-  - en
-datasets:
-  - sentence-transformers/stsb
-  - paws
-  - banking77
-  - mteb/nq
-widget:
-  - text: "Hello world"
-  - text: "How are you?"
----
-```
-### 2. License File (Required)
-Create a `LICENSE` file in your repo root with the full Apache 2.0 text. Hugging Face will auto-detect it.
-### 3. MTEB Metrics Block (Recommended)
-To display performance metrics on your model card:
-**Step A: Run evaluation locally**
-```bash
-python -c "
-from mteb import MTEB
-from sentence_transformers import SentenceTransformer
-model = SentenceTransformer('MaliosDark/sofia-embedding-v1')
-tasks = ['STS12', 'STS13', 'STS14', 'STS15', 'STS16', 'STSBenchmark']
-MTEB(tasks=tasks).run(model, output_folder='./mteb_results')
-"
-```
-**Step B: Add metrics placeholder to README**
-```markdown
-<!-- METRICS_START -->
-_TBD_
-<!-- METRICS_END -->
-```
-**Step C: Inject results automatically**
-```bash
-python -c "
-import json, glob, re
-from pathlib import Path
-results = []
-for f in glob.glob('mteb_results/*/*/results.json'):
-    data = json.load(open(f))
-    task = data['mteb_dataset_name']
-    main = data.get('main_score')
-    pearson = data.get('test', {}).get('cos_sim', {}).get('pearson')
-    spearman = data.get('test', {}).get('cos_sim', {}).get('spearman')
-    results.append((task, main, pearson, spearman))
-lines = ['model-index:', '- name: sofia-embedding-v1', '  results:']
-for task, main, p, s in sorted(results):
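-    # Compare against None so a legitimate score of 0.0 is not reported as null
-    m = f'{main:.4f}' if main is not None else 'null'
-    pe = f'{p:.4f}' if p is not None else 'null'
-    sp = f'{s:.4f}' if s is not None else 'null'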
-    lines.extend([
-        f'  - task: {{type: sts, name: STS}}',
-        f'    dataset: {{name: {task}, type: mteb/{task}}}',
-        '    metrics:',
-        f'    - type: main_score',
-        f'      value: {m}',
-        f'    - type: pearson',
-        f'      value: {pe}',
-        f'    - type: spearman',
-        f'      value: {sp}'
-    ])
-block = '```\n' + '\n'.join(lines) + '\n```'
-readme = Path('README.md').read_text()
-readme = re.sub(r'<!-- METRICS_START -->.*?<!-- METRICS_END -->',
-                f'<!-- METRICS_START -->\n{block}\n<!-- METRICS_END -->',
-                readme, flags=re.S)
-Path('README.md').write_text(readme)
-print('Metrics injected into README!')
-"
-```
-### 4. Inference Configuration (Already Correct)
-Your model correctly outputs 1024-dimensional embeddings with mean pooling. No changes needed.
-### 5. Prompted Retrieval Mode (Optional)
-For better zero-shot retrieval, update `config_sentence_transformers.json`:
-```json
-{
-  "__version__": { "sentence_transformers": "5.1.0" },
-  "model_type": "SentenceTransformer",
-  "prompts": { "query": "Query: ", "document": "Document: " },
-  "default_prompt_name": null,
-  "similarity_fn_name": "cosine"
-}
-```
-### 6. Usage Examples
-Add these minimal code snippets to your README:
-**Python:**
-```python
-from sentence_transformers import SentenceTransformer, util
-model = SentenceTransformer("MaliosDark/sofia-embedding-v1")
-sentences = ["Hello world", "How are you?"]
-embeddings = model.encode(sentences, normalize_embeddings=True)
-similarity = util.cos_sim(embeddings[0], embeddings[1])
-print(similarity.item())  # ~0.9
-```
-**JavaScript/Node.js:**
-```javascript
-// Sketch using Transformers.js; assumes ONNX weights are published for this model.
-import { pipeline } from "@huggingface/transformers";
-const extractor = await pipeline("feature-extraction", "MaliosDark/sofia-embedding-v1");
-const embeddings = await extractor(["hello", "world"], { pooling: "mean", normalize: true });
-console.log(embeddings.dims); // [2, hidden_size]
-```
-### Ready-to-Use README Template
-Want a complete PR-ready README with all upgrades applied? Let me know and I'll generate it based on your current model card.
-[View on Hugging Face](https://huggingface.co/MaliosDark/sofia-embedding-v1)

+---
+library_name: sentence-transformers
+license: apache-2.0
+pipeline_tag: sentence-similarity
+tags:
+  - embeddings
+  - sentence-transformers
+  - mpnet
+  - lora
+  - triplet-loss
+  - cosine-similarity
+  - retrieval
+  - mteb
+language:
+  - en
+datasets:
+  - sentence-transformers/stsb
+  - paws
+  - banking77
+  - mteb/nq
+widget:
+  - text: "Hello world"
+  - text: "How are you?"
+---
 # SOFIA: SOFt Intel Artificial Embedding Model
 **SOFIA** (SOFt Intel Artificial) is a sentence embedding model developed by Zunvra.com that produces text representations for natural language processing applications. Built on `sentence-transformers/all-mpnet-base-v2`, SOFIA is fine-tuned with Low-Rank Adaptation (LoRA) and a dual-loss optimization strategy (cosine similarity and triplet loss) aimed at semantic similarity and information retrieval.
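The dual-loss strategy named in the description can be illustrated with a minimal sketch. The model card does not state the margin, the weighting between the two terms, or the exact distance function, so `margin`, `alpha`, and the cosine-distance formulation below are assumptions chosen for illustration; this is not SOFIA's training code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(u, v, target):
    """Squared error between the predicted cosine similarity and a gold score."""
    return (cosine(u, v) - target) ** 2


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss on cosine distance: the positive should be closer to the
    anchor than the negative by at least `margin` (hypothetical value)."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def dual_loss(pair, triplet, alpha=0.5):
    """Weighted sum of both terms; `alpha` is an assumed mixing weight."""
    u, v, target = pair
    anchor, positive, negative = triplet
    return (alpha * cosine_similarity_loss(u, v, target)
            + (1.0 - alpha) * triplet_loss(anchor, positive, negative))


# A well-separated triplet and a perfectly scored pair contribute no loss.
pair = ([1.0, 0.0], [1.0, 0.0], 1.0)
good_triplet = ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
bad_triplet = ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(dual_loss(pair, good_triplet), dual_loss(pair, bad_triplet))
```

In practice the two terms would be computed on batches of embeddings inside the training loop; the scalar version here only shows how each term responds to a good and a bad triplet.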
 These expectations are conservative; actual performance may exceed them, depending on task-specific fine-tuning.
+<!-- METRICS_START -->
+```
+model-index:
+- name: sofia-embedding-v1
+  results:
+  - task: {type: sts, name: STS}
+    dataset: {name: STS12, type: mteb/STS12}
+    metrics:
+    - type: main_score
+      value: 0.6064
+    - type: pearson
+      value: 0.6850
+    - type: spearman
+      value: 0.6064
+  - task: {type: sts, name: STS}
+    dataset: {name: STS13, type: mteb/STS13}
+    metrics:
+    - type: main_score
+      value: 0.7340
+    - type: pearson
+      value: 0.7374
+    - type: spearman
+      value: 0.7340
+  - task: {type: sts, name: STS}
+    dataset: {name: BIOSSES, type: mteb/BIOSSES}
+    metrics:
+    - type: main_score
+      value: 0.6387
+    - type: pearson
+      value: 0.6697
+    - type: spearman
+      value: 0.6387
+```
+<!-- METRICS_END -->
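For readers unfamiliar with the `pearson` and `spearman` entries: on STS tasks these are correlations between the model's cosine similarity for each sentence pair and the human-assigned gold score. In every entry above, `main_score` matches the `spearman` value. The sketch below shows how the two correlations differ, using made-up scores (the `gold` and `predicted` lists are hypothetical, not SOFIA outputs).

```python
import math


def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold scores (0-5 scale) and predicted cosine similarities.
gold = [0.0, 1.0, 2.0, 5.0]
predicted = [0.1, 0.2, 0.3, 0.9]
print(f"pearson={pearson(gold, predicted):.4f} spearman={spearman(gold, predicted):.4f}")
```

Because Spearman depends only on ordering, a model that ranks pairs correctly scores 1.0 even when its similarities are not linearly related to the gold scores, which is why the two numbers can diverge as they do for STS12.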
 ## Evaluation
 ### Recommended Benchmarks
 - transformers >= 4.35.0
 - numpy >= 1.21.0
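A quick way to confirm an environment meets the minimum versions listed above is sketched below. The `MINIMUMS` table copies only the two requirements visible here, and the helper names `parse` and `check` are hypothetical; the version parsing is deliberately simple and ignores pre-release ordering.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the requirements list above.
MINIMUMS = {"transformers": "4.35.0", "numpy": "1.21.0"}


def parse(text):
    """Turn '4.35.0' into (4, 35, 0), keeping only the leading digits of
    each dotted component and stopping at the first component without any."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(minimums):
    """Report 'ok', 'missing', or 'too old (<installed>)' for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse(installed) >= parse(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


for package, status in check(MINIMUMS).items():
    print(f"{package}: {status}")
```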
+### License
+SOFIA is released under the Apache License 2.0. A copy of the license is included in the repository as `LICENSE`.
 ### System Requirements
 - **Minimum**: CPU with 8GB RAM
 print(clusters)  # [0, 0, 1, 1]
 ```
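+### JavaScript/Node.js Usage
+```javascript
+// Sketch using Transformers.js (`@huggingface/transformers`); assumes ONNX weights are published for this model.
+import { pipeline } from "@huggingface/transformers";
+const extractor = await pipeline("feature-extraction", "MaliosDark/sofia-embedding-v1");
+// Mean-pool and L2-normalize the token embeddings of each input sentence.
+const embeddings = await extractor(["hello", "world"], { pooling: "mean", normalize: true });
+console.log(embeddings.dims); // [2, hidden_size]; the Dense(768→1024) layer may not be applied by this pipeline
+```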
 ## Deployment
 ### Local Deployment
 model = SentenceTransformer('MaliosDark/sofia-embedding-v1')
 ```
+### Hugging Face Hub Deployment
+SOFIA is available on the Hugging Face Hub for easy integration:
+```python
+from sentence_transformers import SentenceTransformer
+# Load the model from the Hugging Face Hub (downloaded and cached on first use)
+model = SentenceTransformer('MaliosDark/sofia-embedding-v1')
+# The model page hosts an interactive widget for quick testing:
+# https://huggingface.co/MaliosDark/sofia-embedding-v1
+```
 ### API Deployment
 ```python
 ---
 *SOFIA: Intelligent embeddings for the future of AI.*