MaliosDark committed (verified)
Commit 64b1099 · 1 Parent(s): 8a51e3a

Update README.md

Files changed (1):
  1. README.md (+88, -144)

README.md CHANGED
@@ -1,3 +1,28 @@
+ ---
+ library_name: sentence-transformers
+ license: apache-2.0
+ pipeline_tag: sentence-similarity
+ tags:
+ - embeddings
+ - sentence-transformers
+ - mpnet
+ - lora
+ - triplet-loss
+ - cosine-similarity
+ - retrieval
+ - mteb
+ language:
+ - en
+ datasets:
+ - sentence-transformers/stsb
+ - paws
+ - banking77
+ - mteb/nq
+ widget:
+ - text: "Hello world"
+ - text: "How are you?"
+ ---
+
 # SOFIA: SOFt Intel Artificial Embedding Model

 **SOFIA** (SOFt Intel Artificial) is a sentence embedding model developed by Zunvra.com that provides high-fidelity text representations for natural language processing applications. Built on `sentence-transformers/all-mpnet-base-v2`, it is fine-tuned with Low-Rank Adaptation (LoRA) and a dual-loss objective (cosine similarity plus triplet loss) for semantic comprehension and information retrieval.
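The dual-loss fine-tuning named in the description can be sketched with the sentence-transformers multi-objective training API. A minimal sketch, assuming placeholder data and hyperparameters (this is not SOFIA's actual training script, and the LoRA adapter step is omitted):

```python
# Illustrative dual-loss setup; data, batch sizes, and epochs are placeholders.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Scored pairs feed CosineSimilarityLoss; (anchor, positive, negative) triplets feed TripletLoss.
pairs = [InputExample(texts=["A man is eating food.", "A man eats something."], label=0.9)]
triplets = [InputExample(texts=["query text", "relevant passage", "unrelated passage"])]

pair_loader = DataLoader(pairs, shuffle=True, batch_size=16)
triplet_loader = DataLoader(triplets, shuffle=True, batch_size=16)

# fit() round-robins over the two objectives, one batch from each per step.
model.fit(
    train_objectives=[
        (pair_loader, losses.CosineSimilarityLoss(model)),
        (triplet_loader, losses.TripletLoss(model)),
    ],
    epochs=1,
    warmup_steps=100,
)
```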
@@ -142,6 +167,41 @@ Based on training metrics and similar models, SOFIA is expected to achieve:

 These expectations are conservative; actual performance may exceed them with task-specific fine-tuning.

+ <!-- METRICS_START -->
+ ```
+ model-index:
+ - name: sofia-embedding-v1
+   results:
+   - task: {type: sts, name: STS}
+     dataset: {name: STS12, type: mteb/STS12}
+     metrics:
+     - type: main_score
+       value: 0.6064
+     - type: pearson
+       value: 0.6850
+     - type: spearman
+       value: 0.6064
+   - task: {type: sts, name: STS}
+     dataset: {name: STS13, type: mteb/STS13}
+     metrics:
+     - type: main_score
+       value: 0.7340
+     - type: pearson
+       value: 0.7374
+     - type: spearman
+       value: 0.7340
+   - task: {type: sts, name: STS}
+     dataset: {name: BIOSSES, type: mteb/BIOSSES}
+     metrics:
+     - type: main_score
+       value: 0.6387
+     - type: pearson
+       value: 0.6697
+     - type: spearman
+       value: 0.6387
+ ```
+ <!-- METRICS_END -->
+
 ## Evaluation

 ### Recommended Benchmarks
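Two notes on the added block: Hugging Face only parses `model-index:` metadata from the YAML front matter, so inside the README body it renders as a plain code block; and scores like these can be sanity-checked locally. A minimal sketch, assuming the standard `sentence-transformers/stsb` columns (`sentence1`, `sentence2`, `score`) — this is not the official evaluation harness:

```python
# Hedged sketch: compute an STS-style Spearman score for SOFIA locally.
import numpy as np
from datasets import load_dataset
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MaliosDark/sofia-embedding-v1")
ds = load_dataset("sentence-transformers/stsb", split="validation")

e1 = model.encode(ds["sentence1"], normalize_embeddings=True)
e2 = model.encode(ds["sentence2"], normalize_embeddings=True)

# With unit-normalized embeddings, the row-wise dot product is the cosine similarity.
cos = np.sum(e1 * e2, axis=1)
print(spearmanr(cos, np.array(ds["score"])).correlation)
```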
@@ -207,6 +267,10 @@ Zunvra.com is committed to responsible AI development:
 - transformers >= 4.35.0
 - numpy >= 1.21.0

+ ### License
+
+ SOFIA is released under the Apache License 2.0. A copy of the license is included in the repository as `LICENSE`.
+
 ### System Requirements

 - **Minimum**: CPU with 8GB RAM
@@ -268,6 +332,16 @@ clusters = kmeans.fit_predict(embeddings)
 print(clusters)  # [0, 0, 1, 1]
 ```

+ ### JavaScript/Node.js Usage
+
+ ```javascript
+ // Transformers.js sketch; assumes ONNX weights are available for this repo
+ import { pipeline } from "@xenova/transformers";
+ const extractor = await pipeline("feature-extraction", "MaliosDark/sofia-embedding-v1");
+ const output = await extractor(["hello", "world"], { pooling: "mean", normalize: true });
+ console.log(output.dims); // [2, 1024]
+ ```
+
 ## Deployment

 ### Local Deployment
@@ -278,6 +352,20 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer('MaliosDark/sofia-embedding-v1')
 ```

+ ### Hugging Face Hub Deployment
+
+ SOFIA is available on the Hugging Face Hub for easy integration:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Load directly from the Hugging Face Hub
+ model = SentenceTransformer('MaliosDark/sofia-embedding-v1')
+
+ # An interactive inference widget is available on the model page:
+ # https://huggingface.co/MaliosDark/sofia-embedding-v1
+ ```
+
 ### API Deployment

 ```python
@@ -343,147 +431,3 @@ We welcome contributions to improve SOFIA:
 ---

 *SOFIA: Intelligent embeddings for the future of AI.*
-
- ## Hugging Face Model Card Upgrades
-
- Your model is live on Hugging Face! It loads correctly as **MPNet + mean pooling + Dense(768→1024)**, matching your configuration files. Here are **drop-in upgrades** to enhance your model card with widgets, metrics, and better discoverability.
-
- ### 1. YAML Front Matter (Required)
- Add this to the **very top** of your README.md (before the title) to enable Hugging Face features:
-
- ```yaml
- ---
- library_name: sentence-transformers
- license: apache-2.0
- pipeline_tag: sentence-similarity
- tags:
- - embeddings
- - sentence-transformers
- - mpnet
- - lora
- - triplet-loss
- - cosine-similarity
- - retrieval
- - mteb
- language:
- - en
- datasets:
- - sentence-transformers/stsb
- - paws
- - banking77
- - mteb/nq
- widget:
- - text: "Hello world"
- - text: "How are you?"
- ---
- ```
-
- ### 2. License File (Required)
- Create a `LICENSE` file in your repo root with the full Apache 2.0 text. Hugging Face will auto-detect it.
-
- ### 3. MTEB Metrics Block (Recommended)
- To display performance metrics on your model card:
-
- **Step A: Run evaluation locally**
- ```bash
- python -c "
- from mteb import MTEB
- from sentence_transformers import SentenceTransformer
- model = SentenceTransformer('MaliosDark/sofia-embedding-v1')
- tasks = ['STS12', 'STS13', 'STS14', 'STS15', 'STS16', 'STSBenchmark']
- MTEB(tasks=tasks).run(model, output_folder='./mteb_results')
- "
- ```
-
- **Step B: Add a metrics placeholder to the README**
- ```markdown
- <!-- METRICS_START -->
- _TBD_
- <!-- METRICS_END -->
- ```
-
- **Step C: Inject results automatically**
- ```bash
- python -c "
- import json, glob, re
- from pathlib import Path
-
- results = []
- for f in glob.glob('mteb_results/*/*/results.json'):
-     data = json.load(open(f))
-     task = data['mteb_dataset_name']
-     main = data.get('main_score')
-     pearson = data.get('test', {}).get('cos_sim', {}).get('pearson')
-     spearman = data.get('test', {}).get('cos_sim', {}).get('spearman')
-     results.append((task, main, pearson, spearman))
-
- lines = ['model-index:', '- name: sofia-embedding-v1', '  results:']
- for task, main, p, s in sorted(results):
-     m = f'{main:.4f}' if main else 'null'
-     pe = f'{p:.4f}' if p else 'null'
-     sp = f'{s:.4f}' if s else 'null'
-     lines.extend([
-         f'  - task: {{type: sts, name: STS}}',
-         f'    dataset: {{name: {task}, type: mteb/{task}}}',
-         '    metrics:',
-         f'    - type: main_score',
-         f'      value: {m}',
-         f'    - type: pearson',
-         f'      value: {pe}',
-         f'    - type: spearman',
-         f'      value: {sp}'
-     ])
-
- block = '```\n' + '\n'.join(lines) + '\n```'
- readme = Path('README.md').read_text()
- readme = re.sub(r'<!-- METRICS_START -->.*?<!-- METRICS_END -->',
-                 f'<!-- METRICS_START -->\n{block}\n<!-- METRICS_END -->',
-                 readme, flags=re.S)
- Path('README.md').write_text(readme)
- print('Metrics injected into README!')
- "
- ```
-
- ### 4. Inference Configuration (Already Correct)
- Your model correctly outputs 1024-dimensional embeddings with mean pooling. No changes needed.
-
- ### 5. Prompted Retrieval Mode (Optional)
- For better zero-shot retrieval, update `config_sentence_transformers.json`:
-
- ```json
- {
-   "__version__": { "sentence_transformers": "5.1.0" },
-   "model_type": "SentenceTransformer",
-   "prompts": { "query": "Query: ", "document": "Document: " },
-   "default_prompt_name": null,
-   "similarity_fn_name": "cosine"
- }
- ```
-
- ### 6. Usage Examples
- Add these minimal code snippets to your README:
-
- **Python:**
- ```python
- from sentence_transformers import SentenceTransformer, util
-
- model = SentenceTransformer("MaliosDark/sofia-embedding-v1")
- sentences = ["Hello world", "How are you?"]
- embeddings = model.encode(sentences, normalize_embeddings=True)
- similarity = util.cos_sim(embeddings[0], embeddings[1])
- print(similarity.item())  # ~0.9
- ```
-
- **JavaScript/Node.js:**
- ```javascript
- import { SentenceTransformer } from "sentence-transformers";
-
- const model = await SentenceTransformer.from_pretrained("MaliosDark/sofia-embedding-v1");
- const embeddings = await model.encode(["hello", "world"], { normalize: true });
- console.log(embeddings[0].length); // 1024
- ```
-
- ### Ready-to-Use README Template
- Want a complete PR-ready README with all upgrades applied? Let me know and I'll generate it based on your current model card.
-
- [View on Hugging Face](https://huggingface.co/MaliosDark/sofia-embedding-v1)
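One of the sections removed above ("Prompted Retrieval Mode") proposed a `prompts` block in `config_sentence_transformers.json`. For reference, recent sentence-transformers releases expose such prompts through the `prompt_name` argument of `encode`. A minimal sketch, assuming that config were actually applied to the model (it is not part of this commit):

```python
# Hypothetical prompted-retrieval usage; assumes "query"/"document" prompts
# are defined in the model's config_sentence_transformers.json.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("MaliosDark/sofia-embedding-v1")

# "Query: " / "Document: " would be prepended automatically via prompt_name.
q = model.encode(["how do sentence embeddings work?"], prompt_name="query")
d = model.encode(["Sentence embeddings map text to dense vectors."], prompt_name="document")
print(util.cos_sim(q, d))  # 1x1 cosine similarity matrix
```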
 