kordelfrance committed
Commit 4bdb485 · verified · 1 Parent(s): cb55302

Delete model_cards/ovle-small.md

Files changed (1)
  1. model_cards/ovle-small.md +0 -110
model_cards/ovle-small.md DELETED
@@ -1,110 +0,0 @@
# Model Card: Scentience-OVLE-Small-v1

## Model Details
- **Model Name:** `Scentience OVLE Small v1`
- **Developed by:** Kordel K. France
- **Date:** September 2025
- **Architecture:**
  - **Olfaction encoder:** 138-sensor embedding
  - **Vision encoder:** CLIP-based
  - **Language encoder:** CLIP-based
  - **Fusion strategy:** joint embedding space via multimodal contrastive training
- **Parameter Count (Base):** 2.2M (without CLIP), 153.4M (with CLIP)
- **Parameter Count (GAT):** 9.3M (without CLIP), 160.5M (with CLIP)
- **Embedding Dimension:** 512
- **License:** MIT
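The fusion strategy above maps each modality into a shared 512-dimensional space so that related odor, image, and text inputs land near each other. A minimal sketch of that idea in plain Python, with a tiny embedding dimension and random projection weights standing in for the trained encoders (both are illustrative assumptions, not the released model):

```python
import math
import random

EMBED_DIM = 4  # the real model uses 512; kept tiny here for illustration

def project(features, weights):
    """Linearly project modality features into the shared embedding space."""
    return [sum(f * w for f, w in zip(features, col)) for col in weights]

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    # Assumes both vectors are already L2-normalized.
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
# Hypothetical stand-ins for a trained olfaction encoder: a 138-channel
# sensor reading and a random projection matrix into the shared space.
odor_reading = [random.random() for _ in range(138)]
odor_weights = [[random.gauss(0, 0.1) for _ in range(138)] for _ in range(EMBED_DIM)]

odor_embedding = l2_normalize(project(odor_reading, odor_weights))
same_embedding = l2_normalize(project(odor_reading, odor_weights))
print(round(cosine_similarity(odor_embedding, same_embedding), 6))  # identical inputs -> 1.0
```

In the actual model, the vision and language sides would be CLIP encoders and the projections would be learned contrastively; only the shared-space geometry is sketched here.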
---

## Intended Use
- **Primary purpose:** research in multimodal machine learning involving olfaction, vision, and language.
- **Example applications:**
  - Cross-modal retrieval (odor → image, odor → text, etc.)
  - Robotics and UAV navigation guided by chemical cues
  - Chemical dataset exploration and visualization
- **Intended users:** researchers, developers, and educators working in ML, robotics, chemistry, and HCI.
- **Out of scope:** not intended for safety-critical tasks (e.g., gas leak detection, medical diagnosis, or regulatory use).
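Cross-modal retrieval over a shared embedding space reduces to nearest-neighbor search: embed the odor query, then rank candidate image embeddings by cosine similarity. A minimal sketch with hypothetical pre-computed unit embeddings (in practice the model's encoders would produce these):

```python
def cosine_similarity(a, b):
    # Assumes both vectors are already L2-normalized.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_embedding, candidate_embeddings, top_k=2):
    """Return candidate indices ranked by similarity to the query, best first."""
    scored = [(cosine_similarity(query_embedding, c), i)
              for i, c in enumerate(candidate_embeddings)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]

# Hypothetical unit embeddings in a tiny 3-d space for illustration.
odor_query = [1.0, 0.0, 0.0]        # e.g., a "citrus" odor embedding
image_bank = [
    [0.0, 1.0, 0.0],                # unrelated image
    [0.96, 0.28, 0.0],              # close match (e.g., an image of a lemon)
    [0.0, 0.0, 1.0],                # unrelated image
]
print(retrieve(odor_query, image_bank))  # index 1 ranks first
```

The same routine covers odor → text retrieval by swapping the candidate bank; only the embeddings change, not the search.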
---

## Training Data
- **Olfaction data:** language-aligned olfactory data curated from the GoodScents and LeffingWell datasets.
- **Vision data:** COCO dataset.
- **Language data:** smell descriptors and text annotations curated from the literature.

For more information on how the training data was collected, see the [Hugging Face dataset](https://huggingface.co/datasets/kordelfrance/olfaction-vision-language-dataset).
---

## Evaluation
- Retrieval tasks: odor → image (Top-5 recall = 62%)
- Odor descriptor classification accuracy = 71%
- Cross-modal embedding alignment qualitatively verified on 200 sample triplets.
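Top-5 recall, as reported above, counts a query as a hit when the ground-truth item appears anywhere in the top five retrieved results. A small sketch of the metric on hypothetical ranked retrieval output:

```python
def top_k_recall(ranked_results, ground_truth, k=5):
    """Fraction of queries whose true item appears in the top-k retrieved list."""
    hits = sum(1 for ranking, truth in zip(ranked_results, ground_truth)
               if truth in ranking[:k])
    return hits / len(ground_truth)

# Hypothetical retrieval output: for each odor query, image ids ranked best-first.
rankings = [
    [3, 7, 1, 9, 4, 0],   # truth 9 sits at rank 4 -> hit
    [2, 5, 8, 6, 0, 1],   # truth 1 sits at rank 6 -> miss
    [5, 0, 2, 8, 7, 6],   # truth 5 sits at rank 1 -> hit
]
truths = [9, 1, 5]
print(round(top_k_recall(rankings, truths), 3))  # 2 of 3 queries hit -> 0.667
```

The reported 62% corresponds to this statistic computed over the full odor → image test split.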
---

## Limitations of Evaluation
To the best of our knowledge, there are currently no open-source datasets that provide aligned olfactory, visual, and linguistic annotations. A "true" multimodal evaluation would require measuring the chemical composition of scenes (e.g., using gas chromatography-mass spectrometry) while simultaneously capturing images and collecting perceptual descriptors from human olfactory judges. Such a benchmark would demand substantial new data collection efforts and instrumentation.

Consequently, we evaluate our models indirectly, using surrogate metrics (e.g., cross-modal retrieval performance, odor descriptor classification accuracy, clustering quality). While these evaluations do not provide ground-truth verification of odor presence in images, they offer a first step toward demonstrating alignment between modalities. We draw an analogy to earlier multimodal models, such as the precursors to CLIP and SigLIP, which also lacked large paired evaluation datasets and were assessed on retrieval-style tasks.

We therefore release this model to catalyze further research and encourage the community to contribute to building standardized datasets and evaluation protocols for olfaction-vision-language learning.
---

## Limitations
- Limited odor diversity (approx. 5,000 unique compounds).
- Embeddings depend on sensor calibration and are not guaranteed to transfer across devices.
- Cultural subjectivity in smell annotations may bias embeddings.
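The calibration dependence noted above is the usual motivation for per-device standardization before encoding: z-score each sensor channel against a baseline collected on that device. This is a common mitigation sketched here as an assumption, not a step the released model performs:

```python
import statistics

def fit_calibration(baseline_readings):
    """Per-channel mean and standard deviation from a device's baseline samples."""
    channels = list(zip(*baseline_readings))
    return ([statistics.mean(c) for c in channels],
            [statistics.stdev(c) or 1.0 for c in channels])  # guard zero-variance channels

def standardize(reading, means, stds):
    """Z-score one raw sensor reading so different devices share a common scale."""
    return [(x - m) / s for x, m, s in zip(reading, means, stds)]

# Hypothetical baseline samples from one device (3 channels for brevity;
# the real sensor array has 138).
baseline = [[0.10, 2.0, 5.0],
            [0.12, 2.2, 5.4],
            [0.08, 1.8, 4.6]]
means, stds = fit_calibration(baseline)
print(standardize([0.10, 2.0, 5.0], means, stds))  # the mean reading maps to ~zero per channel
```

Even with such normalization, transfer across sensor hardware would still need empirical validation.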
---

## Ethical Considerations
- Not to be used for covert detection of substances or surveillance.
- Unreliable in safety-critical contexts (e.g., gas leak detection).
- Smell perception varies across cultures; annotations and model outputs should be interpreted with this in mind.
---

## Environmental Impact
- Trained on 4×A100 GPUs for 48 hours (~200 kg CO2eq).
- Sensor dataset collection required ~500 lab hours.
---

## Citation
If you use this model, please cite:

```
@misc{france2025ovlembeddings,
  title = {Scentience-OVLE-Base-v1: Joint Olfaction-Vision-Language Embeddings},
  author = {Kordel Kade France},
  year = {2025},
  howpublished = {Hugging Face},
  url = {https://huggingface.co/kordelfrance/Olfaction-Vision-Language-Embeddings}
}
```

```
@misc{radford2021clip,
  title = {Learning Transferable Visual Models From Natural Language Supervision},
  author = {Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
  year = {2021},
  eprint = {2103.00020},
  archiveprefix = {arXiv},
  primaryclass = {cs.CV},
  url = {https://arxiv.org/abs/2103.00020}
}
```

```
@misc{zhai2023siglip,
  title = {Sigmoid Loss for Language Image Pre-Training},
  author = {Xiaohua Zhai and Basil Mustafa and Alexander Kolesnikov and Lucas Beyer},
  year = {2023},
  eprint = {2303.15343},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2303.15343}
}
```