0xcubin committed on
Commit 6090fc6 · verified · 1 Parent(s): f21bbb0

Update README.md

  1. README.md +62 -3
README.md CHANGED
@@ -1,3 +1,62 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ tags:
+ - embedding
+ - text-embedding
+ - crypto
+ - nlp
+ library_name: transformers
+ ---
+
+ # crypto-mini-embed
+
+ **crypto-mini-embed** is a sample mini embedding model built on a simple architecture, intended for NLP experiments such as:
+
+ - text similarity
+ - vector search
+ - clustering
+ - semantic tagging
+ - crypto-topic classification
+
+ This is a **dummy model** meant to help users understand how a model repository on HuggingFace is structured.
+
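The use cases listed above (text similarity, vector search) usually reduce to comparing embedding vectors by cosine similarity. The following is a minimal sketch of that comparison in plain Python. The helper names `cosine_similarity` and `rank_by_similarity` and the three-dimensional toy vectors are assumptions made for illustration; they are not part of this repository, and real embeddings from the model would have 64 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return corpus indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, doc) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for document embeddings (hypothetical values).
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.9, 0.1, 0.0]
ranking = rank_by_similarity(query, corpus)
print(ranking)
```

Because this is a dummy model, rankings computed from its embeddings should not be expected to reflect real semantic similarity.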
+ ---
+
+ ## ⚙️ Model Architecture
+
+ - Model type: `MiniEmbeddingModel`
+ - Hidden size: 64
+ - Max length: 128 tokens
+ - Framework: PyTorch
+ - Format: Safetensors
+ - Tokenizer: Basic CharTokenizer (dummy)
+
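To make the tokenizer entry more concrete, here is a rough sketch of what a character-level tokenizer with a 128-token limit could look like. Everything in it is an assumption for illustration: the class body, the reserved IDs, and the alphabet are hypothetical and are not taken from the repository's `tokenizer.json`.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer; not the repository's actual code."""

    PAD_ID = 0  # assumed ID for padding positions
    UNK_ID = 1  # assumed ID for characters outside the alphabet

    def __init__(self, alphabet, max_length=128):
        # IDs 0 and 1 are reserved, so known characters start at 2.
        self.char_to_id = {ch: i + 2 for i, ch in enumerate(alphabet)}
        self.max_length = max_length

    def encode(self, text):
        """Map text to fixed-length input IDs plus an attention mask."""
        # Truncate to max_length, mapping unknown characters to UNK_ID.
        ids = [self.char_to_id.get(ch, self.UNK_ID) for ch in text[: self.max_length]]
        mask = [1] * len(ids)
        pad = self.max_length - len(ids)
        return {
            "input_ids": ids + [self.PAD_ID] * pad,
            "attention_mask": mask + [0] * pad,
        }


char_tok = CharTokenizer("abcdefghijklmnopqrstuvwxyz ")
enc = char_tok.encode("bitcoin")
print(enc["input_ids"][:8], sum(enc["attention_mask"]))
```

The attention mask marks which positions hold real characters, so a pooling step can skip padding when averaging token vectors.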
+ ---
+
+ ## 📦 Files in the Model
+
+ | File | Purpose |
+ |------|---------|
+ | `config.json` | Model configuration |
+ | `tokenizer.json` | Simple tokenizer |
+ | `model.safetensors` | Model parameters |
+ | `README.md` | Model documentation |
+
+ ---
+
+ ## 🧪 Usage Example
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+ tok = AutoTokenizer.from_pretrained("0xcubin/crypto-mini-embed")
+ model = AutoModel.from_pretrained("0xcubin/crypto-mini-embed")
+
+ text = "Bitcoin is digital money"
+ inputs = tok(text, return_tensors="pt")
+
+ # Average the token vectors into one sentence embedding; no gradients are needed for inference.
+ with torch.no_grad():
+     emb = model(**inputs).last_hidden_state.mean(dim=1)
+
+ print(emb.shape)  # example: (1, 64)
+ ```