MagistrTheOne committed
Commit a7a26a9 · verified · 1 Parent(s): 6d83231

Update MagistrTheOne/RadonSAI with safetensors weights and proper YAML metadata

README.md CHANGED
@@ -1,121 +1,44 @@
- ---
- license: apache-2.0
- language:
- - ru
- - en
- tags:
- - mistral
- - russian
- - english
- - code
- - machine-learning
- - nlp
- - transformer
- - gqa
- - rmsnorm
- - swiglu
- - rope
- pipeline_tag: text-generation
- model-index:
- - name: RadonSAI
- results:
- - task:
- type: text-generation
- name: Text Generation
- dataset:
- type: custom
- name: RADON Datasets
- metrics:
- - type: perplexity
- value: "TBD"
- name: Perplexity
- size_categories: 2.5GB
- ---
-
- # RadonSAI - 1,364,297,728 Parameter Mistral-based Russian-English Transformer
-
- ## Model Description
-
- RadonSAI is a 1,364,297,728 parameter transformer model based on Mistral architecture with Llama 3 innovations, optimized for Russian-English machine learning applications.
-
- ### Key Features
-
- - **Architecture**: Mistral with Llama 3 innovations (GQA, RMSNorm, SwiGLU, RoPE)
- - **Parameters**: 1,364,297,728 parameters (2.5GB)
- - **Context**: 32,768 tokens
- - **Tokenizer**: Optimized for Russian-English
- - **Status**: Ready for inference and fine-tuning
- - **Optimizations**:
-
- ### Model Weights
-
- This model contains properly initialized weights:
-
- - **Format**: Safetensors (.safetensors)
- - **Dtype**: float32
- - **Initialization**: Kaiming uniform
- - **Size**: 2.5GB (1,364,297,728 parameters)
- - **Status**: Ready for inference and fine-tuning
-
- ### Usage
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- # Load RadonSAI
- model = AutoModelForCausalLM.from_pretrained("MagistrTheOne/RadonSAI")
- tokenizer = AutoTokenizer.from_pretrained("MagistrTheOne/RadonSAI")
-
- # Generate text
- prompt = "Машинное обучение - это"
- inputs = tokenizer(prompt, return_tensors="pt")
- outputs = model.generate(
- **inputs,
- max_length=100,
- temperature=0.7,
- do_sample=True,
- pad_token_id=tokenizer.eos_token_id
- )
- result = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print(result)
  ```

- ### Model Architecture
-
- ```
- RadonSAI:
- - Hidden size: 2,048
- - Layers: 24
- - Attention heads: 32
- - KV heads: 8
- - Intermediate size: 5,632
- - Vocabulary: 32,000
- - Context window: 32,768 tokens
- ```
-
- ### Performance
-
- - **Speed**: Optimized for inference
- - **Memory**: 2.5GB memory usage
- - **Quality**: Properly initialized weights
- - **Languages**: English + Russian support
-
- ### Citation
-
- ```bibtex
- @misc{radonsai2025,
- title={RadonSAI: 1,364,297,728 Parameter Mistral-based Russian-English Transformer},
- author={MagistrTheOne},
- year={2025},
- url={https://huggingface.co/MagistrTheOne/RadonSAI}
- }
- ```
-
- ### License
-
- Apache 2.0 License
-
- ### Contact
-
- - GitHub: [MagistrTheOne/Radon2BMistral](https://github.com/MagistrTheOne/Radon2BMistral)
- - Hugging Face: [MagistrTheOne/RadonSAI](https://huggingface.co/MagistrTheOne/RadonSAI)
 
+ # RadonSAI
+
+ ## Overview
+ RadonSAI is the main variant of the Radon model family, based on the GPT-2 Large architecture.
+
+ ## Source Model
+ - **Source**: gpt2-large
+ - **Model Class**: GPT2LMHeadModel
+ - **Parameters**: 774M (the actual parameter count of the source model)
+ - **Architecture**: GPT-2 Large
+
+ ## Artifacts
+ - `model.safetensors` - Model weights in safetensors format (~3.1GB)
+ - `tokenizer.json` - Tokenizer configuration
+ - `tokenizer_config.json` - Tokenizer metadata
+ - `vocab.json` - Vocabulary file
+ - `merges.txt` - BPE merge rules
+ - `config.json` - Model configuration (normalized)
+
+ ## How to Verify
+ ```bash
+ # Run inference test
+ python3 tests/test_inference_1b.py
  ```
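The test script itself is not part of this commit, so the following is only a minimal sketch of such a check (the model id, prompt, and assertions are assumptions, not the contents of `tests/test_inference_1b.py`): it loads the converted checkpoint and confirms it generates text with GPT-2 Large dimensions.

```python
# Hypothetical smoke test, not the repository's tests/test_inference_1b.py.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MagistrTheOne/RadonSAI"  # or a local path to the converted folder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The normalized config should report GPT-2 Large dimensions
assert model.config.model_type == "gpt2"
assert model.config.n_layer == 36 and model.config.n_embd == 1280

# End-to-end generation check
inputs = tokenizer("Machine learning is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```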
+
+ ## Conversion Steps
+ 1. Download gpt2-large from Hugging Face
+ 2. Convert weights to safetensors format (sketched below)
+ 3. Save tokenizer files
+ 4. Normalize config JSON with correct architectures and model_type
+ 5. Validate with inference test
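The conversion script is not included in this commit. Assuming it only re-exports the stock gpt2-large checkpoint, steps 1-4 reduce to a few lines of transformers (the output folder name is illustrative):

```python
# Hypothetical conversion sketch for steps 1-4 above (not a file from this commit).
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

out_dir = "RadonSAI"  # assumed local output folder

# Step 1: download gpt2-large from the Hugging Face Hub
model = GPT2LMHeadModel.from_pretrained("gpt2-large")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-large")

# Step 4: normalize the config; save_pretrained records
# architectures=["GPT2LMHeadModel"] and model_type="gpt2" in config.json
model.config.architectures = ["GPT2LMHeadModel"]

# Step 2: write model.safetensors (safe serialization is the default in
# recent transformers; made explicit here)
model.save_pretrained(out_dir, safe_serialization=True)

# Step 3: write tokenizer.json, tokenizer_config.json, vocab.json, merges.txt
tokenizer.save_pretrained(out_dir)

# Step 5 is the inference smoke test shown under "How to Verify" above.
```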
+
+ ## Notes
+ - This variant uses the original parameter count of the source model (774M)
+ - The target label suggests 1.2B parameters, but the actual size is 774M from gpt2-large
+ - To reach the target 1.2B parameters, consider:
+   - Knowledge distillation from a larger model
+   - Continued pre-training with additional data
+   - Training from scratch with an expanded architecture
+
+ ## File Sizes
+ - Total folder size: ~3.1GB
+ - Model weights: ~3.1GB (float32)
+ - Tokenizer files: ~20MB

config.json CHANGED
@@ -1,23 +1,38 @@
  {
- "model_name": "radon",
- "model_type": "mistral",
- "vocab_size": 32000,
- "hidden_size": 2048,
- "num_layers": 24,
- "num_attention_heads": 32,
- "num_kv_heads": 8,
- "intermediate_size": 5632,
- "max_position_embeddings": 32768,
- "sliding_window": 4096,
- "rope_theta": 10000.0,
- "rms_norm_eps": 1e-06,
- "dropout": 0.1,
- "attention_dropout": 0.1,
- "activation_function": "silu",
- "layer_norm_eps": 1e-06,
  "initializer_range": 0.02,
  "use_cache": true,
- "torch_dtype": "float32",
- "output_attentions": false,
- "output_hidden_states": false
- }

  {
+ "activation_function": "gelu_new",
+ "architectures": [
+ "GPT2LMHeadModel"
+ ],
+ "attn_pdrop": 0.1,
+ "bos_token_id": 50256,
+ "dtype": "float32",
+ "embd_pdrop": 0.1,
+ "eos_token_id": 50256,
  "initializer_range": 0.02,
+ "layer_norm_epsilon": 1e-05,
+ "model_type": "gpt2",
+ "n_ctx": 1024,
+ "n_embd": 1280,
+ "n_head": 20,
+ "n_inner": null,
+ "n_layer": 36,
+ "n_positions": 1024,
+ "reorder_and_upcast_attn": false,
+ "resid_pdrop": 0.1,
+ "scale_attn_by_inverse_layer_idx": false,
+ "scale_attn_weights": true,
+ "summary_activation": null,
+ "summary_first_dropout": 0.1,
+ "summary_proj_to_labels": true,
+ "summary_type": "cls_index",
+ "summary_use_proj": true,
+ "task_specific_params": {
+ "text-generation": {
+ "do_sample": true,
+ "max_length": 50
+ }
+ },
+ "transformers_version": "4.57.0",
  "use_cache": true,
+ "vocab_size": 50257
+ }
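As a cross-check of the 774M and ~3.1GB figures in the README, the parameter count of GPT-2 Large follows directly from the config values above (back-of-the-envelope sketch; GPT-2 ties the LM head to the token embedding, so it adds no extra weights):

```python
# Rough parameter count implied by the GPT-2 Large config above.
n_layer, n_embd, n_positions, vocab = 36, 1280, 1024, 50257

embeddings = vocab * n_embd + n_positions * n_embd                          # wte + wpe
attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)      # c_attn + c_proj
mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)   # c_fc + c_proj
norms = 2 * (2 * n_embd)                                                    # ln_1 + ln_2 (weight + bias)
total = embeddings + n_layer * (attn + mlp + norms) + 2 * n_embd            # + final ln_f

print(f"{total:,} params, ~{total * 4 / 1e9:.1f} GB in float32")
# -> 774,030,080 params, ~3.1 GB in float32, matching the README figures
# and the ~3.1GB model.safetensors LFS object in this commit
```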
 
 
config.yaml ADDED
@@ -0,0 +1,9 @@
+ architecture: GPT2LMHeadModel
+ conversion_date: '2025-01-09'
+ format: safetensors
+ max_position_embeddings: 1024
+ model_name: RadonSAI
+ model_type: gpt2
+ parameters: 774M
+ source_model: gpt2-large
+ vocab_size: 50257
generation_config.json CHANGED
@@ -2,5 +2,5 @@
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
- "transformers_version": "4.36.2"
  }

  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
+ "transformers_version": "4.57.0"
  }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:06b7a9413e2ef4d1db1456599a79f50151ad6f7d3289d4b7634871ac9dcc59b2
- size 5457216008

  version https://git-lfs.github.com/spec/v1
+ oid sha256:9daec3d9afb56155d3065913e51636b232be0e1826a9079623ece03c90eff39f
+ size 3096165928
model_card.yaml ADDED
@@ -0,0 +1,25 @@
+ base_model: gpt2-large
+ inference:
+   parameters:
+     do_sample: true
+     max_new_tokens: 256
+     temperature: 0.7
+     top_p: 0.9
+ language:
+ - en
+ - ru
+ library_name: transformers
+ license: apache-2.0
+ model_type: gpt2
+ pipeline_tag: text-generation
+ tags:
+ - safetensors
+ - text-generation
+ - conversational
+ - machine-learning
+ - nlp
+ - transformer
+ - russian
+ - english
+ - gpt2
+ - large
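The `inference.parameters` block above maps one-to-one onto standard transformers generation arguments; a minimal sketch of calling the model with exactly those settings (the prompt is illustrative):

```python
# Minimal sketch: apply the model card's inference parameters via a pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="MagistrTheOne/RadonSAI")
result = generator(
    "Machine learning is",
    do_sample=True,       # inference.parameters.do_sample
    max_new_tokens=256,   # inference.parameters.max_new_tokens
    temperature=0.7,      # inference.parameters.temperature
    top_p=0.9,            # inference.parameters.top_p
)
print(result[0]["generated_text"])
```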
model_card.yml CHANGED
@@ -1,22 +1,25 @@
- ---
- license: apache-2.0
  language:
- - ru
  - en
  tags:
- - radon
  - russian
  - english
- - developing
- - mistral
- - 2b
- - quantized
- pipeline_tag: text-generation
- library_name: transformers
- model_status: developing
- base_model: mistralai/Mistral-7B-v0.1
- size_categories: 3B
- model-index:
- - name: RadonSAI
- results: []
- ---

+ base_model: gpt2-large
+ inference:
+   parameters:
+     do_sample: true
+     max_new_tokens: 256
+     temperature: 0.7
+     top_p: 0.9
  language:
  - en
+ - ru
+ library_name: transformers
+ license: apache-2.0
+ model_type: gpt2
+ pipeline_tag: text-generation
  tags:
+ - safetensors
+ - text-generation
+ - conversational
+ - machine-learning
+ - nlp
+ - transformer
  - russian
  - english
+ - gpt2
+ - large

special_tokens_map.json CHANGED
@@ -1,23 +1,5 @@
  {
- "bos_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "eos_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "unk_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- }
  }

  {
+ "bos_token": "<|endoftext|>",
+ "eos_token": "<|endoftext|>",
+ "unk_token": "<|endoftext|>"
  }
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -1,5 +1,4 @@
  {
- "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
  "50256": {
@@ -12,12 +11,10 @@
  }
  },
  "bos_token": "<|endoftext|>",
- "chat_template": "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}",
- "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
- "errors": "replace",
  "model_max_length": 1024,
- "pad_token": null,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
  }

  {
  "add_prefix_space": false,
  "added_tokens_decoder": {
  "50256": {
  }
  },
  "bos_token": "<|endoftext|>",
+ "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
+ "extra_special_tokens": {},
  "model_max_length": 1024,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
  }
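A quick sanity check on the tokenizer files above (a sketch, assuming the repository id; a local path to the converted folder works the same way) should report the GPT-2 special tokens and the 1024-token limit:

```python
# Sketch: verify the tokenizer files load and expose the expected settings.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("MagistrTheOne/RadonSAI")
print(type(tok).__name__)                           # GPT2TokenizerFast (built from tokenizer.json)
print(tok.bos_token, tok.eos_token, tok.unk_token)  # all "<|endoftext|>"
print(tok.model_max_length)                         # 1024
print(len(tok))                                     # 50257
```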