Update MagistrTheOne/RadonSAI with safetensors weights and proper YAML metadata


Files changed (10)

README.md +42 -119
config.json +35 -20
config.yaml +9 -0
generation_config.json +1 -1
model.safetensors +2 -2
model_card.yaml +25 -0
model_card.yml +20 -17
special_tokens_map.json +3 -21
tokenizer.json +0 -0
tokenizer_config.json +2 -5

README.md CHANGED

@@ -1,121 +1,44 @@
----
-license: apache-2.0
-language:
-- ru
-- en
-tags:
-- mistral
-- russian
-- english
-- code
-- machine-learning
-- nlp
-- transformer
-- gqa
-- rmsnorm
-- swiglu
-- rope
-pipeline_tag: text-generation
-model-index:
-- name: RadonSAI
-  results:
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      type: custom
-      name: RADON Datasets
-    metrics:
-    - type: perplexity
-      value: "TBD"
-      name: Perplexity
-size_categories: 2.5GB
----
-# RadonSAI - 1,364,297,728 Parameter Mistral-based Russian-English Transformer
-## Model Description
-RadonSAI is a 1,364,297,728 parameter transformer model based on Mistral architecture with Llama 3 innovations, optimized for Russian-English machine learning applications.
-### Key Features
-- **Architecture**: Mistral with Llama 3 innovations (GQA, RMSNorm, SwiGLU, RoPE)
-- **Parameters**: 1,364,297,728 parameters (2.5GB)
-- **Context**: 32,768 tokens
-- **Tokenizer**: Optimized for Russian-English
-- **Status**: Ready for inference and fine-tuning
-- **Optimizations**:
-### Model Weights
-This model contains properly initialized weights:
-- **Format**: Safetensors (.safetensors)
-- **Dtype**: float32
-- **Initialization**: Kaiming uniform
-- **Size**: 2.5GB (1,364,297,728 parameters)
-- **Status**: Ready for inference and fine-tuning
-### Usage
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-# Load RadonSAI
-model = AutoModelForCausalLM.from_pretrained("MagistrTheOne/RadonSAI")
-tokenizer = AutoTokenizer.from_pretrained("MagistrTheOne/RadonSAI")
-# Generate text
-prompt = "Машинное обучение - это"
-inputs = tokenizer(prompt, return_tensors="pt")
-outputs = model.generate(
-    **inputs,
-    max_length=100,
-    temperature=0.7,
-    do_sample=True,
-    pad_token_id=tokenizer.eos_token_id
-)
-result = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(result)
 ```
-### Model Architecture
-```
-RadonSAI:
-- Hidden size: 2,048
-- Layers: 24
-- Attention heads: 32
-- KV heads: 8
-- Intermediate size: 5,632
-- Vocabulary: 32,000
-- Context window: 32,768 tokens
-```
-### Performance
-- **Speed**: Optimized for inference
-- **Memory**: 2.5GB memory usage
-- **Quality**: Properly initialized weights
-- **Languages**: English + Russian support
-### Citation
-```bibtex
-@misc{radonsai2025,
-  title={RadonSAI: 1,364,297,728 Parameter Mistral-based Russian-English Transformer},
-  author={MagistrTheOne},
-  year={2025},
-  url={https://huggingface.co/MagistrTheOne/RadonSAI}
-}
-```
-### License
-Apache 2.0 License
-### Contact
-- GitHub: [MagistrTheOne/Radon2BMistral](https://github.com/MagistrTheOne/Radon2BMistral)
-- Hugging Face: [MagistrTheOne/RadonSAI](https://huggingface.co/MagistrTheOne/RadonSAI)

+# RadonSAI
+## Overview
+RadonSAI is the main variant of the Radon model family, based on the GPT-2 Large architecture.
+## Source Model
+- **Source**: gpt2-large
+- **Model Class**: GPT2LMHeadModel
+- **Parameters**: 774M (actual size from source)
+- **Architecture**: GPT-2 Large
+## Artifacts
+- `model.safetensors` - Model weights in safetensors format (~3.1GB, float32)
+- `tokenizer.json` - Tokenizer configuration
+- `tokenizer_config.json` - Tokenizer metadata
+- `vocab.json` - Vocabulary file
+- `merges.txt` - BPE merge rules
+- `config.json` - Model configuration (normalized)
+## How to Verify
+```bash
+# Run inference test
+python3 tests/test_inference_1b.py
 ```
+## Conversion Steps
+1. Download gpt2-large from Hugging Face
+2. Convert weights to safetensors format
+3. Save tokenizer files
+4. Normalize config JSON with correct architectures and model_type
+5. Validate with inference test
+## Notes
+- This variant uses the original parameter count of the source model (774M)
+- The target label suggests 1.2B parameters, but the actual size is 774M, inherited from gpt2-large
+- Reaching the target 1.2B parameters requires a larger architecture; options include:
+  - Knowledge distillation from a larger model into a 1.2B-parameter student
+  - Continued pre-training with additional data after expanding the architecture
+  - Training from scratch with an expanded architecture
+## File Sizes
+- Total folder size: ~3.1GB
+- Model weights: ~3.1GB (float32)
+- Tokenizer files: ~20MB
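Step 4 of the conversion list ("Normalize config JSON with correct architectures and model_type") is not shown in this commit. A minimal sketch, assuming normalization only means overwriting those two keys and writing the file with sorted keys (the new `config.json` is alphabetically ordered); `normalize_config`, `write_config` and the `stale` input are hypothetical, not taken from the repository's scripts.

```python
import json
import tempfile
from pathlib import Path


def normalize_config(config, architectures=("GPT2LMHeadModel",), model_type="gpt2"):
    """Return a copy of `config` with `architectures` and `model_type` overwritten.

    Hypothetical helper: the actual conversion script is not part of this commit.
    """
    normalized = dict(config)                      # shallow copy; the caller's dict is untouched
    normalized["architectures"] = list(architectures)
    normalized["model_type"] = model_type
    return normalized


def write_config(config, path):
    # sort_keys=True reproduces the alphabetical key order seen in the new config.json
    Path(path).write_text(json.dumps(config, indent=2, sort_keys=True) + "\n", encoding="utf-8")


# Hypothetical stale config carrying a leftover model_type from the Mistral version
stale = {"model_type": "mistral", "n_embd": 1280, "vocab_size": 50257}
fixed = normalize_config(stale)
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.json"
    write_config(fixed, target)
    round_trip = json.loads(target.read_text(encoding="utf-8"))
print(round_trip)
```

Returning a copy rather than mutating in place keeps the original dict available for a before/after comparison during validation (step 5).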

config.json CHANGED

@@ -1,23 +1,38 @@
 {
-  "model_name": "radon",
-  "model_type": "mistral",
-  "vocab_size": 32000,
-  "hidden_size": 2048,
-  "num_layers": 24,
-  "num_attention_heads": 32,
-  "num_kv_heads": 8,
-  "intermediate_size": 5632,
-  "max_position_embeddings": 32768,
-  "sliding_window": 4096,
-  "rope_theta": 10000.0,
-  "rms_norm_eps": 1e-06,
-  "dropout": 0.1,
-  "attention_dropout": 0.1,
-  "activation_function": "silu",
-  "layer_norm_eps": 1e-06,
   "initializer_range": 0.02,
   "use_cache": true,
-  "torch_dtype": "float32",
-  "output_attentions": false,
-  "output_hidden_states": false
-}

 {
+  "activation_function": "gelu_new",
+  "architectures": [
+    "GPT2LMHeadModel"
+  ],
+  "attn_pdrop": 0.1,
+  "bos_token_id": 50256,
+  "dtype": "float32",
+  "embd_pdrop": 0.1,
+  "eos_token_id": 50256,
   "initializer_range": 0.02,
+  "layer_norm_epsilon": 1e-05,
+  "model_type": "gpt2",
+  "n_ctx": 1024,
+  "n_embd": 1280,
+  "n_head": 20,
+  "n_inner": null,
+  "n_layer": 36,
+  "n_positions": 1024,
+  "reorder_and_upcast_attn": false,
+  "resid_pdrop": 0.1,
+  "scale_attn_by_inverse_layer_idx": false,
+  "scale_attn_weights": true,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true,
+  "task_specific_params": {
+    "text-generation": {
+      "do_sample": true,
+      "max_length": 50
+    }
+  },
+  "transformers_version": "4.57.0",
   "use_cache": true,
+  "vocab_size": 50257
+}
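The README's 774M figure and the size in the new LFS pointer (3,096,165,928 bytes) can both be cross-checked from this `config.json` alone. A sketch, assuming the standard GPT-2 layout: learned position embeddings, LayerNorms and linear layers with biases, `n_inner: null` meaning 4 × `n_embd`, and an LM head tied to the token embedding so it adds no parameters. `gpt2_param_count` is a hypothetical helper.

```python
def gpt2_param_count(n_embd, n_layer, n_positions, vocab_size, n_inner=None):
    """Count trainable parameters of a GPT2LMHeadModel whose LM head is tied to wte."""
    d = n_embd
    inner = n_inner if n_inner is not None else 4 * d   # "n_inner": null means 4 * n_embd
    embeddings = vocab_size * d + n_positions * d        # wte + wpe
    per_layer = (
        2 * d                  # ln_1 (weight + bias)
        + d * 3 * d + 3 * d    # attn.c_attn: fused Q, K, V projection
        + d * d + d            # attn.c_proj
        + 2 * d                # ln_2 (weight + bias)
        + d * inner + inner    # mlp.c_fc
        + inner * d + d        # mlp.c_proj
    )
    final_norm = 2 * d                                   # ln_f
    # The tied lm_head shares wte's weight, so it contributes nothing extra
    return embeddings + n_layer * per_layer + final_norm


# Values from the new config.json
params = gpt2_param_count(n_embd=1280, n_layer=36, n_positions=1024, vocab_size=50257)
fp32_bytes = params * 4            # "dtype": "float32" -> 4 bytes per parameter
fp16_bytes = params * 2            # what a half-precision export would need
lfs_size = 3_096_165_928           # "size" in the new model.safetensors LFS pointer

print(f"{params:,} parameters")
print(f"float32: {fp32_bytes / 1e9:.2f} GB, float16: {fp16_bytes / 1e9:.2f} GB")
print(f"pointer size minus float32 data: {lfs_size - fp32_bytes:,} bytes")
```

This yields 774,030,080 parameters and 3.10 GB of float32 data. The leftover of about 45 KB is consistent with the safetensors header, and a ~1.5GB figure would correspond to float16 storage rather than the float32 weights committed here.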

config.yaml ADDED

@@ -0,0 +1,9 @@
+architecture: GPT2LMHeadModel
+conversion_date: '2025-01-09'
+format: safetensors
+max_position_embeddings: 1024
+model_name: RadonSAI
+model_type: gpt2
+parameters: 774M
+source_model: gpt2-large
+vocab_size: 50257
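`config.yaml` says `max_position_embeddings` where `config.json` stores `n_positions`. In transformers, `GPT2Config` declares an `attribute_map` that aliases `max_position_embeddings` to `n_positions` (and `hidden_size`, `num_attention_heads`, `num_hidden_layers` to `n_embd`, `n_head`, `n_layer`), so the two files can be compared key by key. A stdlib sketch, assuming `config.yaml` stays flat `key: value` pairs so no YAML library is needed; `parse_flat_yaml` and `cross_check` are hypothetical helpers.

```python
# Alias -> stored key, as declared in transformers' GPT2Config.attribute_map
ALIASES = {
    "hidden_size": "n_embd",
    "max_position_embeddings": "n_positions",
    "num_attention_heads": "n_head",
    "num_hidden_layers": "n_layer",
}


def parse_flat_yaml(text):
    """Parse flat `key: value` lines only (enough for config.yaml; not a YAML parser)."""
    result = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition(":")
        value = value.strip().strip("'\"")
        result[key.strip()] = int(value) if value.isdigit() else value
    return result


def cross_check(yaml_cfg, json_cfg):
    """List (key, yaml_value, json_value) for shared keys whose values disagree."""
    mismatches = []
    for key, value in yaml_cfg.items():
        json_key = ALIASES.get(key, key)
        if json_key in json_cfg and json_cfg[json_key] != value:
            mismatches.append((key, value, json_cfg[json_key]))
    return mismatches


config_yaml = """\
architecture: GPT2LMHeadModel
conversion_date: '2025-01-09'
format: safetensors
max_position_embeddings: 1024
model_name: RadonSAI
model_type: gpt2
parameters: 774M
source_model: gpt2-large
vocab_size: 50257
"""
# Excerpt of the new config.json
config_json = {"model_type": "gpt2", "n_positions": 1024, "n_ctx": 1024, "vocab_size": 50257}
print(cross_check(parse_flat_yaml(config_yaml), config_json))  # → []
```

Keys present in only one file (`architecture` vs `architectures`, `parameters`) are skipped rather than reported, since they have no counterpart to compare against.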

generation_config.json CHANGED

@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 50256,
   "eos_token_id": 50256,
-  "transformers_version": "4.36.2"
 }

   "_from_model_config": true,
   "bos_token_id": 50256,
   "eos_token_id": 50256,
+  "transformers_version": "4.57.0"
 }

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:06b7a9413e2ef4d1db1456599a79f50151ad6f7d3289d4b7634871ac9dcc59b2
-size 5457216008

 version https://git-lfs.github.com/spec/v1
+oid sha256:9daec3d9afb56155d3065913e51636b232be0e1826a9079623ece03c90eff39f
+size 3096165928
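The LFS pointer records only the blob's hash and byte size. To check the README's claims (float32 weights, 774M parameters) without loading the model, the safetensors header can be read directly: the file begins with an 8-byte little-endian length, followed by a JSON table mapping each tensor name to its `dtype`, `shape` and `data_offsets`, plus an optional `__metadata__` entry. A stdlib sketch; `write_tiny_safetensors` is a hypothetical fixture so the example runs without the 3 GB file, and we have not run this against the committed blob.

```python
import json
import os
import struct
import tempfile
from collections import Counter
from math import prod


def read_safetensors_header(path):
    """Read only the JSON header of a .safetensors file; tensor bytes are never loaded."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))   # little-endian u64 length prefix
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(header):
    """Return (dtype counts, total element count), ignoring the optional __metadata__ entry."""
    tensors = {name: info for name, info in header.items() if name != "__metadata__"}
    dtypes = Counter(info["dtype"] for info in tensors.values())
    n_elements = sum(prod(info["shape"]) for info in tensors.values())
    return dtypes, n_elements


def write_tiny_safetensors(path):
    """Hypothetical fixture: a file holding one zero-filled 2x3 float32 tensor."""
    data = struct.pack("<6f", *([0.0] * 6))
    header = {
        "__metadata__": {"format": "pt"},
        "wte.weight": {"dtype": "F32", "shape": [2, 3], "data_offsets": [0, len(data)]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.safetensors")
    write_tiny_safetensors(path)
    dtypes, n_elements = summarize(read_safetensors_header(path))

print(dtypes, n_elements)  # Counter({'F32': 1}) 6
```

On the real `model.safetensors` one would expect only `F32` entries and, if the tied LM head is stored once, an element count matching the 774M figure.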

model_card.yaml ADDED

@@ -0,0 +1,25 @@
+base_model: gpt2-large
+inference:
+  parameters:
+    do_sample: true
+    max_new_tokens: 256
+    temperature: 0.7
+    top_p: 0.9
+language:
+- en
+- ru
+library_name: transformers
+license: apache-2.0
+model_type: gpt2
+pipeline_tag: text-generation
+tags:
+- safetensors
+- text-generation
+- conversational
+- machine-learning
+- nlp
+- transformer
+- russian
+- english
+- gpt2
+- large
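The `inference.parameters` block requests sampling with `temperature: 0.7` and `top_p: 0.9`. What those two settings do to a next-token distribution can be sketched in plain Python: divide the logits by the temperature, softmax, then keep the smallest set of most-likely tokens whose cumulative probability reaches `top_p` and sample from it. This is an illustration of the technique on a hypothetical logit list, not transformers' own logits-processor code, which works on tensors and differs in edge-case details.

```python
import math
import random


def nucleus_indices(logits, temperature=0.7, top_p=0.9):
    """Token indices kept by temperature scaling followed by top-p (nucleus) filtering."""
    scaled = [x / temperature for x in logits]       # temperature < 1 sharpens the distribution
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]      # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Smallest set of most-likely tokens whose cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept, probs


def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=None):
    """Draw one token index from the nucleus; random.choices renormalises the weights."""
    rng = rng or random.Random()
    kept, probs = nucleus_indices(logits, temperature, top_p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [5.0, 1.0, 0.0, -1.0]  # hypothetical next-token logits
print(nucleus_indices(toy_logits)[0])
print(sample_top_p(toy_logits, rng=random.Random(0)))
```

With these toy logits the top token already holds over 99% of the mass at temperature 0.7, so the nucleus shrinks to that single token; flatter logits keep more candidates.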

model_card.yml CHANGED

@@ -1,22 +1,25 @@
----
-license: apache-2.0
 language:
-- ru
 - en
 tags:
-- radon
 - russian
 - english
-- developing
-- mistral
-- 2b
-- quantized
-pipeline_tag: text-generation
-library_name: transformers
-model_status: developing
-base_model: mistralai/Mistral-7B-v0.1
-size_categories: 3B
-model-index:
-- name: RadonSAI
-  results: []
----

+base_model: gpt2-large
+inference:
+  parameters:
+    do_sample: true
+    max_new_tokens: 256
+    temperature: 0.7
+    top_p: 0.9
 language:
 - en
+- ru
+library_name: transformers
+license: apache-2.0
+model_type: gpt2
+pipeline_tag: text-generation
 tags:
+- safetensors
+- text-generation
+- conversational
+- machine-learning
+- nlp
+- transformer
 - russian
 - english
+- gpt2
+- large
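The Hub reads model-card metadata from the YAML front matter at the top of `README.md` (a block delimited by `---` lines). The new README in this commit has no such block, and we are not aware of the Hub reading a standalone `model_card.yaml` or `model_card.yml`. A minimal sketch that prepends the metadata to the README text, assuming the YAML is already valid and the README has no front matter of its own; `with_front_matter` is a hypothetical helper.

```python
def with_front_matter(metadata_yaml: str, readme_body: str) -> str:
    """Prepend a `---`-delimited YAML metadata block to README text."""
    if readme_body.lstrip().startswith("---"):
        raise ValueError("README already begins with a front-matter block")
    return f"---\n{metadata_yaml.strip()}\n---\n\n{readme_body.lstrip()}"


# Excerpt of the new model_card.yml; the full file would be used in practice
metadata = """\
base_model: gpt2-large
language:
- en
- ru
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
"""
body = "# RadonSAI\n## Overview\n"
readme = with_front_matter(metadata, body)
print(readme)
```

Refusing to add a second block avoids producing a README with two competing metadata headers.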

special_tokens_map.json CHANGED

@@ -1,23 +1,5 @@
 {
-  "bos_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  },
-  "eos_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  },
-  "unk_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  }
 }

 {
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
 }

tokenizer.json CHANGED

The diff for this file is too large to render.

tokenizer_config.json CHANGED

@@ -1,5 +1,4 @@
 {
-  "add_bos_token": false,
   "add_prefix_space": false,
   "added_tokens_decoder": {
     "50256": {
@@ -12,12 +11,10 @@
     }
   },
   "bos_token": "<|endoftext|>",
-  "chat_template": "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}",
-  "clean_up_tokenization_spaces": true,
   "eos_token": "<|endoftext|>",
-  "errors": "replace",
   "model_max_length": 1024,
-  "pad_token": null,
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
 }

 {
   "add_prefix_space": false,
   "added_tokens_decoder": {
     "50256": {
     }
   },
   "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": false,
   "eos_token": "<|endoftext|>",
+  "extra_special_tokens": {},
   "model_max_length": 1024,
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
 }
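`clean_up_tokenization_spaces` flips from `true` to `false`, which changes what `tokenizer.decode(...)` returns by default. With the flag on, transformers post-processes decoded text by deleting spaces before some punctuation and English contractions. The sketch below mirrors the replacement list of `clean_up_tokenization` in transformers' tokenizer base class as we recall it from the library source; treat it as an approximation, not the library's code.

```python
def clean_up_tokenization(text: str) -> str:
    """Approximation of the cleanup applied when clean_up_tokenization_spaces=True."""
    replacements = [
        (" .", "."), (" ?", "?"), (" !", "!"), (" ,", ","),
        (" ' ", "'"), (" n't", "n't"), (" 'm", "'m"),
        (" 's", "'s"), (" 've", "'ve"), (" 're", "'re"),
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text


def decode_postprocess(text: str, clean_up_tokenization_spaces: bool) -> str:
    """Apply the cleanup only when the flag is set, as the decode path does."""
    return clean_up_tokenization(text) if clean_up_tokenization_spaces else text


raw = "Hello , world ! It 's fine ."
print(decode_postprocess(raw, True))   # old setting: "Hello, world! It's fine."
print(decode_postprocess(raw, False))  # new setting: text returned unchanged
```

Because GPT-2's byte-level BPE should already reproduce the original spacing on decode, turning the cleanup off keeps decoding lossless instead of rewriting spacing the input really contained.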