aquiffoo committed · Commit d46f529 · verified · 1 Parent(s): 89985ee

Update README.md

Files changed (1): README.md (+156 -3)

README.md CHANGED (@@ -1,3 +1,156 @@)

The previous README contained only minimal YAML front matter (license: apache-2.0); this commit replaces it with the full model card shown below.
 
---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- aquif
- text-generation-inference
- math
- coding
- small
- aquif-3.5
language:
- en
- de
- it
- pt
- fr
- hi
- es
- th
- zh
- ja
---

# aquif-3.5

The aquif-3.5 series is the successor to aquif-3, featuring a simplified naming scheme, expanded Mixture of Experts (MoE) options, and across-the-board performance improvements. This release streamlines model selection while delivering enhanced capabilities across reasoning, multilingual support, and general intelligence tasks.

## Release Dates

- A0.6B, 3B, 7B: August 30th, 2025
- 8B-Think, A4B-Think: September 1st, 2025

## Model Repository Links

| Model | Hugging Face Repository |
|-------|-------------------------|
| aquif-3.5-A0.6B-Preview | [aquiffoo/aquif-3.5-A0.6B-Preview](https://huggingface.co/aquiffoo/aquif-3.5-A0.6B-Preview) |
| aquif-3.5-3B | [aquiffoo/aquif-3.5-3B](https://huggingface.co/aquiffoo/aquif-3.5-3B) |
| aquif-3.5-7B | [aquiffoo/aquif-3.5-7B](https://huggingface.co/aquiffoo/aquif-3.5-7B) |
| aquif-3.5-8B-Think | [aquiffoo/aquif-3.5-8B-Think](https://huggingface.co/aquiffoo/aquif-3.5-8B-Think) |
| aquif-3.5-A4B-Think | [aquiffoo/aquif-3.5-A4B-Think](https://huggingface.co/aquiffoo/aquif-3.5-A4B-Think) |

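
Since the front matter above declares `library_name: transformers` and `pipeline_tag: text-generation`, a quick start along the following lines should work. This is a minimal sketch, not an official recipe: the repo ID is taken from the table above, `device_map="auto"` assumes the `accelerate` package is installed, and the prompt and sampling settings are purely illustrative.

```python
# Minimal quick-start sketch using the transformers pipeline API.
from transformers import pipeline

# Any repo ID from the table above can be substituted here.
generator = pipeline(
    "text-generation",
    model="aquiffoo/aquif-3.5-3B",
    device_map="auto",  # requires `accelerate`; drop it to load on the default device
)

prompt = "Explain mixture-of-experts language models in one paragraph."
result = generator(prompt, max_new_tokens=200, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```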
## Model Overview

| Model | Total Params (B) | Active Params (B) | Reasoning | MoE | Multilingual | MMLU | Context Window (tokens) |
|-------|------------------|-------------------|-----------|-----|--------------|------|-------------------------|
| aquif-3.5-A0.6B | 2.61 | 0.6 | ❌ | ✅ | ✅ | 60.5% | 4k |
| aquif-3.5-3B | 2.67 | 2.67 | ❌ | ❌ | ✅ | 70.2% | 32k |
| aquif-3.5-7B | 7.3 | 7.3 | ❌ | ❌ | ✅ | 78.5% | 16k |
| aquif-3.5-8B-Think | 8.2 | 8.2 | ✅ | ❌ | ✅ | 81.1% | 40k |
| aquif-3.5-A4B-Think | 12 | 4 | ✅ | ✅ | ✅ | 86.9% | 128k |

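
The context window column above is the practical limit on prompt plus generated tokens. The sketch below is one hedged way to guard against overruns at the tokenizer level; it assumes the checkpoints set `model_max_length` in their tokenizer config, and the 16k figure and 512-token generation reserve are approximations taken from the table, not verified values.

```python
# Sketch: keep a long prompt within the model's advertised context window.
from transformers import AutoTokenizer

repo_id = "aquiffoo/aquif-3.5-7B"  # ~16k context per the table above
tokenizer = AutoTokenizer.from_pretrained(repo_id)

long_prompt = "..." * 10_000  # placeholder for a long document
inputs = tokenizer(
    long_prompt,
    truncation=True,
    # Cap at the advertised window (approximate) and reserve room for generation.
    max_length=min(tokenizer.model_max_length, 16_000) - 512,
    return_tensors="pt",
)
print(inputs["input_ids"].shape)
```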
## Model Details

### aquif-3.5-A0.6B (Experimental MoE)

An experimental small-scale Mixture of Experts model designed for multilingual applications with minimal computational overhead. Despite its compact active parameter count, it demonstrates competitive performance against larger dense models.

**Performance Comparison:**

| Metric | aquif-3.5 (2.6B A0.6B) | Qwen3 (0.8B) | LFM2 (0.7B) | aquif-3 (0.4B) |
|--------|------------------------|--------------|-------------|----------------|
| MMLU | 60.5 | 44.9 | 49.9 | 55.6 |
| GPQA | 30.2 | 22.1 | 28.5 | 28.5 |
| GSM8K | 50.7 | 36.5 | 46.4 | 52.1 |
| HumanEval | 45.2 | 36.0 | 40.0 | 37.4 |
| **Average** | **46.7** | **34.9** | **41.2** | **43.4** |

### aquif-3.5-3B (State-of-the-Art Dense)

The new standard for small dense models, offering optimal performance-per-parameter efficiency for general-purpose applications.

**Performance Comparison:**

| Metric | aquif-3.5 (2.7B) | EXAONE 3.5 (2.4B) | Qwen3 (4B) | Gemma 3 (4B) | Phi-4-mini (3.8B) | Apriel-5B-Instruct (4.8B) | aquif-3 (3.2B) |
|--------|------------------|-------------------|------------|--------------|-------------------|---------------------------|----------------|
| MMLU (General Knowledge) | 70.2 | 60.4 | 70.4 | 59.6 | 67.3 | 64.6 | 67.5 |
| GPQA Diamond (Science) | 35.8 | 28.4 | 39.3 | 30.9 | 25.2 | 28.4 | 36.1 |
| LiveCodeBench (Coding) | 23.1 | 12.5 | 21.3 | 11.2 | 10.4 | 11.6 | 15.4 |
| IFEval (Instruction Following) | 78.9 | 73.6 | 71.2 | 80.2 | 68.6 | 80.8 | 78.9 |
| AIME 2025 (Competition Math) | 13.4 | 4.5 | 9.8 | 12.7 | 5.3 | 4.3 | 9.6 |
| **Average** | **44.3** | **35.9** | **42.4** | **38.9** | **35.4** | **37.9** | **41.5** |

### aquif-3.5-7B (Multilingual Long Context)

A Qwen-based architecture optimized for multilingual applications with extended context capabilities, delivering state-of-the-art performance in its size class.

**Performance Comparison:**

| Metric | aquif-3.5 (7.3B) | EXAONE 3.5 (7.8B) | Qwen3 (8.2B) | Gemma 3 (12B) | Llama 3.1 (8B) | Kanana 1.5 (8B) | aquif-3 (3.2B) |
|--------|------------------|-------------------|--------------|---------------|----------------|-----------------|----------------|
| MMLU (General Knowledge) | 78.5 | 72.2 | 82.9 | 74.5 | 69.2 | 68.8 | 67.5 |
| GPQA Diamond (Science) | 42.3 | 39.4 | 39.3 | 40.9 | 32.8 | 37.5 | 36.1 |
| LiveCodeBench (Coding) | 21.3 | 18.0 | 23.9 | 13.7 | 10.8 | 16.5 | 15.4 |
| IFEval (Instruction Following) | 85.6 | 82.6 | 85.4 | 80.2 | 75.0 | 80.1 | 78.9 |
| AIME 2025 (Competition Math) | 23.4 | 18.3 | 20.9 | 18.8 | 2.7 | 13.4 | 9.6 |
| **Average** | **50.2** | **46.1** | **50.4** | **45.6** | **38.1** | **43.3** | **41.5** |

### aquif-3.5-8B-Think & aquif-3.5-A4B-Think (Reasoning Models)

Advanced reasoning-capable models designed for complex problem-solving tasks. The A4B variant leverages an MoE architecture for enhanced efficiency while maintaining superior reasoning performance.

**Performance Comparison:**

| Metric | aquif-3.5 (12B A4B) | aquif-3.5 (8B) | Qwen3 Thinking 2507 (31B A3B) | gpt-oss-20b (21B A4B) | Nemotron Nano v2 (9B) | Solar Pro 2 |
|--------|---------------------|----------------|-------------------------------|-----------------------|-----------------------|-------------|
| MMLU-Pro | 78.5 | 78.1 | 80.5 | 73.6 | 74.2 | 80.5 |
| GPQA Diamond | 70.8 | 66.8 | 70.7 | 61.7 | 64.0 | 68.7 |
| AIME 2025 | 84.4 | 81.4 | 56.3 | 61.7 | 69.7 | 61.3 |
| LiveCodeBench | 66.1 | 61.5 | 70.7 | 72.1 | 71.1 | 61.6 |
| Humanity's Last Exam | 8.9 | 8.2 | 9.8 | 8.5 | 6.5 | 7.0 |
| TAU-Bench v2 (avg) | 43.7 | 36.8 | 35.7 | 43.2 | 34.9 | 38.7 |
| **Average** | **58.7** | **55.5** | **54.0** | **53.5** | **53.4** | **53.0** |

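
For the Think variants, a chat-style call is the natural interface. The sketch below is illustrative rather than authoritative: it assumes the checkpoints ship with a chat template that drives the reasoning ("thinking") phase, the repo ID comes from the repository table above, and the prompt, dtype, and token budget are example choices, not recommended defaults.

```python
# Illustrative sketch for a reasoning ("Think") checkpoint via the chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aquiffoo/aquif-3.5-8B-Think"  # or "aquiffoo/aquif-3.5-A4B-Think"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # BF16/FP16 support is listed under Technical Specifications
    device_map="auto",           # requires `accelerate`
)

messages = [
    {"role": "user", "content": "A train covers 120 km in 1.5 hours. What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Leave generous room: reasoning models deliberate at length before answering.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```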
## Key Improvements Over aquif-3

- **Simplified Naming**: Clear size-based nomenclature for easier model selection
- **Enhanced MoE Support**: Multiple MoE configurations across different model sizes
- **Reasoning Capabilities**: Dedicated thinking models for complex problem-solving
- **Extended Context**: Up to 128k context window for long-form applications
- **Multilingual by Default**: Native multilingual support across all variants
- **Performance Gains**: 5–15% improvement across benchmarks compared to aquif-3

## Usage Recommendations

- **aquif-3.5-A0.6B**: Experimental applications, resource-constrained environments
- **aquif-3.5-3B**: General-purpose applications, balanced performance/efficiency
- **aquif-3.5-7B**: Multilingual applications, long-context tasks
- **aquif-3.5-8B-Think**: Complex reasoning, scientific analysis
- **aquif-3.5-A4B-Think**: Advanced reasoning with efficiency optimization

## Technical Specifications

All models support:

- BF16 and FP16 precision
- Standard transformer architecture optimizations
- Efficient attention mechanisms
- Multi-head attention with optimized KV caching

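
As a concrete illustration of the precision options above, the snippet below loads a checkpoint in BF16 where the hardware supports it and falls back to FP16 otherwise. The dtype handling is standard transformers/PyTorch usage; the specific repo ID is just one example from the repository table.

```python
# Sketch: choose BF16 when supported, otherwise FP16, then load in that precision.
import torch
from transformers import AutoModelForCausalLM

repo_id = "aquiffoo/aquif-3.5-7B"  # any aquif-3.5 repository works the same way

use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
dtype = torch.bfloat16 if use_bf16 else torch.float16

model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=dtype,
    device_map="auto",  # requires `accelerate`
)
print(f"Loaded {repo_id} in {model.dtype}")
```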
## Acknowledgements

- **Qwen Team**: Base architecture for 7B, 8B, and 12B-A4B models
- **Meta Llama Team**: Base architecture for 3B and 2.6B-A0.6B models
- **Hugging Face**: Model hosting infrastructure and training libraries

## License

This project is released under the Apache 2.0 License. See LICENSE file for details.

---

*Made in 🇧🇷*

© 2025 aquif AI. All rights reserved.