---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- aquif
- text-generation-inference
- math
- coding
- small
- aquif-3.5
language:
- en
- de
- it
- pt
- fr
- hi
- es
- th
- zh
- ja
---

# aquif-3.5

The aquif-3.5 series is the successor to aquif-3, featuring a simplified naming scheme, expanded Mixture of Experts (MoE) options, and across-the-board performance improvements. This release streamlines model selection while delivering enhanced capabilities across reasoning, multilingual support, and general intelligence tasks.

## Release Dates

- A0.6B, 3B, 7B: August 30th, 2025
- 8B-Think, A4B-Think: September 1st, 2025

## Model Repository Links

| Model | HuggingFace Repository |
|-------|------------------------|
| aquif-3.5-A0.6B-Preview | [aquiffoo/aquif-3.5-A0.6B-Preview](https://huggingface.co/aquiffoo/aquif-3.5-A0.6B-Preview) |
| aquif-3.5-3B | [aquiffoo/aquif-3.5-3B](https://huggingface.co/aquiffoo/aquif-3.5-3B) |
| aquif-3.5-7B | [aquiffoo/aquif-3.5-7B](https://huggingface.co/aquiffoo/aquif-3.5-7B) |
| aquif-3.5-8B-Think | [aquiffoo/aquif-3.5-8B-Think](https://huggingface.co/aquiffoo/aquif-3.5-8B-Think) |
| aquif-3.5-A4B-Think | [aquiffoo/aquif-3.5-A4B-Think](https://huggingface.co/aquiffoo/aquif-3.5-A4B-Think) |

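As a quick start, the snippet below shows one way to run any of the checkpoints above with the Hugging Face Transformers `pipeline` API. This is a minimal sketch rather than an official recipe from the model authors: the prompt and generation settings are placeholder choices, and any repository ID from the table can be substituted.

```python
# Minimal text-generation sketch (assumes `pip install transformers torch`).
from transformers import pipeline

# Any repository ID from the table above can be used here.
generator = pipeline("text-generation", model="aquiffoo/aquif-3.5-3B")

# Placeholder prompt and settings; tune max_new_tokens and sampling as needed.
result = generator(
    "Explain what a Mixture of Experts model is in one paragraph.",
    max_new_tokens=128,
)
print(result[0]["generated_text"])
```
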
## Model Overview

| Model | Size (B) | Active Params (B) | Reasoning | MoE | Multilingual | MMLU | Context Window |
|-------|----------|-------------------|-----------|-----|--------------|------|----------------|
| aquif-3.5-A0.6B | 2.61 | 0.6 | ❌ | ✅ | ✅ | 60.5% | 4k |
| aquif-3.5-3B | 2.67 | 2.67 | ❌ | ❌ | ✅ | 70.2% | 32k |
| aquif-3.5-7B | 7.3 | 7.3 | ❌ | ❌ | ✅ | 78.5% | 16k |
| aquif-3.5-8B-Think | 8.2 | 8.2 | ✅ | ❌ | ✅ | 81.1% | 40k |
| aquif-3.5-A4B-Think | 12 | 4 | ✅ | ✅ | ✅ | 86.9% | 128k |

## Model Details

### aquif-3.5-A0.6B (Experimental MoE)

An experimental small-scale Mixture of Experts model designed for multilingual applications with minimal computational overhead. Despite its compact active parameter count, it demonstrates competitive performance against larger dense models.

**Performance Comparison:**

| Metric | aquif-3.5 (2.6B A0.6B) | Qwen3 (0.8B) | LFM2 (0.7B) | aquif-3 (0.4B) |
|--------|------------------------|--------------|-------------|----------------|
| MMLU | 60.5 | 44.9 | 49.9 | 55.6 |
| GPQA | 30.2 | 22.1 | 28.5 | 28.5 |
| GSM8K | 50.7 | 36.5 | 46.4 | 52.1 |
| HumanEval | 45.2 | 36.0 | 40.0 | 37.4 |
| **Average** | **46.7** | **34.9** | **41.2** | **43.4** |

### aquif-3.5-3B (State-of-the-Art Dense)

The new standard for small dense models, offering optimal performance-per-parameter efficiency for general-purpose applications.

**Performance Comparison:**

| Metric | aquif-3.5 (2.7B) | EXAONE 3.5 (2.4B) | Qwen3 (4B) | Gemma 3 (4B) | Phi-4-mini (3.8B) | Apriel-5B-Instruct (4.8B) | aquif-3 (3.2B) |
|--------|------------------|-------------------|------------|--------------|-------------------|---------------------------|----------------|
| MMLU (General Knowledge) | 70.2 | 60.4 | 70.4 | 59.6 | 67.3 | 64.6 | 67.5 |
| GPQA Diamond (Science) | 35.8 | 28.4 | 39.3 | 30.9 | 25.2 | 28.4 | 36.1 |
| LiveCodeBench (Coding) | 23.1 | 12.5 | 21.3 | 11.2 | 10.4 | 11.6 | 15.4 |
| IFEval (Instruction Following) | 78.9 | 73.6 | 71.2 | 80.2 | 68.6 | 80.8 | 78.9 |
| AIME 2025 (Competition Math) | 13.4 | 4.5 | 9.8 | 12.7 | 5.3 | 4.3 | 9.6 |
| **Average** | **44.3** | **35.9** | **42.4** | **38.9** | **35.4** | **37.9** | **41.5** |

### aquif-3.5-7B (Multilingual Long Context)

A Qwen-based architecture optimized for multilingual applications with extended context capabilities, delivering state-of-the-art performance in its size class.

**Performance Comparison:**

| Metric | aquif-3.5 (7.3B) | EXAONE 3.5 (7.8B) | Qwen3 (8.2B) | Gemma 3 (12B) | Llama 3.1 (8B) | Kanana 1.5 (8B) | aquif-3 (3.2B) |
|--------|------------------|-------------------|--------------|---------------|----------------|-----------------|----------------|
| MMLU (General Knowledge) | 78.5 | 72.2 | 82.9 | 74.5 | 69.2 | 68.8 | 67.5 |
| GPQA Diamond (Science) | 42.3 | 39.4 | 39.3 | 40.9 | 32.8 | 37.5 | 36.1 |
| LiveCodeBench (Coding) | 21.3 | 18.0 | 23.9 | 13.7 | 10.8 | 16.5 | 15.4 |
| IFEval (Instruction Following) | 85.6 | 82.6 | 85.4 | 80.2 | 75.0 | 80.1 | 78.9 |
| AIME 2025 (Competition Math) | 23.4 | 18.3 | 20.9 | 18.8 | 2.7 | 13.4 | 9.6 |
| **Average** | **50.2** | **46.1** | **50.4** | **45.6** | **38.1** | **43.3** | **41.5** |

### aquif-3.5-8B-Think & aquif-3.5-A4B-Think (Reasoning Models)

Advanced reasoning-capable models designed for complex problem-solving tasks. The A4B variant leverages MoE architecture for enhanced efficiency while maintaining superior reasoning performance. A usage sketch follows the comparison table below.

**Performance Comparison:**

| Metric | aquif-3.5 (12B A4B) | aquif-3.5 (8B) | Qwen3 Thinking 2507 (31B A3B) | gpt-oss-20b (21B A4B) | Nemotron Nano v2 (9B) | Solar Pro 2 |
|--------|---------------------|----------------|-------------------------------|-----------------------|-----------------------|-------------|
| MMLU-Pro | 78.5 | 78.1 | 80.5 | 73.6 | 74.2 | 80.5 |
| GPQA Diamond | 70.8 | 66.8 | 70.7 | 61.7 | 64.0 | 68.7 |
| AIME 2025 | 84.4 | 81.4 | 56.3 | 61.7 | 69.7 | 61.3 |
| LiveCodeBench | 66.1 | 61.5 | 70.7 | 72.1 | 71.1 | 61.6 |
| Humanity's Last Exam | 8.9 | 8.2 | 9.8 | 8.5 | 6.5 | 7.0 |
| TAU-Bench v2 (avg) | 43.7 | 36.8 | 35.7 | 43.2 | 34.9 | 38.7 |
| **Average** | **58.7** | **55.5** | **54.0** | **53.5** | **53.4** | **53.0** |

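The sketch below illustrates one plausible way to prompt a Think model through the tokenizer's chat template with Transformers. It is an assumption-laden example, not documentation from the model authors: the exact chat/thinking template, any special tags, and recommended generation settings come from the released tokenizer and model configs, which are not reproduced here.

```python
# Hedged sketch: chat-style generation with a Think model via Transformers.
# Assumes `pip install transformers accelerate torch` and enough GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aquiffoo/aquif-3.5-8B-Think"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example prompt; the tokenizer's own chat template controls the final formatting.
messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x? Show your reasoning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning traces can be long, so leave generous room for new tokens.
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
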
## Key Improvements Over aquif-3

- **Simplified Naming**: Clear size-based nomenclature for easier model selection
- **Enhanced MoE Support**: Multiple MoE configurations across different model sizes
- **Reasoning Capabilities**: Dedicated thinking models for complex problem-solving
- **Extended Context**: Up to 128k context window for long-form applications
- **Multilingual by Default**: Native multilingual support across all variants
- **Performance Gains**: 5-15% improvement across benchmarks compared to aquif-3

## Usage Recommendations

- **aquif-3.5-A0.6B**: Experimental applications, resource-constrained environments
- **aquif-3.5-3B**: General-purpose applications, balanced performance/efficiency
- **aquif-3.5-7B**: Multilingual applications, long-context tasks
- **aquif-3.5-8B-Think**: Complex reasoning, scientific analysis
- **aquif-3.5-A4B-Think**: Advanced reasoning with efficiency optimization

## Technical Specifications

All models support:
- BF16 and FP16 precision (see the loading sketch below)
- Standard transformer architecture optimizations
- Efficient attention mechanisms
- Multi-head attention with optimized KV caching

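For example, the compute dtype can be selected at load time with Transformers on a PyTorch backend. This is a sketch only; `torch.float16` can be swapped in on hardware without BF16 support.

```python
# Sketch: loading a checkpoint in BF16 (or FP16) precision.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "aquiffoo/aquif-3.5-7B",
    torch_dtype=torch.bfloat16,  # use torch.float16 where BF16 is unsupported
)
```
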
## Acknowledgements

- **Qwen Team**: Base architecture for 7B, 8B, and 12B-A4B models
- **Meta Llama Team**: Base architecture for 3B and 2.6B-A0.6B models
- **Hugging Face**: Model hosting infrastructure and training libraries

## License

This project is released under the Apache 2.0 License. See the LICENSE file for details.

---

*Made in 🇧🇷*

© 2025 aquif AI. All rights reserved.