---
language: en
license: apache-2.0
tags:
- audio-classification
- military-audio
- ast
- tiny-ast
- pytorch
- transformers
- surveillance
- edge-deployment
metrics:
- accuracy
- f1
model-index:
- name: tiny-ast-mad-military-audio-classifier
  results:
  - task:
      type: audio-classification
      name: Military Audio Classification
    dataset:
      name: MAD Dataset
      type: military-audio
    metrics:
    - type: accuracy
      value: 0.9673
      name: Accuracy
    - type: f1
      value: 0.9674
      name: F1-weighted
---

# Tiny-AST Military Audio Classifier

🎖️ **State-of-the-art military audio classification model** achieving **96.73% accuracy** on the Military Audio Dataset (MAD).

## Model Description

This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) on the Military Audio Dataset (MAD). It's designed for **edge deployment** on devices like Raspberry Pi 5 for military surveillance applications.

### Key Features
- 🎯 **96.73% accuracy** on MAD dataset (7 military audio classes)  
- πŸš€ **Edge-optimized** for Raspberry Pi deployment
- ⚑ **Fast inference** (<200ms per sample)
- 🧠 **Efficient** (16.5% of parameters fine-tuned)
- πŸ”Š **Robust** to real-world military environments

## Training Results

### Progressive Training Performance:
- **Phase 1** (Classifier only): 94.32% accuracy
- **Phase 2** (Top 2 layers): 96.73% accuracy ← **Best Model**
- **Phase 3** (Top 4 layers): 96.35% accuracy  
- **Phase 4** (Top 6 layers): 96.73% accuracy
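
The four phases above unfreeze progressively more of the encoder. A minimal sketch of how such a schedule could pick trainable parameters by name is shown below. It assumes a 12-layer encoder and the `encoder.layer.<i>.` / `classifier.` parameter naming used by Hugging Face AST checkpoints. The `PHASES` mapping and the helper `trainable_for_phase` are hypothetical and are not taken from the training script for this model.

```python
# Hypothetical schedule: phase number -> number of top encoder layers to unfreeze.
PHASES = {1: 0, 2: 2, 3: 4, 4: 6}
NUM_LAYERS = 12  # assumption: depth of the AST base encoder


def trainable_for_phase(param_names, phase, num_layers=NUM_LAYERS):
    """Return the parameter names that would be trainable in the given phase."""
    top_k = PHASES[phase]
    # Name fragments of the top-k layers, e.g. layers 10 and 11 for top_k=2.
    # The trailing dot keeps "layer.1." from matching "layer.11.".
    fragments = [f"encoder.layer.{i}." for i in range(num_layers - top_k, num_layers)]
    selected = []
    for name in param_names:
        is_head = name.startswith("classifier.")
        in_top_layer = any(fragment in name for fragment in fragments)
        if is_head or in_top_layer:
            selected.append(name)
    return selected


# With a loaded PyTorch model, the selection could be applied like this:
# names = [n for n, _ in model.named_parameters()]
# keep = set(trainable_for_phase(names, phase=2))
# for n, p in model.named_parameters():
#     p.requires_grad = n in keep
```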

### Training Configuration:
- **Method**: Progressive unfreezing strategy
- **Learning Rates**: Conservative (1e-4 → 2e-5)
- **Normalization**: MAD-specific statistics (mean: -2.16, std: 2.85)
- **Class Weighting**: Balanced for imbalanced dataset
- **Training Time**: 40 minutes on RTX 3060
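
To illustrate what the MAD-specific statistics do: AST-style feature extractors standardize log-Mel values as `(x - mean) / (2 * std)`. The sketch below assumes this model follows that convention, as the Hugging Face `ASTFeatureExtractor` does. The function name is illustrative only.

```python
MAD_MEAN = -2.16  # dataset mean reported above
MAD_STD = 2.85    # dataset std reported above


def normalize_log_mel(values, mean=MAD_MEAN, std=MAD_STD):
    """Standardize log-Mel values with the AST convention (x - mean) / (2 * std)."""
    return [(v - mean) / (2.0 * std) for v in values]
```

With `transformers`, the same statistics can be passed through the `mean` and `std` arguments of `ASTFeatureExtractor`, provided the published extractor config does not already carry them.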

## Model Classes

The model classifies 7 military audio categories:

| Class ID | Class Name | Training Samples | Test Samples |
|----------|------------|------------------|--------------|
| 0 | Communication | 774 | 207 |
| 1 | Footsteps | 1,293 | 280 |
| 2 | Gunshot | 773 | 104 |
| 3 | Shelling | 883 | 104 |
| 4 | Vehicle | 910 | 122 |
| 5 | Helicopter | 934 | 91 |
| 6 | Fighter | 862 | 129 |
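
The training counts above are uneven (Footsteps has roughly 1.7x the samples of Gunshot). The exact weighting formula used for this model is not stated. As an assumption, the sketch below applies the common "balanced" heuristic, `n_samples / (n_classes * class_count)`, to the counts from the table.

```python
# Training sample counts copied from the table above.
TRAIN_COUNTS = {
    "Communication": 774,
    "Footsteps": 1293,
    "Gunshot": 773,
    "Shelling": 883,
    "Vehicle": 910,
    "Helicopter": 934,
    "Fighter": 862,
}


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


weights = balanced_class_weights(TRAIN_COUNTS)
for name, w in weights.items():
    print(f"{name:<14} {w:.3f}")
```

Rare classes get weights above 1 and frequent classes below 1. The resulting list, ordered by class ID, could be passed as the `weight` tensor of a cross-entropy loss.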

## Usage

### Quick Start
```python
from transformers import ASTForAudioClassification, ASTFeatureExtractor
import librosa
import torch

# Load model and feature extractor
model = ASTForAudioClassification.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
feature_extractor = ASTFeatureExtractor.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")

# Load audio and resample to 16 kHz, the rate the feature extractor expects
audio, sr = librosa.load("military_audio.wav", sr=16000)

# Extract log-Mel features
inputs = feature_extractor(audio, sampling_rate=sr, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=-1).item()

# Class mapping (list index = Class ID from the Model Classes table)
classes = ['Communication', 'Footsteps', 'Gunshot', 'Shelling', 'Vehicle', 'Helicopter', 'Fighter']
print(f"Predicted class: {classes[predicted_class]}")
```
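
The Quick Start prints only the arg-max class. For surveillance use it can help to look at the confidence as well. The sketch below turns a list of logits into probabilities and reports `"uncertain"` below a threshold. The helper names and the 0.5 threshold are assumptions for illustration and are not part of the released model.

```python
import math

CLASSES = ['Communication', 'Footsteps', 'Gunshot', 'Shelling',
           'Vehicle', 'Helicopter', 'Fighter']


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, min_confidence=0.5):
    """Return (label, probability); the label is 'uncertain' below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = CLASSES[best] if probs[best] >= min_confidence else "uncertain"
    return label, probs[best]
```

With the Quick Start variables, this could be called as `top_prediction(outputs.logits[0].tolist())`.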

### Edge Deployment (Raspberry Pi 5)
```python
import onnxruntime as ort

# Load ONNX model for edge inference
session = ort.InferenceSession("tiny_ast_mad_optimized.onnx")
# ... inference code
```

## Training Details

### Dataset
- **Source**: Military Audio Dataset (MAD)
- **Total Samples**: 7,466 audio files
- **Duration**: 2-8 seconds per sample
- **Sample Rate**: 16kHz
- **Augmentation**: Military-specific (time stretch, pitch shift, noise injection)
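
Of the augmentations listed, noise injection is the simplest to show. The augmentation parameters and library used for this model are not stated. The sketch below is a generic version that adds white Gaussian noise at a chosen signal-to-noise ratio; the function name, the seed, and the SNR value are assumptions.

```python
import math
import random


def add_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled so the result has the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for the noise power.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]


# Example: a 440 Hz tone sampled at 16 kHz, degraded to 10 dB SNR.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_noise(tone, snr_db=10)
```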

### Architecture
- **Base Model**: Audio Spectrogram Transformer (AST)
- **Parameters**: 86.2M total, 14.2M trainable (16.5%)
- **Input**: Log-Mel spectrograms (1024 x 128)
- **Output**: 7 military audio classes
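
The trainable share can be sanity-checked with a back-of-the-envelope count. The sketch assumes ViT-Base dimensions for the AST encoder (hidden size 768, MLP size 3072) and assumes that the trainable parameters are the classifier head (LayerNorm plus one dense layer) and the top two encoder layers. Neither assumption is stated explicitly above.

```python
HIDDEN = 768         # assumption: ViT-Base hidden size
INTERMEDIATE = 3072  # assumption: ViT-Base MLP size
NUM_CLASSES = 7


def linear_params(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def encoder_layer_params(hidden=HIDDEN, intermediate=INTERMEDIATE):
    attention = 4 * linear_params(hidden, hidden)  # query, key, value, output
    mlp = linear_params(hidden, intermediate) + linear_params(intermediate, hidden)
    layer_norms = 2 * (2 * hidden)                 # weight + bias for two norms
    return attention + mlp + layer_norms


head = 2 * HIDDEN + linear_params(HIDDEN, NUM_CLASSES)  # LayerNorm + dense head
trainable = 2 * encoder_layer_params() + head
share = trainable / 86.2e6  # total parameter count reported above
print(f"{trainable / 1e6:.1f}M trainable, {share:.1%} of total")
```

Under these assumptions the count comes out at about 14.2M, or 16.5% of 86.2M, which agrees with the figures reported above.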

### Performance Metrics
- **Accuracy**: 96.73%
- **F1-Macro**: 96.84%
- **F1-Weighted**: 96.74%
- **Precision**: High across all classes
- **Recall**: Balanced performance
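
Macro and weighted F1 differ only in how the per-class scores are averaged: macro gives every class equal weight, while weighted scales each class by its number of test samples. A small sketch follows; the confusion counts in the example are made up for illustration and are not results from this model.

```python
def f1_scores(per_class):
    """per_class: list of (true_positives, false_positives, false_negatives).

    Returns (macro_f1, weighted_f1).
    """
    f1s, supports = [], []
    for tp, fp, fn in per_class:
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
        supports.append(tp + fn)  # number of true samples of this class
    macro = sum(f1s) / len(f1s)
    weighted = sum(f * s for f, s in zip(f1s, supports)) / sum(supports)
    return macro, weighted


# Hypothetical two-class example: a large class with errors, a small perfect one.
macro, weighted = f1_scores([(90, 10, 10), (10, 0, 0)])
```

When the macro score is higher than the weighted score, as in the numbers reported above, the classes with fewer test samples tend to score at least as well as the larger ones.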

## Hardware Requirements

### Training
- **GPU**: RTX 3060 (12GB VRAM) or similar
- **RAM**: 16GB+ recommended
- **Storage**: 50GB for dataset and models

### Inference (Edge)
- **Device**: Raspberry Pi 5 or similar ARM device
- **RAM**: 2GB minimum
- **Inference Time**: <200ms per sample
- **Power**: <5W continuous operation
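
To check the latency figure on your own device, a simple timing harness is enough. The sketch below times any callable; `predict` is a placeholder for whatever wraps your model call (PyTorch or ONNX Runtime), and the warm-up and run counts are arbitrary choices.

```python
import statistics
import time


def measure_latency_ms(predict, sample, warmup=3, runs=20):
    """Time predict(sample) and return median and worst-case latency in ms."""
    for _ in range(warmup):
        predict(sample)  # warm caches and lazy initialisation before timing
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(timings), "max_ms": max(timings)}


# Usage with a stand-in function; replace the lambda with a real model call.
stats = measure_latency_ms(lambda x: sum(x), list(range(1000)))
```

Comparing `stats["median_ms"]` with the 200 ms budget shows whether a given device and runtime meet the target.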

## Limitations and Considerations

- **Domain-specific**: Optimized for military audio contexts
- **Language**: Primarily English communication samples
- **Environment**: Trained on MAD dataset conditions
- **Real-time**: Designed for batch processing, not streaming
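
Because the model works on individual clips and not on a stream, a long recording has to be cut into clips first. The sketch below computes overlapping window boundaries in samples. The 8-second window matches the upper end of the training clip durations; the 4-second hop and the function name are assumptions.

```python
def window_bounds(num_samples, sample_rate=16000, window_s=8.0, hop_s=4.0):
    """Return (start, end) sample indices of overlapping windows over a recording."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if num_samples <= window:
        return [(0, num_samples)]  # short clip: classify it as a whole
    bounds = []
    start = 0
    while start + window < num_samples:
        bounds.append((start, start + window))
        start += hop
    # The final window is aligned to the end so no audio is dropped.
    bounds.append((num_samples - window, num_samples))
    return bounds
```

Each `(start, end)` pair can be used to slice the audio array, and the slices can then be passed to the feature extractor as one batch.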

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{tiny-ast-mad-2024,
  title={Tiny-AST Military Audio Classifier: Progressive Fine-tuning for Edge Deployment},
  author={Paul, Akash},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/Akashpaul123/tiny-ast-mad-military-audio-classifier}
}
```

## License

This model is licensed under the Apache 2.0 License.

## Contact

- **Author**: Akash Paul
- **GitHub**: [@akashpaul123](https://github.com/akashpaul123)
- **Hugging Face**: [@akashpaul123](https://huggingface.co/akashpaul123)

---

*Model trained as part of military audio surveillance research with focus on edge deployment and real-world robustness.*