---
language: en
license: apache-2.0
tags:
- audio-classification
- military-audio
- ast
- tiny-ast
- pytorch
- transformers
- surveillance
- edge-deployment
metrics:
- accuracy
- f1
model-index:
- name: tiny-ast-mad-military-audio-classifier
  results:
  - task:
      type: audio-classification
      name: Military Audio Classification
    dataset:
      name: MAD Dataset
      type: military-audio
    metrics:
    - type: accuracy
      value: 0.9673
      name: Accuracy
    - type: f1
      value: 0.9674
      name: F1-weighted
---
# Tiny-AST Military Audio Classifier
🎖️ **State-of-the-art military audio classification model** achieving **96.73% accuracy** on the Military Audio Dataset (MAD).
## Model Description
This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) on the Military Audio Dataset (MAD). It's designed for **edge deployment** on devices like Raspberry Pi 5 for military surveillance applications.
### Key Features
- 🎯 **96.73% accuracy** on the MAD dataset (7 military audio classes)
- 🚀 **Edge-optimized** for Raspberry Pi deployment
- ⚡ **Fast inference** (<200ms per sample)
- 🔧 **Efficient** (16.5% of parameters fine-tuned)
- 🌍 **Robust** to real-world military environments
## Training Results
### Progressive Training Performance
- **Phase 1** (Classifier only): 94.32% accuracy
- **Phase 2** (Top 2 layers): 96.73% accuracy ← **Best Model**
- **Phase 3** (Top 4 layers): 96.35% accuracy
- **Phase 4** (Top 6 layers): 96.73% accuracy
### Training Configuration
- **Method**: Progressive unfreezing strategy (sketched below)
- **Learning Rates**: Conservative (1e-4 → 2e-5)
- **Normalization**: MAD-specific statistics (mean: -2.16, std: 2.85)
- **Class Weighting**: Balanced for imbalanced dataset
- **Training Time**: 40 minutes on RTX 3060
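The snippet below sketches how one such unfreezing phase could be set up, assuming the Hugging Face AST module layout (`audio_spectrogram_transformer.encoder.layer`, `classifier`) and the hyperparameters listed above; the exact phase schedule and training loop are not published with this card:
```python
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

# num_labels=7 swaps the AudioSet head for a fresh 7-way classifier
model = ASTForAudioClassification.from_pretrained(
    "MIT/ast-finetuned-audioset-10-10-0.4593",
    num_labels=7,
    ignore_mismatched_sizes=True,
)

# Override the AudioSet normalization with the MAD-specific statistics
feature_extractor = ASTFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")
feature_extractor.mean, feature_extractor.std = -2.16, 2.85

def phase_optimizer(model, n_encoder_layers, classifier_lr=1e-4, encoder_lr=2e-5):
    """Freeze everything, then unfreeze the classifier plus the top N encoder layers."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.classifier.parameters():
        p.requires_grad = True
    groups = [{"params": model.classifier.parameters(), "lr": classifier_lr}]
    if n_encoder_layers > 0:
        top_layers = model.audio_spectrogram_transformer.encoder.layer[-n_encoder_layers:]
        for layer in top_layers:
            for p in layer.parameters():
                p.requires_grad = True
        groups.append(
            {"params": (p for l in top_layers for p in l.parameters()), "lr": encoder_lr}
        )
    return torch.optim.AdamW(groups)

# Phase 1: n_encoder_layers=0; Phase 2 (best): 2; Phase 3: 4; Phase 4: 6
optimizer = phase_optimizer(model, n_encoder_layers=2)
```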
## Model Classes
The model classifies 7 military audio categories:
| Class ID | Class Name | Training Samples | Test Samples |
|----------|------------|------------------|--------------|
| 0 | Communication | 774 | 207 |
| 1 | Footsteps | 1,293 | 280 |
| 2 | Gunshot | 773 | 104 |
| 3 | Shelling | 883 | 104 |
| 4 | Vehicle | 910 | 122 |
| 5 | Helicopter | 934 | 91 |
| 6 | Fighter | 862 | 129 |
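The balanced class weighting mentioned in the training configuration can be reproduced from these training counts; a minimal sketch, assuming the common `n_samples / (n_classes * n_class_samples)` formula (the card does not specify the exact scheme):
```python
import torch

# Training sample counts from the table above, in class-ID order
train_counts = torch.tensor([774., 1293., 773., 883., 910., 934., 862.])
# "Balanced" weights: n_samples / (n_classes * count_per_class)
class_weights = train_counts.sum() / (len(train_counts) * train_counts)
loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights)
```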
## Usage
### Quick Start
```python
from transformers import ASTForAudioClassification, ASTFeatureExtractor
import librosa
import torch
# Load model and feature extractor
model = ASTForAudioClassification.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
feature_extractor = ASTFeatureExtractor.from_pretrained("Akashpaul123/tiny-ast-mad-military-audio-classifier")
# Load audio file (16kHz recommended)
audio, sr = librosa.load("military_audio.wav", sr=16000)
# Extract features
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
# Predict
with torch.no_grad():
outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=-1).item()
# Class mapping
classes = ['Communication', 'Footsteps', 'Gunshot', 'Shelling', 'Vehicle', 'Helicopter', 'Fighter']
print(f"Predicted class: {classes[predicted_class]}")
```
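To also report class confidences, a softmax over the logits extends the snippet above:
```python
probs = torch.softmax(outputs.logits, dim=-1)[0]
for name, p in sorted(zip(classes, probs.tolist()), key=lambda x: -x[1]):
    print(f"{name}: {p:.3f}")
```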
### Edge Deployment (Raspberry Pi 5)
```python
import onnxruntime as ort
# Load ONNX model for edge inference
session = ort.InferenceSession("tiny_ast_mad_optimized.onnx")
# ... inference code (see the full example below)
```
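A fuller sketch of the ONNX path, reusing the feature extractor from the Quick Start; the input tensor name depends on how the model was exported, so it is queried at runtime rather than assumed:
```python
import numpy as np
import librosa
import onnxruntime as ort
from transformers import ASTFeatureExtractor

# Feature extraction must match the PyTorch pipeline
feature_extractor = ASTFeatureExtractor.from_pretrained(
    "Akashpaul123/tiny-ast-mad-military-audio-classifier"
)
audio, _ = librosa.load("military_audio.wav", sr=16000)
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="np")

session = ort.InferenceSession("tiny_ast_mad_optimized.onnx")
input_name = session.get_inputs()[0].name  # export-dependent, so look it up
logits = session.run(None, {input_name: inputs["input_values"]})[0]

classes = ['Communication', 'Footsteps', 'Gunshot', 'Shelling', 'Vehicle', 'Helicopter', 'Fighter']
print(f"Predicted class: {classes[int(np.argmax(logits, axis=-1)[0])]}")
```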
## Training Details
### Dataset
- **Source**: Military Audio Dataset (MAD)
- **Total Samples**: 7,466 audio files
- **Duration**: 2-8 seconds per sample
- **Sample Rate**: 16kHz
- **Augmentation**: Military-specific (time stretch, pitch shift, noise injection)
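The augmentations above could be implemented with librosa along these lines (the probabilities and parameter ranges here are illustrative assumptions, not the values used in training):
```python
import numpy as np
import librosa

def augment(audio: np.ndarray, sr: int = 16000, rng=None) -> np.ndarray:
    """Randomly apply time stretch, pitch shift, and noise injection."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        audio = librosa.effects.time_stretch(audio, rate=rng.uniform(0.9, 1.1))
    if rng.random() < 0.5:
        audio = librosa.effects.pitch_shift(audio, sr=sr, n_steps=rng.uniform(-2.0, 2.0))
    if rng.random() < 0.5:  # inject low-level Gaussian noise
        audio = audio + rng.normal(0.0, 0.05 * audio.std(), size=audio.shape)
    return audio.astype(np.float32)
```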
### Architecture
- **Base Model**: Audio Spectrogram Transformer (AST)
- **Parameters**: 86.2M total, 14.2M trainable (16.5%)
- **Input**: Log-Mel spectrograms (1024 x 128)
- **Output**: 7 military audio classes
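The parameter split is easy to verify once the phase-2 freezing (classifier plus top 2 encoder layers) from the training sketch above is applied:
```python
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total: {total / 1e6:.1f}M | Trainable: {trainable / 1e6:.1f}M ({100 * trainable / total:.1f}%)")
# Expected per the card: ~86.2M total, ~14.2M trainable (16.5%)
```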
### Performance Metrics
- **Accuracy**: 96.73%
- **F1-Macro**: 96.84%
- **F1-Weighted**: 96.74%
- **Precision / Recall**: high and balanced across all seven classes
## Hardware Requirements
### Training
- **GPU**: RTX 3060 (12GB VRAM) or similar
- **RAM**: 16GB+ recommended
- **Storage**: 50GB for dataset and models
### Inference (Edge)
- **Device**: Raspberry Pi 5 or similar ARM device
- **RAM**: 2GB minimum
- **Inference Time**: <200ms per sample
- **Power**: <5W continuous operation
## Limitations and Considerations
- **Domain-specific**: Optimized for military audio contexts
- **Language**: Primarily English communication samples
- **Environment**: Trained on MAD dataset conditions
- **Real-time**: Designed for clip-level batch processing, not continuous streaming
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{tiny-ast-mad-2024,
  title={Tiny-AST Military Audio Classifier: Progressive Fine-tuning for Edge Deployment},
  author={Paul, Akash},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/Akashpaul123/tiny-ast-mad-military-audio-classifier}
}
```
## License
This model is licensed under the Apache 2.0 License.
## Contact
- **Author**: Akash Paul
- **GitHub**: [@akashpaul123](https://github.com/akashpaul123)
- **Hugging Face**: [@akashpaul123](https://huggingface.co/akashpaul123)
---
*Model trained as part of military audio surveillance research with focus on edge deployment and real-world robustness.*