Engine Knock Detection - ResNet-18
This model detects engine knock in audio recordings using a ResNet-18 fine-tuned on mel-spectrograms.
Model Description
- Architecture: ResNet-18 (pretrained on ImageNet, fine-tuned for audio)
- Input: Mel-spectrograms (224x224, 3-channel)
- Output: Binary classification (class 0 = clean, class 1 = knocking)
- Framework: PyTorch
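
To make the input format concrete, here is a minimal sketch that walks a synthetic single-channel spectrogram through resizing and channel replication. The starting shape (128 mel bands, 94 frames) is an arbitrary illustrative value, not a documented property of the training data.

```python
import torch
import torch.nn.functional as F

# Synthetic stand-in for a dB-scaled mel-spectrogram: (channels, n_mels, frames).
# The frame count (94) is an arbitrary illustrative value.
mel_spec_db = torch.randn(1, 128, 94)

# Add a batch dimension, then resize to the 224x224 grid the model takes as input.
resized = F.interpolate(
    mel_spec_db.unsqueeze(0), size=(224, 224), mode="bilinear", align_corners=False
)

# Replicate the single channel three times to mimic an RGB image.
model_input = resized.repeat(1, 3, 1, 1)

print(model_input.shape)  # torch.Size([1, 3, 224, 224])
```

All three channels carry identical data. The replication only exists so the ImageNet-pretrained first convolution can be reused unchanged.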
Performance Metrics
Evaluated on the held-out test set (15% of the data):
| Metric | Score |
|---|---|
| Accuracy | 0.8722 |
| Precision | 0.9405 |
| Recall | 0.8144 |
| F1-Score | 0.8729 |
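
The F1 score in the table is the harmonic mean of precision and recall, so it can be checked from the other two rows. A short sanity check, using only the reported values:

```python
# Precision and recall as reported in the table above.
precision = 0.9405
recall = 0.8144

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")  # F1 = 0.8729
```

Assuming "knocking" is the positive class (the card does not say so explicitly), the gap between precision and recall suggests the model rarely flags a clean engine as knocking but misses a little under one in five knocking recordings.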
Usage
import torch
import torchaudio
from torchvision import models
from huggingface_hub import hf_hub_download

# Load model: ResNet-18 backbone with a 2-class head, weights from the Hub
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model_path = hf_hub_download(repo_id="cxlrd/engine-knock-resnet18", filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Prepare audio: mix down to mono and resample to the 16 kHz rate the transform is configured for
waveform, sample_rate = torchaudio.load('audio.wav')
waveform = waveform.mean(dim=0, keepdim=True)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
mel_spec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, hop_length=512, n_mels=128
)(waveform)
mel_spec_db = torchaudio.transforms.AmplitudeToDB()(mel_spec)
# Resize to 224x224 and replicate the single channel to 3 channels
mel_spec_db = torch.nn.functional.interpolate(
    mel_spec_db.unsqueeze(0), size=(224, 224), mode='bilinear'
).repeat(1, 3, 1, 1)

# Predict (class 0 = clean, class 1 = knocking)
with torch.no_grad():
    output = model(mel_spec_db)
    prediction = torch.argmax(output, dim=1)
print('Clean' if prediction.item() == 0 else 'Knocking')
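
If a confidence score is useful alongside the label, the two logits can be passed through a softmax. The sketch below is one way to do that. The helper name `describe_prediction` and the example logits are hypothetical, and the resulting probabilities are uncalibrated scores rather than validated likelihoods.

```python
import torch

CLASS_NAMES = ["Clean", "Knocking"]  # index order taken from the usage example


def describe_prediction(logits: torch.Tensor) -> tuple[str, float]:
    """Return the predicted label and its softmax probability for one sample."""
    probs = torch.softmax(logits, dim=1)  # shape: (1, 2)
    confidence, index = probs.max(dim=1)
    return CLASS_NAMES[index.item()], confidence.item()


# Hypothetical logits standing in for `output = model(mel_spec_db)`.
example_logits = torch.tensor([[0.2, 1.7]])
label, confidence = describe_prediction(example_logits)
print(f"{label} ({confidence:.2%})")
```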
Training Details
- Dataset: Custom engine sound recordings (1199 samples)
- Training Split: 70% train, 15% validation, 15% test
- Optimizer: Adam (lr=1e-4, weight_decay=1e-4)
- Batch Size: 16
- Early Stopping: Patience of 5 epochs
- Data Augmentation: Mel-spectrogram normalization
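
The split and early-stopping settings above can be sketched in plain Python. This is an illustration under stated assumptions, not the author's training script: the card does not say how fractional sample counts were rounded, whether the split was stratified, or which metric early stopping monitored (validation loss is assumed here). `split_sizes` and `EarlyStopping` are hypothetical names.

```python
def split_sizes(n_samples, train_frac=0.70, val_frac=0.15):
    """Return (train, val, test) counts; the test set takes the remainder."""
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    return n_train, n_val, n_samples - n_train - n_val


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


print(split_sizes(1199))  # (839, 179, 181)

# Made-up validation losses: the last improvement happens at epoch 3.
stopper = EarlyStopping(patience=5)
losses = [0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.50, 0.54, 0.56]
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 8
```

With a patience of 5, training stops five epochs after the last improvement, so the best checkpoint here would be the one from epoch 3.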
Citation
If you use this model, please cite:
@misc{engine-knock-resnet18,
  author = {cxlrd},
  title = {Engine Knock Detection with ResNet-18},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-resnet18}}
}