Engine Knock Detection - ResNet-18

This model detects engine knock in audio recordings. It is a ResNet-18 fine-tuned on mel-spectrograms computed from the audio.

Model Description

Architecture: ResNet-18 (pretrained on ImageNet, fine-tuned for audio)
Input: Mel-spectrograms resized to 224x224 and replicated across 3 channels
Output: Binary classification (class 0 = clean, class 1 = knocking)
Framework: PyTorch

Performance Metrics

Evaluated on the held-out test set (15% of the dataset):

Metric	Score
Accuracy	0.8722
Precision	0.9405
Recall	0.8144
F1-Score	0.8729
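
As a quick consistency check, the F1-Score in the table can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_score` below is illustrative only and is not part of this repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics table above
f1 = f1_score(0.9405, 0.8144)
print(f"F1 = {f1:.4f}")  # F1 = 0.8729
```

The gap between precision (0.9405) and recall (0.8144) means that when the model flags a knock it is usually right, but it misses roughly one in five knocking recordings.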

Usage

import torch
import torchaudio
from torchvision import models
from huggingface_hub import hf_hub_download

# Load model
model = models.resnet18(weights=None)  # architecture only; `pretrained=` is deprecated in torchvision >= 0.13
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model_path = hf_hub_download(repo_id="cxlrd/engine-knock-resnet18", filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

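# Prepare audio
waveform, sample_rate = torchaudio.load('audio.wav')  # shape: (channels, samples)
# Mix down to mono so the 3-channel repeat below yields exactly 3 channels
waveform = waveform.mean(dim=0, keepdim=True)
# The transform below is configured for 16 kHz, so resample other rates first
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
mel_spec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, hop_length=512, n_mels=128
)(waveform)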
mel_spec_db = torchaudio.transforms.AmplitudeToDB()(mel_spec)
mel_spec_db = torch.nn.functional.interpolate(
    mel_spec_db.unsqueeze(0), size=(224, 224), mode='bilinear'
).repeat(1, 3, 1, 1)

# Predict
with torch.no_grad():
    output = model(mel_spec_db)
    prediction = torch.argmax(output, dim=1)
    # Class 0 = clean, class 1 = knocking
    print('Clean' if prediction.item() == 0 else 'Knocking')
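
The snippet above reports only the argmax class. If a confidence score or an adjustable decision threshold is useful, the two logits can be passed through a softmax first. The following is a minimal sketch in plain Python; `knock_probability`, `classify`, and the `threshold` parameter are hypothetical names introduced here, and the class order (0 = clean, 1 = knocking) is taken from the usage code above.

```python
import math


def knock_probability(logits):
    """Return P(knocking) from two logits ordered [clean, knocking]."""
    clean_logit, knock_logit = logits
    # Subtract the max before exponentiating for numerical stability
    m = max(clean_logit, knock_logit)
    exp_clean = math.exp(clean_logit - m)
    exp_knock = math.exp(knock_logit - m)
    return exp_knock / (exp_clean + exp_knock)


def classify(logits, threshold=0.5):
    """Label a recording as 'Knocking' when P(knocking) exceeds the threshold.

    With threshold=0.5 this matches the argmax rule, apart from exact ties.
    """
    return 'Knocking' if knock_probability(logits) > threshold else 'Clean'


print(classify([2.0, -1.0]))                 # Clean
print(classify([0.5, 0.0], threshold=0.3))   # Knocking
```

With the `output` tensor from the usage code, this could be called as `classify(output[0].tolist())`. Lowering the threshold below 0.5 would be expected to trade some precision for recall, though the effect on this model has not been measured here.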

Training Details

Dataset: Custom engine sound recordings (1199 samples)
Data Split: 70% train, 15% validation, 15% test
Optimizer: Adam (lr=1e-4, weight_decay=1e-4)
Batch Size: 16
Early Stopping: Patience of 5 epochs
Data Augmentation: Mel-spectrogram normalization
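
To make the split and the early-stopping rule above concrete, here is a minimal sketch. It is not the original training code: the rounding convention in `split_counts`, the `EarlyStopper` class, and the choice of validation loss as the monitored metric are all assumptions made for illustration.

```python
def split_counts(n_samples, train_pct=70, val_pct=15):
    """Approximate split sizes; the test set takes whatever remains.

    The exact counts used in training are not documented, so the
    floor-based rounding here is an assumption.
    """
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    return n_train, n_val, n_samples - n_train - n_val


class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float('inf')
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


print(split_counts(1199))  # (839, 179, 181)

stopper = EarlyStopper(patience=5)
for epoch, loss in enumerate([1.0, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99], start=1):
    if stopper.step(loss):
        print(f"Stopping at epoch {epoch}")  # Stopping at epoch 7
        break
```

In a real loop, `stopper.step` would be called once per epoch after validation, typically alongside saving the checkpoint with the best validation loss.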

Citation

If you use this model, please cite:

@misc{engine-knock-resnet18,
  author = {cxlrd},
  title = {Engine Knock Detection with ResNet-18},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-resnet18}}
}
