---
language: en
license: mit
tags:
- audio-classification
- engine-diagnostics
- knock-detection
- resnet
datasets:
- custom
metrics:
- accuracy
- f1
---

# Engine Knock Detection - ResNet-18

This model detects engine knock in audio recordings. It is a ResNet-18 fine-tuned on mel-spectrograms of engine sounds.

## Model Description

- **Architecture**: ResNet-18 (pretrained on ImageNet, fine-tuned for audio)
- **Input**: Mel-spectrograms (224x224, 3-channel)
- **Output**: Binary classification (clean vs knocking)
- **Framework**: PyTorch

## Performance Metrics

Evaluated on the held-out test set (15% of the dataset):

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.8778 |
| Precision | 0.9518 |
| Recall    | 0.8144 |
| F1-Score  | 0.8778 |
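
As a quick consistency check, the F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The snippet below is only an illustration using the values from the table; it is not part of the evaluation code.

```python
# Values copied from the metrics table above
precision = 0.9518
recall = 0.8144

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.8778
```

The high precision and lower recall suggest the model rarely raises false alarms on clean engines but misses some knocking recordings.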

## Usage

```python
import torch
import torchaudio
from torchvision import models
from huggingface_hub import hf_hub_download

# Load model: ResNet-18 with a 2-class head, weights downloaded from the Hub
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model_path = hf_hub_download(repo_id="cxlrd/engine-knock-resnet18", filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Prepare audio: mix down to mono and resample, since the transform below expects 16 kHz
waveform, sample_rate = torchaudio.load('audio.wav')
waveform = waveform.mean(dim=0, keepdim=True)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
mel_spec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, hop_length=512, n_mels=128
)(waveform)
mel_spec_db = torchaudio.transforms.AmplitudeToDB()(mel_spec)
# Resize to 224x224 and repeat the single channel to form a 3-channel input
mel_spec_db = torch.nn.functional.interpolate(
    mel_spec_db.unsqueeze(0), size=(224, 224), mode='bilinear'
).repeat(1, 3, 1, 1)

# Predict (class 0 = clean, class 1 = knocking)
with torch.no_grad():
    output = model(mel_spec_db)
    prediction = torch.argmax(output, dim=1).item()
    print('Clean' if prediction == 0 else 'Knocking')
```
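
To see what the resize step in the usage example does, it helps to know the spectrogram's shape before interpolation. The sketch below estimates the number of time frames for a clip, assuming torchaudio's default centered STFT padding (`center=True`); the helper name `mel_frame_count` and the 5-second clip length are hypothetical and used only for illustration.

```python
def mel_frame_count(num_samples, hop_length=512):
    """Frames produced by a centered STFT: one per hop, plus one."""
    return 1 + num_samples // hop_length

sample_rate = 16000   # rate used by the MelSpectrogram transform
clip_seconds = 5      # hypothetical clip length
frames = mel_frame_count(sample_rate * clip_seconds)

# n_mels=128 rows, one column per frame
print(f"mel-spectrogram before resizing: 128 x {frames}")  # 128 x 157
```

Because every spectrogram is interpolated to 224x224, clips of different lengths are stretched or compressed along the time axis by different amounts.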

## Training Details

- **Dataset**: Custom engine sound recordings (1199 samples)
- **Data Split**: 70% train, 15% validation, 15% test
- **Optimizer**: Adam (lr=1e-4, weight_decay=1e-4)
- **Batch Size**: 16
- **Early Stopping**: Patience of 5 epochs
- **Preprocessing**: Mel-spectrogram normalization
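
The split and early-stopping settings listed above can be sketched in plain Python. This is a minimal illustration, not the original training code: the helper `split_indices`, the class `EarlyStopping`, the random seed, and the rounding of split sizes are all assumptions, and the early-stopping criterion is assumed to be validation loss.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train/val/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


class EarlyStopping:
    """Signal a stop after `patience` epochs without a lower validation loss."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


train_idx, val_idx, test_idx = split_indices(1199)
print(len(train_idx), len(val_idx), len(test_idx))
```

In a training loop, `stopper.step(val_loss)` would be called once per epoch, and training would end the first time it returns `True`. Exact split sizes depend on how fractions are rounded, so they may differ slightly from the original run.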

## Citation

If you use this model, please cite:

```bibtex
@misc{engine-knock-resnet18,
  author = {cxlrd},
  title = {Engine Knock Detection with ResNet-18},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-resnet18}}
}
```