WORK IN PROGRESS
[WIP] HumAware-VAD: Humming-Aware Voice Activity Detection
Overview
HumAware-VAD is a fine-tuned version of the Silero-VAD model, trained to distinguish humming from actual speech. Standard Voice Activity Detection (VAD) models, including Silero-VAD, often misclassify humming as speech, which leads to inaccurate speech segmentation. HumAware-VAD addresses this by fine-tuning on a custom dataset (HumSpeechBlend) so that speech detection stays accurate in the presence of humming.
Purpose
The primary goal of HumAware-VAD is to:
- Reduce false positives where humming is mistakenly detected as speech.
- Enhance speech segmentation accuracy in real-world applications.
- Improve VAD performance for tasks involving music, background noise, and vocal sounds.
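One way to make the first goal measurable is to count how often humming frames are flagged as speech. The sketch below is illustrative only: the function name `humming_false_positive_rate` and the per-frame label scheme ("speech", "hum", "silence") are assumptions for this example and are not taken from HumSpeechBlend or from the model's evaluation code.

```python
def humming_false_positive_rate(predictions, labels):
    """Fraction of humming frames that the VAD flagged as speech.

    predictions: iterable of bools, True where the VAD predicted speech.
    labels: iterable of strings, one per frame ("speech", "hum", "silence").
            This labelling scheme is hypothetical.
    """
    hum_total = 0
    hum_flagged = 0
    for predicted_speech, label in zip(predictions, labels):
        if label == "hum":
            hum_total += 1
            if predicted_speech:
                hum_flagged += 1
    # Return None when there are no humming frames, to avoid dividing by zero.
    return hum_flagged / hum_total if hum_total else None


# Toy data: three humming frames, one of which is wrongly flagged as speech.
preds = [True, True, False, True, False, False]
labels = ["speech", "hum", "hum", "speech", "silence", "hum"]
rate = humming_false_positive_rate(preds, labels)
print(rate)
```

Comparing this rate for the base model and the fine-tuned model on the same labelled audio would give a direct view of how much humming is still mistaken for speech.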
Model Details
- Base Model: Silero-VAD
- Fine-tuning Dataset: HumSpeechBlend
- Format: JIT (TorchScript)
- Framework: PyTorch
- Inference Speed: Real-time
Download & Usage
Install Dependencies
pip install torch torchaudio
Load the Model
import torch

def load_humaware_vad(model_path="humaware_vad.jit"):
    # Load the TorchScript (JIT) checkpoint and switch it to inference mode.
    model = torch.jit.load(model_path)
    model.eval()
    return model

vad_model = load_humaware_vad()
Run Inference
import torchaudio

# torchaudio.load returns a (channels, samples) tensor and the file's sample rate.
waveform, sample_rate = torchaudio.load("data/0000.wav")
out = vad_model(waveform)
print("VAD Output:", out)
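The snippet above passes the whole waveform in one call. Base Silero-VAD checkpoints are usually run on short fixed-size windows (for example 512 samples at 16 kHz) and return one speech probability per window; whether this fine-tuned checkpoint keeps that convention is an assumption here, not something the model card states. Under that assumption, the following sketch shows how audio could be split into windows and how per-window probabilities could be merged into speech segments. The helper names `iter_windows` and `probs_to_segments`, the 0.5 threshold, and the example probabilities are all hypothetical.

```python
def iter_windows(samples, window_size=512):
    """Yield consecutive, non-overlapping windows of `window_size` samples.

    Works on a plain list or a 1-D tensor. A trailing partial window is
    dropped here for simplicity; zero-padding it is a common alternative.
    """
    for start in range(0, len(samples) - window_size + 1, window_size):
        yield samples[start:start + window_size]


def probs_to_segments(probs, threshold=0.5, window_size=512, sample_rate=16000):
    """Merge runs of windows at or above `threshold` into (start_s, end_s) pairs."""
    runs = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i                    # a speech run begins at window i
        elif p < threshold and start is not None:
            runs.append((start, i))      # the run ended just before window i
            start = None
    if start is not None:                # audio ended while still in speech
        runs.append((start, len(probs)))
    seconds_per_window = window_size / sample_rate
    return [(s * seconds_per_window, e * seconds_per_window) for s, e in runs]


# Stand-in probabilities. With the real model they might come from
#   probs = [vad_model(w, 16000).item() for w in iter_windows(waveform[0])]
# which assumes a Silero-style (window, sample_rate) call signature
# and 16 kHz mono input.
example_probs = [0.1, 0.8, 0.9, 0.2, 0.1, 0.7]
segments = probs_to_segments(example_probs)
print(segments)
```

Keeping the windowing and the segment merging separate from the model call means the same helpers can be reused if the checkpoint turns out to expect a different window size or sample rate.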
Citation
If you use this model, please cite it accordingly.
@model{HumAwareVAD2025,
  author = {Sourabh Saini},
  title = {HumAware-VAD: Humming-Aware Voice Activity Detection},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CuriousMonkey7/HumAware-VAD}
}
Base model: freddyaboulton/silero-vad