AuthEcho_Project
This repository provides a Speaker and Gender Prediction System built with TensorFlow, Librosa, and Gradio. Given an audio file, the application predicts the top 3 most likely speakers with their probabilities, determines the speaker's gender, and flags unknown speakers using a confidence threshold.
Features
- Predicts the top 3 speakers from an audio file.
 - Determines the gender of the speaker.
 - Identifies unknown speakers with a confidence threshold.
 - Provides a Gradio interface for easy testing.
 
Getting Started
Prerequisites
To run this application, you need:
- Python: Version 3.8 or higher
- Required Python libraries: tensorflow, numpy, librosa, gradio, scikit-learn
 
Install the required libraries with:
pip install tensorflow numpy librosa gradio scikit-learn
Installation
- Clone the Repository:
 
git clone https://github.com/your-username/speaker-gender-prediction.git
cd speaker-gender-prediction
- Add Pre-Trained Models and Label Encoders:
 
Place the following files in the `models/` directory (see Project Structure below):
- `lstm_speaker_model.h5`: Pre-trained speaker recognition model.
- `lstm_gender_model.h5`: Pre-trained gender prediction model.
- `lstm_speaker_label.pkl`: Label encoder for speaker classes.
- `lstm_gender_label.pkl`: Label encoder for gender classes.
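As a rough illustration, the snippet below shows one way `app.py` might load these artifacts; the paths follow the Project Structure section, and the variable names are assumptions rather than the project's actual code:

```python
import pickle
import tensorflow as tf

# Paths follow the layout shown under "Project Structure".
speaker_model = tf.keras.models.load_model("models/lstm_speaker_model.h5")
gender_model = tf.keras.models.load_model("models/lstm_gender_model.h5")

# Label encoders map model output indices back to human-readable labels.
with open("models/lstm_speaker_label.pkl", "rb") as f:
    speaker_encoder = pickle.load(f)
with open("models/lstm_gender_label.pkl", "rb") as f:
    gender_encoder = pickle.load(f)
```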
Usage
Run the application using:
python app.py
Gradio Interface
The Gradio interface allows you to:
- Upload an audio file or record audio directly.
 - Predict the top 3 speakers and their probabilities.
 - Determine the gender of the speaker.
 - Detect and classify unknown speakers using confidence thresholds.
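A minimal sketch of how such an interface could be wired up is shown below; it assumes a recent Gradio 4.x API, and the prediction function is a placeholder rather than the project's actual implementation:

```python
import gradio as gr

def predict(audio_path):
    # Placeholder: the real app would run feature extraction and the LSTM
    # models here and return the formatted speaker/gender predictions.
    return f"Received audio file: {audio_path}"

demo = gr.Interface(
    fn=predict,
    inputs=gr.Audio(sources=["upload", "microphone"], type="filepath"),
    outputs="text",
    title="Speaker and Gender Prediction",
)

if __name__ == "__main__":
    demo.launch()
```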
 
Project Structure
.
├── app.py                           # Main application file
├── models/lstm_speaker_model.h5     # Pre-trained speaker model (to be added)
├── models/lstm_gender_model.h5      # Pre-trained gender model (to be added)
├── models/lstm_speaker_label.pkl    # Speaker label encoder (to be added)
├── models/lstm_gender_label.pkl     # Gender label encoder (to be added)
├── requirements.txt                 # Python dependencies
└── README.md                        # Project documentation
Example Output
Top 3 Predicted Speakers:
The top 3 predicted speakers are:
Speaker 1: 85.23%
Speaker 2: 10.12%
Speaker 3: 4.65%
The predicted gender is: Male
Unknown Speaker:
The top 3 predicted speakers are:
Unknown: 45.23%
The predicted gender is: Unknown
How It Works
Feature Extraction:
- Extracts MFCCs, chroma features, and spectral contrast from the input audio file using librosa.
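A sketch of what this feature extraction might look like is shown below; the exact parameters (such as `n_mfcc=40` and averaging over time) are assumptions, not necessarily what the pre-trained models were built with:

```python
import librosa
import numpy as np

def extract_features(audio_path):
    # Load the audio; sr=None keeps the file's native sampling rate.
    y, sr = librosa.load(audio_path, sr=None)

    # Frame-level features, averaged over time into fixed-length vectors.
    mfccs = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)
    contrast = np.mean(librosa.feature.spectral_contrast(y=y, sr=sr), axis=1)

    # Concatenate into a single feature vector for the models.
    return np.concatenate([mfccs, chroma, contrast])
```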
 Speaker and Gender Models:
- Speaker Model: A pre-trained LSTM model classifies the speaker based on extracted features.
 - Gender Model: A separate LSTM model determines the gender.
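Continuing the loading and `extract_features` sketches above, the two models might be applied roughly as follows; the input reshape assumes the LSTMs expect `(batch, timesteps, features)` input, which is an assumption about the saved models:

```python
# Build a single-sample batch from the extracted features.
features = extract_features("sample.wav").reshape(1, 1, -1)

speaker_probs = speaker_model.predict(features)[0]
gender_probs = gender_model.predict(features)[0]

# Report the top 3 speaker classes by probability.
top3 = speaker_probs.argsort()[-3:][::-1]
for idx in top3:
    name = speaker_encoder.inverse_transform([idx])[0]
    print(f"{name}: {speaker_probs[idx] * 100:.2f}%")

gender = gender_encoder.inverse_transform([gender_probs.argmax()])[0]
print(f"The predicted gender is: {gender}")
```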
 
Unknown Detection:
- If the highest confidence score for a speaker is below a defined threshold, the speaker is classified as "Unknown."
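For example, the unknown check could be as simple as the following, where the threshold value is a hypothetical number that would need tuning on validation data:

```python
CONFIDENCE_THRESHOLD = 0.60  # hypothetical value; tune on validation data

best_idx = speaker_probs.argmax()
if speaker_probs[best_idx] < CONFIDENCE_THRESHOLD:
    predicted_speaker = "Unknown"
else:
    predicted_speaker = speaker_encoder.inverse_transform([best_idx])[0]
```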
 
Roadmap
- Add support for real-time audio predictions.
 - Improve unknown speaker detection using open-set recognition techniques.
 - Expand the dataset for more robust gender classification.
 
Contributing
Contributions are welcome! To contribute:
- Fork the repository.
- Create a feature branch (`git checkout -b feature-branch-name`).
- Commit your changes (`git commit -m "Add new feature"`).
- Push to the branch (`git push origin feature-branch-name`).
- Open a Pull Request.
 
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
- TensorFlow: For building the deep learning models.
 - Librosa: For audio processing and feature extraction.
 - Gradio: For creating the user interface.