Project Submission
Files Overview
- model\merge.ipynb - Combines datasets into a single file.
- model\clean.ipynb - Cleans and preprocesses the data.
- app.py - Runs the main Streamlit application.
- model\biobert.ipynb - Implements BioBERT for feature extraction.
- model\biobert_embeddings.pt - Stores the generated BioBERT embeddings.
- data\filtered_combined.xlsx - Stores the filtered, combined dataset used for analysis.
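Before running anything, it can help to confirm that the files listed above are present in your checkout. The following is a minimal sketch, not part of the submission: the helper name `missing_files` is hypothetical, and it assumes the paths are relative to the repository root. Because the list uses Windows-style separators, the sketch converts them so the same check works on Linux and macOS.

```python
from pathlib import Path, PureWindowsPath

# Paths exactly as listed in the Files Overview (Windows-style separators).
EXPECTED_FILES = [
    r"model\merge.ipynb",
    r"model\clean.ipynb",
    r"app.py",
    r"model\biobert.ipynb",
    r"model\biobert_embeddings.pt",
    r"data\filtered_combined.xlsx",
]


def missing_files(root, listed=EXPECTED_FILES):
    """Return the listed paths (in forward-slash form) not found under root.

    Hypothetical helper for illustration; it is not part of the project.
    """
    root = Path(root)
    missing = []
    for name in listed:
        # PureWindowsPath splits on backslashes on every operating system,
        # so the list above does not need to be rewritten per platform.
        parts = PureWindowsPath(name).parts
        if not root.joinpath(*parts).is_file():
            missing.append("/".join(parts))
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for path in missing_files("."):
        print(f"missing: {path}")
```

If the script prints nothing, every listed file was found.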
How to Reproduce the Results
Step 1: Install Dependencies
Ensure you have Python installed. Run the following command to install required libraries:
pip install -r requirements.txt
Step 2: Run the Application
Use the following command to execute the main application:
streamlit run app.py
Application Screenshot
Step 3: Reproducing the Functionality
The solution uses the following libraries for key functionalities:
- NumPy and Pandas for data preprocessing and manipulation.
- scikit-learn for machine learning pipelines and evaluation.
- matplotlib for visualizing results.
- torch for deep learning model implementation and training.
- transformers for leveraging pre-trained models and tokenization.
- tqdm for progress bars on long-running loops.
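If a notebook or the app fails with an import error, a quick check of the environment can narrow things down. The sketch below is an illustration under stated assumptions rather than part of the submission: `missing_modules` is a hypothetical helper, and the list uses import names, which differ from the package name in one case (scikit-learn is imported as `sklearn`). `streamlit` is included because Step 2 depends on it.

```python
from importlib.util import find_spec

# Import names for the libraries listed above, plus streamlit (needed for
# "streamlit run app.py"). scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "sklearn",
    "matplotlib",
    "torch",
    "transformers",
    "tqdm",
    "streamlit",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment.

    Hypothetical helper for illustration; find_spec only looks the module
    up, it does not import it.
    """
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed libraries are importable.")
```

Anything reported as not installed should be resolved by rerunning the install command from Step 1 in the same Python environment.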
Packaging the Solution
The final submission includes:
- Codebase - All Python scripts and notebooks listed above.
- Detailed PPT - Explains the methodology, results, and conclusions.
- requirements.txt - Lists all dependencies for reproducibility.
