# Project Submission

## Files Overview

1. **model\merge.ipynb** - Combines the source datasets into a single file.
2. **model\clean.ipynb** - Cleans and preprocesses the data.
3. **app.py** - Runs the main (Streamlit) application.
4. **model\biobert.ipynb** - Implements BioBERT for feature extraction.
5. **model\biobert_embeddings.pt** - Stores the generated BioBERT embeddings for later processing.
6. **data\filtered_combined.xlsx** - Stores the filtered and combined dataset used for analysis.

## How to Reproduce the Results

### Step 1: Install Dependencies

Ensure you have Python installed. Run the following command to install the required libraries:

```bash
pip install -r requirements.txt
```

### Step 2: Run the Application

Use the following command to start the main application:

```bash
streamlit run app.py
```

### Application Screenshot

![Application Screenshot](image.jpg)

### Step 3: Reproducing the Functionality

The solution uses the following libraries for its key functionality:

- **NumPy and Pandas** for data preprocessing and manipulation.
- **scikit-learn** for machine learning pipelines and evaluation.
- **matplotlib** for visualizing results.
- **torch** for deep learning model implementation and training.
- **transformers** for pre-trained models and tokenization.
- **tqdm** for progress bars that monitor loops and long-running processes.

### Packaging the Solution

The final submission includes:

1. **Codebase** - All Python scripts and notebooks listed above.
2. **Detailed PPT** - Explains the methodology, results, and conclusions.
3. **requirements.txt** - Lists all dependencies for reproducibility.
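
Before working through the reproduction steps, it can help to confirm that the files named in the Files Overview are actually present. The following is a minimal sketch and not part of the submission itself: the `missing_files` helper and the `EXPECTED_FILES` list are hypothetical names, the paths are copied from the overview (written with forward slashes so they resolve on any platform), and the script assumes it is run from the project root.

```python
from pathlib import Path

# Relative paths copied from the Files Overview, plus requirements.txt.
# Assumption: the project root is the current working directory.
EXPECTED_FILES = [
    "model/merge.ipynb",
    "model/clean.ipynb",
    "model/biobert.ipynb",
    "model/biobert_embeddings.pt",
    "data/filtered_combined.xlsx",
    "app.py",
    "requirements.txt",
]


def missing_files(root="."):
    """Return the expected files that are not present under root."""
    root_path = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root_path / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files are present.")
```

If the script reports missing files, restore them (or rerun the notebook that produces them) before running `streamlit run app.py`.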