
Project Submission

Files Overview

  1. model\merge.ipynb - Combines the datasets into a single file.
  2. model\clean.ipynb - Cleans and preprocesses the data.
  3. app.py - Runs the main Streamlit application.
  4. model\biobert.ipynb - Implements BioBERT for feature extraction.
  5. model\biobert_embeddings.pt - Stores the generated BioBERT embeddings.
  6. data\filtered_combined.xlsx - Stores the filtered and combined dataset used for analysis.
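
For readers who want a feel for the feature-extraction step, the following is a minimal sketch of how sentence embeddings could be produced with BioBERT. It is not the code from model\biobert.ipynb. The checkpoint name `dmis-lab/biobert-base-cased-v1.1`, the mean-pooling strategy, and the helper names `batched` and `embed_texts` are all assumptions made for illustration.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_texts(
    texts: Sequence[str],
    model_name: str = "dmis-lab/biobert-base-cased-v1.1",  # assumed checkpoint
    batch_size: int = 16,
    max_length: int = 128,
):
    """Return one mean-pooled embedding per input text as a 2-D tensor."""
    if not texts:
        raise ValueError("texts must not be empty")

    # Heavy dependencies are imported here so that `batched` stays usable
    # in environments where torch and transformers are not installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout for deterministic features

    chunks = []
    with torch.no_grad():  # inference only, no gradients needed
        for batch in batched(texts, batch_size):
            enc = tokenizer(
                batch,
                padding=True,
                truncation=True,
                max_length=max_length,
                return_tensors="pt",
            )
            hidden = model(**enc).last_hidden_state  # (batch, tokens, hidden)
            # Mean pooling (an assumption): average token vectors,
            # ignoring padding positions via the attention mask.
            mask = enc["attention_mask"].unsqueeze(-1).to(hidden.dtype)
            summed = (hidden * mask).sum(dim=1)
            counts = mask.sum(dim=1).clamp(min=1.0)
            chunks.append(summed / counts)
    return torch.cat(chunks, dim=0)
```

A tensor returned by `embed_texts` could then be written to a `.pt` file with `torch.save`. Whether the notebook pools tokens this way or uses the `[CLS]` vector instead is not stated in this README.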

How to Reproduce the Results

Step 1: Install Dependencies

Ensure Python is installed, then run the following command to install the required libraries:

pip install -r requirements.txt

Step 2: Run the Application

Use the following command to execute the main application:

streamlit run app.py

Application Screenshot


Step 3: Reproducing the Functionality

The solution uses the following libraries for key functionalities:

  • NumPy and Pandas for data preprocessing and manipulation.
  • scikit-learn for machine learning pipelines and evaluation.
  • matplotlib for visualizing results.
  • torch for deep learning model implementation and training.
  • transformers for leveraging pre-trained models and tokenization.
  • tqdm for progress bars that monitor long-running loops.
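
Before launching the app, it can help to confirm that these libraries are importable in the active environment. The snippet below is a small sketch using only the standard library. The function name `missing_modules` and the `REQUIRED_MODULES` list are illustrative, and the list assumes the import names shown (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the libraries listed above, plus streamlit for app.py.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "sklearn",
    "matplotlib",
    "torch",
    "transformers",
    "tqdm",
    "streamlit",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported missing, re-running `pip install -r requirements.txt` in the same environment is the first thing to try.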

Packaging the Solution

The final submission includes:

  1. Codebase - All Python scripts and notebooks listed above.
  2. Detailed PPT - Explains the methodology, results, and conclusions.
  3. requirements.txt - Lists all dependencies for reproducibility.