metadata
title: Nest
emoji: πŸ‘
colorFrom: pink
colorTo: blue
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Project Submission

Files Overview

  1. model\merge.ipynb - Combines datasets into a single file.
  2. model\clean.ipynb - Cleans and preprocesses the data.
  3. app.py - Runs the main Streamlit application.
  4. model\biobert.ipynb - Implements BioBERT for feature extraction.
  5. model\biobert_embeddings.pt - Stores the precomputed BioBERT embeddings.
  6. data\filtered_combined.xlsx - Stores the filtered, combined dataset used for analysis.

How to Reproduce the Results

Step 1: Install Dependencies

Ensure you have Python installed. Run the following command to install required libraries:

pip install -r requirements.txt

Step 2: Run the Application

Use the following command to execute the main application:

streamlit run app.py

Application Screenshot

[Image: application screenshot]


t-SNE Plot

t-SNE (t-Distributed Stochastic Neighbor Embedding) is used to visualize high-dimensional embeddings in a lower-dimensional space, helping to identify clusters or patterns in the data.
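
The projection described above can be sketched as follows. This is a minimal illustration, not the code from the notebooks: it assumes scikit-learn's `TSNE` and uses random vectors in place of the real BioBERT embeddings. The 768-dimensional size, the perplexity value, and the variable names are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the real embeddings: 60 random 768-dimensional vectors.
# (768 is an assumed size; the true dimension depends on the checkpoint used.)
rng = np.random.default_rng(seed=0)
embeddings = rng.normal(size=(60, 768))

# Project to 2-D. perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=15, init="pca", random_state=0)
points_2d = tsne.fit_transform(embeddings)

print(points_2d.shape)
```

Each row of `points_2d` can then be drawn with matplotlib, for example `plt.scatter(points_2d[:, 0], points_2d[:, 1])`. Random input will not show meaningful clusters; real embeddings may.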

[Image: t-SNE plot]


Cosine Similarity Matrix

The cosine similarity matrix shows the pairwise similarity scores between clinical trial embeddings. Scores closer to 1 indicate more similar trials.
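
One way such a matrix could be computed is sketched below with NumPy. The function name `cosine_similarity_matrix` and the toy vectors are hypothetical, introduced only for illustration; the project's own notebooks may compute the matrix differently.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity of the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Three toy "trial" vectors: the first two point the same way,
# the third is orthogonal to both.
toy = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
sim = cosine_similarity_matrix(toy)
print(np.round(sim, 2))
```

In the toy example the first two rows have similarity 1 with each other because cosine similarity ignores vector length, while the third row has similarity 0 with both.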

[Image: cosine similarity matrix]

Step 3: Reproducing the Functionality

The solution uses the following libraries for key functionalities:

  • NumPy and Pandas for data preprocessing and manipulation.
  • scikit-learn for machine learning pipelines and evaluation.
  • matplotlib for visualizing results.
  • torch for deep learning model implementation and training.
  • transformers for leveraging pre-trained models and tokenization.
  • tqdm for progress bar implementation to monitor loops and processes.

Packaging the Solution

The final submission includes:

  1. Codebase - All Python scripts and notebooks listed above.
  2. Detailed PPT - Explains the methodology, results, and conclusions.
  3. requirements.txt - Lists all dependencies for reproducibility.