metadata
title: Nest
emoji: π
colorFrom: pink
colorTo: blue
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Project Submission
Files Overview
- model\merge.ipynb - Combines datasets into a single file.
- model\clean.ipynb - Cleans and preprocesses the data.
- app.py - Runs the main Streamlit application.
- model\biobert.ipynb - Implements BioBERT for feature extraction.
- model\biobert_embeddings.pt - Stores the embeddings generated with BioBERT.
- data\filtered_combined.xlsx - Stores the filtered and combined dataset used for analysis.
How to Reproduce the Results
Step 1: Install Dependencies
Ensure you have Python installed. Run the following command to install required libraries:
pip install -r requirements.txt
Step 2: Run the Application
Use the following command to execute the main application:
streamlit run app.py
Application Screenshot
t-SNE Plot
t-SNE (t-Distributed Stochastic Neighbor Embedding) is used to visualize high-dimensional embeddings in a lower-dimensional space, helping to identify clusters or patterns in the data.
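A minimal sketch of this projection, assuming scikit-learn's TSNE. The embeddings here are synthetic stand-ins (two random clusters of 768-dimensional vectors), not the contents of the project's embeddings file, and the perplexity and seed are illustrative choices rather than the values used in app.py.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for the trial embeddings (assumption: the real
# vectors come from BioBERT; these are random and only for illustration).
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(20, 768)),
    rng.normal(loc=5.0, scale=1.0, size=(20, 768)),
]).astype(np.float32)

# Perplexity must be smaller than the number of samples (40 here).
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
coords = tsne.fit_transform(embeddings)  # one 2-D point per embedding
```

The resulting `coords` array can be passed to a matplotlib scatter plot (`plt.scatter(coords[:, 0], coords[:, 1])`) to inspect clusters visually.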
Cosine Similarity Matrix
The cosine similarity matrix shows the similarity scores between different clinical trial embeddings, where higher scores indicate more similar trials.
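As a rough illustration of how such a matrix can be computed, the sketch below normalizes each embedding to unit length and takes pairwise dot products with NumPy. The function name `cosine_similarity_matrix` and the tiny 3-dimensional vectors are hypothetical and chosen only to make the result easy to check by hand; the application may compute the matrix differently.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity of the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Toy vectors standing in for clinical trial embeddings (illustrative only).
trial_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],   # same direction as row 0
    [0.0, 3.0, 0.0],   # orthogonal to row 0
    [1.0, 1.0, 0.0],   # 45 degrees from row 0
])
sim = cosine_similarity_matrix(trial_embeddings)
```

Each diagonal entry is 1 because every vector is identical to itself, and the matrix is symmetric, so only one triangle needs to be read.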
Step 3: Reproducing the Functionality
The solution uses the following libraries for key functionalities:
- NumPy and Pandas for data preprocessing and manipulation.
- scikit-learn for machine learning pipelines and evaluation.
- matplotlib for visualizing results.
- torch for deep learning model implementation and training.
- transformers for leveraging pre-trained models and tokenization.
- tqdm for progress bars that track long-running loops.
Packaging the Solution
The final submission includes:
- Codebase - All Python scripts and notebooks listed above.
- Detailed PPT - Explains the methodology, results, and conclusions.
- requirements.txt - Lists all dependencies for reproducibility.


