---
title: Nest
emoji: 👁
colorFrom: pink
colorTo: blue
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Project Submission

## Files Overview
1. `model/merge.ipynb` - Combines the source datasets into a single file.
2. `model/clean.ipynb` - Cleans and preprocesses the combined data.
3. `app.py` - Runs the main Streamlit application.
4. `model/biobert.ipynb` - Applies BioBERT to extract features (embeddings).
5. `model/biobert_embeddings.pt` - Stores the generated BioBERT embeddings.
6. `data/filtered_combined.xlsx` - Stores the combined, filtered dataset used for analysis.
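The merge and cleaning notebooks are not reproduced in this README. As a rough illustration of those two steps, the pandas sketch below stacks several tables, drops rows missing the text fields, collapses extra whitespace, and removes duplicates. The column names (`trial_id`, `summary`) and the specific cleaning rules are assumptions for illustration, not taken from the notebooks.

```python
import pandas as pd


def merge_and_clean(frames, text_cols):
    """Stack several DataFrames and apply basic text cleaning.

    Assumed rules: drop rows missing any of ``text_cols``, collapse runs of
    whitespace, and remove exact duplicates on ``text_cols``.
    """
    combined = pd.concat(frames, ignore_index=True)
    # .copy() so the column assignments below operate on an independent frame
    combined = combined.dropna(subset=text_cols).copy()
    for col in text_cols:
        combined[col] = (
            combined[col].astype(str).str.replace(r"\s+", " ", regex=True).str.strip()
        )
    return combined.drop_duplicates(subset=text_cols).reset_index(drop=True)


if __name__ == "__main__":
    # Toy tables with hypothetical columns; "T1" appears twice after cleaning
    a = pd.DataFrame({"trial_id": ["T1", "T2"], "summary": ["  Drug A   study ", None]})
    b = pd.DataFrame({"trial_id": ["T3", "T1"], "summary": ["Drug B trial", "Drug A study"]})
    print(merge_and_clean([a, b], ["trial_id", "summary"]))
```

Saving the result with `DataFrame.to_excel(path, index=False)` would give an Excel file comparable to the filtered dataset listed above; that call needs an Excel engine such as `openpyxl` installed.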

## How to Reproduce the Results

### Step 1: Install Dependencies
Ensure Python is installed, then run the following command to install the required libraries:
```bash
pip install -r requirements.txt
```
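Before launching the app, it can help to confirm that the libraries are actually available. The helper below is hypothetical and not part of the repository; its module list is assumed from the libraries this README names (Streamlit plus the Step 3 list, using import names such as `sklearn` for scikit-learn), not read from `requirements.txt`.

```python
import importlib.util

# Assumed import names for the libraries named in this README; the real
# requirements.txt may list more packages or pin specific versions.
REQUIRED_MODULES = [
    "streamlit", "numpy", "pandas", "sklearn",
    "matplotlib", "torch", "transformers", "tqdm",
]


def missing_modules(names):
    """Return the top-level module names that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

`find_spec` only locates a module without importing it, so a found module can still fail at import time (for example, because of a broken binary dependency).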

### Step 2: Run the Application
Use the following command to execute the main application:
```bash
streamlit run app.py
```

### Application Screenshot
![Application Screenshot](image.jpg)

---

### t-SNE Plot
t-SNE (t-Distributed Stochastic Neighbor Embedding) is used to visualize high-dimensional embeddings in a lower-dimensional space, helping to identify clusters or patterns in the data.

![t-SNE Plot](model/tsne_visualization.png)
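The code behind this plot is not shown here. One minimal way to produce such a projection, assuming the embeddings are a NumPy array with one row per trial, is scikit-learn's `TSNE`; the perplexity and seed below are illustrative choices, not the values used for the figure, and the random input only stands in for real embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_tsne(embeddings, perplexity=30.0, seed=0):
    """Map an (n_samples, n_features) array to 2-D coordinates with t-SNE."""
    n_samples = embeddings.shape[0]
    if perplexity >= n_samples:
        # scikit-learn requires perplexity to be smaller than the number of samples
        raise ValueError("perplexity must be less than the number of samples")
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(embeddings)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_embeddings = rng.normal(size=(40, 32))  # stand-in for real embedding vectors
    coords = project_tsne(fake_embeddings, perplexity=10.0)
    print(coords.shape)
```

The resulting coordinates can be drawn with `matplotlib.pyplot.scatter` and saved with `savefig`. t-SNE preserves local neighborhoods rather than global distances, so the spacing between clusters in the plot should be read loosely.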

---

### Cosine Similarity Matrix
The cosine similarity matrix shows the similarity scores between different clinical trial embeddings, where higher scores indicate more similar trials.

![Cosine Similarity Matrix](model/cosine_similarity.png)
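A matrix like this follows from the definition cos(a, b) = a·b / (‖a‖ ‖b‖): L2-normalize each embedding, then take all pairwise dot products. The NumPy sketch below assumes one row per trial; `most_similar` is a hypothetical helper showing how the matrix could rank the closest trials, not a function from `app.py`.

```python
import numpy as np


def cosine_similarity_matrix(embeddings, eps=1e-12):
    """Pairwise cosine similarity between the rows of an (n, d) array."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, eps)  # eps guards against zero vectors
    return unit @ unit.T


def most_similar(sim, index, k=5):
    """Indices of the k rows most similar to row `index`, best first."""
    scores = sim[index].copy()
    scores[index] = -np.inf  # never return the query trial itself
    order = np.argsort(-scores)[: min(k, len(scores) - 1)]
    return order.tolist()


if __name__ == "__main__":
    X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    S = cosine_similarity_matrix(X)
    print(np.round(S, 3))
    print(most_similar(S, 0, k=2))  # [2, 1]
```

`sklearn.metrics.pairwise.cosine_similarity` computes the same matrix if scikit-learn is already a dependency.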

### Step 3: Reproducing the Functionality
The solution uses the following libraries for its key functionality:
- NumPy and Pandas for data preprocessing and manipulation.
- scikit-learn for machine learning pipelines and evaluation.
- Matplotlib for visualizing results.
- PyTorch (`torch`) for deep learning model implementation and training.
- Hugging Face Transformers (`transformers`) for pre-trained models and tokenization.
- tqdm for progress bars that monitor long-running loops.
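This README does not say how token-level BioBERT outputs are reduced to one vector per trial. A common approach with `transformers` is attention-masked mean pooling over the model's `outputs.last_hidden_state`; taking the `[CLS]` token vector is another. The sketch below assumes mean pooling and uses NumPy so it runs without downloading a model; with PyTorch, the same arithmetic applies to the model output and the tokenizer's `attention_mask`.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, skipping padding tokens.

    token_embeddings: (batch, seq_len, hidden) array, e.g. a model's last hidden state.
    attention_mask:   (batch, seq_len) array of 1 for real tokens, 0 for padding.
    Returns a (batch, hidden) array with one vector per input text.
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


if __name__ == "__main__":
    tokens = np.array([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])  # last position is padding
    mask = np.array([[1, 1, 0]])
    print(mean_pool(tokens, mask))  # [[2. 2.]]
```

Masking matters because padded positions still receive hidden states; averaging them in would pull shorter texts toward the padding representation.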

### Packaging the Solution
The final submission includes:
1. Codebase - All notebooks and scripts listed above.
2. Detailed PPT - A presentation explaining the methodology, results, and conclusions.
3. `requirements.txt` - Lists all dependencies for reproducibility.