---
title: Nest
emoji: π
colorFrom: pink
colorTo: blue
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Project Submission

## Files Overview

1. **model\merge.ipynb** - Combines the datasets into a single file.
2. **model\clean.ipynb** - Cleans and preprocesses the data.
3. **app.py** - Runs the main Streamlit application.
4. **model\biobert.ipynb** - Implements BioBERT for feature extraction.
5. **model\biobert_embeddings.pt** - Stores the generated BioBERT embeddings for later processing.
6. **data\filtered_combined.xlsx** - Stores the filtered, combined dataset used for analysis.

## How to Reproduce the Results

### Step 1: Install Dependencies

Ensure that Python is installed, then run the following command to install the required libraries:

```bash
pip install -r requirements.txt
```

### Step 2: Run the Application

Use the following command to start the main application:

```bash
streamlit run app.py
```

### Application Screenshot

---

### t-SNE Plot

t-SNE (t-Distributed Stochastic Neighbor Embedding) is used to visualize the high-dimensional embeddings in a lower-dimensional space, which helps to identify clusters or patterns in the data.

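
The plotting code itself is not shown in this README, so the following is only a minimal sketch of how such a projection could be produced with scikit-learn's `TSNE`. The helper name `project_embeddings`, the synthetic `demo_embeddings` array, and the perplexity value are assumptions made for illustration; in the project, the input would be the stored embeddings. The two columns of `points_2d` can then be passed to a matplotlib scatter plot.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_embeddings(embeddings: np.ndarray,
                       perplexity: float = 5.0,
                       seed: int = 0) -> np.ndarray:
    """Project (n_samples, n_features) embeddings to 2-D with t-SNE."""
    # scikit-learn requires the perplexity to be below the sample count.
    if perplexity >= embeddings.shape[0]:
        raise ValueError("perplexity must be smaller than the number of samples")
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(embeddings)


# Synthetic stand-in for the real embeddings (assumption): two separated groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(20, 32))
group_b = rng.normal(loc=5.0, scale=0.1, size=(20, 32))
demo_embeddings = np.vstack([group_a, group_b])

points_2d = project_embeddings(demo_embeddings)  # one (x, y) row per embedding
```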

---

### Cosine Similarity Matrix

The cosine similarity matrix shows the pairwise similarity scores between clinical trial embeddings; higher scores indicate more similar trials.
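
As an illustration of how such a matrix can be computed, here is a minimal NumPy sketch. Each entry is the dot product of two embeddings divided by the product of their norms. The function name `cosine_similarity_matrix` and the toy 2-D vectors are assumptions for the example, not the project's actual code or data.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray,
                             eps: float = 1e-12) -> np.ndarray:
    """Return the (n, n) matrix of pairwise cosine similarities between rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, eps)  # guard against zero vectors
    return unit @ unit.T


# Toy embeddings (hypothetical): one row per trial.
demo = np.array([
    [1.0, 0.0],  # trial A
    [2.0, 0.0],  # trial B: same direction as A
    [0.0, 3.0],  # trial C: orthogonal to A and B
])
similarity = cosine_similarity_matrix(demo)
```

In this toy case, trials A and B score 1 against each other because they point in the same direction, while trial C scores 0 against both.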

### Step 3: Reproducing the Functionality

The solution relies on the following libraries:

- **NumPy and Pandas** for data preprocessing and manipulation.
- **scikit-learn** for machine learning pipelines and evaluation.
- **matplotlib** for visualizing results.
- **torch** for implementing and training the deep learning model.
- **transformers** for pre-trained models and tokenization.
- **tqdm** for progress bars that monitor loops and long-running processes.

### Packaging the Solution

The final submission includes:

1. **Codebase** - All Python scripts and notebooks listed above.
2. **Detailed PPT** - Explains the methodology, results, and conclusions.
3. **requirements.txt** - Lists all dependencies for reproducibility.