---
title: Nest
emoji: 👁
colorFrom: pink
colorTo: blue
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Project Submission

## Files Overview

1. **model/merge.ipynb** - Combines the datasets into a single file.
2. **model/clean.ipynb** - Cleans and preprocesses the data.
3. **app.py** - Runs the main Streamlit application.
4. **model/biobert.ipynb** - Implements BioBERT for feature extraction.
5. **model/biobert_embeddings.pt** - Stores the generated BioBERT embeddings.
6. **data/filtered_combined.xlsx** - Stores the filtered, combined dataset used for analysis.

## How to Reproduce the Results

### Step 1: Install Dependencies

Ensure you have Python installed. Run the following command to install the required libraries:

```bash
pip install -r requirements.txt
```

### Step 2: Run the Application

Use the following command to start the main application:

```bash
streamlit run app.py
```

### Application Screenshot

![Application Screenshot](image.jpg)

---

### t-SNE Plot

t-SNE (t-Distributed Stochastic Neighbor Embedding) projects the high-dimensional embeddings into a lower-dimensional space, which helps reveal clusters or patterns in the data.

![t-SNE Plot](model/tsne_visualization.png)

---

### Cosine Similarity Matrix

The cosine similarity matrix shows the pairwise similarity scores between clinical trial embeddings; higher scores indicate more similar trials.

![Cosine Similarity Matrix](model/cosine_similarity.png)

### Step 3: Reproducing the Functionality

The solution relies on the following libraries:

- **NumPy and Pandas** for data preprocessing and manipulation.
- **scikit-learn** for machine learning pipelines and evaluation.
- **matplotlib** for visualizing results.
- **torch** for deep learning model implementation and training.
- **transformers** for pre-trained models and tokenization.
- **tqdm** for progress bars that monitor loops and long-running processes.

### Packaging the Solution

The final submission includes:

1. **Codebase** - All Python scripts and notebooks listed above.
2. **Detailed PPT** - Explains the methodology, results, and conclusions.
3. **requirements.txt** - Lists all dependencies for reproducibility.
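
### Example: Computing a Cosine Similarity Matrix

To make the cosine similarity matrix described earlier more concrete, here is a minimal sketch of how pairwise cosine similarity could be computed with NumPy. It is an illustration under stated assumptions, not the project's actual code: the function name `cosine_similarity_matrix` and the toy vectors are hypothetical, and in practice the rows would be the BioBERT embeddings of the clinical trials (how those are stored and loaded is not specified in this README).

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between the rows of `embeddings`."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    # L2-normalize each row; clip the norms so zero vectors do not divide by zero.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    # The dot product of two unit vectors equals their cosine similarity.
    return unit @ unit.T


# Toy stand-in for trial embeddings (hypothetical values, not real data).
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],  # same direction as row 0
    [0.0, 3.0, 0.0],  # orthogonal to rows 0 and 1
])

similarity = cosine_similarity_matrix(toy_embeddings)
print(np.round(similarity, 3))
```

Each entry `similarity[i, j]` compares trial `i` with trial `j`. The diagonal is 1 because every non-zero vector is identical in direction to itself, and the matrix is symmetric, so a heatmap of it can be read from either triangle.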