{ "cells": [ { "cell_type": "markdown", "id": "be6478b2", "metadata": {}, "source": [ "\n", "# Data Loading and Preparation\n", "\n", "In this section, we will load and prepare the data from two sources: `eligibilities.txt` and `usecase_1_.csv`.\n", "\n", "## Steps:\n", "\n", "1. **Import the pandas library**:\n", "\n", "2. **Load the `eligibilities.txt` data**:\n", " - Use the `read_csv` method from pandas to load the data.\n", " - Specify the separator as `|`.\n", "\n", "\n", "3. **Select the necessary columns**:\n", " - We are interested in the `nct_id` and `criteria` columns.\n", "\n", "\n", "4. **Load the `usecase_1_.csv` data**:\n", " - Use the `read_csv` method from pandas to load the data.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "20cfb7ee-0fd8-4b37-bae1-5ab98125ad10", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The column 'criteria' has been added to usecase_1_.csv and saved as usecase_1_merged.csv.\n" ] } ], "source": [ "import pandas as pd\n", "\n", "# Load the eligibilities.txt data\n", "eligibilities = pd.read_csv('../eligibilities.txt', sep='|')\n", "\n", "# Select the necessary columns\n", "eligibilities = eligibilities[['nct_id', 'criteria']]\n", "\n", "# Load the usecase_1_.csv data\n", "usecase = pd.read_csv('../usecase_1_.csv')\n", "\n" ] }, { "cell_type": "markdown", "id": "e0c015c5", "metadata": { "vscode": { "languageId": "markdown" } }, "source": [ "# Data Merging and Saving\n", "\n", "In this section, we will merge the datasets and save the merged data to a new CSV file.\n", "\n", "## Steps:\n", "\n", "1. **Rename the column in `usecase`**:\n", " - Rename the column **'NCT Number'** to **'nct_id'** for merging.\n", "\n", "2. **Merge the datasets**:\n", " - Merge the `usecase` and `eligibilities` datasets on the **'nct_id'** column.\n", " - Use a left join to ensure all records from `usecase` are retained.\n", "\n", "3. **Save the merged data**:\n", " - Save the merged data to a new CSV file named **'usecase_1_merged.csv'**.\n", " - Do not include the index in the saved file.\n", "\n", "4. **Confirmation**:\n", " - Print a message to confirm that the column **'criteria'** has been added and the file has been saved." ] }, { "cell_type": "code", "execution_count": null, "id": "b5fe97ff-8301-4b2b-9a2c-8564f9912054", "metadata": {}, "outputs": [], "source": [ "# Rename 'NCT Number' in usecase to 'nct_id' for merging\n", "usecase.rename(columns={'NCT Number': 'nct_id'}, inplace=True)\n", "\n", "# Merge the datasets on 'nct_id'\n", "merged_data = usecase.merge(eligibilities, on='nct_id', how='left')\n", "\n", "# Save the merged data to a new CSV\n", "merged_data.to_csv('usecase_1_merged.csv', index=False)\n", "\n", "print(\"The column 'criteria' has been added to usecase_1_.csv and saved as usecase_1_merged.csv.\")\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }