{
"cells": [
{
"cell_type": "markdown",
"id": "be6478b2",
"metadata": {},
"source": [
"\n",
"# Data Loading and Preparation\n",
"\n",
"In this section, we load and prepare the data from two sources: `eligibilities.txt` and `usecase_1_.csv`.\n",
"\n",
"## Steps:\n",
"\n",
"1. **Import the pandas library** as `pd`.\n",
"\n",
"2. **Load the `eligibilities.txt` data**:\n",
"   - Use the pandas `read_csv` function to load the file.\n",
"   - Specify the separator as `|`, since the file is pipe-delimited.\n",
"\n",
"3. **Select the necessary columns**:\n",
"   - Keep only the `nct_id` and `criteria` columns.\n",
"\n",
"4. **Load the `usecase_1_.csv` data**:\n",
"   - Use the pandas `read_csv` function again; the default comma separator applies.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "20cfb7ee-0fd8-4b37-bae1-5ab98125ad10",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The column 'criteria' has been added to usecase_1_.csv and saved as usecase_1_merged.csv.\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"# Load the eligibilities.txt data\n",
"eligibilities = pd.read_csv('../eligibilities.txt', sep='|')\n",
"\n",
"# Select the necessary columns\n",
"eligibilities = eligibilities[['nct_id', 'criteria']]\n",
"\n",
"# Load the usecase_1_.csv data\n",
"usecase = pd.read_csv('../usecase_1_.csv')"
]
},
{
"cell_type": "markdown",
"id": "e0c015c5",
"metadata": {
"vscode": {
"languageId": "markdown"
}
},
"source": [
"# Data Merging and Saving\n",
"\n",
"In this section, we merge the two datasets and save the result to a new CSV file.\n",
"\n",
"## Steps:\n",
"\n",
"1. **Rename the column in `usecase`**:\n",
"   - Rename **'NCT Number'** to **'nct_id'** so that both datasets share the same key column.\n",
"\n",
"2. **Merge the datasets**:\n",
"   - Merge `usecase` and `eligibilities` on the **'nct_id'** column.\n",
"   - Use a left join so that every record from `usecase` is retained; records with no match in `eligibilities` get a missing `criteria` value.\n",
"\n",
"3. **Save the merged data**:\n",
"   - Save the merged data to a new CSV file named **'usecase_1_merged.csv'**.\n",
"   - Do not include the index in the saved file.\n",
"\n",
"4. **Confirmation**:\n",
"   - Print a message confirming that the **'criteria'** column has been added and the file has been saved."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5fe97ff-8301-4b2b-9a2c-8564f9912054",
"metadata": {},
"outputs": [],
"source": [
"# Rename 'NCT Number' in usecase to 'nct_id' so both frames share the merge key\n",
"usecase = usecase.rename(columns={'NCT Number': 'nct_id'})\n",
"\n",
"# Merge the datasets on 'nct_id'\n",
"merged_data = usecase.merge(eligibilities, on='nct_id', how='left')\n",
"\n",
"# Save the merged data to a new CSV\n",
"merged_data.to_csv('usecase_1_merged.csv', index=False)\n",
"\n",
"print(\"The column 'criteria' has been added to usecase_1_.csv and saved as usecase_1_merged.csv.\")\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}