Spaces:

yashgupta1512
/

nest

Sleeping

File size: 3,702 Bytes

abc3e66

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "be6478b2",
   "metadata": {},
   "source": [
    "\n",
    "# Data Loading and Preparation\n",
    "\n",
    "In this section, we will load and prepare the data from two sources: `eligibilities.txt` and `usecase_1_.csv`.\n",
    "\n",
    "## Steps:\n",
    "\n",
    "1. **Import the pandas library**:\n",
    "\n",
    "2. **Load the `eligibilities.txt` data**:\n",
    "    - Use the `read_csv` method from pandas to load the data.\n",
    "    - Specify the separator as `|`.\n",
    "\n",
    "\n",
    "3. **Select the necessary columns**:\n",
    "    - We are interested in the `nct_id` and `criteria` columns.\n",
    "\n",
    "\n",
    "4. **Load the `usecase_1_.csv` data**:\n",
    "    - Use the `read_csv` method from pandas to load the data.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "20cfb7ee-0fd8-4b37-bae1-5ab98125ad10",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The column 'criteria' has been added to usecase_1_.csv and saved as usecase_1_merged.csv.\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Load the eligibilities.txt data\n",
    "eligibilities = pd.read_csv('../eligibilities.txt', sep='|')\n",
    "\n",
    "# Select the necessary columns\n",
    "eligibilities = eligibilities[['nct_id', 'criteria']]\n",
    "\n",
    "# Load the usecase_1_.csv data\n",
    "usecase = pd.read_csv('../usecase_1_.csv')\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0c015c5",
   "metadata": {
    "vscode": {
     "languageId": "markdown"
    }
   },
   "source": [
    "# Data Merging and Saving\n",
    "\n",
    "In this section, we will merge the datasets and save the merged data to a new CSV file.\n",
    "\n",
    "## Steps:\n",
    "\n",
    "1. **Rename the column in `usecase`**:\n",
    "    - Rename the column **'NCT Number'** to **'nct_id'** for merging.\n",
    "\n",
    "2. **Merge the datasets**:\n",
    "    - Merge the `usecase` and `eligibilities` datasets on the **'nct_id'** column.\n",
    "    - Use a left join to ensure all records from `usecase` are retained.\n",
    "\n",
    "3. **Save the merged data**:\n",
    "    - Save the merged data to a new CSV file named **'usecase_1_merged.csv'**.\n",
    "    - Do not include the index in the saved file.\n",
    "\n",
    "4. **Confirmation**:\n",
    "    - Print a message to confirm that the column **'criteria'** has been added and the file has been saved."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5fe97ff-8301-4b2b-9a2c-8564f9912054",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename 'NCT Number' in usecase to 'nct_id' for merging\n",
    "usecase.rename(columns={'NCT Number': 'nct_id'}, inplace=True)\n",
    "\n",
    "# Merge the datasets on 'nct_id'\n",
    "merged_data = usecase.merge(eligibilities, on='nct_id', how='left')\n",
    "\n",
    "# Save the merged data to a new CSV\n",
    "merged_data.to_csv('usecase_1_merged.csv', index=False)\n",
    "\n",
    "print(\"The column 'criteria' has been added to usecase_1_.csv and saved as usecase_1_merged.csv.\")\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}