
# Data Loading and Preparation

This section loads the two source files, `eligibilities.txt` and `usecase_1_.csv`, and keeps only the columns needed for the merge that follows.

## Steps:

1. **Import the pandas library** under its conventional alias `pd`.

2. **Load the `eligibilities.txt` data**:
 - Use the pandas `read_csv` function to load the data.
 - Set the separator to `|` (`sep='|'`), since the file is pipe-delimited rather than comma-separated.


3. **Select the necessary columns**:
 - Keep only the `nct_id` and `criteria` columns: `nct_id` is the merge key and `criteria` is the text to attach.


4. **Load the `usecase_1_.csv` data**:
 - Use the pandas `read_csv` function with its default comma separator.
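
The loading steps above can be tried on a small in-memory sample before touching the real files. This is a minimal sketch, not the notebook's own code: the sample rows and the names `sample_text` and `sample_eligibilities` are made up for illustration, and `usecols` is shown as one possible alternative to selecting columns after loading.

```python
import io

import pandas as pd

# Hypothetical sample standing in for eligibilities.txt (pipe-delimited).
sample_text = (
    "id|nct_id|gender|criteria\n"
    "1|NCT00000001|All|Inclusion Criteria: adults aged 18 or older\n"
    "2|NCT00000002|Female|Exclusion Criteria: prior chemotherapy\n"
)

# sep="|" splits on the pipe character; usecols keeps only the two
# columns of interest while parsing instead of dropping the rest later.
sample_eligibilities = pd.read_csv(
    io.StringIO(sample_text), sep="|", usecols=["nct_id", "criteria"]
)

print(sample_eligibilities.columns.tolist())
print(len(sample_eligibilities))
```

On a large file, restricting columns at read time may reduce memory use, although selecting them afterwards gives the same resulting DataFrame.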


In [2]:
import pandas as pd

# Load the eligibilities.txt data
eligibilities = pd.read_csv('../eligibilities.txt', sep='|')

# Select the necessary columns
eligibilities = eligibilities[['nct_id', 'criteria']]

# Load the usecase_1_.csv data
usecase = pd.read_csv('../usecase_1_.csv')





# Data Merging and Saving

In this section, we will merge the datasets and save the merged data to a new CSV file.

## Steps:

1. **Rename the column in `usecase`**:
 - Rename the column **'NCT Number'** to **'nct_id'** so that both datasets share the same key name.

2. **Merge the datasets**:
 - Merge the `usecase` and `eligibilities` datasets on the **'nct_id'** column.
 - Use a left join so that all records from `usecase` are retained; records with no match in `eligibilities` get a missing value (`NaN`) in `criteria`.

3. **Save the merged data**:
 - Save the merged data to a new CSV file named **'usecase_1_merged.csv'**.
 - Pass `index=False` so that the DataFrame index is not written as an extra column.

4. **Confirmation**:
 - Print a message to confirm that the column **'criteria'** has been added and the file has been saved.
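
The rename and left join described above can be illustrated on toy data. This is a hedged sketch: the frames `toy_usecase` and `toy_eligibilities` and their contents are hypothetical, and `indicator=True` is an optional pandas argument added here only to make unmatched rows visible.

```python
import pandas as pd

# Hypothetical toy frames; the real data comes from the files loaded earlier.
toy_usecase = pd.DataFrame(
    {"NCT Number": ["NCT00000001", "NCT00000002", "NCT00000003"]}
)
toy_eligibilities = pd.DataFrame(
    {
        "nct_id": ["NCT00000001", "NCT00000002"],
        "criteria": ["Adults 18 or older", "No prior chemotherapy"],
    }
)

# Align the key name with the eligibilities frame.
toy_usecase = toy_usecase.rename(columns={"NCT Number": "nct_id"})

# indicator=True adds a `_merge` column marking each row as
# 'both' (matched) or 'left_only' (no match on the right side).
toy_merged = toy_usecase.merge(
    toy_eligibilities, on="nct_id", how="left", indicator=True
)

unmatched = toy_merged[toy_merged["_merge"] == "left_only"]
print(len(toy_merged))
print(unmatched["nct_id"].tolist())
```

Every row of `toy_usecase` survives the join, and the one identifier missing from `toy_eligibilities` ends up with `NaN` in `criteria`.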

In [None]:
# Rename 'NCT Number' in usecase to 'nct_id' for merging
usecase = usecase.rename(columns={'NCT Number': 'nct_id'})

# Merge the datasets on 'nct_id'
merged_data = usecase.merge(eligibilities, on='nct_id', how='left')

# Save the merged data to a new CSV
merged_data.to_csv('usecase_1_merged.csv', index=False)

print("The column 'criteria' has been added to usecase_1_.csv and saved as usecase_1_merged.csv.")
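
A left join keeps every row from the left side, but it can also add rows when the right side contains the same key more than once. Whether `eligibilities.txt` has repeated `nct_id` values is not stated here, so the following is only a sketch of how that could be checked, using hypothetical frames `left` and `right` and the `validate` argument of `DataFrame.merge`.

```python
import pandas as pd

# Hypothetical frames: the right side repeats one key.
left = pd.DataFrame({"nct_id": ["NCT00000001", "NCT00000002"]})
right = pd.DataFrame(
    {"nct_id": ["NCT00000001", "NCT00000001"], "criteria": ["A", "B"]}
)

# Without validation, the repeated key duplicates the matching left row.
plain = left.merge(right, on="nct_id", how="left")
print(len(left), len(plain))

# validate="many_to_one" raises MergeError if right-side keys are not unique.
try:
    left.merge(right, on="nct_id", how="left", validate="many_to_one")
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True

print(duplicate_keys_detected)
```

Comparing `len(usecase)` with `len(merged_data)` after the real merge is a simpler form of the same check.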
