# Implementation Plan: Intelligent Graph-Based SQL Federation Middleware (Revised)

This document outlines the revised strategy for implementing the target features by integrating valuable assets from the `semantic-query-router` codebase into our existing architecture.

### Overall Strategy

The core task is to evolve the current single-step agent into a multi-step, GraphRAG-powered orchestrator built on LangChain. We will enhance the MCP server with advanced core logic, replace PostgreSQL with a rich life sciences SQLite dataset, and transform the Streamlit monitor into a fully conversational chat UI. The `frontend/` Next.js application will be deprecated.

---

### Phase 1: Integrate New Dataset & Core Logic (Due by Friday, Oct 3rd)

**Goal**: Replace the existing data foundation with the life sciences dataset and upgrade the MCP server with advanced, reusable logic from the `semantic-query-router` project.

- **Task 1.1: Adopt Life Sciences Dataset**
  - Integrate the `generate_sample_databases.py` script into our `ops/scripts/` directory.
  - Add a `make seed-db` target to the `Makefile` that generates the `clinical_trials.db`, `laboratory.db`, and `drug_discovery.db` SQLite files.
  - Update `docker-compose.yml` to remove the PostgreSQL service and mount the new `data/` directory for the SQLite databases.
- **Task 1.2: Enhance MCP Server with Core Logic**
  - Create a new `mcp/core/` directory.
  - Migrate the advanced logic from `semantic-query-router/src/core/` (`discovery.py`, `graph.py`, `intelligence.py`) into `mcp/core/`.
  - Refactor these modules to fit our project structure and standards.
- **Task 1.3: Create a Dedicated Ingestion Process**
  - Create a new script, `ops/scripts/ingest.py`, that uses the new core logic to perform a one-time ingestion of the SQLite database schemas into Neo4j (a sketch follows Phase 2 below).
  - Add a `make ingest` target to the `Makefile` to run this script. Separating schema ingestion from the agent's runtime duties makes the system more modular.
  - Remove the schema discovery logic from `agent/main.py`.

---

### Phase 2: Rebuild Agent with LangChain (Due by Tuesday, Oct 7th)

**Goal**: Re-architect the agent from a simple script into a robust LangChain-powered orchestrator that leverages the enhanced MCP server.

- **Task 2.1: Refactor Agent to Use LangChain**
  - Overhaul `agent/main.py` to implement the `AgentExecutor` pattern from `langchain_integration.py`.
  - Define a formal agent prompt that instructs the LLM on how to use the available tools to answer questions.
- **Task 2.2: Implement Custom LangChain Tools**
  - Create a new `agent/tools.py` file.
  - Implement custom LangChain tools that make authenticated REST API calls to our enhanced MCP server.
  - The tools will include `SchemaSearchTool`, `JoinPathFinderTool`, and `QueryExecutorTool`. These tools act as clients to the core logic we integrated into the MCP server in Phase 1 (see the second sketch after this phase).
- **Task 2.3: Update Agent's Main Loop**
  - Modify the agent's main loop to delegate tasks to the LangChain `AgentExecutor` instead of handling instructions directly. The agent's primary role becomes orchestrating the LangChain agent and logging the results.
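Before moving to Phase 3, two hedged sketches of the pieces above. First, for Task 1.3, a minimal sketch of `ops/scripts/ingest.py`. The real script should call the migrated `mcp/core/` modules; here the schema walk and the graph model (the `Database`/`Table`/`Column` labels, the Cypher, and the connection settings) are illustrative assumptions, not the actual `graph.py` API.

```python
"""ops/scripts/ingest.py -- one-time schema ingestion into Neo4j (sketch)."""
import sqlite3
from pathlib import Path

from neo4j import GraphDatabase  # official Neo4j Python driver

DATA_DIR = Path("data")
DATABASES = ["clinical_trials.db", "laboratory.db", "drug_discovery.db"]
NEO4J_URI = "bolt://localhost:7687"  # assumed local dev endpoint


def read_schema(db_path: Path) -> dict[str, list[str]]:
    """Return {table_name: [column, ...]} for one SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        return {
            t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables
        }
    finally:
        conn.close()


def ingest(driver, db_name: str, schema: dict[str, list[str]]) -> None:
    """Upsert Database -> Table -> Column nodes; labels are illustrative."""
    with driver.session() as session:
        for table, columns in schema.items():
            session.run(
                "MERGE (d:Database {name: $db}) "
                "MERGE (t:Table {name: $table, database: $db}) "
                "MERGE (d)-[:HAS_TABLE]->(t) "
                "WITH t UNWIND $columns AS col "
                "MERGE (c:Column {name: col, table: $table, database: $db}) "
                "MERGE (t)-[:HAS_COLUMN]->(c)",
                db=db_name, table=table, columns=columns,
            )


if __name__ == "__main__":
    # Credentials are assumed dev defaults; read them from env in practice.
    driver = GraphDatabase.driver(NEO4J_URI, auth=("neo4j", "password"))
    for db_file in DATABASES:
        ingest(driver, db_file, read_schema(DATA_DIR / db_file))
    driver.close()
```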
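Second, for Task 2.2, a sketch of the custom tools in `agent/tools.py`, built on `langchain_core.tools.BaseTool`. The REST endpoints (`/schema/search`, `/schema/join-path`, `/query/execute`), the bearer-token auth, and the environment variable names are placeholders; the real routes depend on how the MCP server exposes the Phase 1 core logic.

```python
"""agent/tools.py -- LangChain tools that call the MCP REST API (sketch)."""
import os

import requests
from langchain_core.tools import BaseTool

MCP_URL = os.getenv("MCP_URL", "http://localhost:8000")  # assumed base URL
MCP_TOKEN = os.getenv("MCP_TOKEN", "")  # assumed bearer-token auth


def _post(path: str, payload: dict) -> str:
    """POST to the MCP server and return the response body as text."""
    resp = requests.post(
        f"{MCP_URL}{path}",
        json=payload,
        headers={"Authorization": f"Bearer {MCP_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text


class SchemaSearchTool(BaseTool):
    name: str = "schema_search"
    description: str = (
        "Search the Neo4j schema graph for tables and columns relevant "
        "to a natural-language question."
    )

    def _run(self, query: str) -> str:
        return _post("/schema/search", {"query": query})  # endpoint assumed


class JoinPathFinderTool(BaseTool):
    name: str = "join_path_finder"
    description: str = "Find a join path between tables, e.g. 'trials -> labs'."

    def _run(self, tables: str) -> str:
        return _post("/schema/join-path", {"tables": tables})  # endpoint assumed


class QueryExecutorTool(BaseTool):
    name: str = "query_executor"
    description: str = "Execute a federated SQL query against the SQLite databases."

    def _run(self, sql: str) -> str:
        return _post("/query/execute", {"sql": sql})  # endpoint assumed
```

Subclassing `BaseTool` keeps each tool's name and description next to its implementation, which is what the agent prompt defined in Task 2.1 will reference when deciding which tool to call.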
---

### Phase 3: Build the Chat UI & Finalize (Due by Thursday, Oct 9th)

**Goal**: Replace the basic Streamlit monitor with a full-featured conversational chat interface and complete the final integration for the demo.

- **Task 3.1: Implement Conversational Chat UI**
  - Replace the entire contents of `streamlit/app.py` with the conversational UI logic from `semantic-query-router/src/chat_app.py` (a sketch follows this phase).
  - Adapt the UI to submit questions and fetch results through our project's MCP REST API instead of WebSocket.
- **Task 3.2: Integrate Demo-Specific Features**
  - Ensure the new Streamlit UI includes the required demo features:
    - Display of execution phases (e.g., "Searching Schema," "Finding Join Path," "Executing Query").
    - A final results view that shows both the agent's natural-language summary and a clean data table (pandas DataFrame) of the raw results.
    - A "Download CSV" button for the results table.
    - A sidebar that displays the connection status of the Neo4j and SQLite databases.
- **Task 3.3: Final Integration and Testing**
  - Perform end-to-end testing of the full workflow, from asking a question in the Streamlit app through the agent's orchestration to the final result display.
  - Clean up any unused files and finalize the `README.md` with updated instructions.
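To make Tasks 3.1 and 3.2 concrete, a minimal sketch of the conversational loop in `streamlit/app.py` using Streamlit's chat primitives. The `/health` and `/ask` endpoints and the response shape (`summary`, `rows`) are assumptions about the MCP REST API, and the phase display here is static; the real UI should surface the agent's actual progress.

```python
"""streamlit/app.py -- conversational chat UI over the MCP REST API (sketch)."""
import pandas as pd
import requests
import streamlit as st

MCP_URL = "http://localhost:8000"  # assumed MCP base URL

st.title("SQL Federation Chat")

# Sidebar: connection status (the /health endpoint is an assumption).
with st.sidebar:
    st.header("Connections")
    try:
        health = requests.get(f"{MCP_URL}/health", timeout=5).json()
        st.success(f"Neo4j: {health.get('neo4j', 'unknown')}")
        st.success(f"SQLite: {health.get('sqlite', 'unknown')}")
    except requests.RequestException:
        st.error("MCP server unreachable")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay chat history on each Streamlit rerun.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if question := st.chat_input("Ask a question about the data..."):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.markdown(question)

    with st.chat_message("assistant"):
        # Show execution phases while the agent works (endpoint assumed).
        with st.status("Running agent...", expanded=True) as status:
            st.write("Searching Schema...")
            st.write("Finding Join Path...")
            st.write("Executing Query...")
            resp = requests.post(
                f"{MCP_URL}/ask", json={"question": question}, timeout=120
            ).json()
            status.update(label="Done", state="complete")

        summary = resp.get("summary", "")
        st.markdown(summary)

        # Results table plus CSV export of the raw rows.
        df = pd.DataFrame(resp.get("rows", []))
        st.dataframe(df)
        st.download_button(
            "Download CSV",
            df.to_csv(index=False),
            file_name="results.csv",
            mime="text/csv",
        )
    st.session_state.messages.append({"role": "assistant", "content": summary})
```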