# Implementation Plan: Intelligent Graph-Based SQL Federation Middleware (Revised)

This document outlines the revised strategy for implementing the target features by integrating valuable assets from the `semantic-query-router` codebase into our existing architecture.

### Overall Strategy

The core task is to evolve the current single-step agent into a multi-step, GraphRAG-powered orchestrator built on LangChain. We will enhance the MCP server with advanced core logic, replace PostgreSQL with a rich life sciences SQLite dataset, and transform the Streamlit monitor into a fully conversational chat UI. The `frontend/` Next.js application will be deprecated.

---

### Phase 1: Integrate New Dataset & Core Logic (Due by Friday, Oct 3rd)

**Goal**: Replace the existing data foundation with the life sciences dataset and upgrade the MCP server with advanced, reusable logic from the `semantic-query-router` project.

- **Task 1.1: Adopt Life Sciences Dataset**
  - Integrate the `generate_sample_databases.py` script into our `ops/scripts/` directory.
  - Add a `make seed-db` target to the `Makefile` that generates the `clinical_trials.db`, `laboratory.db`, and `drug_discovery.db` SQLite files.
  - Update `docker-compose.yml` to remove the PostgreSQL service and mount the new `data/` directory for the SQLite databases.
- **Task 1.2: Enhance MCP Server with Core Logic**
  - Create a new `mcp/core/` directory.
  - Migrate the advanced logic from `semantic-query-router/src/core/` (`discovery.py`, `graph.py`, `intelligence.py`) into `mcp/core/`.
  - Refactor these modules to fit our project structure and standards.
- **Task 1.3: Create a Dedicated Ingestion Process**
  - Create a new script, `ops/scripts/ingest.py`, that uses the new core logic to perform a one-time ingestion of the SQLite database schemas into Neo4j (a sketch follows Phase 2 below).
  - Add a `make ingest` target to the `Makefile` to run this script. Separating schema ingestion from the agent's runtime duties makes the system more modular.
  - Remove the schema discovery logic from `agent/main.py`.

---

### Phase 2: Rebuild Agent with LangChain (Due by Tuesday, Oct 7th)

**Goal**: Re-architect the agent from a simple script into a robust LangChain-powered orchestrator that leverages the enhanced MCP server.

- **Task 2.1: Refactor Agent to Use LangChain**
  - Overhaul `agent/main.py` to implement the `AgentExecutor` pattern from `langchain_integration.py`.
  - Define a formal agent prompt that instructs the LLM on how to use the available tools to answer questions.
- **Task 2.2: Implement Custom LangChain Tools**
  - Create a new `agent/tools.py` file.
  - Implement custom LangChain tools that make authenticated REST API calls to our enhanced MCP server.
  - The tools will include `SchemaSearchTool`, `JoinPathFinderTool`, and `QueryExecutorTool`. These tools act as clients to the core logic we integrated into the MCP server in Phase 1 (see the second sketch after this phase).
- **Task 2.3: Update Agent's Main Loop**
  - Modify the agent's main loop to delegate tasks to the LangChain `AgentExecutor` instead of handling instructions directly. The agent's primary role becomes orchestrating the LangChain agent and logging the results.
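Before moving to Phase 3, two hedged sketches of the pieces above. First, for Task 1.3, a minimal sketch of `ops/scripts/ingest.py`. The real script should call the migrated `mcp/core/` modules; here the schema walk and the graph model (the `Database`/`Table`/`Column` labels, the Cypher, and the connection settings) are illustrative assumptions, not the actual `graph.py` API.

```python
"""ops/scripts/ingest.py -- one-time schema ingestion into Neo4j (sketch)."""
import sqlite3
from pathlib import Path

from neo4j import GraphDatabase  # official Neo4j Python driver

DATA_DIR = Path("data")
DATABASES = ["clinical_trials.db", "laboratory.db", "drug_discovery.db"]
NEO4J_URI = "bolt://localhost:7687"  # assumed local dev endpoint


def read_schema(db_path: Path) -> dict[str, list[str]]:
    """Return {table_name: [column, ...]} for one SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        return {
            t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables
        }
    finally:
        conn.close()


def ingest(driver, db_name: str, schema: dict[str, list[str]]) -> None:
    """Upsert Database -> Table -> Column nodes; labels are illustrative."""
    with driver.session() as session:
        for table, columns in schema.items():
            session.run(
                "MERGE (d:Database {name: $db}) "
                "MERGE (t:Table {name: $table, database: $db}) "
                "MERGE (d)-[:HAS_TABLE]->(t) "
                "WITH t UNWIND $columns AS col "
                "MERGE (c:Column {name: col, table: $table, database: $db}) "
                "MERGE (t)-[:HAS_COLUMN]->(c)",
                db=db_name, table=table, columns=columns,
            )


if __name__ == "__main__":
    # Credentials are assumed dev defaults; read them from env in practice.
    driver = GraphDatabase.driver(NEO4J_URI, auth=("neo4j", "password"))
    for db_file in DATABASES:
        ingest(driver, db_file, read_schema(DATA_DIR / db_file))
    driver.close()
```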
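Second, for Task 2.2, a sketch of the custom tools in `agent/tools.py`, built on `langchain_core.tools.BaseTool`. The REST endpoints (`/schema/search`, `/schema/join-path`, `/query/execute`), the bearer-token auth, and the environment variable names are placeholders; the real routes depend on how the MCP server exposes the Phase 1 core logic.

```python
"""agent/tools.py -- LangChain tools that call the MCP REST API (sketch)."""
import os

import requests
from langchain_core.tools import BaseTool

MCP_URL = os.getenv("MCP_URL", "http://localhost:8000")  # assumed base URL
MCP_TOKEN = os.getenv("MCP_TOKEN", "")  # assumed bearer-token auth


def _post(path: str, payload: dict) -> str:
    """POST to the MCP server and return the response body as text."""
    resp = requests.post(
        f"{MCP_URL}{path}",
        json=payload,
        headers={"Authorization": f"Bearer {MCP_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text


class SchemaSearchTool(BaseTool):
    name: str = "schema_search"
    description: str = (
        "Search the Neo4j schema graph for tables and columns relevant "
        "to a natural-language question."
    )

    def _run(self, query: str) -> str:
        return _post("/schema/search", {"query": query})  # endpoint assumed


class JoinPathFinderTool(BaseTool):
    name: str = "join_path_finder"
    description: str = "Find a join path between tables, e.g. 'trials -> labs'."

    def _run(self, tables: str) -> str:
        return _post("/schema/join-path", {"tables": tables})  # endpoint assumed


class QueryExecutorTool(BaseTool):
    name: str = "query_executor"
    description: str = "Execute a federated SQL query against the SQLite databases."

    def _run(self, sql: str) -> str:
        return _post("/query/execute", {"sql": sql})  # endpoint assumed
```

Subclassing `BaseTool` keeps each tool's name and description next to its implementation, which is what the agent prompt defined in Task 2.1 will reference when deciding which tool to call.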
---

### Phase 3: Build the Chat UI & Finalize (Due by Thursday, Oct 9th)

**Goal**: Replace the basic Streamlit monitor with a full-featured conversational chat interface and complete the final integration for the demo.

- **Task 3.1: Implement Conversational Chat UI**
  - Replace the entire contents of `streamlit/app.py` with the conversational UI logic from `semantic-query-router/src/chat_app.py` (a sketch follows this phase).
  - Adapt the UI to submit questions and fetch results through our project's MCP REST API instead of WebSocket.
- **Task 3.2: Integrate Demo-Specific Features**
  - Ensure the new Streamlit UI includes the required demo features:
    - Display of execution phases (e.g., "Searching Schema," "Finding Join Path," "Executing Query").
    - A final results view that shows both the agent's natural-language summary and a clean data table (pandas DataFrame) of the raw results.
    - A "Download CSV" button for the results table.
    - A sidebar that displays the connection status of the Neo4j and SQLite databases.
- **Task 3.3: Final Integration and Testing**
  - Perform end-to-end testing of the full workflow, from asking a question in the Streamlit app through the agent's orchestration to the final result display.
  - Clean up any unused files and finalize the `README.md` with updated instructions.
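To make Tasks 3.1 and 3.2 concrete, a minimal sketch of the conversational loop in `streamlit/app.py` using Streamlit's chat primitives. The `/health` and `/ask` endpoints and the response shape (`summary`, `rows`) are assumptions about the MCP REST API, and the phase display here is static; the real UI should surface the agent's actual progress.

```python
"""streamlit/app.py -- conversational chat UI over the MCP REST API (sketch)."""
import pandas as pd
import requests
import streamlit as st

MCP_URL = "http://localhost:8000"  # assumed MCP base URL

st.title("SQL Federation Chat")

# Sidebar: connection status (the /health endpoint is an assumption).
with st.sidebar:
    st.header("Connections")
    try:
        health = requests.get(f"{MCP_URL}/health", timeout=5).json()
        st.success(f"Neo4j: {health.get('neo4j', 'unknown')}")
        st.success(f"SQLite: {health.get('sqlite', 'unknown')}")
    except requests.RequestException:
        st.error("MCP server unreachable")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay chat history on each Streamlit rerun.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if question := st.chat_input("Ask a question about the data..."):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.markdown(question)

    with st.chat_message("assistant"):
        # Show execution phases while the agent works (endpoint assumed).
        with st.status("Running agent...", expanded=True) as status:
            st.write("Searching Schema...")
            st.write("Finding Join Path...")
            st.write("Executing Query...")
            resp = requests.post(
                f"{MCP_URL}/ask", json={"question": question}, timeout=120
            ).json()
            status.update(label="Done", state="complete")

        summary = resp.get("summary", "")
        st.markdown(summary)

        # Results table plus CSV export of the raw rows.
        df = pd.DataFrame(resp.get("rows", []))
        st.dataframe(df)
        st.download_button(
            "Download CSV",
            df.to_csv(index=False),
            file_name="results.csv",
            mime="text/csv",
        )
    st.session_state.messages.append({"role": "assistant", "content": summary})
```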