--- title: Agent MCP SQL emoji: 🧠 sdk: streamlit app_file: space_app.py python_version: 3.11 pinned: false --- # GraphRAG Agentic System ## Overview This project implements an intelligent, multi-step GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI. ## Key Features πŸ€– **LangChain Agent**: Orchestrates tools for schema discovery, pathfinding, and query execution. πŸ•ΈοΈ **GraphRAG Enabled**: Uses a Neo4j knowledge graph of database schemas for intelligent query planning. πŸ”¬ **Life Sciences Dataset**: Comes with a rich dataset across clinical trials, drug discovery, and lab results. conversational **Conversational UI**: A Streamlit-based chat interface for interacting with the agent. πŸ”Œ **RESTful MCP Server**: All core logic is exposed via a secure and scalable FastAPI server. ## Architecture ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Streamlit Chat │──────│ Agent β”‚ β”‚ MCP Server β”‚ β”‚ (UI) β”‚ β”‚ (LangChain) β”‚ β”‚ (FastAPI) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Neo4j β”‚ β”‚ clinical_ β”‚ β”‚ laboratory β”‚ β”‚ (Schema KG) β”‚ β”‚ trials.db β”‚ β”‚ .db β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ drug_ β”‚ β”‚ discovery.dbβ”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` ### Components - **Streamlit**: Provides a conversational chat interface for users to ask questions. - **Agent**: A LangChain-powered orchestrator that uses custom tools to query the MCP server. - **MCP Server**: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution. - **Neo4j**: Stores a knowledge graph of the schemas of all connected SQLite databases. - **SQLite Databases**: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources. ## Quick Start ### Prerequisites - Docker & Docker Compose - LLM API key (e.g., for OpenAI) ### Setup 1. **Clone and configure**: ```bash git clone cd touch .env ``` 2. **Add your LLM API key** to the `.env` file. ``` LLM_API_KEY="sk-your-llm-api-key-here" ``` 3. **Start the system**: ```bash make up ``` 4. **Seed the databases and ingest schema**: ```bash make seed-db make ingest ``` 5. **Open the interface**: - Streamlit UI: http://localhost:8501 - Neo4j Browser: http://localhost:7474 (neo4j/password) ## Usage Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example: - "What are the names of the trials and their primary purpose for studies on 'Cancer'?" - "Find all drugs with 'Aspirin' in their name." - "Show me lab results for patient '123'." The agent will then: 1. Use the `SchemaSearchTool` to find relevant tables. 2. Use the `JoinPathFinderTool` to determine how to join them. 3. Construct a SQL query. 4. Execute the query using the `QueryExecutorTool`. 5. Return the final answer to the UI. ### Deploying a Hugging Face Space (Streamlit front-end only) This repo includes a self-contained Streamlit app for Hugging Face Spaces: `space_app.py`. It connects to your externally reachable Agent and MCP services. 1) Expose your services (public host or tunnel) - Agent FastAPI endpoint: `https:///query` - MCP FastAPI base: `https:///mcp` 2) In a new HF Space (Streamlit), add these files: - `space_app.py` (entrypoint) - `requirements.txt` with: ``` streamlit==1.28.0 requests==2.31.0 pandas==2.1.0 ``` 3) In Space Settings β†’ Variables and secrets, set: - `AGENT_URL` (e.g., `https://your-agent-host/query`) - `MCP_URL` (e.g., `https://your-mcp-host/mcp`) - `MCP_API_KEY` (the MCP auth key) - (Optional) `AGENT_HEALTH_URL`, `NEO4J_URL` 4) Configure the Space to run `space_app.py` as the Streamlit app file. Once the Space starts, it will display the same chat UI and stream responses from your hosted Agent. ## Development ### Running the Agent Manually To test the agent's logic directly without the full Docker stack, you can run it from your terminal. 1. **Set up the environment**: Make sure the MCP and Neo4j services are running (`make up`). Create a Python virtual environment and install dependencies: ```bash python -m venv venv source venv/bin/activate pip install -r agent/requirements.txt ``` 2. **Set your API key**: ```bash export LLM_API_KEY="sk-your-llm-api-key-here" ``` 3. **Run the agent**: ```bash python agent/main.py ``` The agent will run with the hardcoded example question and print the execution trace and final answer to your console. ### File Structure ``` β”œβ”€β”€ agent/ # The LangChain agent and its tools β”œβ”€β”€ streamlit/ # The Streamlit conversational UI β”œβ”€β”€ mcp/ # FastAPI server with core logic β”œβ”€β”€ neo4j/ # Neo4j configuration and data β”œβ”€β”€ data/ # SQLite databases β”œβ”€β”€ ops/ # Operational scripts (seeding, ingestion, etc.) β”œβ”€β”€ docker-compose.yml β”œβ”€β”€ Makefile └── README.md ``` testing, what do you see from the mcp and db?