---
title: Agent MCP SQL
emoji: 🧠
sdk: streamlit
app_file: space_app.py
python_version: 3.11
pinned: false
---

# GraphRAG Agentic System

## Overview
This project implements a multi-step, GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent uses a Neo4j graph database to map the relationships between separate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI.

## Key Features

🤖 **LangChain Agent**: Orchestrates tools for schema discovery, pathfinding, and query execution.  
🕸️ **GraphRAG Enabled**: Uses a Neo4j knowledge graph of database schemas for intelligent query planning.  
🔬 **Life Sciences Dataset**: Comes with a rich dataset across clinical trials, drug discovery, and lab results.  
💬 **Conversational UI**: A Streamlit-based chat interface for interacting with the agent.  
🔌 **RESTful MCP Server**: Core logic is exposed through a FastAPI server, with access controlled by an API key.

## Architecture

```
┌─────────────────┐      ┌───────────────┐      ┌─────────────────┐
│ Streamlit Chat  │──────│  Agent        │──────│   MCP Server    │
│      (UI)       │      │ (LangChain)   │      │    (FastAPI)    │
└─────────────────┘      └───────────────┘      └─────────────────┘
                                                       │
                               ┌───────────────────────┼───────────────────────┐
                               │                       │                       │
                         ┌─────────────┐         ┌─────────────┐         ┌─────────────┐
                         │   Neo4j     │         │ clinical_   │         │ laboratory  │
                         │ (Schema KG) │         │ trials.db   │         │ .db         │
                         └─────────────┘         └─────────────┘         └─────────────┘
                                                       │
                                                 ┌─────────────┐
                                                 │ drug_       │
                                                 │ discovery.db│
                                                 └─────────────┘

```

### Components

- **Streamlit**: Provides a conversational chat interface for users to ask questions.
- **Agent**: A LangChain-powered orchestrator that uses custom tools to query the MCP server.
- **MCP Server**: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution.
- **Neo4j**: Stores a knowledge graph of the schemas of all connected SQLite databases.
- **SQLite Databases**: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources.
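
As a rough illustration of what the schema knowledge graph might contain, the sketch below reads table and foreign-key metadata from a SQLite database with `PRAGMA` statements and turns it into node and edge records. The demo tables (`trials`, `patients`), the `REFERENCES` edge label, and the `extract_schema_graph` helper are assumptions made for this example; the project's actual ingestion step (`make ingest`) and Neo4j model may differ.

```python
import sqlite3


def extract_schema_graph(conn: sqlite3.Connection, db_name: str) -> dict:
    """Return table nodes and foreign-key edges for one SQLite database."""
    nodes, edges = [], []
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        nodes.append({"database": db_name, "table": table, "columns": columns})
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.append(
                {
                    "type": "REFERENCES",  # hypothetical edge label
                    "from": f"{table}.{fk[3]}",
                    "to": f"{fk[2]}.{fk[4]}",
                }
            )
    return {"nodes": nodes, "edges": edges}


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with one foreign key (invented schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE trials (trial_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE patients (
            patient_id INTEGER PRIMARY KEY,
            trial_id INTEGER REFERENCES trials(trial_id)
        );
        """
    )
    return conn


graph = extract_schema_graph(build_demo_db(), "clinical_trials.db")
for edge in graph["edges"]:
    print(edge["from"], "->", edge["to"])
```

Records shaped like these could then be written to Neo4j as table nodes and relationship edges, which is what lets the agent look up join paths later.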

## Quick Start

### Prerequisites
- Docker & Docker Compose
- LLM API key (e.g., for OpenAI)

### Setup
1. **Clone and configure**:
   ```bash
   git clone <repository-url>
   cd <repository-name>
   touch .env
   ```

2. **Add your LLM API key** to the `.env` file:
   ```
   LLM_API_KEY="sk-your-llm-api-key-here"
   ```

3. **Start the system**:
   ```bash
   make up
   ```

4. **Seed the databases and ingest schema**:
   ```bash
   make seed-db
   make ingest
   ```

5. **Open the interface**:
   - Streamlit UI: http://localhost:8501
   - Neo4j Browser: http://localhost:7474 (neo4j/password)

## Usage
Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example:
- "What are the names of the trials and their primary purpose for studies on 'Cancer'?"
- "Find all drugs with 'Aspirin' in their name."
- "Show me lab results for patient '123'."

The agent will then:
1. Use the `SchemaSearchTool` to find relevant tables.
2. Use the `JoinPathFinderTool` to determine how to join them.
3. Construct a SQL query.
4. Execute the query using the `QueryExecutorTool`.
5. Return the final answer to the UI.
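
The steps above can be approximated in a few lines. This is a toy sketch, not the project's implementation: plain functions stand in for `SchemaSearchTool`, `JoinPathFinderTool`, and `QueryExecutorTool`, an in-memory dict replaces the Neo4j graph, and the tables (`trials`, `patients`, `lab_results`) and their columns are invented for illustration. In the real system these calls go through the MCP server and the SQL is written by the LLM.

```python
import sqlite3
from collections import deque

# Hypothetical schema graph: table -> {neighbor table: join condition}.
SCHEMA_GRAPH = {
    "trials": {"patients": "trials.trial_id = patients.trial_id"},
    "patients": {
        "trials": "trials.trial_id = patients.trial_id",
        "lab_results": "patients.patient_id = lab_results.patient_id",
    },
    "lab_results": {"patients": "patients.patient_id = lab_results.patient_id"},
}


def search_schema(keyword: str) -> list[str]:
    """Step 1 stand-in: find tables whose name contains the keyword."""
    return [table for table in SCHEMA_GRAPH if keyword.lower() in table]


def find_join_path(start: str, goal: str) -> list[str]:
    """Step 2 stand-in: breadth-first search for the shortest table path."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in SCHEMA_GRAPH[path[-1]]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    raise ValueError(f"no join path from {start} to {goal}")


def build_sql(path: list[str], columns: str, where: str) -> str:
    """Step 3 stand-in: chain JOIN clauses along the path."""
    sql = f"SELECT {columns} FROM {path[0]}"
    for left, right in zip(path, path[1:]):
        sql += f" JOIN {right} ON {SCHEMA_GRAPH[left][right]}"
    return f"{sql} WHERE {where}"


def execute(sql: str, params: tuple) -> list[tuple]:
    """Step 4 stand-in: run the query against a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE trials (trial_id INTEGER, name TEXT);
        CREATE TABLE patients (patient_id INTEGER, trial_id INTEGER);
        CREATE TABLE lab_results (patient_id INTEGER, test TEXT, value REAL);
        INSERT INTO trials VALUES (1, 'Trial A');
        INSERT INTO patients VALUES (123, 1);
        INSERT INTO lab_results VALUES (123, 'glucose', 5.4);
        """
    )
    return conn.execute(sql, params).fetchall()


path = find_join_path(search_schema("trial")[0], search_schema("lab")[0])
sql = build_sql(
    path,
    "trials.name, lab_results.test, lab_results.value",
    "patients.patient_id = ?",
)
rows = execute(sql, (123,))  # Step 5 would hand these rows back to the UI.
print(path)
print(rows)
```

Passing the patient ID as a bound parameter (`?`) rather than interpolating it into the SQL string is a deliberate choice in this sketch, since the query text originates from user input.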

### Deploying a Hugging Face Space (Streamlit front-end only)

This repo includes a self-contained Streamlit app for Hugging Face Spaces: `space_app.py`.
It connects to your externally reachable Agent and MCP services.

1. **Expose your services** (public host or tunnel):
   - Agent FastAPI endpoint: `https://<your-host>/query`
   - MCP FastAPI base: `https://<your-host>/mcp`

2. **Add these files** to a new HF Space (Streamlit):
   - `space_app.py` (entrypoint)
   - `requirements.txt` with:
     ```
     streamlit==1.28.0
     requests==2.31.0
     pandas==2.1.0
     ```

3. **Set these values** in Space Settings → Variables and secrets:
   - `AGENT_URL` (e.g., `https://your-agent-host/query`)
   - `MCP_URL` (e.g., `https://your-mcp-host/mcp`)
   - `MCP_API_KEY` (the MCP auth key)
   - (Optional) `AGENT_HEALTH_URL`, `NEO4J_URL`

4. **Configure the Space** to run `space_app.py` as the Streamlit app file.

Once the Space starts, it will display the same chat UI and stream responses from your hosted Agent.
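
A minimal sketch of how a front end such as `space_app.py` might read these settings and prepare a request to the Agent. The JSON payload shape (`{"question": ...}`), the `X-API-Key` header name, and the helper names are assumptions for illustration; check the actual `space_app.py` and service code for the real contract. The example only builds the request and does not send it.

```python
import json
import os
import urllib.request

REQUIRED = ("AGENT_URL", "MCP_URL", "MCP_API_KEY")
OPTIONAL = ("AGENT_HEALTH_URL", "NEO4J_URL")


def load_settings(env=os.environ) -> dict:
    """Collect the Space variables, failing early if a required one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing Space variables: {', '.join(missing)}")
    settings = {name: env[name] for name in REQUIRED}
    settings.update({name: env[name] for name in OPTIONAL if env.get(name)})
    return settings


def build_agent_request(settings: dict, question: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Agent endpoint."""
    body = json.dumps({"question": question}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        settings["AGENT_URL"],
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def mcp_headers(settings: dict) -> dict:
    """Auth headers for MCP calls; the header name is an assumption."""
    return {"X-API-Key": settings["MCP_API_KEY"]}


# Placeholder values mirroring the examples in the list above.
demo_env = {
    "AGENT_URL": "https://your-agent-host/query",
    "MCP_URL": "https://your-mcp-host/mcp",
    "MCP_API_KEY": "example-key",
}
settings = load_settings(demo_env)
request = build_agent_request(settings, "Find all drugs with 'Aspirin' in their name.")
print(request.get_method(), request.full_url)
```

Validating the variables at startup means a misconfigured Space fails with a clear message instead of an opaque connection error on the first question.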

## Development

### Running the Agent Manually
To test the agent's logic directly, without going through the Streamlit UI, you can run it from your terminal.

1.  **Set up the environment**:
    Make sure the MCP and Neo4j services are running (`make up`).
    Create a Python virtual environment and install dependencies:
    ```bash
    python -m venv venv
    source venv/bin/activate
    pip install -r agent/requirements.txt
    ```

2.  **Set your API key**:
    ```bash
    export LLM_API_KEY="sk-your-llm-api-key-here"
    ```

3.  **Run the agent**:
    ```bash
    python agent/main.py
    ```
    The agent will run with the hardcoded example question and print the execution trace and final answer to your console.

### File Structure
```
├── agent/          # The LangChain agent and its tools
├── streamlit/      # The Streamlit conversational UI
├── mcp/            # FastAPI server with core logic
├── neo4j/          # Neo4j configuration and data
├── data/           # SQLite databases
├── ops/            # Operational scripts (seeding, ingestion, etc.)
├── docker-compose.yml
├── Makefile
└── README.md
```

