---
title: Agent MCP SQL
emoji: 🧠
sdk: streamlit
app_file: space_app.py
python_version: 3.11
pinned: false
---
# GraphRAG Agentic System
## Overview
This project implements an intelligent, multi-step GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI.
## Key Features
🤖 **LangChain Agent**: Orchestrates tools for schema discovery, pathfinding, and query execution.
🕸️ **GraphRAG Enabled**: Uses a Neo4j knowledge graph of database schemas for intelligent query planning.
🔬 **Life Sciences Dataset**: Comes with a rich dataset across clinical trials, drug discovery, and lab results.
💬 **Conversational UI**: A Streamlit-based chat interface for interacting with the agent.
🔌 **RESTful MCP Server**: All core logic is exposed via a secure and scalable FastAPI server.
## Architecture
```
┌─────────────────┐     ┌───────────────┐     ┌─────────────────┐
│ Streamlit Chat  │─────│     Agent     │─────│   MCP Server    │
│      (UI)       │     │  (LangChain)  │     │    (FastAPI)    │
└─────────────────┘     └───────────────┘     └────────┬────────┘
                                                       │
              ┌────────────────┬────────────────┬────┴───────────┐
              │                │                │                │
       ┌──────┴──────┐  ┌──────┴──────┐  ┌──────┴──────┐  ┌──────┴──────┐
       │    Neo4j    │  │  clinical_  │  │ laboratory  │  │    drug_    │
       │ (Schema KG) │  │  trials.db  │  │     .db     │  │ discovery.db│
       └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘
```
### Components
- **Streamlit**: Provides a conversational chat interface for users to ask questions.
- **Agent**: A LangChain-powered orchestrator that uses custom tools to query the MCP server.
- **MCP Server**: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution.
- **Neo4j**: Stores a knowledge graph of the schemas of all connected SQLite databases.
- **SQLite Databases**: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources.
## Quick Start
### Prerequisites
- Docker & Docker Compose
- LLM API key (e.g., for OpenAI)
### Setup
1. **Clone and configure**:
```bash
git clone <repository-url>
cd <repository-name>
touch .env
```
2. **Add your LLM API key** to the `.env` file.
```
LLM_API_KEY="sk-your-llm-api-key-here"
```
3. **Start the system**:
```bash
make up
```
4. **Seed the databases and ingest schema**:
```bash
make seed-db
make ingest
```
5. **Open the interface**:
- Streamlit UI: http://localhost:8501
- Neo4j Browser: http://localhost:7474 (neo4j/password)
## Usage
Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example:
- "What are the names of the trials and their primary purpose for studies on 'Cancer'?"
- "Find all drugs with 'Aspirin' in their name."
- "Show me lab results for patient '123'."
The agent will then:
1. Use the `SchemaSearchTool` to find relevant tables.
2. Use the `JoinPathFinderTool` to determine how to join them.
3. Construct a SQL query.
4. Execute the query using the `QueryExecutorTool`.
5. Return the final answer to the UI.
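The steps above can be sketched end to end with stubbed tools. The tool names come from this README, but the return shapes and the generated SQL are invented for illustration; the real tools call the MCP server and the Neo4j schema graph.

```python
# Stubs standing in for the real MCP-backed tools.
def schema_search(question):
    # Step 1: would ask the MCP server which tables are relevant.
    return ["trials", "conditions"]

def join_path_finder(tables):
    # Step 2: would query the Neo4j schema graph for a join path.
    return "t.condition_id = c.id"

def query_executor(sql):
    # Step 4: would run the federated SQL via the MCP server.
    return [{"name": "Trial A", "primary_purpose": "Treatment"}]

def answer(question):
    tables = schema_search(question)            # 1. find relevant tables
    join_on = join_path_finder(tables)          # 2. find how to join them
    sql = (                                     # 3. construct a SQL query
        f"SELECT t.name, t.primary_purpose "
        f"FROM {tables[0]} t JOIN {tables[1]} c ON {join_on} "
        f"WHERE c.name = 'Cancer'"
    )
    return query_executor(sql)                  # 4-5. execute, return rows

print(answer("What are the names of the trials on 'Cancer'?"))
```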
### Deploying a Hugging Face Space (Streamlit front-end only)
This repo includes a self-contained Streamlit app for Hugging Face Spaces: `space_app.py`.
It connects to your externally reachable Agent and MCP services.
1. Expose your services (public host or tunnel):
- Agent FastAPI endpoint: `https://<your-host>/query`
- MCP FastAPI base: `https://<your-host>/mcp`
2. In a new HF Space (Streamlit), add these files:
- `space_app.py` (entrypoint)
- `requirements.txt` with:
```
streamlit==1.28.0
requests==2.31.0
pandas==2.1.0
```
3. In Space Settings β†’ Variables and secrets, set:
- `AGENT_URL` (e.g., `https://your-agent-host/query`)
- `MCP_URL` (e.g., `https://your-mcp-host/mcp`)
- `MCP_API_KEY` (the MCP auth key)
- (Optional) `AGENT_HEALTH_URL`, `NEO4J_URL`
4. Configure the Space to run `space_app.py` as the Streamlit app file.
Once the Space starts, it will display the same chat UI and stream responses from your hosted Agent.
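As a sketch of how a front-end like `space_app.py` might tie these settings together, the snippet below reads the variables listed above and posts a question to the agent. The `{"question": ...}` request body and JSON response are assumptions about the `/query` contract, not a documented API; only the standard library is used.

```python
import json
import os
import urllib.request


def load_space_config(env=os.environ):
    """Read the Space settings listed above, failing fast if required
    variables are absent."""
    required = ["AGENT_URL", "MCP_URL", "MCP_API_KEY"]
    missing = [k for k in required if not env.get(k)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {k.lower(): env[k] for k in required}


def build_agent_request(agent_url, question):
    # Assumed request shape; adjust to match your agent's /query handler.
    return urllib.request.Request(
        agent_url,
        data=json.dumps({"question": question}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(question, timeout=60.0):
    cfg = load_space_config()
    req = build_agent_request(cfg["agent_url"], question)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Failing fast on missing secrets makes misconfigured Spaces surface a clear error instead of a hung request.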
## Development
### Running the Agent Manually
To test the agent's logic directly without the full Docker stack, you can run it from your terminal.
1. **Set up the environment**:
Make sure the MCP and Neo4j services are running (`make up`).
Create a Python virtual environment and install dependencies:
```bash
python -m venv venv
source venv/bin/activate
pip install -r agent/requirements.txt
```
2. **Set your API key**:
```bash
export LLM_API_KEY="sk-your-llm-api-key-here"
```
3. **Run the agent**:
```bash
python agent/main.py
```
The agent will run with the hardcoded example question and print the execution trace and final answer to your console.
### File Structure
```
β”œβ”€β”€ agent/ # The LangChain agent and its tools
β”œβ”€β”€ streamlit/ # The Streamlit conversational UI
β”œβ”€β”€ mcp/ # FastAPI server with core logic
β”œβ”€β”€ neo4j/ # Neo4j configuration and data
β”œβ”€β”€ data/ # SQLite databases
β”œβ”€β”€ ops/ # Operational scripts (seeding, ingestion, etc.)
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ Makefile
└── README.md
```