---
title: Agent MCP SQL
emoji: 🧠
sdk: streamlit
app_file: space_app.py
python_version: 3.11
pinned: false
---
# GraphRAG Agentic System
## Overview
This project implements an intelligent, multi-step GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI.
## Key Features
🤖 **LangChain Agent**: Orchestrates tools for schema discovery, pathfinding, and query execution.
🕸️ **GraphRAG Enabled**: Uses a Neo4j knowledge graph of database schemas for intelligent query planning.
🔬 **Life Sciences Dataset**: Comes with a rich dataset across clinical trials, drug discovery, and lab results.
💬 **Conversational UI**: A Streamlit-based chat interface for interacting with the agent.
🔌 **RESTful MCP Server**: All core logic is exposed via a secure and scalable FastAPI server.
## Architecture
```
┌─────────────────┐     ┌───────────────┐     ┌─────────────────┐
│ Streamlit Chat  │─────│     Agent     │─────│   MCP Server    │
│      (UI)       │     │  (LangChain)  │     │    (FastAPI)    │
└─────────────────┘     └───────────────┘     └────────┬────────┘
                                                       │
              ┌────────────────┬────────────────┬────┴───────────┐
              │                │                │                │
       ┌──────┴──────┐  ┌──────┴──────┐  ┌──────┴──────┐  ┌──────┴──────┐
       │    Neo4j    │  │  clinical_  │  │ laboratory  │  │    drug_    │
       │ (Schema KG) │  │  trials.db  │  │     .db     │  │ discovery.db│
       └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘
```
### Components
- **Streamlit**: Provides a conversational chat interface for users to ask questions.
- **Agent**: A LangChain-powered orchestrator that uses custom tools to query the MCP server.
- **MCP Server**: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution.
- **Neo4j**: Stores a knowledge graph of the schemas of all connected SQLite databases.
- **SQLite Databases**: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources.
## Quick Start
### Prerequisites
- Docker & Docker Compose
- LLM API key (e.g., for OpenAI)
### Setup
1. **Clone and configure**:
```bash
git clone <repository-url>
cd <repository-name>
touch .env
```
2. **Add your LLM API key** to the `.env` file.
```
LLM_API_KEY="sk-your-llm-api-key-here"
```
3. **Start the system**:
```bash
make up
```
4. **Seed the databases and ingest schema**:
```bash
make seed-db
make ingest
```
5. **Open the interface**:
- Streamlit UI: http://localhost:8501
- Neo4j Browser: http://localhost:7474 (neo4j/password)
## Usage
Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example:
- "What are the names of the trials and their primary purpose for studies on 'Cancer'?"
- "Find all drugs with 'Aspirin' in their name."
- "Show me lab results for patient '123'."
The agent will then:
1. Use the `SchemaSearchTool` to find relevant tables.
2. Use the `JoinPathFinderTool` to determine how to join them.
3. Construct a SQL query.
4. Execute the query using the `QueryExecutorTool`.
5. Return the final answer to the UI.
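The steps above can be sketched end to end with stubbed tools. The tool names come from this README, but the return shapes and the generated SQL are invented for illustration; the real tools call the MCP server and the Neo4j schema graph.

```python
# Stubs standing in for the real MCP-backed tools.
def schema_search(question):
    # Step 1: would ask the MCP server which tables are relevant.
    return ["trials", "conditions"]

def join_path_finder(tables):
    # Step 2: would query the Neo4j schema graph for a join path.
    return "t.condition_id = c.id"

def query_executor(sql):
    # Step 4: would run the federated SQL via the MCP server.
    return [{"name": "Trial A", "primary_purpose": "Treatment"}]

def answer(question):
    tables = schema_search(question)            # 1. find relevant tables
    join_on = join_path_finder(tables)          # 2. find how to join them
    sql = (                                     # 3. construct a SQL query
        f"SELECT t.name, t.primary_purpose "
        f"FROM {tables[0]} t JOIN {tables[1]} c ON {join_on} "
        f"WHERE c.name = 'Cancer'"
    )
    return query_executor(sql)                  # 4-5. execute, return rows

print(answer("What are the names of the trials on 'Cancer'?"))
```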
### Deploying a Hugging Face Space (Streamlit front-end only)
This repo includes a self-contained Streamlit app for Hugging Face Spaces: `space_app.py`.
It connects to your externally reachable Agent and MCP services.
1. Expose your services (public host or tunnel):
- Agent FastAPI endpoint: `https://<your-host>/query`
- MCP FastAPI base: `https://<your-host>/mcp`
2. In a new HF Space (Streamlit), add these files:
- `space_app.py` (entrypoint)
- `requirements.txt` with:
```
streamlit==1.28.0
requests==2.31.0
pandas==2.1.0
```
3. In Space Settings β†’ Variables and secrets, set:
- `AGENT_URL` (e.g., `https://your-agent-host/query`)
- `MCP_URL` (e.g., `https://your-mcp-host/mcp`)
- `MCP_API_KEY` (the MCP auth key)
- (Optional) `AGENT_HEALTH_URL`, `NEO4J_URL`
4. Configure the Space to run `space_app.py` as the Streamlit app file.
Once the Space starts, it will display the same chat UI and stream responses from your hosted Agent.
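As a sketch of how a front-end like `space_app.py` might tie these settings together, the snippet below reads the variables listed above and posts a question to the agent. The `{"question": ...}` request body and JSON response are assumptions about the `/query` contract, not a documented API; only the standard library is used.

```python
import json
import os
import urllib.request


def load_space_config(env=os.environ):
    """Read the Space settings listed above, failing fast if required
    variables are absent."""
    required = ["AGENT_URL", "MCP_URL", "MCP_API_KEY"]
    missing = [k for k in required if not env.get(k)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {k.lower(): env[k] for k in required}


def build_agent_request(agent_url, question):
    # Assumed request shape; adjust to match your agent's /query handler.
    return urllib.request.Request(
        agent_url,
        data=json.dumps({"question": question}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(question, timeout=60.0):
    cfg = load_space_config()
    req = build_agent_request(cfg["agent_url"], question)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Failing fast on missing secrets makes misconfigured Spaces surface a clear error instead of a hung request.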
## Development
### Running the Agent Manually
To test the agent's logic directly without the full Docker stack, you can run it from your terminal.
1. **Set up the environment**:
Make sure the MCP and Neo4j services are running (`make up`).
Create a Python virtual environment and install dependencies:
```bash
python -m venv venv
source venv/bin/activate
pip install -r agent/requirements.txt
```
2. **Set your API key**:
```bash
export LLM_API_KEY="sk-your-llm-api-key-here"
```
3. **Run the agent**:
```bash
python agent/main.py
```
The agent will run with the hardcoded example question and print the execution trace and final answer to your console.
### File Structure
```
β”œβ”€β”€ agent/ # The LangChain agent and its tools
β”œβ”€β”€ streamlit/ # The Streamlit conversational UI
β”œβ”€β”€ mcp/ # FastAPI server with core logic
β”œβ”€β”€ neo4j/ # Neo4j configuration and data
β”œβ”€β”€ data/ # SQLite databases
β”œβ”€β”€ ops/ # Operational scripts (seeding, ingestion, etc.)
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ Makefile
└── README.md
```