Spaces:
No application file
No application file
| title: Agent MCP SQL | |
| emoji: π§ | |
| sdk: streamlit | |
| app_file: space_app.py | |
| python_version: 3.11 | |
| pinned: false | |
| # GraphRAG Agentic System | |
| ## Overview | |
| This project implements an intelligent, multi-step GraphRAG-powered agent that uses LangChain to orchestrate complex queries against a federated life sciences dataset. The agent leverages a Neo4j graph database to understand the relationships between disparate SQLite databases, constructs SQL queries, and returns unified results through a conversational UI. | |
| ## Key Features | |
| π€ **LangChain Agent**: Orchestrates tools for schema discovery, pathfinding, and query execution. | |
| πΈοΈ **GraphRAG Enabled**: Uses a Neo4j knowledge graph of database schemas for intelligent query planning. | |
| π¬ **Life Sciences Dataset**: Comes with a rich dataset across clinical trials, drug discovery, and lab results. | |
| conversational **Conversational UI**: A Streamlit-based chat interface for interacting with the agent. | |
| π **RESTful MCP Server**: All core logic is exposed via a secure and scalable FastAPI server. | |
| ## Architecture | |
| ``` | |
| βββββββββββββββββββ βββββββββββββββββ βββββββββββββββββββ | |
| β Streamlit Chat ββββββββ Agent β β MCP Server β | |
| β (UI) β β (LangChain) β β (FastAPI) β | |
| βββββββββββββββββββ βββββββββββββββββ βββββββββββββββββββ | |
| β | |
| βββββββββββββββββββββββββΌββββββββββββββββββββββββ | |
| β β β | |
| βββββββββββββββ βββββββββββββββ βββββββββββββββ | |
| β Neo4j β β clinical_ β β laboratory β | |
| β (Schema KG) β β trials.db β β .db β | |
| βββββββββββββββ βββββββββββββββ βββββββββββββββ | |
| β | |
| βββββββββββββββ | |
| β drug_ β | |
| β discovery.dbβ | |
| βββββββββββββββ | |
| ``` | |
| ### Components | |
| - **Streamlit**: Provides a conversational chat interface for users to ask questions. | |
| - **Agent**: A LangChain-powered orchestrator that uses custom tools to query the MCP server. | |
| - **MCP Server**: A FastAPI application that exposes core logic for schema discovery, graph pathfinding, and federated query execution. | |
| - **Neo4j**: Stores a knowledge graph of the schemas of all connected SQLite databases. | |
| - **SQLite Databases**: A set of life sciences databases (`clinical_trials.db`, `drug_discovery.db`, `laboratory.db`) that serve as the federated data sources. | |
| ## Quick Start | |
| ### Prerequisites | |
| - Docker & Docker Compose | |
| - LLM API key (e.g., for OpenAI) | |
| ### Setup | |
| 1. **Clone and configure**: | |
| ```bash | |
| git clone <repository-url> | |
| cd <repository-name> | |
| touch .env | |
| ``` | |
| 2. **Add your LLM API key** to the `.env` file. | |
| ``` | |
| LLM_API_KEY="sk-your-llm-api-key-here" | |
| ``` | |
| 3. **Start the system**: | |
| ```bash | |
| make up | |
| ``` | |
| 4. **Seed the databases and ingest schema**: | |
| ```bash | |
| make seed-db | |
| make ingest | |
| ``` | |
| 5. **Open the interface**: | |
| - Streamlit UI: http://localhost:8501 | |
| - Neo4j Browser: http://localhost:7474 (neo4j/password) | |
| ## Usage | |
| Once the system is running, open the Streamlit UI and ask a question about the life sciences data, for example: | |
| - "What are the names of the trials and their primary purpose for studies on 'Cancer'?" | |
| - "Find all drugs with 'Aspirin' in their name." | |
| - "Show me lab results for patient '123'." | |
| The agent will then: | |
| 1. Use the `SchemaSearchTool` to find relevant tables. | |
| 2. Use the `JoinPathFinderTool` to determine how to join them. | |
| 3. Construct a SQL query. | |
| 4. Execute the query using the `QueryExecutorTool`. | |
| 5. Return the final answer to the UI. | |
| ### Deploying a Hugging Face Space (Streamlit front-end only) | |
| This repo includes a self-contained Streamlit app for Hugging Face Spaces: `space_app.py`. | |
| It connects to your externally reachable Agent and MCP services. | |
| 1) Expose your services (public host or tunnel) | |
| - Agent FastAPI endpoint: `https://<your-host>/query` | |
| - MCP FastAPI base: `https://<your-host>/mcp` | |
| 2) In a new HF Space (Streamlit), add these files: | |
| - `space_app.py` (entrypoint) | |
| - `requirements.txt` with: | |
| ``` | |
| streamlit==1.28.0 | |
| requests==2.31.0 | |
| pandas==2.1.0 | |
| ``` | |
| 3) In Space Settings β Variables and secrets, set: | |
| - `AGENT_URL` (e.g., `https://your-agent-host/query`) | |
| - `MCP_URL` (e.g., `https://your-mcp-host/mcp`) | |
| - `MCP_API_KEY` (the MCP auth key) | |
| - (Optional) `AGENT_HEALTH_URL`, `NEO4J_URL` | |
| 4) Configure the Space to run `space_app.py` as the Streamlit app file. | |
| Once the Space starts, it will display the same chat UI and stream responses from your hosted Agent. | |
| ## Development | |
| ### Running the Agent Manually | |
| To test the agent's logic directly without the full Docker stack, you can run it from your terminal. | |
| 1. **Set up the environment**: | |
| Make sure the MCP and Neo4j services are running (`make up`). | |
| Create a Python virtual environment and install dependencies: | |
| ```bash | |
| python -m venv venv | |
| source venv/bin/activate | |
| pip install -r agent/requirements.txt | |
| ``` | |
| 2. **Set your API key**: | |
| ```bash | |
| export LLM_API_KEY="sk-your-llm-api-key-here" | |
| ``` | |
| 3. **Run the agent**: | |
| ```bash | |
| python agent/main.py | |
| ``` | |
| The agent will run with the hardcoded example question and print the execution trace and final answer to your console. | |
| ### File Structure | |
| ``` | |
| βββ agent/ # The LangChain agent and its tools | |
| βββ streamlit/ # The Streamlit conversational UI | |
| βββ mcp/ # FastAPI server with core logic | |
| βββ neo4j/ # Neo4j configuration and data | |
| βββ data/ # SQLite databases | |
| βββ ops/ # Operational scripts (seeding, ingestion, etc.) | |
| βββ docker-compose.yml | |
| βββ Makefile | |
| βββ README.md | |
| ``` | |
| testing, what do you see from the mcp and db? | |