bigwolfe committed
Commit 05c9551 · 1 Parent(s): 2e98ac7
CLAUDE.md CHANGED
@@ -271,3 +271,10 @@ Current active feature: `001-obsidian-docs-viewer`
 ```
 
 Obtain JWT: `POST /api/tokens` after HF OAuth login.
+
+ ## Active Technologies
+ - Python 3.11+ (backend), TypeScript (frontend) + FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI (004-gemini-vault-chat)
+ - Filesystem vault (existing), LlamaIndex persisted vector store (new, under `data/llamaindex/`) (004-gemini-vault-chat)
+
+ ## Recent Changes
+ - 004-gemini-vault-chat: Added Python 3.11+ (backend), TypeScript (frontend) + FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI
specs/004-gemini-vault-chat/checklists/requirements.md ADDED
@@ -0,0 +1,37 @@
+ # Specification Quality Checklist: Gemini Vault Chat Agent
+
+ **Purpose**: Validate specification completeness and quality before proceeding to planning
+ **Created**: 2025-11-28
+ **Feature**: [spec.md](../spec.md)
+
+ ## Content Quality
+
+ - [x] No implementation details (languages, frameworks, APIs)
+ - [x] Focused on user value and business needs
+ - [x] Written for non-technical stakeholders
+ - [x] All mandatory sections completed
+
+ ## Requirement Completeness
+
+ - [x] No [NEEDS CLARIFICATION] markers remain
+ - [x] Requirements are testable and unambiguous
+ - [x] Success criteria are measurable
+ - [x] Success criteria are technology-agnostic (no implementation details)
+ - [x] All acceptance scenarios are defined
+ - [x] Edge cases are identified
+ - [x] Scope is clearly bounded
+ - [x] Dependencies and assumptions identified
+
+ ## Feature Readiness
+
+ - [x] All functional requirements have clear acceptance criteria
+ - [x] User scenarios cover primary flows
+ - [x] Feature meets measurable outcomes defined in Success Criteria
+ - [x] No implementation details leak into specification
+
+ ## Notes
+
+ - All validation items passed on first review
+ - Spec is ready for the `/speckit.plan` phase
+ - The feature input was detailed from existing planning documents, enabling a complete spec without clarifications
+
specs/004-gemini-vault-chat/contracts/rag-api.yaml ADDED
@@ -0,0 +1,219 @@
+ openapi: 3.0.3
+ info:
+   title: Document-MCP RAG Chat API
+   description: Gemini-powered RAG chat endpoints for vault content querying
+   version: 1.0.0
+
+ paths:
+   /api/rag/chat:
+     post:
+       summary: Send a message and get AI-generated response
+       description: |
+         Accepts conversation history and returns an AI-synthesized answer
+         based on vault content, along with source references.
+       operationId: ragChat
+       tags:
+         - RAG
+       requestBody:
+         required: true
+         content:
+           application/json:
+             schema:
+               $ref: '#/components/schemas/ChatRequest'
+             example:
+               messages:
+                 - role: user
+                   content: "How does authentication work in this project?"
+       responses:
+         '200':
+           description: Successful response with answer and sources
+           content:
+             application/json:
+               schema:
+                 $ref: '#/components/schemas/ChatResponse'
+               example:
+                 answer: "Authentication in this project uses JWT tokens..."
+                 sources:
+                   - path: "API Documentation.md"
+                     title: "API Documentation"
+                     snippet: "The system uses JWT-based authentication..."
+                     score: 0.92
+                 notes_written: []
+         '400':
+           description: Invalid request (empty messages, invalid format)
+           content:
+             application/json:
+               schema:
+                 $ref: '#/components/schemas/ErrorResponse'
+         '503':
+           description: AI service unavailable
+           content:
+             application/json:
+               schema:
+                 $ref: '#/components/schemas/ErrorResponse'
+               example:
+                 error: "AI service temporarily unavailable"
+                 code: "SERVICE_UNAVAILABLE"
+
+   /api/rag/status:
+     get:
+       summary: Check RAG service status
+       description: Returns whether the index is ready and the service is operational
+       operationId: ragStatus
+       tags:
+         - RAG
+       responses:
+         '200':
+           description: Service status
+           content:
+             application/json:
+               schema:
+                 $ref: '#/components/schemas/StatusResponse'
+
+ components:
+   schemas:
+     ChatMessage:
+       type: object
+       required:
+         - role
+         - content
+       properties:
+         role:
+           type: string
+           enum: [user, assistant]
+           description: Message author
+         content:
+           type: string
+           maxLength: 10000
+           description: Message text
+         timestamp:
+           type: string
+           format: date-time
+           description: When the message was created
+         sources:
+           type: array
+           items:
+             $ref: '#/components/schemas/SourceReference'
+           description: Referenced notes (assistant messages only)
+         notes_written:
+           type: array
+           items:
+             $ref: '#/components/schemas/NoteWritten'
+           description: Notes created by agent (Phase 2)
+
+     SourceReference:
+       type: object
+       required:
+         - path
+         - title
+         - snippet
+       properties:
+         path:
+           type: string
+           description: Relative path in vault
+           example: "guides/authentication.md"
+         title:
+           type: string
+           description: Note title
+           example: "Authentication Guide"
+         snippet:
+           type: string
+           maxLength: 500
+           description: Relevant text excerpt
+         score:
+           type: number
+           format: float
+           minimum: 0
+           maximum: 1
+           description: Relevance score
+
+     NoteWritten:
+       type: object
+       required:
+         - path
+         - title
+         - action
+       properties:
+         path:
+           type: string
+           description: Path to created/updated note
+           pattern: "^agent-notes/.+\\.md$"
+         title:
+           type: string
+           description: Note title
+         action:
+           type: string
+           enum: [created, updated]
+           description: What the agent did
+
+     ChatRequest:
+       type: object
+       required:
+         - messages
+       properties:
+         messages:
+           type: array
+           minItems: 1
+           items:
+             $ref: '#/components/schemas/ChatMessage'
+           description: Conversation history; the last message must be from the user
+
+     ChatResponse:
+       type: object
+       required:
+         - answer
+         - sources
+         - notes_written
+       properties:
+         answer:
+           type: string
+           description: AI-generated response
+         sources:
+           type: array
+           items:
+             $ref: '#/components/schemas/SourceReference'
+           description: Notes used in response
+         notes_written:
+           type: array
+           items:
+             $ref: '#/components/schemas/NoteWritten'
+           description: Notes created (Phase 2, may be empty)
+
+     StatusResponse:
+       type: object
+       required:
+         - status
+         - index_ready
+       properties:
+         status:
+           type: string
+           enum: [ready, initializing, error]
+         index_ready:
+           type: boolean
+           description: Whether the vector index is loaded
+         documents_indexed:
+           type: integer
+           description: Number of documents in index
+         error_message:
+           type: string
+           description: Error details if status is error
+
+     ErrorResponse:
+       type: object
+       required:
+         - error
+         - code
+       properties:
+         error:
+           type: string
+           description: Human-readable error message
+         code:
+           type: string
+           description: Machine-readable error code
+           enum:
+             - INVALID_REQUEST
+             - EMPTY_MESSAGES
+             - SERVICE_UNAVAILABLE
+             - INDEX_NOT_READY
+             - RATE_LIMITED
+
specs/004-gemini-vault-chat/data-model.md ADDED
@@ -0,0 +1,122 @@
+ # Data Model: Gemini Vault Chat Agent
+
+ **Feature**: 004-gemini-vault-chat
+ **Date**: 2025-11-28
+
+ ## Entities
+
+ ### ChatMessage
+
+ Represents a single message in a conversation.
+
+ | Field | Type | Description | Constraints |
+ |-------|------|-------------|-------------|
+ | role | enum | Message author | `user` or `assistant` |
+ | content | string | Message text | Max 10,000 characters |
+ | timestamp | datetime | When message was created | ISO 8601 format |
+ | sources | SourceReference[] | Referenced notes (assistant only) | Optional, empty for user messages |
+ | notes_written | NoteWritten[] | Notes created by agent | Optional, Phase 2 only |
+
+ ### SourceReference
+
+ Metadata about a note used to generate a response.
+
+ | Field | Type | Description | Constraints |
+ |-------|------|-------------|-------------|
+ | path | string | Relative path in vault | Valid vault path, ends in `.md` |
+ | title | string | Note title | Derived from frontmatter/H1/filename |
+ | snippet | string | Relevant text excerpt | Max 500 characters |
+ | score | float | Relevance score | 0.0 to 1.0, optional |
+
+ ### NoteWritten (Phase 2)
+
+ Metadata about a note created or updated by the agent.
+
+ | Field | Type | Description | Constraints |
+ |-------|------|-------------|-------------|
+ | path | string | Path to created/updated note | Must be in `agent-notes/` folder |
+ | title | string | Note title | Required |
+ | action | enum | What the agent did | `created` or `updated` |
+
+ ### ChatRequest
+
+ Request payload for the RAG chat endpoint.
+
+ | Field | Type | Description | Constraints |
+ |-------|------|-------------|-------------|
+ | messages | ChatMessage[] | Conversation history | At least 1 message, last must be `user` |
+
+ ### ChatResponse
+
+ Response payload from the RAG chat endpoint.
+
+ | Field | Type | Description | Constraints |
+ |-------|------|-------------|-------------|
+ | answer | string | AI-generated response | Required |
+ | sources | SourceReference[] | Notes used in response | May be empty |
+ | notes_written | NoteWritten[] | Notes created (Phase 2) | May be empty |
+
+ ## State Transitions
+
+ ### Conversation Session
+
+ ```
+ [No Session] ---(user opens chat panel)---> [Active Session]
+ [Active Session] ---(user sends message)---> [Waiting for Response]
+ [Waiting for Response] ---(response received)---> [Active Session]
+ [Active Session] ---(page refresh/close)---> [No Session]
+ ```
+
+ ### Index Lifecycle
+
+ ```
+ [No Index] ---(startup, no persist dir)---> [Building Index]
+ [Building Index] ---(indexing complete)---> [Index Ready]
+ [No Index] ---(startup, persist dir exists)---> [Loading Index]
+ [Loading Index] ---(load successful)---> [Index Ready]
+ [Loading Index] ---(load failed)---> [Building Index]
+ [Index Ready] ---(query received)---> [Index Ready]
+ ```
+
+ ## Validation Rules
+
+ ### ChatMessage Validation
+
+ 1. `role` must be exactly `user` or `assistant`
+ 2. `content` must not be empty (whitespace-only is invalid)
+ 3. `content` must be ≤10,000 characters
+ 4. `sources` must be empty for `user` role messages
+
+ ### SourceReference Validation
+
+ 1. `path` must be a valid vault path (see `validate_note_path` in vault.py)
+ 2. `title` must not be empty
+ 3. `snippet` must be ≤500 characters
+ 4. `score`, if present, must be between 0.0 and 1.0
+
+ ### NoteWritten Validation (Phase 2)
+
+ 1. `path` must start with `agent-notes/`
+ 2. `path` must be a valid vault path
+ 3. `action` must be `created` or `updated`
+
+ ## Relationships
104
+
105
+ ```
106
+ ChatSession (frontend state)
107
+ └── contains 0..* ChatMessage
108
+ └── assistant messages contain 0..* SourceReference
109
+ └── references 1 VaultNote (existing)
110
+ └── assistant messages may contain 0..* NoteWritten
111
+ └── creates/updates 1 VaultNote
112
+ ```
113
+
114
+ ## Persistence
115
+
116
+ | Entity | Storage | Lifetime |
117
+ |--------|---------|----------|
118
+ | ChatMessage | Frontend memory | Session (cleared on refresh) |
119
+ | SourceReference | Derived from query | Per response |
120
+ | NoteWritten | VaultService (filesystem) | Permanent |
121
+ | Vector Index | LlamaIndex persist dir | Until rebuild |
122
+
specs/004-gemini-vault-chat/plan.md ADDED
@@ -0,0 +1,130 @@
+ # Implementation Plan: Gemini Vault Chat Agent
+
+ **Branch**: `004-gemini-vault-chat` | **Date**: 2025-11-28 | **Spec**: [spec.md](./spec.md)
+ **Input**: Feature specification from `/specs/004-gemini-vault-chat/spec.md`
+
+ ## Summary
+
+ Add a Gemini-powered RAG chat agent to the Document-MCP platform. Users can ask natural language questions about their Markdown vault and receive AI-synthesized answers grounded in their documents. The system uses LlamaIndex for document indexing and retrieval, with Gemini as both the LLM and embedding model. An optional Phase 2 adds constrained note-writing capabilities.
+
+ ## Technical Context
+
+ **Language/Version**: Python 3.11+ (backend), TypeScript (frontend)
+ **Primary Dependencies**: FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI
+ **Storage**: Filesystem vault (existing), LlamaIndex persisted vector store (new, under `data/llamaindex/`)
+ **Testing**: pytest (backend), manual verification (frontend)
+ **Target Platform**: Hugging Face Spaces (Docker), Linux server
+ **Project Type**: Web application (frontend + backend)
+ **Performance Goals**: <5 seconds for RAG response (per SC-001)
+ **Constraints**: Must not break existing MCP server or ChatGPT widget
+ **Scale/Scope**: Hackathon scale; index rebuilds acceptable on restart
+
+ ## Constitution Check
+
+ *GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+
+ | Principle | Status | Notes |
+ |-----------|--------|-------|
+ | I. Brownfield Integration | ✅ Pass | Uses existing VaultService, adds new routes/services alongside existing code |
+ | II. Test-Backed Development | ✅ Pass | Plan includes pytest tests for RAG service; frontend is manual verification |
+ | III. Incremental Delivery | ✅ Pass | P1 stories (read-only RAG) can ship before P3 (write tools) |
+ | IV. Specification-Driven | ✅ Pass | All work traced to spec.md; Phase 2 is optional per spec |
+ | No Magic | ✅ Pass | Direct LlamaIndex usage, no custom abstractions |
+ | Single Source of Truth | ✅ Pass | Vault remains source of truth; index is derived view |
+ | Error Handling | ✅ Pass | Spec requires FR-011 error messages for AI unavailability |
+
+ **Technology Stack Compliance**:
+ - Backend: Python 3.11+, FastAPI, Pydantic ✅
+ - Frontend: React 18+, TypeScript, Tailwind, Shadcn/UI ✅
+ - Storage: Filesystem-based (LlamaIndex persisted store) ✅
+
+ ## Project Structure
+
+ ### Documentation (this feature)
+
+ ```text
+ specs/004-gemini-vault-chat/
+ ├── plan.md              # This file
+ ├── research.md          # Phase 0 output
+ ├── data-model.md        # Phase 1 output
+ ├── quickstart.md        # Phase 1 output
+ ├── contracts/           # Phase 1 output
+ │   └── rag-api.yaml     # OpenAPI spec for RAG endpoints
+ └── tasks.md             # Phase 2 output (created by /speckit.tasks)
+ ```
+
+ ### Source Code (repository root)
+
+ ```text
+ backend/
+ ├── src/
+ │   ├── api/
+ │   │   └── routes/
+ │   │       └── rag.py              # NEW: RAG chat endpoint
+ │   ├── models/
+ │   │   └── rag.py                  # NEW: Pydantic models for RAG
+ │   └── services/
+ │       └── rag_index.py            # NEW: LlamaIndex service
+ └── tests/
+     └── unit/
+         └── test_rag_service.py     # NEW: RAG service tests
+
+ frontend/
+ ├── src/
+ │   ├── components/
+ │   │   ├── ChatPanel.tsx           # NEW: Chat interface
+ │   │   ├── ChatMessage.tsx         # NEW: Message component
+ │   │   └── SourceList.tsx          # NEW: Source references
+ │   ├── services/
+ │   │   └── rag.ts                  # NEW: RAG API client
+ │   └── types/
+ │       └── rag.ts                  # NEW: TypeScript types
+
+ data/
+ └── llamaindex/                     # NEW: Persisted vector index
+ ```
+
+ **Structure Decision**: Web application structure (Option 2). New files added alongside existing code per Constitution Principle I.
+
+ ## Complexity Tracking
+
+ > No violations requiring justification.
+
+ ## Implementation Phases
+
+ ### Phase 1: Core RAG Query (P1 Stories)
+
+ Implements User Stories 1-2: ask questions, view sources.
+
+ **Backend Tasks**:
+ 1. Add LlamaIndex dependencies to `requirements.txt`
+ 2. Create `rag_index.py` service with `get_or_build_index()` singleton
+ 3. Create `rag.py` Pydantic models for request/response
+ 4. Create `rag.py` route with `POST /api/rag/chat` endpoint
+ 5. Add unit tests for RAG service
+
+ **Frontend Tasks**:
+ 1. Create `ChatPanel.tsx` component with message list and composer
+ 2. Create `ChatMessage.tsx` for rendering user/assistant messages
+ 3. Create `SourceList.tsx` for collapsible source references
+ 4. Add `rag.ts` API client service
+ 5. Integrate ChatPanel into MainApp layout
+
+ ### Phase 2: Multi-Turn Conversation (P2 Story)
+
+ Implements User Story 3: context-aware follow-ups.
+
+ **Tasks**:
+ 1. Maintain chat history in frontend state
+ 2. Pass full message history to backend
+ 3. Update RAG service to use chat history for context
+
+ ### Phase 3: Agent Note Writing (P3 Story, Optional)
+
+ Implements User Story 4: create/append notes via agent.
+
+ **Tasks**:
+ 1. Create constrained write helpers (`create_note`, `append_to_note`)
+ 2. Register as LlamaIndex agent tools
+ 3. Add `notes_written` to response model
+ 4. Show created notes badge in UI
specs/004-gemini-vault-chat/quickstart.md ADDED
@@ -0,0 +1,144 @@
+ # Quickstart: Gemini Vault Chat Agent
+
+ **Feature**: 004-gemini-vault-chat
+ **Date**: 2025-11-28
+
+ ## Prerequisites
+
+ 1. Python 3.11+ installed
+ 2. Node.js 18+ installed
+ 3. Google API key with Gemini access
+
+ ## Setup
+
+ ### 1. Install Backend Dependencies
+
+ ```bash
+ cd backend
+ pip install llama-index llama-index-llms-google-genai llama-index-embeddings-google-genai
+ ```
+
+ ### 2. Configure Environment
+
+ Add to your `.env` file (or export in terminal):
+
+ ```bash
+ GOOGLE_API_KEY=your-gemini-api-key-here
+ VAULT_DIR=data/vaults/demo-user
+ LLAMAINDEX_PERSIST_DIR=data/llamaindex
+ ```
+
+ ### 3. Start Backend
+
+ ```bash
+ cd backend
+ uvicorn src.api.main:app --reload --port 8000
+ ```
+
+ The RAG index will be built on first startup (this may take a few seconds).
+
+ ### 4. Start Frontend
+
+ ```bash
+ cd frontend
+ npm install
+ npm run dev
+ ```
+
+ ## Verify Installation
+
+ ### Check RAG Status
+
+ ```bash
+ curl http://localhost:8000/api/rag/status
+ ```
+
+ Expected response:
+
+ ```json
+ {
+   "status": "ready",
+   "index_ready": true,
+   "documents_indexed": 15
+ }
+ ```
+
+ ### Test RAG Chat
+
+ ```bash
+ curl -X POST http://localhost:8000/api/rag/chat \
+   -H "Content-Type: application/json" \
+   -d '{"messages": [{"role": "user", "content": "What is this project about?"}]}'
+ ```
+
+ Expected response:
+
+ ```json
+ {
+   "answer": "This project is Document-MCP, a...",
+   "sources": [
+     {
+       "path": "Getting Started.md",
+       "title": "Getting Started",
+       "snippet": "Document-MCP provides...",
+       "score": 0.89
+     }
+   ],
+   "notes_written": []
+ }
+ ```
+
+ ## Development Workflow
+
+ ### Backend Changes
+
+ 1. Edit files in `backend/src/services/rag_index.py` or `backend/src/api/routes/rag.py`
+ 2. Server auto-reloads with the `--reload` flag
+ 3. Run tests: `cd backend && pytest tests/unit/test_rag_service.py -v`
+
+ ### Frontend Changes
+
+ 1. Edit files in `frontend/src/components/` (ChatPanel, ChatMessage, SourceList)
+ 2. Vite auto-reloads on save
+ 3. Open browser at `http://localhost:5173`
+
+ ### Rebuilding the Index
+
+ Delete the persist directory and restart:
+
+ ```bash
+ rm -rf data/llamaindex
+ # Restart backend
+ ```
+
+ ## File Locations
+
+ | Component | Path |
+ |-----------|------|
+ | RAG Service | `backend/src/services/rag_index.py` |
+ | RAG Routes | `backend/src/api/routes/rag.py` |
+ | RAG Models | `backend/src/models/rag.py` |
+ | Chat Panel | `frontend/src/components/ChatPanel.tsx` |
+ | API Client | `frontend/src/services/rag.ts` |
+ | Types | `frontend/src/types/rag.ts` |
+ | Index Storage | `data/llamaindex/` |
+
+ ## Troubleshooting
+
+ ### "GOOGLE_API_KEY not set"
+
+ Ensure the environment variable is exported:
+
+ ```bash
+ export GOOGLE_API_KEY=your-key-here
+ ```
+
+ ### "Index not ready"
+
+ Wait a few seconds after startup for indexing to complete. Check logs for errors.
+
+ ### "Rate limited"
+
+ The Gemini API has rate limits. Wait and retry, or check your API quota.
+
+ ### Empty sources in response
+
+ Check that your vault has Markdown files. Run `ls data/vaults/demo-user/` to verify.
+
specs/004-gemini-vault-chat/research.md ADDED
@@ -0,0 +1,152 @@
+ # Research: Gemini Vault Chat Agent
+
+ **Feature**: 004-gemini-vault-chat
+ **Date**: 2025-11-28
+
+ ## LlamaIndex Integration
+
+ ### Decision: Use LlamaIndex Core with Google GenAI Extensions
+
+ **Rationale**: LlamaIndex provides a mature, well-documented framework for building RAG applications. The `llama-index-llms-google-genai` and `llama-index-embeddings-google-genai` packages provide first-class Gemini support without requiring custom integration code.
+
+ **Alternatives Considered**:
+ - **LangChain**: More complex, larger dependency footprint. LlamaIndex is more focused on document retrieval use cases.
+ - **Direct Gemini API**: Would require implementing chunking, embedding, and retrieval logic manually. Higher development effort.
+ - **OpenAI + pgvector**: Requires PostgreSQL, conflicting with the SQLite-only approach in the constitution.
+
+ ### Key LlamaIndex Patterns
+
+ ```python
+ # Singleton index pattern (recommended)
+ from pathlib import Path
+
+ from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
+ from llama_index.core import load_index_from_storage
+ from llama_index.llms.google_genai import GoogleGenAI
+ from llama_index.embeddings.google_genai import GoogleGenAIEmbedding
+
+ _index: VectorStoreIndex | None = None
+
+ def get_or_build_index(vault_path: Path, persist_dir: Path) -> VectorStoreIndex:
+     global _index
+     if _index is not None:
+         return _index
+
+     if persist_dir.exists():
+         storage_context = StorageContext.from_defaults(persist_dir=str(persist_dir))
+         _index = load_index_from_storage(storage_context)
+     else:
+         documents = SimpleDirectoryReader(str(vault_path), recursive=True).load_data()
+         _index = VectorStoreIndex.from_documents(documents)
+         _index.storage_context.persist(persist_dir=str(persist_dir))
+
+     return _index
+ ```
+
+ ## Gemini Model Selection
+
+ ### Decision: gemini-1.5-flash for LLM, text-embedding-004 for Embeddings
+
+ **Rationale**:
+ - `gemini-1.5-flash` offers a good balance of speed and quality for interactive chat
+ - `text-embedding-004` is Google's latest text embedding model with 768 dimensions
+ - Both are cost-effective for hackathon/demo scale
+
+ **Alternatives Considered**:
+ - `gemini-1.5-pro`: Higher quality but slower and more expensive
+ - `gemini-2.0-flash-exp`: Experimental, may not be stable
+
+ ### Environment Variables
+
+ ```
+ GOOGLE_API_KEY=<api-key>
+ VAULT_DIR=data/vaults/demo-user          # Or dynamically per user
+ LLAMAINDEX_PERSIST_DIR=data/llamaindex
+ ```
+
+ ## Source Attribution Strategy
+
+ ### Decision: Extract source metadata from LlamaIndex response nodes
+
+ **Rationale**: LlamaIndex query responses include source nodes with file paths and text chunks. We can map these back to vault note paths and extract snippets for display.
+
+ ```python
+ response = query_engine.query(question)
+ sources = []
+ for node in response.source_nodes:
+     sources.append({
+         "path": node.metadata.get("file_path"),
+         "title": derive_title_from_path(node.metadata.get("file_path")),
+         "snippet": node.text[:200] + "..." if len(node.text) > 200 else node.text,
+         "score": node.score,
+     })
+ ```
+
+ ## Multi-Turn Conversation
+
+ ### Decision: Use LlamaIndex ChatEngine for conversation memory
+
+ **Rationale**: LlamaIndex provides `as_chat_engine()`, which wraps the index with conversation memory. This handles context naturally without a custom implementation.
+
+ ```python
+ chat_engine = index.as_chat_engine(
+     chat_mode="context",
+     llm=GoogleGenAI(model="gemini-1.5-flash"),
+ )
+ response = chat_engine.chat("Follow-up question here")
+ ```
+
+ **Note**: For the MVP, we'll use a simpler approach where the frontend passes the full message history and we construct context in the query. This avoids server-side session state.
+
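The stateless MVP approach described in the note, where each request carries the full history, can be sketched as a small prompt-assembly helper. This is illustrative only; the exact prompt format and helper name are assumptions, not part of the plan.

```python
def build_query_with_history(messages: list[dict]) -> str:
    """Fold prior turns into a single query string; the last message must be from the user."""
    assert messages and messages[-1]["role"] == "user"
    history = messages[:-1]
    if not history:
        # First turn: the question stands alone.
        return messages[-1]["content"]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return (
        "Previous conversation:\n"
        f"{transcript}\n\n"
        f"Current question: {messages[-1]['content']}"
    )
```

The resulting string would be passed to the query engine in place of the raw question, so "it"/"that" references resolve against the transcript.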
+ ## Agent Tools (Phase 2)
+
+ ### Decision: Use LlamaIndex FunctionTool with constrained paths
+
+ **Rationale**: LlamaIndex supports registering Python functions as tools for agentic use. We can constrain write operations to an `agent-notes/` subdirectory.
+
+ ```python
+ from llama_index.core.tools import FunctionTool
+
+ def create_note(title: str, content: str) -> str:
+     """Create a new note in the agent folder."""
+     safe_filename = slugify(title)
+     path = f"agent-notes/{safe_filename}.md"
+     vault_service.write_note(user_id, path, title=title, body=content)
+     return f"Created note: {path}"
+
+ create_note_tool = FunctionTool.from_defaults(fn=create_note)
+ ```
+
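The `slugify` call above is a helper the project would need to provide. A minimal sketch, assuming lowercase hyphen-separated slugs are acceptable (the exact slug rules are an assumption):

```python
import re


def slugify(title: str) -> str:
    """Reduce a note title to a filesystem-safe slug (lowercase, hyphen-separated)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"
```

Because the slug feeds directly into a vault path, keeping it to `[a-z0-9-]` also helps the `^agent-notes/.+\.md$` pattern in the API contract stay predictable.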
+ ## Error Handling
+
+ ### Decision: Graceful degradation with user-friendly messages
+
+ **Patterns**:
+ 1. API key missing → 503 "AI service not configured"
+ 2. API rate limit → 429 "Please wait and try again"
+ 3. Network error → 503 "AI service temporarily unavailable"
+ 4. Empty vault → 200 with message "No documents indexed"
+
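The failure patterns above map naturally onto the `ErrorResponse` schema in the API contract via a single lookup used by the route's exception handling. A sketch, where the condition names are assumptions for illustration:

```python
# Map internal failure conditions to (HTTP status, code, message) per the contract.
ERROR_RESPONSES = {
    "missing_api_key": (503, "SERVICE_UNAVAILABLE", "AI service not configured"),
    "rate_limited": (429, "RATE_LIMITED", "Please wait and try again"),
    "network_error": (503, "SERVICE_UNAVAILABLE", "AI service temporarily unavailable"),
}


def error_payload(condition: str) -> tuple[int, dict]:
    """Return (HTTP status, ErrorResponse body) for a known failure condition."""
    status, code, message = ERROR_RESPONSES[condition]
    return status, {"error": message, "code": code}
```

The empty-vault case is intentionally absent: per pattern 4 it is a normal 200 response, not an error.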
+ ## Performance Considerations
+
+ ### Index Persistence
+
+ - First indexing: ~1-5 seconds for small vaults (<100 notes)
+ - Subsequent loads: ~100 ms from persisted storage
+ - Query latency: ~1-3 seconds, depending on Gemini API response time
+
+ ### Recommendations
+
+ 1. Load the index on startup (not on first request)
+ 2. Use an environment variable to configure the persist directory
+ 3. For large vaults, consider lazy loading or background indexing (post-MVP)
+
+ ## Dependencies to Add
+
+ ```
+ # requirements.txt additions
+ llama-index
+ llama-index-llms-google-genai
+ llama-index-embeddings-google-genai
+ ```
+
+ **Note**: These packages have their own dependencies (e.g., `google-generativeai`). Tested compatible with Python 3.11+.
+
specs/004-gemini-vault-chat/spec.md ADDED
@@ -0,0 +1,122 @@
+ # Feature Specification: Gemini Vault Chat Agent
+
+ **Feature Branch**: `004-gemini-vault-chat`
+ **Created**: 2025-11-28
+ **Status**: Draft
+ **Input**: User description: "Add a Gemini-powered planning chat agent using LlamaIndex for RAG over the Markdown vault. Use Gemini as both LLM and embedding model. Include a new chat panel in the HF Space frontend that calls a RAG backend endpoint, displays assistant responses with linked sources, and optionally allows the agent to write notes."
+
+ ## User Scenarios & Testing *(mandatory)*
+
+ ### User Story 1 - Ask Questions About Vault Content (Priority: P1)
+
+ A user opens the Gemini Planning Agent panel and asks a question about content stored in their Markdown vault. The system searches the vault, retrieves relevant passages, and returns an AI-generated answer that synthesizes information from the relevant notes.
+
+ **Why this priority**: This is the core value proposition: enabling users to query their knowledge base conversationally and get AI-synthesized answers grounded in their own documents.
+
+ **Independent Test**: Can be fully tested by typing a question and verifying the response is relevant to vault content, with sources listed.
+
+ **Acceptance Scenarios**:
+
+ 1. **Given** a vault containing notes about project architecture, **When** user asks "How does authentication work?", **Then** the system returns an answer citing relevant notes with snippets
+ 2. **Given** a vault with multiple related notes, **When** user asks a question that spans multiple topics, **Then** the system synthesizes information from multiple sources and lists all referenced notes
+ 3. **Given** a vault with no relevant content, **When** user asks an unrelated question, **Then** the system responds that no relevant information was found in the vault
+
+ ---
+
+ ### User Story 2 - View Source Notes (Priority: P1)
+
+ After receiving an answer from the chat agent, the user can see which notes were used to generate the response. They can click on a source to view the note in the existing document viewer or see an inline snippet.
+
+ **Why this priority**: Source attribution is essential for trust and verification. Users need to know where information comes from and validate AI responses against original content.
+
+ **Independent Test**: Can be tested by receiving an answer and clicking on a listed source to verify it opens the correct note.
+
+ **Acceptance Scenarios**:
+
+ 1. **Given** an assistant response with sources, **When** user clicks a source link, **Then** the corresponding note opens in the document viewer
+ 2. **Given** an assistant response with sources, **When** user expands a source, **Then** they see a snippet of the relevant passage
+ 3. **Given** an assistant response, **When** sources are displayed, **Then** each source shows the note title and path
+
+ ---
+
+ ### User Story 3 - Multi-Turn Conversation (Priority: P2)
+
+ Users can have a multi-turn conversation with the agent, asking follow-up questions that build on previous context. The agent maintains conversation history for coherent responses.
+
+ **Why this priority**: Natural conversation flow improves user experience, but basic single-query functionality delivers core value first.
+
+ **Independent Test**: Can be tested by asking a question, then asking a follow-up that references "it" or "that", and verifying the agent understands the context.
+
+ **Acceptance Scenarios**:
+
+ 1. **Given** a previous question about "authentication", **When** user asks "How do I configure it?", **Then** the agent understands "it" refers to authentication
+ 2. **Given** an ongoing conversation, **When** user starts a new topic, **Then** the agent responds appropriately to the new context
+ 3. **Given** a conversation session, **When** user refreshes the page, **Then** conversation history is cleared (a new session starts)
+
+ ---
+
+ ### User Story 4 - Agent Creates Notes (Priority: P3)
+
+ Users can instruct the agent to create new notes based on the conversation. The agent writes notes to a dedicated folder in the vault and informs the user what was created.
+
+ **Why this priority**: Note creation adds significant value but requires more complex safety controls. Core reading/query functionality should be solid first.
63
+
64
+ **Independent Test**: Can be tested by asking the agent to "create a summary note about X" and verifying a new note appears in the designated folder.
65
+
66
+ **Acceptance Scenarios**:
67
+
68
+ 1. **Given** a conversation about a topic, **When** user asks "create a summary note", **Then** the agent creates a new Markdown note in the agent folder
69
+ 2. **Given** an agent-created note, **When** user views the response, **Then** a badge or link shows the created note path
70
+ 3. **Given** an existing note, **When** user asks the agent to append content, **Then** the agent updates the existing note appropriately
71
+
72
+ ---
73
+
74
+ ### Edge Cases
75
+
76
+ - What happens when the vault is empty or has no indexed content? β†’ System returns a friendly message indicating no documents are available
77
+ - How does the system handle very long user queries? β†’ Query is truncated to reasonable limits with user notification
78
+ - What happens if the AI service is unavailable? β†’ System shows an error message and suggests retrying
79
+ - How are malformed or non-Markdown files handled? β†’ Non-Markdown files are ignored during indexing
80
+ - What if the agent tries to write outside the designated folder? β†’ Write operations are constrained to the agent folder only
81
+
82
+ ## Requirements *(mandatory)*
83
+
84
+ ### Functional Requirements
85
+
86
+ - **FR-001**: System MUST provide a chat interface for users to ask natural language questions about vault content
87
+ - **FR-002**: System MUST search the vault and retrieve relevant passages to answer user queries
88
+ - **FR-003**: System MUST generate AI responses that synthesize information from retrieved content
89
+ - **FR-004**: System MUST display source notes for each response, including note title and path
90
+ - **FR-005**: System MUST allow users to navigate from a source reference to the full note
91
+ - **FR-006**: System MUST maintain conversation history within a session for multi-turn dialogue
92
+ - **FR-007**: System MUST build and persist a searchable index of vault content
93
+ - **FR-008**: System MUST load an existing index on startup if available
94
+ - **FR-009**: System MUST constrain agent write operations to a designated agent folder only
95
+ - **FR-010**: System MUST display a notification when the agent creates or updates a note
96
+ - **FR-011**: System MUST show an appropriate error message if the AI service is unavailable
97
+
98
+ ### Key Entities
99
+
100
+ - **Chat Message**: Represents a single message in the conversation (role: user or assistant, content, timestamp)
101
+ - **Chat Session**: A collection of messages in a single conversation context (started when user opens panel, cleared on page refresh)
102
+ - **Source Reference**: Metadata about a note used to generate a response (note title, path, relevant snippet)
103
+ - **Agent Note**: A Markdown note created by the agent, stored in the designated agent folder
104
+
105
+ ## Success Criteria *(mandatory)*
106
+
107
+ ### Measurable Outcomes
108
+
109
+ - **SC-001**: Users receive a relevant answer with sources within 5 seconds of submitting a query
110
+ - **SC-002**: 90% of responses include at least one source reference when relevant content exists
111
+ - **SC-003**: Users can navigate from a source reference to the full note in one click
112
+ - **SC-004**: Multi-turn conversations correctly reference previous context in 80% of follow-up questions
113
+ - **SC-005**: Agent-created notes appear in the designated folder and are visible in the vault viewer within 2 seconds
114
+ - **SC-006**: System gracefully handles AI service unavailability with a clear error message
115
+
116
+ ## Assumptions
117
+
118
+ - Users have a Markdown vault with content they want to query
119
+ - The existing document viewer from the Docs Widget can be reused for viewing source notes
120
+ - Index rebuilds are acceptable on service restarts for the initial release
121
+ - Session history is ephemeral and not persisted across page refreshes
122
+ - Agent write operations are limited to creating and appending to notes (no deletion)
specs/004-gemini-vault-chat/tasks.md ADDED
@@ -0,0 +1,228 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Tasks: Gemini Vault Chat Agent
2
+
3
+ **Input**: Design documents from `/specs/004-gemini-vault-chat/`
4
+ **Prerequisites**: plan.md βœ…, spec.md βœ…, research.md βœ…, data-model.md βœ…, contracts/ βœ…
5
+
6
+ **Tests**: Unit tests for RAG service included per Constitution (Test-Backed Development).
7
+
8
+ **Organization**: Tasks grouped by user story for independent implementation and testing.
9
+
10
+ ## Format: `[ID] [P?] [Story] Description`
11
+
12
+ - **[P]**: Can run in parallel (different files, no dependencies)
13
+ - **[Story]**: Which user story this task belongs to (US1, US2, US3, US4)
14
+ - Include exact file paths in descriptions
15
+
16
+ ## Path Conventions
17
+
18
+ - **Backend**: `backend/src/`, `backend/tests/`
19
+ - **Frontend**: `frontend/src/`
20
+ - **Data**: `data/llamaindex/`
21
+
22
+ ---
23
+
24
+ ## Phase 1: Setup (Shared Infrastructure)
25
+
26
+ **Purpose**: Add dependencies and create type definitions
27
+
28
+ - [ ] T001 Add LlamaIndex dependencies to `backend/requirements.txt`: llama-index, llama-index-llms-google-genai, llama-index-embeddings-google-genai
29
+ - [ ] T002 [P] Create TypeScript types in `frontend/src/types/rag.ts`: ChatMessage, SourceReference, NoteWritten, ChatRequest, ChatResponse
30
+ - [ ] T003 [P] Add GOOGLE_API_KEY and LLAMAINDEX_PERSIST_DIR to environment configuration in `backend/src/services/config.py`
31
+
32
+ ---
33
+
34
+ ## Phase 2: Foundational (Blocking Prerequisites)
35
+
36
+ **Purpose**: Core backend infrastructure for RAG that all user stories depend on
37
+
38
+ **⚠️ CRITICAL**: No user story work can begin until this phase is complete
39
+
40
+ - [ ] T004 Create Pydantic models in `backend/src/models/rag.py`: ChatMessage, SourceReference, NoteWritten, ChatRequest, ChatResponse, StatusResponse, ErrorResponse
41
+ - [ ] T005 Create RAG index service skeleton in `backend/src/services/rag_index.py` with `get_or_build_index()` singleton pattern
42
+ - [ ] T006 Implement index persistence: load from `data/llamaindex/` if exists, otherwise build and persist in `backend/src/services/rag_index.py`
43
+ - [ ] T007 Create `backend/tests/unit/test_rag_service.py` with test stubs for index loading, query execution, and error handling
44
+ - [ ] T008 Register RAG routes in `backend/src/api/main.py` (import and include rag router)
45
+
46
+ **Checkpoint**: Foundation ready - RAG service can load/build index on startup
47
+
48
+ ---
49
+
50
+ ## Phase 3: User Story 1 & 2 - Ask Questions + View Sources (Priority: P1) 🎯 MVP
51
+
52
+ **Goal**: Users can ask questions and receive AI-synthesized answers with source attribution
53
+
54
+ **Independent Test**: Type a question in the chat panel, verify response includes answer text and clickable source references
55
+
56
+ ### Backend Implementation (US1+US2)
57
+
58
+ - [ ] T009 [US1] Implement `rag_chat()` function in `backend/src/services/rag_index.py` that queries index and returns answer with sources
59
+ - [ ] T010 [US1] Extract source metadata from LlamaIndex response nodes (path, title, snippet, score) in `backend/src/services/rag_index.py`
60
+ - [ ] T011 [US1] Create POST `/api/rag/chat` endpoint in `backend/src/api/routes/rag.py` wrapping `rag_chat()`
61
+ - [ ] T012 [P] [US1] Create GET `/api/rag/status` endpoint in `backend/src/api/routes/rag.py` returning index status
62
+ - [ ] T013 [US1] Implement unit tests for `rag_chat()` in `backend/tests/unit/test_rag_service.py`: happy path, no results, error handling
63
+
64
+ ### Frontend Implementation (US1+US2)
65
+
66
+ - [ ] T014 [P] [US2] Create RAG API client in `frontend/src/services/rag.ts` with `sendMessage()` and `getStatus()` functions
67
+ - [ ] T015 [P] [US2] Create ChatMessage component in `frontend/src/components/ChatMessage.tsx` rendering user/assistant messages
68
+ - [ ] T016 [P] [US2] Create SourceList component in `frontend/src/components/SourceList.tsx` with collapsible source references
69
+ - [ ] T017 [US1] Create ChatPanel component in `frontend/src/components/ChatPanel.tsx` with message list and composer textarea
70
+ - [ ] T018 [US1] Integrate ChatPanel into MainApp layout in `frontend/src/pages/MainApp.tsx` as new panel/tab
71
+ - [ ] T019 [US2] Wire SourceList click handler to open note in document viewer via existing navigation
72
+
73
+ **Checkpoint**: User can ask a question, see AI answer with sources, and click source to view note
74
+
75
+ ---
76
+
77
+ ## Phase 4: User Story 3 - Multi-Turn Conversation (Priority: P2)
78
+
79
+ **Goal**: Users can have context-aware follow-up conversations
80
+
81
+ **Independent Test**: Ask "What is authentication?", then ask "How do I configure it?" - verify agent understands "it" refers to authentication
82
+
83
+ ### Implementation (US3)
84
+
85
+ - [ ] T020 [US3] Add message history state management in `frontend/src/components/ChatPanel.tsx` using React useState
86
+ - [ ] T021 [US3] Pass full message history array to `POST /api/rag/chat` in `frontend/src/services/rag.ts`
87
+ - [ ] T022 [US3] Update `rag_chat()` in `backend/src/services/rag_index.py` to construct context from message history
88
+ - [ ] T023 [US3] Add conversation reset button in `frontend/src/components/ChatPanel.tsx` to clear history
89
+ - [ ] T024 [US3] Add unit test for multi-turn context handling in `backend/tests/unit/test_rag_service.py`
90
+
91
+ **Checkpoint**: Multi-turn conversation maintains context; page refresh clears history
92
+
93
+ ---
94
+
95
+ ## Phase 5: User Story 4 - Agent Creates Notes (Priority: P3, Optional)
96
+
97
+ **Goal**: Agent can create/append notes in a designated folder
98
+
99
+ **Independent Test**: Ask "create a summary note about authentication" - verify note appears in `agent-notes/` folder
100
+
101
+ ### Implementation (US4)
102
+
103
+ - [ ] T025 [US4] Create `create_note()` helper in `backend/src/services/rag_index.py` constrained to `agent-notes/` folder
104
+ - [ ] T026 [US4] Create `append_to_note()` helper in `backend/src/services/rag_index.py` for updating existing notes
105
+ - [ ] T027 [US4] Register helpers as LlamaIndex FunctionTools in `backend/src/services/rag_index.py`
106
+ - [ ] T028 [US4] Update `rag_chat()` to use agent mode with tools when write intent detected
107
+ - [ ] T029 [US4] Add `notes_written` to ChatResponse and include in API response from `backend/src/api/routes/rag.py`
108
+ - [ ] T030 [P] [US4] Add NoteWritten badge component in `frontend/src/components/ChatMessage.tsx` showing created note path
109
+ - [ ] T031 [US4] Wire badge click to navigate to created note in vault viewer
110
+ - [ ] T032 [US4] Add unit tests for constrained write operations in `backend/tests/unit/test_rag_service.py`
111
+
112
+ **Checkpoint**: Agent can create notes; writes constrained to `agent-notes/` folder
113
+
114
+ ---
115
+
116
+ ## Phase 6: Polish & Cross-Cutting Concerns
117
+
118
+ **Purpose**: Error handling, edge cases, and validation
119
+
120
+ - [ ] T033 [P] Implement error handling for missing GOOGLE_API_KEY with 503 response in `backend/src/services/rag_index.py`
121
+ - [ ] T034 [P] Implement error handling for API rate limits with 429 response in `backend/src/api/routes/rag.py`
122
+ - [ ] T035 [P] Add loading state and error display in `frontend/src/components/ChatPanel.tsx`
123
+ - [ ] T036 [P] Add empty vault message when no documents indexed in `backend/src/services/rag_index.py`
124
+ - [ ] T037 Run quickstart.md validation: verify all setup steps work
125
+ - [ ] T038 Manual E2E test: full user journey through all implemented stories
126
+
127
+ ---
128
+
129
+ ## Dependencies & Execution Order
130
+
131
+ ### Phase Dependencies
132
+
133
+ - **Setup (Phase 1)**: No dependencies - can start immediately
134
+ - **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
135
+ - **User Stories (Phase 3-5)**: All depend on Foundational phase completion
136
+ - US1+US2 (Phase 3) must complete before US3 (Phase 4)
137
+ - US3 can complete before US4 (Phase 5 is optional)
138
+ - **Polish (Phase 6)**: Can run after Phase 3 minimum
139
+
140
+ ### User Story Dependencies
141
+
142
+ - **User Story 1+2 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
143
+ - **User Story 3 (P2)**: Depends on US1+US2 for chat panel and history structure
144
+ - **User Story 4 (P3)**: Depends on US1+US2 for basic chat flow; optional feature
145
+
146
+ ### Within Each Phase
147
+
148
+ - Backend models before services
149
+ - Services before routes
150
+ - Backend before frontend integration
151
+ - Core implementation before error handling
152
+
153
+ ### Parallel Opportunities
154
+
155
+ **Phase 1:**
156
+ - T002 (TS types) and T003 (config) can run in parallel
157
+
158
+ **Phase 2:**
159
+ - T004 (models) must complete before T005-T008
160
+
161
+ **Phase 3:**
162
+ - T014, T015, T016 (frontend components) can run in parallel
163
+ - T012 (/status endpoint) can run in parallel with other backend work
164
+
165
+ **Phase 4:**
166
+ - T020-T024 are sequential (frontend then backend integration)
167
+
168
+ **Phase 5:**
169
+ - T025, T026 (helpers) sequential
170
+ - T030 (badge) can run parallel with backend once T029 complete
171
+
172
+ **Phase 6:**
173
+ - T033, T034, T035, T036 all parallel (different files)
174
+
175
+ ---
176
+
177
+ ## Parallel Example: Phase 3 Frontend
178
+
179
+ ```bash
180
+ # Launch all independent frontend components together:
181
+ Task: "Create RAG API client in frontend/src/services/rag.ts"
182
+ Task: "Create ChatMessage component in frontend/src/components/ChatMessage.tsx"
183
+ Task: "Create SourceList component in frontend/src/components/SourceList.tsx"
184
+ ```
185
+
186
+ ---
187
+
188
+ ## Implementation Strategy
189
+
190
+ ### MVP First (Phase 1-3 Only)
191
+
192
+ 1. Complete Phase 1: Setup (T001-T003)
193
+ 2. Complete Phase 2: Foundational (T004-T008)
194
+ 3. Complete Phase 3: User Story 1+2 (T009-T019)
195
+ 4. **STOP and VALIDATE**: Test RAG query and source display independently
196
+ 5. Deploy/demo if ready - this is the MVP!
197
+
198
+ ### Incremental Delivery
199
+
200
+ 1. Complete Setup + Foundational β†’ Foundation ready
201
+ 2. Add US1+US2 β†’ Test independently β†’ **Deploy/Demo (MVP!)**
202
+ 3. Add US3 β†’ Test multi-turn β†’ Deploy/Demo
203
+ 4. Add US4 (optional) β†’ Test note creation β†’ Deploy/Demo
204
+ 5. Each story adds value without breaking previous stories
205
+
206
+ ### Estimated Effort
207
+
208
+ | Phase | Tasks | Estimated Hours |
209
+ |-------|-------|-----------------|
210
+ | Setup | 3 | 0.5 |
211
+ | Foundational | 5 | 2 |
212
+ | US1+US2 (MVP) | 11 | 4 |
213
+ | US3 | 5 | 2 |
214
+ | US4 (optional) | 8 | 3 |
215
+ | Polish | 6 | 1.5 |
216
+ | **Total** | **38** | **13** |
217
+
218
+ ---
219
+
220
+ ## Notes
221
+
222
+ - [P] tasks = different files, no dependencies
223
+ - [Story] label maps task to specific user story for traceability
224
+ - US1 and US2 combined in Phase 3 since they're tightly coupled (source display is part of query response)
225
+ - US4 is optional per spec - can skip if time-constrained
226
+ - Constitution requires pytest tests for backend features
227
+ - Frontend testing is manual verification per Constitution
228
+