## Gemini LlamaIndex Vault Chat Agent Spec

Feature spec: Gemini + LlamaIndex Vault Chat Agent (HF Space).

### Objective

Add a second planning chat interface to the Hugging Face Space. Use LlamaIndex for retrieval-augmented generation (RAG) over the same Markdown vault used by Document-MCP. Use Gemini (via LlamaIndex) as both the LLM and the embedding model. Optionally allow the agent to write new notes into the vault via constrained tools.

### Non-goals

- Do not change the existing MCP server or ChatGPT App widget behavior.
- Do not introduce a new external database; rely on LlamaIndex storage or simple filesystem persistence for the hackathon.

### High-level architecture

- Vault: directory of Markdown notes already used by Document-MCP.
- Indexer: Python module using LlamaIndex to scan the vault, split notes into chunks, and build a `VectorStoreIndex` backed by a simple vector store.
- Chat backend: FastAPI endpoints that load the index, run RAG queries with Gemini, and return answers plus source notes.
- HF Space frontend: a new chat panel that calls the backend, shows the assistant response, and lists linked sources (note titles and paths).

### Backend details

- Dependencies: `llama-index` (core), `llama-index-llms-google-genai`, `llama-index-embeddings-google-genai`.
- Env vars: `GOOGLE_API_KEY`, `VAULT_DIR`, `LLAMAINDEX_PERSIST_DIR`.
- On startup: if a persisted index exists, load it; otherwise, scan `VAULT_DIR` for Markdown files, build a new index, and persist it under `LLAMAINDEX_PERSIST_DIR`.
- Provide a helper `get_or_build_index` that returns a singleton `VectorStoreIndex` (see the first sketch at the end of this spec).
- Implement a function `rag_chat(messages)` that:
  - takes a simple chat-history array,
  - uses `index.as_query_engine` with Gemini as the LLM,
  - runs a query on the latest user message, and
  - returns a dict with fields `answer` (string), `sources` (list of title, path, snippet), and `notes_written` (empty list for now).
- Expose `POST /api/rag/chat` in FastAPI that wraps `rag_chat` (see the endpoint sketch at the end of this spec).

### Frontend details

- Add a new panel or tab labeled "Gemini Planning Agent".
- Layout: the left side may keep the existing docs UI; the right side is a chat view.
- Chat view: a list of messages and a composer textarea with a Send button.
- On send: push the user message into local history, POST to `/api/rag/chat`, then append the assistant answer and its sources when the response arrives.
- Under each assistant message, show a collapsible Sources section; clicking a source should either open the note in the existing viewer or show the snippet inline.

### Index refresh strategy

- On every backend startup, attempt to load an existing index; rebuild it if it is missing or invalid.
- At hackathon scale, it is acceptable for index updates to require a restart or redeploy.

### Phase 2 (optional write tools)

- Implement safe note-writing helpers (`create_note`, `append_to_note`, `tag_note`) that operate only in a dedicated agent folder inside the vault (see the tool sketch at the end of this spec).
- Register these as tools for a LlamaIndex-based agent using Gemini as the reasoning model.
- Extend `/api/rag/chat` so that responses can include `notes_written` metadata when the agent creates or updates notes.
- In the UI, show a small badge when a new note is created, with a link into the vault viewer.

### Implementation order

1. Wire dependencies and environment variables.
2. Implement `get_or_build_index` and verify indexing works.
3. Implement `rag_chat` and the `/api/rag/chat` endpoint.
4. Build the frontend chat UI and hook it up to the endpoint.
5. If time allows, add Phase 2 tools and surface created notes in the UI.
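### Implementation sketches

A minimal sketch of `get_or_build_index` from the backend details, assuming llama-index 0.10+ with the two google-genai integration packages installed. The module name `index_store.py`, the specific Gemini model names, and the `lru_cache` singleton pattern are illustrative assumptions, not requirements of this spec.

```python
# index_store.py -- load-or-build helper for the vault index (sketch).
import os
from functools import lru_cache

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.embeddings.google_genai import GoogleGenAIEmbedding
from llama_index.llms.google_genai import GoogleGenAI

VAULT_DIR = os.environ["VAULT_DIR"]
PERSIST_DIR = os.environ["LLAMAINDEX_PERSIST_DIR"]

# Gemini for both generation and embeddings, per the spec.
# Model names are assumptions; both classes read GOOGLE_API_KEY from the env.
Settings.llm = GoogleGenAI(model="gemini-2.0-flash")
Settings.embed_model = GoogleGenAIEmbedding(model_name="text-embedding-004")


@lru_cache(maxsize=1)
def get_or_build_index() -> VectorStoreIndex:
    """Return a singleton VectorStoreIndex, loading a persisted copy if present."""
    if os.path.isdir(PERSIST_DIR):
        try:
            storage = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
            return load_index_from_storage(storage)
        except Exception:
            pass  # persisted index missing or invalid: fall through and rebuild

    # Scan the vault for Markdown notes, build, and persist.
    docs = SimpleDirectoryReader(
        input_dir=VAULT_DIR, required_exts=[".md"], recursive=True
    ).load_data()
    index = VectorStoreIndex.from_documents(docs)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    return index
```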
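A sketch of `rag_chat` and the `POST /api/rag/chat` wrapper. The response shape (`answer`, `sources`, `notes_written`) follows the spec; the request model, the `similarity_top_k` value, and the snippet length are assumptions.

```python
# rag_api.py -- rag_chat and the FastAPI endpoint (sketch).
from pathlib import Path

from fastapi import FastAPI
from pydantic import BaseModel

from index_store import get_or_build_index  # helper sketched above (assumed module)

app = FastAPI()


class ChatRequest(BaseModel):
    messages: list[dict]  # e.g. [{"role": "user", "content": "..."}]


def rag_chat(messages: list[dict]) -> dict:
    """Answer the latest user message with RAG over the vault index."""
    latest = next(
        (m["content"] for m in reversed(messages) if m.get("role") == "user"), ""
    )
    query_engine = get_or_build_index().as_query_engine(similarity_top_k=4)
    result = query_engine.query(latest)

    # Map retrieved nodes to (title, path, snippet) source entries.
    sources = []
    for node_with_score in result.source_nodes:
        path = node_with_score.node.metadata.get("file_path", "")
        sources.append(
            {
                "title": Path(path).stem if path else "untitled",
                "path": path,
                "snippet": node_with_score.node.get_content()[:300],
            }
        )
    return {"answer": str(result), "sources": sources, "notes_written": []}


@app.post("/api/rag/chat")
def rag_chat_endpoint(req: ChatRequest) -> dict:
    return rag_chat(req.messages)
```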
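A sketch of the Phase 2 write tools, constrained to a dedicated folder inside the vault. The `agent-notes` folder name and the tag format are assumptions; the safety check resolves every target path and refuses anything that escapes the agent folder.

```python
# agent_tools.py -- constrained note-writing tools for Phase 2 (sketch).
import os
from pathlib import Path

from llama_index.core.tools import FunctionTool

VAULT_DIR = Path(os.environ["VAULT_DIR"]).resolve()
AGENT_DIR = VAULT_DIR / "agent-notes"  # the only folder the agent may write to


def _safe_path(name: str) -> Path:
    """Resolve a note name inside AGENT_DIR, rejecting path escapes."""
    path = (AGENT_DIR / f"{name}.md").resolve()
    if not path.is_relative_to(AGENT_DIR):
        raise ValueError(f"refusing to write outside the agent folder: {name}")
    return path


def create_note(name: str, content: str) -> str:
    """Create a new Markdown note in the agent folder; fail if it exists."""
    AGENT_DIR.mkdir(parents=True, exist_ok=True)
    path = _safe_path(name)
    if path.exists():
        raise ValueError(f"note already exists: {name}")
    path.write_text(content, encoding="utf-8")
    return str(path)


def append_to_note(name: str, content: str) -> str:
    """Append text to an existing note in the agent folder."""
    path = _safe_path(name)
    with path.open("a", encoding="utf-8") as f:
        f.write("\n" + content)
    return str(path)


def tag_note(name: str, tag: str) -> str:
    """Append a #tag line to an existing note in the agent folder."""
    return append_to_note(name, f"#{tag}")


TOOLS = [
    FunctionTool.from_defaults(fn=create_note),
    FunctionTool.from_defaults(fn=append_to_note),
    FunctionTool.from_defaults(fn=tag_note),
]
```

These tools can then be handed to a LlamaIndex agent with Gemini as the reasoning model, for example via `ReActAgent.from_tools(TOOLS, llm=Settings.llm)` in classic LlamaIndex releases or the agent-workflow classes in newer ones; the exact agent API depends on the installed version, so treat the wiring as an assumption to verify.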