https://www.youtube.com/watch?v=p0FERNkpyHE

This video dives deep into building advanced Retrieval-Augmented Generation (RAG) systems for AI agents, specifically focusing on combining Agentic RAG with Knowledge Graphs. The speaker aims to find the most effective way for AI agents to search and utilize custom knowledge bases. Here’s a detailed breakdown of the video’s content:

**1. The Problem & The Solution (0:00)**

The speaker has been exploring various RAG strategies to find the “best way possible” to give his AI agents the ability to search through his custom knowledge. He identifies Agentic RAG and Knowledge Graphs as the two most powerful strategies, and highlights that they can be easily combined to create extremely powerful knowledge retrieval systems.

**2. Live Demo Introduction (0:30)**

The video kicks off with a demonstration of the system in a Command Line Interface (CLI). The Agentic RAG agent has access to both a vector database (PostgreSQL with pgvector) and a knowledge graph (Neo4j with Graphiti) through agent tools. This allows the agent to intelligently pick and choose how it explores the knowledge base. The speaker also mentions using Claude Code to help build this system.

**3. Deep Dive into the Knowledge Sources (1:30)**

  • Vector Database (PostgreSQL via Neon): The speaker shows his PostgreSQL database, hosted on Neon, containing chunks of markdown documents, each with a corresponding embedding vector. For the demo, he has a single document titled “Big Tech AI Initiatives 2024,” which is chunked and embedded.
  • Knowledge Graph (Neo4j via Graphiti): He then showcases the same information represented in Neo4j as a knowledge graph. This relational representation highlights how different entities (like companies) are connected (e.g., Amazon relates to Anthropic because Amazon invested in Anthropic; OpenAI relates to Azure because Azure exclusively hosts OpenAI’s models). This relational structure is key for advanced querying.
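To make the relational structure concrete, here is a toy, library-free sketch of what a knowledge graph captures: facts as edges between entities. The entity names come from the demo document; the real system stores and traverses these edges in Neo4j via Graphiti, so treat this only as an illustration of the idea.

```python
# Each fact is a (subject, relation, object) triple, mirroring the edges
# shown in the Neo4j browser during the demo.
triples = [
    ("Amazon", "INVESTED_IN", "Anthropic"),
    ("Microsoft", "INVESTED_IN", "OpenAI"),
    ("Azure", "HOSTS", "OpenAI"),
]

def relations_between(a, b):
    """Return every edge connecting entities a and b, in either direction."""
    return [(s, r, o) for s, r, o in triples if {s, o} == {a, b}]

print(relations_between("Amazon", "Anthropic"))
# → [('Amazon', 'INVESTED_IN', 'Anthropic')]
```

This kind of hop-by-hop lookup is what a plain vector search cannot do: similarity over chunks finds related text, while the graph answers "how exactly are these two entities connected?"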

**4. Agentic RAG in Action: CLI Demo with Tool Usage Visibility (3:15)**

The speaker demonstrates the agent’s intelligent tool selection:

  • Semantic Query (Vector Search): When asked, “What are the AI initiatives for Google?”, the agent correctly uses vector_search to retrieve relevant document chunks.
  • Relational Query (Graph Search): When asked, “How are OpenAI and Microsoft related?”, the agent utilizes graph_search to identify the partnership details and Azure’s role.
  • Hybrid Query (Both Search Types): For a more complex query like, “What are the initiatives for Microsoft? How does that relate to Anthropic? Use both search types”, the agent leverages both vector_search and graph_search to provide a comprehensive answer, showcasing its ability to reason and combine information from different knowledge representations.
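The behavior above can be sketched with a crude heuristic router. In the real system the LLM itself reasons about which tool to call; this stand-in (with placeholder tool bodies, not the actual vector_search/graph_search implementations) only illustrates the routing idea.

```python
def vector_search(query):   # placeholder for a pgvector similarity search
    return f"chunks similar to: {query}"

def graph_search(query):    # placeholder for a Graphiti/Neo4j lookup
    return f"graph facts for: {query}"

def route(query):
    """Toy keyword heuristic; the actual agent decides via LLM reasoning."""
    q = query.lower()
    relational = any(w in q for w in ("related", "relationship", "connect"))
    tools = ["graph_search"] if relational else ["vector_search"]
    if "both" in q:
        tools = ["vector_search", "graph_search"]
    return tools

print(route("How are OpenAI and Microsoft related?"))  # ['graph_search']
```

The payoff of Agentic RAG is precisely that this decision is made dynamically per query rather than hard-coded as above.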

**5. Understanding Agentic RAG vs. Vanilla RAG (6:30)**

The video uses diagrams from a Weaviate blog post to explain the fundamental difference:

  • Vanilla RAG (Inflexible): Documents are chunked, embedded into a vector database, and then a query retrieves the most similar chunks as direct context for the Large Language Model (LLM). The LLM is forced to use only this retrieved context. This approach is rigid and doesn’t allow for dynamic search strategies.
  • Agentic RAG (Flexible): A “Retrieval Agent” acts as an intermediary. It receives the user query, then intelligently decides which tools (e.g., different vector search engines, knowledge graph search, web search, calculators) to use to gather relevant information. This information is then fed to the LLM, allowing for more nuanced and accurate responses. The agent’s reasoning ability is the core enhancement.
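The rigidity of vanilla RAG is easy to see in code. Here is a minimal, dependency-free sketch of the vanilla pipeline (the two-dimensional "embeddings" are fabricated for illustration; real embeddings come from an embedding model):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy corpus: (chunk text, embedding) pairs.
chunks = [
    ("Google's AI initiatives", [0.9, 0.1]),
    ("Microsoft's Azure partnership", [0.1, 0.9]),
]

def retrieve(query_vec, k=1):
    """Vanilla RAG: always return the top-k most similar chunks as context,
    with no option to consult another source or reformulate the search."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(retrieve([1.0, 0.0]))  # → ["Google's AI initiatives"]
```

Whatever the top-k similarity search returns is all the LLM ever sees; the retrieval agent in the next diagram replaces this single fixed step with a tool-using decision loop.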

**6. Getting the Agentic RAG System Set Up (10:37)**

The speaker guides viewers through setting up their own instance of the project.

  • Prerequisites: Python 3.11+, PostgreSQL (Neon recommended), Neo4j (local-ai-packaged or desktop recommended), and an LLM Provider API key (OpenAI, Ollama, Gemini, etc.).
  • Installation: Create a virtual environment, install dependencies from requirements.txt.
  • PostgreSQL Setup: Execute schema.sql to create necessary tables, indexes, and functions. The speaker demonstrates this in the Neon Console.
  • Neo4j Setup: Instructions are provided for two common setup methods.
  • Environment Variables (.env): Configure database connection strings, Neo4j credentials, LLM provider choices (allowing for flexible switching between APIs like OpenAI, Ollama, OpenRouter, Gemini), and embedding models.
  • Document Ingestion: Markdown documents are placed in the documents/ folder. Running python -m ingestion.ingest --clean processes these documents:
    • Parses and semantically chunks the documents.
    • Generates embeddings for vector search.
    • Extracts entities and relationships for the knowledge graph (a computationally expensive step that uses LLM calls).
    • Stores everything in PostgreSQL (Neon) and Neo4j.
  • Running the API Server: python -m agent.api starts the FastAPI server, exposing the agent’s API endpoints.
  • Using the CLI: python cli.py launches the interactive command-line interface to chat with the agent and observe its tool usage.
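The first step of the ingestion pipeline, semantic chunking, can be approximated with a naive heading-based splitter. This is only a stand-in: the real ingestion code is more sophisticated, and the sample document below is invented to mimic the demo's "Big Tech AI Initiatives 2024" file.

```python
import re

def chunk_markdown(text):
    """Naive chunker: split a markdown document at headings so each
    chunk covers one topic (a rough proxy for semantic chunking)."""
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p.strip() for p in parts if p.strip()]

doc = """# Big Tech AI Initiatives 2024

## Google
Gemini models and TPU investment.

## Microsoft
Partnership with OpenAI via Azure.
"""

for chunk in chunk_markdown(doc):
    print(chunk.splitlines()[0])  # prints each chunk's heading
```

Each resulting chunk would then be embedded and stored in PostgreSQL, while a separate LLM-driven pass extracts entities and relationships for Neo4j.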

**7. AI-Assisted Coding with Claude Code (28:10)**

The speaker shares how Claude Code was instrumental in building this complex system:

  • Beyond Vibe Coding: He emphasizes the importance of structured development over spontaneous coding.
  • MCP Servers: He uses two MCP (Model Context Protocol) servers to manage the AI-assisted development:
    • crawl4ai-rag-mcp-server: Handles RAG operations, external documentation crawling, and specific Pydantic AI features.
    • mcp-server-neon: Manages Neon database operations (creating projects, running SQL, managing tables).
  • Planning Mode (PLANNING.md, TASK.md): Claude Code’s “plan mode” (activated by Shift+Tab twice) forces the AI to create a comprehensive plan and task list before writing code. This ensures a structured approach.
  • Automated Development Process: With these files as guidance, Claude Code can:
    • Create Neon database projects.
    • Execute SQL schemas and data operations.
    • Manage database tables and validate schema.
    • Fetch documentation (e.g., Pydantic AI).
    • Generate large chunks of application code.
    • Implement testing and error handling.
    • Crucially, it can run autonomously for an extended period (e.g., 35 minutes) to build the entire application, handling database interactions, API calls, and code generation, and requiring user approval only for key actions.
  • Power of Agentic AI Development: This approach frees developers from tedious, low-level tasks, allowing them to focus on high-level design and validation. The AI effectively becomes a highly capable, autonomous developer.

The video concludes by reiterating the power of combining Agentic RAG and Knowledge Graphs, and highlights the potential of AI coding assistants like Claude Code to revolutionize software development.