Source video: https://www.youtube.com/watch?v=QxBJ9ORecMY
This summary of the video transcript covers the architecture and functionality of the Hybrid Agentic File Search system.
Agentic File Search vs. Traditional RAG: The Hybrid Approach
This project addresses the limitations of Traditional RAG (loss of context) and Pure Agentic Search (slow speed) by combining them into a Dual-Path Search Pipeline. Semantic and metadata search narrow the set of documents before an agent performs its deep dive, so the system aims for both speed and accuracy.
⚠️ The Problem
- Traditional RAG: Fast, but chunks lose context. Splitting documents destroys relationships between sections, and cross-references are invisible.
- Pure Agentic Search: Highly accurate and capable of “backtracking” (following cross-references like a human), but very slow because it must scan every document in a folder to judge relevance.
🛠 The Solution: Indexed Retrieval Pipeline
To fix the speed issue, the system introduces an offline indexing layer to reduce the search space for the agent.
1. Offline Indexing (Data Ingestion)
Instead of reading raw files at query time, the system pre-processes data:
- Parsing: Uses Docling to convert all documents (PDF, DOCX, etc.) into Markdown.
- Smart Chunking: Splits documents into chunks and computes embeddings using Gemini.
- Metadata Extraction: Uses LangExtract (Gemini-powered) to extract structured metadata (e.g., invoice amounts, dates, organization names) automatically or via a custom schema.
- Storage: All data (Documents, Chunks, Embeddings, Metadata) is stored in DuckDB.
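The indexing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `toy_embed` is a hypothetical stand-in for a Gemini embedding call, `chunk_markdown` is a simple paragraph packer (the video does not specify how "smart chunking" works), the table layout is an assumption, and Python's built-in `sqlite3` is swapped in for DuckDB so the sketch runs without extra dependencies. Parsing with Docling and metadata extraction with LangExtract are assumed to have already produced the Markdown text and the metadata dictionaries.

```python
import hashlib
import json
import math
import sqlite3

EMBED_DIM = 16  # toy dimensionality; a real embedding model returns far more


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: normalized hashed bag of words."""
    vec = [0.0] * EMBED_DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % EMBED_DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk_markdown(markdown: str, max_chars: int = 200) -> list[str]:
    """Pack paragraphs into chunks of roughly max_chars (a long paragraph stays whole)."""
    chunks, current = [], ""
    for para in (p.strip() for p in markdown.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_index(con: sqlite3.Connection, docs: dict[str, str],
                metadata: dict[str, dict]) -> None:
    """Store documents, chunks with embeddings, and metadata (assumed schema)."""
    con.executescript("""
        CREATE TABLE documents (doc_id TEXT PRIMARY KEY, markdown TEXT);
        CREATE TABLE chunks (doc_id TEXT, position INTEGER, text TEXT, embedding TEXT);
        CREATE TABLE metadata (doc_id TEXT, key TEXT, value TEXT);
    """)
    for doc_id, markdown in docs.items():
        con.execute("INSERT INTO documents VALUES (?, ?)", (doc_id, markdown))
        for position, text in enumerate(chunk_markdown(markdown)):
            con.execute(
                "INSERT INTO chunks VALUES (?, ?, ?, ?)",
                (doc_id, position, text, json.dumps(toy_embed(text))),
            )
        for key, value in metadata.get(doc_id, {}).items():
            con.execute("INSERT INTO metadata VALUES (?, ?, ?)", (doc_id, key, str(value)))
    con.commit()


# Toy usage: one already-parsed document plus its extracted metadata.
con = sqlite3.connect(":memory:")
build_index(
    con,
    {"invoice_001.md": "# Invoice\n\nBilled to Acme Corp.\n\nTotal due: 1200 USD."},
    {"invoice_001.md": {"organization": "Acme Corp", "amount": 1200}},
)
```

Keeping documents, chunks, and metadata in separate tables keyed by a document ID is what later lets a query rank whole documents instead of isolated chunks.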
2. Query Time Workflow
When a user asks a question:
- Filtering: The query runs through Semantic Search (embeddings) and Metadata Filtering (SQL-like).
- Ranking: Documents (not just chunks) are ranked based on relevance.
- Agent Handoff: Only the relevant candidate documents are passed to the Agentic File Search.
- Agent Execution: The agent performs its standard loop (read, reason, cross-reference) on this smaller subset of files.
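The filtering and ranking steps of this workflow might look like the sketch below. It is an illustration under stated assumptions: the `Chunk` type, the exact-match metadata filter, and scoring each document by its best-matching chunk are all hypothetical choices, since the video only says that documents rather than chunks are ranked. The returned list is what would be handed to the agent.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(query_vec: list[float], chunks: list[Chunk],
                   doc_metadata: dict[str, dict], filters: dict[str, str],
                   top_k: int = 3) -> list[str]:
    """Return up to top_k document IDs to pass on to the agent."""
    # Metadata filtering: keep documents whose metadata matches every filter.
    allowed = {
        doc_id for doc_id, meta in doc_metadata.items()
        if all(meta.get(key) == value for key, value in filters.items())
    }
    # Semantic search: score chunks, then lift to document level by
    # keeping each document's best chunk score (an assumed aggregation).
    doc_scores: dict[str, float] = {}
    for chunk in chunks:
        if chunk.doc_id not in allowed:
            continue
        score = cosine(query_vec, chunk.embedding)
        doc_scores[chunk.doc_id] = max(score, doc_scores.get(chunk.doc_id, float("-inf")))
    ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)
    return ranked[:top_k]


# Toy usage with 2-D vectors in place of real embeddings.
chunks = [
    Chunk("a.md", [1.0, 0.0]),
    Chunk("a.md", [0.0, 1.0]),
    Chunk("b.md", [0.6, 0.8]),
    Chunk("c.md", [1.0, 0.0]),
]
doc_metadata = {
    "a.md": {"organization": "Acme Corp"},
    "b.md": {"organization": "Acme Corp"},
    "c.md": {"organization": "Other Org"},
}
candidates = rank_documents([1.0, 0.0], chunks, doc_metadata, {"organization": "Acme Corp"})
print(candidates)  # ['a.md', 'b.md']
```

Here `c.md` matches the query vector perfectly but is dropped by the metadata filter, which shows how the two paths narrow the search space together.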
⚙️ Operating Modes
The architecture is flexible and supports four distinct modes:
- Pure Agentic (Original): No indexing. The agent scans the entire folder. (High accuracy, low speed.)
- Semantic + Agentic: Uses vector embeddings to pre-filter documents before the agent takes over.
- Metadata + Agentic: Uses structured metadata (e.g., “Find documents related to Acme Corp”) to filter documents.
- Full Hybrid (Best of All): Combines Semantic Search, Metadata Filtering, and Agentic reasoning.
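One way to picture the four modes is as a switch that decides which pre-filters run before the agent starts. The sketch below is hypothetical: the `SearchMode` names and `select_candidates` function are not from the project, and treating Full Hybrid as the intersection of the semantic and metadata results is an assumption, since the video does not say how the two result sets are combined.

```python
from enum import Enum


class SearchMode(Enum):
    PURE_AGENTIC = "pure_agentic"
    SEMANTIC_AGENTIC = "semantic_agentic"
    METADATA_AGENTIC = "metadata_agentic"
    FULL_HYBRID = "full_hybrid"


def select_candidates(mode: SearchMode, all_docs: list[str],
                      semantic_hits: set[str], metadata_hits: set[str]) -> list[str]:
    """Pick the documents the agent starts from, preserving the order of all_docs."""
    if mode is SearchMode.PURE_AGENTIC:
        return list(all_docs)  # no pre-filtering: the agent scans everything
    if mode is SearchMode.SEMANTIC_AGENTIC:
        return [d for d in all_docs if d in semantic_hits]
    if mode is SearchMode.METADATA_AGENTIC:
        return [d for d in all_docs if d in metadata_hits]
    # FULL_HYBRID: assumed here to require a match on both paths.
    return [d for d in all_docs if d in semantic_hits and d in metadata_hits]


# Toy usage.
all_docs = ["a.md", "b.md", "c.md", "d.md"]
semantic_hits = {"a.md", "b.md"}
metadata_hits = {"b.md", "c.md"}
for mode in SearchMode:
    print(mode.value, select_candidates(mode, all_docs, semantic_hits, metadata_hits))
```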
🖥️ Usage & Interface
The project (Open Source) includes both a CLI and a Web UI.
Key Features
- Auto-Discovery: When indexing a new folder, the system can automatically detect the document type and suggest a metadata schema (e.g., identifying “Risk Factors” or “Purchase Price” in legal docs).
- Backtracking: Even with filtering, the agent retains the ability to “backtrack.” If a document references an exhibit not in the initial filter, the agent can request to read that specific file.
- Model Support: Optimized for Gemini (due to long context window and needle-in-haystack performance) but supports local models (e.g., 32B parameters) via a separate branch.
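The backtracking feature can be illustrated with a small sketch in which the agent starts from the filtered candidates but may request any other file in the folder once a document refers to it. Everything here is hypothetical: the `CandidateWorkspace` class, the `explore` loop, and the regex that detects references like "see exhibit_b.md" stand in for decisions that the real agent makes with an LLM.

```python
import re


class CandidateWorkspace:
    """Files the agent may read: the filtered candidates plus anything it requests."""

    def __init__(self, all_files: dict[str, str], candidates: set[str]):
        self._all_files = all_files
        self.visible = set(candidates)

    def request(self, name: str) -> bool:
        """Backtracking: make a file outside the initial filter readable if it exists."""
        if name in self._all_files:
            self.visible.add(name)
            return True
        return False

    def read(self, name: str) -> str:
        if name not in self.visible:
            raise PermissionError(f"{name} is not in the candidate set")
        return self._all_files[name]


def explore(workspace: CandidateWorkspace, start: str) -> list[str]:
    """Read start, then follow 'see <file>.md' references, requesting files as needed."""
    seen: list[str] = []
    queue = [start]
    while queue:
        name = queue.pop(0)
        if name in seen:
            continue
        if name not in workspace.visible and not workspace.request(name):
            continue  # referenced file does not exist in the folder
        seen.append(name)
        text = workspace.read(name)
        queue.extend(re.findall(r"see ([\w.-]+\.md)", text))
    return seen


# Toy usage: only the agreement passed the filter, but it points at an exhibit.
files = {
    "agreement.md": "The purchase price is defined elsewhere, see exhibit_b.md.",
    "exhibit_b.md": "Exhibit B sets out the purchase price.",
    "unrelated.md": "Cafeteria menu.",
}
workspace = CandidateWorkspace(files, {"agreement.md"})
print(explore(workspace, "agreement.md"))  # ['agreement.md', 'exhibit_b.md']
```

The unrelated file is never opened, so the agent keeps the speed benefit of filtering while still reaching the exhibit.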
Installation
# Clone the repo
git clone https://github.com/PromtEngineer/agentic-file-search.git
cd agentic-file-search
# Install dependencies (using uv is recommended)
uv pip install .
# Configure API Key
# Create .env file with GOOGLE_API_KEY=...
Running the System
CLI:
uv run explore --task "What is the purchase price in data/test_acquisition"
Web UI:
uv run uvicorn fs_explorer.server:app --port 8000
📝 Conclusion
This hybrid approach represents a “Harness Engineering” pattern: giving agents generalized tools rather than rigid workflows. Filtering out noise up front makes production-level speed possible, while the agent keeps its ability to understand deep context and follow cross-references.