Private RAG system using NotebookLM
https://www.youtube.com/watch?v=aj2FkaaL1co The AI Automators
This video demonstrates how to set up and run a fully local, open-source version of Google's NotebookLM, called InsightsLM. The presenter, Daniel Walsh, highlights the benefits of running AI agents locally, including privacy, compliance, and cost reduction through reduced reliance on cloud infrastructure.

Key Features and Architecture: InsightsLM lets users chat with AI agents grounded in their own documents. The system runs completely locally, using Docker containers for the various services:
- InsightsLM (Frontend): Runs on `localhost:3010`.
- Supabase (Backend Database & Storage): Provides PostgreSQL for data storage, a vector store for embeddings, and functions for backend logic. Containers include `supabase-db`, `supabase-storage`, `supabase-edge-functions`, `supabase-studio`, `supabase-kong`, `supabase-auth`, `supabase-vector`, and more.
- n8n (Workflow Automation): Acts as the glue logic, connecting the different AI services and orchestrating workflows. Runs on `localhost:5678`.
- Ollama (Local LLM Inference): Generates text responses and embeddings from local large language models (LLMs). The demo uses `qwen3:8b-q4_K_M` and `nomic-embed-text`.
- Whisper-ASR (Audio Transcription): Converts audio files (MP3) into text locally.
- Coqui-TTS (Text-to-Speech): Generates audio from text for features like podcast generation.
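Once the containers are running, the layout above can be smoke-tested from the host. A minimal Python sketch: ports 3010, 5678, and 8000 are stated in the video; 11434 for Ollama is an assumption based on its default host port.

```python
import socket

# Host ports for the stack described above. 3010, 5678, and 8000 are
# from the video; 11434 is Ollama's default port and an assumption here.
SERVICES = {
    "InsightsLM frontend": 3010,
    "n8n": 5678,
    "Supabase (Kong gateway)": 8000,
    "Ollama": 11434,
}

def check_services(host="localhost", timeout=0.5):
    """Return a dict mapping service name -> True if its TCP port accepts connections."""
    status = {}
    for name, port in SERVICES.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status[name] = True
        except OSError:
            status[name] = False
    return status

if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'DOWN'}")
```

This only checks that each port accepts a TCP connection, not that the service behind it is healthy, but it is a quick way to confirm the containers came up.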
Demonstration Highlights:
- Document Ingestion:
  - PDF Upload: A 180-page "2025 Formula 1 Technical Regulations" PDF is uploaded.
  - Workflow Execution (n8n): The "Generate Notebook Details" workflow is triggered, using Ollama to create a title and description for the document and to assign a random color. The "Upsert to Vector Store" workflow then splits the PDF into chunks (674 in this case) and generates embeddings with Ollama's `nomic-embed-text` model, storing them in Supabase's vector store. The process took about 56 seconds: roughly 39 seconds for Ollama inference and ~8 seconds for embedding.
  - Audio Upload: An MP3 file is uploaded and transcribed by the local Whisper-ASR container.
  - Website Scraping: Multiple Formula 1 website URLs are provided, and their content is scraped and ingested.
- Chatting with Documents (RAG): Users can ask questions about the ingested documents.
  - Workflow Execution (n8n): The "Chat" workflow is triggered. It fetches the chat history, uses Ollama to generate a search query, queries the Supabase vector store for relevant document chunks, injects those chunks into Ollama to generate a response with citations, and processes the citations to link back to specific sections of the source document in the UI.
  - The system successfully answers questions like "What are the minimum mass requirements for Formula One cars in 2025?" and "What is the plank assembly?", providing accurate answers with clickable citations.
  - The presenter notes that smaller local LLMs (8B parameters) require more intricate prompt engineering and may struggle with broad "overview" or "summary" questions, which need access to more chunks than a vector store typically returns under VRAM limitations.
- Podcast Generation: The "Deep Dive Conversation" feature generates an audio podcast from the notebook content using Coqui-TTS. The quality is robotic but functional for a local model. The "Podcast Generation" workflow truncates the sources, uses Ollama to generate a transcript, and then uses Coqui-TTS to render the audio.
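The "Upsert to Vector Store" step splits a document into chunks before embedding. A minimal Python sketch of that kind of fixed-size chunking with overlap; the chunk size and overlap values here are illustrative, not the workflow's actual settings:

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping fixed-size chunks, as a typical
    character splitter would before embedding. Sizes are illustrative."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

# Each chunk would then be embedded (nomic-embed-text in the video)
# and upserted into the Supabase vector store.
```

The overlap means adjacent chunks share context, so a sentence straddling a chunk boundary is still retrievable in full from at least one chunk.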
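The retrieval step of the Chat workflow (query the vector store for relevant chunks) boils down to a nearest-neighbour search over embeddings. A toy Python sketch of that similarity search, using made-up 3-dimensional vectors in place of real `nomic-embed-text` embeddings:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, store, k=2):
    """Return the k chunks most similar to the query vector, mimicking
    what the Supabase vector store does at scale."""
    scored = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]

# Toy store of (chunk_text, embedding) pairs. Real embeddings have
# hundreds of dimensions; these 3-d vectors are illustrative only.
store = [
    ("minimum mass is 800 kg", [0.9, 0.1, 0.0]),
    ("the plank assembly sits under the car", [0.1, 0.9, 0.0]),
    ("power unit regulations", [0.0, 0.2, 0.9]),
]

query = [0.85, 0.15, 0.05]  # pretend this embeds "minimum mass requirements"
# The retrieved chunks are then injected into the LLM prompt, and the
# model's answer cites them back to the source document.
```

This is why "overview" questions are hard for the setup: only the top-k chunks reach the model, so questions whose answer is spread across many chunks exceed what retrieval (and VRAM) can deliver.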
Setup Guide: The video provides a step-by-step guide to setting up the local environment:
- Prerequisites: Python, Git/GitHub Desktop, Docker/Docker Desktop.
- Clone **local-ai-packaged**: Clone Cole Medin's GitHub repository.
- Clone **insights-lm-local-package**: Clone Daniel Walsh's InsightsLM local package into the `local-ai-packaged` directory.
- Configure Environment Variables: Copy `.env.example` (in `local-ai-packaged`) and rename it to `.env`. Update the required secret keys for n8n, Supabase, and optionally Neo4j/Langfuse. The Supabase API keys (JWT Secret, Anon, Service Role) need to be auto-generated from the local Supabase dashboard. Copy the contents of `insights-lm-local-package/.env.copy` into the main `.env` file for the InsightsLM webhook URLs and Whisper/Coqui-TTS settings.
- Modify Docker Compose Files:
  - In `local-ai-packaged/docker-compose.yml`: Add a `whisper_cache` volume. Under `services`, add definitions for `insights-lm`, `coqui-tts`, and `whisper-asr` from `insights-lm-local-package/docker-compose.copy.yml`. Ensure GPU capabilities are set correctly for your hardware (e.g., `gpu-nvidia`). Update the Ollama model in the `ollama-pull-llama` command to `qwen3:8b-q4_K_M`.
  - In `supabase/docker/docker-compose.yml`: Move the InsightsLM functions from `insights-lm-local-package/supabase-functions` to `supabase/docker/volumes/deno/functions`. Update the `supabase-edge-functions` service to include the InsightsLM environment variables (such as `NOTEBOOK_CHAT_URL`, `AUDIO_GENERATION_WEBHOOK_URL`, etc.) from your main `.env` file.
- Start Services: Run `python start_services.py --profile gpu-nvidia` (or the relevant profile) from the `local-ai-packaged` directory. This downloads the Docker images and spins up all containers.
- Run Supabase Migrations: Open the local Supabase Studio (`localhost:8000`) and log in (default username `supabase`, password from `.env`). Go to the SQL Editor and paste/run the `supabase-migration.sql` script (from `insights-lm-local-package/supabase-migration.sql`) to create the tables and policies InsightsLM needs.
- Configure n8n Credentials: Open local n8n (`localhost:5678`).
  - Create a new n8n API key (Settings → n8n API) and copy it.
  - Create a Header Auth credential (Credentials → Add new credential → Header Auth). Name it `Header Auth` and paste the n8n API key as the value.
  - Create a Supabase API credential (Credentials → Add new credential → Supabase API). Host is `http://kong:8000`; the Service Role Secret comes from `.env`.
  - Create an Ollama credential (Credentials → Add new credential → Ollama). Base URL is `http://ollama:11434`.
- Import n8n Workflows: In n8n, go to Personal → Create Workflow → Import from File and select `Local_Import_Insights_LM_Workflows.json` (from `insights-lm-local-package/n8n`). In the "Enter User Values" node, input the IDs of your newly created Supabase, Header Auth, and Ollama credentials, then execute the workflow. It downloads, modifies, and imports all six InsightsLM workflows (Chat, Podcast Generation, Process Additional Sources, Upsert to Vector Store, Generate Notebook Details, Extract Text) into your n8n instance.
- Activate Workflows: Activate all imported InsightsLM workflows in n8n (except "Extract Text" and "Local_Import_Insights_LM_Workflows").
- Create User in Supabase: Go to Supabase Studio → Authentication → Users → Add User and create a user with an email and password.
- Access InsightsLM: Go to `localhost:3010` and log in with the created user credentials.
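For orientation, the docker-compose additions described above might look roughly like the fragment below. This is a sketch of the shape only; the actual service definitions, images, ports, and volume paths come from `insights-lm-local-package/docker-compose.copy.yml`, so treat every value here as a placeholder.

```yaml
# Hypothetical shape only -- copy the real definitions from
# insights-lm-local-package/docker-compose.copy.yml.
volumes:
  whisper_cache:            # added for Whisper-ASR model caching

services:
  insights-lm:
    ports:
      - "3010:3010"         # frontend port mentioned in the video
  whisper-asr:
    volumes:
      - whisper_cache:/root/.cache   # mount path is a placeholder
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia         # for the gpu-nvidia profile
              capabilities: [gpu]
  coqui-tts:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
```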
The system is now fully set up and ready for use.