Vector Databases

Specialized databases optimized for storing, indexing, and retrieving high-dimensional vector embeddings. They enable efficient similarity search (e.g., k-nearest-neighbor queries) for applications such as semantic search, recommendation systems, and retrieval-augmented generation (RAG) for LLMs.

Core Functionality

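  • Approximate Nearest Neighbor (ANN) Search: Uses index structures such as HNSW (graph-based) or IVF (cluster-based), available in libraries like FAISS, trading a small loss in recall for much faster similarity matching at scale.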
  • Embedding Support: Stores vectors generated from text, images, or other modalities via models like BERT, CLIP, or Sentence Transformers.
  • Scalability: Handles millions/billions of vectors with low-latency queries.
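
The ANN idea above can be illustrated with a minimal inverted-file (IVF) sketch in pure Python, assuming small in-memory float vectors and squared Euclidean distance. The names `IVFIndex`, `kmeans`, `nlist`, and `nprobe` are illustrative only; they echo common IVF terminology but this is not the FAISS API. The index clusters the vectors once, then answers each query by scanning only the `nprobe` nearest clusters instead of the whole collection.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; the centroids act as the coarse quantizer."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: dist2(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class IVFIndex:
    """Hypothetical IVF-Flat-style index: cluster once, scan only a few clusters per query."""

    def __init__(self, vectors, nlist=4, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, nlist, seed=seed)
        # Inverted lists: centroid id -> ids of the vectors assigned to it
        self.lists = {c: [] for c in range(nlist)}
        for i, v in enumerate(vectors):
            c = min(range(nlist), key=lambda j: dist2(v, self.centroids[j]))
            self.lists[c].append(i)

    def search(self, query, k=3, nprobe=1):
        # Probe only the nprobe closest clusters, then rank their members exactly
        probe = sorted(range(len(self.centroids)),
                       key=lambda c: dist2(query, self.centroids[c]))[:nprobe]
        candidates = [i for c in probe for i in self.lists[c]]
        return sorted(candidates, key=lambda i: dist2(query, self.vectors[i]))[:k]

def exact_search(vectors, query, k=3):
    """Brute-force baseline that scans every stored vector."""
    return sorted(range(len(vectors)), key=lambda i: dist2(query, vectors[i]))[:k]

# Usage: random 8-dimensional vectors, compare approximate and exact results
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
index = IVFIndex(data, nlist=8)
print(index.search(data[0], k=3, nprobe=2))
print(exact_search(data, data[0], k=3))
```

Raising `nprobe` trades speed back for recall; probing every cluster (`nprobe == nlist`) reduces to exact search. Production systems add compression (e.g., product quantization) or use graph indexes such as HNSW instead.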

Key Limitations

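  • Model Constraint: Queries must be embedded with the same model (and version) used to index the stored vectors, because embeddings from different models occupy incompatible vector spaces (e.g., a corpus indexed with BERT embeddings must be queried with BERT embeddings).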
  • Semantic Rigidity: Struggles with complex relationships beyond vector similarity (e.g., hierarchical or causal links).
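
One way to make the model constraint explicit is to record which model produced an index and reject mismatched queries. The sketch below is hypothetical: `ModelBoundIndex` and the model labels `"bert-base"` and `"clip-vit"` are illustrative names, not any vector database's API, and it assumes plain Python lists as vectors with dot-product scoring.

```python
from dataclasses import dataclass, field

@dataclass
class ModelBoundIndex:
    """Toy index that remembers which embedding model produced its vectors (hypothetical)."""
    model_id: str
    dim: int
    vectors: list = field(default_factory=list)

    def add(self, vector, model_id):
        self._check(vector, model_id)
        self.vectors.append(vector)

    def query(self, vector, model_id, k=1):
        self._check(vector, model_id)
        # Dot-product similarity; only meaningful inside one model's vector space
        ranked = sorted(range(len(self.vectors)),
                        key=lambda i: -sum(a * b for a, b in zip(vector, self.vectors[i])))
        return ranked[:k]

    def _check(self, vector, model_id):
        # A dimension check alone cannot tell apart two models with the same output size
        if model_id != self.model_id:
            raise ValueError(f"index built with {self.model_id!r}, got {model_id!r}")
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dims, got {len(vector)}")

idx = ModelBoundIndex(model_id="bert-base", dim=3)
idx.add([1.0, 0.0, 0.0], "bert-base")
idx.add([0.0, 1.0, 0.0], "bert-base")
print(idx.query([0.9, 0.1, 0.0], "bert-base"))  # [0]
try:
    idx.query([0.9, 0.1, 0.0], "clip-vit")
except ValueError as err:
    print(err)  # index built with 'bert-base', got 'clip-vit'
```

Storing an explicit model identifier matters because two different models can share a dimension (e.g., both 768-d), so a vector of the "right" length can still be meaningless to the index.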

GraphRAG: Flexible Alternative

  • Approach: GraphRAG combines knowledge graphs with LLMs to query structured data, avoiding the single-embedding-model constraint of pure vector search.
  • Flexibility: Different models can handle graph construction (e.g., LLM-based entity and relation extraction, or GNNs) and retrieval (e.g., LLMs), enabling richer context.
  • Advantage: Better handles complex queries involving relationships (e.g., “Show me products similar to X that are also used with Y”) compared to pure vector similarity.
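
The example query above can be sketched as a two-hop lookup over a tiny in-memory knowledge graph, assuming products and relations have already been extracted into triples. The class, the relation names `similar_to` and `used_with`, and the product IDs are hypothetical. In a fuller GraphRAG pipeline an LLM could translate the question into such a traversal or a graph query; here it is hard-coded to show the relationship constraint that a single nearest-neighbor lookup does not express directly.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Tiny in-memory graph of (subject, relation, object) triples (hypothetical)."""

    def __init__(self):
        self.edges = defaultdict(set)  # (node, relation) -> set of neighbor nodes

    def add(self, subj, rel, obj, symmetric=True):
        self.edges[(subj, rel)].add(obj)
        if symmetric:  # assumption: "similar_to" and "used_with" are two-way links
            self.edges[(obj, rel)].add(subj)

    def neighbors(self, node, rel):
        return self.edges.get((node, rel), set())

def similar_and_used_with(graph, x, y):
    """Products similar to x that are also used with y: intersect two relation hops."""
    return graph.neighbors(x, "similar_to") & graph.neighbors(y, "used_with")

kg = KnowledgeGraph()
kg.add("drill_a", "similar_to", "drill_b")
kg.add("drill_a", "similar_to", "drill_c")
kg.add("drill_b", "used_with", "masonry_bit")
kg.add("sander_d", "used_with", "masonry_bit")
print(similar_and_used_with(kg, "drill_a", "masonry_bit"))  # {'drill_b'}
```

A vector search for "drill_a" alone would return drill_b and drill_c as similar items; the explicit `used_with` edge is what filters the answer down to drill_b.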

Integration Ecosystem

References

  • 2026-04-14 IBM Explainer: Creating GraphRAG

Source Notes