What is pgvector?
pgvector is an open-source PostgreSQL extension that adds vector similarity search capabilities to PostgreSQL databases, enabling storage and querying of embeddings alongside traditional relational data. It provides native vector data types, indexing methods for approximate nearest neighbor search, and distance functions for similarity comparison, all within the familiar PostgreSQL environment. This allows developers to add semantic search to existing applications without introducing separate vector database infrastructure.
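As a minimal sketch of storage and querying, the Python module below keeps pgvector's documented SQL as strings. It covers CREATE EXTENSION vector, a vector(3) column beside ordinary columns, an HNSW or IVFFlat index using the vector_l2_ops operator class, and an L2 nearest-neighbor query with a WHERE filter. Several names are illustrative assumptions rather than anything pgvector defines: the table name items, the category column, the helper to_vector_literal, and the %s placeholder style used by psycopg. No database connection is made. The helper formats a Python list in pgvector's text input form, such as [1.0,2.0,3.0].

```python
import math
from typing import Sequence

# Documented pgvector SQL: enable the extension, then store embeddings in a
# vector(3) column next to ordinary relational columns (names are illustrative).
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE items (
    id bigserial PRIMARY KEY,
    category text NOT NULL,
    embedding vector(3)
);
"""

# Approximate-search indexes (create one or the other). The operator class
# must match the distance operator used in ORDER BY: here L2 distance, <->.
HNSW_INDEX_SQL = "CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);"
IVFFLAT_INDEX_SQL = (
    "CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);"
)

# Nearest neighbours by L2 distance, combined with an ordinary WHERE filter.
# The %s placeholders follow psycopg's style (an assumption; adapt to your driver).
NEAREST_SQL = """
SELECT id, category, embedding <-> %s::vector AS distance
FROM items
WHERE category = %s
ORDER BY embedding <-> %s::vector
LIMIT 5;
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Hypothetical helper: format numbers as pgvector text input, e.g. '[1.0,2.0,3.0]'.

    pgvector rejects empty vectors, NaN and infinity, so fail early on those.
    """
    floats = [float(v) for v in values]
    if not floats:
        raise ValueError("a vector needs at least one dimension")
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("pgvector vectors cannot contain NaN or infinity")
    return "[" + ",".join(repr(v) for v in floats) + "]"


if __name__ == "__main__":
    query = to_vector_literal([1, 2, 3])
    print(query)  # [1.0,2.0,3.0]
    # With a live connection (not opened here), these would be the query parameters.
    params = (query, "books", query)
    print(params)
```

Only one of the two index statements is needed. The pgvector documentation recommends building an IVFFlat index after the table already contains data, because its lists are derived from the existing rows.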
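The extension supports several distance metrics, each exposed as an operator. These include L2 (Euclidean) distance (<->), inner product (<#>, which returns the negative inner product so that smaller values mean more similar), and cosine distance (<=>). Without an index, a query performs exact nearest neighbor search. For larger tables pgvector offers two approximate index types. IVFFlat divides vectors into lists and searches only the lists closest to the query vector. HNSW builds a multilayer graph and gives a better speed-recall tradeoff, at the cost of slower index builds and higher memory use. Because embeddings live in ordinary table columns, pgvector works with PostgreSQL transactions, joins, and WHERE filters, so a single query can combine vector similarity with traditional SQL conditions and relational operations.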
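To make the metrics concrete, here is a plain-Python reference sketch that needs no database. It computes what the <->, <#> and <=> operators return, and it includes an exact filtered top-k search that mirrors WHERE ... ORDER BY ... LIMIT. The function names and the row layout are hypothetical. The only pgvector behavior assumed is the documented operator semantics, including that <#> returns the negative inner product. An HNSW or IVFFlat index trades some recall for speed, so its results can differ from this exact scan.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Row = Tuple[int, str, Vector]  # (id, category, embedding): a hypothetical table row


def _check_dims(a: Vector, b: Vector) -> None:
    # pgvector likewise raises an error when vector dimensions differ.
    if len(a) != len(b):
        raise ValueError(f"different vector dimensions {len(a)} and {len(b)}")


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance: what the <-> operator computes."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Vector, b: Vector) -> float:
    """What <#> computes: the inner product negated, so that ascending
    ORDER BY ranks the largest inner product first."""
    _check_dims(a, b)
    return -float(sum(x * y for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """What <=> computes: 1 - cosine similarity, in the range 0 to 2."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        # This sketch's own choice: cosine is undefined for a zero vector.
        raise ValueError("cosine distance is undefined for a zero vector")
    similarity = max(-1.0, min(1.0, dot / norms))  # clamp floating-point rounding
    return 1.0 - similarity


def exact_search(rows: List[Row], query: Vector, category: str, k: int,
                 distance: Callable[[Vector, Vector], float] = l2_distance) -> List[int]:
    """Exact-scan equivalent of:
        SELECT id FROM items WHERE category = <category>
        ORDER BY embedding <op> <query> LIMIT k;
    """
    scored = [(distance(emb, query), row_id)
              for row_id, cat, emb in rows if cat == category]
    scored.sort()  # ascending distance; ties broken by id
    return [row_id for _, row_id in scored[:k]]
```

Swapping the distance argument changes the ranking just as changing the SQL operator does. For the query [1.0, 0.0], L2 distance prefers the exact match [1.0, 0.0], while inner product ranks the longer vector [2.0, 0.0] first.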
pgvector has become popular for applications that need both vector search and traditional database capabilities. It is particularly common among teams already using PostgreSQL who want to add RAG (retrieval-augmented generation) or semantic search without operating a second database. Dedicated vector databases may offer better performance at very large scale. pgvector remains a compelling choice for small to medium datasets where operational simplicity and integration with existing data models matter most. Frameworks such as LangChain support it, and Supabase offers managed PostgreSQL with pgvector available as an extension.