Semantic Search with PostgreSQL and pgvector: An End-to-End Python Tutorial

Spin up Postgres with pgvector, generate Cohere or OpenAI embeddings, build an HNSW index, run k-NN queries, and layer metadata filters — production-shaped.

20 May 2026 Updated 20 May 2026 ~16 min read

pgvector GitHub repository README showing the open-source PostgreSQL extension for vector similarity search that this tutorial walks through end-to-end

Image: pgvector GitHub repository, used for editorial coverage of the extension taught in this tutorial.

What you’ll build

By the end of this tutorial you will have a working semantic-search service backed by PostgreSQL. You will ingest a small Wikipedia corpus, generate embeddings with either Cohere’s embed-v4.0 or OpenAI’s text-embedding-3-small, store them in a vector column with an HNSW index, and run k-nearest-neighbour queries with metadata filters. The final step benchmarks the indexed query against a naive linear-scan cosine to show what the index actually buys you.

The Postgres-native angle matters. If your application already runs on Postgres, pgvector lets you keep transactional data and vector embeddings in the same database, with the same connection pool, the same backup strategy, and the same SQL surface. You get joins, filters, and WHERE clauses for free, which is the part that gets awkward when vectors live in a separate dedicated store.

This tutorial is end-to-end runnable. Every code block is a copy-paste step; the final script is around 120 lines of Python.

What you’ll need

Python 3.10 or later, with a fresh virtual environment.
Either Docker (for local Postgres) or a free Supabase project (managed Postgres with pgvector preinstalled)¹.
An API key from Cohere or OpenAI. The Cohere path uses embed-v4.0; the OpenAI path uses text-embedding-3-small. Both work; the differences are flagged inline.
Comfort with SQL basics (CREATE TABLE, INSERT, SELECT).

Embedding cost for the tutorial corpus (under 50,000 tokens total) is well under one cent on either provider. Cohere’s embed-v4.0 is listed at $0.12 per million input tokens on Cohere’s pricing page². OpenAI’s text-embedding-3-small is listed at $0.02 per million input tokens on the OpenAI API pricing page³. Verify prices on the vendor pages before running at scale; rates change.
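The arithmetic behind that estimate can be sketched in a few lines. The prices are the ones quoted above and will drift over time; the function name and the dictionary are illustrative, not part of either vendor's SDK.

```python
# Prices per million input tokens as quoted in this tutorial (assumed current;
# check the vendor pricing pages before relying on them).
PRICE_PER_MILLION_USD = {
    "embed-v4.0": 0.12,
    "text-embedding-3-small": 0.02,
}


def estimate_cost_usd(total_tokens: int, model: str) -> float:
    """Return the estimated embedding cost in US dollars."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]


for model in PRICE_PER_MILLION_USD:
    print(f"{model}: ${estimate_cost_usd(50_000, model):.4f}")
```

At 50,000 tokens both providers come in below one cent, which matches the claim above.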

Budget about 45 minutes start to finish.

Time required

Around 45 minutes: 10 to spin up Postgres and install dependencies, 10 to wire the embedding client, 15 to ingest documents and build the index, 10 to run queries and benchmark.

Steps

1. Spin up Postgres with pgvector

Two paths. Pick one.

Path A — Local Docker. The pgvector/pgvector image bundles the extension with a recent Postgres release, so you do not need to compile anything⁴.

docker run -d \
    --name pgvector-tutorial \
    -e POSTGRES_PASSWORD=changeme \
    -p 5432:5432 \
    pgvector/pgvector:pg16

Wait a few seconds for the container to start, then connect with psql (or any Postgres client) and enable the extension:

docker exec -it pgvector-tutorial \
    psql -U postgres -c "CREATE EXTENSION IF NOT EXISTS vector;"

You should see CREATE EXTENSION. If you instead see a NOTICE that the extension “vector” already exists and is being skipped, that’s fine too: it means the extension was enabled earlier.

Path B — Supabase free tier. Create a project at supabase.com, open the SQL editor, and run CREATE EXTENSION IF NOT EXISTS vector; once. Supabase’s pgvector guide walks through the dashboard’s “Extensions” toggle as an alternative¹. Copy the connection string from Project Settings → Database; you’ll paste it into Python in Step 3.

Verify the extension is loaded:

SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';

You should see a row with vector and the version string (0.8.x or later as of May 2026).

pgvector GitHub repository README showing the extension's CREATE EXTENSION syntax and vector type primitives

Image: pgvector GitHub repository, used for editorial coverage of the extension’s vector-type primitives.

2. Create the project and install Python dependencies

Create a project folder, a virtual environment, and install the libraries:

mkdir pgvector-search && cd pgvector-search
python3 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install "psycopg[binary]>=3.2" "pgvector>=0.3" "cohere>=5" "openai>=1.40" numpy

psycopg is the modern Python adapter for PostgreSQL (the [binary] extra pulls in the prebuilt libpq wheel so you don’t need a system Postgres client)⁵. The pgvector Python package registers a type adapter so psycopg knows how to send and receive vector values without manual string serialisation.

3. Configure the embedding client and the database connection

Export your API key and database URL:

export DATABASE_URL="postgresql://postgres:changeme@localhost:5432/postgres"
# Or your Supabase connection string.

export COHERE_API_KEY="..."   # if using Cohere
export OPENAI_API_KEY="sk-..." # if using OpenAI

Create search.py and wire the embedding helper. Pick one provider; both functions return a Python list of floats, so the rest of the tutorial is provider-agnostic.

# search.py
import os
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

DATABASE_URL = os.environ["DATABASE_URL"]

# --- Provider A: Cohere embed-v4.0 ---
import cohere

co = cohere.ClientV2(api_key=os.environ.get("COHERE_API_KEY", ""))

def embed_cohere(texts: list[str], input_type: str) -> list[list[float]]:
    """input_type is 'search_document' for ingest, 'search_query' for queries."""
    resp = co.embed(
        texts=texts,
        model="embed-v4.0",
        input_type=input_type,
        embedding_types=["float"],
    )
    return resp.embeddings.float_

# --- Provider B: OpenAI text-embedding-3-small ---
from openai import OpenAI

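# OpenAI() reads OPENAI_API_KEY and raises if it is unset,
# so delete this whole provider block if you are on the Cohere path.
oai = OpenAI()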

def embed_openai(texts: list[str]) -> list[list[float]]:
    resp = oai.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]

Two things to flag. First, Cohere’s API distinguishes search_document (for the docs you store) from search_query (for the query you compare against). The model is trained on that asymmetry, and skipping it costs you measurable recall⁶. OpenAI’s API has no equivalent parameter. Second, OpenAI documents its embeddings as normalised to length 1, so cosine similarity and inner product produce the same ranking for them. This tutorial assumes the same holds for Cohere’s float embeddings at the default dimension; a quick norm check on one returned vector will confirm it for your setup. That matters in Step 5 when we choose an index operator.
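To see why cosine and inner product agree on unit-length vectors, here is a minimal sketch in plain Python. The three-dimensional vectors are made up and stand in for real embeddings; the helper names are illustrative only.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy "document embeddings", normalised the way the providers' vectors are assumed to be.
docs = {
    "a": normalise([1.0, 2.0, 0.5]),
    "b": normalise([0.2, 0.1, 3.0]),
    "c": normalise([1.0, 1.9, 0.7]),
}
query = normalise([1.0, 2.0, 0.6])

# Rank once by cosine distance and once by negative inner product.
by_cosine = sorted(docs, key=lambda name: cosine_distance(docs[name], query))
by_neg_ip = sorted(docs, key=lambda name: -dot(docs[name], query))

print(by_cosine)
print(by_neg_ip)
```

Both lists come out in the same order because, for unit vectors, cosine distance is just one minus the inner product.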

4. Create the table and ingest sample documents

text-embedding-3-small outputs 1536-dimensional vectors by default; Cohere’s embed-v4.0 outputs 1536-dimensional float vectors when you request the default float embedding type⁷. We’ll use 1536 for both.

DIM = 1536  # matches both text-embedding-3-small and embed-v4.0 defaults

DDL = f"""
CREATE TABLE IF NOT EXISTS documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    chunk TEXT NOT NULL,
    source TEXT NOT NULL,
    embedding vector({DIM}) NOT NULL
);
"""

with psycopg.connect(DATABASE_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        cur.execute(DDL)
    conn.commit()

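The register_vector(conn) call teaches psycopg how to bind NumPy arrays to the vector type and how to read vector columns back as arrays. Plain Python lists are still sent as ordinary Postgres arrays, which is why the code below wraps each embedding in np.array. Without register_vector you’d have to format vectors as '[0.1,0.2,...]' strings by hand.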
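For a sense of what the adapter saves you from, here is a rough sketch of the text form of a vector, written as two hypothetical helpers. They are not part of the pgvector package, and in a real query the string would still need a cast such as %s::vector.

```python
def to_pgvector_text(vec):
    """Serialise a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


def from_pgvector_text(text):
    """Parse pgvector's text form back into a list of floats."""
    inner = text.strip()[1:-1]
    return [float(x) for x in inner.split(",")] if inner else []


encoded = to_pgvector_text([0.1, 0.2, 3])
print(encoded)
print(from_pgvector_text(encoded))
```

With register_vector in place, none of this string handling is needed.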

Now ingest a small corpus. The snippet below uses six short Wikipedia-style chunks for demonstration; in your own project, you’d chunk longer articles to roughly 200-500 token segments and ingest in batches of 96 (the Cohere batch ceiling) or 2048 (the OpenAI batch ceiling).

SAMPLE = [
    ("Postgres",        "PostgreSQL is an open-source object-relational database with strong ACID guarantees.", "wiki/PostgreSQL"),
    ("pgvector",        "pgvector is a PostgreSQL extension for vector similarity search, supporting L2, inner product, and cosine distance.", "wiki/pgvector"),
    ("HNSW",            "Hierarchical Navigable Small World is a graph-based approximate nearest neighbour algorithm.", "wiki/HNSW"),
    ("Embeddings",      "An embedding is a dense vector that places semantically similar items near each other in vector space.", "wiki/Embeddings"),
    ("Cosine similarity", "Cosine similarity measures the cosine of the angle between two vectors, common for normalised embeddings.", "wiki/Cosine_similarity"),
    ("RAG",             "Retrieval-augmented generation combines a retriever and a generator, grounding LLM outputs in retrieved context.", "wiki/RAG"),
]

def ingest(rows):
    texts = [chunk for _, chunk, _ in rows]
    vectors = embed_cohere(texts, input_type="search_document")
    # Or: vectors = embed_openai(texts)

    with psycopg.connect(DATABASE_URL) as conn:
        register_vector(conn)
        with conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO documents (title, chunk, source, embedding) VALUES (%s, %s, %s, %s)",
                [(title, chunk, source, np.array(vec))
                 for (title, chunk, source), vec in zip(rows, vectors)],
            )
        conn.commit()

ingest(SAMPLE)

Two batching notes for real corpora. First, batch your embedding calls — both providers charge per token, not per call, but round-trip latency dominates for one-at-a-time loops. Second, wrap inserts in a single transaction; per-row autocommit is what makes naive ingest scripts crawl.
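The first note can be sketched as a small slicing helper. The name batched and the constant are illustrative; the 96 limit is the Cohere ceiling mentioned earlier, and you would pass each batch to whichever embed function you chose.

```python
from typing import Iterator, Sequence

COHERE_MAX_BATCH = 96  # per-request ceiling quoted earlier in this tutorial


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = [f"chunk {i}" for i in range(250)]
sizes = [len(batch) for batch in batched(texts, COHERE_MAX_BATCH)]
print(sizes)  # [96, 96, 58]
```

Each yielded batch becomes one embedding request, so 250 chunks cost three round trips instead of 250.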

Supabase pgvector documentation showing the managed Postgres connection string and Extensions toggle

Image: Supabase pgvector guide, used for editorial coverage of the managed-Postgres path referenced in Step 1.

5. Build an HNSW index

A linear scan over six rows is instant; over a million, it is not. pgvector ships two index types. IVFFlat partitions vectors into lists and should be built after the table already holds representative data. HNSW is graph-based, can be created on an empty table, and is updated as rows are inserted. The pgvector README describes HNSW as having the better speed-recall trade-off, at the cost of slower builds and more memory⁸. This tutorial uses HNSW.

Cosine distance in pgvector is the <=> operator. For unit-normalised embeddings, the inner-product operator <#> (which returns the negative inner product and pairs with vector_ip_ops) gives identical ranking and is fractionally faster, but cosine is the safer default if you ever mix in unnormalised vectors. The operator class is fixed at index-creation time, so pick one:

CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

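Or run the equivalent statement from Python. This version names the index and adds IF NOT EXISTS so the script can be re-run. Use it instead of the unnamed statement above, not in addition to it, or you will end up with two identical indexes: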

INDEX_DDL = """
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
"""

with psycopg.connect(DATABASE_URL) as conn:
    with conn.cursor() as cur:
        cur.execute(INDEX_DDL)
    conn.commit()

The two knobs matter:

m is the maximum number of connections per node in each layer of the HNSW graph. Higher m generally means higher recall, a larger index, and a slower build. The pgvector default is m = 16⁸.
ef_construction is the size of the candidate list the build keeps while inserting each node. Higher values mean a slower build but a higher-quality graph. Default is ef_construction = 64⁸.

At query time, the corresponding runtime knob is hnsw.ef_search, which trades recall against latency per query rather than per index build:

SET hnsw.ef_search = 100;  -- default is 40

Conceptually, cosine similarity between vectors $\mathbf{a}$ and $\mathbf{b}$ is

$\text{cos}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}$

pgvector’s <=> operator returns distance, not similarity — specifically 1 - cosine_similarity. Smaller is closer. ORDER BY embedding <=> query_vector ASC returns the nearest neighbours.
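The same ordering can be reproduced by brute force in plain Python. This is a toy sketch with made-up two-dimensional vectors; it mirrors what ORDER BY embedding <=> query LIMIT k computes exactly, not how the HNSW index finds it.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn(rows, query, k):
    """Return the k (distance, title) pairs closest to query, nearest first."""
    scored = [(cosine_distance(vec, query), title) for title, vec in rows]
    return sorted(scored)[:k]


rows = [
    ("east", [1.0, 0.0]),
    ("north", [0.0, 1.0]),
    ("north-east", [1.0, 1.0]),
    ("west", [-1.0, 0.0]),
]

for distance, title in knn(rows, query=[1.0, 0.1], k=3):
    print(f"{distance:.4f}  {title}")
```

The vector pointing almost the same way as the query comes first, and the one pointing the opposite way would have a distance close to 2.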

6. Run a k-NN query

def search(query: str, k: int = 3):
    qvec = embed_cohere([query], input_type="search_query")[0]
    # Or: qvec = embed_openai([query])[0]

    with psycopg.connect(DATABASE_URL) as conn:
        register_vector(conn)
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT title, chunk, source, embedding <=> %s AS distance
                FROM documents
                ORDER BY embedding <=> %s
                LIMIT %s
                """,
                (np.array(qvec), np.array(qvec), k),
            )
            return cur.fetchall()

for title, chunk, source, distance in search("how do graph-based ANN indexes work?"):
    print(f"{distance:.4f}  {title}  ({source})")

You should see HNSW come back first, with pgvector and Embeddings close behind. The exact distances depend on the provider; ranking should be stable.

A note on the query: the embedding for the query uses input_type="search_query" if you’re on Cohere. Using search_document for the query is a common silent bug — the results still come back ranked, just at lower recall than the model is capable of.

7. Add metadata filtering

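The Postgres-native part starts paying off here. Filter by source, title, recency, or any other column with a plain WHERE clause. One caveat: when the HNSW index is used, pgvector applies the filter to the rows the index scan returns, so a selective filter can leave you with fewer than k results. The planner may instead choose a different plan, such as fetching the matching rows through another index and sorting them exactly, when it estimates that to be cheaper.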

def search_filtered(query: str, source_prefix: str, k: int = 3):
    qvec = embed_cohere([query], input_type="search_query")[0]

    with psycopg.connect(DATABASE_URL) as conn:
        register_vector(conn)
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT title, chunk, source, embedding <=> %s AS distance
                FROM documents
                WHERE source LIKE %s
                ORDER BY embedding <=> %s
                LIMIT %s
                """,
                (np.array(qvec), f"{source_prefix}%", np.array(qvec), k),
            )
            return cur.fetchall()

for row in search_filtered("vector similarity", "wiki/", k=3):
    print(row)

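For highly selective filters (a WHERE that eliminates 99% of rows), create a B-tree index on the metadata column so the planner can fetch the few matching rows and sort them exactly, skipping the HNSW index. For weak filters, let the HNSW index return candidates and let Postgres discard the non-matching ones; raise hnsw.ef_search if too few survive. pgvector 0.8.0 and later can also keep scanning the index until enough rows pass the filter when you SET hnsw.iterative_scan = strict_order (or relaxed_order). The planner chooses between these plans from its cost estimates; inspect the choice with EXPLAIN ANALYZE.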
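Why a selective filter can starve an approximate index scan is easier to see in a toy model. The sketch below is an assumption-laden simplification: it takes the ef_search nearest candidates and filters afterwards, which captures the post-filtering effect but is not how pgvector actually walks the graph. All names are hypothetical.

```python
def post_filtered_top_k(candidates, matches_filter, ef_search, k):
    """Keep the ef_search nearest candidates, then apply the filter, then cut to k."""
    nearest = sorted(candidates, key=lambda c: c[0])[:ef_search]
    return [c for c in nearest if matches_filter(c[1])][:k]


def is_docs(row):
    return row["source"].startswith("docs/")


# 1,000 rows as (distance, row); only every 100th row passes the filter (1%).
rows = [
    (i / 1000, {"id": i, "source": "docs/" if i % 100 == 0 else "wiki/"})
    for i in range(1000)
]

few = post_filtered_top_k(rows, is_docs, ef_search=40, k=5)
enough = post_filtered_top_k(rows, is_docs, ef_search=1000, k=5)
print(len(few), len(enough))  # 1 5
```

With a candidate list of 40, only one matching row survives even though five were requested; widening the candidate list recovers the rest.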

Cohere embeddings API documentation showing the embed-v4.0 endpoint, input_type parameter, and supported embedding types

Image: Cohere embeddings API documentation, used for editorial coverage of the embed-v4.0 endpoint and input_type parameter.

8. Benchmark vs naive cosine

The point of an index is throughput on big tables. On six rows, the index is overhead, not a speedup. To see the trade-off honestly, scale the corpus up — duplicate the sample 10,000 times with small perturbations, or ingest a larger Wikipedia dump — and compare an indexed query against an index-skipping scan.

import time

def bench(query: str, k: int = 10, use_index: bool = True):
    qvec = embed_cohere([query], input_type="search_query")[0]
    with psycopg.connect(DATABASE_URL) as conn:
        register_vector(conn)
        with conn.cursor() as cur:
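            if not use_index:
                # SET LOCAL lasts only for the current transaction, which
                # psycopg opens implicitly on the first execute().
                cur.execute("SET LOCAL enable_indexscan = off;")
                cur.execute("SET LOCAL enable_bitmapscan = off;")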
            t0 = time.perf_counter()
            cur.execute(
                "SELECT id FROM documents ORDER BY embedding <=> %s LIMIT %s",
                (np.array(qvec), k),
            )
            cur.fetchall()
            return (time.perf_counter() - t0) * 1000  # ms

indexed_ms = bench("graph-based nearest neighbour", use_index=True)
naive_ms   = bench("graph-based nearest neighbour", use_index=False)
print(f"indexed: {indexed_ms:.1f} ms   naive: {naive_ms:.1f} ms")

As a rough expectation rather than a measured result, on a corpus of around 100,000 rows the HNSW query typically returns in single-digit milliseconds while the naive scan runs in hundreds of milliseconds. Sequential-scan latency grows linearly with row count, whereas HNSW latency grows roughly logarithmically, so the gap widens as the corpus grows. Run the benchmark on your own data; the absolute numbers depend on hardware, Postgres tuning, and how cold the buffer cache is.
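Because a single cold run can be misleading, one option is to warm up first and report a median over several runs. The helper below is a hypothetical sketch; in practice you would pass it a function that executes only the SQL query on an already open connection, so that connection setup and the embedding call stay out of the measurement.

```python
import statistics
import time


def median_ms(fn, repeats=5, warmup=1):
    """Call fn a few times untimed, then return the median wall time in ms."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000)
    return statistics.median(samples)


# Stand-in workload; replace with a closure that runs the k-NN query.
print(f"{median_ms(lambda: sum(range(100_000))):.3f} ms")
```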

If you want to measure recall (not just latency), run the same query with the index off and then on, collect the top-10 IDs from each, and divide the size of the intersection by 10. HNSW is approximate; at the default ef_search = 40, recall against an exact scan is often quoted in the 0.95-0.99 range, but it depends on the data and on m and ef_construction, so measure it on your own corpus. It rises toward 1.0 as ef_search grows.
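The recall calculation itself is a set intersection. A minimal sketch, with made-up ID lists standing in for the results of the exact and indexed queries:

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)


# Illustrative IDs only: the indexed query missed two of the true top 10.
exact_top10 = [11, 42, 7, 93, 5, 60, 18, 77, 31, 2]
approx_top10 = [11, 42, 7, 93, 5, 60, 18, 77, 99, 64]

print(recall_at_k(exact_top10, approx_top10))  # 0.8
```

Averaging this over a few dozen representative queries gives a more stable figure than any single query.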

OpenAI embeddings guide showing the text-embedding-3-small model and the embeddings API call shape

Image: OpenAI embeddings guide, used for editorial coverage of the text-embedding-3-small model referenced throughout this tutorial.

Common pitfalls

A few patterns trip up first-time pgvector users.

Forgetting register_vector(conn) on a new connection. Without it psycopg has no adapter for the vector type: values come back from SELECT as strings such as '[0.1,0.2,...]', your distance math breaks in confusing ways, and NumPy arrays passed as parameters cannot be bound. Register on every connection, ideally inside a connection-pool factory.
Using the wrong distance operator for your index. An index created with vector_cosine_ops accelerates <=> queries only; <-> (L2) and <#> (inner product) will fall back to a sequential scan. Match the operator to the index, or build a second index.
Skipping input_type on Cohere. Asymmetric query / document embedding is real; the published recall numbers on Cohere’s docs assume the correct input_type⁶.
Tiny LIMIT with a very selective filter and HNSW. The graph search may not find enough matching rows because the filter eliminates most candidates inside the index walk. Either widen the LIMIT, raise hnsw.ef_search, or restructure as a pre-filter with a B-tree.
Comparing apples to oranges across providers. Cohere embed-v4.0 and OpenAI text-embedding-3-small both return 1536-d vectors at their defaults, but the vector spaces are entirely different. Re-embed the whole corpus if you switch providers; you cannot mix vectors from two models in one index.
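The operator mismatch in particular is easy to guard against in application code. The sketch below maps pgvector's three common distance operators to their operator classes; the dictionary and function are hypothetical helpers, not something the extension provides.

```python
# pgvector distance operators and the operator class an index needs to serve them.
OPCLASS_FOR_OPERATOR = {
    "<->": "vector_l2_ops",      # Euclidean (L2) distance
    "<#>": "vector_ip_ops",      # negative inner product
    "<=>": "vector_cosine_ops",  # cosine distance
}


def index_can_serve(query_operator: str, index_opclass: str) -> bool:
    """True if an index built with index_opclass can accelerate query_operator."""
    return OPCLASS_FOR_OPERATOR.get(query_operator) == index_opclass


print(index_can_serve("<=>", "vector_cosine_ops"))  # True
print(index_can_serve("<->", "vector_cosine_ops"))  # False
```

A check like this in a test suite catches the case where someone changes the query operator without rebuilding the index.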

When to use a different approach

pgvector is the right pick when Postgres is already in your stack, your corpus fits comfortably on one node (low millions to tens of millions of vectors at 1536-d), and you value joins / transactions / metadata filters living next to vectors. The cited Supabase guide, the pgvector README’s tuning notes, and the wider Postgres ecosystem all line up behind this use-case¹ ⁹.

Reach for a dedicated vector database (Qdrant, Weaviate, Milvus, Pinecone) when you need hundreds of millions to billions of vectors on a single index, sharding across nodes with built-in replication, or vector-native features like multi-vector documents and learned-sparse retrieval. The cost is operational: another system to run, monitor, back up, and keep in sync with your transactional store.

The short heuristic: pgvector when “we have Postgres and want semantic search”; dedicated vector DB when “vector search is the workload”.
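To put "fits comfortably on one node" into numbers, a back-of-the-envelope estimate helps. The sketch assumes the per-vector size documented in the pgvector README (4 bytes per dimension plus 8 bytes of overhead) and counts only the raw vector column: it ignores row headers, the other columns, and the HNSW index itself, so treat the result as a lower bound.

```python
def vector_bytes(dimensions: int) -> int:
    """Storage for one single-precision vector: 4 bytes per dimension + 8."""
    return 4 * dimensions + 8


def raw_vector_storage_gib(n_vectors: int, dimensions: int) -> float:
    """Raw size of the vector column alone, in GiB."""
    return n_vectors * vector_bytes(dimensions) / 2**30


print(vector_bytes(1536))  # 6152
for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>11,} vectors: {raw_vector_storage_gib(n, 1536):.1f} GiB")
```

The jump from tens of millions to hundreds of millions of rows is where the vector column alone stops fitting in the memory of a typical single node.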

Where to go next

The pgvector README’s “Performance” section documents the full set of HNSW knobs (m, ef_construction, ef_search), maintenance commands, and parallel-build settings worth knowing before you scale⁹. Supabase’s guide adds managed-Postgres specifics — connection pooling via PgBouncer, row-level security on the documents table, and edge functions that can call into the same search query¹.

Two natural next steps. First, swap the toy corpus for real chunks: a Wikipedia category dump, your own documentation, or a public dataset, chunked at 200-500 tokens with 50-token overlap. Second, layer in a reranker — Cohere’s rerank-v3.5 endpoint takes the top-20 from your pgvector query and reorders them with a cross-encoder, which typically lifts top-3 precision meaningfully on real corpora.
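For the first step, a chunker with overlap can be sketched in a few lines. This version splits on whitespace and treats words as a stand-in for tokens, which is only an approximation of how either provider counts tokens; the function name and defaults are illustrative.

```python
def chunk_words(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size words, sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


article = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(article)
print([len(c.split()) for c in chunks])  # [300, 300, 200]
```

Each chunk then becomes one row in the documents table, with the article title and source repeated across its chunks.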

How this article was made: an autonomous AI pipeline researched, drafted, fact-checked, and reviewed this piece, aggregating publicly-available information from the sources consulted below. AI (artificial intelligence) can make mistakes, so please cross-check the consulted sources before acting on anything here. Neural Tech Daily is not liable for decisions or outcomes based on this article.

Sources consulted

Cited Sources

1. Supabase pgvector guide — managed Postgres with the vector extension preinstalled, dashboard Extensions toggle, and connection-string instructions (accessed 2026-05-20) ↩
2. Cohere pricing page — embed-v4.0 listed at \$0.12 per million input tokens (verify before scaling; rates change) (accessed 2026-05-20) ↩
3. OpenAI API pricing page — text-embedding-3-small listed at \$0.02 per million input tokens (verify before scaling; rates change) (accessed 2026-05-20) ↩
4. pgvector GitHub repository README — official Docker image `pgvector/pgvector:pg16` bundles the extension with PostgreSQL 16 (accessed 2026-05-20) ↩
5. psycopg 3 documentation — `[binary]` extra bundles a prebuilt libpq wheel; the modern Python adapter for PostgreSQL (accessed 2026-05-20) ↩
6. Cohere embeddings API documentation — input_type parameter distinguishes `search_document` from `search_query` for asymmetric retrieval (accessed 2026-05-20) ↩
7. Cohere embeddings API documentation — embed-v4.0 default float embeddings at 1536 dimensions (accessed 2026-05-20) ↩
8. pgvector GitHub README, HNSW section — defaults `m = 16` and `ef_construction = 64`; runtime `hnsw.ef_search` default 40 (accessed 2026-05-20) ↩
9. pgvector GitHub repository — Performance and Tuning section covering index knobs, maintenance, and parallel build settings (accessed 2026-05-20) ↩