SQL for RAG (pgvector + Full‑Text + Structured Filters)

RAG isn’t always “embed everything”.

If your data is structured (tables, metrics, events), SQL is often the best retrieval layer, and the LLM’s job is to:

  • interpret the user intent,
  • summarize the results,
  • produce a helpful answer with citations.

This page covers 3 practical patterns:

  1. SQL-first (structured truth)
  2. Hybrid retrieval (full-text + pgvector) in Postgres
  3. Guardrailed text-to-SQL


Pattern 1: SQL-first (structured truth)

Use SQL-first when:

  • the user asks for counts/sums/filters,
  • correctness matters (finance, analytics),
  • you have a clean schema that answers the question directly.

Example flow

1) Run a SQL query that returns a small, bounded result set
2) Pass the results to the LLM for summarization
3) Cite the rows/ids that supported the answer

Example query (tickets):

SELECT
  status,
  count(*) AS count
FROM tickets
WHERE created_at >= now() - interval '30 days'
GROUP BY status
ORDER BY count DESC
LIMIT 50;

Then give the result rows to the LLM:

{
  "query": "…",
  "rows": [
    {"status": "open", "count": 128},
    {"status": "closed", "count": 92}
  ]
}
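The three steps can be sketched in Python. `run_query` is a hypothetical placeholder for your DB driver (e.g. psycopg), and the LLM call is omitted; only the payload shape comes from the example above:

```python
import json

def run_query(sql: str, params: tuple = ()) -> list[dict]:
    """Hypothetical placeholder: execute a parameterized, read-only
    query with your DB driver and return rows as dicts."""
    raise NotImplementedError

def build_llm_payload(sql: str, rows: list[dict]) -> str:
    """Package the bounded result set so the LLM can summarize it and
    cite the exact rows it used."""
    return json.dumps({"query": sql, "rows": rows}, indent=2)

SQL = """\
SELECT status, count(*) AS count
FROM tickets
WHERE created_at >= now() - interval '30 days'
GROUP BY status
ORDER BY count DESC
LIMIT 50"""

# Stand-in rows, matching the example payload above.
rows = [{"status": "open", "count": 128}, {"status": "closed", "count": 92}]
payload = build_llm_payload(SQL, rows)
```

Because the result set is small and bounded, the whole payload fits comfortably in the prompt, and citations are just the ids/rows echoed back.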

Pattern 2: Hybrid retrieval in Postgres (FTS + pgvector)

Dense vectors are great for fuzzy semantic matches. Full-text search (FTS) is great for:

  • IDs, codes, names
  • exact keywords
  • short queries

With Postgres, you can combine both.

Assumptions (from the pgvector schema):

  • chunks.embedding is a vector(...)
  • chunks.tsv is a tsvector generated from chunks.content

Dense search (pgvector)

SELECT id, content
FROM chunks
ORDER BY embedding <=> $1::vector
LIMIT 50;

Sparse search (full-text)

SELECT id, content
FROM chunks
WHERE tsv @@ websearch_to_tsquery('english', $1)
ORDER BY ts_rank_cd(tsv, websearch_to_tsquery('english', $1)) DESC
LIMIT 50;

Combine results with Reciprocal Rank Fusion (RRF)

RRF is a simple, strong default for hybrid retrieval: it merges ranked lists without requiring model training.

WITH
fts_ranked AS (
  SELECT
    id,
    row_number() OVER (
      ORDER BY ts_rank_cd(tsv, websearch_to_tsquery('english', $1)) DESC
    ) AS rnk
  FROM chunks
  WHERE tsv @@ websearch_to_tsquery('english', $1)
  ORDER BY rnk
  LIMIT 100
),
dense_ranked AS (
  SELECT
    id,
    row_number() OVER (ORDER BY embedding <=> $2::vector) AS rnk
  FROM chunks
  ORDER BY rnk
  LIMIT 100
),
rrf AS (
  SELECT
    id,
    sum(1.0 / (60 + rnk)) AS score
  FROM (
    SELECT id, rnk FROM fts_ranked
    UNION ALL
    SELECT id, rnk FROM dense_ranked
  ) u
  GROUP BY id
)
SELECT chunks.id, chunks.content, rrf.score
FROM rrf
JOIN chunks USING (id)
ORDER BY rrf.score DESC
LIMIT 10;

Notes:

  • The constant 60 is a typical starting point; tune it.
  • Always inspect retrieved chunks first when debugging RAG quality.
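If the two ranked lists come from separate systems (or you prefer to fuse outside the database), the same RRF formula is a few lines of application code. A minimal sketch: `k=60` mirrors the constant in the SQL, each input list is assumed to be ordered best-first, and the ids are made up:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

fts_hits = ["c3", "c1", "c7"]    # full-text results, best first
dense_hits = ["c1", "c9", "c3"]  # vector results, best first
fused = rrf_fuse([fts_hits, dense_hits])
# "c1" and "c3" appear in both lists, so they rise to the top.
```

Note that RRF only looks at ranks, never at raw scores, which is why it needs no score normalization between the FTS and vector lists.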


Pattern 3: Guardrailed text-to-SQL

Text-to-SQL is powerful, but it is dangerous without guardrails.

Hard rules (non-negotiable)

  • Use a read-only database role.
  • Enforce a statement timeout and row limit.
  • Only allow queries against an allowlisted schema.
  • Disallow DDL/DML (CREATE, DROP, INSERT, UPDATE, DELETE, ALTER).
  • Log queries and results for auditing.

A safe response contract

Make the model return structured output (example JSON):

{
  "sql": "SELECT ... WHERE created_at >= $1 LIMIT 100",
  "params": ["2026-01-01"],
  "explanation": "…"
}

Then your application:

  1) validates the SQL (read-only + allowlist)
  2) executes it with parameterization
  3) passes results to the LLM for explanation
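The validation step can be sketched with stdlib Python alone. This is a coarse, illustrative check, not a real SQL parser; in production, prefer a proper parser and, above all, enforce the hard rules (read-only role, statement timeout) in Postgres itself:

```python
import re

ALLOWED_TABLES = {"public.tickets", "public.customers"}
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|create|drop|alter|truncate|grant|revoke|copy)\b",
    re.IGNORECASE,
)
TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-zA-Z_][\w.]*)", re.IGNORECASE)

def validate_sql(sql: str) -> bool:
    """Coarse guardrail: one statement, SELECT-only, allowlisted tables.

    A keyword denylist is defense in depth, not a substitute for a
    read-only role and statement_timeout enforced by the database.
    """
    statement = sql.strip().rstrip(";")
    if ";" in statement:  # reject multi-statement payloads
        return False
    if not statement.lower().startswith("select"):
        return False
    if FORBIDDEN.search(statement):
        return False
    tables = TABLE_REF.findall(statement)
    return all(t.lower() in ALLOWED_TABLES for t in tables)

# Usage: call validate_sql() on the model's "sql" field before executing
# it with bound "params"; on failure, reject and re-prompt the model.
```

Rejected queries should never be repaired automatically by string surgery; re-prompting keeps the model responsible for producing valid SQL.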

Prompt snippet (for the SQL generator)

You write SQL for Postgres. Follow these rules:
- Read-only queries only. Never output INSERT/UPDATE/DELETE/CREATE/DROP/ALTER.
- Use only these tables: public.tickets, public.customers
- Always include a LIMIT <= 200.
- Return JSON with keys: sql, params, explanation.

Next steps