SQL for RAG (pgvector + Full‑Text + Structured Filters)

RAG isn’t always “embed everything”.

If your data is structured (tables, metrics, events), SQL is often the best retrieval layer, and the LLM’s job is to:

  • interpret the user intent,
  • summarize the results,
  • produce a helpful answer with citations.

This page covers 3 practical patterns:

  1. SQL-first (structured truth)
  2. Hybrid retrieval (full-text + pgvector) in Postgres
  3. Guardrailed text-to-SQL


Pattern 1: SQL-first (structured truth)

Use SQL-first when:

  • the user asks for counts/sums/filters,
  • correctness matters (finance, analytics),
  • you have a clean schema that answers the question directly.

Example flow

1) Run a SQL query that returns a small, bounded result set
2) Pass the results to the LLM for summarization
3) Cite the rows/ids that supported the answer

Example query (tickets):

SELECT
  status,
  count(*) AS count
FROM tickets
WHERE created_at >= now() - interval '30 days'
GROUP BY status
ORDER BY count DESC
LIMIT 50;

Then give the result rows to the LLM:

{
  "query": "…",
  "rows": [
    {"status": "open", "count": 128},
    {"status": "closed", "count": 92}
  ]
}
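The three steps can be sketched in Python. `run_query` is a hypothetical placeholder for your DB driver (e.g. psycopg), and the LLM call is omitted; only the payload shape comes from the example above:

```python
import json

def run_query(sql: str, params: tuple = ()) -> list[dict]:
    """Hypothetical placeholder: execute a parameterized, read-only
    query with your DB driver and return rows as dicts."""
    raise NotImplementedError

def build_llm_payload(sql: str, rows: list[dict]) -> str:
    """Package the bounded result set so the LLM can summarize it and
    cite the exact rows it used."""
    return json.dumps({"query": sql, "rows": rows}, indent=2)

SQL = """\
SELECT status, count(*) AS count
FROM tickets
WHERE created_at >= now() - interval '30 days'
GROUP BY status
ORDER BY count DESC
LIMIT 50"""

# Stand-in rows, matching the example payload above.
rows = [{"status": "open", "count": 128}, {"status": "closed", "count": 92}]
payload = build_llm_payload(SQL, rows)
```

Because the result set is small and bounded, the whole payload fits comfortably in the prompt, and citations are just the ids/rows echoed back.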

Pattern 2: Hybrid retrieval in Postgres (FTS + pgvector)

Dense vectors are great for fuzzy semantic matches. Full-text search (FTS) is great for:

  • IDs, codes, names
  • exact keywords
  • short queries

With Postgres, you can combine both.

Assumptions (from the pgvector schema):

  • chunks.embedding is a vector(...)
  • chunks.tsv is a tsvector generated from chunks.content

Dense search (pgvector)

SELECT id, content
FROM chunks
ORDER BY embedding <=> $1::vector
LIMIT 50;

Sparse search (full-text)

SELECT id, content
FROM chunks
WHERE tsv @@ websearch_to_tsquery('english', $1)
ORDER BY ts_rank_cd(tsv, websearch_to_tsquery('english', $1)) DESC
LIMIT 50;

Combine results with Reciprocal Rank Fusion (RRF)

RRF is a simple, strong default for hybrid retrieval: it merges ranked lists without requiring model training.

WITH
fts_ranked AS (
  SELECT
    id,
    row_number() OVER (
      ORDER BY ts_rank_cd(tsv, websearch_to_tsquery('english', $1)) DESC
    ) AS rnk
  FROM chunks
  WHERE tsv @@ websearch_to_tsquery('english', $1)
  ORDER BY rnk
  LIMIT 100
),
dense_ranked AS (
  SELECT
    id,
    row_number() OVER (ORDER BY embedding <=> $2::vector) AS rnk
  FROM chunks
  ORDER BY rnk
  LIMIT 100
),
rrf AS (
  SELECT
    id,
    sum(1.0 / (60 + rnk)) AS score
  FROM (
    SELECT id, rnk FROM fts_ranked
    UNION ALL
    SELECT id, rnk FROM dense_ranked
  ) u
  GROUP BY id
)
SELECT chunks.id, chunks.content, rrf.score
FROM rrf
JOIN chunks USING (id)
ORDER BY rrf.score DESC
LIMIT 10;

Notes:

  • The constant 60 is a typical starting point; tune it.
  • Always inspect retrieved chunks first when debugging RAG quality.
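If the two ranked lists come from separate systems (or you prefer to fuse outside the database), the same RRF formula is a few lines of application code. A minimal sketch: `k=60` mirrors the constant in the SQL, each input list is assumed to be ordered best-first, and the ids are made up:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

fts_hits = ["c3", "c1", "c7"]    # full-text results, best first
dense_hits = ["c1", "c9", "c3"]  # vector results, best first
fused = rrf_fuse([fts_hits, dense_hits])
# "c1" and "c3" appear in both lists, so they rise to the top.
```

Note that RRF only looks at ranks, never at raw scores, which is why it needs no score normalization between the FTS and vector lists.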


Pattern 3: Guardrailed text-to-SQL

Text-to-SQL is powerful, but it is dangerous without guardrails.

Hard rules (non-negotiable)

  • Use a read-only database role.
  • Enforce a statement timeout and row limit.
  • Only allow queries against an allowlisted schema.
  • Disallow DDL/DML (CREATE, DROP, INSERT, UPDATE, DELETE, ALTER).
  • Log queries and results for auditing.

A safe response contract

Make the model return structured output (example JSON):

{
  "sql": "SELECT ... WHERE created_at >= $1 LIMIT 100",
  "params": ["2026-01-01"],
  "explanation": "…"
}

Then your application:

  1) validates the SQL (read-only + allowlist)
  2) executes it with parameterization
  3) passes results to the LLM for explanation
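The validation step can be sketched with stdlib Python alone. This is a coarse, illustrative check, not a real SQL parser; in production, prefer a proper parser and, above all, enforce the hard rules (read-only role, statement timeout) in Postgres itself:

```python
import re

ALLOWED_TABLES = {"public.tickets", "public.customers"}
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|create|drop|alter|truncate|grant|revoke|copy)\b",
    re.IGNORECASE,
)
TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-zA-Z_][\w.]*)", re.IGNORECASE)

def validate_sql(sql: str) -> bool:
    """Coarse guardrail: one statement, SELECT-only, allowlisted tables.

    A keyword denylist is defense in depth, not a substitute for a
    read-only role and statement_timeout enforced by the database.
    """
    statement = sql.strip().rstrip(";")
    if ";" in statement:  # reject multi-statement payloads
        return False
    if not statement.lower().startswith("select"):
        return False
    if FORBIDDEN.search(statement):
        return False
    tables = TABLE_REF.findall(statement)
    return all(t.lower() in ALLOWED_TABLES for t in tables)

# Usage: call validate_sql() on the model's "sql" field before executing
# it with bound "params"; on failure, reject and re-prompt the model.
```

Rejected queries should never be repaired automatically by string surgery; re-prompting keeps the model responsible for producing valid SQL.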

Prompt snippet (for the SQL generator)

You write SQL for Postgres. Follow these rules:
- Read-only queries only. Never output INSERT/UPDATE/DELETE/CREATE/DROP/ALTER.
- Use only these tables: public.tickets, public.customers
- Always include a LIMIT <= 200.
- Return JSON with keys: sql, params, explanation.

Next steps