# SQL for RAG (pgvector + Full‑Text + Structured Filters)
RAG isn’t always “embed everything”.
If your data is structured (tables, metrics, events), SQL is often the best retrieval layer, and the LLM’s job is to:

- interpret the user intent,
- summarize results,
- and produce a helpful answer with citations.
This page covers three practical patterns:

1. SQL-first (structured truth)
2. Hybrid retrieval (full-text + pgvector) in Postgres
3. Guardrailed text-to-SQL
## Pattern 1: SQL-first (structured truth)
Use SQL-first when:

- the user asks for counts, sums, or filters
- correctness matters (finance, analytics)
- you have a clean schema that answers the question directly
### Example flow
1) Run a SQL query that returns a small, bounded result set
2) Pass the results to the LLM for summarization
3) Cite the rows/ids that supported the answer
Example query (tickets):
```sql
SELECT
  status,
  count(*) AS count
FROM tickets
WHERE created_at >= now() - interval '30 days'
GROUP BY status
ORDER BY count DESC
LIMIT 50;
```
Then give the result rows to the LLM:
```json
{
  "query": "…",
  "rows": [
    {"status": "open", "count": 128},
    {"status": "closed", "count": 92}
  ]
}
```
## Pattern 2: Hybrid retrieval in Postgres (FTS + pgvector)
Dense vectors are great for fuzzy semantic matches. Full-text search (FTS) is great for:

- IDs, codes, and names
- exact keywords
- short queries
With Postgres, you can combine both.
Assumptions (from the pgvector schema):

- `chunks.embedding` is a `vector(...)` column
- `chunks.tsv` is a `tsvector` generated from `chunks.content`
### Dense search (pgvector)
```sql
SELECT id, content
FROM chunks
ORDER BY embedding <=> $1::vector
LIMIT 50;
```
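One practical detail when binding `$1`: pgvector's text input format is a bracketed list, so an application-side embedding (a list of floats) is typically serialized before binding. A minimal sketch (`to_pgvector` is an illustrative helper; drivers can also register a dedicated pgvector adapter instead):

```python
def to_pgvector(embedding: list[float]) -> str:
    """Serialize an embedding into pgvector's text input format,
    e.g. [0.1,0.25,-0.5], suitable for binding as $1::vector."""
    return "[" + ",".join(repr(x) for x in embedding) + "]"

param = to_pgvector([0.1, 0.25, -0.5])
# Pass `param` as the query parameter; the ::vector cast parses it.
```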
### Sparse search (full-text)
```sql
SELECT id, content
FROM chunks
WHERE tsv @@ websearch_to_tsquery('english', $1)
ORDER BY ts_rank_cd(tsv, websearch_to_tsquery('english', $1)) DESC
LIMIT 50;
```
### Combine results with Reciprocal Rank Fusion (RRF)
RRF is a simple, strong default for hybrid retrieval: it merges ranked lists without requiring model training.
```sql
WITH
fts_ranked AS (
  SELECT
    id,
    row_number() OVER (
      ORDER BY ts_rank_cd(tsv, websearch_to_tsquery('english', $1)) DESC
    ) AS rnk
  FROM chunks
  WHERE tsv @@ websearch_to_tsquery('english', $1)
  ORDER BY rnk
  LIMIT 100
),
dense_ranked AS (
  SELECT
    id,
    row_number() OVER (ORDER BY embedding <=> $2::vector) AS rnk
  FROM chunks
  ORDER BY rnk
  LIMIT 100
),
rrf AS (
  SELECT
    id,
    sum(1.0 / (60 + rnk)) AS score
  FROM (
    SELECT id, rnk FROM fts_ranked
    UNION ALL
    SELECT id, rnk FROM dense_ranked
  ) u
  GROUP BY id
)
SELECT chunks.id, chunks.content, rrf.score
FROM rrf
JOIN chunks USING (id)
ORDER BY rrf.score DESC
LIMIT 10;
```

Note the `ORDER BY rnk` inside each CTE: a `LIMIT` without an `ORDER BY` would return an arbitrary 100 rows, not the top-ranked ones.
Notes:
- The constant 60 is a typical starting point; tune it.
- Always inspect retrieved chunks first when debugging RAG quality.
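When the dense and sparse searches are issued as separate queries, the same fusion can run in application code. A minimal sketch of RRF over ranked id lists (the function name and the `k = 60` default are illustrative):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists with Reciprocal Rank Fusion.

    Each item scores sum(1 / (k + rank)) across the lists it
    appears in; rank starts at 1 for the top result.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, item in enumerate(ranked, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

fused = rrf_fuse([["a", "b"], ["b", "c"]])
# "b" ranks first: it appears near the top of both lists.
```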
## Pattern 3: Guardrailed text-to-SQL
Text-to-SQL is powerful and dangerous if you don’t add guardrails.
### Hard rules (non-negotiable)
- Use a read-only database role.
- Enforce a statement timeout and row limit.
- Only allow queries against an allowlisted schema.
- Disallow DDL/DML (`CREATE`, `DROP`, `INSERT`, `UPDATE`, `DELETE`, `ALTER`).
- Log queries and results for auditing.
### A safe response contract
Make the model return structured output (example JSON):
```json
{
  "sql": "SELECT ... WHERE created_at >= $1 LIMIT 100",
  "params": ["2026-01-01"],
  "explanation": "…"
}
```
Then your application:

1. validates the SQL (read-only + allowlist)
2. executes it with parameterization
3. passes results to the LLM for explanation
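Step 1 can start with a coarse static check; a minimal sketch (`validate_sql` is an illustrative name, and this is defense in depth on top of the read-only role, timeouts, and schema allowlist, never a replacement for them):

```python
import re

FORBIDDEN = re.compile(
    r"\b(insert|update|delete|create|drop|alter|truncate|grant|revoke)\b",
    re.IGNORECASE,
)

def validate_sql(sql: str) -> bool:
    """Coarse pre-flight check for model-generated SQL.

    Rejects multi-statement input, non-SELECT statements, forbidden
    DDL/DML keywords, and queries without a LIMIT.
    """
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:             # more than one statement
        return False
    if not re.match(r"(?i)^\s*select\b", stmt):
        return False
    if FORBIDDEN.search(stmt):  # DDL/DML keywords anywhere
        return False
    if not re.search(r"(?i)\blimit\b", stmt):
        return False
    return True
```

Note that word-boundary matching keeps column names like `created_at` from tripping the `create` check, but a regex pass cannot catch everything; the database role is the real boundary.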
### Prompt snippet (for the SQL generator)
```text
You write SQL for Postgres. Follow these rules:
- Read-only queries only. Never output INSERT/UPDATE/DELETE/CREATE/DROP/ALTER.
- Use only these tables: public.tickets, public.customers
- Always include a LIMIT <= 200.
- Return JSON with keys: sql, params, explanation.
```
## Next steps
- If your data is tabular, decide whether to embed it: Parsing CSVs
- Improve retrieval quality systematically: Evaluating RAG