PostgreSQL — Queries & Performance

For baseline SQL syntax see 05 — Basic Query Commands. This file focuses on query diagnostics, indexing, and performance patterns.

EXPLAIN ANALYZE

Main tool for understanding why a query is fast or slow.

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE user_id = 42;

Reading the Output

Index Scan using idx_orders_user on orders  (cost=0.43..8.45 rows=5 width=64)
  (actual time=0.023..0.031 rows=5 loops=1)
  Index Cond: (user_id = 42)
  Buffers: shared hit=4
Planning Time: 0.102 ms
Execution Time: 0.051 ms

Key Metric	What to Look For
Seq Scan	Reads full table; often means index is missing
Index Scan	Uses index; ideal for selective filters
Bitmap Heap Scan	Index found many rows; fetches in bulk then rechecks
Rows Removed by Filter	Much bigger than returned rows = poor filtering path
shared hit	Data came from cache (fast)
shared read	Data came from disk (slower)
actual time	Real wall-clock time at each step

Red Flags

Problem	Symptom	Fix
Missing index	`Seq Scan` on large table	`CREATE INDEX` on filter columns
Wrong index	Index exists but not used	Check column order, run `ANALYZE`
Too many rows filtered	`Rows Removed by Filter` >> returned	Add or adjust index
Sorting on disk	`Sort Method: external merge`	Add index matching ORDER BY; tune `work_mem`
Hash join spill	`Batches: N` (N > 1)	Increase `work_mem` or reduce result set

Indexing Strategy

CREATE INDEX idx_orders_status ON orders (status);

CREATE INDEX idx_orders_user_status ON orders (user_id, status);

CREATE INDEX idx_orders_covering ON orders (user_id) INCLUDE (total, status);

CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending';

Strategy	When to Use
Single-column	One column in WHERE
Composite	Multi-column WHERE; most selective column first
Covering (`INCLUDE`)	Avoids table lookup for extra columns
Partial (`WHERE`)	Only a known subset of rows is queried

80/20 rule: Most slowdowns come from missing or incorrect indexes. Always verify with EXPLAIN ANALYZE.

Advanced Filtering Performance

WHERE metadata @> '{"plan": "pro"}'         -- JSONB containment (needs GIN index)
WHERE tags && ARRAY['python', 'rust']       -- Array overlap (needs GIN index)
WHERE to_tsvector('english', body) @@ to_tsquery('postgres & performance')

Pattern	Required Index Type
JSONB containment (`@>`)	`USING gin (col)`
Array overlap (`&&`)	`USING gin (col)`
Full-text search (`@@`)	`USING gin (to_tsvector(...))`
Geospatial	`USING gist (col)`

Pagination

SELECT * FROM users ORDER BY id LIMIT 20 OFFSET 100;

SELECT * FROM users WHERE id > 100 ORDER BY id LIMIT 20;

Method	Pros	Cons
Offset-based	Simple, any page jump	Slow on large offsets (scans skipped rows)
Keyset (cursor)	Constant speed, stable ordering	Cannot jump to arbitrary page

Prefer keyset pagination for large datasets.

Window Functions Performance

SELECT id, total,
       row_number() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
FROM orders;

Window functions run after WHERE/GROUP BY, so filter early to reduce the working set.
Adding an index on (partition_col, order_col) helps avoid in-memory sorts.
row_number() is cheaper than rank() / dense_rank() when you only need position.

Common Anti-Patterns

Anti-Pattern	Why It Hurts	Better Approach
`SELECT *` in production	Transfers unnecessary data	List needed columns explicitly
`ORDER BY random()`	Full table scan + sort	Sample with `TABLESAMPLE` or app-side random IDs
`count(*)` on huge table	Scans all rows	Use `pg_stat_user_tables.n_live_tup` for estimates
Nested subqueries	Hard to optimize	Rewrite as JOINs or CTEs
`NOT IN (subquery)` with NULLs	Returns no rows if subquery has NULL	Use `NOT EXISTS` instead