PostgreSQL Performance Optimization: From Basics to Advanced
Master PostgreSQL performance tuning with indexing strategies, query optimization, connection pooling, and monitoring techniques for production databases.

Introduction
PostgreSQL is a powerful database, but without proper optimization, even simple queries can become bottlenecks. Whether you're dealing with slow queries, high CPU usage, or connection issues, understanding PostgreSQL's internals and optimization techniques is crucial. This comprehensive guide covers everything from basic indexing to advanced query optimization, helping you build fast, scalable database architectures that can handle millions of records efficiently.
Understanding EXPLAIN and Query Plans
Before optimizing, you need to understand what's slow. EXPLAIN and EXPLAIN ANALYZE show you exactly how PostgreSQL executes your queries. The query planner picks what it estimates to be the cheapest execution path, but its estimates can be off, so reading plans is how you find bottlenecks. Look for sequential scans on large tables, expensive sorts, and nested loops over large datasets; these are often optimization opportunities.
-- Basic EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- EXPLAIN ANALYZE (actually runs the query)
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5;
-- Look for these red flags:
-- ❌ Seq Scan on large tables
-- ❌ High cost values
-- ❌ Nested Loop on large datasets
-- ✅ Index Scan or Index Only Scan
-- ✅ Hash Join or Merge Join
Indexing Strategies
Indexes are your first line of defense against slow queries. B-tree indexes (default) work for most cases, but specialized indexes can dramatically improve specific query types. Use partial indexes for frequently filtered subsets, composite indexes for multi-column queries, and covering indexes to enable index-only scans. Remember: indexes speed up reads but slow down writes, so index strategically based on your query patterns.
-- Basic B-tree index (most common)
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
CREATE INDEX idx_orders_user_created
  ON orders(user_id, created_at DESC);

-- Partial index (smaller, faster)
CREATE INDEX idx_active_users
  ON users(email)
  WHERE active = true;

-- Covering index (index-only scan)
CREATE INDEX idx_users_email_name
  ON users(email) INCLUDE (name);

-- GIN index for full-text search
CREATE INDEX idx_posts_content
  ON posts USING GIN(to_tsvector('english', content));

-- BRIN index for sequential data
CREATE INDEX idx_logs_created
ON logs USING BRIN(created_at);
Query Optimization Techniques
Writing efficient queries is an art. Avoid SELECT *; explicit column lists reduce data transfer and make index-only scans possible. Replace correlated subqueries with JOINs when possible, as they are usually faster. Prefer EXISTS over IN when the subquery can return many rows, and use CTEs (Common Table Expressions) to make complex queries readable (note that before PostgreSQL 12, CTEs were always materialized, which could hurt performance). Always filter as early as possible to reduce the working dataset.
-- ❌ Bad: Subquery in SELECT
SELECT
  u.name,
  (SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count
FROM users u;

-- ✅ Good: Use JOIN
SELECT
  u.name,
  COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

-- ❌ Bad: Using IN with subquery
SELECT * FROM products
WHERE category_id IN (
  SELECT id FROM categories WHERE active = true
);

-- ✅ Good: Use EXISTS or JOIN
SELECT p.* FROM products p
WHERE EXISTS (
  SELECT 1 FROM categories c
  WHERE c.id = p.category_id AND c.active = true
);
-- Use LIMIT with pagination
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 0; -- First page
Connection Pooling
Database connections are expensive to create. Connection pooling reuses connections, dramatically improving performance under load. In Node.js, use pg-pool or Prisma's built-in pooling. Configure pool size based on your database's max connections and application's concurrency needs. A common formula: pool size = (core_count * 2) + effective_spindle_count. Monitor connection usage and adjust as needed.
import { Pool } from 'pg';

const pool = new Pool({
  host: process.env.DB_HOST,
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  // Pool configuration
  max: 20,                       // Maximum pool size
  min: 5,                        // Minimum pool size
  idleTimeoutMillis: 30000,      // Close idle connections after 30s
  connectionTimeoutMillis: 2000, // Fail if no connection within 2s
});

// Using the pool
const getUser = async (id: string) => {
  const client = await pool.connect();
  try {
    const result = await client.query(
      'SELECT * FROM users WHERE id = $1',
      [id]
    );
    return result.rows[0];
  } finally {
    client.release(); // Always release!
  }
};
Vacuuming and Maintenance
PostgreSQL uses MVCC (Multi-Version Concurrency Control), which leaves dead tuples behind as rows are updated and deleted. VACUUM marks the space they occupy as reusable (it does not shrink the table file; VACUUM FULL does, but it rewrites the table under an exclusive lock), while ANALYZE updates the statistics the query planner uses to choose plans. Autovacuum runs both automatically, but high-write tables may need tuned thresholds or occasional manual runs. Run VACUUM ANALYZE after large data changes, such as bulk loads or mass deletes.
-- Vacuum and analyze specific table
VACUUM ANALYZE users;
-- Full vacuum (locks table, use carefully)
VACUUM FULL users;
-- Check table bloat
SELECT
  schemaname,
  relname,
  pg_size_pretty(pg_total_relation_size(relid)) as size,
  n_dead_tup,
  n_live_tup,
  round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
-- Enable autovacuum (in postgresql.conf)
autovacuum = on
autovacuum_naptime = 1min
Monitoring and Metrics
You can't optimize what you don't measure. Monitor key metrics: slow queries (log queries over 100ms), cache hit ratio (should be >99%), connection count, and lock waits. Use pg_stat_statements extension to identify problematic queries. Set up alerts for connection pool exhaustion, high replication lag, and disk space. Tools like pgAdmin, DataDog, or Prometheus can help visualize these metrics.
-- Enable pg_stat_statements
-- (the module must first be added to shared_preload_libraries in postgresql.conf)
CREATE EXTENSION pg_stat_statements;
-- Find slowest queries (column names are for PostgreSQL 13+)
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Check cache hit ratio
SELECT
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit) as heap_hit,
  sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) as ratio
FROM pg_statio_user_tables;
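Replication lag, mentioned above as an alerting target, can be checked from the primary. A sketch assuming PostgreSQL 10+ and at least one streaming replica:

```sql
-- Replication lag per standby, in bytes behind the primary's current WAL position
SELECT
  client_addr,
  state,
  pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as replay_lag_bytes
FROM pg_stat_replication;
```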
-- Active connections
SELECT count(*) FROM pg_stat_activity
WHERE state = 'active';
Partitioning for Scale
When tables grow to millions of rows, partitioning can significantly improve performance. Partition by range (dates), list (categories), or hash (distributed load). Queries that filter on the partition key only scan relevant partitions, dramatically reducing I/O. Common use case: partitioning logs or time-series data by month. Partitioning also makes archiving old data easier.
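As a sketch of the time-series case above (table and partition names are illustrative), declarative range partitioning by month looks like this:

```sql
-- Parent table: rows are routed to partitions by created_at
CREATE TABLE logs_partitioned (
  id bigserial,
  created_at timestamptz NOT NULL,
  message text
) PARTITION BY RANGE (created_at);

-- One partition per month; queries filtering on created_at
-- touch only the matching partitions (partition pruning)
CREATE TABLE logs_2024_01 PARTITION OF logs_partitioned
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE logs_2024_02 PARTITION OF logs_partitioned
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Archiving: detach an old partition instead of deleting rows
ALTER TABLE logs_partitioned DETACH PARTITION logs_2024_01;
```

Declarative partitioning requires PostgreSQL 10 or later, and the partition key must be included in any primary key or unique constraint on the parent table.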
Conclusion
PostgreSQL optimization is an ongoing process. Start with proper indexing and query optimization, implement connection pooling, and establish monitoring to identify issues early. Use EXPLAIN to understand query execution, vacuum regularly to maintain performance, and consider partitioning for very large tables. Remember that premature optimization is still a concern—measure first, then optimize based on real bottlenecks. With these techniques, your PostgreSQL database can efficiently handle millions of records while maintaining sub-second query times.