Index Bloat in Postgres: Why It Matters, How to Identify, How to Resolve

By Kendra Little • 7 min read

Category: indexing, postgresql

Index bloat in Postgres can cause problems, but it’s easy to miss.

I’ve written about how vacuum problems can prevent PostgreSQL from using covering indexes, and index bloat is one of the things that can make vacuum struggle.

Here’s what you need to know about index bloat, how to find it, and how to fix it.

Illustration representing index bloat in PostgreSQL

What Is Index Bloat?

“Index bloat” describes an index that takes up significantly more space on disk than it needs. This happens because Postgres stores indexes in fixed-size 8 KB pages, which can accumulate empty space over time.

When you delete or update rows, the old index entries become “dead tuples”: they’re no longer valid, but they remain on the page taking up space. Vacuum marks that space as available for reuse, but it doesn’t compact the pages themselves. You end up with pages that have a lot of empty space scattered throughout them.

🔥 VAC FACT: VACUUM FULL can remove bloat, but it requires heavy locks and rewrites the whole table, so it isn't a workable option for most databases unless there is a regular, lengthy downtime window and you don't mind the IO.
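For reference, here’s what that looks like (the table name is a placeholder):

```sql
-- VACUUM FULL rewrites the table and all of its indexes into new,
-- compacted files, returning freed space to the operating system.
-- It holds an ACCESS EXCLUSIVE lock the whole time, blocking all
-- reads and writes against the table:
VACUUM FULL my_big_table;
```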

How Index Bloat Happens

Index bloat accumulates through normal database operations:

  • Deletes: When you delete rows, the index entries pointing to those rows become dead tuples, but they stay in the index until vacuum runs
  • Updates: Postgres implements updates as delete-plus-insert under MVCC (Multi-Version Concurrency Control). The old index entry becomes dead, and Postgres creates a new one
  • Page splits: When a B-tree index page becomes full, PostgreSQL splits it into two pages, each roughly half full, which leaves free space behind
  • Low fillfactor: You can reduce page splits by setting a lower fillfactor when creating an index, which leaves more free space on each page (B-tree indexes default to 90%). This isn’t the same thing as bloat (it’s on purpose), but going too low can have similar impacts
  • High update/delete activity: Tables with frequent updates or deletes accumulate bloat faster
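As a sketch of the fillfactor tradeoff described above (the index and table names here are hypothetical):

```sql
-- Leave 30% free space on each leaf page for future inserts and
-- updates, instead of the B-tree default of 90% full:
CREATE INDEX idx_orders_user ON orders (user_id) WITH (fillfactor = 70);

-- Changing fillfactor on an existing index only affects pages written
-- afterward; run REINDEX to apply it to the whole index:
ALTER INDEX idx_orders_user SET (fillfactor = 70);
REINDEX INDEX idx_orders_user;
```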

To reiterate what’s mentioned above (it’s a common misunderstanding): autovacuum removes dead tuples from tables and marks the space they used in indexes as available for reuse, but it doesn’t compact index pages or shrink the index on disk.
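One way to watch this play out is to check per-table dead tuple counts and autovacuum timing in the standard statistics view (a sketch; what counts as “too many” is workload-dependent):

```sql
-- Tables with the most dead tuples waiting for vacuum to process
SELECT relname,
       n_live_tup,
       n_dead_tup,        -- dead tuples not yet removed
       last_autovacuum    -- when autovacuum last finished on this table
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```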

Why Index Bloat Causes Problems

When you have significant index bloat, vacuum has to work harder. Vacuum needs to scan indexes to find and remove references to dead tuples. If your indexes are bloated, they contain more pages because the pages aren’t compacted: an index that should be 10 GB might be 15 GB or more, and the excess grows with index size. Vacuum has to scan through all the extra pages, which takes more time and consumes more CPU and IO resources. The more bloat, the more pages to scan, and the longer vacuum takes.

In systems with heavy write workloads, this can create a vicious cycle: vacuum runs slowly because of bloat, which means it can’t keep up with new dead tuples, which creates more bloat, etc.
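If you suspect you’re in that cycle, you can watch a running vacuum’s progress; the phase column shows “vacuuming indexes” while it works through index pages:

```sql
SELECT p.pid,
       c.relname,
       p.phase,                -- e.g. 'scanning heap', 'vacuuming indexes'
       p.heap_blks_scanned,
       p.heap_blks_total,
       p.index_vacuum_count    -- index-vacuum cycles completed in this run
FROM pg_stat_progress_vacuum AS p
JOIN pg_class AS c ON c.oid = p.relid;
```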

I’ve seen cases where vacuum was taking hours on tables that should have been much quicker to vacuum, all because the indexes were severely bloated.

How to Identify Index Bloat

AWS provides a useful script for calculating index bloat percentage on RDS PostgreSQL instances in its “Rebuilding indexes” documentation. The script calculates bloat_pct for each index and helps you identify which indexes need attention.

The script compares the actual index size to the expected size based on the number of tuples, giving you a percentage of bloat. AWS recommends running REINDEX when the bloat percentage is greater than 20%, but let your conscience be your guide.

How to Fix Index Bloat

Postgres supports online index rebuilds using REINDEX CONCURRENTLY. Under concurrent mode, the index remains available for reads and writes during the rebuild operation. The PostgreSQL documentation explains that REINDEX CONCURRENTLY builds a new index alongside the existing one, performs multiple passes to capture changes made during the rebuild, then switches constraints and names to point to the new index before dropping the old one. This process takes longer than a standard rebuild but allows normal operations to continue throughout.

Here’s the syntax:

REINDEX INDEX CONCURRENTLY index_name;

Or for all indexes on a table:

REINDEX TABLE CONCURRENTLY table_name;

The CONCURRENTLY keyword makes it online. Without it, REINDEX takes an exclusive lock on the table, blocking all access.
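Even a concurrent rebuild briefly needs locks at certain points, so one common precaution (a sketch, with a hypothetical index name) is to cap how long it will wait, so it errors out instead of queueing behind long-running transactions:

```sql
-- Fail fast rather than wait indefinitely for the brief locks the
-- concurrent rebuild needs; a failed attempt can be retried later.
SET lock_timeout = '5s';
REINDEX INDEX CONCURRENTLY idx_orders_user;
```

Keep in mind that a rebuild canceled this way can leave behind an invalid index, covered next.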

A Gotcha: Failed Concurrent Rebuilds Leave Invalid Indexes

If a REINDEX CONCURRENTLY operation fails partway through, Postgres leaves behind an “invalid” index: according to the PostgreSQL documentation, a transient index with a name ending in _ccnew. This index won’t function for reads (but still adds overhead to writes), so drop it before trying the rebuild again.

You can find invalid indexes by looking for index names ending in _ccnew:

SELECT
    schemaname,
    indexname,
    indexdef
FROM pg_indexes
WHERE schemaname = 'public'
/* Concurrent rebuilds create temporary indexes named like this */
AND (
    indexname LIKE '%\_ccnew'
    OR indexname NOT IN (
        SELECT indexrelname
        FROM pg_stat_user_indexes
    )
);

Or query the pg_index, pg_class, and pg_namespace system catalogs to find indexes marked as invalid:

SELECT
    n.nspname AS schema_name,
    c.relname AS index_name,
    pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_index i
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE NOT i.indisvalid
AND n.nspname = 'public';

If you find invalid indexes from a failed concurrent rebuild, drop them before trying again. You can use DROP INDEX or DROP INDEX CONCURRENTLY:

DROP INDEX CONCURRENTLY schema_name.invalid_index_name;

The CONCURRENTLY keyword avoids blocking by not requiring an ACCESS EXCLUSIVE lock on the table.

Sample Scripts to Reproduce Index Bloat

Here’s a script you can use on a test database to create and observe index bloat.

🔥 Note: This demo creates a small table that you probably wouldn't worry about in a production environment, but it'll give you an idea of how this works. The script requires the pgstattuple extension, and is not as comprehensive or efficient as the script linked in the How to Identify Index Bloat section above.

CREATE EXTENSION IF NOT EXISTS pgstattuple;

DROP TABLE IF EXISTS test_bloat;

CREATE TABLE test_bloat (
    id integer PRIMARY KEY,
    user_id integer NOT NULL,
    order_date date NOT NULL,
    status_code integer NOT NULL
);

CREATE INDEX idx_test_bloat_multi ON test_bloat (user_id, order_date, status_code);

/* Insert some sample data */
INSERT INTO test_bloat (id, user_id, order_date, status_code)
SELECT 
    generate_series(1, 1000000),
    (random() * 10000)::integer,
    CURRENT_DATE - (random() * 365)::integer,
    (random() * 10)::integer;

/* Check index size and empty space on pages */
SELECT
    s.schemaname,
    s.indexrelname AS indexname,
    /* Total size of the index on disk */
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
    /* Total number of pages: leaf pages + internal pages + empty pages + deleted pages */
    (stats.leaf_pages + stats.internal_pages + stats.empty_pages + stats.deleted_pages) AS total_pages,
    /* Average density of leaf pages (0-100, lower = more empty space on pages) */
    ROUND(stats.avg_leaf_density::numeric, 1) AS avg_leaf_density,
    /* Empty space on leaf pages based on current density */
    /* If density is 50%, then 50% of leaf page space is empty */
    pg_size_pretty(
        (
            stats.leaf_pages * current_setting('block_size')::bigint * 
            (100.0 - stats.avg_leaf_density) / 100.0
        )::bigint
    ) AS empty_space,
    /* Percentage of space on leaf pages that's currently empty */
    ROUND(
        (100.0 - stats.avg_leaf_density)::numeric,
        1
    ) AS empty_space_pct
FROM pg_stat_user_indexes AS s
/* Use pgstatindex() to get actual page-level statistics from the index */
CROSS JOIN LATERAL pgstatindex(s.schemaname || '.' || s.indexrelname) AS stats
WHERE s.relname = 'test_bloat';

Here’s the initial size and density: the primary key pages are 90.1% full, and the multi-column index pages are 70.1% full.

Query results showing initial index size with high density and minimal empty space before any deletes
/* Delete a large portion of rows */
DELETE FROM test_bloat WHERE id % 2 = 0;

/* You can recheck the size using the query above, but nothing will
be different until you run VACUUM. */

/* Run VACUUM ANALYZE  */
VACUUM ANALYZE test_bloat;

/* Recheck the size using the query above */

Now here’s our index size and empty space: the primary key pages are 45% full and the multi-column index pages are 35.5% full.

Query results showing increased empty space and lower density after deleting rows and running VACUUM ANALYZE
/* Rebuild index concurrently to reclaim space */
REINDEX INDEX CONCURRENTLY idx_test_bloat_multi;
REINDEX INDEX CONCURRENTLY test_bloat_pkey;

/* Recheck the size using the query above */

Here’s the size and empty space after rebuilds: each index is 90% full.

Query results showing reduced index size and higher density after running REINDEX CONCURRENTLY to reclaim space
/* Clean up */
DROP TABLE test_bloat;

Index Bloat Creeps Up Gradually

Index bloat accumulates quietly through normal operations, so it’s easy to miss until vacuum times balloon. Have you checked your indexes for bloat? Have you implemented automated maintenance to handle it?