By Kendra Little on June 26, 2025
Dear Postgres, Why won’t you use my covering index?
Lately I’ve been learning to tune queries running against PostgreSQL, and it’s pretty delightful. One fun question that I worked through struck me as something that most Postgres users probably encounter one way or another: sometimes you may create the perfect index that covers a given query, but the query planner will choose to ignore it and scan the base table.
Why in the world would it do that?

The perfect index
Let’s say you have a relatively large table – something like the Posts table in the Stack Overflow database. You want to run a query to count the number of posts that are questions created after a specific date that have an AnswerCount of 0.
The query for this would use multiple columns:
- PostTypeId with an equality predicate (Questions have PostTypeId=1)
- AnswerCount with an equality predicate (AnswerCount = 0)
- CreationDate with a range predicate (> Date Value)
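Roughly speaking, the query looks something like this (a sketch, not an exact statement; the date literal is just an example, and the unquoted identifiers assume the Stack Overflow tables were loaded without quoted, case-sensitive names):

```sql
-- Count unanswered questions created after a given date.
-- The date is illustrative; adjust identifiers if your import used quoted names.
SELECT COUNT(*)
FROM Posts
WHERE PostTypeId = 1          -- questions
  AND AnswerCount = 0         -- no answers yet
  AND CreationDate > '2018-06-01';
```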
To “cover” this query, you create a btree index on the table with the index keys PostTypeId, AnswerCount, and CreationDate, in that order.
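Something along these lines (a sketch; the index name is my own invention):

```sql
-- Btree index with all three predicate columns as keys, in the order described.
CREATE INDEX ix_posts_posttypeid_answercount_creationdate
    ON Posts (PostTypeId, AnswerCount, CreationDate);
```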
This index is built and is a nice, beautiful structure with all of those columns. This structure is ordered so that all the rows where PostTypeId = 1 are together, then they are ordered by AnswerCount, so we can quickly find everything where AnswerCount = 0 for our questions. The third ordering column is CreationDate, so we can seek and find all the rows greater than the value we are looking for and count them.
This index seems to cover everything we need in the query.
But it might not get used
I had a situation similar to this: I created the ideal index for a simple query. Then I asked Postgres for an explain plan – basically asking it to run the query and show me how it executed it, with some details.
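For the query above, that looks roughly like this (EXPLAIN ANALYZE actually executes the statement and reports the plan it used, with timing):

```sql
-- Run the query and show the plan that was actually chosen, with runtime stats.
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM Posts
WHERE PostTypeId = 1
  AND AnswerCount = 0
  AND CreationDate > '2018-06-01';
```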
It did so, and showed me that it didn’t use my index: instead, it scanned the base table with a sequential scan.
I re-checked my index. IT WAS PERFECT. I ran the explain plan again – maybe I was mistaken? Nope, sequential scan against the heap.
What if I force it to use it?
PostgreSQL has some costing settings that influence whether the query planner chooses to use nonclustered indexes:
- seq_page_cost: “Sets the planner’s estimate of the cost of a disk page fetch that is part of a series of sequential fetches. The default is 1.0.”
- random_page_cost: “Sets the planner’s estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0. …Reducing this value relative to seq_page_cost will cause the system to prefer index scans; raising it will make index scans look relatively more expensive.”
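You can see what your session is currently using with SHOW:

```sql
-- Current planner cost settings for this session.
SHOW seq_page_cost;
SHOW random_page_cost;
```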
Now, my index was perfect in my mind, but I went ahead and ran SET random_page_cost = 1.1; to change this for my session.
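In other words (a sketch of the experiment; RESET puts the setting back afterward):

```sql
-- Make random page fetches look only slightly more expensive than sequential
-- ones, for this session only, then re-run the EXPLAIN ANALYZE from above.
SET random_page_cost = 1.1;

-- ... re-run EXPLAIN ANALYZE here ...

-- Put the setting back to its default for this session when done experimenting.
RESET random_page_cost;
```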
I then re-ran explain analyze for my query and I saw that PostgreSQL used my index.
But I also saw that the query’s execution time was much longer than it had been for the explain analyze plan that used the sequential scan of the table.
I’d forced the query planner to use my index, but it made the query slower.
Where did this random IO come from?
Thinking a little about the setting I’d tweaked, I realized that I’d only gotten the query planner to use the index by saying that random IO was not such a big deal. It clearly had rejected the index because it didn’t want to do random IO.
So why wasn’t the index “covering” my query? What would it need outside of the index?
This is covered nicely in the docs on Index-Only Scans and Covering Indexes:
But there is an additional requirement for any table scan in PostgreSQL: it must verify that each retrieved row be “visible” to the query’s MVCC snapshot, as discussed in Chapter 13. Visibility information is not stored in index entries, only in heap entries…
PostgreSQL tracks, for each page in a table’s heap, whether all rows stored in that page are old enough to be visible to all current and future transactions. This information is stored in a bit in the table’s visibility map. An index-only scan, after finding a candidate index entry, checks the visibility map bit for the corresponding heap page. If it’s set, the row is known visible and so the data can be returned with no further work. If it’s not set, the heap entry must be visited to find out whether it’s visible, so no performance advantage is gained over a standard index scan.
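One rough way to see how much of a table the visibility map currently covers is to compare relallvisible to relpages in pg_class (both values are maintained by vacuum and analyze, so treat them as approximate):

```sql
-- How many of the table's heap pages are marked all-visible?
-- relallvisible close to relpages means index-only scans can mostly skip the heap.
SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname = 'posts';   -- lowercase unless the table was created with quoted identifiers
```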
I was querying a table where a significant amount of data had been changed, and I was running a query that was going to look at a lot of rows.
When I had a higher value for random_page_cost, the query planner realized that it could use the nonclustered index to seek to the data, but that it was going to have to check a whole lot of corresponding heap pages, which meant a lot of random IO.
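A related detail worth knowing: when PostgreSQL does choose an Index Only Scan, the EXPLAIN ANALYZE output includes a “Heap Fetches” counter on that node showing how many rows still required a visit to the heap because their page wasn’t marked all-visible. Adding BUFFERS shows the page reads involved:

```sql
-- On an Index Only Scan node, look for the "Heap Fetches" line in the output;
-- BUFFERS adds counts of shared buffer hits and reads.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM Posts
WHERE PostTypeId = 1
  AND AnswerCount = 0
  AND CreationDate > '2018-06-01';
```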
Tidying up my tuples
Vacuuming is the process in Postgres that helps with this. Vacuuming Basics explains that the vacuum process is essential for multiple purposes:
- To recover or reuse disk space occupied by updated or deleted rows.
- To update data statistics used by the PostgreSQL query planner.
- To update the visibility map, which speeds up index-only scans.
- To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
Item 3 – that’s our issue.
Sure enough, I looked at the tables used by my query and realized that although the autovacuum process was enabled, the default thresholds were high enough that a whole lot of data had changed in this table without the process kicking in.
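A quick way to check whether autovacuum has been keeping up with a table is pg_stat_user_tables (the counters are statistics, so treat them as approximate):

```sql
-- When was this table last vacuumed, and how many dead tuples have piled up?
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'posts';
```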
I ran a manual vacuum against the table and found that executing my query was now much faster when it used the nonclustered index, and that the query planner was willing to use the nonclustered index even when my session had higher values for random_page_cost.
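The manual vacuum itself was something like this (a sketch; VERBOSE is optional and just reports what the vacuum did, and ANALYZE refreshes planner statistics at the same time):

```sql
-- Reclaim dead tuples, update the visibility map, and refresh statistics.
VACUUM (VERBOSE, ANALYZE) Posts;
```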
I’ll dig in more in future posts on considerations I’ve found with tweaking these settings, as well as autovacuum and autoanalyze settings.