- Denish Patel
- 16th January 2016
Last week, I received a request to increase PostgreSQL's config parameter default_statistics_target on one of our very large DB clusters. The developers filed a ticket to raise default_statistics_target to 10,000 at the cluster level because one of their queries had started using a full sequential scan instead of an index scan.
By default, a Postgres cluster sets default_statistics_target to 100, but you can set it as high as 10,000. This parameter defines how many values are stored in the list of most common values, and it also determines the number of rows inspected by the ANALYZE process.
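For reference, the setting can be inspected and changed like this; a sketch only (ALTER SYSTEM requires superuser privileges and a config reload to take effect):

```sql
-- Check the current value (100 by default)
SHOW default_statistics_target;

-- Raise it for the current session only
SET default_statistics_target = 1000;

-- Or cluster-wide (takes effect after SELECT pg_reload_conf())
ALTER SYSTEM SET default_statistics_target = 1000;
```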
Out of curiosity, I dug into the Postgres ANALYZE code to figure out how many rows are scanned by ANALYZE based on the default_statistics_target value. As you can see on line 1763, ANALYZE inspects up to 300 times default_statistics_target rows.
1763 stats->minrows = 300 * attr->attstattarget;
If the default_statistics_target value is 100, ANALYZE could scan up to 300 * 100 = 30,000 rows to gather accurate statistics.
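The sample-size relationship can be sketched in a few lines; this simply restates the minrows formula above (the function name is my own):

```python
def analyze_sample_rows(stats_target: int) -> int:
    """Upper bound on rows ANALYZE samples: 300 * statistics target."""
    return 300 * stats_target

print(analyze_sample_rows(100))    # up to 30,000 rows at the default target
print(analyze_sample_rows(10000))  # up to 3,000,000 rows at the maximum
```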
As the Postgres documentation on ANALYZE warns, blindly increasing this value at the cluster level can hurt smaller queries: setting default_statistics_target to its maximum can increase query planning time significantly, which eventually degrades performance for simple queries. The next logical step for me is to find the optimal default_statistics_target value for the specific table involved in the slow query. I will probably start by increasing the value to 1,000 and then raise or lower it based on EXPLAIN ANALYZE query results.
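Tuning per column avoids changing the whole cluster; a sketch with hypothetical table and column names:

```sql
-- Raise the target only for the column the slow query filters on
ALTER TABLE big_table ALTER COLUMN filter_col SET STATISTICS 1000;
ANALYZE big_table;

-- Re-check whether the planner now picks the index scan
EXPLAIN ANALYZE SELECT * FROM big_table WHERE filter_col = 42;
```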
Hopefully this finding will help when you tune default_statistics_target for your own database server. Let me know if you have alternate views or ideas for tuning this setting. Thanks!
The Postgres community has a fear of planning time that sometimes approaches paranoia.
You can see how Postgres stores statistics in their raw form by looking at the pg_statistic (NOT pg_stats) table. The bulk of the storage for any sizable table will be in one or more of the stanumbers and stavalues array columns. Generally, those arrays will be at most statistics-target elements long. I'm guessing that searches in those arrays are done with a binary search. So yes, a very large statistics target will increase planning time, but we're talking microseconds here. In almost any normal usage a statistics target of 1000 or even 10000 should be fine.
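The size of those arrays for a given table can also be checked through the friendlier pg_stats view; a sketch with a hypothetical table name:

```sql
-- How many most-common values and histogram bounds are stored per column
SELECT attname,
       array_length(most_common_vals, 1) AS n_mcv,
       array_length(histogram_bounds, 1) AS n_hist
FROM pg_stats
WHERE tablename = 'big_table';
```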
I encounter problems with default_statistics_target very frequently. There is a fundamental problem with the sampling algorithm used to estimate n_distinct values for clustered distributions. For a longer description and discussion see:
I have not found a really good solution. My best approach so far is to increase the statistics target on one column of the table that has only a very limited number of distinct values (2 would be perfect). That way all columns benefit from the more exhaustive sampling when estimating n_distinct, while no large histograms are produced, so planning time is not affected either.
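If I understand the workaround correctly, it could look something like this (hypothetical names; the low-cardinality column forces a larger ANALYZE sample without producing large per-column histograms):

```sql
-- A column with ~2 distinct values: big sample, tiny histogram
ALTER TABLE big_table ALTER COLUMN is_deleted SET STATISTICS 10000;
ANALYZE big_table;
```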
@Stephen Thanks for providing the workaround.