Analyzing the effect of default_statistics_target on ANALYZE

  • Denish Patel
  • 16th January 2016
  • postgresql

Last week, I received a request to increase PostgreSQL’s config parameter default_statistics_target on one of our very large DB clusters. The developers filed a ticket asking to raise default_statistics_target to 10,000 at the cluster level because one of their queries had started doing a full sequential scan instead of an index scan.

By default, a Postgres cluster sets default_statistics_target to 100, but you can set it as high as 10,000. This parameter defines how many values are stored in the list of most common values, and it also determines the number of rows inspected by the ANALYZE process.
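For reference, here is how you can check the current value and try out a higher one; this is just a sketch, and the cluster-wide change needs a config reload before it takes effect:

-- check the current cluster-wide default
SHOW default_statistics_target;

-- raise it only for the current session, handy for experiments
SET default_statistics_target = 1000;

-- or persist it cluster-wide (PostgreSQL 9.4+), then reload the config
ALTER SYSTEM SET default_statistics_target = 1000;
SELECT pg_reload_conf();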

Out of curiosity, I dug into the Postgres ANALYZE code to figure out how many rows are scanned by ANALYZE based on the default_statistics_target value. As you can see at line 1763, ANALYZE inspects up to 300 times default_statistics_target rows.

1763  stats->minrows = 300 * attr->attstattarget;

If the default_statistics_target value is 100, ANALYZE could scan up to 300 * 100 = 30,000 rows to gather accurate statistics.
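The attstattarget in that line is the per-column statistics target, which is -1 by default and falls back to default_statistics_target. If you want to check whether any columns of a table already carry an explicit override, a query along these lines should do it (mytable is only a placeholder name):

-- -1 means the column uses default_statistics_target
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass
  AND attnum > 0
  AND NOT attisdropped;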

As the Postgres documentation on ANALYZE notes, blindly increasing the value at the cluster level can hurt smaller queries: setting default_statistics_target to its maximum can significantly increase query planning time, which eventually degrades performance for simple queries. The next logical step for me is to find the optimal default_statistics_target value for the specific table behind the slow query. I will probably start by increasing the value to 1,000 and then raise or lower it based on EXPLAIN ANALYZE results, as sketched below.
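A rough sketch of that experiment, with placeholder table, column, and query names, could look like this; the per-column route keeps the rest of the cluster at the default of 100:

-- raise the statistics target only for the column used in the slow query
ALTER TABLE big_table ALTER COLUMN skewed_col SET STATISTICS 1000;

-- re-gather statistics so the planner sees the larger sample (up to 300 * 1000 rows)
ANALYZE big_table;

-- check whether the plan switches back from a sequential scan to an index scan
EXPLAIN ANALYZE SELECT * FROM big_table WHERE skewed_col = 42;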

Hopefully this finding will help when you try to tune default_statistics_target for your database server. Let me know if you have alternate views or ideas for tuning this setting. Thanks!

3 Comments

  1. The Postgres community has a fear of planning time that sometimes approaches paranoia.

    You can see how Postgres stores statistics in their raw form by looking at the pg_statistic (NOT pg_stats) table. The bulk of the storage for any sizable table will be in one or more of the stanumbers and stavalues array columns. Generally, those arrays will be roughly statistics-target elements long. I’m guessing that searches in those arrays are done with a binary search. So yes, a very large statistics target will increase planning time, but we’re talking microseconds here. In almost any normal usage a statistics target of 1000 or even 10,000 should be fine.
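
    For example, the raw arrays can be inspected directly; this is only a sketch (pg_statistic usually requires superuser access, and mytable is a placeholder name):

    -- stanumbers*/stavalues* hold the raw MCV and histogram arrays
    SELECT starelid::regclass AS table_name, staattnum, stakind1, stanumbers1, stavalues1
    FROM pg_statistic
    WHERE starelid = 'mytable'::regclass;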
