Consider fillfactor when estimating relation size
author Tomas Vondra <[email protected]>
Mon, 3 Jul 2023 16:55:31 +0000 (18:55 +0200)
committer Tomas Vondra <[email protected]>
Mon, 3 Jul 2023 16:55:31 +0000 (18:55 +0200)
When table_block_relation_estimate_size() estimated the number of tuples
in a relation without statistics (e.g. right after a data load), it did
not consider fillfactor when calculating density. With non-default
fillfactor values, this may result in a significant overestimate of the
number of tuples, up to 10x with the minimum fillfactor of 10%. This may
have unexpected consequences, e.g. when creating hash indexes, which size
their initial number of buckets based on this estimate.

Fix this by considering the current fillfactor value in the "no
statistics" code path. If the fillfactor changes after data has been
loaded into the table, the estimate may be off, but that seems much less
likely than changing the fillfactor before the data load.

Reviewed-by: Corey Huinker, Peter Eisentraut
Discussion: https://postgr.es/m/cf154ef9-6bac-d268-b735-67a3443debba@enterprisedb.com

src/backend/access/table/tableam.c

index 771438c8cecb9c0f9160b54fc8011ef4740fd6d8..c6bdb7e1c6851dd2b59bf0b4d3d2f89ef77e1fef 100644
@@ -737,11 +737,19 @@ table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
                 * and (c) different table AMs might use different padding schemes.
                 */
                int32           tuple_width;
+               int                     fillfactor;
+
+               /*
+                * Without reltuples/relpages, we also need to consider fillfactor.
+                * The other branch considers it implicitly by calculating density
+                * from actual relpages/reltuples statistics.
+                */
+               fillfactor = RelationGetFillFactor(rel, HEAP_DEFAULT_FILLFACTOR);
 
                tuple_width = get_rel_data_width(rel, attr_widths);
                tuple_width += overhead_bytes_per_tuple;
                /* note: integer division is intentional here */
-               density = usable_bytes_per_page / tuple_width;
+               density = (usable_bytes_per_page * fillfactor / 100) / tuple_width;
        }
        *tuples = rint(density * (double) curpages);