Fix ndistinct estimates with system attributes
author Tomas Vondra <[email protected]>
Fri, 26 Mar 2021 21:34:53 +0000 (22:34 +0100)
committer Tomas Vondra <[email protected]>
Fri, 26 Mar 2021 21:34:58 +0000 (22:34 +0100)
When estimating the number of groups using extended statistics, the code
was discarding information about system attributes. This led to the strange
situation that

    SELECT 1 FROM t GROUP BY ctid;

could produce a higher estimate (equal to pg_class.reltuples) than

    SELECT 1 FROM t GROUP BY a, b, ctid;

with extended statistics on (a,b). Fixed by retaining system attributes in
the list of variables that still need estimating once the statistics have
been applied.

Backpatch all the way to 10, where extended statistics were introduced.

Author: Tomas Vondra
Backpatch-through: 10
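
The effect of the selfuncs.c change can be sketched in isolation. What
follows is a simplified model, not the PostgreSQL source: attribute numbers
are plain ints, is_user_attr() and is_matched() are hypothetical stand-ins
for AttrNumberIsForUserDefinedAttr() and bms_is_member(), and filter_old()
and filter_new() are illustrative names for the loop before and after the
fix. It assumes user columns have positive attribute numbers and system
columns such as ctid have negative ones.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for AttrNumberIsForUserDefinedAttr(): assumes user columns
 * are numbered from 1 and system columns (e.g. ctid) are negative. */
static bool
is_user_attr(int attnum)
{
    return attnum > 0;
}

/* Hypothetical stand-in for bms_is_member(): linear search over the
 * attribute numbers covered by the chosen extended statistic. */
static bool
is_matched(int attnum, const int *matched, size_t nmatched)
{
    for (size_t i = 0; i < nmatched; i++)
        if (matched[i] == attnum)
            return true;
    return false;
}

/* Pre-fix behavior: system attributes are skipped outright, so they
 * never reach the list of variables that still need estimating. */
size_t
filter_old(const int *attnums, size_t n,
           const int *matched, size_t nmatched, int *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (!is_user_attr(attnums[i]))
            continue;           /* system attribute silently dropped */
        if (!is_matched(attnums[i], matched, nmatched))
            out[kept++] = attnums[i];
    }
    return kept;
}

/* Post-fix behavior: only user attributes covered by the statistic
 * are removed; everything else, system attributes included, is kept. */
size_t
filter_new(const int *attnums, size_t n,
           const int *matched, size_t nmatched, int *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (is_user_attr(attnums[i]) &&
            is_matched(attnums[i], matched, nmatched))
            continue;           /* already accounted for by the statistic */
        out[kept++] = attnums[i];
    }
    return kept;
}
```

For GROUP BY a, b, ctid with statistics on (a,b), modeled as attnums
{1, 2, -1} and matched {1, 2}, filter_old() keeps nothing, whereas
filter_new() keeps the ctid entry so it still contributes to the group
estimate.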

src/backend/utils/adt/selfuncs.c
src/test/regress/expected/stats_ext.out

index 52314d3aa1c5c1829c161c98783b8bc2b425a839..2348d4a772a0a1a9901e59b36a34d70bb532fc0e 100644
@@ -3987,11 +3987,11 @@ estimate_multivariate_ndistinct(PlannerInfo *root, RelOptInfo *rel,
 
            attnum = ((Var *) varinfo->var)->varattno;
 
-           if (!AttrNumberIsForUserDefinedAttr(attnum))
+           if (AttrNumberIsForUserDefinedAttr(attnum) &&
+               bms_is_member(attnum, matched))
                continue;
 
-           if (!bms_is_member(attnum, matched))
-               newlist = lappend(newlist, varinfo);
+           newlist = lappend(newlist, varinfo);
        }
 
        *varinfos = newlist;
index 431b3fa3de1f4f87205e7e27a99ef1cf337f1590..d80e6a3907c658037bae23fdc0b597ac79ffe7a3 100644
@@ -260,7 +260,7 @@ SELECT s.stxkind, d.stxdndistinct
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY ctid, a, b');
  estimated | actual 
 -----------+--------
-        11 |   1000
+      1000 |   1000
 (1 row)
 
 -- Hash Aggregate, thanks to estimates improved by the statistic