Berserk Autovacuum (let's save next Mandrill)

Re: Berserk Autovacuum (let's save next Mandrill)

Tom Lane
David Rowley <[hidden email]> writes:
> I don't believe any of the current buildfarm failures can be
> attributed to any of the recent changes to autovacuum, but I'll
> continue to monitor the farm to see if anything is suspect.

I agree none of the failures I see right now are related to that
(there's some "No space left on device" failures, Windows randomicity,
snapper's compiler bug, and don't-know-what on hyrax).

But the ones that were seemingly due to that were intermittent,
so we'll have to watch for awhile.

                        regards, tom lane



Re: Berserk Autovacuum (let's save next Mandrill)

Amit Kapila
On Mon, Mar 30, 2020 at 7:47 AM Tom Lane <[hidden email]> wrote:

>
> David Rowley <[hidden email]> writes:
> > I don't believe any of the current buildfarm failures can be
> > attributed to any of the recent changes to autovacuum, but I'll
> > continue to monitor the farm to see if anything is suspect.
>
> I agree none of the failures I see right now are related to that
> (there's some "No space left on device" failures, Windows randomicity,
> snapper's compiler bug, and don't-know-what on hyrax).
>
> But the ones that were seemingly due to that were intermittent,
> so we'll have to watch for awhile.
>

Today, stats_ext failed on petalura [1].  Could it be due to this?  I
also committed a patch recently, but at first glance the failure
doesn't seem related to my commit.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Berserk Autovacuum (let's save next Mandrill)

Tom Lane
Amit Kapila <[hidden email]> writes:
> On Mon, Mar 30, 2020 at 7:47 AM Tom Lane <[hidden email]> wrote:
>> But the ones that were seemingly due to that were intermittent,
>> so we'll have to watch for awhile.

> Today, stats_ext failed on petalura [1].  Could it be due to this?  I
> also committed a patch recently, but at first glance the failure
> doesn't seem related to my commit.

Yeah, this looks just like petalura's previous failures, so the
problem is still there :-(

                        regards, tom lane



Re: Berserk Autovacuum (let's save next Mandrill)

Laurenz Albe
In reply to this post by David Rowley
On Sat, 2020-03-28 at 19:21 +1300, David Rowley wrote:
> Thank you.  Pushed.

Thanks for your efforts on this, and thanks for working on the fallout.

How can it be that even after an explicit VACUUM, this patch can cause
unstable regression test results?

Yours,
Laurenz Albe




Re: Berserk Autovacuum (let's save next Mandrill)

David Rowley
On Mon, 30 Mar 2020 at 17:57, Laurenz Albe <[hidden email]> wrote:
> How can it be that even after an explicit VACUUM, this patch can cause
> unstable regression test results?

I only added vacuums for mcv_lists. The problem with petalura [1] is
with the functional_dependencies table.

I'll see if I can come up with a more deterministic way of working out
which tables need vacuums added, rather than waiting for failures and
reacting after the fact.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03



Re: Berserk Autovacuum (let's save next Mandrill)

David Rowley
On Mon, 30 Mar 2020 at 19:49, David Rowley <[hidden email]> wrote:
> I'll see if I can come up with a more deterministic way of working out
> which tables need vacuums added, rather than waiting for failures and
> reacting after the fact.

I ended up running make installcheck on an instance with
autovacuum_naptime set to 1s, plus a small additional debug line in
autovacuum.c, namely:

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 7e97ffab27..ad81e321dc 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3099,6 +3099,9 @@ relation_needs_vacanalyze(Oid relid,
 		*dovacuum = force_vacuum || (vactuples > vacthresh) ||
 			(vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
 		*doanalyze = (anltuples > anlthresh);
+
+		if (vac_ins_base_thresh >= 0 && instuples > vacinsthresh)
+			elog(LOG, "******** %s", NameStr(classForm->relname));
 	}
 	else
 	{

I grepped the log after the installcheck run to grab the names of the
tables that saw an insert-triggered vacuum during the tests, then
grepped the test output to see whether each table poses a risk of test
instability.

I've classed each table with a risk factor.  "VeryLow" means there's
almost no risk because the tests never look at EXPLAIN.  "Low" risk
tables are used in EXPLAIN, but the tests don't look at the output in
enough detail to cause issues.  "Medium" risk tables are used in
EXPLAIN and I feel there's some risk of a change; I think these are all
Append nodes, which order their subnodes based on cost.  "High" risk
tables are the ones I'm about to look into changing.

The full results of my analysis are:

Table: agg_group_1 aggregates.out. Nothing looks at EXPLAIN. Risk:VeryLow
Table: agg_hash_1 aggregates.out. Nothing looks at EXPLAIN. Risk:VeryLow
Table: atest12 privileges.out. Lots of looking at EXPLAIN, but nothing
appears to look into row estimates in detail. Risk:Low
Table: brin_test brin.out.  Test already does VACUUM ANALYZE. Risk:VeryLow
Table: bt_f8_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_i4_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_name_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_txt_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: dupindexcols create_index.out. Some looking at EXPLAIN plans,
but nothing appears to look into row estimates in detail. Risk:Low
Table: fast_emp4000 create_am.out, create_index.out, create_misc.out.
Lots of looking at EXPLAIN, but nothing appears to look into row
estimates in detail. Risk:Low
Table: functional_dependencies stats_ext.out. Lots of looking at
EXPLAIN output. Test looks at row estimates. Risk:High
Table: gist_tbl gist.out. Lots of looking at EXPLAIN, but nothing
appears to look into row estimates in detail. Risk:Low
Table: hash_f8_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_i4_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_name_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_txt_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: kd_point_tbl create_index_spgist.out. Lots of looking at
EXPLAIN, but nothing appears to look into row estimates in detail.
Risk:Low
Table: mcv_lists stats_ext.out. Lots of looking at EXPLAIN, but tests
appear to VACUUM after loading rows. Risk:Low
Table: mcv_lists_arrays stats_ext.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: mcv_lists_bool stats_ext.out.  Lots of looking at EXPLAIN
output. Test looks at row estimates. Risk:High
Table: ndistinct stats_ext.out.  Lots of looking at EXPLAIN output.
Test looks at row estimates. Only 1000 rows are loaded initially and
then 5000 after a truncate. 1000 rows won't trigger the auto-vacuum.
Risk:High
Table: onek Lots of files. Sees a VACUUM in sanity_check test,
however, some tests run before sanity_check, e.g. create_index,
select, copy, none of which appear to pay particular attention to
anything vacuum might change. Risk:Low
Table: pagg_tab_ml_p2_s1 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p2_s2 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p3_s1 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p3_s2 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_para_p1 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_para_p2 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_para_p3 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pg_attribute Seen in several tests. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: pg_depend Seen in several tests. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: pg_largeobject Seen in several tests. Nothing appears to look
at EXPLAIN. Risk:VeryLow
Table: quad_box_tbl box.out. Sees some use of EXPLAIN, but nothing
looks critical. Risk:Low
Table: quad_box_tbl_ord_seq1 box.out. No EXPLAIN usage. Risk:VeryLow
Table: quad_box_tbl_ord_seq2 box.out. No EXPLAIN usage. Risk:VeryLow
Table: quad_point_tbl create_index_spgist.out Sees some use of
EXPLAIN. Index Only Scans are already being used. Risk:Low
Table: quad_poly_tbl polygon.out Some usages of EXPLAIN. Risk:Low
Table: radix_text_tbl create_index_spgist.out Some usages of EXPLAIN. Risk:Low
Table: road various tests. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: slow_emp4000 various tests. Nothing appears to look at EXPLAIN.
Risk:VeryLow
Table: spgist_box_tbl spgist.out. Nothing appears to look at EXPLAIN.
Risk:VeryLow
Table: spgist_point_tbl spgist.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: spgist_text_tbl spgist.out. Nothing appears to look at EXPLAIN.
Risk:VeryLow
Table: tenk1 aggregates.out, groupingsets.out, join.out, limit.out,
misc_functions.out, rowtypes.out,select_distinct.out,
select_parallel.out, subselect.out, tablesample.out, tidscan.out,
union.out, window.out and write_parallel.out are after vacuum in
sanity_check. EXPLAIN used in create_index.out and inherit.out, which
are all run before sanity_check does the vacuum. Risk:Medium
Table: tenk2 Only sees EXPLAIN usages in select_parallel.out, which is
after the table is vacuumed in sanity_check. Risk:Low
Table: test_range_gist rangetypes.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: test_range_spgist rangetypes.out. Some EXPLAIN usage. Risk:Low
Table: testjsonb jsonb.out. Some EXPLAIN usage. Risk:Low
Table: transition_table_level2 plpgsql.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: transition_table_status plpgsql.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow

I'd like to wait to see if we get failures for the ones I've classed
as medium risk.

David



Re: Berserk Autovacuum (let's save next Mandrill)

David Rowley
In reply to this post by David Rowley
On Sat, 28 Mar 2020 at 22:22, David Rowley <[hidden email]> wrote:
> I'm unsure yet if this has caused an instability on lousyjack's run in
> [1].

pogona has just joined in on the fun [1], so we're not out of the
woods on this yet. I'll start having a look at this in more detail.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03



Re: Berserk Autovacuum (let's save next Mandrill)

Dean Rasheed
On Tue, 31 Mar 2020 at 04:39, David Rowley <[hidden email]> wrote:

>
> On Sat, 28 Mar 2020 at 22:22, David Rowley <[hidden email]> wrote:
> > I'm unsure yet if this has caused an instability on lousyjack's run in
> > [1].
>
> pogona has just joined in on the fun [1], so we're not out of the
> woods on this yet. I'll start having a look at this in more detail.
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
>

I had a go at reproducing this. I wasn't able to produce the reported
failure, but I can reliably produce an Assert failure that may be
related by doing a VACUUM FULL simultaneously with an ANALYZE that is
generating extended stats, which produces:

#0  0x00007f28081c9520 in raise () from /lib64/libc.so.6
#1  0x00007f28081cab01 in abort () from /lib64/libc.so.6
#2  0x0000000000aad1ad in ExceptionalCondition (conditionName=0xb2f1a1 "ItemIdIsNormal(lp)", errorType=0xb2e7c9 "FailedAssertion", fileName=0xb2e848 "heapam.c", lineNumber=3016) at assert.c:67
#3  0x00000000004fb79e in heap_update (relation=0x7f27feebeda8, otid=0x2d881fc, newtup=0x2d881f8, cid=0, crosscheck=0x0, wait=true, tmfd=0x7ffc568a5900, lockmode=0x7ffc568a58fc) at heapam.c:3016
#4  0x00000000004fdead in simple_heap_update (relation=0x7f27feebeda8, otid=0x2d881fc, tup=0x2d881f8) at heapam.c:3902
#5  0x00000000005be860 in CatalogTupleUpdate (heapRel=0x7f27feebeda8, otid=0x2d881fc, tup=0x2d881f8) at indexing.c:230
#6  0x00000000008df898 in statext_store (statOid=18964, ndistinct=0x0, dependencies=0x2a85fe0, mcv=0x0, stats=0x2a86570) at extended_stats.c:553
#7  0x00000000008deec0 in BuildRelationExtStatistics (onerel=0x7f27feed9008, totalrows=5000, numrows=5000, rows=0x2ad5a30, natts=7, vacattrstats=0x2a75f40) at extended_stats.c:187
#8  0x000000000065c1b7 in do_analyze_rel (onerel=0x7f27feed9008, params=0x7ffc568a5fc0, va_cols=0x0, acquirefunc=0x65ce37 <acquire_sample_rows>, relpages=31, inh=false, in_outer_xact=false, elevel=13) at analyze.c:606
#9  0x000000000065b532 in analyze_rel (relid=18956, relation=0x29b0bc0, params=0x7ffc568a5fc0, va_cols=0x0, in_outer_xact=false, bstrategy=0x2a7dfa0) at analyze.c:263
#10 0x00000000006fd768 in vacuum (relations=0x2a7e148, params=0x7ffc568a5fc0, bstrategy=0x2a7dfa0, isTopLevel=true) at vacuum.c:468
#11 0x00000000006fd22c in ExecVacuum (pstate=0x2a57a00, vacstmt=0x29b0ca8, isTopLevel=true) at vacuum.c:251

It looks to me as though the problem is that statext_store() needs to
take its lock on pg_statistic_ext_data *before* searching for the
stats tuple to update.

It's late here, so I haven't worked up a patch yet, but it looks
pretty straightforward.
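
(For illustration, the reordering being described would look roughly
like the sketch below.  This is schematic, based on the stack trace
above, not an actual patch; statext_store_sketch and its parameter
list are invented for the example.)

#include "postgres.h"

#include "access/htup_details.h"
#include "access/table.h"
#include "catalog/indexing.h"
#include "catalog/pg_statistic_ext_data.h"
#include "utils/syscache.h"

/*
 * Lock pg_statistic_ext_data BEFORE searching for the tuple to update,
 * so that a concurrent VACUUM FULL or CLUSTER cannot move the tuple
 * (giving it a new TID) between the lookup and the update.
 */
static void
statext_store_sketch(Oid statOid, Datum *values, bool *nulls, bool *replaces)
{
    Relation    pg_stextdata;
    HeapTuple   oldtup;
    HeapTuple   stup;

    /* take the lock first ... */
    pg_stextdata = table_open(StatisticExtDataRelationId, RowExclusiveLock);

    /* ... and only then look up the tuple we intend to replace */
    oldtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statOid));
    if (!HeapTupleIsValid(oldtup))
        elog(ERROR, "cache lookup failed for statistics object %u", statOid);

    stup = heap_modify_tuple(oldtup, RelationGetDescr(pg_stextdata),
                             values, nulls, replaces);
    ReleaseSysCache(oldtup);

    CatalogTupleUpdate(pg_stextdata, &stup->t_self, stup);

    heap_freetuple(stup);
    table_close(pg_stextdata, RowExclusiveLock);
}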

Regards,
Dean



Re: Berserk Autovacuum (let's save next Mandrill)

Tom Lane
Dean Rasheed <[hidden email]> writes:
> I had a go at reproducing this. I wasn't able to produce the reported
> failure, but I can reliably produce an Assert failure that may be
> related by doing a VACUUM FULL simultaneously with an ANALYZE that is
> generating extended stats, which produces:
> ...
> It looks to me as though the problem is that statext_store() needs to
> take its lock on pg_statistic_ext_data *before* searching for the
> stats tuple to update.

Hmm, yeah, that seems like clearly a bad idea.

> It's late here, so I haven't worked up a patch yet, but it looks
> pretty straightforward.

I can take care of it.

                        regards, tom lane



Re: Berserk Autovacuum (let's save next Mandrill)

Tom Lane
I wrote:
> Dean Rasheed <[hidden email]> writes:
>> I had a go at reproducing this. I wasn't able to produce the reported
>> failure, but I can reliably produce an Assert failure that may be
>> related by doing a VACUUM FULL simultaneously with an ANALYZE that is
>> generating extended stats, which produces:
>> ...
>> It looks to me as though the problem is that statext_store() needs to
>> take its lock on pg_statistic_ext_data *before* searching for the
>> stats tuple to update.

> Hmm, yeah, that seems like clearly a bad idea.

I pushed a fix for that, but I think it must be unrelated to the
buildfarm failures we're seeing.  For that coding to be a problem,
it would have to run concurrently with a VACUUM FULL or CLUSTER
on pg_statistic_ext_data (which would give all the tuples new TIDs).
AFAICS that won't happen with the tests that are giving trouble.

                        regards, tom lane



Re: Berserk Autovacuum (let's save next Mandrill)

Dean Rasheed
On Tue, 31 Mar 2020 at 22:16, Tom Lane <[hidden email]> wrote:

>
> > Dean Rasheed <[hidden email]> writes:
> >> ...
> >> It looks to me as though the problem is that statext_store() needs to
> >> take its lock on pg_statistic_ext_data *before* searching for the
> >> stats tuple to update.
>
> > Hmm, yeah, that seems like clearly a bad idea.
>
> I pushed a fix for that

Thanks for doing that (looks like it was my mistake originally).

> but I think it must be unrelated to the
> buildfarm failures we're seeing.  For that coding to be a problem,
> it would have to run concurrently with a VACUUM FULL or CLUSTER
> on pg_statistic_ext_data (which would give all the tuples new TIDs).
> AFAICS that won't happen with the tests that are giving trouble.
>

Yeah, that makes sense. I still can't see what might be causing those
failures. The tests that were doing an ALTER COLUMN and then expecting
to see the results of a non-analysed table ought to be fixed by
0936d1b6f, but that doesn't match the buildfarm failures. Possibly
0936d1b6f will help with those anyway, but if so, it'll be annoying
not understanding why.

Regards,
Dean



Re: Berserk Autovacuum (let's save next Mandrill)

Tom Lane
Dean Rasheed <[hidden email]> writes:
> Yeah, that makes sense. I still can't see what might be causing those
> failures. The tests that were doing an ALTER COLUMN and then expecting
> to see the results of a non-analysed table ought to be fixed by
> 0936d1b6f, but that doesn't match the buildfarm failures. Possibly
> 0936d1b6f will help with those anyway, but if so, it'll be annoying
> not understanding why.

Quite :-(.  While it's too early to declare victory, we've seen no
more failures of this ilk since 0936d1b6f, so it's sure looking like
autovacuum did have something to do with it.

Just to save people repeating the search I did, these are the buildfarm
failures of interest so far:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2020%3A03%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-03-28%2022%3A00%3A19
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-03-29%2006%3A45%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2020-03-30%2006%3A00%3A06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2006%3A10%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-03-31%2005%3A00%3A35

The first of those is unlike the rest, and I'm not trying to account for
it here.  In the rest, what we see is that sometimes the estimates are off
by a little bit from what's expected, up or down just a percent or two.
And those deltas kick in at inconsistent spots partway through a series of
similar tests, so it's hard to deny that *something* asynchronous to the
test script is causing it.

After contemplating the failures for awhile, I have a theory that
at least partially matches the data.  What I think is happening is
that autovacuum (NOT auto-analyze) launches on the table, and since
it is running concurrently with the foreground test script, it fails
to immediately acquire buffer lock on one or more of the table pages.
Since this isn't an aggressive vacuum scan, it just politely backs
off and doesn't scan those pages.  And that translates to not getting
a perfectly accurate reltuples estimate at the end of the vacuum.
On my x86_64 machine, which matches the buildfarm critters having
trouble, the actual contents of both of the troublesome tables will
be 5000 tuples in 31 pages --- which comes out to be 30 pages with
162 tuples each and then 140 tuples in the last page.  Working through
the math in vac_estimate_reltuples (and assuming that the "old" values
were accurate numbers from the test script's own ANALYZE), what I find
is that autovacuum will conclude there are 4999 tuples if it misses
scanning one of the first 30 pages, or 5021 tuples if it misses scanning
the last page, because its interpolation from the old tuple density
figure will underestimate or overestimate the number of missed tuples
accordingly.  Once that slightly-off number gets pushed into pg_class,
we start to get slightly-off rowcount estimates in the test cases.
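
To make the arithmetic concrete, here is that interpolation worked
through as a stand-alone sketch; estimate_reltuples below is a
simplified stand-in for the normal path of vac_estimate_reltuples, not
the real function, and the page and tuple counts are the ones given
above:

#include <math.h>
#include <stdio.h>

/*
 * Simplified stand-in for vac_estimate_reltuples' normal path: convert
 * the old measurement to a density (tuples per page), estimate the
 * tuples in the unscanned pages from that density, and add the tuples
 * actually counted in the scanned pages.
 */
static double
estimate_reltuples(double old_tuples, double old_pages,
                   double total_pages, double scanned_pages,
                   double scanned_tuples)
{
    double  old_density = old_tuples / old_pages;
    double  unscanned_pages = total_pages - scanned_pages;

    return floor(old_density * unscanned_pages + scanned_tuples + 0.5);
}

int
main(void)
{
    /* 31 pages: 30 pages of 162 tuples, then 140 tuples in the last page */
    printf("miss one full page: %.0f\n",
           estimate_reltuples(5000, 31, 31, 30, 5000 - 162));   /* 4999 */
    printf("miss the last page: %.0f\n",
           estimate_reltuples(5000, 31, 31, 30, 5000 - 140));   /* 5021 */
    return 0;
}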

So what I'm hypothesizing is that the pg_statistic data is perfectly
fine but pg_class.reltuples goes off a little bit after autovacuum.
The percentage changes in reltuples that I predict this way don't
quite square with the percentage changes we see in the overall
rowcount estimates, which is a problem for this theory.  But the test
cases are exercising some fairly complex estimation logic, and it
wouldn't surprise me much if the estimates aren't linearly affected by
reltuples.  (Tomas, do you want to comment further on that point?)

                        regards, tom lane



Re: Berserk Autovacuum (let's save next Mandrill)

David Rowley
On Thu, 2 Apr 2020 at 16:13, Tom Lane <[hidden email]> wrote:

>
> Dean Rasheed <[hidden email]> writes:
> > Yeah, that makes sense. I still can't see what might be causing those
> > failures. The tests that were doing an ALTER COLUMN and then expecting
> > to see the results of a non-analysed table ought to be fixed by
> > 0936d1b6f, but that doesn't match the buildfarm failures. Possibly
> > 0936d1b6f will help with those anyway, but if so, it'll be annoying
> > not understanding why.
>
> Quite :-(.  While it's too early to declare victory, we've seen no
> more failures of this ilk since 0936d1b6f, so it's sure looking like
> autovacuum did have something to do with it.

How about [1]? It seems related to me and also post 0936d1b6f.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-04-01%2017%3A03%3A05



Re: Berserk Autovacuum (let's save next Mandrill)

Tomas Vondra
In reply to this post by Tom Lane-2
On Wed, Apr 01, 2020 at 11:13:12PM -0400, Tom Lane wrote:

>Dean Rasheed <[hidden email]> writes:
>> Yeah, that makes sense. I still can't see what might be causing those
>> failures. The tests that were doing an ALTER COLUMN and then expecting
>> to see the results of a non-analysed table ought to be fixed by
>> 0936d1b6f, but that doesn't match the buildfarm failures. Possibly
>> 0936d1b6f will help with those anyway, but if so, it'll be annoying
>> not understanding why.
>
>Quite :-(.  While it's too early to declare victory, we've seen no
>more failures of this ilk since 0936d1b6f, so it's sure looking like
>autovacuum did have something to do with it.
>
>Just to save people repeating the search I did, these are the buildfarm
>failures of interest so far:
>
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2020%3A03%3A03
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-03-28%2022%3A00%3A19
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-03-29%2006%3A45%3A02
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2020-03-30%2006%3A00%3A06
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2006%3A10%3A05
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-03-31%2005%3A00%3A35
>
>The first of those is unlike the rest, and I'm not trying to account for
>it here.  In the rest, what we see is that sometimes the estimates are off
>by a little bit from what's expected, up or down just a percent or two.
>And those deltas kick in at inconsistent spots partway through a series of
>similar tests, so it's hard to deny that *something* asynchronous to the
>test script is causing it.
>
>After contemplating the failures for awhile, I have a theory that
>at least partially matches the data.  What I think is happening is
>that autovacuum (NOT auto-analyze) launches on the table, and since
>it is running concurrently with the foreground test script, it fails
>to immediately acquire buffer lock on one or more of the table pages.
>Since this isn't an aggressive vacuum scan, it just politely backs
>off and doesn't scan those pages.  And that translates to not getting
>a perfectly accurate reltuples estimate at the end of the vacuum.
>On my x86_64 machine, which matches the buildfarm critters having
>trouble, the actual contents of both of the troublesome tables will
>be 5000 tuples in 31 pages --- which comes out to be 30 pages with
>162 tuples each and then 140 tuples in the last page.  Working through
>the math in vac_estimate_reltuples (and assuming that the "old" values
>were accurate numbers from the test script's own ANALYZE), what I find
>is that autovacuum will conclude there are 4999 tuples if it misses
>scanning one of the first 30 pages, or 5021 tuples if it misses scanning
>the last page, because its interpolation from the old tuple density
>figure will underestimate or overestimate the number of missed tuples
>accordingly.  Once that slightly-off number gets pushed into pg_class,
>we start to get slightly-off rowcount estimates in the test cases.
>
>So what I'm hypothesizing is that the pg_statistic data is perfectly
>fine but pg_class.reltuples goes off a little bit after autovacuum.
>The percentage changes in reltuples that I predict this way don't
>quite square with the percentage changes we see in the overall
>rowcount estimates, which is a problem for this theory.  But the test
>cases are exercising some fairly complex estimation logic, and it
>wouldn't surprise me much if the estimates aren't linearly affected by
>reltuples.  (Tomas, do you want to comment further on that point?)
>

I think this theory makes perfect sense. It's much less likely for the
last page to be skipped, so we're likely to end up with reltuples
lower than 5000 (as opposed to seeing 5021). That kinda matches the
reports, where we generally see estimates reduced by 1 or 2. The -1
change could be explained by rounding errors, I guess: with 5000 we
might have produced 139.51, which rounds up to 140, while a slight
drop may get us 139. Not sure about the -2 changes, but I suppose it's
possible we might actually skip multiple pages, reducing the reltuples
estimate even more?
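
As a quick back-of-the-envelope check of that rounding effect, assume
a fixed selectivity s chosen so that s * 5000 lands exactly on the
hypothetical 139.51 above:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  s = 139.51 / 5000.0;    /* assumed, fixed selectivity */

    printf("reltuples 5000 -> estimate %.0f\n", rint(s * 5000.0)); /* 140 */
    printf("reltuples 4999 -> estimate %.0f\n", rint(s * 4999.0)); /* 139 */
    return 0;
}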


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Berserk Autovacuum (let's save next Mandrill)

Tom Lane
In reply to this post by David Rowley
David Rowley <[hidden email]> writes:
> On Thu, 2 Apr 2020 at 16:13, Tom Lane <[hidden email]> wrote:
>> Quite :-(.  While it's too early to declare victory, we've seen no
>> more failures of this ilk since 0936d1b6f, so it's sure looking like
>> autovacuum did have something to do with it.

> How about [1]? It seems related to me and also post 0936d1b6f.

That looks much like the first lousyjack failure, which as I said
I wasn't trying to account for at that point.

After looking at those failures, though, I believe that the root cause
may be the same, ie small changes in pg_class.reltuples due to
autovacuum not seeing all pages of the tables.  The test structure
is a bit different, but it is accessing the tables in between EXPLAIN
attempts, so it could be preventing a concurrent autovac from seeing
all pages.

I see your fix at cefb82d49, but it feels a bit brute-force.  Unlike
stats_ext.sql, we're not (supposed to be) dependent on exact planner
estimates in this test.  So I think the real problem here is crappy test
case design.  Namely, that these various sub-tables are exactly the
same size, despite which the test is expecting that the planner will
order them consistently --- with a planning algorithm that prefers
to put larger tables first in parallel appends (cf. create_append_path).
It's not surprising that the result is unstable in the face of small
variations in the rowcount estimates.
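
To illustrate the ordering rule in question: Parallel Append sorts its
children so that higher-total-cost subpaths come first (cf.
create_append_path).  The sketch below is a toy model of that sort,
not the planner's actual code; ChildPath is a made-up stand-in for
Path nodes.  When two children have exactly equal costs, nothing pins
down their relative order, which is the instability being described:

#include <stdio.h>
#include <stdlib.h>

typedef struct ChildPath
{
    const char *relname;
    double      total_cost;
} ChildPath;

/* higher cost sorts first; equal costs leave the order unconstrained */
static int
cost_desc_cmp(const void *a, const void *b)
{
    double  ca = ((const ChildPath *) a)->total_cost;
    double  cb = ((const ChildPath *) b)->total_cost;

    return (ca < cb) - (ca > cb);
}

int
main(void)
{
    ChildPath   kids[] = {
        {"pagg_tab_ml_p2_s2", 100.0},   /* equal costs: relative order */
        {"pagg_tab_ml_p3_s1", 100.0},   /* is effectively arbitrary    */
        {"pagg_tab_ml_p2_s1", 150.0},
    };

    qsort(kids, 3, sizeof(ChildPath), cost_desc_cmp);
    for (int i = 0; i < 3; i++)
        printf("%s (%.0f)\n", kids[i].relname, kids[i].total_cost);
    return 0;
}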

I'd be inclined to undo what you did in favor of initializing the
test tables to contain significantly different numbers of rows,
because that would (a) achieve plan stability more directly,
and (b) demonstrate that the planner is actually ordering the
tables by cost correctly.  Maybe somewhere else we have a test
that is verifying (b), but these test cases abysmally fail to
check that point.

I'm not really on board with disabling autovacuum in the regression
tests anywhere we aren't absolutely forced to do so.  It's not
representative of real world practice (or at least not real world
best practice ;-)) and it could help hide actual bugs.  We don't seem
to have much choice with the stats_ext tests as they are constituted,
but those tests look really fragile to me.  Let's not adopt that
technique where we have other possible ways to stabilize test results.

                        regards, tom lane



Re: Berserk Autovacuum (let's save next Mandrill)

Tom Lane
I wrote:
> I'd be inclined to undo what you did in favor of initializing the
> test tables to contain significantly different numbers of rows,
> because that would (a) achieve plan stability more directly,
> and (b) demonstrate that the planner is actually ordering the
> tables by cost correctly.  Maybe somewhere else we have a test
> that is verifying (b), but these test cases abysmally fail to
> check that point.

Concretely, I suggest the attached, which replaces the autovac disables
with adjusting partition boundaries so that the partitions contain
different numbers of rows.

I did not touch the partition boundaries for pagg_tab1 and pagg_tab2,
because that would have required also changing the associated test
queries (which are designed to access only particular partitions).
It seemed like too much work to verify that the answers were still
right, and it's not really necessary because those tables are so
small as to fit in single pages.  That means that autovac will either
see the whole table or none of it, and in either case it won't change
reltuples.

This does cause the order of partitions to change in a couple of the
pagg_tab_ml EXPLAIN results, but I think that's fine; the ordering
does now match the actual sizes of the partitions.

                        regards, tom lane


diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index d8a6836..a4dc12b 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -2,9 +2,9 @@
 -- PARTITION_AGGREGATE
 -- Test partitionwise aggregation on partitioned tables
 --
--- Note: Various tests located within are sensitive to tables being
--- auto-vacuumed while the tests are running.  For this reason we
--- run autovacuum_enabled = off for all tables.
+-- Note: to ensure plan stability, it's a good idea to make the partitions of
+-- any one partitioned table in this test all have different numbers of rows.
+--
 -- Enable partitionwise aggregate, which by default is disabled.
 SET enable_partitionwise_aggregate TO true;
 -- Enable partitionwise join, which by default is disabled.
@@ -15,9 +15,9 @@ SET max_parallel_workers_per_gather TO 0;
 -- Tests for list partitioned tables.
 --
 CREATE TABLE pagg_tab (a int, b int, c text, d int) PARTITION BY LIST(c);
-CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000', '0001', '0002', '0003') WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0004', '0005', '0006', '0007') WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0008', '0009', '0010', '0011') WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000', '0001', '0002', '0003', '0004');
+CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005', '0006', '0007', '0008');
+CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009', '0010', '0011');
 INSERT INTO pagg_tab SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 2999) i;
 ANALYZE pagg_tab;
 -- When GROUP BY clause matches; full aggregation is performed for each partition.
@@ -400,13 +400,13 @@ SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2;
 
 -- JOIN query
 CREATE TABLE pagg_tab1(x int, y int) PARTITION BY RANGE(x);
-CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30);
 CREATE TABLE pagg_tab2(x int, y int) PARTITION BY RANGE(y);
-CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30);
 INSERT INTO pagg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
 INSERT INTO pagg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
 ANALYZE pagg_tab1;
@@ -820,9 +820,9 @@ SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a
 
 -- Partition by multiple columns
 CREATE TABLE pagg_tab_m (a int, b int, c int) PARTITION BY RANGE(a, ((a+b)/2));
-CREATE TABLE pagg_tab_m_p1 PARTITION OF pagg_tab_m FOR VALUES FROM (0, 0) TO (10, 10) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_m_p2 PARTITION OF pagg_tab_m FOR VALUES FROM (10, 10) TO (20, 20) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_m_p3 PARTITION OF pagg_tab_m FOR VALUES FROM (20, 20) TO (30, 30) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_m_p1 PARTITION OF pagg_tab_m FOR VALUES FROM (0, 0) TO (12, 12);
+CREATE TABLE pagg_tab_m_p2 PARTITION OF pagg_tab_m FOR VALUES FROM (12, 12) TO (22, 22);
+CREATE TABLE pagg_tab_m_p3 PARTITION OF pagg_tab_m FOR VALUES FROM (22, 22) TO (30, 30);
 INSERT INTO pagg_tab_m SELECT i % 30, i % 40, i % 50 FROM generate_series(0, 2999) i;
 ANALYZE pagg_tab_m;
 -- Partial aggregation as GROUP BY clause does not match with PARTITION KEY
@@ -926,15 +926,15 @@ SELECT a, c, sum(b), avg(c), count(*) FROM pagg_tab_m GROUP BY (a+b)/2, 2, 1 HAV
 
 -- Test with multi-level partitioning scheme
 CREATE TABLE pagg_tab_ml (a int, b int, c text) PARTITION BY RANGE(a);
-CREATE TABLE pagg_tab_ml_p1 PARTITION OF pagg_tab_ml FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_ml_p1 PARTITION OF pagg_tab_ml FOR VALUES FROM (0) TO (10);
 CREATE TABLE pagg_tab_ml_p2 PARTITION OF pagg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY LIST (c);
-CREATE TABLE pagg_tab_ml_p2_s1 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0000', '0001') WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_ml_p2_s2 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0002', '0003') WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_ml_p2_s1 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0000', '0001', '0002');
+CREATE TABLE pagg_tab_ml_p2_s2 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0003');
 -- This level of partitioning has different column positions than the parent
 CREATE TABLE pagg_tab_ml_p3(b int, c text, a int) PARTITION BY RANGE (b);
-CREATE TABLE pagg_tab_ml_p3_s1(c text, a int, b int) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_ml_p3_s2 PARTITION OF pagg_tab_ml_p3 FOR VALUES FROM (5) TO (10) WITH (autovacuum_enabled = off);
-ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE pagg_tab_ml_p3_s1(c text, a int, b int);
+CREATE TABLE pagg_tab_ml_p3_s2 PARTITION OF pagg_tab_ml_p3 FOR VALUES FROM (7) TO (10);
+ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0) TO (7);
 ALTER TABLE pagg_tab_ml ATTACH PARTITION pagg_tab_ml_p3 FOR VALUES FROM (20) TO (30);
 INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM0000') FROM generate_series(0, 29999) i;
 ANALYZE pagg_tab_ml;
@@ -1253,14 +1253,14 @@ SELECT b, sum(a), count(*) FROM pagg_tab_ml GROUP BY b ORDER BY 1, 2, 3;
                                  Group Key: pagg_tab_ml_1.b
                                  ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
                            ->  Partial HashAggregate
-                                 Group Key: pagg_tab_ml_2.b
-                                 ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
-                           ->  Partial HashAggregate
                                  Group Key: pagg_tab_ml_3.b
                                  ->  Parallel Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
                            ->  Partial HashAggregate
                                  Group Key: pagg_tab_ml_4.b
                                  ->  Parallel Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
+                           ->  Partial HashAggregate
+                                 Group Key: pagg_tab_ml_2.b
+                                 ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
 (24 rows)
 
 SELECT b, sum(a), count(*) FROM pagg_tab_ml GROUP BY b HAVING avg(a) < 15 ORDER BY 1, 2, 3;
@@ -1292,10 +1292,6 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
                      Filter: (avg(pagg_tab_ml_1.b) > '7'::numeric)
                      ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
                ->  HashAggregate
-                     Group Key: pagg_tab_ml_2.a, pagg_tab_ml_2.b, pagg_tab_ml_2.c
-                     Filter: (avg(pagg_tab_ml_2.b) > '7'::numeric)
-                     ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
-               ->  HashAggregate
                      Group Key: pagg_tab_ml_3.a, pagg_tab_ml_3.b, pagg_tab_ml_3.c
                      Filter: (avg(pagg_tab_ml_3.b) > '7'::numeric)
                      ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
@@ -1303,6 +1299,10 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
                      Group Key: pagg_tab_ml_4.a, pagg_tab_ml_4.b, pagg_tab_ml_4.c
                      Filter: (avg(pagg_tab_ml_4.b) > '7'::numeric)
                      ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
+               ->  HashAggregate
+                     Group Key: pagg_tab_ml_2.a, pagg_tab_ml_2.b, pagg_tab_ml_2.c
+                     Filter: (avg(pagg_tab_ml_2.b) > '7'::numeric)
+                     ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
 (25 rows)
 
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 ORDER BY 1, 2, 3;
@@ -1332,9 +1332,9 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
 -- costing such plans.
 SET parallel_setup_cost TO 10;
 CREATE TABLE pagg_tab_para(x int, y int) PARTITION BY RANGE(x);
-CREATE TABLE pagg_tab_para_p1 PARTITION OF pagg_tab_para FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_para_p2 PARTITION OF pagg_tab_para FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_para_p3 PARTITION OF pagg_tab_para FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_para_p1 PARTITION OF pagg_tab_para FOR VALUES FROM (0) TO (12);
+CREATE TABLE pagg_tab_para_p2 PARTITION OF pagg_tab_para FOR VALUES FROM (12) TO (22);
+CREATE TABLE pagg_tab_para_p3 PARTITION OF pagg_tab_para FOR VALUES FROM (22) TO (30);
 INSERT INTO pagg_tab_para SELECT i % 30, i % 20 FROM generate_series(0, 29999) i;
 ANALYZE pagg_tab_para;
 -- When GROUP BY clause matches; full aggregation is performed for each partition.
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index 8350ddc..946197f 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -2,9 +2,9 @@
 -- PARTITION_AGGREGATE
 -- Test partitionwise aggregation on partitioned tables
 --
--- Note: Various tests located within are sensitive to tables being
--- auto-vacuumed while the tests are running.  For this reason we
--- run autovacuum_enabled = off for all tables.
+-- Note: to ensure plan stability, it's a good idea to make the partitions of
+-- any one partitioned table in this test all have different numbers of rows.
+--
 
 -- Enable partitionwise aggregate, which by default is disabled.
 SET enable_partitionwise_aggregate TO true;
@@ -17,9 +17,9 @@ SET max_parallel_workers_per_gather TO 0;
 -- Tests for list partitioned tables.
 --
 CREATE TABLE pagg_tab (a int, b int, c text, d int) PARTITION BY LIST(c);
-CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000', '0001', '0002', '0003') WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0004', '0005', '0006', '0007') WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0008', '0009', '0010', '0011') WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000', '0001', '0002', '0003', '0004');
+CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005', '0006', '0007', '0008');
+CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009', '0010', '0011');
 INSERT INTO pagg_tab SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 2999) i;
 ANALYZE pagg_tab;
 
@@ -94,14 +94,14 @@ SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2;
 -- JOIN query
 
 CREATE TABLE pagg_tab1(x int, y int) PARTITION BY RANGE(x);
-CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30);
 
 CREATE TABLE pagg_tab2(x int, y int) PARTITION BY RANGE(y);
-CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30);
 
 INSERT INTO pagg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
 INSERT INTO pagg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
@@ -177,9 +177,9 @@ SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a
 -- Partition by multiple columns
 
 CREATE TABLE pagg_tab_m (a int, b int, c int) PARTITION BY RANGE(a, ((a+b)/2));
-CREATE TABLE pagg_tab_m_p1 PARTITION OF pagg_tab_m FOR VALUES FROM (0, 0) TO (10, 10) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_m_p2 PARTITION OF pagg_tab_m FOR VALUES FROM (10, 10) TO (20, 20) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_m_p3 PARTITION OF pagg_tab_m FOR VALUES FROM (20, 20) TO (30, 30) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_m_p1 PARTITION OF pagg_tab_m FOR VALUES FROM (0, 0) TO (12, 12);
+CREATE TABLE pagg_tab_m_p2 PARTITION OF pagg_tab_m FOR VALUES FROM (12, 12) TO (22, 22);
+CREATE TABLE pagg_tab_m_p3 PARTITION OF pagg_tab_m FOR VALUES FROM (22, 22) TO (30, 30);
 INSERT INTO pagg_tab_m SELECT i % 30, i % 40, i % 50 FROM generate_series(0, 2999) i;
 ANALYZE pagg_tab_m;
 
@@ -202,17 +202,17 @@ SELECT a, c, sum(b), avg(c), count(*) FROM pagg_tab_m GROUP BY (a+b)/2, 2, 1 HAV
 -- Test with multi-level partitioning scheme
 
 CREATE TABLE pagg_tab_ml (a int, b int, c text) PARTITION BY RANGE(a);
-CREATE TABLE pagg_tab_ml_p1 PARTITION OF pagg_tab_ml FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_ml_p1 PARTITION OF pagg_tab_ml FOR VALUES FROM (0) TO (10);
 CREATE TABLE pagg_tab_ml_p2 PARTITION OF pagg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY LIST (c);
-CREATE TABLE pagg_tab_ml_p2_s1 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0000', '0001') WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_ml_p2_s2 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0002', '0003') WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_ml_p2_s1 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0000', '0001', '0002');
+CREATE TABLE pagg_tab_ml_p2_s2 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0003');
 
 -- This level of partitioning has different column positions than the parent
 CREATE TABLE pagg_tab_ml_p3(b int, c text, a int) PARTITION BY RANGE (b);
-CREATE TABLE pagg_tab_ml_p3_s1(c text, a int, b int) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_ml_p3_s2 PARTITION OF pagg_tab_ml_p3 FOR VALUES FROM (5) TO (10) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_ml_p3_s1(c text, a int, b int);
+CREATE TABLE pagg_tab_ml_p3_s2 PARTITION OF pagg_tab_ml_p3 FOR VALUES FROM (7) TO (10);
 
-ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0) TO (5);
+ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0) TO (7);
 ALTER TABLE pagg_tab_ml ATTACH PARTITION pagg_tab_ml_p3 FOR VALUES FROM (20) TO (30);
 
 INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM0000') FROM generate_series(0, 29999) i;
@@ -287,9 +287,9 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
 SET parallel_setup_cost TO 10;
 
 CREATE TABLE pagg_tab_para(x int, y int) PARTITION BY RANGE(x);
-CREATE TABLE pagg_tab_para_p1 PARTITION OF pagg_tab_para FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_para_p2 PARTITION OF pagg_tab_para FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_para_p3 PARTITION OF pagg_tab_para FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_para_p1 PARTITION OF pagg_tab_para FOR VALUES FROM (0) TO (12);
+CREATE TABLE pagg_tab_para_p2 PARTITION OF pagg_tab_para FOR VALUES FROM (12) TO (22);
+CREATE TABLE pagg_tab_para_p3 PARTITION OF pagg_tab_para FOR VALUES FROM (22) TO (30);
 
 INSERT INTO pagg_tab_para SELECT i % 30, i % 20 FROM generate_series(0, 29999) i;
 

Re: Berserk Autovacuum (let's save next Mandrill)

David Rowley
On Fri, 3 Apr 2020 at 04:46, Tom Lane <[hidden email]> wrote:

>
> I wrote:
> > I'd be inclined to undo what you did in favor of initializing the
> > test tables to contain significantly different numbers of rows,
> > because that would (a) achieve plan stability more directly,
> > and (b) demonstrate that the planner is actually ordering the
> > tables by cost correctly.  Maybe somewhere else we have a test
> > that is verifying (b), but these test cases abysmally fail to
> > check that point.
>
> Concretely, I suggest the attached, which replaces the autovac disables
> with adjusting partition boundaries so that the partitions contain
> different numbers of rows.

I've looked over this and I agree that it's a better solution to the problem.

I'm happy for you to go ahead on this.

David



Re: Berserk Autovacuum (let's save next Mandrill)

Tom Lane
David Rowley <[hidden email]> writes:
> On Fri, 3 Apr 2020 at 04:46, Tom Lane <[hidden email]> wrote:
>> Concretely, I suggest the attached, which replaces the autovac disables
>> with adjusting partition boundaries so that the partitions contain
>> different numbers of rows.

> I've looked over this and I agree that it's a better solution to the problem.
> I'm happy for you to go ahead on this.

Pushed, thanks for looking at it!

                        regards, tom lane

